Overview

The seminal paper Learning to Learn by Gradient Descent by Gradient Descent [1] explores learning optimisers (rather than using hand-designed ones such as SGD or Adam) via an RNN architecture that models training steps as the RNN's time steps, together with an RL-based loss.

In this project we aim for a similar product, but rather than RNNs and RL we would like to explore the closely linked frameworks of diffusion models [5,6] and stochastic control. In particular, there is already theoretical work that motivates the use of these methodologies for global optimisation [2]; what now remains is to explore them in practice.

We expect the student to lightly adapt the methods in [3,4] to the optimisation setting of [2] (this simply amounts to exploring low temperatures in the artificial target distribution induced by the loss function). Furthermore, exploring engineering tasks as in [1] might require being creative about the inductive biases baked into the NN parametrisation we are working with.
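To spell out the low-temperature remark above: a loss function f induces the tempered target density

    π_T(x) ∝ exp(−f(x) / T),

and as the temperature T → 0 this distribution concentrates its mass on the global minimisers of f, so a sampler for π_T at low temperature behaves like a global optimiser.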

The nature of this project will be mostly ML engineering and playing around with / being creative about NN inductive biases. That said, if the student is more mathematically/theory oriented, we could explore extending the results in [2] (which apply to the method in [4] only) to the method in [3] using the mixing rates in [7], and maybe comparing to things like [8] (this last bit would be very much a bonus/extra).

As always, the student should explore simple 1D and 2D toy optimisation examples to assess the validity of the method before moving on to real-world examples.
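To make that concrete, here is a minimal sketch (in JAX; see the FAQs below) of the kind of 1D sanity check we have in mind: annealed unadjusted Langevin dynamics targeting π_T for a toy double-well loss. The loss, annealing schedule and step size are illustrative assumptions, and the hand-written Langevin updates merely stand in for the learned samplers of [3,4].

    import jax
    import jax.numpy as jnp

    def f(x):
        # Toy 1D double-well loss; the 0.3*x tilt makes x ≈ -1 the global minimum.
        return (x**2 - 1.0) ** 2 + 0.3 * x

    grad_f = jax.grad(f)

    def langevin_anneal(key, n_steps=5000, step=1e-2, t_init=1.0, t_final=1e-2):
        # Unadjusted Langevin dynamics targeting pi_T(x) ∝ exp(-f(x)/T), with the
        # temperature T annealed geometrically from t_init down to t_final.
        temps = t_init * (t_final / t_init) ** (jnp.arange(n_steps) / n_steps)
        keys = jax.random.split(key, n_steps)

        def body(x, inputs):
            t, k = inputs
            noise = jax.random.normal(k)
            # Noise scale carries the temperature, so the update stays stable
            # as T -> 0 (pure gradient descent in the zero-temperature limit).
            x = x - step * grad_f(x) + jnp.sqrt(2.0 * step * t) * noise
            return x, None

        x, _ = jax.lax.scan(body, jnp.array(2.0), (temps, keys))
        return x

    # Started in the wrong basin (x = 2), the chain should end near x ≈ -1.
    print(langevin_anneal(jax.random.PRNGKey(0)))

Running this from several initial points and checking that the samples concentrate near the global minimum is exactly the kind of validity check meant above.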

[1] https://arxiv.org/abs/1606.04474

[2] https://arxiv.org/abs/2111.00402

[3] https://openreview.net/forum?id=8pvnfTAbu1f

[4] https://arxiv.org/abs/2111.15141

[5] https://arxiv.org/abs/2006.11239

[6] https://arxiv.org/abs/2006.11239

[7] https://arxiv.org/abs/2209.11215

[8] https://arxiv.org/abs/1707.06618

FAQs

  • What will I learn in this project?

    - Latest Advances in Diffusion Generative Models

    - A bit on SDEs / Stochastic Calculus

    - Using autodiff frameworks such as JAX/PyTorch

  • What is the objective of the project?

    In short, the objective is to adapt the Denoising Diffusion Sampler or the Path Integral Sampler to work as an optimiser rather than a sampler. Additionally, we want to explore how baking inductive biases into the network being trained can help further.
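
    As a rough sketch of what such an adaptation could look like (an illustrative assumption in the spirit of the stochastic-control formulation of [4], not a prescribed implementation): parametrise a drift network u_θ, simulate the controlled SDE it induces, and train on a control-energy term plus a terminal cost built from the tempered target π_T. In JAX, hypothetically:

        import jax
        import jax.numpy as jnp

        # Illustrative sizes and hyperparameters, not taken from [4].
        DIM, N_STEPS, TEMP = 2, 50, 0.05
        dt = 1.0 / N_STEPS

        def f(x):
            # Toy 2D loss to be minimised.
            return jnp.sum((x**2 - 1.0) ** 2) + 0.3 * jnp.sum(x)

        def init_params(key, hidden=64):
            k1, k2 = jax.random.split(key)
            return {"w1": 0.1 * jax.random.normal(k1, (DIM + 1, hidden)),
                    "b1": jnp.zeros(hidden),
                    "w2": 0.1 * jax.random.normal(k2, (hidden, DIM)),
                    "b2": jnp.zeros(DIM)}

        def drift(params, x, t):
            # Small MLP u_theta(x, t); time enters as an extra feature.
            h = jnp.concatenate([x, jnp.atleast_1d(t)])
            h = jnp.tanh(h @ params["w1"] + params["b1"])
            return h @ params["w2"] + params["b2"]

        def loss(params, key, batch=128):
            # Control energy + terminal cost; minimising this drives the
            # terminal law of dX = u dt + dW (X_0 = 0) towards
            # pi_T(x) ∝ exp(-f(x)/TEMP).
            def rollout(k):
                ks = jax.random.split(k, N_STEPS)

                def body(carry, inp):
                    x, cost = carry
                    i, kk = inp
                    u = drift(params, x, i * dt)
                    cost = cost + 0.5 * jnp.sum(u**2) * dt
                    noise = jax.random.normal(kk, (DIM,))
                    x = x + u * dt + jnp.sqrt(dt) * noise
                    return (x, cost), None

                init = (jnp.zeros(DIM), jnp.zeros(()))
                (x, cost), _ = jax.lax.scan(
                    body, init, (jnp.arange(N_STEPS), ks))
                # log nu(x) up to a constant, nu = N(0, I) being the
                # terminal law of the uncontrolled process.
                log_nu = -0.5 * jnp.sum(x**2)
                return cost + log_nu + f(x) / TEMP

            return jnp.mean(jax.vmap(rollout)(jax.random.split(key, batch)))

        @jax.jit
        def train_step(params, key, lr=1e-3):
            grads = jax.grad(loss)(params, key)
            return jax.tree_util.tree_map(
                lambda p, g: p - lr * g, params, grads)

    After training, rolling out the learned drift and keeping the terminal points (or the best loss value among them) would serve as the optimiser; how well this behaves at very low TEMP is precisely what the project is meant to investigate.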

  • How does this fit into the bigger picture?

    As mentioned before, denoising-diffusion-based generative models are taking off; however, the question remains whether we can apply them to more meaningful scientific / engineering tasks. The particular setting we aim to explore has direct applications in control, engineering and design.