
Diffusion model

A diffusion model is a generative model used mainly in computer vision, for tasks such as image and video generation.

How does it work

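During training, noise is added to the data and the model learns to undo that corruption. At generation time, the model starts from pure noise and recovers a new sample from it.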


Forward process

The forward process progressively adds Gaussian noise to samples from the training dataset until they are indistinguishable from pure noise.
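
The forward process can be sketched with a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the linear schedule, its start and end values (1e-4 and 0.02), the 1000-step length, and the helper names `linear_betas`, `cumulative_signal` and `add_noise` are all assumptions made for the example. It uses the common closed form that jumps directly to any step t instead of adding noise one step at a time.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (values are assumptions)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: fraction of the original signal variance left after step t."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return x_t, noise

betas = linear_betas(1000)
alpha_bar = cumulative_signal(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25, 0.0]                     # stand-in for flattened pixel values
x_early, _ = add_noise(x0, 10, alpha_bar, rng)  # still close to x0
x_late, _ = add_noise(x0, 999, alpha_bar, rng)  # almost entirely noise
```

With these assumed values, `alpha_bar` falls from just under 1 at the first step to nearly 0 at the last, which is what "until only noise remains" means in practice.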

Reverse process

In the reverse process, the model gradually denoises the image. The denoising network is typically based on a U-Net or a Transformer.
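
A toy version of the reverse process is sketched below, assuming a DDPM-style update and a linear noise schedule. The trained network is replaced by a hypothetical stand-in, `toy_predict_noise`, which returns the exact noise estimate for a dataset where every sample equals `TARGET`. All names and numeric values here are assumptions for illustration only.

```python
import math
import random

def make_schedule(num_steps=200, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative products (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return betas, alpha_bar

def reverse_process(predict_noise, size, betas, alpha_bar, rng):
    """DDPM-style sampling: start from noise, then denoise step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]      # pure noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)                       # model's noise estimate
        scale = betas[t] / math.sqrt(1.0 - alpha_bar[t])
        mean = [(xi - scale * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])                 # re-inject a little noise
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                                    # final step adds no noise
    return x

# Hypothetical stand-in for a trained U-Net or Transformer: the exact noise
# estimate when every training sample equals TARGET.
TARGET = [0.8, -0.3, 0.1]
betas, alpha_bar = make_schedule()

def toy_predict_noise(x, t):
    a = alpha_bar[t]
    return [(xi - math.sqrt(a) * ti) / math.sqrt(1.0 - a)
            for xi, ti in zip(x, TARGET)]

sample = reverse_process(toy_predict_noise, len(TARGET), betas, alpha_bar,
                         random.Random(0))
```

Because the stand-in predictor is exact for this one-sample "dataset", the loop ends on `TARGET` up to floating-point error. A real model only approximates the noise, so its outputs vary from run to run.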

Samplers

The sampler determines how noise is removed, step by step, producing a new sample at each stage of the generation. It sets the strategy: how random or deterministic the denoising process should be. Each sampler has to strike a trade-off between computational speed and output quality.

Some examples:

  • Euler Ancestral: Adds random noise at each step, increasing diversity in the generated output.
  • DDPM: The original probabilistic sampler; it takes many small stochastic steps, which makes it slow but stable.
  • DDIM: An implicit sampler that can skip steps to speed up the process, and can be run deterministically.
  • DPM2: A differential equation solver that performs intermediate evaluations within each step to improve accuracy.
  • PNDM: A pseudo-numerical method that treats denoising as solving a differential equation and uses multistep numerical approximations to speed up the process.
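
The difference between a random and a deterministic sampler can be illustrated with a single DDIM-style update. The sketch below assumes the standard DDIM formulation, in which a parameter `eta` scales the noise re-injected at each step: `eta=0` gives a fully deterministic step, while `eta=1` gives a stochastic, DDPM-like step. The function name `sampler_step` and the example noise levels are hypothetical.

```python
import math
import random

def sampler_step(x_t, eps, a_t, a_prev, eta, rng):
    """One denoising step from signal level a_t to a_prev (both are alpha_bar values).

    eps is the model's noise estimate for x_t; eta controls the randomness.
    """
    sigma = eta * math.sqrt((1 - a_prev) / (1 - a_t)) * math.sqrt(1 - a_t / a_prev)
    direction = math.sqrt(1 - a_prev - sigma ** 2)
    out = []
    for x, e in zip(x_t, eps):
        x0_hat = (x - math.sqrt(1 - a_t) * e) / math.sqrt(a_t)  # predicted clean value
        out.append(math.sqrt(a_prev) * x0_hat + direction * e
                   + sigma * rng.gauss(0.0, 1.0))
    return out

x_t = [0.9, -0.2]
eps = [0.3, -0.1]        # stand-in for a network output
a_t, a_prev = 0.5, 0.8   # assumed signal levels before and after the step

det_a = sampler_step(x_t, eps, a_t, a_prev, eta=0.0, rng=random.Random(1))
det_b = sampler_step(x_t, eps, a_t, a_prev, eta=0.0, rng=random.Random(2))
sto_a = sampler_step(x_t, eps, a_t, a_prev, eta=1.0, rng=random.Random(1))
sto_b = sampler_step(x_t, eps, a_t, a_prev, eta=1.0, rng=random.Random(2))
```

Here `det_a` and `det_b` are identical despite the different seeds, while `sto_a` and `sto_b` differ. Because the step takes `a_t` and `a_prev` as plain arguments, the same function can also jump across several training steps at once, which is how step skipping speeds up generation.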

Schedulers

The scheduler determines how noise removal is distributed across the steps. It defines a denoising curve (e.g. linear, logarithmic, exponential) that sets how much noise should be removed at each stage of the denoising process.
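
As a rough illustration of two such curves, the sketch below builds a linear and an exponential list of noise levels and compares how much noise each one removes per step. The function names, the step count, and the `sigma_max` and `sigma_min` values are assumptions chosen for the example, not values from any particular library.

```python
import math

def linear_sigmas(num_steps, sigma_max=10.0, sigma_min=0.1):
    """Noise levels that drop by the same amount at every step."""
    step = (sigma_max - sigma_min) / (num_steps - 1)
    return [sigma_max - i * step for i in range(num_steps)]

def exponential_sigmas(num_steps, sigma_max=10.0, sigma_min=0.1):
    """Noise levels evenly spaced in log(sigma): big drops first, small ones last."""
    log_max, log_min = math.log(sigma_max), math.log(sigma_min)
    step = (log_max - log_min) / (num_steps - 1)
    return [math.exp(log_max - i * step) for i in range(num_steps)]

def noise_removed_per_step(sigmas):
    """Difference between consecutive noise levels."""
    return [a - b for a, b in zip(sigmas, sigmas[1:])]

linear_drops = noise_removed_per_step(linear_sigmas(5))
exponential_drops = noise_removed_per_step(exponential_sigmas(5))
```

Both curves start and end at the same noise levels, but the linear one removes an equal amount at every step, while the exponential one removes most of the noise in the first steps and spends the remaining steps on fine detail.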
