Diffusion Model

Diffusion models are a class of generative models that learn to reverse a gradual noising process. During training, a fixed forward process adds Gaussian noise to data samples over many small steps until their structure is destroyed, and the model learns to undo this corruption one step at a time. Sampling starts from pure random noise and iteratively applies the learned denoising steps. The result is a new sample, not a reconstruction of any particular training example, and the process can optionally be guided by a conditioning signal such as a text prompt.
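The noising and denoising loop described above can be sketched in a few lines. This is a minimal, unconditional DDPM-style example under stated assumptions: the data are a hypothetical 1-D Gaussian N(MU, S²), the noise schedule is a linear beta schedule, and the "trained network" is replaced by the exact expected noise E[ε | x_t], which can only be written in closed form because the toy data are Gaussian. A real model would learn this noise predictor from data.

```python
import math
import random

# Linear beta (noise-variance) schedule over T steps; an assumption, other schedules exist.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # cumulative products: alpha_bars[t] = alphas[0] * ... * alphas[t]
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

# Hypothetical toy data distribution: 1-D Gaussian N(MU, S^2).
MU, S = 2.0, 0.5


def q_sample(x0, t, rng):
    """Forward (noising) process in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)


def eps_hat(x_t, t):
    """Stand-in for a trained noise-prediction network: the exact E[eps | x_t],
    available only because the toy data are Gaussian."""
    ab = alpha_bars[t]
    var_t = ab * S**2 + (1.0 - ab)  # marginal variance of x_t
    return math.sqrt(1.0 - ab) / var_t * (x_t - math.sqrt(ab) * MU)


def sample(n, seed=0):
    """Ancestral sampling: start from N(0, 1) noise and apply T reverse (denoising) steps."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        for t in reversed(range(T)):
            # Mean of the reverse step, computed from the predicted noise.
            mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat(x, t)) / math.sqrt(alphas[t])
            # Add fresh noise with variance beta_t, except on the final step.
            noise = math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) if t > 0 else 0.0
            x = mean + noise
        out.append(x)
    return out


xs = sample(300)
mean = sum(xs) / len(xs)
std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
print(f"sample mean {mean:.2f}, sample std {std:.2f} (target {MU}, {S})")
```

Swapping the closed-form `eps_hat` for a neural network, and adding a conditioning input to it, gives the usual trained setup; the sampling loop itself stays the same.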

The mathematical framework was introduced by Sohl-Dickstein et al. (2015) and gained mainstream adoption through Denoising Diffusion Probabilistic Models (DDPM; Ho et al., 2020) and the closely related score-based approaches. Diffusion models have since become the dominant paradigm for image generation, generally surpassing earlier GANs and VAEs in sample fidelity and diversity.

The elegance of diffusion lies in its thermodynamic framing: generation as the reversal of a process that steadily increases entropy. But this framing also reveals a limitation. The model does not learn what images are; it learns how to undo corruption. The score-based perspective makes this precise: the model estimates the score, ∇ₓ log p(x), the gradient of the log-density of the noise-perturbed data distribution, rather than the distribution itself. In latent diffusion, this estimation takes place in a compressed latent space learned by an autoencoder rather than in pixel space. The distinction has consequences for how we understand what these systems actually know.
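Knowing only the score is enough to draw samples, even though the density itself (including its normalizing constant) is never computed. The sketch below illustrates this with unadjusted Langevin dynamics, used here as a simple stand-in for a diffusion sampler; diffusion samplers instead use scores estimated at many noise levels. The target is a hypothetical 1-D Gaussian whose score is written analytically; in a trained model a neural network approximates it (in DDPM-style models the noise predictor relates to the score of the noised data by score ≈ -ε̂ / sqrt(1 - ᾱ_t)).

```python
import math
import random

MU, S = 2.0, 0.5  # hypothetical toy target distribution N(MU, S^2)


def log_density_unnormalized(x, log_const=0.0):
    """log p(x) up to an additive constant; log_const stands in for an unknown normalizer."""
    return -((x - MU) ** 2) / (2 * S**2) + log_const


def score(x):
    """Score function d/dx log p(x). The normalizer is constant in x, so it drops out."""
    return -(x - MU) / S**2


def langevin_sample(n_chains=300, n_steps=800, step=0.01, seed=0):
    """Unadjusted Langevin dynamics: x <- x + (step / 2) * score(x) + sqrt(step) * z."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        x = 0.0  # arbitrary starting point, far from the target mean
        for _ in range(n_steps):
            x = x + 0.5 * step * score(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


ys = langevin_sample()
m = sum(ys) / len(ys)
sd = math.sqrt(sum((y - m) ** 2 for y in ys) / len(ys))
print(f"sample mean {m:.2f}, sample std {sd:.2f} (target {MU}, {S})")
```

The sampler never evaluates `log_density_unnormalized`; only `score` enters the update, which is why changing `log_const` would leave the samples unchanged.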