Reparameterization trick
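The reparameterization trick is a technique in variational inference and generative modeling that enables gradient-based optimization through stochastic sampling operations. A sample drawn directly from a parametric distribution is not a differentiable function of that distribution's parameters, so gradients cannot flow backward through the sampling step. The trick instead expresses the sample as a deterministic, differentiable transformation of the distribution parameters and an independent noise variable whose distribution does not depend on those parameters. For a Gaussian, for example, a sample z from N(μ, σ²) is written as z = μ + σε with ε drawn from N(0, 1).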
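A minimal sketch in Python (standard library only), assuming a one-dimensional Gaussian written as z = μ + σε with ε ~ N(0, 1), and the toy objective f(z) = z². The objective is chosen because its expectation, μ² + σ², has the known gradient (2μ, 2σ). The helper names `reparam_sample` and `pathwise_grad` are hypothetical. Because ε is drawn independently of μ and σ, the chain rule applies to each sample, and averaging the per-sample derivatives estimates the gradient of the expectation.

```python
import random

def reparam_sample(mu, sigma, eps):
    """Deterministic transform z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return mu + sigma * eps

def pathwise_grad(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[z**2], z ~ N(mu, sigma**2).

    With z = mu + sigma * eps, the chain rule gives per sample:
      d f / d mu    = f'(z) * dz/dmu    = 2z * 1
      d f / d sigma = f'(z) * dz/dsigma = 2z * eps
    """
    rng = random.Random(seed)
    g_mu = g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise does not depend on mu or sigma
        z = reparam_sample(mu, sigma, eps)
        g_mu += 2.0 * z
        g_sigma += 2.0 * z * eps
    return g_mu / n_samples, g_sigma / n_samples
```

With μ = 1.5 and σ = 0.5, `pathwise_grad(1.5, 0.5)` should return values close to the exact gradient (3.0, 1.0); the precise numbers depend on the seed and the number of samples.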
The trick was popularized by the variational autoencoder, where it allows the encoder network to produce samples from a Gaussian latent distribution while keeping the whole model differentiable end to end. Its scope is broader, however: any distribution that admits a location-scale parameterization, or more generally any distribution whose samples can be written as a differentiable function of its parameters and exogenous noise, can be reparameterized. Examples include the Gumbel-Softmax (Concrete) relaxation, which replaces a categorical sample with a differentiable continuous approximation, and normalizing flows, which build complex continuous distributions by passing simple base noise through a sequence of differentiable, invertible transformations.
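The Gumbel-Softmax relaxation can be sketched in the same style. This assumes the logits are unnormalized log-probabilities and the temperature `tau` is a positive float; the names `gumbel_noise` and `gumbel_softmax` are hypothetical. Each call adds independent Gumbel(0, 1) noise to the logits and applies a temperature-scaled softmax, so for fixed noise the output is a differentiable function of the logits. Taking the argmax of the perturbed logits instead (the Gumbel-max trick) yields an exact categorical sample, which is why the relaxed output approaches a one-hot vector as `tau` approaches zero.

```python
import math
import random

def gumbel_noise(rng):
    """Draw g ~ Gumbel(0, 1) by inverse-CDF sampling: g = -log(-log(u))."""
    u = max(rng.random(), 1e-12)  # rng.random() may return 0.0; avoid log(0)
    return -math.log(-math.log(u))

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + g) / tau) with i.i.d. Gumbel noise g."""
    scores = [(x + gumbel_noise(rng)) / tau for x in logits]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    rng = random.Random(0)
    logits = [math.log(p) for p in [0.2, 0.3, 0.5]]
    print(gumbel_softmax(logits, tau=0.5, rng=rng))  # a point on the probability simplex
```

Lower temperatures give samples closer to one-hot but with higher-variance gradients; higher temperatures give smoother samples that are further from the discrete distribution.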
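The reparameterization trick is more than a computational convenience. It rests on a change of variables: if z = g(θ, ε) with ε drawn from a fixed distribution p(ε), then the expectation of f(z) under q_θ(z) equals the expectation of f(g(θ, ε)) under p(ε). Because p(ε) does not depend on θ, the gradient with respect to θ can, under mild regularity conditions, be moved inside the expectation and estimated by averaging over samples of ε. In this sense the randomness is pushed outside the computational graph, provided the distribution admits such a differentiable transformation and f is differentiable in z. When that structure fails, as it does for discrete distributions without a relaxation (a discrete sample is a piecewise-constant function of the parameters, so its derivative is zero almost everywhere), alternative methods such as the score function (REINFORCE) estimator or straight-through estimators must be used.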
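For contrast, here is a minimal sketch of the score function estimator on a Bernoulli variable, a case where reparameterization fails: the sample b = 1 if u < p else 0 (with u uniform on [0, 1)) is a step function of p, so its pathwise derivative is zero almost everywhere. The score function estimator instead weights f(b) by the derivative of log q(b; p) and needs no derivative of f or of the sampling step. The function name `score_function_grad` and the test objective f(b) = (b - 0.3)² are illustrative assumptions. The exact gradient of E[f(b)] = p·f(1) + (1 - p)·f(0) is f(1) - f(0), which equals 0.4 for this f.

```python
import random

def score_function_grad(p, f, n_samples=200_000, seed=0):
    """REINFORCE estimate of d/dp E_{b ~ Bernoulli(p)}[f(b)].

    Uses grad = E[f(b) * d/dp log q(b; p)], where
    log q(b; p) = b*log(p) + (1 - b)*log(1 - p), so
    d/dp log q(b; p) = b/p - (1 - b)/(1 - p).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        b = 1 if rng.random() < p else 0  # step function of p: no useful pathwise gradient
        score = b / p - (1 - b) / (1 - p)
        total += f(b) * score
    return total / n_samples
```

For example, `score_function_grad(0.6, lambda b: (b - 0.3) ** 2)` should land close to 0.4. Score function estimates are unbiased but typically have higher variance than pathwise estimates, which is one reason the reparameterization trick is preferred whenever it applies.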