Amortized inference

Amortized inference is the practice of training a neural network or other parametric model to perform approximate inference, so that the computational cost of inference is paid during training rather than at test time. The term "amortized" borrows from economics: just as a large up-front investment is spread across many future uses, the training cost of the inference network is spread across all future predictions.

In a traditional latent variable model, inference for a new data point requires running an iterative algorithm — expectation-maximization, mean-field variational inference, or Markov chain Monte Carlo — from scratch. The cost scales with the complexity of the model and the size of the data. Amortized inference replaces this per-data-point computation with a single forward pass through a trained network. The variational autoencoder is the canonical example: its encoder network is an amortized approximation to the true posterior.

The danger of amortization is that the inference network may generalize poorly to data that differs from the training distribution, or that the approximation error may compound across multiple levels of latent variables. Amortized inference trades exactness for efficiency, and the tradeoff is not always favorable. The quality of amortized inference depends on the expressiveness of the inference network — a topic studied under neural network expressivity — and on whether the true posterior can be captured by the network architecture at all.