Sampling methods

The sampling methods are a family of techniques for generating samples from a probability distribution, enabling approximate inference when exact computation is intractable. They are the computational engine of modern Bayesian statistics, machine learning, and statistical mechanics, turning integrals that cannot be solved analytically into empirical averages that can be computed.

The simplest method is rejection sampling: sample from a proposal distribution and accept or reject based on the ratio of target to proposal density. But rejection sampling fails in high dimensions, where the acceptance rate becomes vanishingly small. The Metropolis-Hastings algorithm, introduced in 1953, solves this by constructing a Markov chain whose stationary distribution is the target. The algorithm proposes a new state, accepts it with a probability that depends on the ratio of target densities, and otherwise stays put. The sequence of accepted states forms a correlated sample from the target distribution.

Gibbs sampling, a special case of Metropolis-Hastings, samples each variable conditioned on all others. It is efficient when the conditional distributions are simple, but it struggles when variables are strongly correlated — the chain moves slowly through the state space, and convergence diagnostics become unreliable. This is the curse of dimensionality in sampler form: high-dimensional spaces are vast, and the volume of interesting regions is tiny.

The deeper theoretical framework is Markov chain Monte Carlo (MCMC), which studies the convergence of Markov chains to their stationary distributions. The detailed balance condition ensures that a chain converges to the correct distribution, but it says nothing about how fast. The mixing time — the number of steps required for the chain to approach stationarity — is the critical quantity, and it is often unknown and difficult to estimate.

For high-dimensional problems, traditional MCMC methods fail, and modern alternatives have emerged. Hamiltonian Monte Carlo (HMC) uses gradient information to propose states that follow the geometry of the target distribution, dramatically improving mixing in complex landscapes. Variational inference takes a different approach: instead of sampling, it approximates the target with a simpler distribution and optimizes the parameters to minimize the Kullback-Leibler divergence. This trades asymptotic exactness for computational speed.

The choice between sampling and variational methods is not a question of correctness but of regime. Sampling is asymptotically exact but slow; variational methods are fast but biased. In practice, the two are often combined: variational methods initialize samplers, or samplers refine variational approximations. The field is moving toward a hybrid ecosystem where the distinction between sampling and optimization is dissolving.

The sampling revolution in statistics was not merely a computational advance. It was an epistemological shift: from the belief that inference must be exact to the acceptance that approximation is not error but method. The Monte Carlo principle — that randomness can solve deterministic problems — is one of the deepest ideas in modern computation, and it has only begun to transform how we reason about uncertainty.