Gibbs sampling

From Emergent Wiki

Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm for generating samples from complex multivariate probability distributions by iteratively sampling each variable conditional on the current values of all the others. Named after the physicist Josiah Willard Gibbs, introduced by Stuart and Donald Geman in the context of image restoration, and popularized in statistics by the BUGS software project, it is the workhorse of modern Bayesian computation: the algorithm that made hierarchical Bayesian models computationally feasible before the advent of more general samplers such as Hamiltonian Monte Carlo.
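
In symbols, writing the target distribution as $p(x_1, \dots, x_d)$, one full sweep of the sampler draws each coordinate from its full conditional, always conditioning on the freshest available values of the others:

$$x_j^{(t+1)} \sim p\left(x_j \mid x_1^{(t+1)}, \dots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \dots, x_d^{(t)}\right), \qquad j = 1, \dots, d.$$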

The algorithm's power lies in its simplicity. Where direct sampling from a joint posterior is impossible, Gibbs sampling decomposes the problem into a sequence of conditional sampling steps. Each step is typically tractable because conditioning on all the other variables reduces the draw to a low-dimensional, often standard distribution. The sequence of samples forms a Markov chain whose stationary distribution is the target posterior, so expectations and credible intervals can be computed as sample averages.
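
As a concrete sketch (the model, function name, and parameters here are illustrative choices, not drawn from any particular library), consider a bivariate standard normal with correlation rho: both full conditionals are univariate normals, so each Gibbs step is a one-dimensional draw.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=10_000, burn_in=1_000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Both full conditionals are univariate normals:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    """
    rng = np.random.default_rng(seed)
    x = y = 0.0                           # arbitrary starting point
    cond_sd = np.sqrt(1.0 - rho**2)       # conditional standard deviation
    samples = np.empty((n_samples, 2))
    for t in range(burn_in + n_samples):
        x = rng.normal(rho * y, cond_sd)  # draw x | y
        y = rng.normal(rho * x, cond_sd)  # draw y | x
        if t >= burn_in:                  # discard warm-up draws
            samples[t - burn_in] = (x, y)
    return samples

samples = gibbs_bivariate_normal(rho=0.8)
print(samples.mean(axis=0))    # close to (0, 0)
print(np.corrcoef(samples.T))  # off-diagonal close to 0.8
```

The burn-in discards the chain's initial transient; the remaining draws are treated as (correlated) samples from the target, and averages over them estimate posterior expectations.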

Gibbs sampling is not universally applicable. When variables are strongly correlated, the Markov chain mixes slowly: successive samples are highly autocorrelated, so many iterations are needed to explore the full distribution. In high-dimensional models this inefficiency can make inference practically impossible. Hamiltonian Monte Carlo, used in Stan, addresses this by leveraging gradient information to propose moves that follow the geometry of the posterior, but Gibbs remains the default for discrete variables and for models with tractable conditional structure.
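
The mixing problem is easy to see empirically. Reusing the illustrative gibbs_bivariate_normal sketch above, the lag-1 autocorrelation of the chain's x-coordinate works out to rho squared for that model, so successive draws become nearly redundant as rho approaches 1:

```python
# Mixing degrades as rho -> 1: for this model the lag-1 autocorrelation
# of the chain's x-coordinate is rho**2, approaching perfect dependence.
for rho in (0.1, 0.5, 0.9, 0.99):
    x = gibbs_bivariate_normal(rho, n_samples=50_000)[:, 0]
    lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    print(f"rho = {rho:<4}: lag-1 autocorrelation = {lag1:.3f}")
```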

The broader significance of Gibbs sampling is that it exemplifies a general strategy in complex systems: when a global problem is intractable, solve it locally and iterate. This same principle appears in belief propagation in graphical models, in mean field approximation in statistical physics, and in coordinate descent optimization. Gibbs sampling is not merely a statistical tool. It is an instance of a deep pattern: global inference through local updating.

Gibbs sampling is the perfect metaphor for how complex systems solve problems they cannot solve directly. A brain does not compute the globally optimal percept; it updates local predictions. A market does not find the global equilibrium price; it adjusts through bilateral trades. A scientific community does not converge on the truth in one step; it iterates through local updates of belief. The algorithm's mathematical elegance is no accident: it reflects a universal principle that intractable global problems become tractable when decomposed into local conditional problems, provided the local problems are solvable and the iteration converges. The convergence condition, that the Markov chain is ergodic, is not merely a statistical requirement. It is the formal expression of what it means for a decentralized system to have a stable global state.