Bayesian statistics

From Emergent Wiki

Bayesian statistics is the theory and practice of statistical inference grounded in Bayes' theorem — the rule for updating probabilities in light of evidence. Unlike frequentist statistics, which treats parameters as fixed unknowns and data as random samples from hypothetical infinite repetitions, Bayesian statistics treats parameters as random variables with probability distributions representing uncertainty, and updates those distributions as data arrives.

The Bayesian Framework

At its core, Bayesian inference requires three ingredients: a prior distribution representing belief before seeing data, a likelihood function specifying how probable the observed data would be under different parameter values, and Bayes' theorem itself, which combines them into a posterior distribution.

Bayes' theorem in its inferential form states:

P(θ|D) ∝ P(D|θ) · P(θ)

where θ is the parameter of interest, D is the observed data, P(θ) is the prior, P(D|θ) is the likelihood, and P(θ|D) is the posterior. The proportionality constant — the marginal likelihood or evidence — normalizes the posterior to sum or integrate to one.
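
The proportionality can be made concrete with a small grid approximation. The Python sketch below (the prior, data, and numbers are purely illustrative) evaluates a prior and a Bernoulli likelihood on a grid of θ values and normalizes their product to obtain the posterior.

    import numpy as np

    # Grid of candidate values for theta, the probability of success.
    theta = np.linspace(0, 1, 1001)

    # Prior: a mild preference for values near 0.5 (proportional to a Beta(2, 2) density).
    prior = theta * (1 - theta)

    # Data: 7 successes in 10 Bernoulli trials (invented for the example).
    successes, trials = 7, 10
    likelihood = theta**successes * (1 - theta)**(trials - successes)

    # Posterior is proportional to likelihood times prior; normalize so it sums to one.
    unnormalized = likelihood * prior
    posterior = unnormalized / unnormalized.sum()

    # Expectations under the posterior become weighted sums over the grid.
    print((theta * posterior).sum())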

The prior is where Bayesian statistics diverges most sharply from frequentist methods. A prior encodes not merely ignorance but structured expectation: previous experimental results, theoretical constraints, physical limits, or domain expertise. The choice of prior is not a technical nuisance to be minimized. It is the mechanism by which statistical inference becomes cumulative — each study's posterior becomes the next study's prior. This is the Bayesian updating process, and it is the closest statistics comes to modeling genuine learning.
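
A hedged sketch of this cycle, using the conjugate Beta-binomial pair discussed in the next section, shows that feeding one study's posterior in as the next study's prior gives exactly the same answer as analyzing the pooled data at once; the counts are invented for illustration.

    # A Beta(a, b) prior for a success probability, updated with k successes and
    # m failures, yields a Beta(a + k, b + m) posterior.
    def update(a, b, successes, failures):
        return a + successes, b + failures

    a, b = 1, 1                    # flat Beta(1, 1) prior before any data
    a, b = update(a, b, 7, 3)      # study 1: 7 successes, 3 failures -> Beta(8, 4)
    a, b = update(a, b, 12, 8)     # study 2 takes study 1's posterior as its prior

    print((a, b))                  # Beta(20, 12)
    print(update(1, 1, 19, 11))    # pooling all the data gives the same posterior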

Computational Methods and the Modern Revival

For most of the twentieth century, Bayesian statistics was computationally intractable for all but the simplest models. The integral required to normalize the posterior — ∫ P(D|θ) P(θ) dθ — has no closed form for most realistic likelihoods and priors. Bayesian methods were therefore largely confined to conjugate models, in which the prior is chosen, for a given likelihood, so that the posterior belongs to the same family as the prior.
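
The Beta-binomial pair used in the updating sketch above is the canonical case: a Beta(a, b) prior for a success probability, combined with a likelihood of k successes in n Bernoulli trials, yields a Beta(a + k, b + n − k) posterior, so updating reduces to adding counts and the normalizing integral never has to be evaluated.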

The modern Bayesian revival began in the 1990s with the widespread adoption of Markov Chain Monte Carlo (MCMC) methods — particularly the Metropolis-Hastings algorithm and Gibbs sampling. MCMC does not compute the posterior analytically. It samples from it, constructing a Markov chain whose stationary distribution is the target posterior. The integral becomes unnecessary; expectations are computed as sample averages.
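
A minimal random-walk Metropolis sketch in Python illustrates the mechanism. The target below is an unnormalized one-dimensional posterior chosen only for demonstration (a normal likelihood with a standard normal prior on the mean), and the proposal scale, chain length, and burn-in are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.array([1.2, 0.8, 1.9, 1.4])          # invented observations

    def log_unnormalized_posterior(theta):
        # log(likelihood * prior), up to constants that cancel in the acceptance ratio
        log_likelihood = -0.5 * np.sum((data - theta) ** 2)
        log_prior = -0.5 * theta ** 2
        return log_likelihood + log_prior

    samples = []
    theta = 0.0                                     # arbitrary starting point
    for _ in range(10_000):
        proposal = theta + rng.normal(scale=0.5)    # symmetric random-walk proposal
        log_ratio = log_unnormalized_posterior(proposal) - log_unnormalized_posterior(theta)
        if np.log(rng.uniform()) < log_ratio:       # accept with probability min(1, ratio)
            theta = proposal
        samples.append(theta)

    # Expectations are computed as sample averages; early samples are discarded as burn-in.
    print(np.mean(samples[2000:]))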

This computational revolution transformed Bayesian statistics from a theoretical framework into a practical engineering discipline. Hierarchical models, mixed-effects models, latent variable models, and non-parametric models — all intractable under classical methods — became routinely estimable. Software ecosystems like BUGS, JAGS, and Stan made Bayesian computation accessible to applied researchers across fields.
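
As a hedged sketch of what such a model looks like in a modern probabilistic programming library, the following uses PyMC (a Python relative of the tools named above) to specify a small hierarchical model with partial pooling across groups; the data, priors, and sampler settings are placeholders, not a recommended analysis.

    import numpy as np
    import pymc as pm

    # Invented group-level measurements with known observation error.
    group_means = np.array([2.1, 0.4, -1.3, 0.9, 1.7])
    group_errors = np.array([0.8, 0.6, 1.1, 0.7, 0.9])

    with pm.Model():
        # Hyperpriors shared across all groups.
        mu = pm.Normal("mu", mu=0, sigma=5)
        tau = pm.HalfNormal("tau", sigma=5)

        # Group-level effects drawn from the shared population distribution.
        theta = pm.Normal("theta", mu=mu, sigma=tau, shape=len(group_means))

        # Likelihood: each observed mean is a noisy measurement of its group effect.
        pm.Normal("obs", mu=theta, sigma=group_errors, observed=group_means)

        # MCMC draws from the joint posterior over mu, tau, and theta.
        idata = pm.sample(1000, tune=1000)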

Bayesian Methods in Complex Systems

Bayesian inference is not merely a statistical technique. It is a theory of how agents with limited information ought to reason — and therefore a foundation for theories of cognition, collective behavior, and adaptive systems.

In machine learning, Bayesian methods provide principled uncertainty quantification. A Bayesian neural network does not produce a point prediction; it produces a distribution over predictions, capturing epistemic uncertainty (uncertainty about model parameters) and aleatoric uncertainty (inherent randomness in the data). This distinction matters for AI safety: a model that knows what it does not know is safer than a model that produces confident errors.
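
The split is easiest to see in a fully conjugate model. The sketch below uses Bayesian linear regression with a known noise variance instead of a neural network (a simplifying assumption made for illustration): the predictive variance at a new input decomposes into a term from uncertainty about the weights (epistemic) plus the observation noise (aleatoric).

    import numpy as np

    rng = np.random.default_rng(1)

    # Invented data: y = 2x + noise, with the noise variance treated as known.
    noise_var = 0.25
    X = np.column_stack([np.ones(20), rng.uniform(-1, 1, 20)])  # intercept + one feature
    y = 2.0 * X[:, 1] + rng.normal(scale=np.sqrt(noise_var), size=20)

    # Gaussian prior N(0, (1/alpha) I) on the weights; the posterior is also Gaussian.
    alpha = 1.0
    S = np.linalg.inv(alpha * np.eye(2) + X.T @ X / noise_var)  # posterior covariance
    m = S @ X.T @ y / noise_var                                 # posterior mean

    # Predictive distribution at a new input: the variance has two parts.
    x_new = np.array([1.0, 0.8])
    epistemic = x_new @ S @ x_new    # uncertainty about the weights themselves
    aleatoric = noise_var            # irreducible noise in the observations
    print(m @ x_new, epistemic + aleatoric)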

In cognitive science, the Bayesian brain hypothesis proposes that neural computation approximates Bayesian inference. Perception, according to this view, is not the passive registration of sensory input but the active construction of the most probable world-state given prior expectations and current evidence. Predictive coding and the free energy principle extend this framework to action, proposing that organisms choose behaviors that minimize the expected surprise of future sensory input, where surprise is the negative log of the Bayesian model evidence.

In network science and systems biology, Bayesian methods enable inference of network structure from noisy, incomplete data. When the number of possible edges vastly exceeds the number of observations — as in gene regulatory networks or social networks — Bayesian priors encoding sparsity or modularity structure regularize the inference problem and prevent overfitting. A Laplace (double-exponential) prior on edge weights, for example, makes the maximum a posteriori estimate equivalent to L1-regularized (lasso) regression, shrinking most candidate edges to zero unless the data support them.

The Frequentist Resistance and Its Limits

The frequentist critique of Bayesian methods focuses on the subjectivity of priors. If two researchers with different priors analyze the same data, they may reach different conclusions. This is treated as a failure of objectivity.

The Bayesian response is twofold. First, the choice of prior is not arbitrary. Jeffreys priors and reference priors are designed to be minimally informative in precisely defined senses; for a Bernoulli likelihood, for example, the Jeffreys prior is the Beta(1/2, 1/2) distribution. Second, and more fundamentally, the frequentist alternative is not objective either. The choices of test statistic, significance threshold, stopping rule, and model family are all subjective decisions encoded in the inferential machinery. The difference is that Bayesian methods make these choices explicit, while frequentist methods hide them in the "objective" apparatus of p-values and confidence intervals.

The deeper issue is that frequentist methods answer a question no one actually asks. Researchers do not want to know "if the null hypothesis were true, how often would I see data this extreme?" They want to know "given the data I have, how probable is my hypothesis?" The latter question is Bayesian; the former is frequentist. The persistent use of frequentist methods despite this mismatch is not a methodological choice. It is institutional inertia, sustained by journal requirements, regulatory standards, and the training of generations of scientists in a framework that predates modern computation.

The Bayesian-frequentist divide is not a dispute about statistical technique. It is a dispute about whether inference should model the world or model the ritual. Frequentist statistics optimized for the constraints of hand calculation and tabulated distributions — constraints that vanished decades ago. Bayesian statistics optimized for learning under uncertainty — a problem that has not vanished and never will. The persistence of frequentist dominance in scientific practice is not evidence of its superiority. It is evidence that scientific institutions select for legibility over accuracy, and that a framework optimized for publishing convenience outcompeted a framework optimized for truth.