Conjugate Prior

A conjugate prior is a prior distribution in Bayesian inference that, when combined with a particular likelihood function, produces a posterior distribution in the same parametric family as the prior. The classic example is the Beta distribution as a conjugate prior for the Binomial likelihood: if the prior belief about a coin's bias is Beta(α, β) and we observe k heads in n Bernoulli trials, the posterior is Beta(α + k, β + n − k). Only the parameters change, not the form of the distribution. This mathematical convenience makes Bayesian updates analytically tractable.
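The Beta-Binomial update described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the class name `BetaPrior`, its `update` method, and the example numbers (a Beta(2, 2) prior, 7 heads and 3 tails) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) belief about a coin's probability of heads (hypothetical helper)."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaPrior":
        # Conjugacy: the posterior is again a Beta distribution;
        # the observed counts are simply added to the prior parameters.
        return BetaPrior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


# Assumed example: mildly informative prior centred on a fair coin.
prior = BetaPrior(2.0, 2.0)
posterior = prior.update(successes=7, failures=3)

print(posterior)       # parameters after the update
print(posterior.mean)  # posterior mean of the coin's bias
```

Because the update only adds counts, observing the data in one batch or one trial at a time leads to the same posterior.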

The practical appeal of conjugate priors is that the posterior is available in closed form. They eliminate the need for numerical integration or sampling methods like Markov chain Monte Carlo, reducing computation to simple parameter updates. For this reason, conjugate priors dominated Bayesian statistics for most of the 20th century, until cheap computational power made general Bayesian inference feasible.
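To make the computational saving concrete, the following sketch computes the same posterior mean twice: once by brute-force numerical integration over a grid, and once with the conjugate parameter update. The function names, the grid size, and the example numbers are assumptions for illustration; the grid method stands in for numerical integration in general, not for any specific historical technique.

```python
def grid_posterior_mean(alpha, beta, successes, failures, points=10_000):
    """Posterior mean of a coin's bias by midpoint-rule integration on a grid."""
    a = alpha + successes - 1   # exponent of theta in the unnormalised posterior
    b = beta + failures - 1     # exponent of (1 - theta)
    total = 0.0
    weighted = 0.0
    for i in range(points):
        theta = (i + 0.5) / points            # midpoint of the i-th grid cell
        w = theta**a * (1.0 - theta)**b       # prior times likelihood, unnormalised
        total += w
        weighted += theta * w
    return weighted / total                   # the normalising constant cancels


def conjugate_posterior_mean(alpha, beta, successes, failures):
    """Same quantity from the closed-form Beta posterior: no integration needed."""
    return (alpha + successes) / (alpha + beta + successes + failures)


grid_mean = grid_posterior_mean(2.0, 2.0, successes=7, failures=3)
exact_mean = conjugate_posterior_mean(2.0, 2.0, successes=7, failures=3)
print(grid_mean, exact_mean)
```

The grid version needs thousands of function evaluations for a single one-dimensional parameter and scales poorly as parameters are added; the conjugate version is one arithmetic expression.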

But the convenience comes at a cost. A conjugate prior exists only for particular likelihood families, so choosing one commits the analysis to a specific parametric model and restricts the prior to a shape that may not match what is actually known. If the true generating process does not belong to that family (and in complex systems it almost never does), the model carries a structural bias that no amount of data can overcome: the posterior typically concentrates on the best-fitting member of the family rather than on the truth. Reliance on tractable conjugate models may be one reason Bayesian methods were slow to penetrate fields like machine learning and complex systems, where the models are often non-parametric and the hypothesis spaces effectively unbounded. The conjugate prior is a useful approximation for simple problems and a dangerous assumption for hard ones.
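The structural-bias point can be illustrated with a small simulation. The sketch below assumes a Normal likelihood with known noise standard deviation and a conjugate Normal prior on the mean, then feeds it data from a two-component mixture that the model cannot represent. The mixture (modes at -3 and +3), the prior settings, the seed, and all function names are assumptions made for this example.

```python
import math
import random


def normal_posterior(data, prior_mean, prior_sd, noise_sd):
    """Conjugate update for the mean of a Normal likelihood with known noise_sd."""
    n = len(data)
    prior_prec = 1.0 / prior_sd**2        # precision = 1 / variance
    data_prec = n / noise_sd**2
    post_prec = prior_prec + data_prec    # precisions add under conjugacy
    sample_mean = sum(data) / n
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_bimodal(n):
    """Data from an equal mixture of N(-3, 1) and N(+3, 1): outside the model family."""
    return [rng.gauss(-3.0 if rng.random() < 0.5 else 3.0, 1.0) for _ in range(n)]


results = {}
for n in (100, 10_000):
    data = draw_bimodal(n)
    mean, sd = normal_posterior(data, prior_mean=0.0, prior_sd=10.0, noise_sd=1.0)
    # Fraction of observations that actually lie within 1 unit of the posterior mean.
    near = sum(abs(x - mean) < 1.0 for x in data) / n
    results[n] = (mean, sd, near)
    print(f"n={n}: posterior mean={mean:.3f}, posterior sd={sd:.4f}, "
          f"fraction of data near mean={near:.3f}")
```

More data makes the posterior narrower, not more correct: it grows increasingly confident about a single centre located between the two modes, where very few observations fall.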