Bayesian Information Criterion

The Bayesian information criterion (BIC), also known as the Schwarz criterion, is a model selection criterion that penalizes model complexity more heavily than the Akaike information criterion (AIC) whenever ln n exceeds 2, that is, for sample sizes of 8 or more. Introduced by Gideon Schwarz in 1978, BIC approximates the logarithm of the Bayesian marginal likelihood: the probability of the data given the model, with the parameters integrated out under a unit-information prior. The criterion subtracts a penalty of (k/2) ln n from the maximized log-likelihood, where k is the number of free parameters and n is the sample size; in its usual form, BIC = k ln n - 2 ln L, with L the maximized likelihood, and lower values are preferred. Unlike AIC, which targets optimal prediction regardless of whether the true model is in the candidate set, BIC is consistent: as the sample size grows, it selects the true model with probability approaching one, provided the true model is among the candidates.
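
To make the penalty concrete, the following is a minimal sketch in Python that scores a family of polynomial regressions with both criteria, using the conventional forms BIC = k ln n - 2 ln L and AIC = 2k - 2 ln L. The Gaussian error model, the simulated quadratic data, and all function names (gaussian_log_likelihood, bic, aic, score_polynomials) are assumptions made for illustration and are not part of the criterion itself.

```python
import math

import numpy as np


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, with the error variance set to its MLE (RSS / n)."""
    n = residuals.size
    sigma2 = float(residuals @ residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def bic(log_lik, k, n):
    """BIC = k ln n - 2 ln L; lower is better."""
    return k * math.log(n) - 2 * log_lik


def aic(log_lik, k):
    """AIC = 2k - 2 ln L; lower is better."""
    return 2 * k - 2 * log_lik


def score_polynomials(x, y, max_degree):
    """Return (degree, AIC, BIC) for least-squares polynomial fits of degree 0..max_degree."""
    rows = []
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        k = degree + 2  # degree + 1 coefficients, plus the error variance
        log_lik = gaussian_log_likelihood(residuals)
        rows.append((degree, aic(log_lik, k), bic(log_lik, k, x.size)))
    return rows


# Simulated data (an illustrative assumption): a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.5, size=x.size)

scores = score_polynomials(x, y, max_degree=6)
best_aic = min(scores, key=lambda row: row[1])[0]
best_bic = min(scores, key=lambda row: row[2])[0]
print(f"degree chosen by AIC: {best_aic}, by BIC: {best_bic}")
```

Because the candidates here are nested and n is at least 8, the degree chosen by BIC can never exceed the degree chosen by AIC. Whether the two agree on a given data set depends on the noise realization.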

This consistency comes at a cost. The guarantee assumes that one of the candidate models is true, an assumption that is rarely defensible in complex domains where all models are deliberate simplifications. In high-dimensional settings or with small samples, BIC can underfit severely: real but weak effects do not improve the likelihood enough to offset the penalty on each added parameter, so the criterion prefers trivial models that capture nothing of interest. The criterion also inherits the foundational difficulties of Bayesian model comparison: the marginal likelihood is sensitive to prior specification, and the unit-information prior implicit in BIC is a mathematical convenience, not a principled representation of ignorance. The debate between AIC and BIC is not about which formula is correct. It is about whether statisticians should aspire to discover truth or to optimize prediction, a question that no criterion can settle.