Akaike Information Criterion

The Akaike information criterion (AIC) is a widely used measure for model selection. It penalizes the maximized log-likelihood of a fitted model by the number of estimated parameters, rewarding parsimony: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized value of the likelihood. Introduced by Hirotsugu Akaike in 1973, it is derived not from Bayesian reasoning but from information theory: AIC estimates, up to a constant shared by all candidates, the expected Kullback-Leibler information lost when the fitted model is used to approximate the unknown true data-generating process. Only differences in AIC between models fitted to the same data are therefore meaningful. The model with the lowest AIC is preferred, and by a common rule of thumb, models within about 2 units of the minimum are considered competitive. Unlike the Bayesian information criterion (BIC), AIC does not assume that the true model is among the candidates; it aims instead for good predictive accuracy, which makes it a common choice when the goal is generalization rather than discovery of a true underlying structure.
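As a minimal sketch of how the criterion is applied, the Python snippet below computes AIC from a maximized log-likelihood and a parameter count, then ranks a candidate set by its difference from the best score. The function names (aic, delta_aic), the model names, and the log-likelihood values are all hypothetical and chosen only for illustration; the 2-unit threshold is the rule of thumb mentioned above, not a formal test.

```python
def aic(log_likelihood: float, num_params: int) -> float:
    """AIC = 2k - 2 ln(L), where L is the maximized likelihood."""
    return 2 * num_params - 2 * log_likelihood


def delta_aic(scores: dict) -> dict:
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical (maximized log-likelihood, parameter count) pairs.
# All models are assumed to be fitted to the same data set.
candidates = {
    "linear": (-120.0, 3),
    "quadratic": (-118.5, 4),
    "cubic": (-118.4, 5),
    "quartic": (-118.3, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
deltas = delta_aic(scores)

# Rule of thumb: models within about 2 units of the minimum stay in contention.
competitive = sorted(name for name, d in deltas.items() if d <= 2.0)

for name in sorted(scores, key=scores.get):
    print(f"{name:10s} AIC={scores[name]:.1f}  delta={deltas[name]:.1f}")
print("competitive:", competitive)
```

In this made-up example the extra parameters of the quartic model buy almost no improvement in log-likelihood, so the penalty term moves it out of the competitive set.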

The theoretical foundation of AIC rests on the asymptotic properties of maximum likelihood estimation: the penalty term arises as a large-sample correction for the optimistic bias of the maximized log-likelihood, and its derivation involves the Fisher information matrix. In large samples, AIC is asymptotically equivalent to leave-one-out cross-validation, and in regression settings it selects, under suitable conditions, the candidate whose predictions minimize mean squared error. Critics note that AIC performs poorly when the sample is small relative to the number of parameters and can favor overparameterized models when the candidate set is large. The corrected criterion AICc adds a further penalty that grows as the parameter count approaches the sample size, which mitigates the small-sample bias, but it was derived for Gaussian linear models and is only an approximation elsewhere. The deeper criticism is philosophical: by valuing prediction over truth, AIC privileges instrumental success over explanatory depth, a choice that reflects the operationalist turn in modern statistics but may impoverish scientific understanding.
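The small-sample correction can be illustrated with a short sketch. The snippet below assumes the commonly cited form AICc = AIC + 2k(k + 1) / (n - k - 1), where n is the sample size and k the number of estimated parameters; the function name aicc and the numeric inputs are hypothetical. It shows that the extra penalty is substantial when n is close to k and fades as n grows.

```python
def aicc(log_likelihood: float, num_params: int, n_obs: int) -> float:
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n_obs - num_params - 1 <= 0:
        # The correction is undefined unless n > k + 1.
        raise ValueError("AICc requires n_obs > num_params + 1")
    base = 2 * num_params - 2 * log_likelihood
    correction = 2 * num_params * (num_params + 1) / (n_obs - num_params - 1)
    return base + correction


# Same hypothetical fit (log-likelihood -50, five parameters) at two sample sizes.
small_sample = aicc(-50.0, 5, 20)
large_sample = aicc(-50.0, 5, 2000)

print(f"n=20:   AICc={small_sample:.3f}")
print(f"n=2000: AICc={large_sample:.3f}")
```

Holding the log-likelihood fixed across sample sizes is artificial, since a real log-likelihood scales with n; it is done here only to isolate the behavior of the correction term.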