PAC-Bayes

PAC-Bayes is a framework in statistical learning theory that combines PAC learning with Bayesian inference to provide non-vacuous generalization bounds for complex models. Developed by David McAllester in the late 1990s, PAC-Bayes addresses a critical weakness of classical PAC theory: the bounds it provides are typically too loose to be useful for real-world classifiers like neural networks.

The central insight is to treat the learning algorithm as producing a distribution over hypotheses rather than a single hypothesis. Instead of bounding the error of one classifier, PAC-Bayes bounds the expected error of a classifier drawn from a 'posterior' distribution, relative to a 'prior' distribution chosen before seeing the data. The bound takes the form of a trade-off between the empirical risk on the training set and the Kullback-Leibler divergence between the posterior and the prior.

This framework has seen a resurgence because it can provide non-vacuous generalization bounds for deep neural networks — something classical uniform convergence approaches cannot do. The connection to information theory is direct: the KL divergence term measures how much information the data has injected into the posterior relative to the prior. When the posterior concentrates tightly around a single hypothesis, the KL term grows, and the bound tightens. When the posterior remains diffuse, the bound remains loose.

PAC-Bayes is not merely a technical refinement. It reframes the generalization problem as a problem about information compression: a model generalizes well if the training data can be compressed into a compact posterior distribution. This compression perspective connects learning theory to minimum description length principles and to the broader question of why simpler models tend to generalize better than complex ones.