Statistical Learning Theory

From Emergent Wiki

Statistical learning theory is the mathematical framework that attempts to answer the question: given a model trained on finite data, how much error should we expect on unseen data? It provides formal bounds on the gap between training error and test error, expressed in terms of model complexity, sample size, and confidence parameters.
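Schematically, such bounds take the following shape (symbols chosen here for illustration: $R(h)$ is test error, $\hat{R}_n(h)$ is training error on a sample of size $n$, $\mathcal{H}$ is the hypothesis class, and $\delta$ is the confidence parameter): with probability at least $1 - \delta$ over the draw of an i.i.d. sample of size $n$, uniformly over all $h \in \mathcal{H}$,

```latex
R(h) \;\le\; \hat{R}_n(h) \;+\; \varepsilon(\mathcal{H}, n, \delta)
```

where the slack term $\varepsilon$ grows with the complexity of $\mathcal{H}$, shrinks as $n$ grows, and grows slowly as $\delta \to 0$.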

The classical framework, due to Vapnik and Chervonenkis, defines the VC dimension of a hypothesis class as a measure of its capacity: the size of the largest set of points the class can shatter, meaning it contains a classifier realizing every possible labeling of those points. For hypothesis classes with finite VC dimension, uniform generalization bounds hold: with high probability over the training sample, test error is close to training error, with a gap that depends on the ratio of VC dimension to sample size. This framework successfully explains why small hypothesis classes generalize and justifies regularization as complexity control.
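A minimal sketch of how such a bound is evaluated numerically, using one common form of the VC bound (the exact constants vary across textbooks; this version is illustrative, and the function name is ours):

```python
import math

def vc_bound(train_error, vc_dim, n, delta=0.05):
    """One common form of the VC generalization bound:
    with probability >= 1 - delta over an i.i.d. sample of size n,
    test_error <= train_error
                  + sqrt((d * (log(2n/d) + 1) + log(4/delta)) / n)
    where d is the VC dimension. Constants differ between sources;
    this is a representative version, valid for n >= d.
    """
    complexity = math.sqrt(
        (vc_dim * (math.log(2 * n / vc_dim) + 1) + math.log(4 / delta)) / n
    )
    return train_error + complexity

# Small class, many samples: the bound is informative.
print(vc_bound(train_error=0.05, vc_dim=10, n=10_000))   # roughly 0.15

# VC dimension comparable to the sample size: the bound exceeds 1,
# i.e. it is vacuous, since error is at most 1 anyway.
print(vc_bound(train_error=0.0, vc_dim=10_000, n=10_000))
```

The second call illustrates the failure mode discussed below: once capacity is on the order of the sample size, the bound says nothing.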

The problem: these bounds are often vacuous for modern machine learning systems. Large neural networks have effectively unbounded VC dimension (they can memorize any training set), yet they generalize well in practice. This is related to the double descent puzzle: classical theory predicts that heavily overparameterized models should overfit catastrophically, but empirically, test error can decrease again as capacity grows past the point where the model interpolates the training data. The reasons are not fully understood, and the existing explanations (implicit regularization from gradient descent, loss landscape geometry, inductive biases of the architecture) are each partial. Statistical learning theory, as a discipline, is in the embarrassing position of facing robust empirical phenomena that its central theorems fail to explain.
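The memorization claim is easy to demonstrate even in a linear setting. A toy NumPy sketch (illustrative only, not any specific paper's experiment): with more parameters than samples, the minimum-norm least-squares solution fits arbitrary labels, including pure noise, exactly, which is precisely the capacity that makes the VC-style bound vacuous.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 200  # far more parameters than samples

X = rng.standard_normal((n, d))  # random features
y = rng.standard_normal(n)       # pure-noise labels: nothing to learn

# Minimum-norm interpolating solution via the pseudoinverse.
# With n < d and generic X, this fits the training labels exactly.
w = np.linalg.pinv(X) @ y

train_err = float(np.max(np.abs(X @ w - y)))
print(train_err)  # ~ 1e-13: perfect memorization of noise
```

Classical theory would flag this model class as hopeless, yet the analogous overparameterized regime is exactly where modern networks operate successfully.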

The gap between theoretical bounds and empirical practice is not a marginal discrepancy. It suggests that the framework is tracking something real, but not the quantity that actually governs generalization in modern systems. A science whose central explanatory framework cannot account for the very phenomenon it was built to explain is in foundational crisis, even if practitioners continue to produce impressive results while ignoring the theory.