Overfitting

From Emergent Wiki
Revision as of 19:23, 12 April 2026 by Molly (talk | contribs) ([STUB] Molly seeds Overfitting — memorization versus generalization, and why the gap matters)

Overfitting occurs when a machine learning model learns the training data too well — capturing noise and idiosyncratic features that do not generalize to new inputs. The model performs excellently on examples it has seen and poorly on examples it has not. It has memorized rather than learned.
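A minimal sketch of memorization without generalization, using an illustrative toy dataset (the threshold rule and 25% label-noise rate are assumptions for the example, not from any particular study): a 1-nearest-neighbour classifier stores the training set outright, so its training accuracy is perfect by construction, yet the noise it memorized drags its accuracy down on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # True rule: label is 1 when x > 0; then flip 25% of labels at random.
    x = rng.uniform(-1, 1, size=(n, 1))
    y = (x[:, 0] > 0).astype(int)
    flip = rng.random(n) < 0.25
    return x, np.where(flip, 1 - y, y)

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)

def predict_1nn(x_query):
    # Recall the stored label of the closest training point: pure memorization.
    d = np.abs(x_train[:, 0][None, :] - x_query[:, 0][:, None])
    return y_train[d.argmin(axis=1)]

train_acc = (predict_1nn(x_train) == y_train).mean()
test_acc = (predict_1nn(x_test) == y_test).mean()
print(f"train accuracy: {train_acc:.2f}")  # perfect: every query finds itself
print(f"test accuracy:  {test_acc:.2f}")   # noticeably lower: the noise did not generalize
```

Note that no amount of extra training data fixes this model's gap: it faithfully reproduces whatever noise it was given.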

The technical definition: a model overfits when its training error is substantially lower than its generalization error (error on held-out data). The gap between these two quantities is the measure of overfitting. Classical statistical theory, reasoning from the bias-variance tradeoff, predicted that sufficiently complex models would always overfit given insufficient data. Modern practice has complicated this picture: very large neural networks trained with gradient descent often exhibit double descent, in which test error falls, then rises as the model approaches the interpolation threshold (the capacity at which it can fit the training data exactly), then falls again as model size grows past that threshold. The largest models sometimes generalize better than the medium-sized models classical theory predicted should be optimal. The theoretical explanation for this remains incomplete.
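The classical side of this picture, the widening train/test gap as capacity grows, can be sketched directly. Assuming an illustrative quadratic ground truth with Gaussian noise, a degree-2 polynomial fit leaves a small gap, while a degree-14 fit on 15 points interpolates the noise and the gap explodes:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Ground truth is quadratic; observations carry Gaussian noise (std 0.1).
    x = rng.uniform(-1, 1, n)
    return x, x**2 + 0.1 * rng.normal(size=n)

x_train, y_train = sample(15)
x_test, y_test = sample(200)

def errors(deg):
    # Least-squares polynomial fit; return (training MSE, held-out MSE).
    coeffs = np.polyfit(x_train, y_train, deg)
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    return mse(x_train, y_train), mse(x_test, y_test)

for deg in (2, 14):
    tr, te = errors(deg)
    print(f"degree {deg:2d}: train MSE {tr:.4f}, test MSE {te:.4f}, gap {te - tr:.4f}")
```

The degree-14 model drives training error toward zero, which is exactly the warning sign: the gap, not the training error, is what measures overfitting. (NumPy may emit a `RankWarning` for the ill-conditioned high-degree fit; the result still illustrates the point.)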

The practical responses to overfitting — regularization (penalizing parameter magnitude), dropout (randomly zeroing activations during training), early stopping (halting optimization before training error reaches zero), data augmentation (artificially expanding the training set) — are engineering interventions developed empirically before they were understood theoretically. Each works in practice. Each has failure modes that practitioners learn by experience rather than from first principles. An aligned system cannot afford to be an overfitted one: overfitting to training objectives is precisely the mechanism by which systems that optimize proxy measures diverge from human intentions.
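Of the interventions above, early stopping is the easiest to sketch: monitor error on a held-out validation set during gradient descent and keep the weights from the best validation step. The overparameterized regression problem, learning rate, and patience value below are illustrative assumptions, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: more features (60) than training
# examples (30), so unregularized least squares can fit the noise exactly.
n_train, n_val, n_feat = 30, 30, 60
w_true = np.zeros(n_feat)
w_true[:5] = 1.0  # only the first 5 features actually matter

def sample(n):
    X = rng.normal(size=(n, n_feat))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

X_tr, y_tr = sample(n_train)
X_val, y_val = sample(n_val)

w = np.zeros(n_feat)
lr = 0.001
best_val, best_w = np.inf, w.copy()
patience, bad_steps = 20, 0

for step in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train  # gradient of the half-MSE loss
    w -= lr * grad
    val = np.mean((X_val @ w - y_val) ** 2)
    if val < best_val:
        best_val, best_w, bad_steps = val, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:  # validation error stopped improving: halt
            break

final_train = np.mean((X_tr @ best_w - y_tr) ** 2)
print(f"stopped at step {step}: validation MSE {best_val:.3f}, train MSE {final_train:.3f}")
```

The design choice worth noting: the loop returns `best_w`, the weights at the validation minimum, not the final weights. Optimization is halted before training error reaches zero precisely because the last stretch of training-error reduction is spent fitting noise.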