Implicit Regularization

Implicit regularization is the tendency of an optimization algorithm to find certain solutions over others, not because an explicit penalty rewards them, but because the algorithm's dynamics and geometry favor them. In overparameterized machine learning, where many (often infinitely many) parameter settings achieve zero training error, implicit regularization is a leading explanation for why the solutions found in practice generalize.

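The canonical example is gradient descent on underdetermined linear regression: started from zero, it converges to the minimum-norm solution among all least-squares fits, even though no norm penalty appears in the objective. The reason is that every gradient is a linear combination of the rows of the data matrix, so the iterates never leave the row space, and the only interpolating solution in that space is the minimum-norm one. For classification, gradient descent on linearly separable data with the logistic loss converges in direction to the maximum-margin separator, and related margin results are known for homogeneous deep networks. Neural tangent kernel theory covers a different regime: for very wide networks, training with squared loss behaves approximately like kernel regression with a fixed kernel, whose gradient-descent limit is a minimum-norm interpolant in the corresponding function space (up to the effect of initialization).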
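The minimum-norm claim for linear regression can be checked numerically. The following is a minimal sketch, not a reference implementation: the problem sizes, the random data, the step-size rule, and the helper name `gradient_descent` are all assumptions chosen for illustration. It runs plain gradient descent on an underdetermined least-squares problem from a zero start and from a random start, and compares both results with the pseudoinverse solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined problem (illustrative sizes): 20 samples, 100 features,
# so infinitely many weight vectors fit the data exactly.
n_samples, n_features = 20, 100
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)


def gradient_descent(X, y, w0, steps=2000):
    """Plain gradient descent on 0.5 * ||X w - y||^2, with no penalty term."""
    # Step size 1 / sigma_max(X)^2, the inverse Lipschitz constant of the gradient.
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # each update lies in the row space of X
    return w


# Minimum-norm interpolant, computed directly via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# Gradient descent from zero, and from a random (hypothetical) starting point.
w_gd = gradient_descent(X, y, np.zeros(n_features))
w_other = gradient_descent(X, y, rng.standard_normal(n_features))

print("distance from zero-init GD to min-norm solution:",
      np.linalg.norm(w_gd - w_min_norm))
print("norm of zero-init solution:  ", np.linalg.norm(w_gd))
print("norm of random-init solution:", np.linalg.norm(w_other))
```

Both runs fit the training data, but only the zero-initialized run lands on the minimum-norm solution. The random start keeps its component in the null space of `X`, which gradient descent never changes, so it ends at a different interpolant with a larger norm.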

Implicit regularization shifts the question from "what can the model represent?" to "what will the optimizer find?" The choice of optimizer, along with settings such as its initialization, therefore affects not only how fast training converges but also which of the many zero-error solutions it reaches.