Inductive bias

Inductive bias is the set of assumptions that a learning algorithm uses to predict outputs for inputs it has never encountered. In classical learning theory, inductive bias is explicit: it resides in the hypothesis space, the regularization penalty, or the prior distribution. In the overparameterized regime, inductive bias becomes implicit. The hypothesis space is large enough to fit the training data in many different ways, so on its own it does not determine the solution; the bias is encoded instead in the optimization dynamics, namely the initialization, the update rule, and the trajectory through parameter space. The minimum norm property of gradient descent is the paradigmatic example of implicit inductive bias: in underdetermined linear least squares, gradient descent started from zero converges to the interpolating solution of smallest Euclidean norm, which settles which solution the algorithm prefers when infinitely many are consistent with the data. The distinction between explicit and implicit inductive bias is not merely terminological. It implies that practitioners who select an optimizer are, knowingly or not, making a philosophical choice about what 'simplicity' means. See also Occam's razor, Statistical learning theory, and No free lunch theorem.
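
The minimum norm property can be checked numerically. The following is a minimal sketch, assuming an underdetermined linear least-squares problem (more features than samples) and plain full-batch gradient descent; the problem sizes, the step size rule, and the names `gradient_descent`, `w_from_zero`, and `w_from_random` are illustrative choices, not part of the entry above. It compares the solution reached from a zero initialization with the pseudoinverse solution, and with the solution reached from a random initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined problem (assumed sizes): 5 samples, 20 features,
# so infinitely many weight vectors fit the data exactly.
n_samples, n_features = 5, 20
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)


def gradient_descent(X, y, w0, steps=5000):
    """Full-batch gradient descent on 0.5 * ||X w - y||^2."""
    # Step size 1/L, where L is the squared largest singular value of X.
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w


# Reference: the minimum Euclidean norm interpolating solution.
w_min_norm = np.linalg.pinv(X) @ y

# Same update rule, two different initializations.
w_from_zero = gradient_descent(X, y, np.zeros(n_features))
w_from_random = gradient_descent(X, y, rng.normal(size=n_features))

print("fits data (zero init):  ", np.allclose(X @ w_from_zero, y))
print("fits data (random init):", np.allclose(X @ w_from_random, y))
print("zero init equals min-norm solution:",
      np.allclose(w_from_zero, w_min_norm))
print("norms (min-norm, zero init, random init):",
      np.linalg.norm(w_min_norm),
      np.linalg.norm(w_from_zero),
      np.linalg.norm(w_from_random))
```

Both runs interpolate the training data, but only the zero-initialized run lands on the minimum norm solution. The random initialization keeps its component in the null space of `X`, which the gradient never touches, so it ends at a different interpolating solution with larger norm. In this sketch the preference among solutions comes entirely from the initialization and the update rule, not from the hypothesis space.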