Out-of-Distribution Generalization

Out-of-distribution generalization is the capacity of a learning system to perform well on data drawn from a distribution that differs from the training distribution in ways that matter for the task. It is distinguished from mere robustness (surviving noise or adversarial perturbations) by the requirement that the test distribution differ structurally, not just statistically, from the training distribution: the shift is systematic rather than a small perturbation of individual inputs.

The problem is central to the deployment of neural networks in real-world settings, where the deployment environment almost never matches the training environment exactly. A medical model trained on data from one hospital may fail at another because patient populations, diagnostic practices, and recording equipment differ in ways that are not merely noisy variations but systematic distributional shifts.
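The hospital example can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration, not real clinical data: a hypothetical "site" generator `make_site`, one noisy feature whose relation to the label is the same at every site, and one site-specific feature (standing in for something like recording equipment) that agrees with the label 95% of the time at the training site and only 5% of the time at the new site. A plain logistic regression fitted by gradient descent is used as the model.

```python
import numpy as np

def make_site(n, spurious_agreement, rng):
    """Sample a hypothetical site: one stable feature, one site-specific feature."""
    y = rng.integers(0, 2, size=n)
    s = 2 * y - 1  # labels recoded as -1/+1
    # Stable feature: noisy, but related to the label the same way at every site.
    stable = s + rng.normal(0.0, 2.0, size=n)
    # Site-specific feature: agrees with the label at a rate that depends on the site.
    agree = rng.random(n) < spurious_agreement
    site_specific = np.where(agree, s, -s) + rng.normal(0.0, 0.1, size=n)
    return np.column_stack([stable, site_specific]), y

def fit_logistic(X, y, steps=2000, lr=0.1):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        residual = p - y
        w -= lr * (X.T @ residual) / len(y)
        b -= lr * residual.mean()
    return w, b

def accuracy(w, b, X, y):
    """Fraction of examples whose thresholded prediction matches the label."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
X_train, y_train = make_site(5000, 0.95, rng)  # training site
X_iid, y_iid = make_site(5000, 0.95, rng)      # held-out data, same site
X_ood, y_ood = make_site(5000, 0.05, rng)      # new site, correlation reversed

w, b = fit_logistic(X_train, y_train)
acc_iid = accuracy(w, b, X_iid, y_iid)
acc_ood = accuracy(w, b, X_ood, y_ood)
print(f"same-site accuracy: {acc_iid:.3f}")
print(f"new-site accuracy:  {acc_ood:.3f}")
```

In this toy setting the model leans on the site-specific feature because it is the most predictive signal at the training site, so held-out accuracy at the same site looks high while accuracy at the new site falls below chance. The stable feature alone would transfer, but nothing in the training objective favors it.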

The theoretical challenge is that standard learning theory assumes that training and test samples are independent and identically distributed (IID). Out-of-distribution generalization requires abandoning or extending this assumption, but the alternatives (domain adaptation, invariant risk minimization, causal representation learning) remain partial solutions with strong assumptions of their own. See systematic generalization for the compositional sub-case.
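One common way to make the contrast precise is sketched below. The notation is introduced here as an assumption and does not come from the text above: a set of environments $\mathcal{E}_{\text{all}}$, each with its own distribution $P^e$, of which only a subset $\mathcal{E}_{\text{tr}}$ is observed during training; a loss $\ell$; a predictor $f$; and, for invariant risk minimization, a representation $\Phi$ with a classifier $w$ on top of it.

```latex
% Risk of a predictor f in environment e
R^e(f) = \mathbb{E}_{(x,y) \sim P^e}\big[\ell(f(x), y)\big]

% IID setting: test data come from the training environments,
% so minimizing the pooled training risk is the natural objective
\min_f \sum_{e \in \mathcal{E}_{\text{tr}}} R^e(f)

% Out-of-distribution setting: perform well in the worst environment,
% including environments never observed during training
\min_f \max_{e \in \mathcal{E}_{\text{all}}} R^e(f)

% Invariant risk minimization: seek a representation \Phi for which
% the same classifier w is optimal in every training environment
\min_{\Phi,\, w} \sum_{e \in \mathcal{E}_{\text{tr}}} R^e(w \circ \Phi)
\quad \text{s.t.} \quad
w \in \arg\min_{\bar{w}} R^e(\bar{w} \circ \Phi)
\ \ \text{for all } e \in \mathcal{E}_{\text{tr}}
```

The worst-case objective cannot be optimized directly, because $\mathcal{E}_{\text{all}}$ contains unseen environments. Each of the approaches named above replaces it with a tractable surrogate, and the assumptions that justify the surrogate are where the "strong assumptions of their own" enter.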