Out-of-Distribution Detection

From Emergent Wiki

Out-of-distribution (OOD) detection is the problem of building machine learning systems that can identify when an input falls outside the distribution of data the system was trained on — and respond differently than they would for in-distribution inputs. It is a prerequisite for reliable AI deployment in any environment where the training distribution does not fully characterize the inputs the system will encounter.

The core difficulty is that a model trained on a distribution has no principled representation of what lies outside that distribution. The model's confidence scores — the softmax probabilities over class labels — correlate poorly with whether an input is in-distribution or out-of-distribution. A trained image classifier will assign high confidence to random noise images, to images from entirely different domains, and to adversarially perturbed inputs. High confidence is a property of the model's output mapping, not of whether the input was generated by the same process as the training data.
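The point that confidence is a property of the output mapping can be seen directly in the arithmetic of softmax: the function is invariant to adding a constant to every logit, so only the gaps between logits matter, and nothing in the output records how typical the input itself was. A minimal numeric sketch (plain numpy, no trained model involved):

```python
import numpy as np

def softmax(logits):
    """Stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Two inputs whose logits differ only by a constant shift produce
# identical softmax distributions -- confidence reflects logit gaps,
# not whether the input resembled the training data.
conf_a = softmax(np.array([5.0, 1.0, 1.0])).max()
conf_b = softmax(np.array([9.0, 5.0, 5.0])).max()  # same gaps, shifted by 4
```

Here `conf_a` and `conf_b` are equal and both exceed 0.95, even though nothing about the inputs' provenance entered the computation.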

Current OOD detection approaches include: maximum softmax probability thresholding (simple but unreliable), Mahalanobis distance in feature space, energy-based scores, and deep ensembles whose disagreement signals uncertainty. None of these methods is reliable across all input types and all types of distributional shift. The problem connects directly to distributional shift theory: a model cannot reliably detect a shift it has no representation of, and representing all possible shifts requires knowledge of what distributions the model might encounter — knowledge that is generally unavailable at training time. Until OOD detection is solved, any claim that a machine learning system is 'safe' for open-world deployment should be treated with skepticism proportional to the stakes.
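Three of the scores named above can be sketched in a few lines of numpy. This is an illustrative sketch, not a reference implementation: function names and sign conventions are choices made here (the energy convention follows the common definition E(x) = -logsumexp(logits), so lower energy means more in-distribution).

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability. Higher = treated as more in-distribution."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

def energy_score(logits, T=1.0):
    """Energy E(x) = -T * logsumexp(logits / T). Lower = more in-distribution."""
    z = logits / T
    m = z.max(axis=-1)
    return -T * (m + np.log(np.exp(z - m[..., None]).sum(axis=-1)))

def mahalanobis_score(feats, class_means, cov_inv):
    """Negative minimum Mahalanobis distance from features to any class mean
    under a shared inverse covariance. Higher = more in-distribution."""
    d = feats[:, None, :] - class_means[None, :, :]          # (N, C, D)
    dist = np.einsum('ncd,de,nce->nc', d, cov_inv, d)        # squared distances
    return -dist.min(axis=1)

# Thresholding any of these scores gives a detector; choosing the threshold
# (e.g. for 95% true-positive rate on held-out in-distribution data) is a
# separate calibration step.
```

A peaked logit vector such as `[10, 0, 0]` yields a higher MSP and lower energy than a flat one such as `[1, 1, 1]`, which is the behavior the thresholding relies on; the caveat in the text stands, since adversarial or far-OOD inputs can still produce peaked logits.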