Distribution Shift
Distribution shift is the mismatch between the statistical distribution of the data a model was trained on and the distribution it meets at deployment, together with the systematic failures that mismatch produces. It is not a marginal inconvenience but the central challenge of applied machine learning: models are trained on historical data and deployed into the future, and the future is never a random sample from the past. The shift can be sudden, as when a pandemic changes consumer behavior overnight, or gradual, as when a recommendation system alters the preferences of the users it was trained to predict. In both cases, the model's assumptions become liabilities, and its predictions become fictions dressed in the language of probability.
The taxonomy of distribution shift reveals the structural assumptions embedded in model design. Covariate shift occurs when the distribution of inputs, P(x), changes but the conditional relationship between inputs and outputs, P(y | x), remains stable. Concept drift occurs when that relationship itself changes: the same input now maps to a different output. Label shift occurs when the distribution of outputs, P(y), changes while the way each output generates its inputs, P(x | y), stays fixed, so the model's predictions must be recalibrated to the new output frequencies. These categories are not merely descriptive; they determine which correction methods are theoretically justified. Under covariate shift, training examples can be reweighted by the ratio of deployment to training input densities; under concept drift, no reweighting of old data helps, and the model must be retrained on new data or abandoned.
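The reweighting correction for covariate shift can be made concrete with a toy sketch. Everything in it is an assumption chosen for illustration: training inputs are drawn from a standard normal, deployment inputs from a normal with mean 1, both densities are known exactly (in practice the ratio has to be estimated), and the names gaussian_pdf, importance_weight, true_label, model, and estimate_risks are hypothetical. The relationship between input and label is identical in both settings, which is what makes this covariate shift rather than concept drift.

```python
import math
import random


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weight(x, train_mean=0.0, test_mean=1.0):
    """w(x) = p_deploy(x) / p_train(x). Both densities are assumed known."""
    return gaussian_pdf(x, test_mean) / gaussian_pdf(x, train_mean)


def true_label(x):
    # Assumed ground truth; the same in training and deployment,
    # so P(y | x) is stable and only P(x) moves.
    return 2.0 * x


def model(x):
    # A deliberately imperfect, fixed predictor.
    return x


def squared_error(x):
    return (true_label(x) - model(x)) ** 2


def estimate_risks(n=200_000, seed=0):
    """Return (naive, reweighted, actual) estimates of mean squared error."""
    rng = random.Random(seed)
    train_xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    deploy_xs = [rng.gauss(1.0, 1.0) for _ in range(n)]

    # Plain average over training inputs: ignores the shift.
    naive = sum(squared_error(x) for x in train_xs) / n
    # Same training inputs, each weighted by the density ratio.
    reweighted = sum(importance_weight(x) * squared_error(x) for x in train_xs) / n
    # Reference value, computed from deployment inputs we would not normally have.
    actual = sum(squared_error(x) for x in deploy_xs) / n
    return naive, reweighted, actual


naive, reweighted, actual = estimate_risks()
print(f"naive training risk:  {naive:.3f}")
print(f"reweighted risk:      {reweighted:.3f}")
print(f"deployment risk:      {actual:.3f}")
```

In this setup the squared error reduces to x squared, whose expectation is 1 under the training distribution and 2 under the deployment distribution, so the naive estimate understates the deployment error by about half while the reweighted estimate should land near the true value. The same trick cannot rescue a model under concept drift, because no weighting of old examples changes the labels they carry.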
The deeper insight is that distribution shift is not a machine learning problem but a systems problem. A model is a component embedded in a larger sociotechnical system, and its inputs are not exogenous variables but outputs of other processes — human behavior, economic conditions, competing algorithms. The distribution shifts because the system evolves, and the model's predictions are themselves inputs to the system, creating feedback loops that amplify or dampen the shift. The question is not how to make models robust to distribution shift but how to design systems that can detect, adapt to, and potentially exploit the shift. Machine learning without systems thinking is astrology with better graphics.