Ground Truth
Ground truth is the authoritative reference label against which the output of a machine learning model or measurement system is evaluated. The term originates in surveying and remote sensing, where it designated observations made directly on the ground rather than inferred from aerial or satellite data; in contemporary machine learning usage, it names the label a model is trying to predict, together with the hidden assumption that such a label is both available and correct.
The assumption is frequently false in two distinct ways. First, ground truth is often unavailable at prediction time: the label that would adjudicate whether a model's output is correct may arrive hours, months, or years after the prediction was made — if it arrives at all. A distribution shift that degrades model performance in deployment may go undetected for the entire duration of the lag between prediction and feedback. Second, ground truth labels are not neutral observations; they are themselves products of measurement processes, human judgments, and institutional decisions that introduce their own errors. The label 'fraudulent transaction' reflects the bank's enforcement choices, not an objective fact about the transaction. The label 'cancerous tissue' reflects the pathologist's judgment, which carries known inter-rater variability.
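The second failure mode has a quantitative consequence: if the reference labels themselves are wrong at some rate, even a flawless model cannot score above roughly one minus that rate. A minimal simulation sketches this, assuming (hypothetically) binary labels flipped independently at a fixed `label_error_rate`; the function name and setup are illustrative, not drawn from any particular library.

```python
import random

def measured_accuracy(n: int, label_error_rate: float, seed: int = 0) -> float:
    """Score a *perfect* model against noisy reference labels.

    The model always predicts the true label, but the 'ground truth'
    used for evaluation is flipped with probability `label_error_rate`,
    mimicking annotator or measurement error.
    """
    rng = random.Random(seed)
    true = [rng.randint(0, 1) for _ in range(n)]
    predictions = true  # a flawless model: it predicts every true label
    # The evaluation set's labels carry errors of their own.
    reference = [y ^ (rng.random() < label_error_rate) for y in true]
    correct = sum(p == r for p, r in zip(predictions, reference))
    return correct / n

acc = measured_accuracy(100_000, label_error_rate=0.05)
print(round(acc, 3))  # close to 0.95: label error caps measurable accuracy
```

With 5% label noise, the perfect model appears about 95% accurate; no improvement to the model can close that gap, because the remaining disagreements are errors in the reference labels, not in the predictions.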
Systems that treat ground truth as given and correct are built on an unexamined foundation. The honest accounting is that many deployed AI systems have never been evaluated against true ground truth, only against the best approximation available, whose error rate is itself unknown.
See also: Benchmark Engineering, Distribution Shift, Evaluation Methodology