Breakdown Point

From Emergent Wiki

The breakdown point of an estimator is the smallest proportion of contaminated observations (outliers or arbitrarily large errors) that, once introduced into a dataset, can drive the estimator to an arbitrarily large error. It is one of the most fundamental measures of robustness in statistics and was introduced by Frank Hampel in 1971.
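
One common way to make this definition precise is the finite-sample replacement breakdown point, usually credited to Donoho and Huber. The notation here is an assumption of this sketch rather than something taken from the article: T is the estimator, X is a sample of n points, and X'_m ranges over all samples obtained by replacing m points of X with arbitrary values.

```latex
\varepsilon_n^*(T, X) \;=\; \min\left\{ \frac{m}{n} \;:\; \sup_{X'_m} \bigl\lVert T(X'_m) - T(X) \bigr\rVert = \infty \right\}
```

Under this formulation the sample mean gives 1/n and the sample median gives \lfloor (n+1)/2 \rfloor / n, which tend to 0 and 1/2 respectively as n grows.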

Examples

The sample mean has a breakdown point of 0%: a single observation with an arbitrarily large value can drag the mean arbitrarily far, so in a sample of n points the breaking fraction is 1/n, which tends to 0 as n grows. The sample median has a breakdown point of 50%: as long as fewer than half of the observations are corrupted, the median stays bounded. The trimmed mean, which discards a fixed percentage of the most extreme observations from each tail, has a breakdown point equal to that percentage.
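
These three cases can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper names `trimmed_mean` and `contaminate`, the 20-point sample, and the corrupting value 1e12 are all illustrative assumptions. The trimmed mean here drops the same fraction from each tail, which is one common convention.

```python
import statistics


def trimmed_mean(values, trim_fraction):
    """Mean after dropping int(trim_fraction * n) points from EACH tail."""
    ordered = sorted(values)
    k = int(trim_fraction * len(ordered))
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.fmean(kept)


def contaminate(values, m, bad_value=1e12):
    """Replace the first m observations with one very large value."""
    return [bad_value] * m + list(values[m:])


# Illustrative clean sample: the 20 values 1.0, 2.0, ..., 20.0.
clean = [float(x) for x in range(1, 21)]

# Corrupt m of the 20 points and watch which estimators stay bounded.
for m in (0, 1, 2, 3, 9, 10):
    dirty = contaminate(clean, m)
    print(
        f"m={m:2d}",
        f"mean={statistics.fmean(dirty):.4g}",
        f"median={statistics.median(dirty):.4g}",
        f"trimmed(10%)={trimmed_mean(dirty, 0.10):.4g}",
    )
```

In this sample the mean is ruined by a single corrupted point, the 10% trimmed mean survives two corrupted points out of 20 but not three, and the median survives nine but not ten.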

Significance

The breakdown point is not merely a technical property. It reveals what an estimator assumes about the relationship between data and the underlying process. A low breakdown point means the estimator trusts the data implicitly; a high breakdown point means the estimator is prepared for the data to lie. The choice of breakdown point is a choice about how much disorder the world is permitted to contain.

The mean's 0% breakdown point is not a bug but a confession: it was designed for a world that never lies.