M-Estimator
M-estimators ("maximum-likelihood-type estimators") are a broad class of statistical estimators introduced by Peter Huber in 1964 as a generalization of maximum likelihood estimation. An M-estimator minimizes the sum of a function ρ applied to the residuals; maximum likelihood is the special case in which ρ is the negative log-density of the assumed error distribution. By choosing ρ appropriately, one can construct estimators that are robust to outliers while retaining high efficiency when the errors are in fact normally distributed.
Definition
Given data points (x_i, y_i), i = 1, ..., n, and a model f(x_i, β), the M-estimator of β is the value that minimizes:
Σ_i ρ(y_i − f(x_i, β))
where ρ is a chosen function. When ρ(u) = u², the M-estimator reduces to ordinary least squares. When ρ(u) = |u|, it reduces to least absolute deviations, which minimizes the L1 norm of the residuals and yields the sample median in the location problem. Huber's proposal uses a hybrid ρ that is quadratic near zero and linear beyond a threshold k. This choice is optimal in a minimax sense: it minimizes the worst-case asymptotic variance over a neighborhood of contaminated normal distributions, with the threshold determined by the assumed fraction of contamination.
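The hybrid ρ described above can be sketched for the simplest case, estimating a location parameter. This is a minimal illustration, not a reference implementation. The function names, the sample data, and the default threshold k = 1.345 are assumptions made for the example. The residual scale is assumed known and equal to 1, and ρ is written with the conventional factor of 1/2 on the quadratic part, which does not change the minimizer.

```python
import statistics


def huber_rho(u, k=1.345):
    """Huber's rho: quadratic for |u| <= k, linear beyond (continuous at k)."""
    a = abs(u)
    if a <= k:
        return 0.5 * u * u
    return k * (a - 0.5 * k)


def huber_psi(u, k=1.345):
    """Derivative of huber_rho: the residual clipped to [-k, k]."""
    return max(-k, min(k, u))


def huber_location(data, k=1.345, tol=1e-10, max_iter=200):
    """Location M-estimate by iteratively reweighted averaging.

    Assumption for this sketch: the scale of the data is known and equal
    to 1, so residuals are compared with k directly.
    """
    mu = statistics.median(data)  # robust starting value
    for _ in range(max_iter):
        # Weight w(r) = psi(r) / r: 1 inside the threshold, k/|r| outside.
        weights = [1.0 if abs(y - mu) <= k else k / abs(y - mu) for y in data]
        new_mu = sum(w * y for w, y in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical sample: five values near 10 and one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]

mean_estimate = statistics.mean(sample)      # rho(u) = u^2
median_estimate = statistics.median(sample)  # rho(u) = |u|
huber_estimate = huber_location(sample)      # Huber's hybrid rho

print(mean_estimate, median_estimate, huber_estimate)
```

On this sample the mean is dragged to about 12.5 by the single outlier, the median stays at about 10.05, and the Huber estimate lands near 10.27: the outlier still counts, but only with the clipped influence k.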
Robust Regression
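In regression, M-estimators provide an alternative to ordinary least squares that is less sensitive to outliers in the response variable. Least squares minimizes the sum of squared residuals, so a single large residual has a disproportionate effect on the fit; an M-estimator whose ψ = ρ′ is bounded limits the contribution of any single residual. This protection does not extend to leverage points, that is, observations with extreme values in the predictor variables: a single high-leverage outlier can still distort the entire fitted surface. Generalized M-estimators and high-breakdown methods such as MM-estimators were developed to address that weakness.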
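One common way to compute a regression M-estimate is iteratively reweighted least squares (IRLS). The sketch below fits a straight line with Huber weights and is only an illustration under stated assumptions. The function names and the data are hypothetical, the residual scale is re-estimated at each step from the median absolute deviation (one common choice among several), and a fixed number of iterations stands in for a proper convergence test.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, k=1.345, n_iter=50):
    """Huber M-estimate of a line via IRLS (illustrative sketch)."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: MAD, rescaled to be consistent at the normal.
        med = statistics.median(r)
        scale = 1.4826 * statistics.median([abs(ri - med) for ri in r])
        if scale == 0:
            break  # residuals are degenerate; keep the current fit
        cut = k * scale
        # Huber weights: 1 for small residuals, cut/|r| for large ones.
        w = [1.0 if abs(ri) <= cut else cut / abs(ri) for ri in r]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Hypothetical data: y = 2 + 3x plus small noise, with one outlier in the
# response at x = 7.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.1]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, noise)]
y[7] = 53.0  # 30 units above the true line

ols_a, ols_b = weighted_line_fit(x, y, [1.0] * len(x))
hub_a, hub_b = huber_line_fit(x, y)
print("OLS slope:", ols_b, "Huber slope:", hub_b)
```

With these data the least squares slope is pulled well away from the true value of 3, while the Huber fit stays close to it. The outlier here is in the response, not in the predictor; the weights depend only on residual size.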
The Efficiency–Robustness Tradeoff
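No estimator can be simultaneously maximally efficient at the normal distribution and maximally robust to arbitrary contamination. M-estimators parameterize this tradeoff: the threshold at which ρ switches from quadratic to linear determines how much efficiency is sacrificed for how much robustness. A very small threshold makes the estimator behave like the median, a very large one like the mean; a common choice, 1.345 times the scale of the residuals, gives about 95% efficiency at the normal distribution. The choice of threshold is therefore not merely a technical detail but a decision about how much trust to place in the data.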
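The efficiency side of the tradeoff can be made concrete for the Huber location estimator. The sketch below uses the standard asymptotic variance formula for a location M-estimator, V = E[ψ²] / (E[ψ′])², evaluated at the standard normal distribution with the scale assumed known; efficiency relative to the sample mean is then 1/V. The function name is hypothetical, and the closed-form expressions are derived for this illustration from the clipped ψ.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def huber_efficiency_at_normal(k):
    """Asymptotic efficiency of the Huber location estimator at N(0, 1).

    Assumes psi(u) = clip(u, -k, k) and a known unit scale.
    """
    big_phi = STD_NORMAL.cdf(k)
    small_phi = STD_NORMAL.pdf(k)
    # E[psi'(Z)] = P(|Z| <= k), since psi' is 1 inside the threshold, 0 outside.
    e_dpsi = 2.0 * big_phi - 1.0
    # E[psi(Z)^2] = E[Z^2; |Z| <= k] + k^2 * P(|Z| > k)
    e_psi2 = e_dpsi - 2.0 * k * small_phi + 2.0 * k * k * (1.0 - big_phi)
    return e_dpsi ** 2 / e_psi2


for k in (0.5, 1.0, 1.345, 2.0, 3.0):
    print(k, round(huber_efficiency_at_normal(k), 4))
```

As k shrinks toward zero the efficiency approaches 2/π (about 0.64), the value for the sample median; as k grows it approaches 1, the value for the sample mean. The robustness bought by a smaller k is not shown here: it would require fixing a contamination model.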
M-estimators are the statistical embodiment of skepticism: they trust the data, but only up to a point. Beyond that point, they stop listening.