Lasso

Lasso (Least Absolute Shrinkage and Selection Operator) is a regression method that minimizes the usual sum of squared errors plus a penalty on the absolute values of the coefficients, which drives many of them exactly to zero. Introduced by Robert Tibshirani in 1996, it is the simplest and most influential instance of sparsity-inducing regularization: the idea that a model with many parameters can be learned from limited data if only a small subset of those parameters actually matters.

The Lasso penalty is the L1 norm of the coefficient vector, scaled by a tuning parameter that sets the strength of the penalty. Unlike the L2 penalty of ridge regression, which shrinks coefficients smoothly toward zero without making them exactly zero, the L1 penalty produces exact zeros. The result is not merely regularization but selection: Lasso simultaneously estimates coefficients and selects variables, discarding irrelevant features. This makes it particularly valuable in high-dimensional regimes where the number of predictors exceeds the number of observations, precisely the regime where ordinary least squares has no unique solution and the curse of dimensionality would otherwise make learning impossible.

The geometric intuition is elegant. The L1 constraint region is a diamond (a cross-polytope in higher dimensions), with corners on the coordinate axes. The contours of the least-squares error are ellipses (ellipsoids in higher dimensions). The constrained optimum is the point where the smallest such contour touches the diamond, and this is often a corner or edge where one or more coefficients are exactly zero. The stronger the penalty, the smaller the diamond, the more coefficients tend to be zeroed out, and the sparser the model becomes.

The Lasso is often praised for its interpretability, since a sparse model is easier to understand, but its deeper importance is epistemological. It demonstrates that learning in high dimensions is possible provided that the truth is simple. The penalty does not create sparsity; it discovers it. If the underlying phenomenon is not sparse, no amount of L1 regularization will save you.
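
The contrast between the L1 and L2 penalties can be seen in a small numerical sketch. The code below is illustrative rather than a reference implementation. It assumes the objective is scaled as (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 and solves it with cyclic coordinate descent and soft-thresholding, which is one common solver among several. It then compares the result with closed-form ridge regression on synthetic data. The function names, the synthetic data, and the value of `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; the result is exactly 0 whenever |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Approximately minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Uses cyclic coordinate descent: each coefficient is updated in turn
    by a one-dimensional soft-thresholding step, holding the others fixed.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # x_j^T x_j / n for each column
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)  # keep the residual in sync
            b[j] = new_bj
    return b


def ridge(X, y, lam):
    """Closed-form minimizer of (1/(2n)) * ||y - X b||^2 + (lam/2) * ||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


# Synthetic high-dimensional problem (assumed for illustration):
# 200 predictors, 50 observations, only 5 predictors truly matter.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, -2.5, 2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lam = 0.5  # hypothetical penalty strength; in practice chosen by cross-validation
b_lasso = lasso_coordinate_descent(X, y, lam)
b_ridge = ridge(X, y, lam)

print("nonzero Lasso coefficients:", np.count_nonzero(b_lasso))
print("nonzero ridge coefficients:", np.count_nonzero(b_ridge))
```

On data like this, with more predictors than observations and only a few true signals, the Lasso fit contains exact zeros while the ridge fit generally does not. Under the scaling assumed here, any `lam` at or above max|X^T y| / n zeros out every coefficient, which is the extreme end of the trade-off between penalty strength and sparsity.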