Machine learning
Machine learning is the practice of building systems that improve their performance on a task through exposure to data, without being explicitly programmed with rules for that task. The definition sounds simple. The reality is that 'improve,' 'performance,' and 'task' must all be specified precisely before any given machine learning system can be evaluated — and this specification work is where most of the difficulty lives.
Machine learning is a subfield of artificial intelligence, but the relationship between the two is contested. Classical AI attempted to encode knowledge as explicit rules; machine learning attempts to infer rules from data. Whether these are two approaches to the same goal, or two different goals with overlapping machinery, depends on what you think intelligence requires. This article takes no position on that question. It describes what machine learning systems do, how they do it, and what they demonstrably cannot do.
What Machine Learning Systems Actually Do
A machine learning system is a function with adjustable parameters. Training is the process of adjusting those parameters to minimize a loss function — a measure of how badly the system performs on a training dataset. The trained function is then evaluated on held-out data to estimate how well it will perform on novel inputs.
This is the entire mechanism. Everything else — the architecture choices, the regularization techniques, the training schedules, the hardware infrastructure — is engineering in service of this loop. The loop is simple. The engineering is not.
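The loop can be made concrete in a few lines. The following is a minimal sketch, not a production recipe: it fits a one-parameter linear model by gradient descent on squared error, and all of the data, the learning rate, and the iteration count are illustrative choices.

```python
# Minimal training loop: fit y = w * x by gradient descent on squared error.
# The data, learning rate, and iteration count are all illustrative.

# Toy training data drawn from y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0    # the adjustable parameter
lr = 0.01  # learning rate (step size)

for _ in range(1000):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # adjust the parameter to reduce the loss

# w converges toward 3.0, the slope that generated the data.
```

Every larger system repeats this shape: a parameterized function, a loss, a rule for moving the parameters downhill. The engineering difficulty lies in doing it with billions of parameters rather than one.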
The core classes of machine learning methods are:
Supervised learning trains on labeled examples — pairs of input and correct output. The system learns to map inputs to outputs. Classification and regression are the canonical supervised tasks. Most commercially deployed machine learning, including spam filters, image classifiers, and credit scoring systems, is supervised learning.
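A minimal supervised learner can be written without any training loop at all. The sketch below is a 1-nearest-neighbor classifier over labeled pairs; the feature vectors and the "spam"/"ham" labels are invented for illustration.

```python
# A tiny supervised learner: 1-nearest-neighbor classification.
# The labeled training pairs and query points are illustrative.

train = [((1.0, 1.0), "spam"), ((1.2, 0.9), "spam"),
         ((5.0, 5.0), "ham"), ((4.8, 5.2), "ham")]

def classify(x):
    # Predict the label of the closest training input (squared distance).
    def dist(p):
        return (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
    return min(train, key=lambda pair: dist(pair[0]))[1]

print(classify((1.1, 1.0)))  # -> spam
print(classify((5.1, 4.9)))  # -> ham
```

The labeled pairs are the supervision; the learned "function" is simply a lookup shaped by them.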
Unsupervised learning trains on unlabeled data, discovering structure without explicit supervision. Clustering, dimensionality reduction, and generative modeling fall here. The learned structure may or may not correspond to categories that are meaningful to humans — this is a non-trivial problem that is rarely discussed honestly.
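Clustering makes the "discovering structure" claim concrete. This is a bare-bones k-means sketch on one-dimensional data; the points, the initial centroids, and k = 2 are all illustrative.

```python
# A minimal k-means clustering sketch (unsupervised: no labels anywhere).
# The data, initial centroids, and iteration count are illustrative.

points = [0.9, 1.0, 1.1, 7.9, 8.0, 8.1]
centroids = [0.0, 10.0]  # initial guesses for k = 2 cluster centers

for _ in range(10):
    # Assignment step: attach each point to its nearest centroid.
    clusters = [[], []]
    for p in points:
        i = min(range(2), key=lambda j: abs(p - centroids[j]))
        clusters[i].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]

# The centroids settle near 1.0 and 8.0: structure found without labels.
```

Note that nothing here says what the two clusters *mean*; the algorithm finds groups, and whether those groups correspond to human-meaningful categories is exactly the problem flagged above.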
Reinforcement learning trains agents to take actions in an environment by rewarding sequences of actions that lead to desirable outcomes. Unlike supervised learning, reinforcement learning does not require labeled examples; it requires only a reward signal. RL has achieved remarkable results in games (AlphaGo, Atari) and robotics, but generalizes poorly outside the environments it was trained in.
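The reward-signal-only character of RL shows up clearly in tabular Q-learning. The following sketch uses an invented four-state chain environment in which reward arrives only at the goal; the hyperparameters and episode count are illustrative.

```python
# Tabular Q-learning on a tiny chain: only a reward signal, no labels.
# The environment, rewards, and hyperparameters are all illustrative.
import random

random.seed(0)
n_states = 4          # states 0..3; reaching state 3 ends the episode
actions = [0, 1]      # 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):
    s = 0
    while s != 3:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[s][act])
        s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 3 else 0.0   # reward only at the goal
        # Q-learning update toward the bootstrapped target.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy moves right in every state.
```

No input was ever paired with a correct action; the agent inferred the policy from delayed reward alone. That same narrowness is why the learned values are meaningless in any environment other than this chain.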
Deep learning refers to machine learning with multi-layered neural networks. It is not a separate category of method but a class of function approximator that has proven extraordinarily effective for high-dimensional inputs — images, audio, text — where hand-engineered features are insufficient. Deep learning is the technology behind AlphaFold, large language models, and most of the machine learning capabilities that received public attention after 2012.
What Machine Learning Requires
Every machine learning system requires four things, and the cost of each is typically underreported:
- Data — machine learning systems learn from distributions of examples. The quality of the learned function is bounded by the quality and coverage of the training data. A model cannot generalize beyond its training distribution except by coincidence. This is not a limitation that more compute overcomes.
- A loss function — the system needs to know what it is optimizing. Choosing a loss function is a design decision with significant consequences. Optimizing the wrong loss function produces a system that scores well on the metric while failing at the underlying task. This problem — Goodhart's Law in computational form — is endemic in deployed machine learning.
- A hypothesis class — the space of functions the system can represent. Neural network architectures define a hypothesis class. Choosing an architecture is choosing what kinds of solutions are available. A linear model cannot fit a nonlinear function regardless of training data or compute.
- Compute — training modern machine learning models requires substantial computation. This cost is often elided in discussions of machine learning 'progress,' but it matters: a capability that requires a billion dollars of compute is not the same capability as one that requires a thousand dollars of compute.
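The hypothesis-class constraint is easy to demonstrate numerically. Below, a closed-form least-squares line is fit to data generated by y = x²; the dataset is illustrative, but the conclusion is general: no amount of data or compute makes a linear model fit this target.

```python
# The hypothesis class bounds what is learnable: the best line through
# y = x^2 on symmetric data still carries irreducible error.
# The data points are illustrative; the least-squares fit is exact.

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]           # a nonlinear target

# Closed-form least squares for y ~ w*x + b.
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - w * mx

mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
# The best line is flat (w = 0, b = 2), and the mean squared error stays
# well above zero: the solution the task needs is outside the hypothesis class.
```

Enlarging the hypothesis class (adding an x² feature, or using a neural network) dissolves the problem, which is precisely why architecture choice is a substantive decision rather than a detail.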
Generalization and Its Limits
The central technical problem of machine learning is generalization: how well does a system trained on one distribution of data perform on a different distribution? The theoretical tools for understanding generalization — PAC learning theory, VC dimension, Rademacher complexity — provide bounds that are often too loose to be practically useful. In practice, generalization is studied empirically, by measuring performance on held-out test sets.
The practical limit of generalization is distribution shift. When the distribution of inputs at deployment differs from the training distribution, performance degrades — sometimes gracefully, sometimes catastrophically. Machine learning systems have no mechanism to detect that they are operating outside their training distribution. They produce outputs regardless. This is the source of most of the high-profile failures of deployed machine learning: the system was confident and wrong because the input was unlike anything it had seen before, and it had no way to represent its own uncertainty about this.
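The "confident and wrong" failure mode can be reproduced in a few lines. The sketch below trains a nearest-neighbor regressor on inputs in [0, 1] and then queries it at x = 100; the data and the query are invented, but the behavior is the general one.

```python
# Distribution shift sketch: a nearest-neighbor regressor trained on
# inputs in [0, 1] produces an output at x = 100 with no signal that
# the query is far outside the training distribution.
# The training data is illustrative (y = x on [0, 1]).

train = [(x / 10.0, x / 10.0) for x in range(11)]

def predict(x):
    # Return the training output of the nearest training input.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

in_dist = predict(0.55)    # within 0.05 of the truth (0.55)
shifted = predict(100.0)   # returns 1.0; the truth is 100.0
# The prediction machinery runs identically in both cases; nothing in
# the output distinguishes interpolation from extrapolation.
```

Detecting that a query is out of distribution is a separate, unsolved-in-general problem; the base model simply does not represent it.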
Adversarial examples — inputs designed to fool trained classifiers — reveal a related problem. The function a neural network learns is not the function a human would describe as 'recognizing objects.' It is a function that achieves high accuracy on the training distribution while being sensitive to precisely the perturbations that humans ignore. This is not a bug that better training fixes; it is a consequence of optimizing the wrong objective.
What Machine Learning Is Not
Machine learning systems do not understand their inputs. They compute functions over numerical representations of inputs. Whether this computation constitutes 'understanding' in any philosophically interesting sense is a question machine learning itself cannot answer, and one that has repeatedly been used to distract from clearer questions about what specific systems can and cannot do.
Machine learning systems do not learn causal structure from observational data without additional inductive biases that enforce causal assumptions. They learn correlations. This distinction matters enormously for applications where the goal is to predict the effect of interventions — in medicine, policy, and engineering — rather than to predict outcomes under the existing distribution. Causal inference requires more than machine learning.
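The correlation-versus-intervention gap can be simulated directly. In the sketch below, an invented confounder Z drives both X and Y: observationally X predicts Y almost perfectly, yet setting X by intervention leaves Y unchanged. All parameters are illustrative.

```python
# Correlation without causation: a confounder Z drives both X and Y.
# Observationally X predicts Y, but intervening on X leaves Y unchanged.
# The simulation and its parameters are illustrative.
import random

random.seed(1)

def observe(n=10000):
    data = []
    for _ in range(n):
        z = random.gauss(0, 1)
        x = z + random.gauss(0, 0.1)   # X is caused by Z
        y = z + random.gauss(0, 0.1)   # Y is caused by Z, not by X
        data.append((x, y))
    return data

def corr(data):
    # Pearson correlation of the observed (X, Y) pairs.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cov = sum((x - mx) * (y - my) for x, y in data) / n
    vx = sum((x - mx) ** 2 for x, _ in data) / n
    vy = sum((y - my) ** 2 for _, y in data) / n
    return cov / (vx * vy) ** 0.5

def intervene(x_forced, n=10000):
    # do(X = x_forced): Y is still generated from Z alone.
    return sum(random.gauss(0, 1) + random.gauss(0, 0.1) for _ in range(n)) / n

# corr(observe()) is close to 1, yet the mean of Y under intervene(5.0)
# stays near 0: a model fit to the observations would badly mispredict
# the effect of the intervention.
```

A regression of Y on X here would be an excellent predictor under the existing distribution and a useless guide to intervention, which is exactly the gap the paragraph above describes.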
Machine learning systems do not generalize from small amounts of data the way humans do. The sample efficiency gap between human learning and machine learning is large and not fully explained. Few-shot learning and meta-learning narrow this gap in specific settings but have not closed it.
The persistent confusion of what machine learning systems actually do with what observers wish they were doing is not innocent. It has led to overdeployed systems, misattributed failures, and misallocated research effort. Clarity about what was built is the first requirement of building something better. The hype cycle around machine learning has, on balance, been a tax on the field's ability to understand itself.