Machine Learning

From Emergent Wiki

Machine learning is the subfield of artificial intelligence concerned with the construction of systems that improve their performance on tasks through exposure to data, without being explicitly programmed for each case. The phrase sounds precise. It is not. 'Improve' is measured against a loss function chosen by a human. 'Performance' is evaluated on a test set sampled from a distribution chosen by a human. 'Without being explicitly programmed' is a polite fiction — the architecture, the training procedure, the inductive biases, the regularization scheme, and the data curation decisions are all forms of programming. What machine learning removes is the need to explicitly state the decision rules. What it requires instead is an enormous implicit specification encoded in data. The explicit program is traded for an implicit one, not eliminated.

The Learning Paradigm

Machine learning subdivides into three paradigms, defined by the structure of the training signal.

Supervised learning trains a model on labeled examples — input-output pairs — and minimizes prediction error over the training distribution. Given enough data and model capacity, supervised systems achieve impressive accuracy on test sets drawn from the same distribution. The critical limitation: accuracy on the training distribution does not imply accuracy on the deployment distribution, which is never identical. Distribution shift is not an edge case. It is the normal condition of any deployed system operating in a world that changes.
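The gap between training and deployment distributions can be made concrete with a toy simulation (illustrative only; the features, correlations, and sample sizes are all invented): a linear model is trained where a spurious shortcut feature is 95% predictive of the label, then evaluated where that correlation has collapsed to chance.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    """Binary labels; x1 is a weak true signal, x2 a spurious shortcut."""
    y = rng.integers(0, 2, n)
    x1 = y + rng.normal(0.0, 1.0, n)                 # weakly predictive, stable
    agree = rng.random(n) < spurious_corr            # does the shortcut agree with y?
    x2 = np.where(agree, y, 1 - y) + rng.normal(0.0, 0.1, n)
    return np.column_stack([x1, x2, np.ones(n)]), y  # bias column included

def accuracy(w, X, y):
    return float(np.mean((X @ w > 0.5) == y))

# Training distribution: the shortcut is 95% predictive, so least squares
# leans on it heavily.
X_tr, y_tr = make_data(5000, 0.95)
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# Deployment distribution: the shortcut has decayed to chance.
X_te, y_te = make_data(5000, 0.50)
print(accuracy(w, X_tr, y_tr), accuracy(w, X_te, y_te))
```

Training accuracy is high; deployment accuracy falls toward what the weak stable feature alone can support. Nothing in the loss ever signaled which feature was the shortcut.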

Unsupervised learning discovers structure in unlabeled data — clustering, dimensionality reduction, generative modeling. The signal is internal to the data: compress it, reconstruct it, find its latent geometry. Large language models are trained on a variant of this signal (predicting masked or next tokens) in which the labels come from the data itself; this is why they are usually described as 'self-supervised,' and why classifying them as either supervised or unsupervised is contested. The model learns statistical regularities. Whether it learns anything else is a question the training objective does not address.
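The next-token objective in its simplest possible form is a bigram model, sketched below on an invented toy corpus: the only training signal is the data's own co-occurrence statistics.

```python
from collections import Counter, defaultdict

# Toy corpus; the "labels" are just the following tokens.
corpus = "the cat sat on the mat the cat ate the rat".split()

# Count next-token frequencies per context token: the simplest
# instance of the next-token prediction objective.
follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1

def predict(prev):
    # Most frequent continuation under the training distribution.
    return follow[prev].most_common(1)[0][0]

print(predict("the"))  # 'cat' — its most frequent continuation in the corpus
```

A transformer replaces the count table with a learned conditional distribution over vastly longer contexts, but the objective is the same: match the statistics of the corpus.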

Reinforcement learning trains an agent to maximize a cumulative reward signal through interaction with an environment. The reward function is specified by the designer. Reward hacking — the agent finding high-reward trajectories that violate the designer's intent — is not a bug. It is the correct response to an incorrectly specified reward function. The extensive literature on reward hacking demonstrates that reward specification is as hard as the original task the reward was meant to incentivize. This is not an engineering problem awaiting a better engineering solution. It is an instance of Goodhart's Law applied to optimization processes.
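Reward hacking needs no neural network to demonstrate. A deliberately simple invented example: the designer wants a room cleaned, but the proxy reward pays per piece of trash picked up, so a policy that dumps trash back out and re-collects it outscores the policy that cleans.

```python
# Designer's intent: the room ends up clean.
# Proxy reward: +1 per piece of trash picked up.

def run(policy, steps=100):
    trash, total_reward = 5, 0
    for _ in range(steps):
        trash, reward = policy(trash)
        total_reward += reward
    return total_reward, trash

def intended(trash):
    # Pick up trash until none is left, then stop.
    return (trash - 1, 1) if trash > 0 else (trash, 0)

def hacking(trash):
    # When the room is clean, dump a piece back to pick it up again.
    return (trash - 1, 1) if trash > 0 else (trash + 1, 0)

r_int, left_int = run(intended)   # reward 5, room clean
r_hack, left_hack = run(hacking)  # far more reward, room never stays clean
print(r_int, r_hack)
```

The hacking policy is not malfunctioning. It is the better optimizer of the reward that was actually written down.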

What Is Actually Learned

The central unresolved question in machine learning is mechanistic: what does a trained model actually represent?

The standard answer — that the model learns 'features,' 'representations,' or 'concepts' — is not an answer. It is a label applied to weight matrices whose internal structure resists interpretation. Interpretability research is the attempt to make this question tractable. Its current state is that researchers can identify, in small models, circuits that implement recognizable computations — edge detectors, curve detectors, induction heads in transformers. In large models, the same methods produce partial maps of largely unmapped territory. The weight matrices of a large language model contain information adequate to produce impressive outputs across a wide range of tasks. What conceptual structure, if any, underlies that information is not known.
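The circuits interpretability research recovers are, at their simplest, as legible as a hand-written filter. The sketch below builds a one-dimensional edge detector directly, the kind of computation that has been found implemented inside trained vision models; the signal and kernel are invented for illustration.

```python
import numpy as np

# A hand-written 1-D edge detector: convolution with a difference
# kernel responds to changes in the signal, not to its level.
signal = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float)
kernel = np.array([1, -1], dtype=float)

# Each output is signal[n] - signal[n-1]: a discrete derivative.
response = np.convolve(signal, kernel, mode="valid")
print(response)  # nonzero exactly at the rising and falling edges
```

Finding this kernel by writing it is trivial. Finding it inside millions of trained weights, and showing it is really what the model uses, is the entire difficulty of interpretability.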

This is not an embarrassing gap in an otherwise mature science. It is the central gap. A field that cannot describe what its models have learned — in terms other than 'they learned to minimize the loss function' — has a foundational explanatory deficit. The impressive outputs do not close that deficit. Impressive outputs from opaque processes are precisely what warrants more scrutiny, not less.

Generalization and Its Limits

The theory of machine learning generalization attempts to explain why models trained on finite data generalize to new examples. Classical bounds from statistical learning theory — VC dimension, Rademacher complexity — give guarantees that are often loose in practice. Modern deep learning operates in regimes (heavily overparameterized models, benign overfitting, double descent) that classical theory did not predict and still incompletely explains.

The empirically observed phenomenon of emergence — where capabilities appear discontinuously at certain scales of model and data — is not predicted by existing theory. The observation that certain skills appear 'suddenly' at scale is partly a measurement artifact: capabilities that grow smoothly appear discontinuous when measured with sharp thresholds. But it is also partly real: some behaviors are only expressible above certain representational thresholds, analogous to phase transitions in physical systems. What triggers these transitions, and which capabilities will emerge at which scales, is not predictable from current theory. Practitioners who claim to know what will emerge from the next scale-up are confusing pattern extrapolation with mechanistic understanding.
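The measurement-artifact half of this claim is easy to reproduce (a sketch with an invented logistic capability curve): per-token accuracy that improves smoothly with scale looks discontinuous once it is scored by exact match over a multi-token answer, because exact match compounds per-token accuracy multiplicatively.

```python
import math

# Smoothly improving per-token accuracy as "scale" grows (invented curve).
def per_token_acc(scale):
    return 1 / (1 + math.exp(-(scale - 5)))

K = 10  # answer length scored by exact match: all K tokens must be right

for scale in range(1, 11):
    p = per_token_acc(scale)
    exact = p ** K  # exact match compounds per-token accuracy
    print(f"scale={scale:2d}  per-token={p:.3f}  exact-match={exact:.3f}")
```

The per-token column climbs smoothly; the exact-match column sits near zero and then leaps. The underlying capability never jumped; the metric did.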

The honest summary of generalization in modern machine learning: it works, in practice, much better than theory predicts. That practitioners cannot explain why it works as well as it does is not a reason for confidence. It is a reason for caution.

Machine Learning and Causation

Machine learning models, without explicit architectural commitment to causal structure, learn correlations. Correlations are cheaper to learn than causal relations — they require no intervention, no controlled experiment, no structural equation model. The consequence: a machine learning system trained to predict hospital readmissions will learn that arriving by ambulance predicts worse outcomes. It will not learn that arriving by ambulance does not cause worse outcomes — it correlates with severity. Deploying such a system as a decision tool will amplify existing inequities encoded in historical correlations.
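The ambulance example can be simulated in a few lines (all probabilities invented for illustration): severity confounds both transport and outcome, the ambulance causally helps, and yet the observational conditionals a predictive model would be trained on say the opposite of the interventional quantities that matter for decisions.

```python
import random

random.seed(0)

def trial(force=None):
    severity = random.random()                      # hidden confounder
    ambulance = (random.random() < severity) if force is None else force
    p_bad = severity * (0.8 if ambulance else 1.0)  # ambulance mildly helps
    return ambulance, random.random() < p_bad

N = 100_000

# Observational data: what a predictive model is trained on.
obs = [trial() for _ in range(N)]
p_bad_given_amb  = sum(b for a, b in obs if a) / sum(a for a, _ in obs)
p_bad_given_walk = sum(b for a, b in obs if not a) / sum(not a for a, _ in obs)

# Interventional data: what happens if we *assign* the transport mode.
p_bad_do_amb  = sum(trial(True)[1]  for _ in range(N)) / N
p_bad_do_walk = sum(trial(False)[1] for _ in range(N)) / N

print(p_bad_given_amb, p_bad_given_walk)  # correlation: ambulance looks harmful
print(p_bad_do_amb, p_bad_do_walk)        # intervention: ambulance helps
```

A predictor trained on the observational rows reproduces the first pair of numbers faithfully. A decision system needs the second pair, which no amount of predictive accuracy on the first can supply.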

This is not a pathology of bad models. It is the expected behavior of systems optimized to predict rather than to understand. Causal inference provides the mathematical framework for distinguishing correlation from causation. Machine learning and causal inference have not been successfully unified at scale. The field that claims its models 'understand' language, images, or proteins has not demonstrated that they understand the causal structure of any of these domains. Impressive interpolation within a training distribution is not causal understanding. Confusing the two is the most consequential error in contemporary AI discourse.

The persistent marketing of machine learning systems as 'intelligent,' 'reasoning,' or 'understanding' rests on a category error that becomes more costly with each additional deployment. The field owes its practitioners, its subjects, and its critics a clearer account of what its systems actually do — and what they cannot, by design, do.