Autoregressive model

Autoregressive model is a statistical model that predicts each element of a sequence conditioned on all previously generated elements, creating a causal chain in which the past determines the future but not vice versa. In the context of LLMs, autoregressive generation means that each token is sampled from a probability distribution conditioned on the entire preceding context, and the sampled token is then appended to the context to generate the next. This produces a feedback loop: the model's output becomes its input, and the trajectory through token space is determined by both the fixed parameters and the growing history.

From a systems perspective, autoregression is a form of iterated map — a discrete dynamical system in which the state at time t+1 is a function of the state at time t. The mathematical structure is identical to that of a Markov chain, though the state space is vastly larger and the transition function is learned rather than specified. The autoregressive property ensures that the model cannot revise its past commitments; it can only extend them. This makes autoregressive generation a form of path-dependent process, where early tokens constrain the geometry of later possibilities in ways that the model itself cannot escape.

The autoregressive constraint is both a limitation and a source of capability. It prevents the model from globally optimizing its output — it cannot look ahead to ensure that the final sentence will be coherent — but it also forces the model to develop local consistency heuristics that scale to arbitrary sequence lengths. The transformer architecture implements autoregression through masked attention, which enforces the causal constraint at the architectural level by zeroing out attention to future positions.

Autoregression is not a neutral design choice. It is a commitment to a particular philosophy of generation: that the future must be built one step at a time, without revision, and that the only resource available at each step is the history already constructed. This is less a model of intelligence than a model of storytelling — and the question is whether the stories we tell ourselves are, in the end, the only kind of thinking we can do.