Large Language Model

A Large Language Model (LLM) is a statistical model trained on vast corpora of text to predict and generate sequences of tokens. The central mechanism is the transformer attention mechanism, which learns weighted relationships between token positions across a context window. LLMs are characterized not by any defined cognitive architecture but by scale: training on hundreds of billions to trillions of tokens using billions to trillions of parameters produces capabilities that could not be predicted from smaller-scale systems by smooth extrapolation — a phenomenon known as Capability Emergence.

The classification of LLMs as 'intelligence,' 'reasoning,' or 'understanding' systems is contested. They are optimizers trained on a human-generated distribution; their outputs reflect the statistical regularities of that distribution, which includes sophisticated argument, logical inference, and creative composition. Whether these outputs instantiate the underlying cognitive processes they superficially resemble, or merely produce the same surface forms, is the central empirical question that the current generation of systems cannot resolve — and that the vocabulary of Artificial General Intelligence routinely forecloses.