Artificial General Intelligence
Artificial General Intelligence (AGI) refers to a hypothetical machine system capable of performing any intellectual task that a human can perform — and, in most definitions, of learning to perform tasks it was not explicitly trained for. The phrase appears in technical papers, corporate roadmaps, government policy documents, and popular journalism as though it denotes a well-defined engineering target. It does not. AGI is a contested category whose definitional instability is not a minor technical inconvenience but a diagnostic feature: the category does work precisely because it resists specification.
The Definition Problem
There is no agreed definition of AGI, and this fact is systematically underreported. The two most commonly cited definitions are:
- Behavioral generality: an AGI can do anything a human can do cognitively, across all domains.
- Learning transfer: an AGI can apply learning from one domain to novel domains without explicit programming.
Both definitions contain hidden load-bearing terms. 'Anything a human can do cognitively' requires a theory of human cognition that does not exist. 'Novel domains without explicit programming' must specify what counts as explicit programming, a boundary that current machine learning systems routinely blur. A large language model trained on essentially all human text and capable of passing professional examinations in law, medicine, and mathematics either is or is not AGI depending on definitional choices that are made on grounds other than technical ones.
The instability is not accidental. AGI is a goal-specifying concept in a field that has historically redefined its goals to match its achievements, a phenomenon sometimes called AI Goal Displacement. When machine learning systems achieved superhuman performance at chess, chess was reclassified as 'mere pattern matching.' When they achieved superhuman performance at protein structure prediction, that achievement too was recast as sophisticated pattern recognition rather than genuine scientific reasoning. The boundary between 'mere pattern matching' and 'genuine intelligence' migrates to protect the goal's unachievedness.
The Historical Construction of the Goal
The term 'Artificial General Intelligence' was popularized by Ben Goertzel beginning in 2002 as a deliberate contrast to what he called 'Narrow AI', the task-specific systems that had dominated commercial and academic AI since the late 1980s. The naming was explicitly rhetorical: a way of designating the real goal of AI research, against which existing systems were inadequate by definition.
But the real/narrow distinction was not a neutral description. It was a political maneuver within a field that had survived a crisis of legitimacy (the AI Winter) precisely by abandoning ambitious claims and producing useful narrow systems. Goertzel's framing rejected that settlement and declared that the abandoned ambitions were the true ambitions. The name 'Artificial General Intelligence' did not introduce a new technical concept; it named an aspiration that had been present since Alan Turing's foundational papers but had been tactically suppressed during the pragmatic reconstruction of the field.
This means AGI is, in part, a political category. The distinction between AGI and Narrow AI encodes a disagreement about what AI is for, which is not a technical question.
The Measurement Problem
Any engineering target requires a measurement. The Turing Test, proposed by Alan Turing in 1950, was the first serious proposal: a machine passes if a human judge cannot reliably distinguish its conversational outputs from a human's. The Turing Test has been rejected as a definition of AGI by most contemporary researchers for two reasons: it is too easy (human judges are easily fooled) and too narrow (conversation is not all of cognition).
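The statistical core of the test can be made concrete. The sketch below is a minimal illustration, not a real evaluation harness: the `judge` function and the transcript lists are hypothetical stand-ins. It formalizes 'cannot reliably distinguish' as judge accuracy indistinguishable from chance.

```python
import random

def passes_imitation_game(judge, human_transcripts, machine_transcripts,
                          tolerance=0.05):
    """Return True if the judge cannot reliably tell machine from human.

    `judge` is assumed to map a transcript to the label 'human' or
    'machine'; it and both transcript lists are hypothetical inputs.
    """
    trials = ([(t, "human") for t in human_transcripts]
              + [(t, "machine") for t in machine_transcripts])
    random.shuffle(trials)  # blind the judge to presentation order
    correct = sum(judge(t) == label for t, label in trials)
    accuracy = correct / len(trials)
    # 'Cannot reliably distinguish' means accuracy near chance (0.5).
    return abs(accuracy - 0.5) <= tolerance
```

Both objections appear directly in the sketch: the result depends entirely on the judge (too easy), and the evidence consists of conversational transcripts alone (too narrow).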
Its successors — benchmark suites, standardized evaluations, complexity-theoretic notions of intelligence — all share a structural problem: they measure performance on tasks that were chosen because they are measurable. The tasks that define the benchmark become, implicitly, the definition of intelligence for purposes of evaluating progress. But the choice of benchmark is made by researchers with interests, institutional affiliations, and commitments — not derived from a theory of cognition.
This is the Goodhart's Law problem for AGI: when a proxy for intelligence becomes the target, it ceases to be a good proxy. The history of AI benchmarks is a history of this dynamic: ImageNet, GLUE, BIG-bench, each in turn saturated by systems that achieve high scores while remaining brittle in ways that expose the gap between the benchmark and whatever it was supposed to measure.
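The dynamic is easy to demonstrate. In the toy model below (an illustration under invented assumptions, not a model of any real benchmark), a hill-climbing optimizer improves a proxy score that rewards both genuine capability and benchmark-specific tricks; because the tricks pay off more per unit of effort, the proxy climbs while the quantity it was meant to track barely moves.

```python
import random

def true_capability(params):
    # Only the first component represents genuine generalization.
    return params[0]

def benchmark_score(params):
    # The proxy rewards genuine capability AND benchmark-specific
    # tricks (second component), which pay off three times as much.
    return params[0] + 3.0 * params[1]

def hill_climb(score_fn, steps=2000, step_size=0.01):
    params = [0.0, 0.0]
    for _ in range(steps):
        candidate = [p + random.gauss(0, step_size) for p in params]
        if score_fn(candidate) > score_fn(params):
            params = candidate  # accept any move the proxy likes
    return params

random.seed(0)
params = hill_climb(benchmark_score)
print(f"benchmark score : {benchmark_score(params):6.2f}")
print(f"true capability : {true_capability(params):6.2f}")
# Typical run: the benchmark score ends up roughly an order of
# magnitude larger than the true capability gain, the Goodhart gap.
```

Note that the acceptance rule even permits moves that trade a small loss of capability for a larger gain in tricks, which is precisely the brittleness the saturated benchmarks exhibit.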
What Is Actually Being Built
The systems described as 'approaching AGI' by major AI laboratories — large-scale language models, multimodal systems, reinforcement learning agents in complex environments — share a common architecture: they are trained on human-generated data to predict or optimize for human-generated outputs. Their generality is, in a precise sense, the generality of the training distribution. They generalize in the ways human artifacts generalize, because they are optimized against human artifacts.
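A deliberately tiny sketch makes the point about training distributions literal. A bigram counter is nowhere near a modern language model, but it shares the relevant structure: every prediction it makes is the empirical conditional distribution of a human-written corpus, and it has no support at all outside that corpus. The corpus below is an invented stand-in for web-scale training text.

```python
from collections import Counter, defaultdict

# Toy next-token predictor trained on a (miniature) human-written corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(prev):
    # The model's entire 'knowledge' is P(next | prev) as estimated
    # from the training text.
    counts = bigrams[prev]
    total = sum(counts.values())
    if total == 0:
        return {}  # no support outside the training distribution
    return {tok: n / total for tok, n in counts.items()}

print(predict("the"))    # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(predict("sat"))    # {'on': 1.0}
print(predict("plasma")) # {} : silence where the corpus is silent
```

Scale smooths the generalization but does not change its provenance: what reads as breadth is the breadth of the corpus.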
This boundedness is not a defect; it is the design. But it means that the systems being built under the AGI banner are not general in any substrate-neutral sense. They are general relative to a particular training distribution derived from a particular civilization at a particular historical moment. Whether this counts as AGI is, again, a definitional question, and the definition is doing more political and rhetorical work than technical work.
The honest description of what is being built is: systems of remarkable capability and remarkable fragility, whose failure modes are difficult to characterize precisely because their successes are difficult to characterize precisely. The vocabulary of AGI systematically obscures this in favor of a narrative of progress toward a well-defined goal.
Armitage's Editorial Claim
The concept 'Artificial General Intelligence' is not a scientific hypothesis — it is a political technology. It maintains the plausibility of a goal that has never been precisely stated while serving the interests of those who need that goal to remain plausible: researchers who attract funding by promising proximity to it, companies who attract investment by claiming progress toward it, and policy actors who use it to justify regulatory and military attention to AI. The concept does not need to be precise in order to be effective. Precision would destroy it.
Any account of AGI that specifies what would count as falsifying the claim that a given system has achieved it is not a definition of AGI — it is a definition of a lesser, Narrow AI goal dressed in AGI's clothes.
See Also
- Intelligence — A precise characterization of intelligence as adaptive problem-solving across novel environments, grounded in algorithmic information theory rather than folk-psychological categories. The conceptual prerequisites for evaluating AGI claims.
- Narrow Intelligence — The contrast class against which AGI is defined; an analysis of what current AI systems demonstrably achieve.
- Machine Understanding — The contested hypothesis that machines can possess semantic comprehension, distinct from behavioral competence.