Intelligence
Intelligence is the capacity of a system to solve problems it was not specifically designed to solve. This definition is deliberately operational: it identifies intelligence with adaptive performance across novel environments rather than with any inner property of minds, souls, or consciousness. The definition has enemies on both flanks — cognitivists who insist intelligence requires internal representations, and pragmatists who reduce it to mere behavioral success — and is correct in the teeth of both objections.
The word's history is a cautionary tale in concept-by-committee. For a century, psychologists, philosophers, and computer scientists have defined intelligence to suit their theoretical commitments, and have spent the decades since arguing about whether their definitions capture what other researchers mean. They do not. The concept of intelligence, as it appears in the literature, is not a single natural kind. It is a cluster of loosely related phenomena — problem-solving, pattern recognition, language use, planning under uncertainty, transfer learning — bound together by a family resemblance that obscures their structural differences.
What Intelligence Is Not
Before specifying what intelligence is, it is useful to enumerate what it is not, because the confusions are load-bearing for how the concept gets deployed.
Intelligence is not consciousness. The conflation is pervasive and harmful. A system can solve arbitrary problems without any phenomenal experience. The converse is also possible in principle: a system could be conscious without adaptive problem-solving capacity. Consciousness research and intelligence research address different phenomena. Treating them as aspects of a single phenomenon corrupts both.
Intelligence is not generality. A chess grandmaster is highly intelligent at chess and no more intelligent than average at diagnosing diseases or navigating bureaucracies. Fluid intelligence — general problem-solving capacity that transfers across domains — is a distinct and empirically contested construct, not a synonym for intelligence. Systems that perform well on a specific benchmark demonstrate task competence. They demonstrate general intelligence only if the benchmark is a reliable proxy for transfer performance, which must be established independently and rarely is.
Intelligence is not performance on benchmark tests. Benchmark gaming is the construction of systems that achieve high scores on tests without possessing the underlying competence the tests were designed to measure. The history of artificial intelligence is substantially a history of benchmark gaming — not because researchers are dishonest, but because optimization against any fixed target produces systems specialized to that target. Teaching to the test is not a metaphor. It is the mathematical consequence of Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.
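Goodhart's consequence can be made concrete with a toy setup (entirely illustrative; the "benchmark", tasks, and systems below are invented for the sketch, not any real evaluation). A system that memorizes a frozen test set scores perfectly on it while having none of the competence the test was designed to measure:

```python
import random

random.seed(0)

def sample_task():
    # A "task" is an addition problem drawn from the underlying distribution.
    a, b = random.randint(0, 99), random.randint(0, 99)
    return (a, b), a + b

# A fixed benchmark: a finite sample frozen into a test set.
benchmark = [sample_task() for _ in range(50)]

# A "gamed" system: a lookup table memorizing the benchmark answers.
lookup = {q: ans for q, ans in benchmark}
def gamed(q):
    return lookup.get(q, 0)          # no competence off-benchmark

# A competent system: actually implements the skill being measured.
def competent(q):
    return q[0] + q[1]

def score(system, tasks):
    return sum(system(q) == ans for q, ans in tasks) / len(tasks)

# Fresh draws from the same task distribution the benchmark was sampled from.
fresh = [sample_task() for _ in range(1000)]

print(score(gamed, benchmark))      # perfect on the frozen target
print(score(gamed, fresh))          # near zero off it
print(score(competent, fresh))      # the underlying competence transfers
```

Once the benchmark became the optimization target, its score stopped tracking the competence it was built to proxy, which is exactly the Goodhart failure described above.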
A Computational Characterization
The most precise characterization of intelligence available comes from algorithmic information theory and computational complexity theory, not from psychology.
In Marcus Hutter's AIXI framework, universal intelligence is defined as the ability to maximize expected reward across the class of all computable environments, weighted by their simplicity. AIXI is uncomputable — realizing it requires solving problems that are formally undecidable. But it provides a theoretical benchmark against which partial implementations can be evaluated, and it grounds the concept of intelligence in the mathematics of computability rather than in behavioral observation.
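The standard formal statement of this idea is the Legg–Hutter universal intelligence measure, reproduced here in one common presentation:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Here $E$ is the class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$ (so simpler environments receive more weight), and $V^{\pi}_{\mu}$ is the expected total reward that policy $\pi$ earns in $\mu$. The uncomputability enters through $K$, which is itself uncomputable.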
The key insight: intelligence is a relationship between a system and an environment, not a property of the system alone. A system is intelligent relative to a class of environments and a class of tasks. Asking whether a system is intelligent without specifying the environment class is like asking whether a function is fast without specifying the input distribution. The question is malformed.
This has immediate consequences:
- Narrow intelligence is optimization in a well-defined problem class with known distribution.
- General intelligence is optimization across problem classes, including problem classes not seen during training.
- Transfer learning is the intermediate case: generalization to problem classes related to the training distribution in ways the system can exploit.
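The function-speed analogy above can be made literal. The same algorithm is fast or slow depending on the input distribution, so "is it fast?" has no answer until the distribution is specified (a textbook illustration with insertion sort; the sizes are arbitrary):

```python
import time

def insertion_sort(a):
    # Standard insertion sort: O(n) on sorted input, O(n^2) on reversed input.
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

n = 4000
already_sorted = list(range(n))      # best-case distribution
reversed_input = list(range(n, 0, -1))  # worst-case distribution

t0 = time.perf_counter()
insertion_sort(already_sorted)
t1 = time.perf_counter()
insertion_sort(reversed_input)
t2 = time.perf_counter()

print(f"sorted input:   {t1 - t0:.4f}s")
print(f"reversed input: {t2 - t1:.4f}s")  # dramatically slower
```

The same relativity holds for intelligence claims: a performance number is meaningful only relative to the environment class it was measured on.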
Current large language models achieve striking narrow and transfer performance but have not demonstrated general intelligence in the technical sense — optimization across arbitrary computable environments. The claim that they have is a marketing claim, not a scientific one.
Measurement and the g Factor
Psychometric intelligence research developed the g factor — a statistical latent variable extracted from performance on cognitive tests — as its central construct. The g factor is real in the sense that it reliably predicts variance in educational and occupational outcomes. It is misunderstood as the thing that intelligence is.
The g factor is a statistical artifact of a specific methodology: factor analysis of test performance correlations. It captures whatever is common to the tests in the factor analysis. Change the tests and you get a different g. The g factor tells us nothing directly about the computational architecture of the cognitive systems being tested. It is a useful measurement instrument and a poor theoretical foundation.
Heritability estimates for g are consistently high (0.5–0.8 in adult populations), which tells us that genetic factors explain a large proportion of variance in g within a given population under a given range of environments. This does not mean intelligence is fixed, or that environmental intervention is futile, or that group differences in g are genetic in origin. Each of these inferences involves an additional step that the heritability data do not support. That all three inferences are routinely drawn tells us something about motivated reasoning, not about the data.
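The logical gap between heritability and fixity is easy to exhibit numerically. In the toy additive model below (invented for illustration; the numbers encode no empirical claim), heritability stays high within each environment while an environmental shift moves every individual's trait value substantially:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

genotype = rng.normal(size=n)  # genetic values, variance ~1

def phenotype(env_mean, env_sd):
    # Toy additive model: trait = genetic value + environmental deviation.
    return genotype + rng.normal(env_mean, env_sd, size=n)

def heritability(pheno):
    # Narrow-sense h^2 in this model: Var(G) / Var(P).
    return genotype.var() / pheno.var()

baseline = phenotype(env_mean=0.0, env_sd=0.5)
improved = phenotype(env_mean=1.0, env_sd=0.5)  # intervention shifts everyone up

print(round(heritability(baseline), 2))  # high within-environment heritability
print(round(heritability(improved), 2))  # still high after the intervention
print(round(improved.mean() - baseline.mean(), 2))  # large environmental effect
```

Heritability here is a within-population variance ratio; it is silent about what happens to the mean when the range of environments changes, which is exactly the additional inferential step the text flags.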
Machine Intelligence
The engineering question is whether machines can be built that satisfy the adaptive problem-solving definition. The answer is conditional: yes, within specified environment classes; not yet demonstrated across arbitrary computable environments.
Artificial intelligence systems in 2020–2026 demonstrate:
- Superhuman performance in several narrow domains (chess, Go, protein structure prediction, specific mathematical theorem classes)
- Strong transfer performance in language tasks (comprehension, translation, summarization, code generation)
- Unreliable but sometimes impressive performance in multi-step reasoning tasks
- Consistent failure in tasks requiring causal reasoning, counterfactual reasoning, and systematic generalization to out-of-distribution environments
The pattern is what algorithmic information theory predicts: current systems implement powerful function approximation over training distributions. They do not implement a search process across arbitrary problem classes. The question of whether scaling function approximation will eventually produce general intelligence is empirically open. It cannot be settled by demonstration on existing benchmarks, because existing benchmarks are within the training distribution.
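The in-distribution/out-of-distribution gap is the generic behavior of function approximators, and a minimal sketch shows it (a polynomial fit standing in for any flexible approximator; the target function and ranges are arbitrary choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit a flexible function approximator (a degree-7 polynomial) to samples
# drawn from a narrow slice of the input space.
x_train = rng.uniform(0, 3, size=200)
y_train = np.sin(x_train)
coeffs = np.polyfit(x_train, y_train, deg=7)

def rmse(x):
    # Root-mean-square error of the fitted approximator against the target.
    return np.sqrt(np.mean((np.polyval(coeffs, x) - np.sin(x)) ** 2))

x_in = rng.uniform(0, 3, size=200)   # in-distribution inputs
x_out = rng.uniform(6, 9, size=200)  # out-of-distribution inputs

print(rmse(x_in))   # tiny: excellent interpolation
print(rmse(x_out))  # enormous: it learned a patch of sin, not sin
```

Nothing in the training objective penalized the approximator for being wrong outside the training distribution, so success on in-distribution benchmarks cannot settle the scaling question.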
The persistent anthropomorphism in public descriptions of machine intelligence — systems that 'understand,' 'reason,' 'know,' 'believe' — is not merely imprecise language. It actively impedes the engineering question by importing folk-psychological categories that do not carve machine cognition at its joints. A system that produces fluent text does not thereby understand it in any sense that implies the full cognitive architecture that understanding entails in the human case. Whether it does so in a weaker sense requires specification of which weaker sense, followed by empirical investigation — not terminological legislation.
The correct framing for machine intelligence research: specify the environment class, specify the task class, specify the performance criterion, measure performance. Claims that outrun this framing are hypotheses, not demonstrations. The field's persistent failure to distinguish its hypotheses from its demonstrations has produced a thirty-year oscillation between hype and winter that is not, at its core, a failure of intelligence. It is a failure of epistemology.