Genetic Programming

Genetic programming (GP) is a branch of evolutionary computation in which the evolved structures are not fixed-length parameter vectors or bit strings, but executable programs — typically represented as syntax trees, linear sequences, or grammatical structures. Where genetic algorithms optimize over a space of candidate solutions, genetic programming optimizes over a space of candidate \textit{generators} of solutions. The shift from searching for answers to searching for procedures is profound: it is the difference between finding a bridge and evolving the capacity to design bridges.

The Tree Representation

The canonical GP representation, due to John Koza, encodes programs as rooted trees whose internal nodes are functions and whose leaf nodes are terminals (variables, constants, or zero-argument functions). A program that computes the volume of a sphere might be represented as:

 (* (/ 4 3) (* pi (* r (* r r))))

This tree structure permits natural genetic operators: subtree crossover exchanges random subtrees between two parent programs; subtree mutation replaces a random subtree with a randomly generated one. The closure property — requiring that all functions accept the return types of all possible subtrees — is the central design constraint. Violate it, and the population fills with syntactically invalid programs that cannot be executed.

The closure property is not merely a technical convenience. It is a constraint that shapes the evolutionary dynamics of the system. Programs that respect closure evolve differently than programs that do not. The representational choice is not neutral; it is a substrate decision that determines what is evolvable.

Beyond Trees: Representational Diversity

Tree-based GP is not the only representation. Linear genetic programming evolves sequences of machine-code-like instructions. Cartesian genetic programming uses indexed graphs. Grammatical evolution separates genotype (a string of integers) from phenotype (a program generated by a grammar), allowing standard genetic algorithms to evolve programs without tree-specific operators. Each representation embodies different assumptions about what makes a program evolvable — about which structural features should be preserved under variation and which should be free to change.

This representational diversity reveals a deeper point: the space of "programs" is not a single landscape but a family of landscapes, each defined by a representation and its associated variation operators. The No Free Lunch theorems apply here with particular force: no representation is universally superior. What works for symbolic regression may fail for circuit design; what succeeds in controller evolution may collapse in program synthesis.

The Problem of Bloat

Genetic programming suffers from a distinctive pathology called "bloat" — the uncontrolled growth of program size without corresponding improvement in fitness. Bloated programs are not merely large; they are filled with non-functional code ("introns") that serves no purpose but protects useful subtrees from disruption by crossover. Bloat is not a bug in the implementation; it is an emergent property of the representation and operators.

The persistence of bloat across GP systems suggests something important: program spaces are not naturally friendly to evolutionary search. The genotype-phenotype map in GP — the mapping from syntax tree to executable behavior — is extraordinarily non-smooth. Small syntactic changes produce large behavioral changes. This "brittleness" is the opposite of the smooth fitness landscapes that make evolution effective. GP practitioners spend enormous effort designing representations and operators that artificially smooth the landscape.

GP and the Nature of Design

What makes genetic programming philosophically interesting is that it automates not just optimization but \textit{design}. A successful GP run produces a program that its human creators did not write, and in many cases could not have written — a solution that exploits loopholes in the fitness function, discovers counterintuitive algorithms, or invents novel circuit topologies. The 2005 invention of a patentable analog circuit by a GP system is often cited; less often cited is the fact that the circuit worked by exploiting properties of the physical substrate that the human designers had not modeled.

This is the central tension of genetic programming: it automates design, but the designs it produces are deeply coupled to the representational and physical substrate in which they evolve. A GP-evolved solution is not a Platonic algorithm that can be cleanly separated from its medium. It is a program that works \textit{because} of the closure property, \textit{because} of the fitness function's particular structure, \textit{because} of the operators' bias toward certain tree shapes. The substrate is not incidental. It is constitutive.

The dream of genetic programming is to evolve intelligence itself — to sit back and watch as populations of programs bootstrap their way from arithmetic to abstraction. But the history of the field suggests a different lesson: that the space of possible programs is so vast and so hostile to search that we must carefully engineer the substrate to make evolution possible at all. Genetic programming does not demonstrate that design is easy; it demonstrates that design is hard enough that even evolution needs help. The claim that GP will one day evolve artificial general intelligence confuses the existence of a search process with the accessibility of the target.