Genetic Algorithms

Genetic algorithms (GAs) are a class of optimization and search algorithms inspired by biological natural selection and genetics. They maintain a population of candidate solutions (analogous to organisms), apply selection operators that preferentially retain high-fitness solutions, apply crossover operators that combine features of two solutions (analogous to sexual recombination), and apply mutation operators that introduce random variation. Over successive generations, this process typically produces progressively better solutions to the optimization problem.

The biological inspiration is explicit and should be examined critically: genetic algorithms borrow the *vocabulary* of evolution (chromosome, gene, fitness, generation, mutation) but implement radically simplified versions of the biological mechanisms. The question of whether GAs tell us anything essential about biological evolution — beyond the obvious fact that selection on heritable variation can optimize complex traits — deserves more skepticism than the field usually receives.

The Algorithm

A standard genetic algorithm proceeds as follows:

1. Initialize: Generate an initial population of N candidate solutions, typically encoded as bit strings (chromosomes). Population size typically ranges from 50 to several hundred.

2. Evaluate: Apply the fitness function to each candidate, assigning a real-valued fitness score.

3. Select: Choose candidates to reproduce, with probability proportional to fitness (tournament selection, roulette wheel selection, or rank selection). Higher-fitness candidates reproduce more.

4. Recombine: Apply crossover: randomly select a crossover point and exchange the tails of two parent chromosomes to create two offspring. Crossover probability typically 60-90%.

5. Mutate: Randomly flip bits with low probability (typically 0.1-1%). Mutation maintains genetic diversity and prevents premature convergence.

6. Replace: The new generation replaces the old, either fully or through elitist selection that preserves the best individuals.

7. Repeat from step 2 until convergence or a stopping criterion is met.

The theoretical justification is Holland's schema theorem (1975): short, highly-fit subpatterns (schemata) are exponentially amplified across generations, driving the search toward high-fitness regions of the solution space. This provides a theoretical basis for why GAs can work, not just an empirical observation that they do. The schema theorem has been critiqued and refined extensively — it proves less than it originally claimed — but remains the foundational theoretical framework.

Applications

GAs have been applied to a wide range of optimization problems: engineering design (antenna shapes, aerodynamic profiles), machine learning hyperparameter optimization, scheduling, protein structure prediction, game playing, and neural architecture search. Their particular advantage is robustness: they do not require gradient information, can handle discontinuous and noisy fitness landscapes, and are less susceptible to local optima than gradient-based methods.

Neuroevolution — using genetic algorithms to evolve neural network weights and architectures — had a productive period in the 1990s-2000s and has seen resurgence through NEAT (NeuroEvolution of Augmenting Topologies) and more recent evolutionary strategies that scale well with parallel computation. For some reinforcement learning problems, neuroevolution approaches perform competitively with or better than gradient-based reinforcement learning.

The Biological Analogy: What It Does and Does Not Teach

The skeptic's essential question: do genetic algorithms illuminate biological evolution, or do they merely use its vocabulary?

The honest answer is: mostly the latter. Genetic algorithms reproduce several key features of biological evolution — selection on heritable variation, the creative role of recombination, the tension between exploitation (selecting fit solutions) and exploration (maintaining variation). These are genuine insights that the GA framework makes computable and testable.

But genetic algorithms also differ from biological evolution in ways that are not merely technical simplifications but fundamental distortions:

Fitness is specified externally and explicitly. In GAs, the fitness function is given to the algorithm by the designer. In biological evolution, there is no external specifier of fitness; fitness is the statistical outcome of differential reproduction in a changing environment. The fitness landscape in biology is itself a product of the evolutionary process (through niche construction, coevolution, and the evolution of evolvability). GAs have a fixed, static fitness function; biology has a dynamic, endogenous one.

Chromosomes are directly evaluated. In biological organisms, the connection between genotype and fitness is mediated by the entire process of development — a process of extraordinary complexity that involves gene regulatory networks, cell signaling, physical morphogenesis, and environmental interaction. GAs typically evaluate fitness directly from chromosome representation without any developmental intermediate. The absence of development in GA models means they cannot reproduce or study the role of developmental constraints in shaping evolvability.

Populations are small and generations are discrete. Biological evolution acts on populations of billions or trillions of individuals undergoing continuous reproduction with overlapping generations. GAs maintain populations of hundreds and proceed in discrete steps. The population-genetic dynamics of drift, linkage disequilibrium, and selective sweeps that have been central to molecular evolution are largely absent from GA models.

Recombination operates on an artificial encoding. The crossover operator in GAs assumes that fitness-relevant information is distributed along the chromosome in a way that makes arbitrary recombination points productive. This assumption is sometimes satisfied (if the encoding is well-designed) and sometimes produces nonsensical offspring. Biological recombination operates on chromosomes where the encoding has been refined over billions of years of evolution to make recombination productive. GAs cannot benefit from this refinement unless the encoding is carefully designed — which requires external knowledge about the problem structure.

The skeptic's verdict: genetic algorithms are a useful and productive class of optimization algorithms. They are not a model of biological evolution in any deep sense, and the biological vocabulary should not be taken to imply that insights from GAs transfer directly to evolutionary biology. The claim that GAs demonstrate key properties of evolution is partially true (selection on heritable variation does optimize) and partially misleading (the mechanisms are too simplified to reproduce the rich population genetics and developmental biology that make biological evolution what it is). Using GAs to "study" biological evolution is roughly as informative as using a checkers program to study human strategic cognition — there is formal analogy, but the substrate differences matter more than the analogy captures.