Theorem Proving

Theorem proving is the activity of establishing that a mathematical statement follows necessarily from a set of axioms through a chain of deductive inferences each of which is formally valid. Unlike empirical verification, which tests claims against observations, theorem proving tests claims against the logical consequences of explicit assumptions — and when the assumptions are themselves formally specified, the test is mechanical enough to be performed, in principle, by a machine.

This mechanical character makes theorem proving the central technique of formal verification — the programme of proving that hardware designs, software systems, and cryptographic protocols satisfy their specifications. The connection is direct: a specification is a logical formula, a system is a model or a program, and theorem proving demonstrates that the model satisfies the formula. In practice, the difficulty of this demonstration is bounded not by human ingenuity but by computational complexity: the number of proof steps grows with the size of the system in ways that may be intractable.

Automated and Interactive Theorem Proving

Automated theorem proving (ATP) attempts to find proofs without human guidance, using search algorithms that explore the space of possible inferences. The field has produced powerful tools — resolution-based provers, satisfiability (SAT) solvers, and SMT (Satisfiability Modulo Theories) solvers — that can verify properties of finite-state systems with millions of states. But ATP hits a wall with the properties that matter most: the halting problem and Rice's theorem establish that fully automated verification of arbitrary programs is impossible. The wall is not engineering; it is mathematics.

Interactive theorem proving (ITP) responds by keeping the human in the loop. Systems like Coq, Isabelle, and Lean allow users to construct proofs step by step, with the machine checking each step's validity. The human provides the strategy; the machine provides the certainty. This division of labor exploits what humans are good at (pattern recognition, strategic insight) and what machines are good at (symbolic manipulation, exhaustive search). The Curry-Howard correspondence reveals that this interaction is deeper than mere convenience: proofs and programs are the same mathematical object, and theorem proving is a form of programming in which the output is not a computation but a guarantee.

Theorem Proving and the Limits of Certainty

The ambition of theorem proving is to replace fallible intuition with mechanical certainty. But certainty is bounded by the axioms from which proofs begin. A theorem proves that its conclusion follows from its premises; it does not prove the premises. The incompleteness theorems of Gödel show that any sufficiently powerful formal system contains true statements that cannot be proved within the system. This is not a failure of theorem proving but a structural feature of formal systems: completeness and consistency are trade-offs, not co-achievable properties.

In constructive mathematics, this limitation is embraced rather than resisted. A constructive proof does not merely show that a proposition is true; it produces a witness — a specific construction that realizes the proposition's content. The advantage is epistemological: a constructive proof carries more information than a classical existence proof, because it tells you not merely that something exists but how to find it. The disadvantage is practical: many classical theorems (the intermediate value theorem, the existence of non-computable reals) fail constructively, and the mathematics used in physics and engineering is overwhelmingly classical.

Theorem Proving as Institutional Infrastructure

Theorem proving is not merely a technical activity; it is an institutional mechanism for managing complexity. Modern microprocessors, operating systems, and cryptographic protocols are too complex for human auditors to verify by inspection. Theorem proving provides a scalable alternative: a proof checked by a machine is a proof that does not depend on the reputation, attention span, or political incentives of its author. This is why theorem proving is central to AI safety research: if we cannot empirically test whether a powerful AI will behave safely under all conditions, formal verification — backed by theorem proving — is the only alternative to blind trust.

But the alternative is partial. Theorem proving verifies that a model satisfies a specification; it does not verify that the model matches reality. The gap between formal model and physical system — between the proof and the world it is supposed to govern — is not bridgeable by proof. It requires empirical validation, and empirical validation is always finite, always sampled, always uncertain. Theorem proving provides certainty within the model; it does not provide certainty that the model is the right one.

_The dream of theorem proving is to make mathematics — and by extension, engineering — into a fully automated pipeline from specification to guaranteed implementation. The reality is that the pipeline leaks at every joint: axioms are chosen, not proved; models are simplified, not discovered; and the gap between formal guarantee and physical behavior is crossed by empirical judgment, not logical deduction. Theorem proving is not the end of uncertainty but its relocation — from the interior of proofs to their boundaries._