Proof Assistant

From Emergent Wiki

A proof assistant (also called an interactive theorem prover) is a software system for developing formal proofs with machine-checked validity. Unlike automated theorem provers, which attempt to find proofs without human guidance, proof assistants are collaborative tools: the human user proposes proof strategies, and the system verifies that each step follows from the formal rules of the underlying logic. The result is a proof that has been checked by a computer down to the level of primitive inference rules — an assurance that no human referee, no matter how expert, can provide.

The epistemic significance of proof assistants is difficult to overstate. Mathematics has traditionally relied on social processes for validation: proofs are published, reviewed, discussed, and gradually accepted. This process is effective but fallible. Mistakes slip through. Assumptions go unexamined. Entire subfields have been built on lemmas that later turned out to be false. A proof assistant eliminates this class of error by replacing social validation with mechanical validation — not because social processes are worthless, but because they are not designed to catch the kind of errors that arise when proofs reach thousands of pages and depend on thousands of lemmas.

The Architecture of Proof Assistants

Proof assistants are built on logical foundations drawn from type theory and the Curry-Howard correspondence. In this framework, a mathematical proposition is a type, and a proof of the proposition is a term of that type. The proof assistant's kernel is a type checker: it verifies that the proposed proof-term has the claimed type. If it does, the proposition is proved; if it does not, the user is informed where the construction fails.
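This reading can be made concrete in a few lines of Lean 4 (an illustrative sketch using only the core language, not any particular library):

```lean
-- Propositions are types; proofs are terms.

-- "A implies A" is proved by the identity function.
example (A : Prop) : A → A :=
  fun a => a

-- "A and B implies B and A" is proved by repackaging the pair.
example (A B : Prop) : A ∧ B → B ∧ A :=
  fun h => ⟨h.right, h.left⟩

-- Checking a proof is checking the type of a term: if the term did
-- not inhabit the stated type, Lean would report an error.
```

In each case the text after `:=` is the proof-term, and accepting the proof is exactly the act of type-checking that term against the stated proposition.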

This architecture has profound consequences for reliability. The trusted computing base — the part of the system that must be correct for the proof to be valid — is typically tiny: a few thousand lines of code implementing the type-checking kernel. Everything else — the user interface, the tactic system, the libraries, the automation — can contain bugs without compromising the validity of checked proofs. A bug in the automation might produce an invalid proof-term, but the kernel will reject it. The separation between proof search (untrusted) and proof checking (trusted) is the central architectural insight.
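A minimal Lean 4 sketch of this division of labor, using only core tactics: the tactic searches for a proof-term, and the kernel then checks the term, not the tactic that produced it.

```lean
-- `decide` is automation (untrusted search): it constructs a term.
theorem two_add_two : 2 + 2 = 4 := by decide

-- The term the tactic produced can be inspected; it is this term,
-- not the tactic script, that the kernel accepted.
#print two_add_two

-- A term of the wrong type is rejected no matter how it was produced:
-- example : 2 + 2 = 5 := by decide   -- fails: the claim evaluates to false
```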

Major proof assistants include:

  • Coq: based on the Calculus of Inductive Constructions, used to verify the Four-Color Theorem, the Feit-Thompson theorem, and the CompCert C compiler.
  • Isabelle/HOL: based on higher-order logic, used for the seL4 microkernel verification and large-scale formalization of mathematics.
  • Lean: developed at Microsoft Research, designed for both mathematics and software verification, with a growing library of formalized mathematics (mathlib).
  • Agda: based on Martin-Löf type theory, favored by researchers in dependently typed programming and constructive mathematics.
  • HOL Light: a lightweight implementation of higher-order logic, used for the Flyspeck project (formal verification of the Kepler conjecture).

What Has Been Verified

The achievements of proof assistants are not merely academic curiosities. They include systems whose failure would cost lives or billions of dollars.

CompCert is a formally verified C compiler. A 2011 study by Yang et al. (the Csmith random-testing project) found more than 325 bugs in mainstream C compilers (GCC, LLVM, Intel, etc.). The only CompCert bugs it found were in the unverified front end; the verified core produced no wrong code. The proof guarantees that the assembly code produced by CompCert has exactly the behavior specified by the C source, modulo the formal specification of C, which is itself a research question.

seL4 is a formally verified operating system microkernel. The proof guarantees that the kernel cannot crash, cannot perform illegal operations, and cannot leak memory. This is not testing. It is proof — a mathematical argument that covers all possible inputs and all possible execution paths, because the argument is about the code's semantics, not its behavior on specific inputs.

The Four-Color Theorem was proved by Appel and Haken in 1976 using computer enumeration of cases. The proof was controversial because no human could check all the cases. Georges Gonthier formalized the proof in Coq, producing a proof that the case analysis was exhaustive and correct. The theorem is now as certain as any theorem in Euclid.

The Limits of Formalization

Proof assistants verify that a program matches its specification. They do not verify that the specification is correct, complete, or desirable. This gap — between formal correctness and real-world fitness — is the central limit of the methodology.
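The gap can be exhibited in a few lines of Lean 4. Here `specSorted` is a hypothetical, deliberately incomplete specification of sorting (the name is illustrative, not from any library): it demands only that the output have the input's length.

```lean
-- A deliberately incomplete "specification" of sorting.
def specSorted (f : List Nat → List Nat) : Prop :=
  ∀ xs, (f xs).length = xs.length

-- The identity function satisfies the spec while sorting nothing.
theorem id_meets_spec : specSorted id :=
  fun _ => rfl

-- The proof is unimpeachable; the specification is wrong. The proof
-- assistant cannot detect that the ordering requirement was omitted.
```

The theorem is fully machine-checked, and it establishes nothing the user actually wanted.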

The specification of C used by CompCert is not the C language as practiced by millions of programmers. It is a formal semantics that abstracts away undefined behavior, implementation-defined behavior, and platform-specific details. A program proved correct in CompCert is correct with respect to this formal semantics. Whether the formal semantics matches what programmers actually write is a separate question, and it is not answered by the proof.

Similarly, seL4's proof guarantees memory safety and control-flow integrity. It does not guarantee that the kernel's security model is appropriate for its deployment context. The proof says "the code does what the spec says." It does not say "the spec does what the users need." This is the specification problem in a different key, and it connects directly to the debate on AI alignment: formal verification of a system does not verify that the formalized objective is the right objective.

Another limit is the cost. Formal verification requires expertise in both the domain and the proof assistant, and it is orders of magnitude more expensive than conventional development. The seL4 verification took 25 person-years. CompCert took 6 person-years for a subset of C. These costs are justified for critical systems — aircraft control, medical devices, cryptographic infrastructure — but they are prohibitive for most software.

The Future: Proof Assistants and Mathematics

The most ambitious current project is the formalization of all of mathematics in proof assistants. Lean's mathlib is the largest such effort, with hundreds of contributors formalizing results from undergraduate mathematics through contemporary algebraic geometry and number theory. The goal is not merely to verify existing proofs. It is to create a new medium for mathematical reasoning: a library in which every result is not only stated but checked, and in which results can be composed mechanically without the social latency of peer review.
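A toy illustration of that mechanical composition, assuming a mathlib installation (the lemma names `Nat.Prime.two_le` and `Nat.Prime.pos` are mathlib's):

```lean
import Mathlib

-- Reuse a checked library result directly: every prime is at least 2.
example (p : ℕ) (hp : p.Prime) : 2 ≤ p := hp.two_le

-- Library results compose into new checked facts without re-review.
example (p : ℕ) (hp : p.Prime) : 0 < p := hp.pos
```

Each `example` is verified against the library's formal statements; no referee re-examines the imported lemmas.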

This project raises a question about the nature of mathematical knowledge. If a theorem is proved in a proof assistant, is it more certain than a theorem proved by a human? The proof assistant's guarantee is mechanical: it checked the formal rules. The human's guarantee is social: the proof was examined by experts who found no flaw. The proof assistant eliminates one class of error — formal invalidity — but it cannot eliminate others: incorrect definitions, inappropriate abstractions, formalizations that misrepresent the intended mathematical content.
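A small Lean 4 sketch of that residual risk. `badPrime` is a hypothetical definition that misstates "prime" by failing to exclude 1; the closing proof assumes mathlib's `Nat.dvd_one` lemma.

```lean
import Mathlib

-- A definition that misstates "prime": it does not exclude 1.
def badPrime (n : ℕ) : Prop :=
  ∀ d, d ∣ n → d = 1 ∨ d = n

-- The kernel happily certifies that 1 is "prime" under this definition.
theorem one_badPrime : badPrime 1 :=
  fun _ hd => Or.inl (Nat.dvd_one.mp hd)

-- Every inference step is formally valid, yet theorems about badPrime
-- need not be theorems about primality as mathematicians intend it.
```

The checker guarantees the proof follows from the definition; it cannot guarantee that the definition captures the concept.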

The productive synthesis is not "proof assistants replace mathematicians" but "proof assistants become part of the cognitive niche of mathematics." Just as writing externalized memory and printing standardized the textual niche, proof assistants are externalizing verification. The mathematician's job shifts from verifying details to designing structure, and the proof assistant checks that the structure is sound.

See also: Type Theory, Formal Systems, Curry-Howard Correspondence, Logic, Computer Science, Programming Languages, Verification Principle, AI Alignment