Automated Alignment Verification
Automated alignment verification is the problem of determining, by algorithmic means, whether an artificial intelligence system will behave in accordance with specified human values or goals. The problem is not merely technically difficult: in the general case, it is provably impossible. Rice's Theorem establishes that no algorithm can decide any non-trivial semantic property of programs, and alignment, understood as whether a system pursues its intended goals across the full distribution of inputs, is precisely such a property.
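To see why Rice's Theorem applies, consider the standard reduction argument, sketched here in Python. All names (`verify_aligned`, `make_gadget`, `misbehave`) are illustrative inventions for this sketch, not a real API: the point is that if a total decider for alignment existed, it would also decide the halting problem.

```python
# Proof sketch: reducing the halting problem to alignment verification.
# `verify_aligned` is a hypothetical total decider that is assumed, for
# contradiction, to exist; nothing here is a real verification API.

def misbehave(x):
    """Stand-in for any behaviour that violates the specification."""
    return "misaligned"

def make_gadget(p, i):
    """Build a program that is misaligned iff p halts on input i."""
    def gadget(x):
        p(i)                 # diverges exactly when p diverges on i
        return misbehave(x)  # reached only if p halts on i
    return gadget

def halts(p, i, verify_aligned):
    """Given the hypothetical verifier, this would decide the halting
    problem, contradicting Turing's result; so no such verifier exists."""
    return not verify_aligned(make_gadget(p, i))

# Constructing the gadget is easy; deciding its behaviour is what's
# impossible. Here p halts immediately, so the gadget misbehaves:
gadget = make_gadget(lambda i: None, 0)
```

Note that the reduction itself is trivial to write down; the impossibility lives entirely in the decision procedure, not in the construction.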
This impossibility is not widely acknowledged in AI safety research, where the typical framing treats alignment verification as a hard engineering challenge rather than a mathematical impossibility. The distinction matters: engineering challenges yield to sufficient ingenuity; impossibility results do not. Any verification method that works must operate over a restricted class of programs, not general computation. The question of which restrictions are acceptable without neutering the systems we wish to verify has not been adequately posed, let alone answered.
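As a toy illustration of how restriction restores decidability (the miniature language and the specification below are invented for this sketch): programs in a loop-free, straight-line language always terminate, so whether a program matches a specification over a finite input domain can be checked by exhaustive evaluation, and Rice's Theorem no longer applies.

```python
# Toy example of escaping Rice's theorem by restricting the program
# class: straight-line programs (no loops, no branches) always halt,
# so semantic checks over a finite domain are decidable by brute force.

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def run(program, x):
    """Interpret a straight-line program: a list of (op, constant) pairs."""
    for op, k in program:
        x = OPS[op](x, k)
    return x

def verified_aligned(program, spec, domain):
    """Decidable check: does the program match spec on every input?"""
    return all(run(program, x) == spec(x) for x in domain)

# Example: does [("mul", 2), ("add", 1)] compute 2x + 1 on 0..99?
prog = [("mul", 2), ("add", 1)]
ok = verified_aligned(prog, lambda x: 2 * x + 1, range(100))
```

The restriction doing the work here is severe: no loops and a finite domain. The open question the paragraph above poses is exactly how much of this severity can be relaxed before undecidability returns.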
What remains is not a problem to be solved but a territory to be mapped — the boundary between what can be verified and what cannot. Formal verification of bounded properties, interpretability research, and constrained training are partial approaches that do not dissolve the theorem but work carefully within its shadow.