Certified Defense

Certified defense is a class of methods in adversarial robustness that provide provable mathematical guarantees about a model's behavior under perturbation, rather than merely empirical evidence. Adversarial training and attack-based evaluation only ever expose a model to a finite sample of adversarial examples produced by particular attacks. A certified defense instead proves, for a given input, that no perturbation within a specified budget (for example, an L2 or L∞ ball of radius ε) can change the model's output, regardless of the attacker's strategy.
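For a linear binary classifier the certificate has a closed form, which makes it a convenient minimal illustration of "regardless of the attacker's strategy". The plain-Python sketch below assumes f(x) = sign(w·x + b) with made-up weights; it illustrates the idea of a certificate and is not a general certified defense. A perturbation δ changes the score by w·δ, and Cauchy-Schwarz gives |w·δ| ≤ ‖w‖₂‖δ‖₂, so any δ with ‖δ‖₂ < |w·x + b| / ‖w‖₂ leaves the sign unchanged. The helper names (`score`, `linear_certified_radius`, `step_against_normal`) are hypothetical.

```python
import math


def score(w, b, x):
    """Raw linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linear_certified_radius(w, b, x):
    """Return (label, radius) for the binary linear classifier sign(w.x + b).

    By Cauchy-Schwarz, |w.delta| <= ||w||_2 * ||delta||_2, so any perturbation
    with ||delta||_2 < |w.x + b| / ||w||_2 leaves the sign unchanged, no matter
    which direction the attacker picks.
    """
    s = score(w, b, x)
    label = 1 if s >= 0 else -1
    return label, abs(s) / math.sqrt(sum(wi * wi for wi in w))


def step_against_normal(w, b, x, length):
    """Move x by `length` in L2 along the worst-case direction -sign(score) * w / ||w||_2."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    direction = -1.0 if score(w, b, x) >= 0 else 1.0
    return [xi + direction * length * wi / norm for wi, xi in zip(w, x)]


# Illustrative (hypothetical) weights and input.
w, b, x = [3.0, 4.0], -1.0, [1.0, 1.0]
label, radius = linear_certified_radius(w, b, x)
print(label, radius)                                         # 1 1.2
print(score(w, b, step_against_normal(w, b, x, 1.19)) > 0)   # True: inside the radius
print(score(w, b, step_against_normal(w, b, x, 1.21)) > 0)   # False: just outside it
```

Stepping straight against w, which is the strongest possible L2 attack on a linear model, flips the prediction only once the step exceeds the radius, so for this model the certificate is exact rather than merely sound.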

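One of the most widely used approaches is randomized smoothing. It turns a base classifier into a smoothed classifier that outputs whichever class the base model predicts most often when Gaussian noise is added to the input; the base model is usually also trained on noisy inputs so that it stays accurate under that noise. Statistical bounds on the estimated class probabilities then certify that the smoothed classifier's output is stable within an L2 radius around each input. Because those probabilities are estimated by Monte Carlo sampling, the certificate holds with high probability rather than with certainty. Certification of this kind transforms adversarial robustness from an empirical game of cat-and-mouse into a verification problem, connecting machine learning to the formal-methods traditions of software engineering and safety-critical systems.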
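The certification step can be sketched in plain Python. This follows the two-stage Gaussian-smoothing procedure of Cohen et al. (2019): guess the top class from a small sample, lower-bound its probability p_A on a fresh sample, and report the radius R = σ·Φ⁻¹(p_A), abstaining when the bound does not exceed 1/2. Two assumptions to flag: the base classifier is a hypothetical linear toy, and a one-sided Hoeffding bound stands in for the Clopper-Pearson interval so that only the standard library is needed. Function and parameter names (`certify`, `n0`, `n`, `alpha`) are illustrative.

```python
import math
import random
from statistics import NormalDist


def base_classifier(x):
    """Hypothetical toy base classifier: class 1 if x0 + x1 > 0, else class 0."""
    return 1 if x[0] + x[1] > 0 else 0


def sample_counts(f, x, sigma, n, rng):
    """Classify n Gaussian-noised copies of x and count the predicted classes."""
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        c = f(noisy)
        counts[c] = counts.get(c, 0) + 1
    return counts


def certify(f, x, sigma, n0=100, n=10_000, alpha=0.001, seed=0):
    """Return (class, certified L2 radius) for the smoothed classifier, or (None, 0.0).

    With probability at least 1 - alpha over the sampling, the smoothed classifier
    predicts the returned class everywhere within the returned L2 radius of x.
    """
    rng = random.Random(seed)
    # Stage 1: guess the top class from a small selection sample.
    counts0 = sample_counts(f, x, sigma, n0, rng)
    c_hat = max(counts0, key=counts0.get)
    # Stage 2: estimate p_A on a fresh sample and take a one-sided Hoeffding
    # lower confidence bound (an assumption; Cohen et al. use Clopper-Pearson).
    counts = sample_counts(f, x, sigma, n, rng)
    p_hat = counts.get(c_hat, 0) / n
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return None, 0.0  # abstain: cannot certify at this confidence level
    return c_hat, sigma * NormalDist().inv_cdf(p_lower)


print(certify(base_classifier, [1.0, 1.0], sigma=0.5))  # class 1 with a radius of about 1.0
print(certify(base_classifier, [0.0, 0.0], sigma=0.5))  # on the decision boundary: should abstain
```

For this linear toy the smoothed classifier coincides with the base one, whose exact distance from [1, 1] to the boundary x₀ + x₁ = 0 is √2 ≈ 1.41. The reported radius is smaller because p_A is only estimated from finite samples; a larger `n` or a tighter confidence interval shrinks the shortfall, and a point on the boundary yields an abstention instead of a certificate.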

Certified defenses are currently limited by tightness: certified radii are often much smaller than the perturbations that empirical attacks actually need to fool a model, so certified accuracy at a given budget usually lags well behind empirically measured robust accuracy. The gap between certified robustness and empirical robustness is one of the central open problems in the field. Convex optimization and random matrix theory are among the mathematical tools that might help close it.
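The tightness gap appears even on a network with two hidden units. The sketch below uses interval bound propagation (IBP), a simple deterministic certification method that the text above does not describe, swapped in because it fits in a few lines; the weights are hypothetical. IBP pushes an L∞ box through each layer with interval arithmetic and certifies the box when the output's lower bound stays positive. Because it bounds the two hidden units independently, it cannot see that their sum equals 2·x₀ on this box, so it certifies only up to ε = 0.375, while the exact radius for this network is 0.75 (the minimum of the output over the box is 1.5 - 2ε).

```python
# Hypothetical two-input network: y = relu(x0 + x1) + relu(x0 - x1) - 0.5.
# Certification asks whether y > 0 for every point in the box around x.
W1 = [[1.0, 1.0], [1.0, -1.0]]
B1 = [0.0, 0.0]
W2 = [1.0, 1.0]
B2 = -0.5


def forward(x):
    """Exact network output at a single point."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    return sum(w * hv for w, hv in zip(W2, h)) + B2


def affine_bounds(lo, hi, weights, bias):
    """Interval bounds on weights . x + bias for x in the box [lo, hi]."""
    # A positive weight takes the matching end of each interval, a negative one the opposite end.
    out_lo = bias + sum(w * (l if w >= 0 else u) for w, l, u in zip(weights, lo, hi))
    out_hi = bias + sum(w * (u if w >= 0 else l) for w, l, u in zip(weights, lo, hi))
    return out_lo, out_hi


def ibp_output_bounds(x, eps):
    """Propagate the L-infinity box of radius eps around x through the network."""
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    h_lo, h_hi = [], []
    for row, b in zip(W1, B1):
        l, u = affine_bounds(lo, hi, row, b)
        # ReLU is monotone, so it can be applied to each end of the interval.
        h_lo.append(max(0.0, l))
        h_hi.append(max(0.0, u))
    return affine_bounds(h_lo, h_hi, W2, B2)


def ibp_certified_radius(x, upper=2.0, iters=50):
    """Bisect for the largest eps at which IBP still proves y > 0."""
    lower = 0.0
    for _ in range(iters):
        mid = (lower + upper) / 2
        if ibp_output_bounds(x, mid)[0] > 0:
            lower = mid
        else:
            upper = mid
    return lower


def grid_min(x, eps, steps=101):
    """Attack-style estimate: smallest output found on a grid over the box."""
    best = float("inf")
    for i in range(steps):
        for j in range(steps):
            p = [x[0] - eps + 2 * eps * i / (steps - 1),
                 x[1] - eps + 2 * eps * j / (steps - 1)]
            best = min(best, forward(p))
    return best


x = [1.0, 0.0]
print(ibp_output_bounds(x, 0.5))          # (-0.5, 3.5): IBP cannot certify eps = 0.5
print(round(grid_min(x, 0.5), 6))         # 0.5: the true minimum on that box is positive
print(round(ibp_certified_radius(x), 4))  # 0.375
print(grid_min(x, 0.74) > 0, grid_min(x, 0.76) > 0)  # True False: the exact radius is 0.75
```

The grid search plays the role of an empirical attack here. Tighter convex relaxations of the ReLU, solved with convex optimization, are one way to recover part of the precision that plain interval arithmetic loses.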