Safety Engineering

Safety engineering is the discipline of designing systems that do not fail catastrophically. It does this not by eliminating every possible failure, but by ensuring that the failures that do occur are contained, survivable, and recoverable. It is not the same as reliability engineering. A reliable system, one that rarely fails, is not necessarily safe, because a rare failure can still be catastrophic. Conversely, a safe system that fails in known, bounded ways may be deliberately less reliable than is technically possible.
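The distinction between reliability and safety can be made concrete with a small numeric sketch. The two system profiles, their failure probabilities, their worst-case losses, and the survivability threshold below are all hypothetical numbers chosen for illustration, and "bounded" is a simplified stand-in for safety (no single failure exceeds what the operator can survive), not a standard metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemProfile:
    """Hypothetical failure profile of a system (illustrative numbers only)."""
    name: str
    failure_prob: float     # probability that a single operation fails
    worst_case_loss: float  # largest loss any single failure can cause

    @property
    def reliability(self) -> float:
        # Reliability in this sketch: probability that one operation succeeds.
        return 1.0 - self.failure_prob

    def expected_failures(self, operations: int) -> float:
        # Mean number of failures over a run of independent operations.
        return self.failure_prob * operations


def is_bounded(profile: SystemProfile, tolerable_loss: float) -> bool:
    # Simplified safety criterion (an assumption of this sketch):
    # no single failure may exceed the loss the operator can survive.
    return profile.worst_case_loss <= tolerable_loss


# Hypothetical profiles: one fails rarely but catastrophically,
# the other fails often but every failure is contained.
reliable = SystemProfile("reliable but brittle", failure_prob=1e-4, worst_case_loss=1_000_000)
contained = SystemProfile("less reliable, contained", failure_prob=1e-2, worst_case_loss=100)

TOLERABLE_LOSS = 10_000  # hypothetical survivability threshold

if __name__ == "__main__":
    for p in (reliable, contained):
        print(
            f"{p.name}: reliability={p.reliability:.4f}, "
            f"expected failures per 10k ops={p.expected_failures(10_000):.0f}, "
            f"bounded={is_bounded(p, TOLERABLE_LOSS)}"
        )
```

The first profile scores higher on reliability yet fails the bounded-loss check; the second fails a hundred times more often, but every failure stays within what the system can absorb. The point is that the two properties are measured on different axes, so optimizing one does not guarantee the other.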

The field emerged from the study of high-risk technologies — nuclear power, aviation, chemical processing, spaceflight — where single failures could produce mass casualties. But its principles apply to any system where the cost of failure exceeds the cost of prevention: software infrastructure, financial systems, medical devices, and increasingly, machine learning systems whose failures propagate silently through automated decisions.

From Absence to Capacity: The Shift in Safety Thinking

Traditional safety thinking defined safety as the absence of accidents: a safe system is one that has not yet had an accident. This definition is retrospective and passive. It tells you nothing about whether the next hour will be safe.

Modern safety engineering, influenced by the work of Sidney Dekker, Erik Hollnagel, and Nancy Leveson, defines safety as the presence of capacity: a safe system is one that can absorb perturbation, adapt to surprise, and recover from unexpected states. This is the distinction Hollnagel draws between "safety-I" (preventing things from going wrong) and "safety-II" (ensuring things go right). The shift is not semantic. It changes what you measure, what you design for, and what you reward.

In safety-I, success is measured by the absence of incidents. In safety-II, success is measured by the presence of resilience: the system