Failure Mode and Effects Analysis
Failure Mode and Effects Analysis (FMEA) is a systematic method for evaluating processes, products, or systems to identify where and how they might fail, and to assess the relative impact of different failures. Originally developed in the 1940s by the U.S. military and later adopted by NASA, the automotive industry, and healthcare, FMEA has become a baseline practice in the engineering of safety-critical systems.
The method proceeds in three stages: identification (list all conceivable failure modes for each component), effects analysis (trace each failure mode through the system to determine its consequences), and risk prioritization (rate each failure mode for severity, likelihood of occurrence, and difficulty of detection, then multiply the three ratings to produce a Risk Priority Number, or RPN, that guides mitigation effort). The result is not merely a list of risks but a causal map of how local failures propagate through system architecture to produce global consequences.
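The prioritization stage can be illustrated with a minimal sketch. It assumes the common convention of rating each factor on a 1 to 10 scale, with a higher detection rating meaning the failure is harder to detect; the `FailureMode` class, the `prioritize` helper, and the worksheet entries are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet (ratings assumed to be 1..10)."""
    component: str
    description: str
    severity: int    # 1 = negligible effect, 10 = catastrophic
    occurrence: int  # 1 = remote likelihood, 10 = almost certain
    detection: int   # 1 = almost always detected, 10 = practically undetectable

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # Risk Priority Number: the product of the three ratings.
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative entries only; real ratings come from the analysis team.
worksheet = [
    FailureMode("pump", "seal leak", severity=6, occurrence=4, detection=3),
    FailureMode("sensor", "stuck reading", severity=8, occurrence=2, detection=7),
    FailureMode("valve", "fails closed", severity=9, occurrence=1, detection=2),
]

for mode in prioritize(worksheet):
    print(f"{mode.rpn:4d}  {mode.component}: {mode.description}")
```

In this example the most severe failure mode (the valve) ranks last, because it is rare and easy to detect. This shows how the RPN blends the three ratings rather than ranking by severity alone.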
FMEA's power lies in its demand for imagination: the analyst must conceive of failures that have not yet occurred, working at the boundary between known and unknown failure modes. Nancy Leveson identified the same epistemic demand in her analysis of the Therac-25: the accident was not caused by a failure mode the engineers had considered and dismissed, but by one they had not conceived of at all, the race condition between data entry and turntable positioning. FMEA cannot prevent failures of imagination, but it can make them easier to detect by requiring explicit documentation of what was considered and what was not.
The limitation of FMEA is its combinatorial horizon. In complex systems with many interacting components, the number of potential failure mode combinations grows exponentially with the number of failure modes, and exhaustive analysis becomes impractical without automated assistance. Modern variants, including Functional FMEA, Design FMEA, and Process FMEA, attempt to manage this complexity by focusing on specific system layers, but the fundamental tradeoff remains: comprehensiveness against tractability.
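The growth described above can be made concrete with a short counting sketch. It assumes a simplified model in which each failure mode is either present or absent and ordering and timing are ignored, so a system with n failure modes has 2^n - 1 non-empty combinations. The function names are hypothetical.

```python
from math import comb


def combinations_up_to(n_modes: int, max_simultaneous: int) -> int:
    """Count sets of 1..max_simultaneous failure modes occurring together."""
    return sum(comb(n_modes, k) for k in range(1, max_simultaneous + 1))


def all_combinations(n_modes: int) -> int:
    """Count every non-empty subset of failure modes: 2**n - 1."""
    return 2 ** n_modes - 1


# Compare single failures, single-plus-pairs, and exhaustive analysis.
for n in (10, 20, 40):
    print(
        f"n={n:2d}  single={combinations_up_to(n, 1):3d}  "
        f"up_to_pairs={combinations_up_to(n, 2):4d}  "
        f"exhaustive={all_combinations(n)}"
    )
```

Capping the number of simultaneous failures keeps the count polynomial in n, which is one way to read the comprehensiveness-versus-tractability tradeoff: the analysis stays tractable only by declining to examine most combinations.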
See also: Safety-Critical Systems, Therac-25, Fault Tolerance, Risk Analysis, Formal Verification