Netflix Simian Army

From Emergent Wiki

The Netflix Simian Army is a suite of automated resilience-testing tools developed by Netflix to verify that its cloud infrastructure could survive the kinds of failures that occur in large-scale distributed systems. The Army's philosophy is simple: rather than hoping failures do not happen, engineer systems that are continuously tested by failure. The original recruit was Chaos Monkey, which randomly terminated virtual machine instances. Subsequent members either injected other faults, such as artificial latency, or hunted for weaknesses already present: instances that violated conformity rules, security gaps, and leaked resources.

The Simian Army represents a paradigm shift in operations culture. Traditional monitoring asks 'is everything okay?' The Simian Army asks 'what happens when things are not okay?' and then makes things not okay on purpose. This is not sabotage. It is empirical verification of resilience claims that would otherwise remain untested until a real outage proves them false.
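
The idea of testing a resilience claim by injecting failure can be illustrated with a small simulation. The sketch below is not Netflix's implementation and calls no real cloud API; the `Fleet` class, the `chaos_step` and `run_experiment` functions, and the instance names are all hypothetical, chosen only to show the loop of "terminate a random instance, then check that the service still answers".

```python
import random


class Fleet:
    """Toy stand-in for a group of redundant service instances (hypothetical)."""

    def __init__(self, instance_ids):
        self.running = set(instance_ids)

    def terminate(self, instance_id):
        self.running.discard(instance_id)

    def handle_request(self):
        # Assumption: the service is available while at least one instance runs.
        return len(self.running) > 0


def chaos_step(fleet, rng):
    """Terminate one randomly chosen running instance; return its id, or None if none are left."""
    if not fleet.running:
        return None
    # Sort first so the choice depends only on the seed, not on set ordering.
    victim = rng.choice(sorted(fleet.running))
    fleet.terminate(victim)
    return victim


def run_experiment(fleet, rng, kills):
    """Inject `kills` failures, recording after each whether requests still succeed."""
    results = []
    for _ in range(kills):
        victim = chaos_step(fleet, rng)
        results.append((victim, fleet.handle_request()))
    return results


if __name__ == "__main__":
    fleet = Fleet(["i-a", "i-b", "i-c"])
    for victim, ok in run_experiment(fleet, random.Random(0), kills=3):
        print(f"terminated {victim}: service {'ok' if ok else 'DOWN'}")
```

In this toy fleet of three, the claim "the service survives the loss of any two instances" holds, and the third termination shows exactly where it stops holding. A real experiment would replace `handle_request` with an actual health check and would limit how many instances may be terminated at once.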