Byzantine Fault

A Byzantine fault is a failure mode in distributed systems in which a faulty node does not simply stop working but may behave arbitrarily: sending conflicting information to different nodes, appearing correct while actively subverting consensus, or colluding with other faulty nodes to deceive the correct ones. Named after the Byzantine generals problem (Lamport, Shostak, and Pease, 1982), the term captures the worst case for distributed systems. It covers not only crash and omission failures but any arbitrary behavior, whether malicious or caused by bugs, that can make a faulty node look like a correct one to its peers until the damage is done.
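To make equivocation concrete, here is a minimal Python sketch of the one-round oral-messages algorithm OM(1) from the Lamport, Shostak, and Pease paper, a technique the text above does not itself describe. A commander sends a value to each lieutenant, every lieutenant relays what it received to the others, and each correct lieutenant decides by strict majority, falling back to a default when there is none. The function names, the string values, the `DEFAULT` of `"retreat"`, and the `relay_lie` callback used to model a lying lieutenant are illustrative assumptions; this simulates a single round with at most one fault and is not a production protocol.

```python
from collections import Counter
from typing import Callable, Dict, List, Set

DEFAULT = "retreat"  # assumed fallback when no strict majority exists


def majority(values: List[str]) -> str:
    """Return the value held by a strict majority, or DEFAULT if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else DEFAULT


def om1(
    commander_sends: Dict[str, str],
    faulty: Set[str],
    relay_lie: Callable[[str, str, str], str],
) -> Dict[str, str]:
    """Simulate OM(1) and return the decision of every correct lieutenant.

    commander_sends maps each lieutenant to the value the commander sent it;
    a faulty commander is modeled by sending different values to different
    lieutenants (equivocation). A lieutenant in `faulty` relays
    relay_lie(sender, receiver, value) instead of the value it received.
    """
    lieutenants = list(commander_sends)
    decisions = {}
    for receiver in lieutenants:
        if receiver in faulty:
            continue  # decisions of faulty lieutenants are irrelevant
        # Round 1: the value received directly from the commander.
        received = [commander_sends[receiver]]
        # Round 2: the value each other lieutenant claims to have received.
        for sender in lieutenants:
            if sender == receiver:
                continue
            value = commander_sends[sender]
            if sender in faulty:
                value = relay_lie(sender, receiver, value)
            received.append(value)
        decisions[receiver] = majority(received)
    return decisions


def honest(sender: str, receiver: str, value: str) -> str:
    """Relay function for a correct lieutenant: pass the value on unchanged."""
    return value


# n = 4, f = 1: the commander equivocates, yet the correct lieutenants agree.
print(om1({"A": "attack", "B": "attack", "C": "retreat"}, set(), honest))
# {'A': 'attack', 'B': 'attack', 'C': 'attack'}

# n = 4, f = 1: a correct commander says "attack"; lieutenant C lies when relaying.
print(om1({"A": "attack", "B": "attack", "C": "attack"}, {"C"}, lambda s, r, v: "retreat"))
# {'A': 'attack', 'B': 'attack'}

# n = 3, f = 1: the same lie now overrides the correct commander at lieutenant A.
print(om1({"A": "attack", "B": "attack"}, {"B"}, lambda s, r, v: "retreat"))
# {'A': 'retreat'}
```

In the third run, lieutenant A receives "attack" from the commander and "retreat" from B, exactly what it would see if the commander were the traitor and B were honest. Because A cannot tell the two situations apart, three generals cannot tolerate one traitor, which is the intuition behind the \( 3f + 1 \) bound.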

Byzantine fault tolerance (BFT) is the property of a system that continues to operate correctly, maintaining both safety and liveness, despite the presence of Byzantine-faulty nodes. The foundational result is that tolerating \( f \) Byzantine faults requires at least \( 3f + 1 \) nodes, and protocols that meet this bound act only on votes from a quorum of \( 2f + 1 \) nodes, more than two-thirds of the total. This threshold is not an engineering compromise; it is a proven lower bound for networks that lack either synchrony guarantees or authenticated messages, while synchronous protocols with digital signatures can tolerate more faults. The practical Byzantine fault tolerance (PBFT) algorithm, developed by Castro and Liskov in 1999, demonstrated that BFT replication could run with acceptable performance, opening the door to BFT-based blockchains and distributed ledgers.
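The counting behind the \( 3f + 1 \) bound can be checked with a short Python sketch. It assumes, as PBFT-style protocols do, that a protocol must make progress after hearing from \( n - f \) nodes (the \( f \) faulty ones may stay silent) and that any two such quorums must share at least one correct node so that they cannot certify conflicting decisions. The function names are hypothetical, and the check illustrates the quorum-intersection argument rather than proving the lower bound.

```python
def quorum_size(n: int, f: int) -> int:
    """Largest quorum a protocol can wait for if f nodes never respond."""
    return n - f


def quorums_intersect_safely(n: int, f: int) -> bool:
    """True if any two quorums share more than f nodes, hence a correct one."""
    q = quorum_size(n, f)
    # Two subsets of size q drawn from n nodes overlap in at least 2q - n nodes.
    min_overlap = 2 * q - n
    return min_overlap > f


def min_cluster_size(f: int) -> int:
    """Smallest n for which quorums of n - f nodes always share a correct node."""
    n = f + 1  # there must be at least one correct node
    while not quorums_intersect_safely(n, f):
        n += 1
    return n


for f in range(1, 5):
    n = min_cluster_size(f)
    q = quorum_size(n, f)
    print(f"f={f}: n={n}, quorum={q}, overlap={2 * q - n}")
# f=1: n=4, quorum=3, overlap=2
# f=2: n=7, quorum=5, overlap=3
# f=3: n=10, quorum=7, overlap=4
# f=4: n=13, quorum=9, overlap=5
```

For every \( f \) the minimum comes out as \( n = 3f + 1 \) with quorums of \( 2f + 1 \), whose overlap of \( f + 1 \) nodes is exactly one more than the number of nodes that might be lying. With only \( 3f \) nodes, quorums of \( 2f \) can overlap in just \( f \) nodes, all of which could be faulty.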

The Byzantine fault model is not merely a technical concern for cryptographers. It is the formalization of a general systems principle: trust cannot be assumed; it must be engineered. Any system that relies on the good behavior of its components is a system that has not been designed at all.