Leader election

Leader election is the process by which a distributed system selects one node to act as coordinator for a given term or epoch, ensuring that at most one leader is elected per term and that the nodes come to agree on who that leader is. It is one of the three subproblems into which the Raft algorithm decomposes consensus, alongside log replication and safety. The problem is deceptively simple: a group of nodes must agree on a single distinguished member, but they must do so despite network partitions, node crashes, and message delays, and under those conditions a node cannot tell a slow leader from a dead one.

The standard approach is timeout-based: each node waits for a heartbeat from the current leader, and if none arrives within its election timeout, the node increments its term number, transitions to candidate state, and requests votes from the other nodes. A candidate wins if it receives votes from a majority of the cluster. The safety guarantee is that two candidates in the same term cannot both achieve a majority: each node casts at most one vote per term, and any two majorities share at least one node, so that node would have had to vote twice. The liveness guarantee depends on timing: if timeouts are too short, nodes trigger unnecessary elections during temporary network delays; if they are too long, the system remains leaderless for extended periods after a failure. Raft additionally randomizes each node's timeout so that candidates rarely start elections at the same moment and split the vote.
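The election step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete Raft implementation: the names `Node`, `handle_vote_request`, `start_election`, and `election_timeout` are hypothetical, direct method calls stand in for network RPCs, and the check Raft performs on how up to date a candidate's log is has been omitted. The one-vote-per-term rule and the randomized timeout range follow Raft's practice; the specific 150 to 300 ms range is only an example value.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


def election_timeout(base_ms: float = 150.0, spread_ms: float = 150.0) -> float:
    """Pick a randomized timeout so nodes rarely time out at the same moment."""
    return base_ms + random.random() * spread_ms


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None  # candidate this node voted for in current_term
    state: str = "follower"

    def handle_vote_request(self, candidate_id: str, term: int) -> bool:
        """Grant at most one vote per term."""
        if term < self.current_term:
            return False  # request comes from a stale term
        if term > self.current_term:
            # A newer term supersedes ours: adopt it and clear the old vote.
            self.current_term = term
            self.voted_for = None
            self.state = "follower"
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False  # already voted for someone else in this term

    def start_election(self, peers: List["Node"]) -> bool:
        """Run when no heartbeat arrived before the election timeout fired."""
        self.current_term += 1
        self.state = "candidate"
        self.voted_for = self.node_id  # a candidate votes for itself
        votes = 1
        for peer in peers:  # stands in for sending vote-request RPCs
            if peer.handle_vote_request(self.node_id, self.current_term):
                votes += 1
        cluster_size = len(peers) + 1
        if votes > cluster_size // 2:  # strict majority of the whole cluster
            self.state = "leader"
            return True
        return False
```

In a three-node cluster, calling `a.start_election([b, c])` on fresh nodes makes `a` the leader of term 1. A second candidate asking `b` or `c` for a vote in term 1 is refused, which is the one-vote-per-term rule that keeps two majorities from forming in the same term.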

Leader election is not unique to Raft. Similar mechanisms appear in Paxos, ZooKeeper, and countless custom consensus protocols. The pattern, a distinguished coordinator with failover, is so common that it is easy to forget how hard it is to get right. A system with two simultaneous leaders, a split-brain scenario, can accept conflicting writes, corrupt data, and violate the invariants that consensus is supposed to protect. The FLP impossibility result lurks behind every leader election implementation: in a fully asynchronous network, where a crashed leader cannot be reliably distinguished from a slow one, no deterministic protocol can guarantee both safety and liveness. Practical protocols therefore keep safety unconditional and let liveness depend on timing assumptions such as election timeouts.
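One common guard against the split-brain scenario described above, and the one Raft uses, is to stamp every replication message with the sender's term and have receivers reject anything from an older term. The sketch below illustrates only that check. The `Follower` class and `handle_append` method are hypothetical names, and the log-consistency checks a real protocol performs on each append are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Follower:
    current_term: int = 0
    log: List[Tuple[int, str]] = field(default_factory=list)

    def handle_append(self, leader_term: int, entry: str) -> bool:
        """Accept a write only if the sender's term is not stale."""
        if leader_term < self.current_term:
            # The sender was deposed by a later election; refuse the write.
            return False
        self.current_term = leader_term
        self.log.append((leader_term, entry))
        return True
```

A leader from term 1 that was partitioned away and later reconnects finds its writes refused by any follower that has already seen term 2, so the two leaders cannot both commit conflicting entries through that follower.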

Leader election is the distributed systems equivalent of monarchy: it solves the coordination problem by concentrating authority in a single node, and then it solves the succession problem by making every node a potential heir. The elegance of this arrangement is that it reduces the hard problem of consensus to the slightly less hard problem of detecting failure. But the monarchical structure creates its own vulnerabilities: the leader is a bottleneck, a single point of delay, and a target for attack. The systems that scale best are those that minimize what the leader must do, distributing as much work as possible to the followers. The best leader is the one who leads least.