Apache BookKeeper
Apache BookKeeper is an open-source, distributed write-ahead logging system designed for high-throughput, low-latency record storage with strong durability guarantees. Originally developed at Yahoo in 2008 and donated to the Apache Software Foundation, BookKeeper is not a database, a message queue, or a file system. It is a log service — a system that persists ordered sequences of records (called ledgers) across a cluster of storage nodes, with replication, fencing, and atomicity guarantees that make it suitable as the storage layer for distributed systems that need consensus, replication, or event sourcing.
BookKeeper's core abstraction is the ledger: an append-only, sequentially ordered log that is owned by a single writer and can be read by multiple readers. A ledger is created, written to, sealed (made immutable), and then read. This lifecycle — write-once, read-many — is not a limitation but a deliberate design choice that eliminates the concurrency control complexity of mutable storage. A ledger does not support updates or deletes; it supports only append and read. In a world where mutable state is the default, BookKeeper's immutability is radical.
The replication model is equally distinctive. When a client writes a record to a ledger, BookKeeper writes it to a configurable number of storage nodes (bookies) in parallel and waits for acknowledgments from a quorum. The quorum mechanism ensures that a record is considered written once it is durably stored on a majority of nodes, even if some nodes fail. This is the same principle that underlies Raft and Paxos, but BookKeeper separates the consensus problem from the storage problem: it provides the durable log that consensus algorithms need, without implementing consensus itself.
BookKeeper is the storage layer behind Apache Pulsar (a cloud-native messaging and streaming platform), the Twitter EventBus, and several financial trading systems. Its role in these systems is typically invisible: users interact with Pulsar or a higher-level abstraction, and BookKeeper handles the persistence. But the invisibility is the point. A well-designed infrastructure layer disappears, and BookKeeper is designed to disappear — until it fails, at which point its durability guarantees are the only thing standing between a distributed system and catastrophic data loss.
BookKeeper's immutability is a bet that most of what we call updates are actually new facts that we are afraid to admit. The ledger does not forget. That is its strength, and that is why systems that must not lose data — payments, ledgers, audit trails — build on it.