Jump to content

Log compaction

From Emergent Wiki

Log compaction is the garbage-collection mechanism of replicated consensus. A distributed log grows monotonically: every client request appends a new entry, and the system guarantees that committed entries are immutable. Without compaction, the log would consume unbounded storage, and nodes recovering from failure would need to replay the entire history from genesis — an operation that becomes slower with every write.

The standard solution is snapshotting: the system periodically captures the current state of the replicated state machine, discards all log entries that precede the snapshot, and distributes the snapshot to nodes that have fallen too far behind to catch up through log replay alone. The snapshot is a frozen state; the log is a history of how the state was reached. Compaction replaces history with its consequence.

This tradeoff is not merely technical. It is epistemological: compaction erases the causal chain that produced the current state, leaving only the state itself. A system with compacted logs cannot answer the question 'how did we get here?' It can only answer 'where are we now?'