Geo-Replication

Geo-replication is the practice of replicating data across geographically distributed datacenters or cloud regions to provide availability, disaster recovery, and low-latency access for users in different locations. Unlike single-datacenter replication, which addresses hardware failure and rack-level outages, geo-replication must contend with network partitions, variable latency, and the fundamental speed-of-light limits that make synchronous replication across continents expensive for every write.
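The speed-of-light limit can be made concrete with a little arithmetic. The sketch below estimates a lower bound on round-trip time over optical fiber. The refractive index of 1.5 is a rough assumption, the function name is illustrative, and real routes are longer than the straight-line distance, so observed latencies will be higher.

```python
# Physical constant: speed of light in vacuum, in km/s.
C_VACUUM_KM_PER_S = 299_792.458
# Assumption: light in optical fiber travels at roughly c / 1.5.
FIBER_REFRACTIVE_INDEX = 1.5


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, in milliseconds.

    Ignores routing detours, switching, and queueing delays.
    """
    speed_km_per_s = C_VACUUM_KM_PER_S / FIBER_REFRACTIVE_INDEX
    one_way_s = distance_km / speed_km_per_s
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Hypothetical path lengths, chosen only for illustration.
    for km in (100, 1_000, 6_000, 12_000):
        print(f"{km:>6} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

Under these assumptions a 6,000 km path costs about 60 ms per round trip before any processing happens, and a synchronous write pays that cost at least once.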

The core design tension in geo-replication is between consistency and latency. Synchronous geo-replication acknowledges a write to the client only after a quorum of regions, typically a majority, has acknowledged it. This provides strong consistency at the cost of cross-region round-trip times that can exceed 100 milliseconds per write. Asynchronous geo-replication acknowledges writes locally and replicates them to remote regions in the background, achieving low latency at the cost of potential data loss: writes that were acknowledged but not yet replicated disappear if the local region fails. The choice between these models is not purely technical but also organizational: it reflects whether the business can tolerate a window of inconsistency and possible data loss in exchange for lower latency and continued availability.
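The latency side of this trade-off can be sketched in a few lines. The example below assumes a simple model in which the coordinating region sends a write to every region in parallel and, in synchronous mode, waits for a majority of acknowledgements. The region names and round-trip times are invented for illustration, and the function names are hypothetical.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def sync_commit_latency_ms(rtt_ms: dict[str, float]) -> float:
    """Latency of a majority-acknowledged write.

    rtt_ms maps each region (including the local one) to its round-trip
    time from the coordinator. Requests go out in parallel, so the write
    commits when the majority-th fastest acknowledgement arrives.
    """
    ordered = sorted(rtt_ms.values())
    return ordered[majority(len(ordered)) - 1]


def async_commit_latency_ms(rtt_ms: dict[str, float], local: str) -> float:
    """Latency of a locally acknowledged write; remote copies lag behind."""
    return rtt_ms[local]


if __name__ == "__main__":
    # Made-up round-trip times from a coordinator in "us-east".
    rtts = {"us-east": 1.0, "eu-west": 75.0, "ap-south": 190.0}
    print("sync :", sync_commit_latency_ms(rtts), "ms")
    print("async:", async_commit_latency_ms(rtts, "us-east"), "ms")
```

In this model the synchronous write waits for the second-fastest region out of three, so its latency is set by the nearest remote region rather than the farthest one. The asynchronous write returns after the local acknowledgement, and anything still in flight when the local region fails is the data at risk.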

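Modern systems such as Apache Pulsar, CockroachDB, and Spanner support hybrid models in which the guarantee can vary by workload, although the granularity differs. Spanner and CockroachDB commit writes through synchronous consensus replication but let individual reads accept bounded staleness in exchange for being served by a nearby replica; Pulsar replicates across clusters asynchronously by default, with geo-replication configured per namespace or topic. In such a scheme, a financial transaction may use synchronous replication while a clickstream event uses asynchronous. This consistency tiering acknowledges that not all data needs the same guarantees, and that applying one uniform consistency model to every workload can be a false economy.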
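Consistency tiering can be illustrated with a small in-memory sketch. It is not the API of Pulsar, CockroachDB, or Spanner: the `Region`, `TieredWriter`, and `POLICY` names are hypothetical, and appending to a list stands in for a network call. For brevity the synchronous path copies the record to every remote region rather than waiting for a majority.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    ASYNC = "async"


# Hypothetical policy table: which kind of record gets which guarantee.
POLICY = {"payment": Mode.SYNC, "clickstream": Mode.ASYNC}


@dataclass
class Region:
    name: str
    log: list = field(default_factory=list)


class TieredWriter:
    def __init__(self, local: Region, remotes: list[Region]):
        self.local = local
        self.remotes = remotes
        self.backlog: deque = deque()  # records awaiting background shipping

    def write(self, kind: str, record: dict) -> Mode:
        # Unknown kinds fall back to the safer, synchronous tier.
        mode = POLICY.get(kind, Mode.SYNC)
        self.local.log.append(record)
        if mode is Mode.SYNC:
            # Stand-in for blocking until the remote regions acknowledge.
            for remote in self.remotes:
                remote.log.append(record)
        else:
            # Acknowledge now; ship to remote regions later.
            self.backlog.append(record)
        return mode

    def drain(self) -> int:
        """Ship queued records to all remotes; returns how many were shipped."""
        shipped = 0
        while self.backlog:
            record = self.backlog.popleft()
            for remote in self.remotes:
                remote.log.append(record)
            shipped += 1
        return shipped


if __name__ == "__main__":
    writer = TieredWriter(Region("us-east"), [Region("eu-west")])
    writer.write("payment", {"id": 1})
    writer.write("clickstream", {"id": 2})
    print("backlog before drain:", len(writer.backlog))
    writer.drain()
    print("remote log:", writer.remotes[0].log)
```

Whatever sits in `backlog` when the local region fails is the data an asynchronous tier can lose. The sketch also lets a later synchronous record reach a remote region before an earlier asynchronous one, so a real design would need to decide whether ordering across tiers matters.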