Data Locality Principle
The data locality principle is the systems design rule that computation should be performed as close to the data as possible, minimizing the movement of data across network boundaries. In distributed computing, the cost of transferring data (in latency, bandwidth, and energy) typically exceeds the cost of performing computation on it. The principle is therefore an economic optimization: when the data is large and the program that processes it is small, moving computation to data is cheaper than moving data to computation.
The principle is most famously instantiated in the Map-Reduce programming model, where map tasks are scheduled on the nodes that hold the relevant data partitions. But it predates Map-Reduce by decades. It is the reason that database query optimizers push predicates down to the storage layer, that content delivery networks cache content at the edge near users, and that machine learning training pipelines often favor data parallelism over model parallelism when the model is small enough to replicate: each worker keeps a copy of the model next to its own shard of the data, and workers exchange gradients rather than training examples. The principle is not specific to any technology; it follows from the fact that in physical systems, moving information costs time and energy that grow with both distance and volume, whereas computing on data where it already resides avoids that cost.
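The Map-Reduce scheduling behavior described above can be sketched as a greedy assignment. Each map task prefers a free slot on a node that already stores a replica of its input block. It falls back to any other node with a free slot only when no such local slot exists, and in that case its input must cross the network. The function name, node and task names, and slot counts are hypothetical. This simple greedy pass is also not guaranteed to minimize remote reads; real schedulers use richer policies, such as distinguishing node-local from rack-local placement.

```python
from typing import Dict, Set, Tuple


def schedule_map_tasks(
    task_blocks: Dict[str, Set[str]],
    free_slots: Dict[str, int],
) -> Tuple[Dict[str, str], int]:
    """Greedily assign each map task to a node, preferring nodes that hold its input block.

    task_blocks maps a task id to the set of nodes storing a replica of its input.
    free_slots maps a node to its number of available task slots.
    Returns (task -> node assignment, number of tasks that must read their input remotely).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    remote = 0
    for task, replica_nodes in sorted(task_blocks.items()):
        # First choice: a node that already stores the task's data (node-local).
        local = [n for n in sorted(replica_nodes) if slots.get(n, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fallback: any node with a free slot; the input block must be fetched remotely.
            candidates = [n for n in sorted(slots) if slots[n] > 0]
            if not candidates:
                raise RuntimeError(f"no free slot for task {task}")
            node = candidates[0]
            remote += 1
        slots[node] -= 1
        assignment[task] = node
    return assignment, remote


if __name__ == "__main__":
    tasks = {
        "map-0": {"node-a", "node-b"},
        "map-1": {"node-a"},
        "map-2": {"node-a"},
        "map-3": {"node-c"},
    }
    slots = {"node-a": 2, "node-b": 1, "node-c": 0, "node-d": 1}
    plan, remote = schedule_map_tasks(tasks, slots)
    print(plan, "remote reads:", remote)
```

With this sample input, map-0 and map-1 run locally on node-a. map-2 and map-3 must read remotely, on node-b and node-d respectively. An optimal scheduler would place map-0 on node-b instead and leave node-a's second slot for map-2, which shows why production schedulers do more than a single greedy pass.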
The data locality principle also has a thermodynamic reading. Moving data requires energy: to drive signals across wires, to modulate lasers, to spin disks. Computation also requires energy, but the energy cost per operation fell exponentially over decades of Moore's Law scaling, while the energy cost per bit moved, especially off-chip and across networks, has fallen more slowly. Because of this gap, the energy-optimal architecture for most data-intensive workloads is one that maximizes data locality, even at the cost of redundant computation. Edge computing, in-memory databases, and on-chip caches are all expressions of the same principle: bring computation to the data, because the alternative spends energy on transport rather than on useful work.
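The trade-off between redundant computation and data movement reduces to an energy comparison: recompute a value locally when the operations needed cost less energy than moving its bits. The sketch below uses hypothetical round numbers (1 pJ per operation, 10 pJ per bit moved) chosen only to show the arithmetic. Real costs vary widely with hardware, process node, and distance, and the function names are illustrative.

```python
def recompute_energy_pj(num_ops: int, pj_per_op: float) -> float:
    """Energy (picojoules) to regenerate a value locally by redoing num_ops operations."""
    return num_ops * pj_per_op


def fetch_energy_pj(num_bits: int, pj_per_bit: float) -> float:
    """Energy (picojoules) to move an already-computed value of num_bits over an interconnect."""
    return num_bits * pj_per_bit


def prefer_recompute(num_ops: int, num_bits: int, pj_per_op: float, pj_per_bit: float) -> bool:
    """True if redundant local computation costs less energy than fetching the stored result."""
    return recompute_energy_pj(num_ops, pj_per_op) < fetch_energy_pj(num_bits, pj_per_bit)


def break_even_ops(num_bits: int, pj_per_op: float, pj_per_bit: float) -> float:
    """Number of operations at which recomputing and fetching cost the same energy."""
    return fetch_energy_pj(num_bits, pj_per_bit) / pj_per_op


if __name__ == "__main__":
    # Assumed round numbers for illustration only, not measurements of real hardware.
    PJ_PER_OP = 1.0    # energy per arithmetic operation
    PJ_PER_BIT = 10.0  # energy per bit moved across the interconnect
    bits = 64          # one 64-bit value

    print("break-even ops:", break_even_ops(bits, PJ_PER_OP, PJ_PER_BIT))
    for ops in (500, 1000):
        choice = "recompute" if prefer_recompute(ops, bits, PJ_PER_OP, PJ_PER_BIT) else "fetch"
        print(f"{ops} ops to regenerate a {bits}-bit value -> {choice}")
```

Under these assumptions, the break-even point for a 64-bit value is 640 operations. A value that takes 500 operations to regenerate is cheaper to recompute, while one that takes 1000 is cheaper to fetch. The break-even count scales linearly with the per-bit cost, so a widening gap between transport and arithmetic energy pushes more work toward local recomputation.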
Data locality is not merely a design pattern. It behaves like a physical law of distributed systems, rooted in the same constraints as the finite speed of light and the energy cost of moving bits. It is treated as a principle rather than a law only because engineers can violate it for a while, by moving small amounts of data quickly, before the consequences accumulate. As data volumes grow, however, transfer time and energy come to dominate, so a design that ignores locality is eventually bounded by its network or memory bandwidth rather than by its computation.
See also: Map-Reduce, Communication-Bound Computation, Distributed Systems, Edge Computing