Cache

A cache is a hardware or software component that stores data so that future requests for that data can be served faster. The stored data might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data is found in the cache; a cache miss occurs when it is not. The hit ratio — the fraction of requests served from cache — is the fundamental metric of cache effectiveness, and optimizing it is one of the central engineering problems in computer systems design.

The principle behind caching is not technical but temporal: exploit the predictability of access patterns to reduce the latency of retrieval. Programs do not access memory uniformly. They exhibit locality of reference: temporal locality (recently accessed data is likely to be accessed again) and spatial locality (data near recently accessed data is likely to be accessed soon). A cache is a bet that the future resembles the past, and it is a bet that wins often enough to justify the hardware.

The Memory Hierarchy

Modern computers do not have a single memory. They have a memory hierarchy — a stack of storage layers from registers (fastest, smallest, most expensive per byte) through L1, L2, and L3 caches, main memory (DRAM), and finally disk storage (slowest, largest, cheapest per byte). Each layer acts as a cache for the layer below it. The L1 cache caches lines from the L2 cache; the L2 cache caches lines from the L3 cache or main memory; main memory caches pages from disk. This hierarchy is not an implementation detail. It is the defining architectural feature of modern computing, and every program that ignores it pays a performance penalty that no amount of algorithmic cleverness can recover.

The gap between layers is staggering. A CPU register access takes roughly 1 cycle. An L1 cache hit takes 3-4 cycles. An L2 cache hit takes 10-20 cycles. Main memory access takes 100-300 cycles. A disk seek takes millions of cycles. A program that fits in L1 cache can be 100x faster than one that thrashes main memory. This is not a marginal concern. It is the dominant factor in performance for most applications, and the reason that cache-aware and cache-oblivious algorithms are active research areas.

Cache Design

A cache is defined by three parameters: its size (how much data it can hold), its block size (the unit of transfer between cache and memory), and its associativity (how many places a given block can reside). A fully associative cache allows any block to go anywhere; a direct-mapped cache allows each block exactly one location; a set-associative cache divides the cache into sets, with each block mapping to a specific set but free to occupy any line within it. Higher associativity reduces conflict misses (where two frequently accessed blocks map to the same location) but increases lookup latency and hardware cost.

The replacement policy determines which block to evict when the cache is full. Least Recently Used is the theoretical ideal but expensive to implement exactly. Pseudo-LRU, FIFO, and random replacement are common approximations. The choice of replacement policy can affect performance by 10-20% on real workloads, making it a significant design decision.

Write policies are equally consequential. A write-through cache updates both the cache and main memory on every write, ensuring consistency at the cost of bandwidth. A write-back cache updates only the cache line, marking it "dirty," and writes to main memory only when the line is evicted. Write-back is faster but requires mechanisms to handle coherence when multiple caches hold copies of the same data.

Cache Coherence

In multiprocessor systems, each core typically has its own private L1 and L2 caches, with a shared L3 cache or main memory. When two cores modify the same cache line independently, the system must ensure that all observers see a consistent ordering of writes. This is the cache coherence problem, and it is one of the hardest problems in parallel computing.

The most widely implemented solution is MESI (Modified, Exclusive, Shared, Invalid), a state machine that tracks whether each cache line is clean or dirty, shared or exclusive. When a core writes to a shared line, it must invalidate copies in other caches, generating coherence traffic on the interconnect. This traffic is invisible to the programmer but devastating to performance. False sharing — when two unrelated variables happen to reside on the same cache line and are modified by different cores — can reduce performance by orders of magnitude without changing a single line of source code.

The cache coherence problem is a microcosm of a deeper systems principle: local optimization creates global complexity. Each core's private cache is a local performance win. The coherence protocol required to maintain consistency across caches is a global cost that grows with the square of the core count. At sufficient scale, coherence traffic dominates, and the system ceases to scale. This is why large-scale parallel systems increasingly abandon hardware coherence in favor of software-managed memory or message-passing models.

Software Caching

Caches are not only hardware constructs. Every layer of the software stack implements caching: the operating system caches disk blocks in memory (the buffer cache); databases cache query results and index pages; web browsers cache HTTP responses; content delivery networks cache static assets at network edges; DNS resolvers cache address mappings. The principle is identical: store frequently accessed data closer to the consumer to reduce retrieval latency.

Software caches introduce their own complexities. Cache invalidation — determining when cached data has become stale — is famously one of the two hard problems in computer science (along with naming things and off-by-one errors). A DNS cache that holds an expired record can direct users to the wrong server. A database query cache that returns stale results can produce inconsistent application state. A CDN cache that serves an old version of a JavaScript file can break websites. The difficulty is not in caching but in knowing when to stop.

The cache is the most successful abstraction in computer systems because it is invisible when it works and catastrophic when it fails. Programmers who do not understand the memory hierarchy write code that performs beautifully in simulation and disastrously in production. The cache is not an optimization; it is the ground on which all optimization stands.