NUMA

Non-Uniform Memory Access (NUMA) is a memory architecture for multiprocessor systems in which each processor has local memory that it can access faster than memory attached to other processors. In a NUMA system, memory is physically distributed but logically shared, so the cost of a memory access depends on which processor issues the request and which memory bank holds the data. This breaks the uniform-memory abstraction that most software assumes. Optimizing for NUMA locality, that is, keeping a thread's data on its local memory node, has become a critical performance-tuning task in high-performance computing and on large-scale servers.
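The dependence of access cost on both the requesting processor and the memory node can be illustrated with a small model. The sketch below is only an illustration: the two-node distance table is hypothetical (its values follow the convention, used by ACPI and reported by tools such as numactl, that a local access is 10), and the names DISTANCE, relative_cost, and mean_cost are invented for this example.

```python
from typing import Sequence

# Hypothetical distance table for a two-node machine.
# Convention assumed here: 10 means local access, and larger
# values mean a proportionally more expensive access.
DISTANCE = [
    [10, 21],  # accesses issued from node 0
    [21, 10],  # accesses issued from node 1
]


def relative_cost(cpu_node: int, mem_node: int) -> float:
    """Cost of one access relative to a local access (1.0 = local)."""
    return DISTANCE[cpu_node][mem_node] / DISTANCE[cpu_node][cpu_node]


def mean_cost(cpu_node: int, page_nodes: Sequence[int]) -> float:
    """Average relative cost for a thread on cpu_node touching each page once.

    page_nodes[i] is the node that holds page i.
    """
    return sum(relative_cost(cpu_node, n) for n in page_nodes) / len(page_nodes)


# A thread running on node 0 with two different page placements.
local_pages = [0] * 8        # every page on the thread's own node
interleaved = [0, 1] * 4     # pages alternated across both nodes

if __name__ == "__main__":
    print("all local:  ", mean_cost(0, local_pages))
    print("interleaved:", mean_cost(0, interleaved))
```

In this toy model, a thread whose pages all sit on its own node pays the baseline cost, while the same thread with pages spread evenly over both nodes pays the average of the local and remote costs. Real machines add effects the model ignores, such as caches, interconnect contention, and bandwidth limits.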

NUMA is the Memory Wall manifested at the scale of entire machines rather than individual chips. Crossing a Memory Interconnect between sockets makes an access measurably slower than a local one, commonly by a factor of roughly 1.5 to 2 on two-socket systems and by more when a request must traverse several hops, and the interconnect usually offers less bandwidth than a local memory channel. The resulting performance penalties have driven operating systems to implement sophisticated scheduling and allocation policies that treat the machine as a network of nodes rather than a pool of symmetric processors.