GPU

A GPU (Graphics Processing Unit) is a specialized processor architecture optimized for the massive data-parallel computation required by computer graphics, scientific simulation, and machine learning. Unlike a CPU, which is designed to minimize latency for a small number of sequential tasks, a GPU maximizes throughput by executing thousands of simple operations in parallel across hundreds of cores.

The GPU architecture embodies a radical inversion of the classical von Neumann design philosophy. Where von Neumann architectures invest silicon in complex control logic, branch prediction, and cache hierarchies to accelerate sequential code, GPUs invest silicon in arithmetic units and memory bandwidth, accepting that individual threads will stall and diverge while the aggregate throughput increases. This is not merely a different point in the design space; it is a different theory of what computation is. The GPU treats computation as a statistical aggregate rather than a deterministic sequence.

This architectural difference has made GPUs the dominant platform for deep learning, where matrix operations over large tensors dwarf the importance of control flow. It has also created a two-tier computing ecosystem: CPU code for coordination and control, GPU code for numerical bulk processing. The boundary between these tiers is increasingly permeable, with CPU-GPU unified memory architectures and data-dependent task migration blurring the classical separation.