AlexNet

AlexNet is a deep convolutional neural network architecture that won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), achieving a top-5 error rate of 15.3% — ten percentage points below the runner-up. Designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto, AlexNet was not the first convolutional network, nor did it introduce any algorithmic breakthroughs that were unknown in the literature. What it demonstrated was that scale — depth, data, and compute — could be composed into a system that outperformed decades of hand-engineered computer vision. The victory was so decisive that it restructured the field within two years.

Architecture and Innovations

AlexNet contains eight layers: five convolutional and three fully-connected. Its design choices, now standard practice, were considered risky in 2012.

Rectified Linear Units replaced the sigmoid and tanh activations that had dominated neural network design. ReLUs do not saturate for positive inputs, allowing gradients to flow efficiently through deep networks. The effect was a dramatic reduction in training time — critical because the ImageNet dataset contained 1.2 million labeled images, and training on older hardware would have been prohibitively slow.

Dropout regularization, applied to the fully-connected layers, randomly zeroed out neurons during training, forcing the network to learn redundant representations and reducing co-adaptation between features. It was a crude but effective form of ensemble learning within a single network, and it became standard in virtually all subsequent deep learning architectures.

GPU parallelism was the infrastructural innovation. Krizhevsky split the network across two NVIDIA GTX 580 GPUs, each with 3GB of memory. The convolutional layers were distributed across both GPUs, with limited cross-GPU communication. This was not elegant distributed systems design; it was an act of hardware desperation that happened to work. It also established a pattern that persists today: deep learning progress is often limited by memory bandwidth, not compute.

The 2012 Inflection and Its Aftermath

The AlexNet result was not merely a competition win. It was a proof of concept that invalidated an entire research paradigm. Before 2012, computer vision was dominated by hand-crafted feature extraction — SIFT, HOG, SURF — followed by shallow classifiers. After 2012, the question shifted from what