Convolutional Neural Network

A convolutional neural network (CNN) is a class of artificial neural network designed to process data with a grid-like topology, such as images. The key insight is that local connectivity and weight sharing greatly reduce the number of parameters compared with fully connected networks, while preserving the ability to detect spatial hierarchies of features, from edges and textures up to objects and scenes.
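To make the parameter savings concrete, the following sketch counts the weights in a single fully connected layer and in a single convolutional layer. The input size (224 x 224 x 3), the 64 output units or channels, and the 3 x 3 kernel are illustrative assumptions, not values taken from this article, and the two helper functions are hypothetical names. The two layers also produce outputs of different shapes, so this is a rough comparison rather than a like-for-like one.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_param_count(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases for a convolutional layer.

    The count does not depend on the image height or width, because the
    same filters are reused at every spatial position (weight sharing).
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Assumed sizes, for illustration only.
height, width, channels = 224, 224, 3
dense_params = dense_param_count(height * width * channels, 64)
conv_params = conv_param_count(3, 3, channels, 64)

print("fully connected:", dense_params)
print("convolutional:  ", conv_params)
```

Under these assumptions the fully connected layer needs several million parameters, while the convolutional layer needs fewer than two thousand.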

A CNN typically consists of convolutional layers that apply learnable filters across the input, pooling layers that reduce spatial dimensions, and fully connected layers at the end for classification or regression. The architecture was inspired by the visual cortex, where neurons respond to stimuli in restricted regions of the visual field (their receptive fields). However, modern CNNs are only loosely modeled on biology; they are engineered systems optimized for performance on benchmarks like ImageNet.
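The layer sequence described above can be sketched as a forward pass in plain NumPy. This is a minimal illustration, not a reference implementation: the tiny 8 x 8 single-channel input, the four 3 x 3 filters, the ReLU nonlinearity, the 2 x 2 max pooling, and the ten output units are all assumptions chosen for the example, and the function names are hypothetical. As in most deep learning libraries, the "convolution" here is computed as a cross-correlation (the filter is not flipped).

```python
import numpy as np


def conv2d(image, filters, bias):
    """Valid cross-correlation.

    image:   (H, W, C_in)
    filters: (K, K, C_in, C_out)
    bias:    (C_out,)
    """
    k = filters.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w, filters.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k, j:j + k, :]  # local window, (K, K, C_in)
            # The same filters are applied at every position (weight sharing).
            out[i, j, :] = np.tensordot(
                patch, filters, axes=([0, 1, 2], [0, 1, 2])
            ) + bias
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling over the two spatial axes."""
    h, w, c = x.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fit
    x = x[:h, :w, :]
    return x.reshape(h // size, size, w // size, size, c).max(axis=(1, 3))


def forward(image, params):
    x = conv2d(image, params["filters"], params["conv_bias"])
    x = np.maximum(x, 0.0)  # ReLU (an assumed choice of nonlinearity)
    x = max_pool(x)         # reduce spatial dimensions
    x = x.reshape(-1)       # flatten for the fully connected layer
    return x @ params["dense_w"] + params["dense_b"]


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 1))
params = {
    "filters": rng.standard_normal((3, 3, 1, 4)) * 0.1,
    "conv_bias": np.zeros(4),
    # 8x8 input -> 6x6x4 after the convolution -> 3x3x4 after pooling.
    "dense_w": rng.standard_normal((3 * 3 * 4, 10)) * 0.1,
    "dense_b": np.zeros(10),
}
logits = forward(image, params)
print(logits.shape)
```

The weights are random, so the outputs are meaningless; the point is only to show how the shapes change from layer to layer.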

The development of CNNs, particularly the AlexNet breakthrough in 2012, catalyzed the deep learning revolution. CNNs have since become the default architecture for computer vision, though they are being challenged by vision transformers, which apply self-attention to image patches. Whether the inductive biases of convolution (locality and translation equivariance) remain essential or are merely a historical convenience is an open question in the field.
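Translation equivariance can be checked numerically: shifting the input and then filtering gives the same result as filtering and then shifting. The sketch below assumes circular (wrap-around) boundaries so that the property holds exactly; with the zero padding or valid convolution used in practice it holds only away from the image borders. The function name and the array sizes are hypothetical.

```python
import numpy as np


def circular_correlate(image, kernel):
    """2D cross-correlation with wrap-around boundaries.

    out[i, j] = sum over (di, dj) of
                kernel[di, dj] * image[(i + di) % H, (j + dj) % W]
    """
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for di in range(kh):
        for dj in range(kw):
            shifted = np.roll(image, shift=(-di, -dj), axis=(0, 1))
            out += kernel[di, dj] * shifted
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3))
shift = (2, 1)  # translate by 2 rows and 1 column

shift_then_filter = circular_correlate(
    np.roll(image, shift, axis=(0, 1)), kernel
)
filter_then_shift = np.roll(
    circular_correlate(image, kernel), shift, axis=(0, 1)
)

print(np.allclose(shift_then_filter, filter_then_shift))
```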