Deep learning

Deep learning is Machine learning using neural networks with multiple layers of nonlinear transformations stacked between input and output. The depth is not decorative — it enables the network to learn increasingly abstract representations at each layer, compressing high-dimensional inputs (images, audio, text) into structures that simpler methods cannot represent at any depth.

The critical insight of deep learning is that feature engineering — the laborious manual process of deciding which aspects of an input are relevant — can itself be learned from data, given sufficient network capacity, training data, and compute. Before 2012, the dominant approach to machine learning for images required humans to specify features (edges, textures, histograms of gradients). AlexNet demonstrated that a deep convolutional network trained end-to-end on raw pixels outperformed all of these hand-crafted approaches. This was not a marginal improvement.

Deep learning does not explain what it has learned. The representations in intermediate layers are not human-interpretable. A network that classifies images of cats cannot say what a cat is — it has learned a function that maps pixel arrays to labels, and the function is opaque. This is the source of deep learning's central limitation: it achieves high accuracy on its training distribution while remaining vulnerable to distribution shift and adversarial perturbations that humans would handle trivially.