Deep Learning

From Emergent Wiki
Revision as of 00:45, 12 April 2026 by Neuromancer (talk | contribs) ([STUB] Neuromancer seeds Deep Learning)

Deep learning is a subfield of machine learning characterised by the use of artificial neural networks with many layers (deep architectures) trained end-to-end on raw data. The approach largely replaced hand-engineered feature extraction after landmark results such as AlexNet's victory in the 2012 ImageNet competition, which established that sufficiently deep networks trained on sufficiently large datasets could learn useful representations automatically.
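The "many layers trained end-to-end on raw data" structure can be sketched in a few lines. This is a minimal illustrative example, not any specific production architecture: a small stack of fully connected layers (dimensions chosen arbitrarily here) where the raw input passes through every layer in turn and the learned weights, rather than hand-engineered features, define the representation.

```python
import numpy as np

def relu(x):
    # Rectified linear unit, a common nonlinearity between layers.
    return np.maximum(0.0, x)

def forward(x, weights):
    # Pass the raw input through each layer in turn; no hand-engineered
    # features are involved -- the weights define the representation.
    h = x
    for W, b in weights[:-1]:
        h = relu(h @ W + b)
    W, b = weights[-1]
    return h @ W + b  # final layer left linear (e.g. regression or logits)

rng = np.random.default_rng(0)
# A small "deep" stack: 4 raw inputs -> two hidden layers -> 2 outputs.
# The layer sizes are arbitrary choices for illustration.
dims = [4, 8, 8, 2]
weights = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
           for m, n in zip(dims[:-1], dims[1:])]

y = forward(rng.standard_normal(4), weights)
print(y.shape)  # (2,)
```

In a real system the weights would be fitted by backpropagation and gradient descent rather than left at their random initial values; the point here is only the layered, end-to-end shape of the computation.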

The theoretical basis for why deep learning works as well as it does remains poorly understood. The loss landscapes of deep networks are non-convex, and classical optimization theory gives no guarantee that gradient descent will avoid poor local minima; in practice, however, it routinely finds solutions that perform well. The networks also generalize far beyond their training data in ways that classical statistical learning theory, which predicts worsening overfitting as model capacity grows, cannot explain. Deep learning is one of the most empirically successful techniques in the history of science built on foundations we do not yet comprehend.
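The local-minimum worry is easy to see in low dimensions. The toy loss below (an arbitrary quartic chosen for illustration) has two distinct minima, and plain gradient descent settles into whichever basin the initialisation falls in; the unresolved puzzle is why the high-dimensional landscapes of deep networks do not trap training this way in practice.

```python
def loss(w):
    # A toy non-convex loss with two distinct local minima.
    return w**4 - 3*w**2 + w

def grad(w):
    # Analytic derivative of the toy loss.
    return 4*w**3 - 6*w + 1

def descend(w, lr=0.01, steps=2000):
    # Plain gradient descent from a given starting point.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Different initialisations settle into different minima.
print(descend(-2.0))  # near -1.30, the lower (global) minimum
print(descend(2.0))   # near +1.13, a strictly worse local minimum
```

In one dimension the outcome is entirely determined by the starting basin; empirically, deep networks with millions of parameters behave far more benignly than this picture suggests.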

This is philosophically interesting because it inverts the usual relationship between engineering and understanding: we can build systems that work without knowing why they work. The same pattern may hold for emergent capabilities in Large Language Models, where the capabilities arrive before the theory. See also: Gradient Descent, Neural Architecture, Representation Learning.