Kernel methods

Kernel methods are a family of machine learning algorithms that behave as if the data had been mapped into a higher-dimensional feature space, without ever computing that mapping explicitly. This is the kernel trick: instead of mapping inputs into the new space and computing dot products there, the algorithm evaluates a kernel function, a measure of similarity between pairs of inputs, whose value equals the dot product of the two inputs in the transformed space. This allows linear methods to capture nonlinear relationships. It also shifts the computational burden from the dimensionality of the feature space to the number of training examples, because the algorithm works with pairwise kernel values between training examples rather than with feature vectors. That trade has consequences for both what the method can learn and how well it scales.

Kernel methods were a dominant paradigm in machine learning before the rise of deep learning, and they remain among the most theoretically well-understood approaches in the field. The mathematics is grounded in the theory of reproducing kernel Hilbert spaces, which guarantees that any symmetric positive-definite kernel function corresponds to an inner product in some feature space. This is not merely a computational convenience but a deep structural fact: kernel methods are function-space methods, and their learning guarantees derive from the geometry of Hilbert spaces, which may be infinite-dimensional, rather than from the representational power of layered architectures.
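In practice, positive-definiteness shows up as a property of the Gram matrix K, whose entry K[i][j] is the kernel value for the i-th and j-th points: K is symmetric, and the quadratic form c^T K c is non-negative for every coefficient vector c. The sketch below spot-checks this for a Gaussian (RBF) kernel on a few random points. The kernel, the gamma value, the point set, and the helper names are assumptions made for illustration, and a random spot-check is evidence rather than a proof.

```python
import math
import random


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative bandwidth choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, kernel):
    """Matrix of pairwise kernel values K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, c):
    """Compute c^T K c for a square matrix K and coefficient vector c."""
    n = len(c)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 6
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
K = gram_matrix(points, rbf_kernel)

is_symmetric = all(
    math.isclose(K[i][j], K[j][i]) for i in range(n) for j in range(n)
)

# Evaluate the quadratic form for many random coefficient vectors.
forms = [
    quadratic_form(K, [rng.gauss(0, 1) for _ in range(n)]) for _ in range(200)
]
min_form = min(forms)

print(is_symmetric, min_form >= -1e-9)
```

The small negative tolerance only allows for floating-point rounding; for a valid kernel the exact quadratic form is never negative.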

The limitation of kernel methods is equally structural. Because the kernel function is fixed before training, the method cannot adapt its representation to the data. A deep neural network learns features; a kernel method only learns weights over the features its kernel defines in advance. This makes kernel methods powerful when the right kernel is known and brittle when it is not. The field's shift toward deep learning was not merely a shift toward more data and compute; it was a shift toward representational adaptability, which kernel methods in this fixed-kernel form lack. The open question is whether the next generation of learning theory will find a way to combine the mathematical rigor of kernel methods with the representational flexibility of deep networks, or whether these are permanently incompatible design philosophies.
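The dependence on the choice of kernel can be seen in a toy case. The sketch below uses a kernel perceptron, chosen here only as a simple stand-in for kernel methods in general, on an XOR-style dataset. Training adjusts nothing but the per-example coefficients alpha; the kernel passed in is never modified. The dataset, the two kernels, the epoch limit, and all function names are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, z):
    return dot(x, z)


def quadratic_kernel(x, z):
    return dot(x, z) ** 2


def decision_score(X, y, alpha, kernel, x):
    """Weighted sum of kernel values between x and every training example."""
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))


def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Learn dual coefficients alpha; the kernel is fixed throughout."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision_score(X, y, alpha, kernel, xi) <= 0:
                alpha[i] += 1  # the only quantity that training changes
                mistakes += 1
        if mistakes == 0:
            break  # every training point is classified correctly
    return alpha


def accuracy(X, y, alpha, kernel):
    """Fraction of training points whose predicted sign matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        predicted = 1 if decision_score(X, y, alpha, kernel, xi) > 0 else -1
        hits += predicted == yi
    return hits / len(X)


# XOR-style labels: +1 when the two coordinates have opposite signs.
X = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]

alpha_quad = train_kernel_perceptron(X, y, quadratic_kernel)
alpha_lin = train_kernel_perceptron(X, y, linear_kernel)

acc_quad = accuracy(X, y, alpha_quad, quadratic_kernel)
acc_lin = accuracy(X, y, alpha_lin, linear_kernel)

print(acc_quad, acc_lin < 1.0)
```

With the quadratic kernel the same training loop separates the data, while with the linear kernel no setting of alpha can, because the representation is decided by the kernel and not by training. Each prediction also sums over all training examples, which is where the dependence of cost on dataset size comes from.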