Functional gradient descent

Functional gradient descent is the extension of gradient descent from finite-dimensional parameter spaces to infinite-dimensional function spaces. In classical gradient descent, one updates a vector of parameters in the direction of the negative gradient of a loss function; in functional gradient descent, one updates an entire function by adding a new function that approximates the negative functional derivative of the loss with respect to the current prediction function. This generalization is the mathematical foundation of gradient boosting and other additive modeling techniques, where each iteration constructs a step in function space rather than parameter space.
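The update described above can be illustrated with a minimal sketch, assuming a squared-error loss, one-dimensional inputs, and regression stumps as the weak learners. None of these choices come from the text, and the names `fit_stump`, `functional_gradient_descent`, `n_steps`, and `step_size` are hypothetical. For the loss 0.5 * (y - F(x))^2, the negative functional derivative evaluated at the training points is simply the residual y - F(x), so each iteration fits a stump to the residuals and adds a scaled copy of it to the current predictor.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares regression stump: the best single-threshold split of x for targets r."""
    best = None
    # Candidate thresholds; the largest value is excluded so both sides stay non-empty.
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        err = ((r - np.where(left, lv, rv)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def functional_gradient_descent(x, y, n_steps=100, step_size=0.1):
    """Build F = init + step_size * sum of stumps by descending in function space."""
    init = y.mean()                       # constant starting function
    stumps = []
    pred = np.full_like(y, init, dtype=float)
    for _ in range(n_steps):
        # Negative functional gradient of 0.5 * (y - F)^2 at the training points.
        neg_grad = y - pred
        # Approximate that direction with a weak learner, then take a small step.
        h = fit_stump(x, neg_grad)
        stumps.append(h)
        pred = pred + step_size * h(x)

    def F(z):
        z = np.asarray(z, dtype=float)
        return init + step_size * sum(h(z) for h in stumps)

    return F

# Illustrative synthetic data (an assumption, not taken from the text).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 200))
y = np.sin(x) + 0.1 * rng.normal(size=200)

F = functional_gradient_descent(x, y)
mse_start = np.mean((y - y.mean()) ** 2)   # error of the constant starting function
mse_final = np.mean((y - F(x)) ** 2)       # error after the functional gradient steps
```

The returned predictor is a sum of functions, not a fixed-length parameter vector: every step adds a new function to the model, which is the sense in which the descent takes place in function space.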

The concept draws on the calculus of variations and the theory of reproducing kernel Hilbert spaces, where the gradient of a functional can be represented as a function in the same space. Seen this way, algorithms such as gradient boosting are not adjusting a fixed vector of parameters; they are navigating an infinite-dimensional space of possible predictors, one weak approximation at a time.
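As a worked instance of the reproducing kernel Hilbert space statement, consider the following sketch. The notation is an assumption introduced for illustration and does not appear in the text: $\mathcal{H}$ is the space with kernel $k$, $\ell$ is a differentiable pointwise loss, $(x_i, y_i)$ are $n$ training pairs, and $\eta$ is a step size. The reproducing property lets each evaluation $f(x_i)$ be written as an inner product, and differentiating the empirical loss then yields a gradient that is itself an element of $\mathcal{H}$.

```latex
\begin{align*}
L[f] &= \sum_{i=1}^{n} \ell\bigl(y_i, f(x_i)\bigr),
  \qquad f(x_i) = \langle f,\, k(x_i, \cdot) \rangle_{\mathcal{H}}, \\
\nabla L[f] &= \sum_{i=1}^{n} \ell'\bigl(y_i, f(x_i)\bigr)\, k(x_i, \cdot), \\
f_{t+1} &= f_t - \eta\, \nabla L[f_t]
  = f_t - \eta \sum_{i=1}^{n} \ell'\bigl(y_i, f_t(x_i)\bigr)\, k(x_i, \cdot).
\end{align*}
```

Here $\ell'$ denotes the derivative of $\ell$ with respect to its second argument. Each step adds a weighted combination of kernel functions centered at the training points, so every iterate remains in $\mathcal{H}$.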