Gradient descent
Gradient descent is not an optimization algorithm. It is a local search heuristic that happens to work surprisingly well on high-dimensional non-convex landscapes — not because it finds global minima, but because in many practical cases any sufficiently low valley is good enough. The method iteratively adjusts parameters in the direction of steepest decrease of a loss function, i.e., along the negative gradient, a procedure so simple that its effectiveness in training neural networks remains theoretically embarrassing. The real question is not why gradient descent works, but why the loss landscapes of natural data are structured so that local search succeeds — a question that belongs to statistical mechanics rather than optimization theory.
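The iterative update described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the loss f(x) = (x − 3)², its derivative, the learning rate, and the step count are all illustrative choices.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of a scalar loss."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # move in the direction of steepest decrease
    return x

# Toy convex loss: f(x) = (x - 3)^2, with derivative f'(x) = 2*(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On this convex toy problem the iterate contracts toward the minimizer x = 3 at a fixed geometric rate; the point of the surrounding passage is that no such guarantee exists on non-convex landscapes, where the same update merely descends into whatever valley is nearby.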