Newton boosting

Newton boosting is a variant of gradient boosting that uses second-order information about the loss function, specifically the second derivatives of the loss with respect to the current predictions (the Hessian), to determine both the structure and the leaf values of each new tree. For losses that decompose into a sum over training examples, this Hessian is diagonal, so only one second derivative per example is needed. Where first-order gradient boosting fits each new tree to the pseudo-residuals (the negative gradient), Newton boosting fits each tree to a second-order Taylor approximation of the loss, so each update accounts for local curvature and lies closer to the loss-minimizing step in function space.
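The difference between the two kinds of update can be illustrated for a single leaf. The following is a minimal sketch, assuming binary log loss on raw scores; the function names and the `reg_lambda` parameter are illustrative and are not taken from any particular library.

```python
import numpy as np

def logistic_grad_hess(y, raw_score):
    """First and second derivatives of log loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))  # predicted probability
    grad = p - y                          # first derivative per example
    hess = p * (1.0 - p)                  # second derivative per example
    return grad, hess

def newton_leaf_value(grad, hess, reg_lambda=0.0):
    """Newton step for one leaf: minimizes the second-order approximation."""
    return -grad.sum() / (hess.sum() + reg_lambda)

def gradient_leaf_value(grad):
    """First-order leaf value: the mean pseudo-residual (negative gradient)."""
    return -grad.mean()

# Hypothetical leaf with four examples, all currently scored at 0.
y = np.array([1.0, 1.0, 0.0, 1.0])
raw_score = np.zeros_like(y)

grad, hess = logistic_grad_hess(y, raw_score)
newton_step = newton_leaf_value(grad, hess)
gradient_step = gradient_leaf_value(grad)
print(newton_step, gradient_step)
```

In this toy leaf the Newton value is larger than the mean pseudo-residual because the summed curvature is small, so the quadratic model supports a longer step. First-order implementations usually compensate with a line search or a loss-specific leaf adjustment, which this sketch omits.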

The method is most closely associated with XGBoost, which uses second-order statistics for both split finding and leaf-value estimation and helped popularize the approach as a way to improve convergence speed and model quality. Newton boosting is the functional analogue of Newton's method in parameter optimization, but its application to tree-based ensembles requires careful handling because the curvature of the loss varies from example to example. In practice, the per-example gradients and Hessians are summed over the examples that fall on each side of a candidate split, giving a local quadratic approximation of the loss for every potential child node. This local approximation adds computation relative to first-order boosting, but it is also the source of the method's statistical efficiency.
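How the summed statistics drive split selection can be sketched as follows. This is a simplified illustration, assuming a single numeric feature, an exhaustive scan over sorted values, and the L2-regularized gain commonly given for XGBoost-style trees, without the per-leaf complexity penalty; `best_split`, `leaf_score`, and `reg_lambda` are hypothetical names chosen for the example.

```python
import numpy as np

def leaf_score(G, H, reg_lambda):
    """Twice the loss reduction of a leaf with summed gradient G and Hessian H."""
    return G * G / (H + reg_lambda)

def best_split(x, grad, hess, reg_lambda=1.0):
    """Scan one feature and return the (threshold, gain) pair with highest gain."""
    order = np.argsort(x)
    x_sorted, g, h = x[order], grad[order], hess[order]
    G_total, H_total = g.sum(), h.sum()
    parent = leaf_score(G_total, H_total, reg_lambda)

    best_threshold, best_gain = None, 0.0
    G_left = H_left = 0.0
    for i in range(len(x_sorted) - 1):
        # Move example i from the right child to the left child.
        G_left += g[i]
        H_left += h[i]
        if x_sorted[i] == x_sorted[i + 1]:
            continue  # cannot place a threshold between equal values
        G_right, H_right = G_total - G_left, H_total - H_left
        gain = 0.5 * (leaf_score(G_left, H_left, reg_lambda)
                      + leaf_score(G_right, H_right, reg_lambda)
                      - parent)
        if gain > best_gain:
            best_gain = gain
            best_threshold = 0.5 * (x_sorted[i] + x_sorted[i + 1])
    return best_threshold, best_gain

# Hypothetical data: log-loss derivatives for labels [0, 0, 1, 1] at raw score 0.
x = np.array([1.0, 2.0, 3.0, 4.0])
grad = np.array([0.5, 0.5, -0.5, -0.5])
hess = np.full(4, 0.25)

threshold, gain = best_split(x, grad, hess)
print(threshold, gain)
```

Only running sums of gradients and Hessians are needed at each candidate threshold, so the scan costs one pass over the sorted feature values. Production libraries typically add histogram binning, handling of missing values, and further regularization terms that are left out here.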