Decision Tree

A Decision Tree is a supervised learning model that partitions the input space into a hierarchy of regions, making predictions by following a path from the root to a leaf node. Each internal node tests a feature against a threshold; each branch represents the outcome; each leaf stores a prediction. The tree is built recursively by selecting, at each node, the feature and split point that most reduce the impurity of the resulting subsets — typically measured by Gini impurity or information gain.

The decision tree is the simplest nonlinear model: it requires no parametric assumptions, handles mixed data types naturally, and produces human-readable rules. But its simplicity is also its weakness. A deep tree can memorize any training set, making it a high-variance, low-bias model that generalizes poorly. This vulnerability is the motivation for random forests and bagging, which replace a single deep tree with an ensemble of them. The decision tree is not a solution to prediction problems; it is a component that only works when organized into a collective.