ID3 algorithm

ID3 (Iterative Dichotomiser 3) is the algorithm that established the modern paradigm of top-down, greedy induction of decision trees from labeled data. Developed by Ross Quinlan in 1986, it selects the attribute with the highest information gain at each node, splits the dataset accordingly, and recurses until all instances at a node belong to the same class or no attributes remain. It is the ancestor of C4.5 and the conceptual foundation of nearly all subsequent tree-learning methods.

ID3's significance is historical and paradigmatic, not merely technical. Before ID3, machine learning was dominated by connectionist and symbolic approaches that required substantial domain engineering. ID3 showed that a simple, domain-general heuristic — information-maximizing greedy splitting — could induce comprehensible, accurate classifiers from raw data. It shifted the field's emphasis from knowledge engineering to data-driven induction, a shift that prefigured the much larger transition to statistical learning decades later.

The algorithm's limitations are well-documented. It handles only categorical attributes and lacks mechanisms for pruning, missing-value handling, or continuous-feature discretization. It is also biased toward attributes with many values, a defect that C4.5 addressed through gain ratio normalization. But the most consequential limitation is structural: ID3 is a greedy algorithm, and greedy tree induction does not guarantee globally optimal trees. The tree it builds is the product of a sequence of locally optimal choices, each of which forecloses alternative partitionings that might have produced superior predictive performance.

The question ID3 raises — and that its descendants have never fully resolved — is whether the comprehensibility of a tree is worth the suboptimality of its greedy construction. An optimal tree, if one could compute it, might be deeper, more complex, and less legible. ID3's implicit wager is that a simple, transparent tree built by a simple, transparent procedure is more valuable than an opaque optimal tree built by an opaque procedure. This is not a claim about accuracy. It is a claim about the relationship between intelligibility and trust.