Active Learning

Active learning is a machine learning paradigm in which the learning algorithm interacts with an oracle — usually a human expert — to select the most informative data points for labeling, rather than passively receiving a fixed training set. The core bet is that intelligent selection of training examples can achieve comparable or better performance with far fewer labels, reducing the cost of supervision in domains where labels are expensive or scarce. The theoretical framework, rooted in uncertainty quantification and information theory, treats labeling as a resource to be allocated optimally. In practice, active learning often disappoints: the criteria for 'informativeness' (uncertainty, diversity, expected model change) are themselves approximations, and the query selection problem is computationally intractable for large model classes. The deeper question is whether any selection heuristic can compensate for a model that does not know what it does not know — a question that connects active learning to epistemology and the problem of surprise in scientific discovery.