Unsupervised Learning

Unsupervised learning is the branch of machine learning that discovers structure in data without labeled outputs — without a teacher telling the algorithm what the right answer is. Where supervised learning learns a mapping from inputs to known targets, unsupervised learning learns the underlying distribution, geometry, or topology of the data itself. It is the machine learning analogue of exploratory science: the algorithm is not testing a hypothesis against evidence but generating the hypothesis from the evidence.

The paradigmatic tasks (clustering, dimensionality reduction, density estimation, and anomaly detection) share a common assumption: that the data's apparent complexity is generated by a simpler latent structure, and that this structure can be recovered by optimizing an objective that rewards parsimony, separation, or reconstruction fidelity. But this assumption is not innocent. The choice of what 'simpler' means (fewer clusters, lower dimension, sparser representation) is itself a prior, and different priors yield different 'discoveries.' Unsupervised learning does not eliminate human judgment; it relocates it from the labels to the objective function.
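The point that the prior shapes the 'discovery' can be made concrete with a small sketch. The example below is illustrative only: it assumes a synthetic dataset of four tight blobs arranged as two well-separated pairs, and uses a bare-bones version of Lloyd's k-means algorithm written with NumPy. The function name `kmeans`, the blob layout, and the seeds are all assumptions made for the illustration. The same points are clustered twice, once with k = 2 and once with k = 4, and the algorithm reports a different structure each time.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; leave it if the cluster is empty.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    # Final assignment and objective value for the converged centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    inertia = float((dists.min(axis=1) ** 2).sum())
    return labels, inertia

# Hypothetical data: four tight blobs, arranged as two distant pairs along the x-axis.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
points = np.vstack([c + 0.1 * rng.standard_normal((25, 2)) for c in blob_centers])

labels2, inertia2 = kmeans(points, k=2)  # prior: "there are two groups"
labels4, inertia4 = kmeans(points, k=4)  # prior: "there are four groups"

print("k=2 cluster sizes:", np.bincount(labels2, minlength=2), "objective:", round(inertia2, 2))
print("k=4 cluster sizes:", np.bincount(labels4, minlength=4), "objective:", round(inertia4, 2))
```

Neither answer is wrong. With k = 2 the algorithm groups the points into the left pair and the right pair; with k = 4 it is free to split them further and reaches a lower value of the same objective. Which partition counts as the 'real' structure is decided by the choice of k, not by the data.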

The deeper question is whether unsupervised learning is truly 'unsupervised' or merely a form of self-supervision in which the data provides its own labels. In self-supervised learning, the dominant paradigm in modern deep learning, the distinction has all but collapsed. A language model predicting the next token is unsupervised in that no external labels are provided, but supervised in that the training targets are explicitly defined: each target is simply the token that follows its context, and the model is trained with an ordinary supervised loss. The boundary between the two is not a technical divide but a philosophical gradient, and the field's insistence on treating them as separate categories obscures more than it reveals.
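A minimal sketch shows how raw text can supply its own labels. It assumes whitespace tokenisation and a fixed context window, both chosen only to keep the illustration short; the helper name `next_token_pairs` is hypothetical and does not come from any particular library.

```python
def next_token_pairs(tokens, context_size):
    """Turn an unlabeled token sequence into (context, target) training pairs."""
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        target = tokens[i]  # the "label" is just the next token in the raw text
        pairs.append((context, target))
    return pairs

text = "the cat sat on the mat"
tokens = text.split()  # whitespace tokenisation, purely for illustration
pairs = next_token_pairs(tokens, context_size=2)

for context, target in pairs:
    print(context, "->", target)
```

No annotator was involved, yet the result is an ordinary supervised dataset of inputs and targets. The supervision was not removed; it was manufactured from the sequence itself by the decision to predict the next token.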