Dimensionality Reduction

Dimensionality reduction is the task of transforming high-dimensional data into a lower-dimensional representation that preserves structure relevant to a downstream task. Rather than discarding dimensions arbitrarily, effective reduction discovers the intrinsic geometry of the data — the manifold or subspace on which the data actually lives.

Classical methods include principal component analysis (PCA), which finds the linear subspace that captures maximum variance. Modern nonlinear methods such as t-SNE, UMAP, and Isomap instead try to preserve neighborhood structure (local neighborhoods for t-SNE and UMAP, geodesic distances along a neighborhood graph for Isomap), revealing clusters and manifolds that linear methods miss.
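As a concrete illustration of the linear case, here is a minimal sketch of PCA computed from the singular value decomposition of the centered data. The function name `pca`, the synthetic data, and the noise level are assumptions made for this example; only NumPy is used, and this is one common way to compute PCA, not the only one.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top n_components principal directions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA is defined on mean-centered data
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each kept direction (sample variance, n - 1).
    explained_variance = (S ** 2)[:n_components] / (X.shape[0] - 1)
    Z = Xc @ components.T  # coordinates in the reduced space
    return Z, components, explained_variance


# Hypothetical data: 200 points in 3-D that lie close to a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
X = latent @ basis + 0.01 * rng.normal(size=(200, 3))

Z, components, var = pca(X, 2)
print(Z.shape, var)
```

Because the synthetic points sit near a plane, two components reconstruct them almost exactly; on data that lies on a curved manifold, the same projection would flatten or fold the structure.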

The choice of reduction method encodes an assumption about what "structure" means: PCA equates structure with variance, while neighbor-embedding methods such as t-SNE and UMAP equate it with local proximity. A reduction is only as good as that assumption is for the data at hand. For scientific applications, feature extraction and dimensionality reduction are often inseparable: the reduced dimensions themselves become the objects of study.
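A small sketch may make the point about embedded assumptions concrete. The synthetic two-class data below is hypothetical: the classes differ only along a low-variance axis, while an uninformative axis carries most of the variance. The helper `class_gap` is likewise introduced only for this illustration. Under these assumptions, the first principal component follows the high-variance axis and largely loses the class separation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: class identity lives only in the low-variance y-axis.
labels = rng.integers(0, 2, size=n)
x = rng.normal(scale=10.0, size=n)                      # high variance, no class signal
y = (2 * labels - 1) + rng.normal(scale=0.2, size=n)    # low variance, separates classes
X = np.column_stack([x, y])

# First principal direction via SVD of the centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]
z = Xc @ pc1  # 1-D PCA representation


def class_gap(values, labels):
    """Difference of class means, in units of the pooled standard deviation."""
    a, b = values[labels == 0], values[labels == 1]
    pooled = np.sqrt(0.5 * (a.var() + b.var()))
    return abs(a.mean() - b.mean()) / pooled


gap_pc1 = class_gap(z, labels)        # separation kept by the PCA projection
gap_y = class_gap(Xc[:, 1], labels)   # separation available along the y-axis
print(pc1, gap_pc1, gap_y)
```

Here the variance criterion is the wrong notion of structure for the task of telling the classes apart; a method whose assumption matched the task (for example, a supervised projection) would keep the y-axis instead.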