UMAP

Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction technique introduced by McInnes, Healy, and Melville in 2018. Like t-SNE, UMAP preserves local neighborhood structure, which makes it well suited to visualization. Unlike t-SNE, it tends to retain more of the global structure, such as the relationships between clusters, and it scales to much larger datasets.

UMAP rests on a mathematical framework that combines manifold learning with ideas from topological data analysis. It represents the data as a fuzzy topological structure (formally a fuzzy simplicial set; in practice, a nearest-neighbor graph with weighted edges) and then searches for a low-dimensional embedding whose own fuzzy topological representation is as similar as possible. The method typically runs faster than t-SNE, is often more stable across runs, and is better at preserving the large-scale geometry of the data.
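The first stage described above, turning the data into a weighted nearest-neighbor graph, can be sketched in plain Python. This is a simplified sketch that assumes Euclidean distance and brute-force neighbor search; the function name `fuzzy_knn_graph` and its parameters are hypothetical, not part of any library. Following the construction in the UMAP paper, each point i gets a local offset rho_i (the distance to its nearest neighbor) and a bandwidth sigma_i chosen so that the weights of its k outgoing edges sum to log2(k). The directed weights are then symmetrized with the fuzzy union a + b - ab.

```python
import math


def fuzzy_knn_graph(points, k, n_iter=64, tol=1e-5):
    """Hypothetical sketch of UMAP's fuzzy k-NN graph (Euclidean, brute force).

    Returns a dict mapping (i, j) with i < j to an edge weight in [0, 1].
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")
    target = math.log2(k)
    directed = {}
    for i in range(n):
        # Brute-force k nearest neighbors of point i, excluding i itself.
        neighbors = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        rho = neighbors[0][0]  # distance to the nearest neighbor

        def total_weight(sigma):
            return sum(math.exp(-max(0.0, d - rho) / sigma) for d, _ in neighbors)

        # Bisection for sigma (doubling until an upper bound is found),
        # so that the outgoing weights sum to log2(k).
        lo, hi, sigma = 0.0, math.inf, 1.0
        for _ in range(n_iter):
            total = total_weight(sigma)
            if abs(total - target) < tol:
                break
            if total > target:  # weights too large: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2
            else:  # weights too small: grow sigma
                lo = sigma
                sigma = sigma * 2 if hi == math.inf else (lo + hi) / 2

        for d, j in neighbors:
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)

    # Symmetrize with the probabilistic t-conorm (fuzzy union): a + b - a*b.
    graph = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        graph[(min(i, j), max(i, j))] = a + b - a * b
    return graph
```

For example, `fuzzy_knn_graph([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)], k=2)` assigns a weight of about 1 to the edge (0, 1) inside the left group and a weight near 0 to the edge (2, 3) that spans the gap. The second stage, not shown, lays the points out in low dimensions by minimizing a cross-entropy between this graph and an analogous graph built on the embedding. The reference implementation, umap-learn, also replaces the brute-force loop with approximate nearest-neighbor search.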

The choice between UMAP and t-SNE is not merely technical: it encodes different assumptions about which structure in high-dimensional data deserves to be preserved. UMAP is designed to keep some of the global shape; t-SNE concentrates almost entirely on local neighborhoods. The right choice depends on whether the question you are asking is about the clusters themselves or about the space between them.