Local Intrinsic Dimensionality

Local intrinsic dimensionality (LID) extends intrinsic dimension from a global property of a dataset to a local property of each point. The global intrinsic dimension measures the overall complexity of the data manifold; the local intrinsic dimension at a point measures how many degrees of freedom are needed to describe the data in the neighborhood of that point. A dataset may have a global intrinsic dimension of ten yet contain regions where the local intrinsic dimension is two, meaning that some parts of the data lie on simple surfaces while others are more complex.
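
One common way to make "degrees of freedom in the neighborhood of a point" precise is through the growth rate of probability mass in small balls around that point. The formula below is a sketch of that formalization, not a definition the text above commits to; the symbols F_x (the distribution of distances from x to a random data point), r (radius), C (a constant), and m (the local dimension) are notation introduced here for illustration.

```latex
% F_x(r): probability that a randomly drawn data point lies within distance r of x
\mathrm{LID}(x)
  \;=\; \lim_{r \to 0^{+}} \frac{r \, F_x'(r)}{F_x(r)}
  \;=\; \lim_{r \to 0^{+}} \frac{\mathrm{d} \ln F_x(r)}{\mathrm{d} \ln r},
\qquad
F_x(r) = C \, r^{m} \ \text{for small } r
  \;\Longrightarrow\; \mathrm{LID}(x) = m .
```

Under this reading, a point on a locally two-dimensional surface has ball mass growing like r^2 and so has LID 2, whatever the dimension of the space the data is embedded in.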

LID is estimated within the same nearest-neighbor framework as the Kozachenko-Leonenko estimator: how quickly the nearest-neighbor distances around a point grow with the neighbor rank reveals the exponent of local volume growth, and that exponent is the local intrinsic dimension. In machine learning, LID has been used to detect adversarial examples (which tend to have markedly higher estimated LID than natural data points) and to understand why some regions of a dataset are harder to learn than others.
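
The nearest-neighbor idea can be sketched in a few lines of NumPy. The example below is a minimal illustration, not a reference implementation: it assumes the maximum-likelihood (Hill-type) form of the estimator, one of several in use, and the function name `lid_mle`, the neighbor count `k=20`, and the synthetic dataset (a flat 2-D patch and a separate 10-D Gaussian blob, both in a 10-D ambient space) are all choices made here for illustration.

```python
import numpy as np


def lid_mle(points, k=20):
    """Per-point LID estimate from the k nearest-neighbor distances.

    Assumed estimator (maximum-likelihood / Hill-type form):
        lid(x) = -1 / mean_i( ln(r_i(x) / r_k(x)) ),   i = 1..k,
    where r_1 <= ... <= r_k are the distances from x to its k nearest
    neighbors. Assumes no duplicate points (all distances positive).
    """
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances via the Gram-matrix identity;
    # adequate for a few thousand points.
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    dist = np.sqrt(np.maximum(sq_dist, 0.0))
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dist, np.inf)
    knn = np.sort(dist, axis=1)[:, :k]
    # Log-ratio of each neighbor distance to the k-th one (last term is 0).
    log_ratio = np.log(knn / knn[:, -1:])
    return -1.0 / log_ratio.mean(axis=1)


rng = np.random.default_rng(0)
n, ambient_dim = 1000, 10

# Region A: a flat 2-D patch embedded in 10-D space (other coordinates zero).
plane = np.zeros((n, ambient_dim))
plane[:, :2] = rng.uniform(0.0, 1.0, size=(n, 2))

# Region B: a full 10-D Gaussian blob, shifted far away from the patch so
# that the two regions do not share nearest neighbors.
blob = rng.normal(size=(n, ambient_dim)) + 10.0

lid = lid_mle(np.vstack([plane, blob]), k=20)
plane_median = float(np.median(lid[:n]))
blob_median = float(np.median(lid[n:]))
print(f"median LID estimate, plane region: {plane_median:.2f}")
print(f"median LID estimate, blob region:  {blob_median:.2f}")
```

On data like this, the estimates for the planar region should cluster near two, while those for the blob come out much higher. Estimates in the high-dimensional region tend to fall short of the true value of ten at this sample size, which is a known limitation of neighbor-based estimators rather than a property of the data. The brute-force distance matrix keeps the sketch dependency-free; for larger datasets a spatial index or approximate neighbor search would be the more practical choice.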

The concept challenges the assumption that dimensionality is a uniform property of a dataset. It suggests that the 'hardness' of a learning problem may be localized: some regions of the data are genuinely high-dimensional while others are effectively low-dimensional, and a model's performance may be determined by the hardest regions rather than by the average.

Local intrinsic dimensionality reveals that the curse of dimensionality is not a global curse but a local one: some points are cursed, and others are not. Any analysis that treats dimensionality as a single number is throwing away information about where the real complexity lives.