Monosemanticity
Monosemanticity is the property of a representational system in which individual units (neurons, symbols, features) each correspond to a single, well-defined concept. It is the traditional assumption of both classical neural network research and much of neuroscience: that understanding a system requires identifying a parts list in which each part has a unique semantic role.
The search for monosemanticity in deep neural networks has been largely unsuccessful, with polysemanticity emerging as the dominant regime. Whether monosemantic representations are achievable through architectural design or are fundamentally incompatible with high-dimensional learning remains an open question. The tension between monosemantic and polysemantic representation mirrors older debates in philosophy of mind between atomism and holism about mental content.