Fairness through unawareness

Fairness through unawareness is the naive strategy of pursuing algorithmic fairness by removing protected attributes (race, gender, age, disability status) from the data used to train or operate a decision system. The intuition is simple: if the algorithm never sees an individual's protected attributes, it cannot discriminate on that basis.

The strategy fails because protected attributes are redundantly encoded in correlated features. Zip code proxies for race. Educational institution proxies for class. Medical history proxies for disability. When the protected attribute is removed, the algorithm can still recover much of it from the correlated variables, often with less transparency than explicit use would have provided. The discrimination becomes harder to detect and harder to contest.
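The proxy mechanism can be illustrated with a minimal sketch on invented data. Everything here is an assumption made for illustration: the two groups, the two zip codes "A" and "B", the cell counts, the historical approval counts, and the 0.5 decision threshold are all hypothetical. The "model" is deliberately trivial: it learns one approval rate per zip code and never reads the group column.

```python
from collections import defaultdict


def build_population():
    """Synthetic applicants as (group, zip_code, historically_approved) tuples.

    All numbers are made up: group 0 lives mostly in zip "A", group 1 mostly
    in zip "B", and past decisions approved group 0 at 70% and group 1 at 30%.
    """
    cells = [
        # (group, zip_code, people in cell, historically approved in cell)
        (0, "A", 450, 315),
        (0, "B", 50, 35),
        (1, "A", 50, 15),
        (1, "B", 450, 135),
    ]
    people = []
    for group, zip_code, count, approved in cells:
        for i in range(count):
            people.append((group, zip_code, i < approved))
    return people


def fit_unaware_model(people):
    """Learn the historical approval rate per zip code.

    The group value is unpacked but never used: the model is 'unaware'.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for _group, zip_code, approved in people:
        totals[zip_code] += 1
        approvals[zip_code] += int(approved)
    return {z: approvals[z] / totals[z] for z in totals}


def approval_rate_by_group(people, zip_rates, threshold=0.5):
    """Apply the zip-only rule, then audit its outcomes per group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, zip_code, _past in people:
        totals[group] += 1
        approved[group] += int(zip_rates[zip_code] >= threshold)
    return {g: approved[g] / totals[g] for g in totals}


people = build_population()
zip_rates = fit_unaware_model(people)
group_rates = approval_rate_by_group(people, zip_rates)
print(zip_rates)    # approval rate learned for each zip code
print(group_rates)  # resulting approval rate for each group
```

On this toy data the zip-only rule approves everyone in zip "A" and no one in zip "B", so group 0 is approved at 0.9 and group 1 at 0.1. The historical gap of 0.7 versus 0.3 is not removed by dropping the group column; in this construction it widens, because thresholding on the proxy assigns each person the majority outcome of their zip code.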

Fairness through unawareness is appealing to organizations because it appears to reduce legal liability: the system can claim to be "blind" to protected attributes while still producing disparate outcomes. That protection is weaker than it looks. The legal doctrine of disparate impact in U.S. employment law recognizes that facially neutral practices can be discriminatory if they disproportionately disadvantage a protected group and are not justified by business necessity. The algorithm is the practice; the proxy is the mechanism.

The systems insight is that information is not cleanly partitionable. Removing a variable from a dataset does not remove its influence if the variable is structurally correlated with other variables. The algorithm does not need to see the protected attribute to act on it. The system only needs to see the structure that the attribute helped produce.
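One way to make this concrete is to ask how well the removed attribute can be guessed from a feature that was kept. The sketch below is an illustrative audit on invented data, not an established method from any particular library: the function names `proxy_accuracy` and `base_rate`, the zip codes, and the counts are all hypothetical. It compares guessing the protected attribute from zip code against guessing with no features at all.

```python
from collections import Counter, defaultdict


def proxy_accuracy(records):
    """Accuracy of guessing the protected attribute from one retained feature.

    records: list of (protected_value, feature_value) pairs.
    For each feature value, the guess is the protected value most often
    seen alongside it.
    """
    by_feature = defaultdict(Counter)
    for protected, feature in records:
        by_feature[feature][protected] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    total = sum(sum(c.values()) for c in by_feature.values())
    return correct / total


def base_rate(records):
    """Accuracy of always guessing the most common protected value."""
    counts = Counter(protected for protected, _feature in records)
    return counts.most_common(1)[0][1] / sum(counts.values())


# Hypothetical data: the protected column has been "removed" from the
# training set, but zip code still lines up with it for 90% of people.
records = (
    [(0, "A")] * 450 + [(0, "B")] * 50
    + [(1, "A")] * 50 + [(1, "B")] * 450
)

print(base_rate(records))       # 0.5: no better than a coin flip
print(proxy_accuracy(records))  # 0.9: zip code recovers most of the attribute
```

The gap between the two numbers is a rough indication of how much of the protected attribute survives in the retained feature. A real audit would need held-out data and would consider combinations of features, since several weak proxies can jointly act as a strong one.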