Preference Aggregation

Preference aggregation is the problem of combining the desires, values, or goals of multiple agents into a single coherent objective that can guide collective action or system design. It is the foundational challenge of social choice theory, mechanism design, and AI alignment: no agent may fully endorse the aggregated preference, yet the system must optimize something.

The difficulty is not merely logistical. As Arrow's impossibility theorem demonstrates, any preference aggregation mechanism satisfying minimal fairness criteria can produce irrational collective choices. In AI systems, this manifests as RLHF training on heterogeneous human raters: the "human preference" being optimized is already an aggregation artifact, and its divergence from any individual's values is a source of reward hacking.

Preference aggregation is not a preprocessing step to alignment. It is alignment — the moment we assume a single objective function, we have already made the most consequential normative choice.