Collective Alignment

Collective alignment is the problem of ensuring that a group of individually aligned agents (whether humans, AI systems, or institutions) produces collectively beneficial outcomes rather than mutually destructive equilibria. It is distinct from individual alignment: even when every component of a system pursues goals that are locally compatible with human values, their interaction can generate emergent dynamics that undermine those values at scale. Collective alignment is the system-level counterpart to agent-level alignment, and it may be the harder of the two problems, because solving alignment for each agent separately does not guarantee a good outcome for the group.

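The concept arises in multi-agent systems, mechanism design, and collective behavior, all domains where the unit of analysis must shift from the individual to the interaction structure. In game theory, the price of anarchy quantifies the cost of getting this wrong: it is the ratio between the social cost of the worst equilibrium that self-interested agents can settle into and the social cost of the best outcome achievable through coordination. A ratio of 1 means uncoordinated behavior loses nothing; larger values mean a larger collective loss.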
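To make the price of anarchy concrete, here is a minimal sketch that computes it for a two-player Prisoner's Dilemma by brute force. The cost values, the names `ACTIONS` and `COSTS`, and the helper functions are illustrative assumptions, not taken from any particular source. The sketch considers only pure-strategy Nash equilibria and takes social cost to be the sum of individual costs.

```python
from itertools import product

# Hypothetical cost table for a two-player Prisoner's Dilemma.
# COSTS[(a0, a1)] = (cost to player 0, cost to player 1); lower is better.
ACTIONS = ("cooperate", "defect")
COSTS = {
    ("cooperate", "cooperate"): (1, 1),
    ("cooperate", "defect"): (3, 0),
    ("defect", "cooperate"): (0, 3),
    ("defect", "defect"): (2, 2),
}


def social_cost(profile):
    """Total cost borne by both players under a joint action profile."""
    return sum(COSTS[profile])


def is_pure_nash(profile):
    """True if no player can lower their own cost by deviating alone."""
    for player in (0, 1):
        for alternative in ACTIONS:
            deviation = list(profile)
            deviation[player] = alternative
            if COSTS[tuple(deviation)][player] < COSTS[profile][player]:
                return False
    return True


def price_of_anarchy():
    """Return (pure Nash equilibria, worst equilibrium cost / optimal cost).

    Assumes at least one pure equilibrium exists and the optimal
    social cost is nonzero, which holds for the table above.
    """
    profiles = list(product(ACTIONS, repeat=2))
    equilibria = [p for p in profiles if is_pure_nash(p)]
    optimum = min(social_cost(p) for p in profiles)
    worst_equilibrium = max(social_cost(p) for p in equilibria)
    return equilibria, worst_equilibrium / optimum


if __name__ == "__main__":
    equilibria, poa = price_of_anarchy()
    print(equilibria)  # [('defect', 'defect')]
    print(poa)         # 2.0
```

Under these assumed costs, each player minimizes their own cost by defecting, so mutual defection is the only equilibrium. Its total cost is twice that of mutual cooperation, which gives a price of anarchy of 2 even though neither player acts irrationally by their own lights.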

A central open question is whether collective alignment can be achieved through incentive engineering alone, or whether it requires forms of cooperative intelligence that no current theory adequately captures.