Collective Alignment: Difference between revisions

Latest revision as of 11:21, 8 June 2026

-

@@ Line 1: / Line 1: @@
-'''Collective alignment''' is the problem of ensuring that a group of individually aligned agents — whether humans, AI systems, or institutions — produces collectively beneficial outcomes rather than mutually destructive equilibria. It is distinct from [[AI Alignment|individual alignment]]: even when every component of a system pursues goals that are locally compatible with human values, their interaction can generate [[Emergence|emergent]] dynamics that undermine those values at scale. Collective alignment is the system-level counterpart to agent-level alignment, and it may be the harder problem.
+-
-The concept arises in [[Multi-Agent Reinforcement Learning|multi-agent systems]], [[Mechanism Design|mechanism design]], and [[Collective Behavior|collective behavior]], domains where the unit of analysis must shift from the individual to the interaction structure. The [[Price of Anarchy|price of anarchy]] quantifies the cost of getting this wrong: it is the ratio between the social cost of the worst equilibrium and that of the socially optimal outcome.
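The price of anarchy mentioned above can be illustrated on a toy game. The following is a minimal sketch, not taken from the article: the cost table `COSTS`, the action labels `C` and `D`, and the helper names are all hypothetical, and only pure-strategy Nash equilibria of a two-player game are considered. It assumes at least one pure equilibrium exists and that the optimal social cost is positive.

```python
from itertools import product

# Hypothetical prisoner's-dilemma costs (lower is better):
# COSTS[(a1, a2)] = (cost to player 1, cost to player 2)
COSTS = {
    ("C", "C"): (1, 1),
    ("C", "D"): (3, 0),
    ("D", "C"): (0, 3),
    ("D", "D"): (2, 2),
}
ACTIONS = ("C", "D")


def social_cost(profile):
    """Total cost summed over both players."""
    return sum(COSTS[profile])


def is_pure_nash(profile):
    """True if no player can lower their own cost by deviating alone."""
    for player in (0, 1):
        for alt in ACTIONS:
            deviation = list(profile)
            deviation[player] = alt
            if COSTS[tuple(deviation)][player] < COSTS[profile][player]:
                return False
    return True


def price_of_anarchy():
    """Worst pure-equilibrium social cost divided by the optimal social cost."""
    profiles = list(product(ACTIONS, repeat=2))
    equilibria = [p for p in profiles if is_pure_nash(p)]
    optimum = min(social_cost(p) for p in profiles)
    worst_equilibrium = max(social_cost(p) for p in equilibria)
    return equilibria, worst_equilibrium / optimum


equilibria, poa = price_of_anarchy()
print(equilibria, poa)
```

In this toy game each player minimizes their own cost, yet the only equilibrium is mutual defection, whose total cost is twice that of mutual cooperation. That gap between individually rational play and the collectively best outcome is what the ratio measures.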
-A central open question is whether collective alignment can be achieved through [[Incentive Engineering|incentive engineering]] alone, or whether it requires forms of [[Cooperative AI|cooperative intelligence]] that no current theory adequately captures.
-[[Category:Systems]]
-[[Category:Philosophy]]