Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) extends reinforcement learning to settings where multiple agents learn simultaneously in a shared environment. In single-agent RL the environment is usually assumed to be stationary. MARL agents face a fundamentally non-stationary problem: from any one agent's perspective, every other agent's learning changes the effective transition dynamics, the rewards it experiences, and therefore its optimal strategy. The environment is not given; it is co-created.
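To make the non-stationarity concrete, here is a minimal sketch, assuming a hypothetical two-action matching game in which agent 1 earns 1 when both agents pick the same action and 0 otherwise. The payoff table and the function names are illustrative assumptions, not taken from any library. The sketch shows that agent 1's best action depends entirely on agent 2's current policy, so the target agent 1 is learning toward moves whenever agent 2 learns.

```python
# Hypothetical payoff table for agent 1 in a two-action matching game:
# PAYOFF[a1][a2] is agent 1's reward when it plays a1 and agent 2 plays a2.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]


def expected_rewards(p_other_plays_0):
    """Agent 1's expected reward per action, given agent 2's mixed policy."""
    probs = [p_other_plays_0, 1.0 - p_other_plays_0]
    return [sum(PAYOFF[a1][a2] * probs[a2] for a2 in range(2))
            for a1 in range(2)]


def best_response(p_other_plays_0):
    """Index of the action with the highest expected reward for agent 1."""
    rewards = expected_rewards(p_other_plays_0)
    return max(range(2), key=lambda a: rewards[a])


if __name__ == "__main__":
    # As agent 2's policy drifts during learning, agent 1's best action flips.
    for p in (0.9, 0.7, 0.3, 0.1):
        print(f"P(agent 2 plays 0) = {p:.1f} -> best response: {best_response(p)}")
```

Nothing about the game itself changes between the four cases; only the other agent's policy does, and that alone is enough to reverse what counts as optimal.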

MARL sits at the intersection of machine learning, game theory, and multi-agent systems. It inherits the formalism of Markov games (also called stochastic games), in which agents observe states, take actions, and receive rewards, but adds the learning dynamics that make static equilibrium analysis insufficient. An equilibrium strategy computed against the other agents' current policies may stop being a best response as soon as one of them updates its policy. The system is coupled at the level of learning itself.
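For reference, the Markov game formalism and the Nash condition can be written out as follows. The notation is a conventional choice, not something fixed by the text: $N$ agents, state space $\mathcal{S}$, per-agent action sets $\mathcal{A}_i$, transition kernel $P$, per-agent reward functions $r_i$, discount factor $\gamma$, and $V_i^{\pi}(s)$ for agent $i$'s expected discounted return from state $s$ under the joint policy $\pi = (\pi_i, \pi_{-i})$.

```latex
\[
\mathcal{G} = \bigl(N,\ \mathcal{S},\ \{\mathcal{A}_i\}_{i=1}^{N},\ P,\ \{r_i\}_{i=1}^{N},\ \gamma\bigr),
\qquad
P : \mathcal{S} \times \mathcal{A}_1 \times \dots \times \mathcal{A}_N \to \Delta(\mathcal{S}),
\qquad
r_i : \mathcal{S} \times \mathcal{A}_1 \times \dots \times \mathcal{A}_N \to \mathbb{R}.
\]

\[
V_i^{\pi_i^{*},\,\pi_{-i}^{*}}(s) \;\ge\; V_i^{\pi_i,\,\pi_{-i}^{*}}(s)
\qquad \text{for all agents } i,\ \text{all policies } \pi_i,\ \text{and all states } s \in \mathcal{S}.
\]
```

The second condition says that each $\pi_i^{*}$ is a best response to the others' policies $\pi_{-i}^{*}$. It is stated with the opponents held fixed, which is exactly the assumption that simultaneous learning removes.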

Key challenges include the credit assignment problem (determining which agent's actions contributed to a joint outcome), the scalability problem (the joint action space grows exponentially with the number of agents, and coordination costs grow with it), and the emergence of social dilemmas. Recent work has shown that independently learning agents in shared environments can spontaneously reproduce collective action problems: defection, free-riding, and tragedy-of-the-commons dynamics that no individual agent was programmed to exhibit. MARL is therefore not merely a harder version of single-agent RL. It is a different kind of science: the study of how learning produces social structure.
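The emergence of defection can be illustrated with a small sketch. It assumes a one-shot prisoner's dilemma with illustrative payoffs (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for unilateral defection) and two independent learners that each follow the exact gradient of their own expected payoff. This is an idealization chosen for determinism rather than a sampled RL algorithm from any particular paper, and all names in it are hypothetical.

```python
import math

# Illustrative prisoner's dilemma payoffs (assumed values, not from the text).
# REWARD[my_action][other_action], with action 0 = cooperate, 1 = defect.
REWARD = [[3.0, 0.0],
          [5.0, 1.0]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_reward(p_me, p_other):
    """Expected payoff when I cooperate w.p. p_me and the other w.p. p_other."""
    mine = [p_me, 1.0 - p_me]
    theirs = [p_other, 1.0 - p_other]
    return sum(mine[a] * theirs[b] * REWARD[a][b]
               for a in range(2) for b in range(2))


def reward_gradient(p_me, p_other):
    """Derivative of my expected payoff with respect to p_me."""
    if_cooperate = p_other * REWARD[0][0] + (1.0 - p_other) * REWARD[0][1]
    if_defect = p_other * REWARD[1][0] + (1.0 - p_other) * REWARD[1][1]
    return if_cooperate - if_defect


def train(steps=2000, lr=0.5):
    """Each agent ascends its own payoff, treating the other as fixed."""
    theta = [0.0, 0.0]  # logits; sigmoid(theta[i]) = P(agent i cooperates)
    for _ in range(steps):
        p = [sigmoid(t) for t in theta]
        # Chain rule through the sigmoid: dp/dtheta = p * (1 - p).
        grads = [reward_gradient(p[i], p[1 - i]) * p[i] * (1.0 - p[i])
                 for i in range(2)]
        theta = [theta[i] + lr * grads[i] for i in range(2)]
    return [sigmoid(t) for t in theta]


if __name__ == "__main__":
    p = train()
    print("P(cooperate):", [round(x, 4) for x in p])
    print("payoff per agent after learning:",
          round(expected_reward(p[0], p[1]), 3))
    print("payoff per agent under mutual cooperation:",
          expected_reward(1.0, 1.0))
```

Mutual cooperation would pay each agent 3, yet each agent's gradient points toward defection whatever the other does, so both end close to the mutual-defection payoff of 1. Neither learner contains any rule that says "defect"; the outcome comes from the coupling of two individually sensible update rules.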