Actor-Critic

The Actor-Critic architecture is a hybrid approach in reinforcement learning that combines the strengths of value-based and policy-based methods. The actor is a policy network that selects actions; the critic is a value network that evaluates those actions, providing a learned baseline that reduces the variance of policy gradient estimates. This division of labor mirrors biological organization — the basal ganglia appear to implement actor-like action selection while the prefrontal cortex provides critic-like evaluation — though whether this parallel is mechanistically substantive or merely metaphorical remains contested. Actor-critic methods have become the dominant paradigm in applied reinforcement learning, powering systems from robotic control to large language model alignment. The architecture's elegance conceals a subtle instability: the critic must be accurate enough to guide the actor, but the actor's changing policy constantly shifts the distribution of states the critic must evaluate, creating a moving-target problem that training algorithms must carefully manage.