Experience Replay

Experience replay is a technique in reinforcement learning in which an agent stores past transitions, each a (state, action, reward, next-state) tuple, in a replay buffer and later samples from this buffer to train its value function or policy. Introduced by Lin (1992) and popularized by the DQN algorithm (Mnih et al., 2015), experience replay weakens the temporal correlations between consecutive samples that arise in sequential online learning, and it improves sample efficiency by letting each transition be reused across multiple updates.
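The store-then-sample loop described above can be sketched as a fixed-capacity buffer with uniform sampling. This is a minimal illustration, not the implementation from Lin (1992) or DQN; the names `Transition`, `ReplayBuffer`, `add`, and `sample` are hypothetical, and the oldest-first eviction policy is an assumption.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one transition; field names are illustrative.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity buffer with uniform random sampling (a sketch)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once full
        # (assumed eviction policy).
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.storage.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement. Drawing at random rather
        # than in arrival order is what weakens temporal correlations.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

In a training loop, the agent would call `add` after every environment step and, once the buffer holds enough transitions, call `sample` to draw a minibatch for each gradient update.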

The biological inspiration is explicit. The buffer is an artificial analogue of episodic memory; the sampling process is an analogue of hippocampal replay during sleep. But the analogy is shallow. Biological replay is not uniform sampling. It is prioritized by surprise, reward prediction error, emotional salience, and structural relationships between memories. It is also gated by sleep-phase oscillations that modulate plasticity in a context-dependent manner. The uniformly sampled buffer popularized by DQN captures none of this architecture.

Variants include prioritized experience replay (Schaul et al., 2016), which samples transitions with larger temporal-difference errors more frequently, and hindsight experience replay (Andrychowicz et al., 2017), which relabels failed trajectories with achieved subgoals to extract a learning signal from sparse-reward environments. These improvements address specific pathologies but do not close the gap between artificial and biological replay.
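Both variants can be illustrated in a few lines. The sketch below assumes the proportional form of prioritization, in which a transition with temporal-difference error d is drawn with probability proportional to (|d| + eps)^alpha and the resulting bias is offset by importance weights (N * P(i))^(-beta). The hindsight function uses only the simplest relabeling strategy, substituting the final achieved state for the original goal. All function names, the tuple layouts, and the default hyperparameters are assumptions made for illustration, and normalizing the weights by the largest weight in the batch is one implementation choice among several.

```python
import random


def priority_probabilities(td_errors, alpha=0.6, eps=1e-6):
    # Proportional prioritization: p_i = (|delta_i| + eps) ** alpha, normalized.
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_prioritized(transitions, td_errors, batch_size,
                       alpha=0.6, beta=0.4, rng=random):
    probs = priority_probabilities(td_errors, alpha)
    n = len(transitions)
    # Draw indices with replacement, weighted by priority.
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # Importance-sampling weights correct for the non-uniform draw;
    # here they are scaled so the largest weight in the batch is 1.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    weights = [w / max_w for w in weights]
    return [transitions[i] for i in indices], indices, weights


def relabel_with_final_goal(episode, reward_fn):
    # episode: list of (state, action, next_state, goal) tuples (assumed layout).
    # The last next_state is treated as the goal that was actually achieved.
    achieved = episode[-1][2]
    # reward_fn(next_state, goal) is a hypothetical goal-conditioned reward.
    return [(s, a, reward_fn(s2, achieved), s2, achieved)
            for (s, a, s2, _goal) in episode]
```

With alpha set to 0 the sampling probabilities become uniform, which recovers ordinary experience replay. The relabeled episode contains at least one transition that reaches its (substituted) goal, which is how a failed trajectory still yields a reward signal.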

The open question is whether the limitations of experience replay (residual sample inefficiency, catastrophic forgetting in non-stationary environments, and the inability to generalize compositional structure) are fundamental to the buffer abstraction or merely reflect the current state of engineering. If the former, then continual learning systems will need to abandon the buffer and adopt something closer to the hippocampal-neocortical division of biological memory. If the latter, then better buffer designs may suffice. The history of the field suggests that architectural commitments matter more than algorithmic tweaks.