Wake-Sleep Algorithm

The wake-sleep algorithm is an unsupervised learning procedure for generative models with latent variables, introduced by Geoffrey Hinton, Peter Dayan, Brendan Frey, and Radford Neal in 1995. Boltzmann machine learning, and its later contrastive-divergence approximation, trains a single network with symmetric weights. Wake-sleep instead trains two separate networks: a generative network that maps latent variables to data, and a recognition network that maps data to latent variables. The two networks are trained with different objectives in alternating phases.
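
For concreteness, the simplest case can be written out. This sketch assumes a single hidden layer of binary stochastic units with logistic activations, the unit type used in the Helmholtz machine. Deeper models stack the same form layer by layer. The parameter names b, c, W (generative, collectively θ) and r, R (recognition, collectively φ) are notation chosen here, not taken from the original paper.

```latex
% Generative network (top-down), parameters \theta = \{b, c, W\}
p_\theta(h_j = 1) = \sigma(b_j), \qquad
p_\theta(x_i = 1 \mid h) = \sigma\Big(c_i + \sum_j W_{ij}\, h_j\Big)

% Recognition network (bottom-up), parameters \phi = \{r, R\}
q_\phi(h_j = 1 \mid x) = \sigma\Big(r_j + \sum_i R_{ji}\, x_i\Big), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```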

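In the wake phase, the recognition network is driven by observed data and samples latent causes for it. The generative weights are then adjusted so that the generative network becomes more likely to produce those causes, and the data from them. In the sleep phase, the generative network samples latent variables from its own prior and generates fantasy data from them. The recognition weights are then adjusted to recover the latent causes that actually produced each fantasy. In both phases the update for each connection is a local delta rule.

The algorithm does not optimize a single global objective. Instead, it fits a hierarchical generative model while simultaneously training a network to perform approximate inference in that model. It does this through a two-phase procedure that resembles the biological alternation between sensory experience and internal simulation.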
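
The two phases can be sketched in plain Python. This is a minimal, non-authoritative sketch under several assumptions: one hidden layer of binary stochastic units with logistic activations, one training example per wake step, and one fantasy per sleep step. The class and method names (`HelmholtzMachine`, `wake`, `sleep`, `dream`, `train`) are hypothetical and are not taken from any library or from the original paper. The updates are the local delta rules described above.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HelmholtzMachine:
    """Hypothetical sketch: one hidden layer of binary stochastic units.

    Generative net:  p(h_j=1) = sigmoid(b_j); p(x_i=1|h) = sigmoid(c_i + sum_j W[i][j] h_j)
    Recognition net: q(h_j=1|x) = sigmoid(r_j + sum_i R[j][i] x_i)
    """

    def __init__(self, n_visible: int, n_hidden: int, lr: float = 0.1, seed: int = 0):
        self.rng = random.Random(seed)
        self.lr = lr
        # Generative (top-down) parameters.
        self.b = [0.0] * n_hidden
        self.c = [0.0] * n_visible
        self.W = [[self.rng.uniform(-0.01, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        # Recognition (bottom-up) parameters.
        self.r = [0.0] * n_hidden
        self.R = [[self.rng.uniform(-0.01, 0.01) for _ in range(n_visible)]
                  for _ in range(n_hidden)]

    def _sample(self, probs):
        # Draw one binary state per unit.
        return [1 if self.rng.random() < p else 0 for p in probs]

    def prior_probs(self):
        return [sigmoid(bj) for bj in self.b]

    def visible_probs(self, h):
        return [sigmoid(ci + sum(w * hj for w, hj in zip(row, h)))
                for ci, row in zip(self.c, self.W)]

    def recognition_probs(self, x):
        return [sigmoid(rj + sum(w * xi for w, xi in zip(row, x)))
                for rj, row in zip(self.r, self.R)]

    def wake(self, x):
        # Recognition net proposes latent causes for a real data vector.
        h = self._sample(self.recognition_probs(x))
        pb, px = self.prior_probs(), self.visible_probs(h)
        # Delta rule: move generative predictions toward the sampled states.
        for j, (hj, pj) in enumerate(zip(h, pb)):
            self.b[j] += self.lr * (hj - pj)
        for i, (xi, p_i) in enumerate(zip(x, px)):
            err = xi - p_i
            self.c[i] += self.lr * err
            for j, hj in enumerate(h):
                self.W[i][j] += self.lr * err * hj

    def dream(self):
        # Ancestral sample from the generative net: h from the prior, then x given h.
        h = self._sample(self.prior_probs())
        return h, self._sample(self.visible_probs(h))

    def sleep(self):
        # Generative net produces a fantasy whose true causes h are known.
        h, x = self.dream()
        qh = self.recognition_probs(x)
        # Delta rule: train the recognition net to recover h from x.
        for j, (hj, qj) in enumerate(zip(h, qh)):
            err = hj - qj
            self.r[j] += self.lr * err
            for i, xi in enumerate(x):
                self.R[j][i] += self.lr * err * xi

    def train(self, data, steps):
        # Alternate one wake step on a random data vector with one sleep step.
        for _ in range(steps):
            self.wake(self.rng.choice(data))
            self.sleep()


if __name__ == "__main__":
    hm = HelmholtzMachine(n_visible=4, n_hidden=2, seed=0)
    hm.train([[1, 1, 0, 0], [0, 0, 1, 1]], steps=5000)
    print(hm.dream())  # a (hidden, visible) fantasy pair
```

Note that the wake step only touches generative parameters and the sleep step only touches recognition parameters. Each weight change depends on quantities available at the two ends of that connection, which is the locality property discussed below.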

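The wake-sleep algorithm was a precursor to the deep belief network, whose fine-tuning stage used a contrastive variant of it (the up-down algorithm). Its pairing of a generative network with a learned recognition network also influenced the development of variational autoencoders. Its biological plausibility has made it a persistent object of interest in computational neuroscience. That plausibility rests on three features: local learning rules, no backpropagation of error through the entire network, and a natural mapping onto sleep and wakefulness. The Helmholtz machine, a layered architecture with separate bottom-up recognition and top-down generative connections, is the best-known architecture trained within the wake-sleep framework.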

The wake-sleep algorithm is often dismissed as a failed precursor to variational autoencoders, a historical curiosity superseded by better mathematics. This misses the point entirely. Wake-sleep is not a flawed approximation to variational inference; it is a different computational philosophy. Variational autoencoders optimize a single bound on the log-likelihood. Wake-sleep alternates between two complementary objectives that need not share a fixed point. The biological brain does not optimize a single loss function. It alternates between modes: wake and sleep, perception and imagination, learning from data and learning from fantasy. Any theory of neural computation that ignores this alternation is not a theory of the brain; it is a theory of gradient descent.
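
One standard way to make this contrast precise uses a generative model p_θ(x, h) and a recognition model q_φ(h | x); this notation is assumed for the sketch. A variational autoencoder ascends the evidence lower bound (ELBO) in both θ and φ. The wake phase follows the same bound's gradient, but with respect to θ only. The sleep phase instead fits φ by minimizing the KL divergence in the opposite direction, averaged over the model's own fantasies rather than over the data.

```latex
% Evidence lower bound (ELBO) for one data point x
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(h \mid x)}\!\left[\log p_\theta(x,h) - \log q_\phi(h \mid x)\right]
  = \log p_\theta(x) - \mathrm{KL}\!\left(q_\phi(h \mid x)\,\middle\|\,p_\theta(h \mid x)\right)

% VAE: one objective, ascended in both \theta and \phi
\max_{\theta,\phi}\; \mathbb{E}_{x \sim \mathrm{data}}\, \mathcal{L}(\theta,\phi;x)

% Wake phase: \nabla_\theta \mathcal{L} = \nabla_\theta \mathbb{E}_{q_\phi}[\log p_\theta(x,h)]
\max_{\theta}\; \mathbb{E}_{x \sim \mathrm{data}}\, \mathbb{E}_{q_\phi(h \mid x)}\!\left[\log p_\theta(x,h)\right]

% Sleep phase: fit \phi on fantasies (h, x) \sim p_\theta
\max_{\phi}\; \mathbb{E}_{p_\theta(x,h)}\!\left[\log q_\phi(h \mid x)\right]
  \;\Longleftrightarrow\;
\min_{\phi}\; \mathbb{E}_{p_\theta(x)}\, \mathrm{KL}\!\left(p_\theta(h \mid x)\,\middle\|\,q_\phi(h \mid x)\right)
```

The sleep-phase KL runs in the opposite direction from the one inside the ELBO. As a result, no single function is ascended by both phases, which is the formal sense in which wake-sleep lacks a global objective.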