Information Bottleneck

The information bottleneck is a principle from information theory that frames learning as an optimal tradeoff between compression and prediction: a good representation T of an input X squeezes out the information about X that is irrelevant to a target Y while preserving as much information about Y as possible. Tishby, Pereira, and Bialek formulated the principle in 1999 as an optimization over stochastic mappings from X to T. Later work by Tishby and colleagues applied it to deep learning, proposing that deep neural networks learn by progressively compressing their input through successive layers while retaining as much predictive information about the output as possible. The tradeoff is controlled by a single parameter β: small values favor compression, and large values favor prediction. Sweeping β traces out the information curve, which gives, for a fixed joint distribution of inputs and targets, the most predictive information any representation of the input can retain at each level of compression.
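
The 1999 formulation makes the tradeoff concrete as the minimization, over stochastic encoders p(t|x), of the Lagrangian L = I(X;T) − β I(T;Y), where T depends on Y only through X (the Markov chain T - X - Y). For discrete variables, one standard way to search for a solution is to iterate the self-consistent equations p(t|x) ∝ p(t) exp(−β D_KL[p(y|x) || p(y|t)]), p(t) = Σ_x p(x) p(t|x), and p(y|t) = Σ_x p(x, y) p(t|x) / p(t), in the style of the Blahut-Arimoto algorithm. The sketch below is a minimal illustration under stated assumptions: a known joint distribution p(x, y) given as a nested list, a fixed number n_t of representation values, and mutual information measured in nats. The names iterative_ib, mutual_information, and kl are hypothetical helpers written for this example, not a library API.

```python
import math
import random


def mutual_information(p_joint):
    """I(A;B) in nats for a joint distribution given as p_joint[a][b]."""
    p_a = [sum(row) for row in p_joint]
    p_b = [sum(col) for col in zip(*p_joint)]
    return sum(
        p * math.log(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(p_joint)
        for j, p in enumerate(row)
        if p > 0
    )


def kl(p, q, eps=1e-12):
    """D_KL(p || q) in nats; eps guards against log(0) for near-empty clusters."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def iterative_ib(p_xy, n_t, beta, n_iter=200, seed=0):
    """Hypothetical sketch of the iterative IB updates for discrete X and Y.

    Returns the encoder p(t|x) together with I(X;T) and I(T;Y).
    """
    rng = random.Random(seed)
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y_given_x = [[p / p_x[x] for p in p_xy[x]] for x in range(n_x)]

    # Random soft initialisation of the encoder p(t|x).
    enc = []
    for _ in range(n_x):
        w = [rng.random() + 1e-3 for _ in range(n_t)]
        s = sum(w)
        enc.append([v / s for v in w])

    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        p_t = [sum(p_x[x] * enc[x][t] for x in range(n_x)) for t in range(n_t)]
        # p(y|t) = sum_x p(x, y) p(t|x) / p(t), using the Markov chain T - X - Y
        p_y_given_t = [
            [sum(p_xy[x][y] * enc[x][t] for x in range(n_x)) / max(p_t[t], 1e-12)
             for y in range(n_y)]
            for t in range(n_t)
        ]
        # p(t|x) proportional to p(t) * exp(-beta * KL[p(y|x) || p(y|t)])
        new_enc = []
        for x in range(n_x):
            logits = [
                math.log(max(p_t[t], 1e-12)) - beta * kl(p_y_given_x[x], p_y_given_t[t])
                for t in range(n_t)
            ]
            m = max(logits)  # subtract the max for numerical stability
            w = [math.exp(z - m) for z in logits]
            s = sum(w)
            new_enc.append([v / s for v in w])
        enc = new_enc

    # Joint distributions p(x, t) and p(t, y) under the final encoder.
    p_xt = [[p_x[x] * enc[x][t] for t in range(n_t)] for x in range(n_x)]
    p_ty = [
        [sum(p_xy[x][y] * enc[x][t] for x in range(n_x)) for y in range(n_y)]
        for t in range(n_t)
    ]
    return enc, mutual_information(p_xt), mutual_information(p_ty)


if __name__ == "__main__":
    # Toy joint p(x, y): x0 and x1 share one conditional p(y|x); x2 and x3 share another.
    p_xy = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
    print("I(X;Y) =", mutual_information(p_xy))
    for beta in (0.0, 1.0, 50.0):
        _, i_xt, i_ty = iterative_ib(p_xy, n_t=2, beta=beta)
        print(f"beta={beta}: I(X;T)={i_xt:.4f}, I(T;Y)={i_ty:.4f}")
```

With β = 0 the update sets every row of p(t|x) to p(t), so T carries no information about X and I(X;T) is zero. With a large β the updates favor prediction: inputs that share the same conditional p(y|x) are assigned the same distribution over T, and I(T;Y) approaches I(X;Y), bounded above by it through the data processing inequality. The fixed seed and random initialisation are illustrative choices; the iteration can reach different local optima from different starting points.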

The information bottleneck has been invoked to explain why neural networks generalize, but as an explanation it is incomplete: observing compression, without a theory of what is being compressed and why, merely describes the training dynamics rather than explaining why networks trained this way succeed.