Universal Prior
The universal prior is a probability distribution over all computable hypotheses that assigns higher probability to theories with shorter descriptions. Introduced by Ray Solomonoff in algorithmic probability, it weights each hypothesis by approximately 2^{-L}, where L is the length in bits of the shortest program that generates the hypothesis on a fixed universal Turing machine. (More precisely, it sums 2^{-|p|} over every program p that outputs the hypothesis, a sum the shortest program dominates.)
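The 2^{-L} weighting can be illustrated with a toy sketch. The real universal prior is uncomputable, so the description lengths below are hand-picked illustrative assumptions, not actual shortest-program lengths:

```python
# Toy 2^{-L} prior over a handful of hypotheses. The description
# lengths (in bits) are chosen by hand for illustration; the real
# prior would use shortest-program lengths on a universal machine.
hypothesis_lengths = {
    "all zeros": 3,
    "alternating 0101...": 6,
    "no discernible pattern": 40,
}

weights = {h: 2.0 ** -length for h, length in hypothesis_lengths.items()}
total = sum(weights.values())
prior = {h: w / total for h, w in weights.items()}  # normalize to sum to 1

for h, p in sorted(prior.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.6f}")
```

Note how a few extra bits of description length translate into an exponentially smaller share of prior mass: the 3-bit hypothesis outweighs the 6-bit one by a factor of 2^{3}.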
The universal prior addresses the problem of prior specification in Bayesian inference: instead of choosing a prior subjectively, the analyst uses a prior derived from the mathematical structure of computation itself. It multiplicatively dominates every computable prior, and Solomonoff's convergence theorem shows that prediction under it converges rapidly to the truth for any computable data-generating process. The optimality is asymptotic rather than absolute, though: the weights depend on the choice of universal Turing machine, which shifts them by a machine-dependent constant factor, and a prior that happens to encode correct assumptions about the data can converge faster on that data.
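A minimal sketch of how such a prior feeds into Bayesian prediction, assuming a two-hypothesis toy model with made-up description lengths (the real mixture ranges over all computable hypotheses):

```python
# Bayesian prediction with a 2^{-L} prior over two toy computable
# hypotheses for a binary sequence. Description lengths are
# illustrative assumptions, not real Kolmogorov complexities.
hypotheses = {
    # name: (description length in bits, P(next bit = 0 | hypothesis))
    "always-zero": (4, 1.0),
    "fair-coin":   (6, 0.5),
}
data = [0, 0, 0, 0, 0, 0]  # observed bits

# Start from the 2^{-L} prior, then multiply in each observation's likelihood.
posterior = {h: 2.0 ** -length for h, (length, _) in hypotheses.items()}
for bit in data:
    for h, (_, p0) in hypotheses.items():
        posterior[h] *= p0 if bit == 0 else 1.0 - p0

Z = sum(posterior.values())
posterior = {h: v / Z for h, v in posterior.items()}

# Predictive probability of the next bit being 0: mixture over hypotheses.
p_next0 = sum(posterior[h] * hypotheses[h][1] for h in hypotheses)
```

After six zeros, the shorter hypothesis that also fits the data perfectly dominates the posterior, so the mixture predicts another zero with high confidence.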
The universal prior is uncomputable. Programs are easy to enumerate, but no algorithm can decide which of them halt or what they output, so the weights can be approximated from below yet never computed exactly. This is not a practical limitation but a fundamental boundary, a direct consequence of the halting problem. Practical machine learning approximates the universal prior through compression, regularization, and cross-entropy minimization, often without recognizing what theoretical limit is being approximated.
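The compression route can be sketched concretely: the size of a string under a general-purpose compressor is a crude, computable upper bound on its description length (the true shortest program may be far shorter). This sketch uses zlib as the stand-in compressor:

```python
import random
import zlib

def approx_description_length_bits(data: bytes) -> int:
    # Compressed size is a computable upper bound on description length;
    # the shortest program generating the data may be much shorter still.
    return 8 * len(zlib.compress(data, 9))

structured = b"01" * 500  # highly regular: compresses very well
random.seed(0)            # seeded so the example is reproducible
noisy = bytes(random.randrange(256) for _ in range(1000))  # incompressible-looking

L_structured = approx_description_length_bits(structured)
L_noisy = approx_description_length_bits(noisy)

# Under the 2^{-L} weighting, the regular string receives vastly more
# approximate prior mass; the exponent below is log2 of the mass ratio.
prior_ratio_exponent = L_noisy - L_structured
```

Swapping in a stronger compressor tightens the bound, which is why better compression is often read as a closer approximation to this limit.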