Partially Observable Markov Decision Process

A partially observable Markov decision process (POMDP) extends the Markov decision process (MDP) to settings where the agent cannot directly observe the true state of the environment. Instead, the agent receives observations generated stochastically from the underlying state and must maintain a belief state, a probability distribution over possible states, to make decisions. A POMDP thereby recasts the problem of acting under uncertainty as a problem of acting in belief space, where each point is a distribution over states rather than a single known state.
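The belief state described above is typically maintained with a Bayes filter: predict where the state may have moved under the chosen action, then reweight by how likely the received observation is in each state. The following is a minimal sketch for a finite POMDP, not a definitive implementation. The function name `belief_update`, the table layouts `T[a][s][s2]` and `O[a][s2][o]`, and the two-state "listen" example with 0.85 sensor accuracy are all assumptions made for illustration.

```python
from typing import List, Sequence


def belief_update(
    belief: Sequence[float],
    T: Sequence[Sequence[Sequence[float]]],
    O: Sequence[Sequence[Sequence[float]]],
    action: int,
    observation: int,
) -> List[float]:
    """Return the posterior belief after taking `action` and seeing `observation`.

    Assumed (hypothetical) table layout:
      T[a][s][s2] = P(next state s2 | state s, action a)
      O[a][s2][o] = P(observation o | next state s2, action a)
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each predicted state by the observation likelihood.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the result is again a probability distribution.
    return [u / total for u in unnormalized]


# Illustrative two-state example: a single "listen" action (index 0) that
# leaves the state unchanged and reports the true state with probability 0.85.
T_example = [[[1.0, 0.0], [0.0, 1.0]]]
O_example = [[[0.85, 0.15], [0.15, 0.85]]]

posterior = belief_update([0.5, 0.5], T_example, O_example, action=0, observation=0)
```

Starting from a uniform belief, a single observation pointing at state 0 shifts most of the probability mass onto state 0; repeating the update with further observations sharpens or softens the belief accordingly.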

The POMDP formalism is well suited to modeling real-world systems: a robot navigating with noisy sensors, a medical diagnostic system inferring disease from symptoms, or a financial trader observing market indicators rather than underlying fundamentals. The added realism comes at a cost: solving POMDPs exactly is computationally intractable in general, because exact solutions require reasoning over the entire belief space. That space is continuous even when the underlying state space is small and discrete, and its dimension grows with the number of states (|S| - 1 for a model with |S| states).
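To make the source of the difficulty concrete, the belief space and the optimality equation over it can be written out as follows. This is a sketch in standard notation that the text above does not itself introduce: the symbols $S$, $A$, $\Omega$, $R$, $\gamma$, and the belief-update map $\tau$ are assumptions chosen for illustration, with $\gamma \in [0, 1)$ a discount factor.

```latex
% Belief space: the probability simplex over a finite state set S.
\Delta(S) = \Bigl\{\, b \in \mathbb{R}^{|S|} \;:\; b(s) \ge 0 \ \text{for all } s \in S,\ \sum_{s \in S} b(s) = 1 \,\Bigr\},
\qquad \dim \Delta(S) = |S| - 1 .

% Bellman optimality equation over beliefs, where \tau(b, a, o) is the
% Bayes-updated belief after taking action a and observing o.
V^{*}(b) = \max_{a \in A} \Bigl[\, \sum_{s \in S} b(s)\, R(s, a)
  \;+\; \gamma \sum_{o \in \Omega} P(o \mid b, a)\, V^{*}\bigl(\tau(b, a, o)\bigr) \Bigr].
```

Even with only two states, $b$ ranges over a continuous interval, so $V^{*}$ is a function on a continuum rather than a table with one entry per state.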

The study of POMDPs connects reinforcement learning to state estimation and Bayesian inference: the agent must simultaneously estimate the state and optimize its policy, and because its actions influence which observations it receives, the two tasks are coupled in a way that makes the joint problem harder than either subproblem alone. POMDPs thus give formal expression to a broader principle: in a system that must act on incomplete information, perception and action cannot be cleanly separated.