Causal Discovery

Causal discovery is the problem of inferring causal structure — the directed graph of causes and effects — from observational data alone. It is the inverse problem to causal inference: where causal inference asks 'given a causal graph, what can we learn about the effects of interventions?', causal discovery asks 'given data, what causal graph could have generated it?'

The challenge is fundamental. Multiple causal graphs can generate the same set of conditional independence constraints — they are Markov equivalent — meaning observational data alone cannot distinguish them. To break equivalence, causal discovery methods rely on additional assumptions: faithfulness (the data respects all and only the independencies implied by the graph), causal sufficiency (no unobserved confounders), and often specific functional forms (e.g., linear relationships with non-Gaussian noise).

Major algorithmic families include constraint-based methods (e.g., the PC algorithm, which starts with a fully connected graph and removes edges that are conditionally independent), score-based methods (which search the space of directed acyclic graphs and score each by a penalized likelihood), and functional causal models (which exploit asymmetries in how causes and effects relate, such as the independence of cause and mechanism).

Causal discovery remains one of the hardest problems in causal reasoning, with active research focused on relaxing assumptions, handling latent variables, and scaling to high-dimensional systems.