Directed Acyclic Graph

A directed acyclic graph (DAG) is a directed graph that contains no directed cycles: there is no way to start at a node, follow a sequence of directed edges, and arrive back at the starting node. This simple constraint has far-reaching consequences. It guarantees that at least one topological ordering exists, that is, a linear ordering of the nodes in which every edge points from an earlier node to a later one. It makes certain computational problems tractable; finding a longest path, for example, is NP-hard in general directed graphs but takes linear time in a DAG. And it provides the mathematical foundation for causal reasoning in Judea Pearl's framework.
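The link between acyclicity and topological ordering can be made concrete with a short sketch. The function below uses Kahn's algorithm, one standard way to compute such an ordering; the function name `topological_order` and the example node labels are illustrative assumptions, not part of any particular library.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return one topological ordering, or None if the graph has a directed cycle."""
    successors = {n: [] for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        in_degree[v] += 1

    # Nodes with no incoming edges can be placed first.
    ready = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        # "Removing" u frees any successor whose other parents are already placed.
        for v in successors[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                ready.append(v)

    # If some nodes were never freed, they lie on or behind a directed cycle.
    return order if len(order) == len(nodes) else None


# A diamond-shaped DAG: A -> B -> D and A -> C -> D.
dag_order = topological_order(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
)

# A three-node cycle has no topological ordering.
cyclic_order = topological_order(
    ["A", "B", "C"],
    [("A", "B"), ("B", "C"), ("C", "A")],
)
```

Here `dag_order` is a valid ordering of the diamond graph, while `cyclic_order` is `None`, which is how this sketch signals that the input is not a DAG.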

In causal modeling, DAGs serve as the syntax for expressing causal assumptions. Nodes represent variables, and directed edges represent direct causal effects. The acyclicity constraint encodes the assumption that no variable is, directly or indirectly, a cause of itself, an assumption usually motivated by the idea that causes precede effects in time. A DAG encodes not merely associations but a claim about the data-generating process: it specifies which variables are causally prior to which others, and which paths carry spurious (non-causal) association that must be blocked by conditioning. The do-calculus, the set of rules for determining when causal effects are identifiable from observational data, is stated in terms of the graphical structure of the DAG: each rule applies when a d-separation condition holds in a modified version of the graph. The standard rules assume acyclicity; in graphs with feedback loops they need not hold in their usual form, and causal identification becomes far more difficult.
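The idea of a path being "blocked by conditioning" is formalized by d-separation. As a minimal sketch, the function below tests d-separation with the moralized ancestral graph criterion, a known equivalent formulation: restrict the DAG to the ancestors of the variables involved, connect parents that share a child, drop edge directions, delete the conditioning set, and check whether the two variables are still connected. The name `d_separated` and the toy graphs are assumptions made for illustration, and the sketch assumes that neither endpoint is itself in the conditioning set.

```python
from itertools import combinations


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by `given` in the DAG defined by `edges`."""
    parents = {}
    for u, v in edges:
        parents.setdefault(u, set())
        parents.setdefault(v, set()).add(u)
    for n in (x, y, *given):
        parents.setdefault(n, set())

    # Step 1: keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])

    # Step 2: moralize. Link each child to its parents and co-parents to each other,
    # treating every link as undirected.
    neighbours = {n: set() for n in keep}
    for child in keep:
        for p in parents[child]:
            neighbours[child].add(p)
            neighbours[p].add(child)
        for p, q in combinations(parents[child], 2):
            neighbours[p].add(q)
            neighbours[q].add(p)

    # Step 3: search from x without passing through conditioned nodes.
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:
        n = stack.pop()
        for m in neighbours[n]:
            if m == y:
                return False  # an unblocked connection exists
            if m not in seen and m not in blocked:
                seen.add(m)
                stack.append(m)
    return True


# Fork (confounding): Z -> X and Z -> Y. Conditioning on Z blocks the path.
fork = [("Z", "X"), ("Z", "Y")]
fork_open = not d_separated(fork, "X", "Y", [])
fork_blocked = d_separated(fork, "X", "Y", ["Z"])

# Collider: X -> C <- Y. The path is blocked until C is conditioned on.
collider = [("X", "C"), ("Y", "C")]
collider_blocked = d_separated(collider, "X", "Y", [])
collider_opened = not d_separated(collider, "X", "Y", ["C"])
```

The two toy graphs show the asymmetry that makes DAGs useful for causal reasoning: conditioning on a common cause removes a spurious association, whereas conditioning on a common effect creates one.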

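DAGs also appear in Bayesian networks, where the acyclicity constraint ensures that the joint probability distribution factors into a product of local conditional probabilities, one for each variable given its parents in the graph. This factorization makes the model compact to specify and lets inference algorithms exploit the graph's structure, although exact inference remains NP-hard in the worst case and is efficient mainly for sparse graphs of low treewidth. The same structure that supports causal inference therefore also supports probabilistic inference, a convergence that suggests the DAG is not merely a convenient representation but a deep feature of how structured systems generate data.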
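The factorization can be illustrated with a small sketch: the probability of a full assignment is the product, over all variables, of each variable's conditional probability given its parents. The three-variable network and every number in the tables below are made up purely for illustration, and the names `parents`, `p_true`, and `joint_probability` are hypothetical.

```python
from itertools import product

# Hypothetical DAG: Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass.
parents = {
    "Rain": (),
    "Sprinkler": ("Rain",),
    "WetGrass": ("Rain", "Sprinkler"),
}

# P(variable = True | parent values). All numbers are illustrative, not measured.
p_true = {
    "Rain": {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.8,
        (False, True): 0.9,
        (False, False): 0.0,
    },
}


def joint_probability(assignment):
    """Multiply one local conditional probability per variable."""
    prob = 1.0
    for var, pars in parents.items():
        p = p_true[var][tuple(assignment[q] for q in pars)]
        prob *= p if assignment[var] else 1.0 - p
    return prob


# Summing the factorized joint over all 2**3 assignments should give 1 (up to rounding).
variables = list(parents)
total = sum(
    joint_probability(dict(zip(variables, values)))
    for values in product([True, False], repeat=len(variables))
)
```

Note that the three local tables need only 1 + 2 + 4 = 7 numbers, the same as a full joint table over three binary variables; the savings appear in larger, sparser graphs, where each variable has few parents.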

The directed acyclic graph is arguably the simplest mathematical object that can distinguish correlation from causation. Its acyclicity is not a limitation but a representational choice: it encodes the temporal asymmetry of causation in graphical form. To abandon DAGs in favor of more general graph structures (cyclic graphs, hypergraphs, undirected graphs) is not necessarily wrong, but it gives up the one representation that makes causal identification both transparent and computationally tractable. The DAG is not the only way to think about causation, but it may be the only way that scales.