Bayesian Network

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph. Each node in the graph corresponds to a variable; each directed edge represents a direct probabilistic influence; and each node is associated with a conditional probability table that quantifies the probability of the node's states given the states of its parents. The structure encodes the factorization of the joint probability distribution: the probability of the entire system is the product of the conditional probabilities of each node given its parents.
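Written out, for variables X_1, ..., X_n, with Parents(X_i) denoting the parents of X_i in the graph:

    P(X_1, ..., X_n) = ∏_i P(X_i | Parents(X_i))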

This factorization is powerful. A system with n binary variables has a joint distribution requiring 2^n - 1 independent probabilities to specify — intractable for all but the smallest systems. A Bayesian network with bounded in-degree reduces this to O(n * 2^k) parameters, where k is the maximum number of parents per node. The graph structure does not merely visualize relationships; it makes inference computationally feasible.
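For concreteness: with n = 20 binary variables, the full joint distribution requires 2^20 - 1 = 1,048,575 probabilities, while a network in which no node has more than k = 3 parents needs at most 20 * 2^3 = 160 conditional probabilities.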

Inference and Learning

Inference in a Bayesian network means computing the posterior probability of some variables given evidence about others. If we observe symptoms, what is the probability of each disease? If we measure a node's value, how does that update our beliefs about its unobserved causes? These queries are answered by propagating evidence through the graph.
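As a minimal sketch of such a query, consider a hypothetical two-node network Disease → Symptom with made-up probabilities; the posterior follows directly from the factorization and Bayes' rule:

    # Hypothetical Disease -> Symptom network; all numbers are illustrative.
    p_disease = 0.01                  # prior P(D=1)
    p_symptom = {1: 0.90, 0: 0.05}    # P(S=1 | D)

    # Joint factorizes as P(D, S) = P(D) * P(S | D).
    # Diagnostic query by Bayes' rule: P(D=1 | S=1).
    numerator = p_disease * p_symptom[1]
    evidence = numerator + (1 - p_disease) * p_symptom[0]
    print(f"P(disease | symptom) = {numerator / evidence:.3f}")  # ~0.154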

For trees and polytrees, exact inference can be performed efficiently using belief propagation and related message-passing algorithms. For general graphs with cycles, exact inference is NP-hard in the worst case, and practitioners resort to approximate methods — variational inference, loopy belief propagation, or Markov chain Monte Carlo sampling. The computational cost of exact inference is governed by the graph's treewidth: one less than the size of the largest clique formed when the graph is moralized and optimally triangulated. High-treewidth networks are structurally intractable regardless of the algorithm.
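One of the simplest approximate methods is likelihood-weighted sampling: sample the unobserved variables in topological order from their conditional distributions, and weight each sample by the probability it assigns to the evidence. A sketch on a hypothetical sprinkler-style network with illustrative probabilities:

    import random

    # Hypothetical network: Cloudy -> Rain, Cloudy -> Sprinkler,
    # (Sprinkler, Rain) -> WetGrass. All CPT entries are made up.
    P_CLOUDY = 0.5
    P_RAIN = {1: 0.8, 0: 0.2}               # P(Rain=1 | Cloudy)
    P_SPRINKLER = {1: 0.1, 0: 0.5}          # P(Sprinkler=1 | Cloudy)
    P_WET = {(1, 1): 0.99, (1, 0): 0.90,    # P(Wet=1 | Sprinkler, Rain)
             (0, 1): 0.90, (0, 0): 0.0}

    def rain_given_wet(n_samples=100_000, seed=0):
        rng = random.Random(seed)
        num = den = 0.0
        for _ in range(n_samples):
            cloudy = rng.random() < P_CLOUDY
            rain = rng.random() < P_RAIN[cloudy]
            sprinkler = rng.random() < P_SPRINKLER[cloudy]
            # Evidence Wet=1 is not sampled; each sample is weighted
            # by the probability it would have produced the evidence.
            weight = P_WET[(sprinkler, rain)]
            num += weight * rain
            den += weight
        return num / den

    print(f"P(Rain | Wet) ~= {rain_given_wet():.3f}")  # exact value ~0.708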

Learning a Bayesian network from data involves two distinct problems: learning the graph structure and learning the conditional probability parameters. Parameter learning is straightforward given a fixed structure: maximum likelihood or Bayesian estimation of the conditional probability tables. Structure learning — determining which edges exist — is the harder problem. It requires searching the superexponential space of directed acyclic graphs and evaluating candidate structures against data. The causal discovery problem adds a further layer: not merely finding edges that model correlation well, but finding edges that represent genuine causal direction.
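Parameter learning under maximum likelihood reduces to counting: the estimate of P(X = x | parents = u) is the empirical frequency of x among the records where the parents take value u. A sketch with a made-up two-variable dataset:

    from collections import Counter

    # Made-up records for a two-node network Rain -> Wet.
    records = [(1, 1), (1, 1), (1, 0), (0, 0),
               (0, 0), (0, 1), (1, 1), (0, 0)]   # (rain, wet) pairs

    counts = Counter(records)
    for rain in (0, 1):
        total = counts[(rain, 0)] + counts[(rain, 1)]
        print(f"P(wet=1 | rain={rain}) = {counts[(rain, 1)] / total:.2f}")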

From Probability to Causality

A Bayesian network is a model of probability, not causality. Its directed edges need not represent causal influence; they may merely encode conditional independence structure in a convenient factorization. But when the edges do represent causal mechanisms, the Bayesian network becomes a causal graph — the foundation of the do-calculus and modern causal inference.

The distinction is subtle but consequential. In a purely probabilistic Bayesian network, observing that a variable takes a value updates beliefs about its parents (diagnostic reasoning) and its children (predictive reasoning) symmetrically. In a causal Bayesian network, intervening to set a variable — the do-operator — breaks the incoming edges and propagates the effect forward only. The asymmetry between observation and intervention is what separates correlation from causation, and Bayesian networks equipped with causal semantics make this asymmetry precise.
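The asymmetry can be made concrete on a hypothetical confounded network C → X, C → Y, X → Y with illustrative probabilities: conditioning on X = 1 updates the confounder by Bayes' rule, while do(X = 1) severs the edge into X and leaves the confounder at its prior:

    # Hypothetical confounded network: C -> X, C -> Y, X -> Y.
    p_c = 0.3                               # P(C=1)
    p_x = {1: 0.8, 0: 0.2}                  # P(X=1 | C)
    p_y = {(1, 1): 0.8, (1, 0): 0.4,        # P(Y=1 | X, C)
           (0, 1): 0.5, (0, 0): 0.1}

    # Observing X=1: update C by Bayes' rule, average Y over the posterior.
    post_c = p_c * p_x[1] / (p_c * p_x[1] + (1 - p_c) * p_x[0])
    observe = post_c * p_y[(1, 1)] + (1 - post_c) * p_y[(1, 0)]

    # Intervening do(X=1): the edge C -> X is cut, so C keeps its prior.
    intervene = p_c * p_y[(1, 1)] + (1 - p_c) * p_y[(1, 0)]

    print(f"P(Y=1 | X=1)     = {observe:.3f}")    # ~0.653
    print(f"P(Y=1 | do(X=1)) = {intervene:.3f}")  # 0.520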

The practical difficulty: causal semantics cannot be read off from observational data alone. Two Bayesian networks with different edge directions can encode the same set of conditional independencies — they are Markov equivalent — and no amount of observational data can distinguish them. Causal direction requires either experimental intervention, background causal knowledge, or strong assumptions about the functional form of relationships.
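A minimal demonstration, with made-up numbers: parameterize X → Y directly, derive the reverse parameterization for Y → X from the same joint, and observe that the two networks assign identical probabilities to every outcome:

    # X -> Y, parameterized directly. Numbers are illustrative.
    p_x = 0.6
    p_y_x = {1: 0.7, 0: 0.2}                # P(Y=1 | X)
    joint_xy = {(x, y): (p_x if x else 1 - p_x)
                        * (p_y_x[x] if y else 1 - p_y_x[x])
                for x in (0, 1) for y in (0, 1)}

    # Y -> X, with P(Y) and P(X | Y) derived from the same joint.
    p_y = joint_xy[(1, 1)] + joint_xy[(0, 1)]
    p_x_y = {y: joint_xy[(1, y)] / (joint_xy[(1, y)] + joint_xy[(0, y)])
             for y in (0, 1)}
    joint_yx = {(x, y): (p_y if y else 1 - p_y)
                        * (p_x_y[y] if x else 1 - p_x_y[y])
                for x in (0, 1) for y in (0, 1)}

    # The joints agree entry for entry: observationally indistinguishable.
    assert all(abs(joint_xy[k] - joint_yx[k]) < 1e-12 for k in joint_xy)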

Bayesian Networks in Complex Systems

In complex systems, Bayesian networks face a fundamental tension. The networks assume a fixed structure and fixed parameters, but complex systems are characterized by feedback, adaptation, and structural change. A Bayesian network of a financial market assumes a stable causal graph; the reality is that the graph restructures in response to the inferences agents draw from it. A Bayesian network of an ecosystem assumes fixed conditional probabilities; the reality is that species interactions evolve.

This does not make Bayesian networks useless in complex systems. It makes them local approximations — valid for short timescales and bounded subsystems, but not for the long-run dynamics of the whole. The Markov blanket of a node — the minimal set of nodes that renders it conditionally independent of the rest of the network — is a powerful tool for carving local models out of global complexity. But the blanket itself moves as the system restructures.
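Computing the blanket from a snapshot of the graph is mechanical; a sketch, with the DAG as a made-up dictionary mapping each node to its set of parents:

    # Markov blanket of x: its parents, its children, and its children's
    # other parents (co-parents). The DAG below is an illustrative example.
    def markov_blanket(parents, x):
        children = {n for n, ps in parents.items() if x in ps}
        co_parents = {p for c in children for p in parents[c]}
        return (parents[x] | children | co_parents) - {x}

    dag = {"A": set(), "B": set(), "C": {"A", "B"},
           "D": {"C"}, "E": {"C", "F"}, "F": set()}
    print(markov_blanket(dag, "C"))  # {'A', 'B', 'D', 'E', 'F'}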

The Bayesian network is the best formalism we have for representing uncertainty in structured systems, and its limitations are the limitations of probabilistic epistemology itself. It can model what we believe given what we observe. It cannot model how the world restructures itself in response to our beliefs — and that is the boundary where probabilistic reasoning ends and the dynamics of complex systems begin.