# Graphical Models

Note first that this topic has nothing to do with computer graphics or graffiti. The order of the two words also matters.

A graphical model is a probabilistic model in which a graph expresses the conditional independence structure among random variables. Graphical models are commonly used in probability theory, statistics, coding theory, and machine learning.

These models offer several useful properties [1]:

1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.
2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.

A graph comprises nodes (also called vertices) connected by links (also known as edges or arcs). In a probabilistic graphical model, each node represents a random variable (or a group of random variables), and the links express probabilistic relationships between these variables. The graph then captures the way in which the joint distribution over all of the random variables can be decomposed into a product of factors, each depending only on a subset of the variables.

We begin with Bayesian networks, also known as directed graphical models, in which the links have a particular directionality indicated by arrows. The other major class of graphical models consists of Markov random fields, also known as undirected graphical models, in which the links carry no arrows and have no directional significance. Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are better suited to expressing soft constraints between them. For solving inference problems, it is often convenient to convert both directed and undirected graphs into a different representation called a factor graph.

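To see how a directed graph describes a probability distribution, consider an arbitrary joint distribution $\Pr(a,b,c)$ over three variables $a$, $b$, and $c$. We need not specify anything further about these variables, such as whether they are discrete or continuous: one of the powerful aspects of graphical models is that a specific graph can make probabilistic statements for a broad class of distributions. By application of the product rule of probability, we can write the joint distribution in the form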

$$\Pr(a,b,c)=\Pr(a)\Pr(b|a)\Pr(c|a,b).$$

We now represent the right-hand side in terms of a simple graphical model as follows. First, we introduce a node for each of the random variables $a$, $b$, and $c$ and associate each node with the corresponding conditional distribution on the right-hand side. Then, for each conditional distribution, we add directed links (arrows) to the graph from the nodes corresponding to the variables on which that distribution is conditioned. Thus the factor $\Pr(c|a,b)$ contributes links from $a$ and $b$ to $c$, the factor $\Pr(b|a)$ contributes a link from $a$ to $b$, and the factor $\Pr(a)$ contributes no incoming links.

Figure: A directed graphical model representing the joint probability distribution over three variables $a$, $b$, and $c$.
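To make the construction concrete, here is a minimal Python sketch, assuming binary variables and a randomly generated joint table. The names `joint`, `marginal`, `chain_rule`, and `factors` are illustrative only. The sketch checks numerically that the product of conditionals recovers the joint, then reads off one directed link per conditioning variable.

```python
import itertools
import random

# An arbitrary joint distribution over three binary variables (a, b, c),
# stored as a dict from assignments to probabilities. The values are random.
random.seed(0)
weights = {abc: random.random() for abc in itertools.product([0, 1], repeat=3)}
total = sum(weights.values())
joint = {abc: w / total for abc, w in weights.items()}

def marginal(prefix):
    """Sum the joint over all assignments whose leading values equal `prefix`."""
    k = len(prefix)
    return sum(p for abc, p in joint.items() if abc[:k] == prefix)

def chain_rule(a, b, c):
    """Pr(a) * Pr(b|a) * Pr(c|a,b), each factor computed from the joint."""
    p_a = marginal((a,))
    p_b_given_a = marginal((a, b)) / p_a
    p_c_given_ab = marginal((a, b, c)) / marginal((a, b))
    return p_a * p_b_given_a * p_c_given_ab

# Each factor conditions a node on other nodes; every conditioning
# variable contributes one directed link (parent, child) into that node.
factors = {"a": [], "b": ["a"], "c": ["a", "b"]}
edges = [(parent, child) for child, parents in factors.items() for parent in parents]

for abc in joint:
    assert abs(chain_rule(*abc) - joint[abc]) < 1e-12
print(edges)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

The check passes for any strictly positive joint table, which reflects the point of this factorization: it places no restriction on the distribution.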

The same reasoning applies to a joint distribution $\Pr(x_1, x_2, \dots, x_n)$ over $n$ variables. By repeated application of the product rule of probability, it can be written as a product of conditional distributions, one for each variable:

$$\Pr(x_1, x_2, \dots, x_n) = \Pr(x_1) \Pr(x_2|x_1) \dots \Pr(x_n|x_1,x_2,\dots,x_{n-1}).$$

We can again represent this as a directed graph with $n$ nodes, one for each conditional distribution on the right-hand side, where each node has incoming links from all lower-numbered nodes. We say that this graph is fully connected because there is a link between every pair of nodes. If there is a link going from a node $a$ to a node $b$, then we say that $a$ is a parent of $b$ and that $b$ is a child of $a$.
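The fully connected construction can be sketched in a few lines of Python, assuming nodes are labelled $1, \dots, n$. The helper names `fully_connected_parents` and `edges_from_parents` are hypothetical. Each node's parents are all lower-numbered nodes, so the graph has $n(n-1)/2$ links.

```python
def fully_connected_parents(n):
    """Parent sets for the chain-rule factorization of Pr(x_1, ..., x_n).

    Node k is conditioned on x_1, ..., x_{k-1}, so its parents are all
    lower-numbered nodes.
    """
    return {k: list(range(1, k)) for k in range(1, n + 1)}

def edges_from_parents(parents):
    """Directed links (parent, child), one per conditioning variable."""
    return [(p, child) for child, ps in parents.items() for p in ps]

n = 4
edges = edges_from_parents(fully_connected_parents(n))
# Every unordered pair of nodes is joined by exactly one link: n(n-1)/2 in total.
assert len(edges) == n * (n - 1) // 2
print(edges)  # [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
```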

So far, we have worked with completely general joint distributions, so that the decompositions, and their representations as fully connected graphs, will be applicable to any choice of distribution. As we will see shortly, it is the absence of links in the graph that conveys interesting information about the properties of the class of distributions that the graph represents.
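One concrete way to see what missing links buy is to count free parameters. The following is a minimal sketch, assuming every variable is binary, so a node with $m$ parents needs one number $\Pr(x_k = 1 \mid \text{parents})$ for each of the $2^m$ parent settings. The function `num_free_parameters` is a hypothetical helper. A fully connected graph over $n$ binary variables needs $2^n - 1$ numbers (an unrestricted joint table), while a graph with no links needs only $n$.

```python
def num_free_parameters(parents):
    """Free parameters of a DAG over binary variables (illustrative helper).

    Each node needs one number, Pr(x_k = 1 | parents), for every joint
    setting of its parents, i.e. 2 ** len(parents) numbers.
    """
    return sum(2 ** len(ps) for ps in parents.values())

n = 4
fully_connected = {k: list(range(1, k)) for k in range(1, n + 1)}
no_links = {k: [] for k in range(1, n + 1)}

print(num_free_parameters(fully_connected))  # 15, i.e. 2**4 - 1
print(num_free_parameters(no_links))         # 4
```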

We can now state in general terms the relationship between a given directed graph and the corresponding distribution over the variables. The joint distribution defined by a graph is given by the product, over all of the nodes of the graph, of a conditional distribution for each node conditioned on the variables corresponding to the parents of that node in the graph. Thus, for a graph with $n$ nodes, the joint distribution is given by

$$\Pr(X) = \prod_{k=1}^n \Pr(x_k|\pi_{x_k}),$$

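where $\pi_{x_k}$ denotes the set of parents of $x_k$ and $X=(x_1, x_2, \dots, x_n)$. This factorization assumes the graph is a directed acyclic graph (DAG), that is, one with no directed cycles. A simple example with four variables $x_1,x_2,x_3,x_4$ is shown below.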

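Figure: Example of a directed acyclic graph describing the joint distribution over variables $x_1,x_2,x_3,x_4$.

Here $x_4$ has parents $x_1$, $x_2$, and $x_3$, while the remaining nodes have no parents, so the joint distribution is given by

$$\Pr(x_1,x_2,x_3,x_4)=\Pr(x_1)\Pr(x_2)\Pr(x_3)\Pr(x_4|x_1,x_2,x_3).$$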
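The general factorization $\Pr(X) = \prod_k \Pr(x_k|\pi_{x_k})$ can be evaluated directly from parent sets and conditional tables. The sketch below assumes binary variables and uses made-up probabilities for the four-variable example; `parents`, `cpt`, and `joint_probability` are hypothetical names. Summing over all $2^4$ assignments confirms that the product of normalized conditionals is itself a normalized distribution.

```python
import itertools

# Parent sets for the example graph: x4 depends on x1, x2, x3; the rest have no parents.
parents = {"x1": [], "x2": [], "x3": [], "x4": ["x1", "x2", "x3"]}

# Made-up conditional probability tables for binary variables:
# cpt[node][parent_values] = Pr(node = 1 | parents = parent_values).
cpt = {
    "x1": {(): 0.6},
    "x2": {(): 0.3},
    "x3": {(): 0.5},
    "x4": {pa: 0.1 + 0.25 * sum(pa) for pa in itertools.product([0, 1], repeat=3)},
}

def joint_probability(assignment, parents, cpt):
    """Pr(X) = prod_k Pr(x_k | parents of x_k) for a full assignment dict."""
    prob = 1.0
    for node, node_parents in parents.items():
        pa_values = tuple(assignment[p] for p in node_parents)
        p_one = cpt[node][pa_values]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

nodes = list(parents)
total = sum(
    joint_probability(dict(zip(nodes, values)), parents, cpt)
    for values in itertools.product([0, 1], repeat=len(nodes))
)
print(round(total, 10))  # 1.0
print(joint_probability({"x1": 1, "x2": 0, "x3": 1, "x4": 1}, parents, cpt))
```

For the assignment shown, the product is $0.6 \times 0.7 \times 0.5 \times 0.6 = 0.126$, up to floating-point rounding.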

## References

1. Bishop, Christopher M., "Pattern Recognition and Machine Learning," Springer, 2006.