Mutual information

Mutual information is a measure of the mutual dependence between two random quantities: it quantifies how much information observing one provides about the other. In information theory, it is most often applied to pairs of discrete or continuous random variables, but it can be generalized to pairs of vectors, sequences, and even processes. For two discrete random variables, the mutual information is defined as:


 * $$ I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left( \frac{p(x,y)}{p_1(x)\,p_2(y)} \right), $$

where $p(x,y)$ is the joint probability distribution of $X$ and $Y$, and $p_1(x)$ and $p_2(y)$ are the marginal probability distribution functions of $X$ and $Y$, respectively.
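
As a concrete numerical illustration, the following sketch evaluates this sum for a small joint table. The table values, variable names, and the use of natural logarithms are choices made for this example, not part of the definition:

```python
import numpy as np

# Joint distribution p(x, y) of two binary random variables,
# as a table with rows indexed by x and columns by y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p1 = p_xy.sum(axis=1)  # marginal p_1(x)
p2 = p_xy.sum(axis=0)  # marginal p_2(y)

# I(X;Y) = sum over x and y of p(x,y) * log(p(x,y) / (p_1(x) p_2(y))),
# skipping zero-probability cells, which contribute nothing to the sum.
mi = 0.0
for x in range(p_xy.shape[0]):
    for y in range(p_xy.shape[1]):
        if p_xy[x, y] > 0:
            mi += p_xy[x, y] * np.log(p_xy[x, y] / (p1[x] * p2[y]))

print(mi)  # about 0.193 nats for this table
```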

In the case of a continuous 2-dimensional random vector with joint density $p$ and marginal densities $p_1$ and $p_2$, the summation should be replaced by a double integral:


 * $$ I(X;Y) = \int_Y \int_X p(x,y) \log\left( \frac{p(x,y)}{p_1(x)\,p_2(y)} \right) \, dx \, dy. $$
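
A standard continuous example: if $(X,Y)$ is bivariate normal with correlation coefficient $\rho$, this double integral evaluates in closed form to

 * $$ I(X;Y) = -\tfrac{1}{2}\log\left(1-\rho^2\right), $$

which vanishes when $\rho = 0$ (independence) and grows without bound as $|\rho| \to 1$.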

The most general definition is given by


 * $$ I(X;Y) = D(P \Vert P_1 \times P_2), $$

where $P$ denotes the joint distribution of $(X,Y)$, $P_1$ and $P_2$ denote the marginal distributions of $X$ and $Y$, and $D$ denotes the Kullback-Leibler divergence (relative entropy).
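
In the discrete case, $D(Q \Vert R) = \sum_{\omega} q(\omega) \log\bigl( q(\omega)/r(\omega) \bigr)$; substituting $Q = P$ and $R = P_1 \times P_2$ recovers the double sum given above, so the three definitions are consistent.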

Conditional mutual information is defined as


 * $$ I(X;Y\mid Z) = E\left[ D\bigl( P(\cdot \mid Z) \Vert P_1(\cdot \mid Z) \times P_2(\cdot \mid Z) \bigr) \right], $$

where $E$ denotes expectation with respect to the distribution of $Z$.
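
For discrete variables this expectation is a weighted average over the values of $Z$. A minimal sketch, assuming a small arbitrary joint table $p(x,y,z)$ (the table and the helper function are illustrative, not canonical):

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information in nats of a 2-D joint probability table."""
    p1 = p_xy.sum(axis=1, keepdims=True)  # marginal of the row variable
    p2 = p_xy.sum(axis=0, keepdims=True)  # marginal of the column variable
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p1 * p2)[mask])).sum())

# Joint table p(x, y, z), indexed as p_xyz[x, y, z]; an arbitrary example.
p_xyz = np.array([[[0.2, 0.05], [0.05, 0.2]],
                  [[0.05, 0.2], [0.2, 0.05]]])

p_z = p_xyz.sum(axis=(0, 1))  # marginal distribution of Z

# I(X;Y|Z) = E_Z[I(X;Y | Z=z)]: average the mutual information of the
# conditional tables p(x, y | z), weighted by p(z).
cmi = sum(p_z[z] * mutual_information(p_xyz[:, :, z] / p_z[z])
          for z in range(p_xyz.shape[2]))

print(cmi)  # about 0.193 nats for this table
```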

Properties
Positivity: $I(X;Y\mid Z) \geq 0$, with equality if and only if $X$ and $Y$ are conditionally independent given $Z$.

Symmetry: $I(X;Y\mid Z) = I(Y;X\mid Z)$.

Chain rule: $I(X;(Y,Z)\mid W) = I(X;Y\mid W) + I(X;Z\mid (Y,W))$.

Entropy: $I(X;X) = H(X)$.

Conditional entropy: $I(X;X\mid Y) = H(X\mid Y)$.
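
The entropy identity can be checked directly from the discrete definition: for the pair $(X,X)$ the joint distribution puts mass $p(x)$ on the diagonal and zero elsewhere, so

 * $$ I(X;X) = \sum_{x \in X} p(x) \log\left( \frac{p(x)}{p(x)\,p(x)} \right) = -\sum_{x \in X} p(x) \log p(x) = H(X). $$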