Source coding theorem

Consider a source taking values in a finite or countably infinite alphabet $A$. A lossless data compression code is a pair of mappings:


 * Compressor: $\mathsf f: A \to \{0, 1\}^\star$
 * Decompressor: $\mathsf c: \{0, 1\}^\star \to A$

where $\{0, 1\}^\star = \{\emptyset, 0, 1, 00, 01, 10, 11, 000, 001, \ldots\}$ is the set of all finite binary strings, including the empty string $\emptyset$. Losslessness means $\mathsf c(\mathsf f(x)) = x$ for every $x \in A$. Let $\ell: \{0, 1\}^\star \to \{0, 1, 2, \ldots\}$ denote string length, so that $\ell(\mathsf f(x))$ is the length of the codeword that $\mathsf f$ assigns to $x$.

The optimum lossless compressor $\mathsf f^\star$ labels the elements of $A$ in decreasing probability, $P_X(1) \geq P_X(2) \geq \ldots$, and assigns codewords in order of increasing length: \begin{align} \mathsf f^\star(1) &= \emptyset\\ \mathsf f^\star(2) &= 0\\ \mathsf f^\star(3) &= 1\\ \mathsf f^\star(4) &= 00\\ &\ldots \end{align} In particular, the $k$-th most likely element receives a codeword of length $\lfloor \log_2 k \rfloor$.
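
A minimal Python sketch of this assignment (the helper name `optimal_code` and the dict-based pmf representation are my own, and the alphabet is assumed finite so its symbols can be sorted):

```python
from itertools import count, product

def optimal_code(pmf):
    """Assign codewords from {0,1}* (empty string, 0, 1, 00, 01, ...)
    to the symbols of `pmf` (a dict symbol -> probability), in
    decreasing order of probability."""
    def binary_strings():
        # Enumerate {0,1}* in order of increasing length.
        yield ""
        for n in count(1):
            for bits in product("01", repeat=n):
                yield "".join(bits)

    ranked = sorted(pmf, key=pmf.get, reverse=True)
    # zip stops at the (finite) symbol list, so the infinite
    # generator of binary strings is consumed only as far as needed.
    return dict(zip(ranked, binary_strings()))

print(optimal_code({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))
# {'a': '', 'b': '0', 'c': '1', 'd': '00'}
```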

Stationary memoryless sources
Let $X^n$ consist of $n$ independent copies of a random variable $\mathsf X \in \mathcal A$ with distribution $P_{\mathsf X}$. The source alphabet is then $A = \mathcal A^n$, and $X^n = (X_1, \ldots, X_n)$ with $$ P_{X^n} = \underbrace{P_{\mathsf X} \times \ldots \times P_{\mathsf X}}_{n \text{ times}} $$
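
A sketch of this product construction in the same style as above (the helper name `product_pmf` is mine; enumerating $\mathcal A^n$ explicitly is only feasible for small $|\mathcal A|^n$):

```python
import math
from itertools import product

def product_pmf(pmf, n):
    """P_{X^n} for a stationary memoryless source: the n-fold product
    of `pmf`, as a dict mapping n-tuples to probabilities."""
    return {
        xs: math.prod(pmf[x] for x in xs)
        for xs in product(pmf, repeat=n)
    }

p3 = product_pmf({0: 0.9, 1: 0.1}, n=3)
print(p3[(0, 0, 0)])  # 0.9**3 = 0.729
print(p3[(0, 1, 0)])  # 0.9**2 * 0.1
```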

Lossless source coding theorem
For a stationary memoryless source with finite entropy, the expected rate of the optimum lossless code converges to the entropy of the source (in bits, since the code is binary): $$ \lim_{n \to \infty} \frac 1 n \mathbb E \left[ \ell (\mathsf f^\star(X_1, \ldots, X_n))\right] = H(\mathsf X) $$ Moreover, the optimum rate converges to the source entropy in probability: $$ \frac 1 n \ell (\mathsf f^\star(X_1, \ldots, X_n)) \to H(\mathsf X) \quad \text{in probability} $$
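
The first limit can be checked numerically for a Bernoulli source, using the fact noted above that the $k$-th most likely block receives a codeword of length $\lfloor \log_2 k \rfloor$. A sketch (the helper names `entropy` and `optimal_rate`, and the choice $p = 0.11$, are mine):

```python
import math

def entropy(p):
    """Binary entropy H(X) of a Bernoulli(p) source, in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def optimal_rate(p, n):
    """Exact expected rate (1/n) E[ell(f*(X^n))] for Bernoulli(p):
    sort all 2**n block probabilities in decreasing order; the k-th
    most likely block gets length floor(log2 k) = k.bit_length() - 1."""
    ones = lambda x: bin(x).count("1")
    probs = sorted(
        (p ** ones(x) * (1 - p) ** (n - ones(x)) for x in range(2 ** n)),
        reverse=True,
    )
    return sum(q * (k.bit_length() - 1) for k, q in enumerate(probs, 1)) / n

p = 0.11
print(f"H(X) = {entropy(p):.4f} bits")
for n in (1, 4, 8, 12, 16):
    print(f"n = {n:2d}: expected rate = {optimal_rate(p, n):.4f}")
```

Since $\mathbb E[\ell(\mathsf f^\star(X))] \leq H(X)$ for the optimum one-shot compressor, the computed rates approach the entropy from below as $n$ grows.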