Source coding with side information

$\newcommand{\sets}[1]{\{#1\}}$

Introduction
Source coding with side information concerns the communication of information from a sender to a receiver who already has related data. Here are three scenarios where it arises:
 * A video camera transmits an image to a base that has the previous frame.
 * A user downloads the new version of an existing file.
 * A distributed database wishes to synchronize similar files stored in different locations.

The formal definition is equally simple. A random-variable pair $(X,Y)$ is distributed according to a known probability distribution $p(x,y)$. A sender knows $X$ and wants to communicate its value to a receiver who knows $Y$. The typical questions asked are how many bits must be transmitted, and how can this communication be performed efficiently.

Many variations of the problem have been considered, depending on whether the receiver needs to determine $X$ exactly, with high probability, or just approximately.

Diminishing-error
This Shannon-inspired variety is by far the most extensively studied version of the problem. The assumption here is that $(X,Y)$ is a sequence of i.i.d. random-variable pairs, and that the receiver needs to determine the sequence $X$ with arbitrarily low error probability. The well-known Slepian-Wolf theorem shows that a rate of $H(X|Y)$ bits per symbol suffices, essentially the same as when the sender knows the receiver's information in advance.
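
The achievability side of the theorem uses random binning, which the following minimal Python sketch illustrates; the parameters `n`, `t`, and `n_bins` are toy choices of mine, and a Hamming-ball check stands in for joint typicality.

```python
import itertools, random

# Toy random-binning sketch of Slepian-Wolf coding. The encoder sends only a
# random bin index of x; the decoder searches its bin for the unique sequence
# close to y (Hamming distance <= t, a stand-in for joint typicality).

n, t, n_bins = 12, 1, 2 ** 6      # block length, correlation radius, 2^{nR} bins
random.seed(0)
bin_of = {x: random.randrange(n_bins)       # randomly bin all 2^n sequences
          for x in itertools.product((0, 1), repeat=n)}

def encode(x):
    return bin_of[x]                        # ~log2(n_bins) bits, far below n

def decode(b, y):
    cands = [x for x, xb in bin_of.items()
             if xb == b and sum(a != c for a, c in zip(x, y)) <= t]
    return cands[0] if len(cands) == 1 else None   # fail unless unique

x = tuple(random.randrange(2) for _ in range(n))
y = x[:3] + (1 - x[3],) + x[4:]             # y differs from x in one position
print(decode(encode(x), y) == x)            # usually True; fails only when
                                            # another nearby sequence shares x's bin
```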

Exact calculation
Here, the receiver would like to determine $X$ without any error.

Example (temperatures): Informal description: The temperatures in New York and New Jersey may vary significantly, but at any given time they differ from each other by at most $d$ degrees. How many bits does a sender who knows the temperature in New York need to transmit to communicate it to a receiver who knows the temperature in New Jersey?

Formal description: $\cal X=\cal Y=\mathbb{Z}$, and for $d\ge 0$, \[ S=\{(x,y):|x-y|\le d\}. \] If the sender knew the receiver's information in advance, he or she could simply communicate the difference $X-Y$. Since $-d\le X-Y\le d$, \[ \hat L'=\lceil\log(2d+1)\rceil. \] This number of bits suffices even when the sender does not know the receiver's information: the sender transmits $X\bmod(2d+1)$, and since the $2d+1$ values $Y-d,\ldots,Y+d$ all have distinct residues modulo $2d+1$, the receiver can determine $X$. Hence, \[ \hat L = \lceil\log(2d+1)\rceil = \hat L'. \]
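
A minimal Python sketch of this protocol, with $d=3$ as an arbitrary illustration:

```python
# Temperature protocol: send X mod (2d+1); the receiver recovers X as the
# unique candidate in [Y-d, Y+d] with that residue.

d = 3                                      # both parties know |X - Y| <= d

def encode(x):
    return x % (2 * d + 1)                 # ceil(log(2d+1)) bits

def decode(m, y):
    for x in range(y - d, y + d + 1):      # these 2d+1 values have distinct residues
        if x % (2 * d + 1) == m:
            return x

assert decode(encode(71), 69) == 71        # New York at 71, New Jersey at 69
```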

Example (cyclic shifts, due to Tom Cover): $X,Y\in\{0,1\}^n$ are cyclic shifts of each other.

If the sender knew $Y$ in advance, he or she could simply transmit the number of times $Y$ needs to be cyclically shifted to the right to derive $X$. For example, if $n=6$, $X=101100$ and $Y=110010$, the sender would transmit 2. Hence, \[ \hat L' = \lceil\log n\rceil. \]

The same number of bits suffices also when the sender does not know $Y$. The sender can consider all cyclic shifts of $X$, find the lexicographically largest, and transmit the number of times it needs to be right-shifted to obtain $X$. Since the largest cyclic shift of $X$ is also the largest cyclic shift of $Y$, the receiver can follow the same operations to determine $X$. For example, if $X=01011$ and $Y=01101$, the sender finds that the largest cyclic shift of $X$ is 11010 and sends 3, the number of right-shifts needed to derive $X$ from it. The receiver cyclically shifts $Y$ to derive 11010, right-shifts it 3 times, and thereby deduces $X$.
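
The protocol translates directly into code; here is a short Python sketch.

```python
# Cyclic-shift protocol: both parties canonicalize to the lexicographically
# largest rotation; the sender transmits how many right-shifts recover X.

def rotations(s):
    return [s[-k:] + s[:-k] for k in range(len(s))]   # k right-shifts of s

def encode(x):
    canonical = max(rotations(x))                     # largest cyclic shift of x
    return rotations(canonical).index(x)              # right-shifts back to x

def decode(m, y):
    canonical = max(rotations(y))                     # same canonical form as x's
    return rotations(canonical)[m]

x, y = "01011", "01101"                               # cyclic shifts of each other
assert encode(x) == 3 and decode(3, y) == x           # matches the example above
```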

This example can be easily generalized.

Lemma: For any equivalence relation $\sim$ on $\cal X=\cal Y$, let \[ S=\{(x,y):x\sim y\}. \] Then letting $A$ be the size of the largest equivalence class of $\sim$, \[ \hat L=\hat L'=\lceil\log A\rceil. \] The protocol is the same as above: both parties fix an ordering of each equivalence class, and the sender transmits the index of $X$ within its class; since $Y$ lies in the same class, the receiver can look $X$ up.
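
To make the lemma concrete, here is a Python sketch using a hypothetical equivalence relation of my choosing (two strings are equivalent if they are anagrams); any other equivalence relation works the same way.

```python
from itertools import permutations

# Generic protocol of the lemma: order each equivalence class canonically;
# the sender transmits X's index within its class, which the receiver can
# look up because Y belongs to the same class.

def eq_class(s):
    # canonical (sorted) list of s's class; here: all anagrams of s
    return sorted({"".join(p) for p in permutations(s)})

def encode(x):
    return eq_class(x).index(x)       # at most ceil(log A) bits, A = class size

def decode(m, y):
    return eq_class(y)[m]             # Y's class is also X's class

x, y = "acb", "bca"                   # equivalent: same multiset of letters
assert decode(encode(x), y) == x
```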

Example (Hamming distance): $\cal X=\cal Y=\sets{0,1}^n$, \[ S=\sets{(x,y):d_H(x,y)\le d}. \] For $d=1$, if the sender knew $Y$ in advance, he or she would only need to single out $X$ among the $n+1$ sequences at Hamming distance at most 1 from $Y$, so \[ \hat L'=\lceil\log(n+1)\rceil. \] When $n=2^k-1$, the same number of bits suffices even without knowing $Y$: the sender transmits the $k$-bit syndrome of $X$ with respect to a Hamming code. Since $X\oplus Y$ has weight at most 1, the difference between the syndromes of $X$ and $Y$ is either zero or a column of the parity-check matrix, and these columns are distinct, so the receiver can locate the (possibly) flipped bit.
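
A Python sketch of the syndrome protocol for $n=7$, $k=3$, using the Hamming(7,4) parity-check matrix:

```python
import numpy as np

# d=1 Hamming-syndrome protocol, n = 7 = 2^3 - 1. The sender transmits the
# 3-bit syndrome Hx; the syndrome of x XOR y is zero or a column of H, and the
# columns of H (the numbers 1..7 in binary) identify the flipped position.

n, k = 7, 3
H = np.array([[(j >> i) & 1 for j in range(1, n + 1)]   # column j = j in binary
              for i in range(k)])

def encode(x):
    return H @ x % 2                                    # k = log(n+1) bits

def decode(s, y):
    e = (s + H @ y) % 2                                 # syndrome of x XOR y
    x = y.copy()
    if e.any():
        pos = sum(int(e[i]) << i for i in range(k)) - 1 # read the column as a number
        x[pos] ^= 1                                     # flip the located bit
    return x

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy(); y[4] ^= 1                                 # Y differs in one position
assert (decode(encode(x), y) == x).all()
```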

Example (insertions): $\cal X=\sets{0,1}^{n+1}$, $\cal Y=\sets{0,1}^n$, \[ S=\sets{(x,y):x\text{ can be obtained from }y\text{ by a single insertion}}. \] It is easy to see that every $n$-bit sequence has exactly $n+2$ distinct supersequences of length $n+1$. For example, 001 has five 4-bit supersequences: 0010, 0001, 0011, 0101, 1001. If the sender knew $Y$ in advance, he or she would only need to describe which of these supersequences $X$ is. Hence \[ \hat L'=\lceil\log(n+2)\rceil. \]
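
A short Python sketch of the $\hat L'$ protocol, in which the sender is assumed to know $Y$:

```python
# Insertion example: the sender, knowing y, indexes x among y's n+2 distinct
# supersequences of length n+1.

def supersequences(y):
    return sorted({y[:i] + b + y[i:] for i in range(len(y) + 1) for b in "01"})

assert supersequences("001") == ["0001", "0010", "0011", "0101", "1001"]

def encode(x, y):                     # the sender knows y in this setting
    return supersequences(y).index(x)

def decode(m, y):
    return supersequences(y)[m]

assert decode(encode("0011", "001"), "001") == "0011"
```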

Open problem
For some problems, the number of bits required is larger than that needed when the sender knows the receiver's information in advance.

Example (league): As this example shows, for zero-error source coding with side information, interaction can help reduce communication. This is further studied in interactive communication.

Lossy source coding with side information
Let $(X,Y)$ be a two-dimensional DMS and let $d(x,\hat{x})$ be a distortion measure. The encoder generates a description of the source $X$ and sends it to a decoder who has side information $Y$ and wishes to reproduce $X$ with distortion $D$ (e.g., $X$ is the Mona Lisa and $Y$ is Leonardo da Vinci). There are many possible scenarios of side information availability:
 * Side information may be available at the encoder, the decoder, or both.
 * Side information may be available fully or encoded.
 * At the decoder, side information may be available noncausally (the entire side information sequence is available for reconstruction) or causally (the side information sequence is available on the fly for each reconstruction symbol).

No Side Information: Shannon’s Lossy Source Coding Theorem
With no side information, the problem reduces to Shannon's lossy source coding theorem, in which a discrete memoryless source $X$ is encoded at rate $R$ and the decoder receives the message and generates an estimate $\hat{X}$ of the source within some distortion $D$. The rate-distortion function for a DMS $(X,p(x))$ and a distortion measure $d(x,\hat{x})$ is \[ R(D)=\min_{p(\hat{x}|x):\,\mathrm{E}[d(X,\hat{X})]\le D} I(X;\hat{X}). \] Moreover, if we want $X$ to be decoded losslessly, the corresponding rate is $H(X)$. It should be mentioned that $R(D)$ is a non-increasing, convex, and continuous function of $D$.
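
The rate-distortion function can be computed numerically with the Blahut-Arimoto algorithm. Below is a minimal Python sketch of my own, applied to a Bernoulli(1/2) source under Hamming distortion, where the closed form $R(D)=1-H_b(D)$ for $0\le D\le 1/2$ is available for comparison; `beta` is the Lagrange multiplier tracing out the curve.

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, iters=500):
    # Alternate between the optimal test channel for a fixed output marginal
    # and the induced output marginal; beta trades rate against distortion.
    q = np.full(d.shape[1], 1.0 / d.shape[1])        # output marginal q(x_hat)
    for _ in range(iters):
        w = q * np.exp(-beta * d)                    # unnormalized p(x_hat | x)
        w /= w.sum(axis=1, keepdims=True)
        q = p_x @ w                                  # re-induced output marginal
    D = p_x @ (w * d).sum(axis=1)                    # expected distortion
    R = p_x @ (w * np.log2(w / q)).sum(axis=1)       # I(X; X_hat) in bits
    return R, D

p_x = np.array([0.5, 0.5])                           # Bernoulli(1/2) source
d = np.array([[0.0, 1.0], [1.0, 0.0]])               # Hamming distortion
for beta in (4.0, 2.0, 1.0):
    R, D = blahut_arimoto(p_x, d, beta)
    hb = -D * np.log2(D) - (1 - D) * np.log2(1 - D)  # binary entropy H_b(D)
    print(f"D = {D:.3f}   R = {R:.3f}   1 - H_b(D) = {1 - hb:.3f}")
```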

Side information only at the encoder
In the setting mentioned above, providing side information only to the encoder does not help: the rate-distortion function remains the one given by Shannon's rate-distortion theorem.

Non-causal side information at decoder: Wyner-Ziv Theorem
If the side information is available noncausally at the decoder (the whole sequence $Y$ is known at the decoder) and the latter is interested in recovering $X$ with some distortion $D$, the problem is solved by the Wyner-Ziv theorem: the rate-distortion function is \[ R_{WZ}(D)=\min_{p(u|x),\,\hat{x}(u,y)}\bigl(I(X;U)-I(Y;U)\bigr), \] where the minimum is over conditional pmfs $p(u|x)$ and reconstruction functions $\hat{x}(u,y)$ satisfying $\mathrm{E}[d(X,\hat{X}(U,Y))]\le D$.

Causal side information available at decoder
The rate-distortion function for $X$ with side information $Y$ available causally at the decoder is \[ R(D)=\min_{p(u|x),\,\hat{x}(u,y)} I(X;U), \] where the minimum is over conditional pmfs $p(u|x)$ and reconstruction functions $\hat{x}(u,y)$ satisfying $\mathrm{E}[d(X,\hat{X}(U,Y))]\le D$.

Side information available both at encoder and decoder
The rate-distortion function for $X$ with side information $Y$ available both at the encoder and the decoder is \[ R(D)=\min_{p(u|x,y),\,\hat{x}(u,y):\,\mathrm{E}[d(X,\hat{X})]\le D} I(X;U|Y). \] The lossless version of this problem is also addressed in the lossless source coding literature, where $H(X|Y)$ is the corresponding rate.
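
As an illustration, for a fixed Lagrange slope the both-sides problem decouples across the values of $Y$, so a point of this rate-distortion curve can be traced by running Blahut-Arimoto on each conditional $p(x|y)$ and averaging; the joint pmf below (a doubly symmetric binary source) is an assumption of mine for the demo.

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, iters=500):
    # Same routine as in the no-side-information sketch above.
    q = np.full(d.shape[1], 1.0 / d.shape[1])
    for _ in range(iters):
        w = q * np.exp(-beta * d)
        w /= w.sum(axis=1, keepdims=True)
        q = p_x @ w
    return p_x @ (w * np.log2(w / q)).sum(axis=1), p_x @ (w * d).sum(axis=1)

p_xy = np.array([[0.45, 0.05],                 # X ~ Bern(1/2), Y = X through a
                 [0.05, 0.45]])                # BSC with crossover 0.1
d = np.array([[0.0, 1.0], [1.0, 0.0]])         # Hamming distortion
p_y = p_xy.sum(axis=0)

R = D = 0.0
for y in range(2):                             # solve one subproblem per y
    Ry, Dy = blahut_arimoto(p_xy[:, y] / p_y[y], d, beta=2.0)
    R += p_y[y] * Ry
    D += p_y[y] * Dy
print(f"one point on the curve: D = {D:.3f}, R = {R:.3f}")
```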
