Arbitrarily varying channel

An arbitrarily varying channel (AVC) is an information-theoretic model for communication in the presence of an adversary. The capacity of these channels depends on the error criterion and whether or not the encoder and decoder can jointly randomize using common randomness.

In the AVC literature, the form of the capacity formula is generally determined by two factors: the allowable coding strategies and the error criterion. In a randomized code, the encoder and decoder share a source of common randomness with which they may randomize their coding strategy, whereas a deterministic code uses a fixed mapping from messages to codewords. The state sequence may depend on different quantities: the message, the transmitted codeword, or both. Furthermore, we may relax the definition of correct decoding to allow the decoder to output a list of candidate codewords. Each of these choices affects how the error criterion is defined.

= Channel models =

The AVC is modeled by a set of channels $\mathcal{W} = \{W(y | x, s) : s \in \mathcal{S}\}$ with finite input alphabet $\mathcal{X}$ and finite output alphabet $\mathcal{Y}$. If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ and $\mathbf{s} = (s_1, s_2, \ldots, s_n)$ are length $n$ vectors, the probability of observing the output $\mathbf{y}$ given the input $\mathbf{x}$ and state $\mathbf{s}$ over the AVC $\mathcal{W}$ without feedback is given by: \begin{align} W(\mathbf{y}| \mathbf{x}, \mathbf{s}) = \prod_{i=1}^{n} W(y_i | x_i, s_i). \end{align} The interpretation of this product form is that the channel state can change arbitrarily from time to time. The state sequence $\mathbf{s}$ is taken to be controlled by a malicious jammer who wishes to stymie the communication between the transmitter and receiver. As a result, the capacity definitions are worst-case over the behavior of the jammer.
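To make the product form concrete, the following Python sketch computes the memoryless channel law by multiplying per-letter transition probabilities; the two-state binary channel family W is a hypothetical example for illustration, not a channel from the literature.

import numpy as np

# Hypothetical AVC: W[s] is the transition matrix W(y|x,s) for state s.
# State s=0 is a fairly clean binary symmetric channel; s=1 is noisier.
W = {
    0: np.array([[0.99, 0.01],
                 [0.01, 0.99]]),   # rows: input x, columns: output y
    1: np.array([[0.75, 0.25],
                 [0.25, 0.75]]),
}

def channel_law(y, x, s):
    """P(y | x, s) for the memoryless AVC: product of per-letter terms."""
    return np.prod([W[si][xi, yi] for xi, yi, si in zip(x, y, s)])

# Probability of receiving 0110 when 0101 is sent and the jammer plays 0011.
print(channel_law(y=[0, 1, 1, 0], x=[0, 1, 0, 1], s=[0, 0, 1, 1]))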

Deterministic and randomized codes
An $(n,N)$ deterministic code $\mathcal{C}$ for the AVC $\mathcal{W}$ is a pair of maps $(\phi, \psi)$ with $\phi : [N] \to \mathcal{X}^n$ and $\psi : \mathcal{Y}^n \to [N]$. The rate of the code is \begin{align} R = n^{-1} \log N. \end{align} The decoding region for message $i$ is $D_{i} = \{\mathbf{y} : \psi(\mathbf{y}) = i\}$.

An $(n,N)$ randomized code $\mathbf{C}$ for the AVC $\mathcal{W}$ is a random variable taking on values in the set of deterministic codes. If $\mathbf{C} = (\Phi, \Psi)$ is uniformly distributed on a set of $K$ codes, then we call this an $(n,N,K)$ randomized code with key size $\log K$. Note that the realization of the code is shared by the encoder and decoder, so the key is known by both parties. The rate of the code is $R = n^{-1} \log N$. The decoding region is a random variable $\mathbf{D}_i = \{\mathbf{y} : \Psi(\mathbf{y}) = i\}$, and under key $k$ we write $D_{i,k} = \{\mathbf{y} : \Psi_k(\mathbf{y}) = i\}$. For a randomized code we require that the decoding error be small for each message averaged over key values. Randomization allows several different codewords to represent the same message.
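As a toy illustration of both definitions, the sketch below (all names hypothetical) builds the $(n,N) = (3,2)$ binary repetition code as a deterministic code and then turns it into an $(n,N,K)$ randomized code by letting a shared key XOR a mask onto the codewords before transmission.

import itertools, random

# Deterministic (n, N) = (3, 2) code: message 1 -> 000, message 2 -> 111,
# decoded by majority vote; rate R = (1/3) log 2.
def phi(i): return [0, 0, 0] if i == 1 else [1, 1, 1]
def psi(y): return 1 if sum(y) <= 1 else 2

# Randomized (n, N, K) code with K = 8: the shared key k selects a mask
# that is XOR-ed onto the codeword and stripped off again before decoding.
masks = list(itertools.product([0, 1], repeat=3))
def Phi(i, k): return [b ^ m for b, m in zip(phi(i), masks[k])]
def Psi(y, k): return psi([b ^ m for b, m in zip(y, masks[k])])

k = random.randrange(len(masks))     # common randomness known to both ends
assert Psi(Phi(2, k), k) == 2        # noiseless sanity check for every key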

Error criteria
The maximal error for an $(n,N)$ deterministic code over an AVC $\mathcal{W}$ is given by \begin{align} \varepsilon_d = \max_{i} \max_{\mathbf{s} \in \mathcal{S}^n} \left( 1 - W\left( D_i | \phi(i), \mathbf{s} \right) \right). \end{align} The maximal error for an $(n,N)$ randomized code over an AVC $\mathcal{W}$ is given by \begin{align} \varepsilon_r = \max_{i} \max_{\mathbf{s} \in \mathcal{S}^n} \mathbb{E}\left[ 1 - W\left( \mathbf{D}_i | \Phi(i), \mathbf{s} \right) \right], \end{align} where the expectation is over the randomized code.

The average error for an $(n,N)$ deterministic code over an AVC $\mathcal{W}$ is given by \begin{align} \bar{\varepsilon}_d = \max_{\mathbf{s} \in \mathcal{S}^n} \frac{1}{N} \sum_{i=1}^{N} \left( 1 - W\left( D_i | \phi(i), \mathbf{s} \right) \right). \end{align} The average error for an $(n,N)$ randomized code over an AVC $\mathcal{W}$ is given by \begin{align} \bar{\varepsilon}_r = \max_{\mathbf{s} \in \mathcal{S}^n} \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}\left[ 1 - W\left( \mathbf{D}_i | \Phi(i), \mathbf{s} \right) \right], \end{align} where the expectation is over the randomized code.
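Continuing the sketches above (reusing W, phi, and psi as defined there), both criteria can be evaluated at small blocklengths by brute force over all state sequences $\mathbf{s} \in \mathcal{S}^n$:

import itertools
import numpy as np

# Continues the sketches above: W, phi, and psi as defined there.
n, N = 3, 2

def error_given_state(i, s):
    """1 - W(D_i | phi(i), s): probability that message i is misdecoded."""
    x = phi(i)
    return sum(np.prod([W[sj][xj, yj] for xj, yj, sj in zip(x, y, s)])
               for y in itertools.product([0, 1], repeat=n)
               if psi(y) != i)

states = list(itertools.product([0, 1], repeat=n))
eps_d = max(max(error_given_state(i, s) for s in states) for i in (1, 2))
eps_bar_d = max(sum(error_given_state(i, s) for i in (1, 2)) / N
                for s in states)
print(eps_d, eps_bar_d)

Note that both criteria take a worst case over the jammer's state sequence; they differ only in whether the worst case or the average is taken over messages.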

= Capacity results and bounds =

A rate $R$ is called achievable if for every $\epsilon > 0$ and $\delta > 0$ there exists a sequence of $(n, N)$ codes (deterministic or randomized) with rates $R_n \ge R - \delta$ whose probability of error (maximal or average) is at most $\epsilon$ for $n$ sufficiently large. For a given error criterion, the supremum of achievable rates is the capacity of the arbitrarily varying channel. We write $C_r$ for the randomized coding capacity under maximal error, $\bar{C}_r$ for the randomized coding capacity under average error, $C_d$ for the deterministic coding capacity under maximal error, and $\bar{C}_d$ for the deterministic coding capacity under average error.

Many of the capacity results involve the following quantity: \begin{align} C = \max_{P(x)} \min_{Q(s)} I(X ; Y), \end{align} where $I(X; Y)$ is the mutual information between $X$ and $Y$ under the following joint distribution on $\mathcal{X} \times \mathcal{S} \times \mathcal{Y}$: \begin{align} \bar{P}(x,s,y) = P(x) Q(s) W(y | x, s). \end{align} The interpretation is that the jammer chooses a distribution $Q$ to create the averaged DMC $\sum_{s} W(y | x, s) Q(s)$. The quantity $C$ is the worst-case capacity over all such DMCs that the jammer can create. It turns out to be the capacity of the AVC under randomized coding, and also (in some cases) under deterministic coding and average error.
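For small alphabets, $C$ can be approximated directly from its definition. The sketch below does a coarse grid search for the hypothetical binary family used earlier; a serious computation would use convex optimization instead, since $I(X;Y)$ is concave in $P$ and convex in the channel.

import numpy as np

def mutual_information(P, V):
    """I(X;Y) for input distribution P and DMC V(y|x), rows indexed by x."""
    Pxy = P[:, None] * V                       # joint distribution P(x,y)
    Py = Pxy.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = Pxy * np.log2(Pxy / (P[:, None] * Py[None, :]))
    return np.nansum(terms)                    # 0 log 0 terms contribute 0

W = {0: np.array([[0.99, 0.01], [0.01, 0.99]]),
     1: np.array([[0.75, 0.25], [0.25, 0.75]])}

# Coarse grid search over P(x) and Q(s): the jammer's mixture q*W_0 + (1-q)*W_1
# is exactly the averaged DMC sum_s W(y|x,s) Q(s).
grid = np.linspace(0, 1, 201)
C = max(min(mutual_information(np.array([p, 1 - p]),
                               q * W[0] + (1 - q) * W[1])
            for q in grid)
        for p in grid)
print(C)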

If the jammer has access to the transmitted codeword, then a related quantity is given by \begin{align} \hat{C} = \max_{P(x)} \min_{U(s|x)} I(X ; Y), \end{align} where $I(X; Y)$ is the mutual information between $X$ and $Y$ under the following joint distribution on $\mathcal{X} \times \mathcal{S} \times \mathcal{Y}$: \begin{align} \bar{P}(x,s,y) = P(x) U(s|x) W(y | x, s). \end{align} The interpretation is that the jammer chooses a conditional distribution $U(s|x)$ to create the averaged DMC $\sum_{s} W(y | x, s) U(s|x)$. The quantity $\hat{C}$ is the worst-case capacity over all such DMCs, and it is a natural upper bound on the capacity under deterministic coding and maximal error, since the adversary need only create this effective DMC for a single codeword.
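Continuing the grid-search sketch above (reusing mutual_information and W as defined there), $\hat{C}$ differs only in that the jammer's state distribution may depend on the input letter, so for binary alphabets the inner minimization runs over one parameter per input:

# Continues the sketch above: mutual_information and W as defined there.
# For binary X and S, U(s|x) is described by u0 = U(0|0) and u1 = U(0|1);
# row x of the averaged DMC is sum_s W(y|x,s) U(s|x).
coarse = np.linspace(0, 1, 41)
C_hat = max(
    min(mutual_information(
            np.array([p, 1 - p]),
            np.vstack([u0 * W[0][0] + (1 - u0) * W[1][0],
                       u1 * W[0][1] + (1 - u1) * W[1][1]]))
        for u0 in coarse for u1 in coarse)
    for p in coarse)
print(C_hat)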

Randomized codes, maximal error
The first result on arbitrarily varying channels appeared in the seminal paper of Blackwell, Breiman, and Thomasian, which took a game-theoretic approach. In their model, the jammer is allowed to choose its state $S_t$ at time $t$ with knowledge of the inputs $X_i$ and outputs $Y_i$ for times $i = 1, 2, \ldots, t-1$. Communication over the AVC is modeled as a two-person zero-sum game between the jammer and the encoder/decoder pair. The first player chooses a jamming strategy for selecting $S_t$, and the second player chooses a deterministic code. The payoff to the jammer is $1$ if a decoding error is made. Mixed strategies for the second player correspond to randomized codes, and under mixed strategies the maximal probability of error for randomized coding equals the value of the game.

Theorem.  The capacity of the AVC under maximal error and randomized coding is \begin{align} C_r = C. \end{align} Furthermore, $C_r$ is the capacity when the state $S_t$ at time $t$ can depend on all inputs $X_i$ and outputs $Y_i$ for $i = 1, 2, \ldots, t-1$.

Randomized codes, average error
Theorem. The capacity of the AVC under average error and randomized coding is \begin{align} \bar{C}_r = C. \end{align}

This follows directly from the result for maximal error: the average error never exceeds the maximal error, so $\bar{C}_r \ge C_r = C$, while for the converse the jammer can always draw its states i.i.d. from the minimizing $Q$, reducing the AVC to a DMC with capacity at most $C$.

Deterministic codes, maximal error
Because maximizing the error over messages is the same as maximizing over codewords, for deterministic codes under the maximal error criterion we may assume that the jammer knows the codeword being transmitted. One strategy for the jammer against message $i$ is to choose a channel $U(s|x)$ from $\mathcal{X}$ to $\mathcal{S}$ and generate its state sequence $\mathbf{s}$ by taking a codeword $\mathbf{x}(i)$ corresponding to message $i$ and passing it through the channel $U$. If the encoder transmits message $i$, then for this choice of $\mathbf{s}$ the channel has the distribution of a DMC $V(y|x)$ given by \begin{align} V(y | x) = \sum_{s} W( y | x, s) U(s | x)~. \end{align} For AVCs with binary output alphabets, Ahlswede and Wolfowitz showed that the capacity under maximal error and deterministic coding $C_d$ is equal to $\hat{C}$. Extensions of this result to other classes of AVCs were found by Ahlswede, and by Kambo and Singh. The best results to date are due to Csiszár and Körner. They define a relation $x \stackrel{W}{\sim} x'$ between $x$ and $x' \in \mathcal{X}$ that holds if there are distributions $Q_1, Q_2$ on $\mathcal{S}$ such that \begin{align} \sum_{s \in \mathcal{S}} W(y | x, s) Q_1(s) = \sum_{s \in \mathcal{S}} W(y | x', s) Q_2(s) \qquad \forall{y}. \end{align} Their result is an achievable rate using deterministic coding.

Theorem. For an unconstrained AVC $\mathcal{W}$ and any input distribution $P$ on $\mathcal{X}$, the following lower bound holds on the deterministic coding capacity under maximal error: \begin{align} C_d \ge \min \left( \hat{C}, D(P) \right)~, \end{align} where \begin{align} \mathcal{D} &= \left\{ F(x,x') \in \mathcal{P}(\mathcal{X}\times\mathcal{X}) : F \left( \{ X \stackrel{W}{\sim} X' \} \right) = 1,\ X \sim P,\ X' \sim P \right\} \\ D(P) &= \min_{F \in \mathcal{D}} I(X;X')~. \end{align} That is, $\mathcal{D}$ is the set of couplings of $P$ with itself that place all their mass on pairs related by $\stackrel{W}{\sim}$.
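The relation $\stackrel{W}{\sim}$ entering the definition of $\mathcal{D}$ asks only for distributions $Q_1, Q_2$ satisfying a system of linear equations, so it can be tested as a linear feasibility problem. Below is a minimal sketch using scipy.optimize.linprog; the helper related and the two-state binary channel family are illustrative assumptions, not constructions from the AVC literature.

import numpy as np
from scipy.optimize import linprog

def related(W, x, xp):
    """Test x ~_W x': do Q1, Q2 on S exist with matching output laws?"""
    S = sorted(W)                         # state alphabet
    ny = W[S[0]].shape[1]                 # output alphabet size
    A_eq, b_eq = [], []
    # For every y: sum_s W(y|x,s) Q1(s) - sum_s W(y|x',s) Q2(s) = 0.
    for y in range(ny):
        A_eq.append([W[s][x, y] for s in S] + [-W[s][xp, y] for s in S])
        b_eq.append(0.0)
    # Q1 and Q2 must each sum to one.
    A_eq.append([1.0] * len(S) + [0.0] * len(S)); b_eq.append(1.0)
    A_eq.append([0.0] * len(S) + [1.0] * len(S)); b_eq.append(1.0)
    res = linprog(c=np.zeros(2 * len(S)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (2 * len(S)))
    return res.success

W = {0: np.array([[0.99, 0.01], [0.01, 0.99]]),
     1: np.array([[0.75, 0.25], [0.25, 0.75]])}
print(related(W, 0, 1))   # False: the two inputs are never confusable here

When related returns True for a pair of inputs, the jammer can make those two inputs statistically indistinguishable at the output, which is exactly what limits deterministic codes under maximal error.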

One explanation for the difficulty in establishing the deterministic coding capacity for general AVCs is that this problem is connected to difficult open problems in information theory. Ahlswede showed that finding $C_d$ for certain AVCs is equivalent to finding the zero-error capacity of a corresponding DMC. Although Lovász solved a special case of the zero-error capacity problem, finding the zero-error capacity in general remains a major open problem in information theory.

Deterministic codes, average error
Randomization relaxes the stringent requirements of deterministic coding for maximal error by making the encoder and decoder more powerful. Another relaxation is to change the error criterion from the maximum over all messages to the average. That is, instead of demanding that the error under a state sequence $\mathbf{s}$ be small for every message $m$, we require only that it be small for all but a vanishingly small fraction of messages. Under this coding model, Ahlswede proved that the capacity for unconstrained AVCs exhibits a dichotomy: the capacity is either $0$ or equal to the randomized coding capacity. Csiszár and Narayan showed that the notion of symmetrizability, introduced by Ericson, characterizes this dichotomy.

A central idea in the study of AVCs under average error is that of symmetrizability. We call a channel $V(y | x_1, x_2)$ from $\mathcal{X}^2$ to $\mathcal{Y}$ symmetric if 	\begin{align} V(y | x_1, x_2) = V(y | x_2,x_1 ) \qquad \forall (x_1, x_2, y)~. \end{align} An AVC $\mathcal{W}$ is symmetrizable if there exists a distribution $U(s | x)$ such that \begin{align} V(y | x, x') = \sum_{s \in \mathcal{S}} W(y | x, s) U(s | x') \end{align} is symmetric. That is, \begin{align} \sum_{s \in \mathcal{S}} W(y | x, s) U(s | x') = \sum_{s \in \mathcal{S}} W(y | x', s) U(s | x) \qquad \forall (x,x',y) \in \mathcal{X} \times \mathcal{X} \times \mathcal{Y}~. \end{align} The intuitive meaning of this is that the jammer can simulate the transmitter by choosing a codeword $\mathbf{x}'$ and passing it through the channel $U$ to get a state sequence $\mathbf{s}$. The decoder will be unable to tell if the transmitted codeword was $\mathbf{x}$ or $\mathbf{x}'$ because the average channel is symmetric between its two inputs.
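Whether such a $U(s|x)$ exists is again a linear feasibility question, now in the $|\mathcal{X}||\mathcal{S}|$ entries of $U$, so it can be checked with an LP solver. Below is a sketch in the same hypothetical two-state binary setting; the function name is_symmetrizable is an assumption for illustration.

import itertools
import numpy as np
from scipy.optimize import linprog

def is_symmetrizable(W):
    """Feasibility LP: does U(s|x) exist making the averaged channel symmetric?"""
    S = sorted(W)
    nx, ny = W[S[0]].shape
    nv = nx * len(S)                      # variables U(s|x), flattened
    idx = lambda x, si: x * len(S) + si
    A_eq, b_eq = [], []
    for x, xp, y in itertools.product(range(nx), range(nx), range(ny)):
        if x >= xp:
            continue                      # constraints for x < x' suffice
        row = np.zeros(nv)
        for si, s in enumerate(S):
            row[idx(xp, si)] += W[s][x, y]    # + sum_s W(y|x,s) U(s|x')
            row[idx(x, si)] -= W[s][xp, y]    # - sum_s W(y|x',s) U(s|x)
        A_eq.append(row); b_eq.append(0.0)
    for x in range(nx):                   # each U(.|x) is a distribution
        row = np.zeros(nv)
        row[x * len(S):(x + 1) * len(S)] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    res = linprog(c=np.zeros(nv), A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * nv)
    return res.success

W = {0: np.array([[0.99, 0.01], [0.01, 0.99]]),
     1: np.array([[0.75, 0.25], [0.25, 0.75]])}
print(is_symmetrizable(W))   # False: this family cannot be symmetrized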

Theorem. For the AVC under deterministic coding and average error, \begin{align} \bar{C}_d = \left\{ \begin{array}{ll} 0 & \textrm{if $\mathcal{W}$ is symmetrizable} \\ C & \textrm{otherwise} \end{array} \right. \end{align}

In order to prove this result, Ahlswede used a subsampling argument known as the elimination technique. Beginning with a randomized code $\mathbf{C}$ that achieves a rate below $C_r$, he showed that a new randomized code consisting of $n^2$ iid codebooks sampled from $\mathbf{C}$ has small average probability of error. If $\bar{C}_d > 0$ then the encoder sends a codeword consisting of two parts. It first chooses one of the $n^2$ codebooks uniformly at random. Because the deterministic coding capacity $\bar{C}_d$ is positive, there exists a deterministic code which can transmit the choice of codebook with small average probability of error. This requires only $2 \log n$ bits and so the blocklength required is negligible. The second part of the encoder's codeword encodes the message using the selected codebook. Thus the overall codeword consists of a short prefix containing the choice of codebook followed by the encoded message.
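To see why the prefix is cheap, here is the rate bookkeeping, under the assumption that the prefix is sent with a deterministic code of some fixed positive rate $R_0$: the overall code has blocklength and rate \begin{align} n' = \left\lceil \frac{2 \log n}{R_0} \right\rceil + n, \qquad R' = \frac{\log N}{n'} = \frac{n R}{\lceil 2 \log n / R_0 \rceil + n} \longrightarrow R \quad (n \to \infty), \end{align} so the codebook index costs an asymptotically vanishing fraction of the blocklength and any rate below $C_r$ remains achievable.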

= Variants =


 * Gaussian AVCs
 * AVCs with constraints