Channel coding over unknown channels

Classical information theory deals with characterizing the fundamental limits of communication (e.g. maximum rates and error exponents) when the channel is known to the encoder and the decoder, and the communication system is designed based on this knowledge. In practice, however, the channel is often unknown to the encoder and decoder. Several models address this scenario; the two dominant ones are the compound channel and the arbitrarily varying channel.

In the compound channel model, the channel law is arbitrarily chosen from a set of possible channel laws, and then remains stationary for the duration of transmission, whereas in the more stringent arbitrarily varying channel model, the channel can be non-stationary, i.e. it may change arbitrarily from one time step to the next. The main problem is to determine the capacity of the compound channel or that of the arbitrarily varying channel, which is the supremum of rates at which reliable transmission is possible over every channel in the set. In the arbitrarily varying channel case, it is as if an adversary is present and can choose the channel state sequence so as to defeat (or jam) communication.

To simplify the discussion, we restrict ourselves to memoryless models with finite input and output alphabets.

Compound channels
Let $\mathcal{W}$ be a finite set of possible channels. Each channel $W \in \mathcal{W}$ is a probability distribution $W(y|x)$ of the channel output letter $y \in \mathcal{Y}$ given a channel input letter $x \in \mathcal{X}$. The problem is to characterize the best rate, called the capacity, among schemes that can reliably communicate over every possible channel in $\mathcal{W}$, in the sense that the error probability tends to $0$ as the block length $n$ tends to $\infty$.

The compound channel capacity can be shown to equal $$ C_{compound}(\mathcal{W}) = \max_{P} \quad \inf_{W \in \mathcal{W}} I(P,W) $$ where $I(P,W)$ is the mutual information obtained with prior $X \sim P$ and channel $W(y|x)$. The converse is conceptually simple: by the standard channel coding converse, $I(P,W)$ is the maximum rate that can be reliably sent over the channel $W$. For a given $P$, the worst-case channel determines the maximum rate, and the result can then be optimized with respect to $P$. For the direct part, universal decoding is required, i.e., the decoder has to be able to achieve this rate without knowing $W$. Intuitively, one can employ a training phase prior to reliable communication of the messages. As there are only finitely many candidate channels to be learned, the ``real'' channel can be detected with arbitrarily small error probability by a sufficiently long training phase, after which point a capacity-achieving code for the learned channel can be employed to achieve the rate above.
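The max-inf formula above can be evaluated numerically for small alphabets. The following is a minimal sketch, assuming binary input and output alphabets and a compound set of two binary symmetric channels (an illustrative choice, not from the text); it computes $I(P,W)$ directly from its definition and maximizes the worst-case mutual information over a grid of input priors.

```python
import numpy as np

def mutual_information(P, W):
    """I(P, W) = sum_{x,y} P(x) W(y|x) log2( W(y|x) / (PW)(y) ).

    P is a prior over inputs; W is a row-stochastic matrix with
    W[x, y] = W(y|x)."""
    PW = P @ W  # induced output distribution
    I = 0.0
    for x in range(len(P)):
        for y in range(W.shape[1]):
            if P[x] > 0 and W[x, y] > 0:
                I += P[x] * W[x, y] * np.log2(W[x, y] / PW[y])
    return I

def compound_capacity(channels, grid=1001):
    """max over binary priors P = (p, 1-p) of inf_W I(P, W),
    approximated by a grid search over p."""
    best = 0.0
    for p in np.linspace(0.0, 1.0, grid):
        P = np.array([p, 1.0 - p])
        worst = min(mutual_information(P, W) for W in channels)
        best = max(best, worst)
    return best

def bsc(eps):
    """Binary symmetric channel with crossover probability eps."""
    return np.array([[1.0 - eps, eps], [eps, 1.0 - eps]])

# Compound set: two BSCs with crossover 0.1 and 0.2.
C = compound_capacity([bsc(0.1), bsc(0.2)])
```

For this particular set, the uniform prior is optimal for both channels simultaneously, so the compound capacity coincides with the capacity of the worse channel, $1 - h(0.2) \approx 0.278$ bits per channel use; in general, the prior maximizing the worst case need not be capacity-achieving for any single channel in $\mathcal{W}$.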