Notions of wireless channel capacity

Channel Capacity for Discrete Memoryless Channel
From classical Shannon theory, the capacity of a discrete memoryless channel (DMC, $p_{Y \mid X}(y \mid x)$) is given by:
 * $$\ C = \sup_{p_X(x)} I(X;Y)\, $$

where $p_X(x)$ is the distribution of the input symbols. This result can be extended to channels that are not discrete. The results in this page are mainly from.

Capacity of AWGN Channel
The result for the capacity of the DMC extends readily to the scenario where the channel is continuous. The simplest case is the AWGN channel model, which is defined as follows.
 * $$\ y[m] = x[m] + z[m], $$

where $m$ is the time index, $y[m]$ is the received signal, $x[m]$ is the transmitted signal and $z[m]$ is the noise which is distributed as $N(0, \sigma^2)$ and independent over time. All the signals are real.

Under an average power constraint $P$ on the input, the capacity of the AWGN channel can be shown to be
 * $$\ C_{AWGN} = \frac{1}{2}\log\left(1+\frac{P}{\sigma^2}\right), $$

The unit is bits per channel use. All logarithms in this page are base $2$.
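As a quick numerical sketch (the function name and the values are ours, not from the text), the formula can be evaluated directly:

```python
import math

def awgn_capacity(P, sigma2):
    """Capacity of the real AWGN channel in bits per channel use."""
    return 0.5 * math.log2(1 + P / sigma2)

# At SNR = P/sigma^2 = 10, the capacity is 0.5*log2(11), about 1.73 bits per use.
print(awgn_capacity(P=10.0, sigma2=1.0))
```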

 * The Techniques used for the Proof of the Capacity of the AWGN Channel
 * Converse:
 * Fano's inequality
 * Data Processing inequality
 * Jensen's inequality
 * Achievable Scheme: the capacity achieving coding scheme is the Random Coding scheme.
 * Codebook generation: each element of a codeword is distributed as $N(0,P-\epsilon)$ and i.i.d. over time (i.i.d. Gaussian Code).
 * Encoding: the codebook is known to both the transmitter and the receiver. To send a message, the transmitter assigns a codeword in the codebook to this message; different messages are assigned different codewords.
 * Decoding: Typicality Decoding scheme.
 * Graphic Explanation of the Capacity of the AWGN Channel

To better understand the capacity of the AWGN channel, we take a look at a graphic explanation. Basically, characterizing the capacity of the AWGN channel is a sphere packing problem. By the law of large numbers, the received $N$-dimensional vector $\mathbf{y} = \mathbf{x}+\mathbf{z}$ lies within a sphere of radius $\sqrt{N(P+\sigma^2)}$ with high probability. Also by the law of large numbers, $\lim_{N\rightarrow \infty}\frac{1}{N}\sum_{m=1}^Nz^2[m]=\sigma^2$, so the received vector $\mathbf{y}$ lies near the surface of a sphere of radius $\sqrt{N\sigma^2}$ around the transmitted codeword. Reliable communication requires that the noise spheres around different transmitted codewords do not overlap; therefore, the maximum number $NUM$ of noise spheres that can be packed inside the whole sphere is given by the ratio of the two spheres:
 * $$\ NUM = \frac{\left(\sqrt{N(P+\sigma^2)}\right)^N}{\left(\sqrt{N\sigma^2}\right)^N} $$

Thus, the maximum number of bits per symbol is $\frac{1}{N}\log\left(NUM\right) = \frac{1}{2}\log\left(1+\frac{P}{\sigma^2}\right)$.
 * Repetition Code

One simple coding scheme is the so-called repetition code, which simply repeats the same symbol $K$ times. The performance of this code is poor: to achieve an arbitrarily low error probability, we need an arbitrarily large block length, which means that the data rate goes to zero as the block length increases. Hence this code is not capacity achieving. The reason is that the repetition code does not pack the sphere efficiently: it only packs the codewords along one dimension of the whole sphere. To achieve the capacity, we need to pack the sphere more efficiently. The random coding scheme and the sphere packing argument only show the achievability of the capacity of the AWGN channel; how to construct good codes is another story.
 * Capacity Achieving Codes for AWGN Channel

Generally, the code design criterion corresponds to the ML decoding rule, which chooses as the decoded codeword the one with the highest likelihood when no prior information is available. In the AWGN channel, this intuitively means choosing the codeword nearest to the received vector. The real challenge in code design is that the encoding and decoding complexity must be low enough for practice. Linear algebraic codes have this property, but they cannot achieve the capacity. Later, with the invention of Turbo and LDPC codes, we obtained capacity achieving codes with low encoding and decoding complexity after careful design.

Capacity of Band limited AWGN Channel
In the last section, we discussed the capacity per channel use of the AWGN channel. In practice, we usually have a band limited system. Assume the complex baseband channel:
 * $$\ y[m] = x[m] + z[m], $$

where the signals are now complex and $z[m]$ is distributed as $\mathcal{CN}(0,N_0)$ and i.i.d. over time. The bandwidth is assumed to be $W$. The capacity can be described as:
 * $$\ C_{AWGN} = W\log\left(1+\frac{P}{N_0W}\right), $$

whose unit is bits/s.

The behavior of the band limited AWGN channel capacity is more subtle. If we fix the SNR, i.e. $\frac{P}{N_0W}$, the capacity grows linearly with the bandwidth $W$. On the other hand, if we fix the power $P$ and increase the bandwidth $W$, the capacity can be approximated as:
 * $$\ W\log\left(1+\frac{P}{N_0W}\right) \approx W\left(\frac{P}{N_0W}\right)\log_2e = \frac{P}{N_0}\log_2e. $$

From this, we can see that in this regime the capacity is proportional to the received power $P$ and insensitive to the bandwidth $W$.

As the bandwidth $W$ goes to infinity, the capacity converges to:
 * $$\ C_{\infty} = \frac{P}{N_0}\log_2e \ bits/s. $$
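This wideband behavior is easy to check numerically. The sketch below (the constants are illustrative, not from the text) evaluates $W\log_2\left(1+\frac{P}{N_0W}\right)$ for growing $W$ and shows it approaching the limit $\frac{P}{N_0}\log_2e$:

```python
import math

P, N0 = 1.0, 1.0  # fixed transmit power and noise spectral density (illustrative)

def c_bandlimited(W):
    """Capacity in bits/s of the band limited AWGN channel with bandwidth W."""
    return W * math.log2(1 + P / (N0 * W))

c_inf = (P / N0) * math.log2(math.e)  # wideband limit (P/N0) * log2(e)
for W in (1.0, 10.0, 100.0, 1000.0):
    print(W, c_bandlimited(W))        # increasing in W, approaching c_inf
```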

Moreover, if we normalize the AWGN channel capacity by the bandwidth, we get
 * $$\ C_{AWGN} = \log\left(1+\frac{P}{N_0W}\right), $$

whose unit is bits/s/Hz and this is usually called spectral efficiency.

Capacity of Frequency Selective Channels
We consider a time invariant channel. In this setting, the frequency selective channel is the channel model that captures the multipath effect. It is defined as:
 * $$\ y[m] = \sum_{l=0}^{L-1} h_lx[m-l] + w[m], $$

where $L$ is the number of taps due to the multipath effect.

The way to deal with this channel is to use OFDM to transform the signal from the time domain into the frequency domain (this requires adding a cyclic prefix of length $L-1$ to the time domain signal), which yields a set of parallel Gaussian channels. For simplicity, we ignore the time index for a moment; assuming the number of sub-carriers is $N_c$, we have:
 * $$\ \tilde{y}_n = \tilde{h}_n\tilde{x}_n + \tilde{w}_n, \quad n=0,\cdots,N_c-1$$

where $\tilde{x}_n$, $\tilde{w}_n$ and $\tilde{y}_n$ are the DFTs of the input, the noise and the output, respectively, and $\tilde{h}_n$ is the DFT of the channel taps.

Since we now have parallel independent Gaussian channels, the capacity is given by:
 * $$\ C_{PGC} = \sum_{n=0}^{N_c-1} \log\left(1+\frac{P_n|\tilde{h}_n|^2}{N_0}\right), $$

where $P_n$ is the power allocated to the $n$th sub-carrier.
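The diagonalization behind this model can be verified in a few lines. The sketch below (toy sizes, no noise term, and names of our own choosing) checks with numpy that circular convolution with the taps $h_l$, which is what remains after cyclic-prefix removal, acts as a flat per-sub-carrier gain $\tilde{h}_n$ in the DFT domain:

```python
import numpy as np

rng = np.random.default_rng(0)
Nc, L = 8, 3                 # number of sub-carriers and channel taps (toy sizes)
h = rng.normal(size=L)       # time-domain taps h_0, ..., h_{L-1}
x = rng.normal(size=Nc)      # one OFDM symbol in the time domain

# After adding/removing a cyclic prefix of length L-1, the channel acts as a
# circular convolution of x with the taps h.
y = np.array([sum(h[l] * x[(m - l) % Nc] for l in range(L)) for m in range(Nc)])

# In the DFT domain each sub-carrier n sees the flat gain h~_n = DFT(h)_n.
h_f = np.fft.fft(h, Nc)
print(np.allclose(np.fft.fft(y), h_f * np.fft.fft(x)))  # True
```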

To compute the optimum power allocation, we have the following optimization problem.
 * $$\ \max_{P_0,\cdots, P_{N_c-1}} \sum_{n=0}^{N_c-1} \log\left(1+\frac{P_n|\tilde{h}_n|^2}{N_0}\right), $$

subject to:
 * $$\ \sum_{n=0}^{N_c-1}P_n = N_cP, \quad P_n \geq 0 \ \ \forall n. $$

This problem can be solved by using the Lagrangian method. The power allocation solution is:
 * $$\ P_n^* = \left(\frac{1}{\lambda} - \frac{N_0}{|\tilde{h}_n|^2}\right)^+, $$

where the Lagrangian multiplier $\lambda$ can be computed as:
 * $$\ \sum_{n=0}^{N_c-1}\left(\frac{1}{\lambda} - \frac{N_0}{|\tilde{h}_n|^2}\right)^+=N_cP. $$

This solution structure is the so-called waterfilling power allocation.
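A minimal sketch of waterfilling (our own function and parameter names): the water level $\mu = 1/\lambda$ is found by bisection so that the allocated powers sum to the budget $N_cP$:

```python
import numpy as np

def waterfill(gains2, N0, total_power, iters=100):
    """P_n = (mu - N0/|h_n|^2)^+ with sum_n P_n = total_power (mu = 1/lambda)."""
    floor = N0 / np.asarray(gains2, dtype=float)   # water floor per sub-carrier
    lo, hi = 0.0, floor.min() + total_power        # the water level lies in here
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floor, 0.0).sum() > total_power:
            hi = mu                                # too much water, lower it
        else:
            lo = mu
    return np.maximum(mu - floor, 0.0)

alloc = waterfill(gains2=[2.0, 1.0, 0.25], N0=1.0, total_power=3.0)
print(alloc, alloc.sum())  # strong sub-carriers get more power; weak ones may get none
```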

As $N_c$ goes to infinity, the optimal power allocation converges to
 * $$\ P^*(f) = \left(\frac{1}{\lambda} - \frac{N_0}{|H(f)|^2}\right)^+, $$

where $\lambda$ can be computed as
 * $$\ \int_0^WP^*(f)df = P. $$

Then, as $N_c$ goes to infinity, the capacity of independent coding over frequency becomes:
 * $$ C = \int_0^W \log\left(1+\frac{P^*(f)|H(f)|^2}{N_0}\right)df, $$

which is also the capacity of the band limited channel with full channel state information.

Capacity of Fading Channels
From this section on, we discuss the capacity of fading channels. In most cases, we consider the complex baseband flat fading channel, shown below.
 * $$\ y[m] = h[m]x[m] + z[m], $$

where all the definitions are the same as before; the only difference is that $h[m]$ is now a random process with $\mathbf{E}\left[|h[m]|^2\right]=1$.

Fast Fading
We first take a look at the fast fading case, in which the length of the codeword spans many coherence periods of the channel. In all cases, we assume the channel state information is available at the receiver. We consider the following fading model.
 * The Capacity without Channel State Information at Transmitter
 * $$\ y[m] = h[m]x[m]+z[m], $$

where $m$ is the time index, and $h[m] = h_l$ remains constant over the $l$th coherence period of $T_c$ symbols and is i.i.d. across different coherence periods. This is usually referred to as the block fading model. We can view this model as parallel Gaussian channels, so for fixed channel coefficients, the average rate over $L$ blocks is $\frac{1}{L}\sum_{l=1}^L\log\left(1+|h_l|^2SNR\right)$. Since the $h_l$ are random variables, as $L$ goes to infinity, this quantity converges to $\mathbf{E}\left[\log\left(1+|h_l|^2SNR\right)\right]$ by the law of large numbers. Therefore, the capacity of the fast fading channel is given by:
 * $$\ C = \mathbf{E}\left[\log\left(1+|h_l|^2SNR\right)\right] \ bits/s/Hz $$

Since the logarithm is a concave function, Jensen's inequality, $\mathbf{E}[f(x)]\leq f\left(\mathbf{E}[x]\right)$ with equality when $x$ is deterministic, shows that fast fading always hurts: the capacity is at most that of an AWGN channel with the same average SNR. Now let us take a more detailed look at the capacity of the fast fading channel.

When SNR is high, the capacity is given by
 * $$\ C \approx \mathbf{E}\left[\log\left(|h|^2SNR\right)\right] = \log SNR + \mathbf{E}\left[\log\left(|h|^2\right)\right]. $$

We can see there is a constant gap to the capacity of the AWGN channel at high SNR. At low SNR, the capacity of the fast fading channel is
 * $$\ C \approx \mathbf{E}\left[|h|^2SNR\right]\log_2e = SNR\log_2e \approx C_{AWGN}.$$

From this we can see that the loss due to fast fading becomes negligible, because $\log(1+x) \approx x\log_2e$ when $x$ is small.
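Both regimes can be seen in a quick Monte Carlo sketch, assuming Rayleigh fading so that $|h|^2$ is exponential with unit mean (this distributional assumption is ours, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.exponential(1.0, size=200_000)  # samples of |h|^2 with E[|h|^2] = 1

results = {}
for snr in (100.0, 0.01):               # one high SNR and one low SNR point
    c_fading = float(np.mean(np.log2(1 + g * snr)))  # ergodic fading capacity
    c_awgn = float(np.log2(1 + snr))                 # AWGN capacity, same SNR
    results[snr] = (c_fading, c_awgn)
    print(snr, c_fading, c_awgn)        # fading capacity is always the smaller one
```

At high SNR the gap is roughly the constant $\mathbf{E}[\log_2|h|^2]$, while at low SNR the two values nearly coincide.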

To achieve the capacity of the fast fading channel, we may need codewords with very long block length, since the symbols in each codeword have to experience all the $L$ channel states. By interleaving, i.e. spreading the symbols of a codeword over different coherence periods, we can use much shorter codewords to achieve the same capacity.

 * The Capacity with Channel State Information at Transmitter

If the channel state information is also available at the transmitter, the story is different. Similar to the time invariant channel case, we can use waterfilling power allocation, now over the fading states, to solve this problem.

Now the power allocation becomes
 * $$\ P^*(h) = \left(\frac{1}{\lambda} - \frac{N_0}{|h|^2}\right)^+,$$

where $\lambda$ can be computed as
 * $$\ \mathbf{E}\left[\left(\frac{1}{\lambda} - \frac{N_0}{|h|^2}\right)^+\right] = P. $$

From this scheme, we can see that we allocate power over time but do not need future channel state information: $\lambda$ depends only on the statistics of the channel, while the instantaneous power allocation depends on the instantaneous channel realization.

Therefore, the capacity of fast fading channel with channel state information at transmitter is given by:
 * $$\ C = \mathbf{E}\left[\log\left(1+\frac{P^*(h)|h|^2}{N_0}\right)\right] \ bits/s/Hz. $$
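The gain from CSIT can be illustrated by a Monte Carlo sketch (Rayleigh fading and the specific constants are our assumptions): we find the water level by bisection over the sampled fading states, then compare with the constant-power (no CSIT) capacity at low SNR:

```python
import numpy as np

rng = np.random.default_rng(2)
g = rng.exponential(1.0, size=100_000)  # |h|^2 samples under Rayleigh fading
N0, P = 1.0, 0.05                       # low average SNR: P/N0 = 0.05

# Bisection on the water level mu = 1/lambda so that E[(mu - N0/|h|^2)^+] = P.
lo, hi = 0.0, N0 / g.min() + P
for _ in range(200):
    mu = 0.5 * (lo + hi)
    if np.mean(np.maximum(mu - N0 / g, 0.0)) > P:
        hi = mu
    else:
        lo = mu

p_of_h = np.maximum(mu - N0 / g, 0.0)   # waterfilling power per fading state
c_csit = float(np.mean(np.log2(1 + p_of_h * g / N0)))
c_no_csit = float(np.mean(np.log2(1 + P * g / N0)))
print(c_csit, c_no_csit)                # waterfilling wins clearly at low SNR
```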

Notice that in the high SNR regime the capacity is insensitive to the received power per channel use, and allocating different amounts of power to different channel states yields only a small gain. However, when the SNR is low, the capacity of the fast fading channel can be much larger than that of the AWGN channel with the same average SNR. The explanation is that even when the average channel condition is bad, with fading there is a chance that the channel is very good at some times, and we can take advantage of these peaks to transmit. This idea leads to opportunistic communication in multiuser systems.

 * Frequency Selective Fading Channels

When the channel is frequency selective and fading, the situation is similar to the time invariant frequency selective case. Within one coherence period of the channel we have parallel independent Gaussian channels across the sub-carriers, so we perform the waterfilling algorithm over the sub-carriers. Across different coherence periods we again have parallel independent Gaussian channels, so we must also perform waterfilling over time.

Slow Fading
 * The Capacity of Slow Fading Channels without channel state information at the transmitter

Slow fading means that the channel gain is random but remains constant over the whole transmission, i.e. $h[m]=h$ for all $m$. This models the situation where the length of the codeword is short compared to the channel coherence time. When the channel state information is not available at the transmitter, for a given $h$ the maximum reliable communication rate is $\log\left(1+\frac{|h|^2P}{N_0}\right)$, which is the AWGN capacity at SNR $\frac{|h|^2P}{N_0}$. Since $h$ is a random variable, $\log\left(1+\frac{|h|^2P}{N_0}\right)$ is also a random variable. Thus, for any positive rate $R$, the probability that $\log\left(1+\frac{|h|^2P}{N_0}\right) < R$ is nonzero; in other words, for any given positive rate $R$, we cannot make the error probability arbitrarily small. Therefore, the capacity of the slow fading channel is zero. This does not mean we cannot communicate over this channel; rather, capacity is not an appropriate measure here.

For a given rate $R$, the probability
 * $$\ P_{out}(R) = P\left\{\log\left(1+\frac{|h|^2P}{N_0}\right) < R\right\}$$

is called the outage probability, and the event $\log\left(1+\frac{|h|^2P}{N_0}\right) < R$ is called an outage. From this definition, an alternative performance measure is the $\epsilon$-outage capacity $C_{\epsilon}$, defined as the largest rate $R$ that can be supported in the slow fading channel such that $P_{out}(R) < \epsilon$. Mathematically, the outage capacity is given by
 * $$\ C_{\epsilon} = \log\left(1 + F^{-1}(1-\epsilon)\frac{P}{N_0}\right), $$

where $F$ is the complementary CDF of $|h|^2$.

Now let us take a more detailed look at the $\epsilon$-outage capacity of the slow fading channel. We denote $\frac{P}{N_0}$ by SNR.

When SNR is high, we have
 * $$\ C_{\epsilon} \approx \log SNR +\log\left(F^{-1}(1-\epsilon)\right) = C_{AWGN} + \log\left(F^{-1}(1-\epsilon)\right). $$

There is a constant difference between the $\epsilon$-outage capacity and the AWGN channel capacity.

When the SNR is low, the $\epsilon$-outage capacity is approximately
 * $$\ C_{\epsilon} \approx F^{-1}(1-\epsilon)SNR\log_2e \approx F^{-1}(1-\epsilon)C_{AWGN} $$

From this formula, we can see that slow fading has a larger effect on the $\epsilon$-outage capacity when the SNR is low: if $\epsilon$ is small, $F^{-1}(1-\epsilon)$ is small, and the $\epsilon$-outage capacity is only a small fraction of the capacity of the AWGN channel.
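As a concrete sketch, assume Rayleigh fading (our assumption), so $|h|^2$ is exponential with unit mean, the complementary CDF is $F(t)=e^{-t}$, and hence $F^{-1}(1-\epsilon) = -\ln(1-\epsilon)$. The snippet below (our own function name) shows how a small $\epsilon$ forces the rate far below the AWGN capacity:

```python
import math

def outage_capacity(eps, snr):
    """eps-outage capacity in bits/s/Hz for Rayleigh fading, |h|^2 ~ Exp(1)."""
    # Complementary CDF F(t) = exp(-t), hence F^{-1}(1 - eps) = -ln(1 - eps).
    return math.log2(1 + (-math.log(1 - eps)) * snr)

snr = 100.0
for eps in (0.1, 0.01):
    print(eps, outage_capacity(eps, snr), math.log2(1 + snr))
# smaller eps -> much smaller supported rate compared with the AWGN capacity
```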

One way to improve the reliable communication rate in the slow fading channel is to use diversity. We can use receive diversity (i.e. SIMO channel, coherent detector), transmit diversity (i.e. MISO channel, Alamouti scheme) and time diversity (i.e. SISO channel, repetition coding).

 * The Capacity of Slow Fading Channels with channel state information at the transmitter

When the channel state information is available at the transmitter, the story is different. One thing we can do is channel inversion: the transmitter inverts the channel so that a constant SNR is guaranteed at the receiver, which achieves zero outage probability. However, the price is a huge power consumption, and sometimes it is not possible to invert the channel at all, for instance when the system has a peak power constraint. A reasonable choice is to combine channel inversion with diversity (i.e. MISO channel, transmit beamforming) to achieve a target rate.
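The power cost of full channel inversion is easy to see under a Rayleigh fading assumption (ours): $\mathbf{E}[1/|h|^2]$ is infinite, so the sample average of the inversion power blows up as more fading realizations are drawn:

```python
import numpy as np

rng = np.random.default_rng(3)
target_snr, N0 = 1.0, 1.0  # desired receive SNR and noise level (illustrative)

# Channel inversion: transmit power P(h) = target_snr * N0 / |h|^2.
for n in (10**3, 10**5):
    g = rng.exponential(1.0, size=n)        # |h|^2 under Rayleigh fading
    print(n, np.mean(target_snr * N0 / g))  # average inversion power (diverges)
```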

 * Parallel Fading Channels

Now we consider the case where, instead of coding over one coherence period, we code over $L$ such periods. This gives a model of parallel Gaussian channels. The maximum rate of reliable communication is given by:
 * $$\ \frac{1}{L}\sum_{l=1}^L\log\left(1+|h_l|^2SNR\right) \ bits/s/Hz,$$

where $h_l$ is the fading gain in the $l$th coherence period. If the target rate is $R$ bits/s/Hz, the outage event is:
 * $$\ \sum_{l=1}^L\log\left(1+|h_l|^2SNR\right) < LR. $$

By doing this, we achieve an $L$-fold diversity gain: for $\sum_{l=1}^L\log\left(1+|h_l|^2SNR\right)$ to be small, every one of its terms has to be small, i.e. all $L$ sub-channels must be in a deep fade simultaneously.

Another important point is that to achieve the rate $\sum_{l=1}^L\log\left(1+|h_l|^2SNR\right)$, it suffices to use a capacity achieving AWGN code on each sub-channel. However, this requires channel state information at the transmitter. It turns out that without channel state information at the transmitter this rate is still achievable, although the coding schemes are quite different in the two cases. When the channel state information is available at the transmitter, separate coding and rate allocation over the different sub-channels are enough. When it is not available, coding across the different sub-channels is necessary: intuitively, if one sub-channel is in a deep fade, the information can still be recovered from the other, good sub-channels.
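The diversity effect of coding over $L$ coherence periods can be estimated by Monte Carlo. The sketch below (Rayleigh fading and the target rate are illustrative assumptions of ours) estimates the outage probability $P\left\{\frac{1}{L}\sum_l\log_2(1+|h_l|^2SNR) < R\right\}$ for $L=1$ and $L=4$:

```python
import numpy as np

rng = np.random.default_rng(4)
snr, R, trials = 10.0, 2.0, 100_000

def outage_prob(L):
    """Estimate P{ (1/L) * sum_l log2(1 + |h_l|^2 * snr) < R }, Rayleigh fading."""
    g = rng.exponential(1.0, size=(trials, L))        # |h_l|^2 per trial
    rates = np.mean(np.log2(1 + g * snr), axis=1)     # average rate over L periods
    return float(np.mean(rates < R))

p1, p4 = outage_prob(1), outage_prob(4)
print(p1, p4)  # coding across more coherence periods lowers the outage probability
```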