Fano's inequality

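For random variables $X$ and $Y$ taking values in the same finite set $A$ with cardinality $|A| = M \geq 2$, it holds that $$ H(X|Y) \leq \mathbb P \left[ X \neq Y \right] \log (M - 1) + h_b \left( \mathbb P \left[ X \neq Y\right]\right) $$ where $H(X|Y)$ is the conditional entropy of $X$ given $Y$, $h_b(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function, and all logarithms are taken in the same base as the entropy. Here $\mathbb P[X \neq Y]$ can be read as the probability of error when $Y$ is used as an estimate of $X$.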
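As a numerical sanity check (a sketch, not part of the original statement), the bound can be evaluated on random joint distributions. The Python below assumes entropies in bits (base-2 logarithms), identifies $A$ with $\{0, \dots, M-1\}$, and stores the joint distribution as an $M \times M$ nested list with `joint[x][y]` $= \mathbb P[X = x, Y = y]$; the helper names `conditional_entropy`, `fano_bound`, `random_joint` and `tight_joint` are hypothetical. The last helper builds the standard equality case: $Y$ uniform and, given $Y = y$, $X = y$ with probability $1 - P_e$ and uniform over the other $M - 1$ values otherwise.

```python
import math
import random

def entropy_bits(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def binary_entropy(p):
    """Binary entropy h_b(p) in bits."""
    return entropy_bits([p, 1 - p])

def conditional_entropy(joint):
    """H(X|Y) in bits, where joint[x][y] = P[X = x, Y = y] on A = {0, ..., M-1}."""
    M = len(joint)
    h = 0.0
    for y in range(M):
        p_y = sum(joint[x][y] for x in range(M))  # marginal P[Y = y]
        if p_y > 0:
            h += p_y * entropy_bits([joint[x][y] / p_y for x in range(M)])
    return h

def fano_bound(joint):
    """Right-hand side of Fano's inequality, P_e*log2(M-1) + h_b(P_e), in bits."""
    M = len(joint)
    if M < 2:
        raise ValueError("Fano's inequality needs |A| = M >= 2")
    p_err = 1 - sum(joint[i][i] for i in range(M))  # P[X != Y]
    p_err = min(max(p_err, 0.0), 1.0)              # guard against round-off
    return p_err * math.log2(M - 1) + binary_entropy(p_err)

def random_joint(M, rng):
    """A random joint pmf on A x A (hypothetical test helper)."""
    weights = [[rng.random() for _ in range(M)] for _ in range(M)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]

def tight_joint(M, p_e):
    """Y uniform; given Y = y, X = y w.p. 1 - p_e, else uniform on A minus {y}."""
    return [[(1 - p_e) / M if x == y else p_e / (M * (M - 1))
             for y in range(M)] for x in range(M)]

if __name__ == "__main__":
    rng = random.Random(0)
    for M in (2, 3, 5, 10):
        for _ in range(1000):
            joint = random_joint(M, rng)
            # Small tolerance absorbs floating-point error only.
            assert conditional_entropy(joint) <= fano_bound(joint) + 1e-12
    # For this distribution both sides should agree up to round-off.
    joint = tight_joint(3, 0.3)
    print(conditional_entropy(joint), fano_bound(joint))
```

For $M = 2$ the factor $\log(M-1)$ vanishes, so the check reduces to $H(X|Y) \leq h_b(\mathbb P[X \neq Y])$.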

Proof
Define the binary random variable $Z$ to be $0$ if $X=Y$ and $1$ if $X\neq Y$, so that $\mathbb P[Z = 1] = \mathbb P[X \neq Y]$. By the chain rule for entropy, $$ H(X|Y) = H(X,Z|Y)-H(Z|X,Y). $$ The second term on the right-hand side is equal to zero since $Z$ is a function of $X$ and $Y$.

The first term can be expanded, using the chain rule again, as \begin{align} H(X,Z|Y) & = H(Z|Y)+H(X|Y,Z) \\ & = H(Z|Y)+\mathbb P [Z = 0]H(X|Y, Z=0)+\mathbb P [Z = 1]H(X|Y, Z=1). \end{align} The three terms on the right are bounded in turn.

Since conditioning cannot increase entropy, the first term satisfies $$ H(Z|Y) \leq H(Z) = h_b\left( \mathbb P \left[ X \neq Y\right]\right). $$ Since $X=Y$ whenever $Z=0$, $X$ is determined by $Y$ in that case, so $H(X|Y, Z=0)$ in the second term is equal to zero. When $Z=1$ and $Y=y$, $X$ takes values in $A\setminus\{y\}$, a set of $M-1$ elements, so $H(X|Y=y, Z=1) \leq \log(M-1)$ for every $y$; averaging over $y$, the third term is upper bounded by $$ \mathbb P [Z = 1]H(X|Y, Z=1) \leq \mathbb P [Z = 1]\log(M-1) = \mathbb P \left[ X \neq Y\right]\log(M-1). $$

Hence, $$ H(X|Y) = H(X,Z|Y) \leq h_b\left( \mathbb P \left[ X \neq Y\right]\right) +\mathbb P \left[ X \neq Y\right]\log(M-1), $$ completing the proof.
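The steps of the proof can also be traced numerically. The sketch below is a hypothetical illustration in Python, assuming entropies in bits, $A = \{0, \dots, M-1\}$, and a joint distribution stored as an $M \times M$ list with `joint[x][y]` $= \mathbb P[X = x, Y = y]$; the function name `proof_terms` is made up for this example. Working one value of $Y$ at a time, it computes $H(X|Y)$, $H(Z|Y)$ and the error term $\mathbb P[Z=1]H(X|Y,Z=1)$, then checks the decomposition $H(X|Y) = H(Z|Y) + \mathbb P[Z=1]H(X|Y,Z=1)$ (with the zero terms $H(Z|X,Y)$ and $H(X|Y,Z=0)$ already dropped) and the two bounds used on the remaining terms.

```python
import math
import random

def entropy_bits(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def proof_terms(joint):
    """Quantities from the proof, in bits (hypothetical helper).

    joint[x][y] = P[X = x, Y = y] on A = {0, ..., M-1}; Z = 1 iff X != Y.
    Returns (H(X|Y), H(Z|Y), P[Z=1] * H(X|Y, Z=1), P[X != Y]).
    """
    M = len(joint)
    h_x_given_y = 0.0
    h_z_given_y = 0.0
    err_term = 0.0
    for y in range(M):
        col = [joint[x][y] for x in range(M)]
        p_y = sum(col)                      # P[Y = y]
        if p_y == 0:
            continue
        h_x_given_y += p_y * entropy_bits([p / p_y for p in col])
        p_wrong = p_y - joint[y][y]         # P[Y = y, Z = 1]
        h_z_given_y += p_y * entropy_bits([joint[y][y] / p_y, p_wrong / p_y])
        if p_wrong > 0:
            # Given Y = y and Z = 1, X ranges over the M - 1 values x != y.
            others = [col[x] / p_wrong for x in range(M) if x != y]
            err_term += p_wrong * entropy_bits(others)
    p_err = 1 - sum(joint[i][i] for i in range(M))
    return h_x_given_y, h_z_given_y, err_term, p_err

if __name__ == "__main__":
    rng = random.Random(0)
    M = 4
    w = [[rng.random() for _ in range(M)] for _ in range(M)]
    s = sum(map(sum, w))
    joint = [[v / s for v in row] for row in w]
    hxy, hzy, err, pe = proof_terms(joint)
    # Chain-rule decomposition after dropping the zero terms.
    assert math.isclose(hxy, hzy + err)
    # Conditioning cannot increase entropy: H(Z|Y) <= h_b(P[X != Y]).
    assert hzy <= entropy_bits([pe, 1 - pe]) + 1e-12
    # At most M - 1 candidates for X when Z = 1.
    assert err <= pe * math.log2(M - 1) + 1e-12
    print(hxy, hzy, err, pe)
```

The decomposition is computed per value of $Y$ because $H(Z|Y=y) + \mathbb P[Z=1|Y=y]\,H(X|Y=y,Z=1)$ equals $H(X|Y=y)$ exactly, which makes the identity easy to confirm up to floating-point error.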