An information-theoretic framework is presented for bounding the number of samples needed to achieve small Bayes risk in supervised learning. The framework is inspired by an analogy with rate-distortion theory: the maximum {\em a posteriori} classifier is viewed as a random source, labeled training data are viewed as a finite-rate encoding of that source, and the Bayes risk---measured by the $\ell_1$ or $\ell_\infty$ distance---is viewed as the average distortion. A strict bound on the Bayes risk is derived, expressed in terms of the differential entropy of the posterior distribution, the Fisher information of the model parameters, and the number of available training samples. The proposed framework makes accurate predictions even when the training set is small, and it naturally accommodates multi-class settings. Its effectiveness is demonstrated in both binary and multi-class Gaussian settings.