We consider the problem of estimating how well a given model class can fit a distribution of labeled data. We show that it is often possible to accurately estimate this ``learnability'' even when the available data is too small to reliably learn any model that fits the distribution well. Our first result applies to the setting where the data is drawn from a $d$-dimensional distribution with isotropic covariance, and the label of each datapoint is a linear function of the data plus independent noise of unknown variance. In this setting, we show that the magnitude of the noise can be accurately approximated given $O(\sqrt{d})$ samples, which is optimal. Note that even in the noiseless case, a sample size linear in the dimension $d$ is required to learn any function correlated with the underlying linear model. We then extend our estimation approach to the setting where the data distribution has an arbitrary, unknown covariance matrix, which allows these techniques to be applied to settings where the model class consists of a linear function applied to a nonlinear embedding of the data. Finally, we demonstrate the practical viability of these approaches on synthetic and real data. This is joint work with Weihao Kong.
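To make the isotropic setting concrete, here is a minimal moment-based sketch in the spirit of the approach described (not necessarily the paper's exact algorithm; the function name and parameter choices are illustrative). It relies on two standard identities for $y = \beta^\top x + \eta$ with $\mathbb{E}[xx^\top] = I_d$: $\mathbb{E}[y^2] = \|\beta\|_2^2 + \sigma^2$, and $\mathbb{E}[y_i y_j \langle x_i, x_j\rangle] = \|\beta\|_2^2$ for $i \neq j$. Subtracting an unbiased estimate of $\|\beta\|_2^2$ from the empirical second moment of the labels yields a noise-variance estimate without ever fitting $\beta$:

```python
import numpy as np

def estimate_noise_variance(X, y):
    """Moment-based estimate of sigma^2 in y = X @ beta + noise,
    assuming rows of X have (approximately) identity covariance.

    Uses E[y^2] = ||beta||^2 + sigma^2 and, for i != j,
    E[y_i * y_j * <x_i, x_j>] = ||beta||^2, so beta is never learned.
    """
    n = X.shape[0]
    # G[i, j] = y_i * y_j * <x_i, x_j>; off-diagonal average is an
    # unbiased estimate of ||beta||^2.
    G = (X @ X.T) * np.outer(y, y)
    signal = (G.sum() - np.trace(G)) / (n * (n - 1))
    # Empirical second moment of labels estimates ||beta||^2 + sigma^2.
    return np.mean(y ** 2) - signal

# Usage sketch: far fewer samples than dimensions (n << d).
rng = np.random.default_rng(0)
d, n, sigma = 1000, 500, 0.5
beta = rng.normal(size=d) / np.sqrt(d)   # ||beta|| ~ 1
X = rng.normal(size=(n, d))              # isotropic covariance
y = X @ beta + sigma * rng.normal(size=n)
print(estimate_noise_variance(X, y))     # typically close to sigma**2 = 0.25
```

Note that the estimate is accurate even though $n \ll d$, a regime in which no estimate of $\beta$ itself can be meaningfully correlated with the truth; the general-covariance extension mentioned above requires correcting these moments for the unknown covariance and is not captured by this sketch.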