There is a long history to the study of artificial neural networks, including work done by the information theory pioneer Thomas Cover. The recent surge of applications of Deep Learning has generated renewed interest in multi-layered architectures composed of non-linear units. A number of theoretical questions about why such networks are useful, and what their fundamental learning capacities are, remain unanswered. In the mid-1990s, many researchers examined exactly these issues and derived several results that explicitly explored the role of depth in neural architectures and why deep networks might work better. In this talk we will review many of these results and put some of the claims in the recent literature in perspective.