Distributed training algorithms are susceptible to straggler nodes and adversarial attacks. In this talk, I will explain why these pose a fundamental challenge for scaling up, and will provide insight into how to overcome them using simple coding-theoretic ideas. I will present experiments in which codes are used to build faster and more reliable distributed training algorithms, and will conclude with several open problems that lie at the intersection of machine learning and distributed systems.
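To give a flavor of the kind of coding-theoretic idea involved, here is a minimal, hypothetical sketch of gradient coding for straggler mitigation (an assumed illustration, not necessarily the specific scheme discussed in the talk): three workers each send a fixed linear combination of the partial gradients on their assigned data partitions, and the full gradient sum can be recovered from any two of the three responses.

```python
import numpy as np

# Toy gradient-coding sketch (hypothetical, for illustration only):
# 3 workers, 3 data partitions, tolerates any 1 straggler.
# Each worker transmits a fixed linear combination of the partial
# gradients it computes; the full gradient sum is recoverable from
# any 2 of the 3 workers.

# Encoding matrix B: row i = coefficients worker i applies to (g1, g2, g3).
B = np.array([
    [0.5, 1.0,  0.0],   # worker 1 computes g1, g2
    [0.0, 1.0, -1.0],   # worker 2 computes g2, g3
    [0.5, 0.0,  1.0],   # worker 3 computes g1, g3
])

rng = np.random.default_rng(0)
g = rng.standard_normal((3, 4))   # partial gradients g1, g2, g3 (dimension 4)
full_gradient = g.sum(axis=0)     # what the parameter server wants

coded = B @ g                     # what each worker would transmit

# Suppose worker 2 straggles: decode from workers 1 and 3 only.
survivors = [0, 2]
# Find coefficients a with a^T B[survivors] = all-ones row,
# so that a^T (coded messages) equals g1 + g2 + g3.
a, *_ = np.linalg.lstsq(B[survivors].T, np.ones(3), rcond=None)
recovered = a @ coded[survivors]

assert np.allclose(recovered, full_gradient)
print("Recovered the full gradient from 2 of 3 workers.")
```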