We now face a new data regime: instead of a small number of relatively accurate measurements, we often have a huge amount of data that is highly noisy, incomplete, or even maliciously manipulated. New learning algorithms are required to extract useful information from such "big but dirty" data. In this talk, we discuss several important problems in this area, including high-dimensional sparse regression, low-rank matrix recovery, and graph clustering, each under several corruption models. We introduce algorithms that are robust to corruption in the data, require minimal knowledge of the nature of the corruption, and have good statistical performance.