We are now facing a new data regime where, instead of having a small number of
relatively accurate measurements, we often have a huge amount of data that is
highly noisy and incomplete, or even maliciously manipulated. New learning
algorithms are required to extract useful information from such "big but dirty"
data. In this talk, we discuss several important problems in this area, including
high-dimensional sparse regression, low-rank matrix recovery, and graph
clustering, under different corruption models. We introduce algorithms
that are robust to corruption in the data, require minimal knowledge of the
nature of the corruption, and have good statistical performance.