We consider an algorithmic, optimization-based approach to learning binary classifiers from noisy data under worst-case assumptions. We model the learning process as a decision-maker who seeks to minimize generalization error, given access only to data that may have been maliciously corrupted. We consider two models for the noise: noise that flips the labels of a certain fraction of the data, and noise that also perturbs the positions of the data points. For both models, we provide distribution-dependent bounds on the error as a function of the noise level, and we characterize the decision-maker's optimal strategy as well as the worst-case noise.
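For concreteness, the decision-maker's problem can be sketched as a minimax program; the notation below ($\eta$ for the noise level, $\mathcal{A}(\eta)$ for the set of admissible corruptions, $\mathcal{L}$ for the learner's strategy) is ours and is intended only to illustrate the setup described above:
\[
\min_{\mathcal{L}} \; \max_{A \in \mathcal{A}(\eta)} \; \mathbb{E}_{S \sim \mathcal{D}^n}\!\left[ \Pr_{(x,y) \sim \mathcal{D}}\big( \mathcal{L}(A(S))(x) \neq y \big) \right],
\]
where $S$ is the clean sample, $A(S)$ its corrupted version, and $\mathcal{L}(A(S))$ the classifier the learner returns. In the first model, $\mathcal{A}(\eta)$ would contain corruptions that flip the labels of at most an $\eta$-fraction of $S$; in the second, corruptions may additionally perturb the positions of the affected points.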