Given a task of predicting Y from X, a loss function L, and a set of probability distributions Gamma on (X, Y), what is the optimal decision rule minimizing the worst-case expected loss over Gamma? We address this question by introducing a generalization of the principle of maximum entropy. Applying this principle to sets of distributions whose marginal on X and cross-moments E[YX] are constrained to match the empirical marginal and cross-moments of the data, we develop a general minimax approach to supervised learning problems. While for some loss functions, such as squared-error and log loss, the minimax approach rederives well-known regression models, for 0-1 loss it leads to a new linear classifier which we call the maximum entropy machine. The maximum entropy machine efficiently minimizes the worst-case 0-1 loss over this structured set of distributions, and in our numerical experiments it can outperform other widely used linear classifiers such as the SVM.
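
As a minimal sketch of the squared-error case mentioned above (our own illustration, not the paper's implementation): when the marginal of X and the cross-moments E[YX] are pinned to their empirical values, a linear rule built from those constrained statistics alone coincides with ordinary least squares, since the OLS coefficients depend on the data only through E[XX^T] and E[YX]. The data-generating setup below is arbitrary and chosen only for demonstration.

```python
import numpy as np

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Statistics fixed by the constraint set: second moments of X
# (determined by its empirical marginal) and cross-moments E[YX].
Sigma_xx = X.T @ X / n   # empirical E[X X^T]
sigma_xy = X.T @ y / n   # empirical E[Y X]

# A linear rule computed from the constrained moments alone ...
beta_moments = np.linalg.solve(Sigma_xx, sigma_xy)

# ... matches the ordinary least-squares fit on the raw data.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_moments, beta_ols)
print(beta_moments)
```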