We study the learnability of hypothesis classes in online prediction models. The analogous question in Valiant's PAC learning model was addressed by Vapnik and others, who showed that the VC dimension characterizes the learnability of a hypothesis class. In his influential work, Littlestone described a combinatorial characterization of hypothesis classes that are learnable in the online model. However, Littlestone dealt only with binary classification in the realizable case; namely, the learner predicts a binary value, and it is assumed that some hypothesis in the class perfectly explains the entire data. In this work, we study online learnability in more general settings. We start with binary classification in the agnostic case, in which no hypothesis perfectly predicts the entire data and the goal is to minimize regret. We describe several models of non-realizable data and derive upper and lower bounds on the achievable regret. In particular, for hypothesis classes with finite Littlestone dimension, we derive square-root regret bounds for the case of adaptive adversarial noise and logarithmic regret bounds for the case of bounded stochastic noise. Interestingly, the latter result parallels the fast rates obtained under low-noise conditions in the PAC learning model. Finally, we discuss the more challenging problem of online multiclass categorization in the bandit setting, in which the learner receives only partial feedback.
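
Schematically, writing $d = \mathrm{Ldim}(\mathcal{H})$ for the Littlestone dimension of the class and $T$ for the number of rounds (notation introduced here for illustration; the precise statements, with their constants and logarithmic factors, appear in the body of the paper), the two regimes mentioned above take the form
\[
  \mathrm{Regret}(T) = \tilde{O}\!\left(\sqrt{d\,T}\right)
  \quad \text{(adaptive adversarial noise)},
  \qquad
  \mathrm{Regret}(T) = O(\log T)
  \quad \text{(bounded stochastic noise)}.
\]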