We study collaborative filtering in the online setting where items are recommended to users over time. We propose a simple model guided by the observation that in a variety of datasets the number of types of users is small relative to the total number of users. The model allows to lower bound the number of bad recommendations given by any algorithm in terms of system parameters, e.g., number of users, number of items, and noise. We give a new and nearly optimal algorithm, building upon the classical epsilon-greedy algorithm for the multi-armed bandit problem and weighted majority voting for collaborative filtering. A main insight is the benefit of two different kinds of exploration: one to explore similarity between users, and the other to explore the space of items.