Linear classification is a fundamental problem in machine learning, in which positive and negative examples of a concept are represented in Euclidean space by their feature vectors, and we seek a hyperplane separating the two classes of vectors. We give the first sublinear-time algorithms for linear classification and other related problems in machine learning, including the kernelized versions of these problems. These new algorithms are based on a primal-dual approach, and combine novel sampling techniques with randomized implementations of online learning algorithms. We give lower bounds showing that our running times are nearly best possible in the unit-cost RAM model. This is joint work with Ken Clarkson and David Woodruff; a preliminary version appeared in FOCS 2010.
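For context on the problem being solved (this is the classical perceptron baseline, not the sublinear-time algorithm described in the abstract), a minimal sketch of finding a separating hyperplane for linearly separable data; note that each pass below reads the entire input, i.e. its cost is at least linear, which is exactly what the sublinear-time algorithms avoid:

```python
def perceptron(points, labels, max_epochs=100):
    """Classical perceptron: find w such that sign(w . x) matches the
    label of every point, assuming the data are linearly separable.
    points: list of feature vectors (with a constant bias feature);
    labels: +1 or -1 for each point."""
    d = len(points[0])
    w = [0.0] * d
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            # Misclassified (or on the boundary): nudge w toward y*x.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break  # all points correctly classified
    return w

# Toy data: positives in the upper right, negatives in the lower left;
# the trailing 1 is a bias feature so the hyperplane need not pass
# through the origin.
pts = [(1, 2, 1), (2, 3, 1), (-1, -2, 1), (-2, -1, 1)]
lbl = [1, 1, -1, -1]
w = perceptron(pts, lbl)
assert all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0
           for x, y in zip(pts, lbl))
```

The perceptron's mistake bound depends on the margin of the data, and each update touches every coordinate of a full example; the sublinear-time algorithms instead sample both examples and coordinates, which is where the primal-dual and sampling machinery enters.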