We show that gradient descent on an unregularized logistic regression problem (including its multiclass softmax generalization) with separable data converges in direction to the max-margin solution and can thus ensure good generalization even in underdetermined settings. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. We further show how our framework can be used to characterize the implicit bias of generic steepest descent, natural gradient descent, mirror descent, AdaGrad, and gradient descent on factorized models and linear neural networks.
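As a concrete illustration of the first claim (this sketch is not taken from the paper), the snippet below runs plain gradient descent on the unregularized logistic loss over a small synthetic separable dataset and compares the resulting weight direction with a hard-margin SVM direction. The data, learning rate, and iteration count are arbitrary choices, and the hard-margin SVM is approximated with scikit-learn's SVC at a very large C (its intercept is ignored, which is reasonable here because the toy data are roughly symmetric about the origin).

```python
import numpy as np
from scipy.special import expit
from sklearn.svm import SVC

# Synthetic, linearly separable 2D data (illustrative only).
rng = np.random.default_rng(0)
n, d = 100, 2
X = rng.normal(size=(n, d))
y = np.where(X @ np.ones(d) > 0, 1.0, -1.0)
X += 0.5 * y[:, None]  # push the two classes apart so the data are separable

def grad(w):
    # Gradient of the average unregularized logistic loss
    # (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    return -((y * expit(-y * (X @ w))) @ X) / n

w = np.zeros(d)
lr = 1.0
for t in range(200_000):
    w -= lr * grad(w)

# Hard-margin SVM direction, approximated by a linear SVC with a very large C.
w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()

cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print(f"||w|| after gradient descent: {np.linalg.norm(w):.2f}")  # norm keeps growing
print(f"cosine(GD direction, SVM direction): {cos:.5f}")         # slowly approaches 1
```

In a run of this kind one would expect the training loss to become tiny while the weight norm keeps growing and the cosine similarity creeps toward 1 only slowly, mirroring the logarithmic rate of directional convergence described above.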