We show that gradient descent on an unregularized logistic regression problem (including its multiclass softmax generalization) with separable data converges in direction to the max-margin solution and can thus ensure good generalization even in underdetermined settings. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. We further show how our framework can be used to characterize the implicit bias of generic steepest descent, natural gradient descent, mirror descent, AdaGrad, and gradient descent on factorized models and linear neural networks.
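As a concrete illustration of the first claim (this sketch is not taken from the paper), the snippet below runs plain gradient descent on the unregularized logistic loss over a small synthetic separable dataset and compares the resulting weight direction with a hard-margin SVM direction. The data, learning rate, and iteration count are arbitrary choices, and the hard-margin SVM is approximated with scikit-learn's SVC at a very large C (its intercept is ignored, which is reasonable here because the toy data are roughly symmetric about the origin).

```python
import numpy as np
from scipy.special import expit
from sklearn.svm import SVC

# Synthetic, linearly separable 2D data (illustrative only).
rng = np.random.default_rng(0)
n, d = 100, 2
X = rng.normal(size=(n, d))
y = np.where(X @ np.ones(d) > 0, 1.0, -1.0)
X += 0.5 * y[:, None]  # push the two classes apart so the data are separable

def grad(w):
    # Gradient of the average unregularized logistic loss
    # (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    return -((y * expit(-y * (X @ w))) @ X) / n

w = np.zeros(d)
lr = 1.0
for t in range(200_000):
    w -= lr * grad(w)

# Hard-margin SVM direction, approximated by a linear SVC with a very large C.
w_svm = SVC(kernel="linear", C=1e6).fit(X, y).coef_.ravel()

cos = w @ w_svm / (np.linalg.norm(w) * np.linalg.norm(w_svm))
print(f"||w|| after gradient descent: {np.linalg.norm(w):.2f}")  # norm keeps growing
print(f"cosine(GD direction, SVM direction): {cos:.5f}")         # slowly approaches 1
```

In a run of this kind one would expect the training loss to become tiny while the weight norm keeps growing and the cosine similarity creeps toward 1 only slowly, mirroring the logarithmic rate of directional convergence described above.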