It is known that the iterates of the gradient descent algorithm for least squares linear regression can be viewed as generalized ridge regression estimators. This establishes a formal link between the implicit regularization of gradient descent and the explicit regularization of penalized estimation, and it provides a useful perspective for understanding early stopping. We show that an analogous link holds between penalized estimation of generalized linear models and the natural gradient descent algorithm. The corresponding penalties involve Bregman divergences between the model parameter and its initialization, encouraging shrinkage toward the initialization. This makes the implicit regularization of early stopping explicit.
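The known least squares result can be checked numerically. The sketch below (data, step size, and variable names are illustrative) runs a fixed number of gradient descent steps from a zero initialization and compares the iterate with the closed-form generalized ridge expression obtained from the eigendecomposition of $X^\top X$: after $k$ steps, $\beta_k = V \,\mathrm{diag}\!\big((1-(1-\eta d_i)^k)/d_i\big)\, V^\top X^\top y$, i.e. ridge-like shrinkage along each eigendirection.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

eta = 0.01  # step size (small enough that |1 - eta * d_i| < 1 here)
k = 100     # number of gradient descent steps

# k gradient descent steps on the least squares loss, starting from zero.
beta = np.zeros(p)
for _ in range(k):
    beta -= eta * X.T @ (X @ beta - y)

# Closed form: with X^T X = V diag(d) V^T, the k-th iterate equals a
# generalized ridge estimator with eigenvalue-wise shrinkage factors
# (1 - (1 - eta * d)^k) / d applied to X^T y.
d, V = np.linalg.eigh(X.T @ X)
shrink = (1 - (1 - eta * d) ** k) / d
beta_closed = V @ (shrink * (V.T @ (X.T @ y)))

assert np.allclose(beta, beta_closed)
```

The equality is exact algebra (a geometric sum of the iteration map), so it holds for any step size; convergence of the iterates to the least squares solution additionally requires $\eta < 2/d_{\max}$.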