Nonconvex loss functions are quickly becoming the norm in machine learning practice, but the theory is not as well developed as in the convex case. We study approximate empirical risk minimization with a nonconvex loss. A popular scheme is stochastic gradient Langevin dynamics (SGLD), in which properly scaled isotropic Gaussian noise is added to the usual stochastic gradient update in order to escape local minima. We show that, for loss functions with certain smoothness and dissipativity properties, SGLD produces a hypothesis with excess risk of $O(\epsilon)$ after a number of iterations that scales polynomially in $(1/\epsilon)\log(1/\epsilon)$. The analysis exploits the fact that, under our assumptions, the discrete-time SGLD recursion can be approximated in 2-Wasserstein distance by a continuous-time Langevin diffusion.
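
To make the scheme concrete, here is a minimal sketch of the update and its continuous-time counterpart, with notation assumed for illustration (step size $\eta$, inverse temperature $\beta$, empirical risk $\hat{F}$, and i.i.d. standard Gaussian noise $\xi_k$; none of these symbols are fixed above):
\[
x_{k+1} = x_k - \eta\, \nabla \hat{F}(x_k) + \sqrt{\tfrac{2\eta}{\beta}}\, \xi_k, \qquad \xi_k \sim N(0, I_d),
\]
which, as $\eta \to 0$, tracks the Langevin diffusion
\[
dX_t = -\nabla \hat{F}(X_t)\, dt + \sqrt{\tfrac{2}{\beta}}\, dW_t .
\]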