Geometry of Non-Convex Landscapes: Deep Learning, Matrix Completion, and Saddle Points

We show that saddle points are easy to avoid even for Gradient Descent, arguably the simplest optimization procedure. We prove that, with probability 1, randomly initialized Gradient Descent converges to a local minimizer. The same result holds for a large class of optimization algorithms, including proximal point, mirror descent, and coordinate descent. Next, we study the problem of learning a two-layer ReLU network and the matrix completion problem. Despite the non-convexity of both problems, we prove that every local minimizer is a global minimizer. Combined with the preceding algorithmic result on gradient descent, this shows that simple gradient-based methods can find the global optimum of these non-convex problems.
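
The following is a minimal numerical sketch, not taken from the paper, illustrating the saddle-avoidance claim on a toy non-convex objective. The function f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at the origin and global minimizers at (±1, 0); the step size, initialization scale, and iteration counts are arbitrary choices made for illustration. Under random initialization, essentially no run of plain gradient descent terminates at the saddle.

```python
import numpy as np

def f(w):
    """Toy non-convex objective with a strict saddle at the origin."""
    x, y = w
    return (x**2 - 1)**2 + y**2

def grad_f(w):
    """Gradient of f."""
    x, y = w
    return np.array([4.0 * x * (x**2 - 1), 2.0 * y])

rng = np.random.default_rng(0)
step = 0.05          # fixed step size (illustrative choice)
trials = 1000
ended_at_saddle = 0

for _ in range(trials):
    w = rng.normal(scale=0.5, size=2)   # random initialization
    for _ in range(2000):               # plain gradient descent
        w = w - step * grad_f(w)
    if np.linalg.norm(w) < 1e-3:        # near the strict saddle at (0, 0)
        ended_at_saddle += 1

print(f"{ended_at_saddle}/{trials} runs converged to the saddle point")
```

In this sketch the saddle has a measure-zero basin of attraction, so randomly initialized runs converge to one of the two global minimizers, which is the behavior the probability-1 statement above formalizes.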