Many problems of contemporary interest in signal processing and machine learning involve highly non-convex optimization problems. While nonconvex problems are known to be intractable in general, simple local search heuristics such as (stochastic) gradient descent are often surprisingly effective at finding global optima on real or randomly generated data. In this talk, I will discuss some results explaining the success of these heuristics. I will discuss results regarding learning the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). I will discuss this problem in the high-dimensional regime where the number of observations are fewer than the ReLU weights. I will show that projected gradient descent on a natural least-squares objective, when initialization at 0, converges at a linear rate to globally optimal weights with a number of samples that is optimal up to numerical constants. I will then discuss how this result can be generalized to over-parameterized one-hidden layer neural networks.