Many problems in statistics and machine learning involve fitting a low-rank matrix to noisy data. Recent work has identified gradient descent over the low-rank factor space as a computationally efficient approach to the resulting rank-constrained optimization problem. We develop a unified framework characterizing the computational and statistical properties of this nonconvex approach. Statistically, we obtain near-optimal estimation error bounds and show robustness against adversarially corrupted data, matching the statistical power of the best existing convex approaches. Computationally, we prove geometric convergence rates and near-linear running time. Our general theory yields global guarantees for a large number of concrete problems, including (1-bit) matrix completion, robust PCA, community detection, sparse PCA, and matrix sensing. These results provide insight into the broad applicability and generally strong performance of nonconvex approaches.
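To make the factored gradient descent idea concrete, the following is a minimal sketch (not the paper's algorithm; the matrix-completion setup, function name, step size, and spectral-style initialization are illustrative assumptions) of running gradient descent directly over a rank-r factorization UV^T of the target matrix.

```python
import numpy as np

def factored_gd(M, mask, rank, step=0.01, iters=500):
    """Gradient descent on the factored objective
        f(U, V) = 0.5 * || mask * (U V^T - M) ||_F^2,
    an illustrative surrogate for rank-constrained estimation."""
    # Spectral-style initialization: top-r SVD of the observed entries.
    U0, s, Vt = np.linalg.svd(mask * M, full_matrices=False)
    U = U0[:, :rank] * np.sqrt(s[:rank])
    V = Vt[:rank, :].T * np.sqrt(s[:rank])
    for _ in range(iters):
        R = mask * (U @ V.T - M)  # residual on observed entries
        # Gradients of f with respect to U and V are R @ V and R.T @ U.
        U, V = U - step * (R @ V), V - step * (R.T @ U)
    return U @ V.T

# Toy usage: recover a rank-2 matrix from 60% of its entries, observed with noise.
rng = np.random.default_rng(1)
L = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mask = rng.random((50, 40)) < 0.6
M_obs = mask * (L + 0.01 * rng.standard_normal(L.shape))
L_hat = factored_gd(M_obs, mask, rank=2)
```

The key design choice this sketch illustrates is that the rank constraint is enforced by the parametrization itself (U and V have only r columns), so each iteration costs roughly the size of the data rather than the cost of a full-matrix projection.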