In this paper, we study the minimization of a convex function f(X) over the space of n×n positive semidefinite matrices X⪰0; however, rather than optimizing over X directly, we recast the problem as the non-convex problem min_U g(U), where g(U):=f(UU⊤), U is an n×r matrix, and r≤n. We study the performance of gradient descent on g -- which we refer to as Factored Gradient Descent (FGD) -- under standard assumptions on the original function f.
We provide a rule for selecting the step size and show that, with this choice, the local convergence rate of FGD mirrors that of standard gradient descent on the original f: the error after k steps is O(1/k) when f is smooth, and exponentially small in k when f is (restricted) strongly convex -- even though g is not convex, not even locally. In addition, for (restricted) strongly convex objectives, we provide a procedure to initialize FGD that requires only first-order oracle access to f.
FGD and similar procedures are widely used in practice for problems that can be posed as matrix factorization; to the best of our knowledge, ours is the first paper to provide precise convergence rate guarantees for general convex functions under standard convexity assumptions.
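To make the factored updates concrete, the following is a minimal sketch of FGD on a toy instance. The objective f(X) = 0.5·||X − M||_F² for a rank-r PSD target M, the fixed step size, and the small random initialization are all illustrative choices made here for the example, not the step-size rule or initialization procedure analyzed in the paper.

```python
import numpy as np

def fgd(grad_f, U0, step, iters=500):
    """Factored Gradient Descent: gradient descent on g(U) = f(U U^T).

    By the chain rule, grad g(U) = (grad f(X) + grad f(X)^T) U with X = U U^T.
    """
    U = U0.copy()
    for _ in range(iters):
        G = grad_f(U @ U.T)
        U = U - step * (G + G.T) @ U
    return U

rng = np.random.default_rng(0)
n, r = 20, 3
B = rng.standard_normal((n, r))
M = B @ B.T                            # rank-r PSD matrix to recover
grad_f = lambda X: X - M               # gradient of f(X) = 0.5*||X - M||_F^2
U0 = 0.1 * rng.standard_normal((n, r)) # illustrative init; not the paper's
U = fgd(grad_f, U0, step=0.005)
print(np.linalg.norm(U @ U.T - M))     # reconstruction error after 500 steps
```

Note that each iteration touches only the n×r factor U, which is the practical appeal of the factored formulation when r ≪ n.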