In sparse PCA, we are given noisy observations of a rank-one $n$ by $p$ matrix, and seek to reconstruct it under sparsity assumptions. In particular, we assume here that the principal component $v$ has at most $k$ non-zero entries, and study the high-dimensional regime in which $p$ is of the same order as $n$. Johnstone and Lu introduced a simple algorithm that reconstructs the support of $v$ with high probability if $k = Theta(sqrt{n/log p})$. Despite a considerable amount of work, no practical algorithm was developed with better guarantees. Here we analyze a covariance thresholding algorithm proposed by Krauthgamer et al, and prove that it succeeds with high probability for $k = O(sqrt{n})$. This scaling is arguably optimal under complexity constraints.