Principal component analysis (PCA) is one of the most commonly used statistical procedures, with a wide range of applications. We consider both minimax and adaptive estimation of the principal subspace in the high-dimensional setting. Under mild technical conditions, we establish the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters. The lower bound is obtained by calculating the local metric entropy and applying Fano's Lemma. The rate-optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace. A key idea is a reduction scheme that reduces the sparse PCA problem to a high-dimensional multivariate regression problem.
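To make the reduction idea concrete, the following is a minimal, hypothetical Python sketch of how a sparse principal subspace estimate can be obtained by solving a multivariate regression with a row-sparse penalty. It is not the procedure proposed in the paper: the function name `sparse_subspace_via_regression`, the sample-splitting step, the use of scikit-learn's `MultiTaskLasso`, and the tuning parameter `alpha` are all illustrative assumptions standing in for the actual reduction scheme.

```python
# Illustrative sketch only: reduce subspace estimation to a multi-task
# (group-sparse) regression, which zeroes out entire variables at once.
import numpy as np
from sklearn.linear_model import MultiTaskLasso


def sparse_subspace_via_regression(X, k, alpha=0.1):
    """X: (n, p) centered data matrix; k: target subspace dimension.
    Returns a (p, k) orthonormal basis estimate. Hypothetical illustration."""
    n, p = X.shape
    # Step 1: split the sample and compute preliminary (non-sparse) scores
    # from ordinary PCA on the first half.
    X1, X2 = X[: n // 2], X[n // 2:]
    _, _, Vt = np.linalg.svd(X1, full_matrices=False)
    scores = X2 @ Vt[:k].T                      # (n - n//2, k) predictors
    # Step 2: multivariate regression of X2 on the scores; the multi-task
    # lasso penalty sets whole rows of the (p, k) coefficient matrix to zero,
    # i.e. it selects the same variables for every principal direction.
    reg = MultiTaskLasso(alpha=alpha, fit_intercept=False).fit(scores, X2)
    B = reg.coef_                               # (p, k), row-sparse loadings
    # Step 3: orthonormalize the sparse loadings to obtain a subspace basis.
    Q, _ = np.linalg.qr(B)
    return Q
```

As a usage example, `sparse_subspace_via_regression(X, k=2)` applied to centered data with only a few informative coordinates would return a two-dimensional basis supported (approximately) on those coordinates; the choice of `alpha` plays the role of a sparsity tuning parameter in this sketch.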