This work studies the optimization of the minimum mean square error (MMSE) in order to characterize the structure of least favorable prior distributions. In the first part, the paper characterizes the local behavior of the MMSE as a function of the input distribution and derives the directional derivative of the MMSE at a distribution P_X in the direction of a distribution Q_X. In the second part, the directional derivative is combined with the theory of convex optimization to characterize the structure of least favorable distributions. In particular, under mild regularity conditions, it is shown that the support of a least favorable distribution is necessarily very small: it is contained in a nowhere dense set of Lebesgue measure zero. The results provide both necessary and sufficient conditions for optimality, do not rely on a Gaussian statistics assumption, and are not sensitive to the dimensionality of the random vectors. The results are evaluated for the univariate and multivariate Gaussian cases and for the Poisson case. Finally, as one application, it is shown how the results can be used to characterize the capacity of Gaussian MIMO channels with an amplitude constraint.