Mixture-of-experts (MoE) is a widely popular neural network architecture and a basic building block of highly successful modern neural networks, for example, Gated Recurrent Units (GRUs) and Attention networks. However, despite this empirical success, finding an efficient and provably consistent algorithm to learn its parameters has remained a long-standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of an MoE model for a wide class of non-linearities with global consistency guarantees. Our algorithm relies on a novel combination of tensor method-of-moments techniques and the EM algorithm. We empirically validate our algorithm on both synthetic and real data sets in a variety of settings and show superior performance over standard baselines.
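For concreteness, one standard form of a $k$-expert regression MoE (the notation here is ours, introduced only for illustration; the exact model class studied in the paper may differ) generates an output $y$ from an input $x$ by sampling an expert through a softmax gate and then applying that expert's non-linearity:
$$
\mathbb{P}(y \mid x) \;=\; \sum_{i=1}^{k} \frac{e^{w_i^\top x}}{\sum_{j=1}^{k} e^{w_j^\top x}}\, \mathcal{N}\!\left(y \,\middle|\, g(a_i^\top x),\, \sigma^2\right),
$$
where $w_1,\dots,w_k$ are the gating parameters, $a_1,\dots,a_k$ are the expert parameters, and $g$ is the expert non-linearity. In this setting, "learning the true parameters" means recovering the $w_i$ and $a_i$ from i.i.d. samples $(x, y)$.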