Mixed linear regression involves the recovery of two or more vectors from linear measurements of each, except that these measurements are unlabeled; i.e., we do not know a priori which measurement comes from which vector. It arises in situations where there may be more than one latent cause for the observed phenomena. A popular algorithm for this problem is Expectation Maximization (EM), which alternates between updating guesses for the labels and solving the resulting linear equations, typically starting from a random initialization. In this work we provide the first statistical guarantees and sample complexity bounds for EM, via (a) a new initialization step, and (b) an analysis based on a re-sampling argument.
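The alternating structure described above can be sketched as follows. This is a minimal illustrative implementation of hard-assignment EM for a two-component mixture, assuming a NumPy setting with design matrix `X` and responses `y`; the function name, hyperparameters, and the random initialization are illustrative choices, not the paper's proposed procedure (which replaces random initialization with a dedicated initialization step).

```python
import numpy as np

def em_mixed_linear_regression(X, y, n_iter=50, seed=0):
    """Hard-EM sketch for two-component mixed linear regression.

    Alternates between (E-step) assigning each measurement to the
    component whose current estimate fits it best, and (M-step)
    re-solving least squares on each resulting subset of measurements.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization; the paper analyzes EM with a new
    # initialization step instead of random starting points.
    beta1 = rng.standard_normal(d)
    beta2 = rng.standard_normal(d)
    for _ in range(n_iter):
        # E-step: guess labels by comparing residual magnitudes.
        r1 = np.abs(y - X @ beta1)
        r2 = np.abs(y - X @ beta2)
        mask = r1 <= r2
        # M-step: solve the linear equations induced by the labels.
        if mask.any():
            beta1, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        if (~mask).any():
            beta2, *_ = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)
    return beta1, beta2
```

Each iteration refits both vectors only on the measurements currently attributed to them, which is why the quality of the initial guesses matters: a poor start can mislabel measurements persistently.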