We develop Multi-label Random Forests to tackle problems with millions of labels. We propose a novel node splitting criterion that: (a) generalizes traditional random forests to multi-label problems; (b) facilitates very large-scale training by learning from only positive data; and (c) overcomes the well-known random forest limitation of overfitting in high-dimensional spaces. To compensate for numerous missing and incorrect labels, we do not train on the given labels directly but extend our random forest classifier to train on beliefs inferred about the state of each label. These beliefs are generated through a novel sparse semi-supervised learning formulation optimized via an efficient distributed iterative hard thresholding algorithm.
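The abstract does not specify the paper's distributed formulation, but the core optimizer it names, iterative hard thresholding, is a standard method for sparsity-constrained least squares. The sketch below is a generic, single-machine illustration under assumed notation (design matrix `A`, observations `y`, sparsity budget `k`), not the authors' algorithm: each step takes a gradient move and then projects onto the set of `k`-sparse vectors by keeping the `k` largest-magnitude entries.

```python
import numpy as np

def hard_threshold(x, k):
    # Projection onto k-sparse vectors: keep the k largest-magnitude
    # entries of x and zero out the rest.
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht(A, y, k, step=None, iters=500):
    # Iterative hard thresholding for
    #     min_x ||y - A x||^2  subject to  ||x||_0 <= k.
    # This is a generic illustration, not the paper's distributed variant.
    m, n = A.shape
    if step is None:
        # Conservative fixed step: inverse squared spectral norm of A.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(iters):
        grad = A.T @ (y - A @ x)          # gradient of the least-squares loss
        x = hard_threshold(x + step * grad, k)
    return x
```

In a distributed setting the gradient `A.T @ (y - A @ x)` decomposes over row blocks of `A`, so each worker can compute a partial gradient on its shard and the thresholding step is applied after aggregation; the sketch above only shows the serial core of the iteration.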