We consider the problem of data feature selection prior to inference task specification, which is central to high-dimensional learning. Introducing natural notions of universality for such problems, we show a local equivalence among them. Our analysis is naturally expressed via information geometry, and yields a practically useful learning methodology with computationally efficient implementations. The development reveals the key role of the singular value decomposition, Hirschfeld-Gebelein-Rényi maximal correlation, Tishby's information bottleneck, Wyner's common information, Ky Fan k-norms, and Breiman and Friedman's alternating conditional expectation algorithm. Our results further provide a basis for understanding and optimizing aspects of neural network architectures, matrix completion methods, and semi-supervised learning.
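As a brief reminder of the central quantity tying these connections together (stated here in generic notation $X$, $Y$, $f$, $g$, not necessarily the paper's), the Hirschfeld-Gebelein-Rényi maximal correlation is
\[
\rho(X;Y) \;=\; \max_{\substack{f,\,g:\ \mathbb{E}[f(X)]=\mathbb{E}[g(Y)]=0 \\ \mathbb{E}[f^2(X)]=\mathbb{E}[g^2(Y)]=1}} \mathbb{E}\bigl[f(X)\,g(Y)\bigr].
\]
Breiman and Friedman's ACE algorithm computes it by alternating the conditional-expectation updates $f(x) \propto \mathbb{E}[g(Y)\mid X=x]$ and $g(y) \propto \mathbb{E}[f(X)\mid Y=y]$, re-standardizing after each step; for finite alphabets this is a power iteration whose fixed point corresponds to the second singular vectors of the matrix with entries $P_{X,Y}(x,y)/\sqrt{P_X(x)\,P_Y(y)}$, which is where the singular value decomposition enters.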