Algorithms are exposed to randomness in their input and to noise during computation. Algorithms, especially in machine learning, are therefore required to be robust to input fluctuations and to randomization during execution. This talk discusses a new framework in which an algorithm is regarded as a noisy channel with a generalization capacity (GC). The GC objectively ranks different algorithms for the same data-processing task by the bit rate of their respective capacities. Information-theoretic algorithm selection is rigorously demonstrated for sparse mean estimation in high dimensions. The method also allows us to rank centroid-based and spectral clustering methods, e.g., k-means, pairwise clustering, normalized cut, adaptive ratio cut, and dominant set clustering.
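
The abstract does not spell out the GC functional itself; the sketch below is only a minimal illustration, under stated assumptions, of the general idea of ranking algorithms by how many bits of their output survive input noise. Here that quantity is approximated (as an assumption, not the talk's actual estimator) by the mutual information, in bits, between cluster labelings obtained on two independently perturbed copies of the same data; the function name `noise_robust_bits` and the choice of algorithms are purely illustrative.

```python
# Illustrative sketch (not the talk's GC estimator): score each candidate
# clustering algorithm by the mutual information between labelings computed
# on two independently noise-perturbed copies of the same data.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import mutual_info_score


def noise_robust_bits(algorithm_factory, X, noise_sigma=0.5, seed=0):
    """Approximate information (in bits) shared by labelings of two noisy copies of X."""
    rng = np.random.default_rng(seed)
    labels = []
    for _ in range(2):  # two independent noise realizations of the same data
        X_noisy = X + rng.normal(scale=noise_sigma, size=X.shape)
        labels.append(algorithm_factory().fit_predict(X_noisy))
    return mutual_info_score(labels[0], labels[1]) / np.log(2)  # convert nats to bits


if __name__ == "__main__":
    X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)
    candidates = {
        "k-means": lambda: KMeans(n_clusters=4, n_init=10, random_state=0),
        "spectral": lambda: SpectralClustering(n_clusters=4, random_state=0),
    }
    scores = {name: noise_robust_bits(make, X) for name, make in candidates.items()}
    for name, bits in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: ~{bits:.2f} bits retained under noise")
```

An algorithm whose output is largely reproducible across noise realizations retains a high bit rate under this proxy, mirroring the abstract's point that GC ranks methods by how much task-relevant information they transmit despite input fluctuations.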