Sparse representations of high-dimensional data arise in applications ranging from biology to information systems. The sparseness of the selected features should depend on the noise level in the data. We propose an information-theoretic approach to control the strength of the sparseness penalty. The method uses approximation set coding to convert a cost function for sparse means estimation and for sparse linear regression into a coding problem. The capacity of this coding problem determines the approximation precision of the feature selection method.
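
To make the role of the penalty strength concrete, the following minimal sketch shows sparse means estimation under an assumed cost function, the coordinate-wise L1-penalized squared error 0.5*(x - mu)^2 + lam*|mu|, whose minimizer is the soft-thresholding rule. The cost function, the names `soft_threshold`, `sparse_means`, `support_size`, and `lam`, and the observation values are illustrative assumptions and are not taken from the text. The sketch does not implement approximation set coding; it only illustrates the quantity that such a method would be used to set.

```python
def soft_threshold(x, lam):
    """Minimizer over mu of 0.5*(x - mu)**2 + lam*abs(mu) (assumed cost)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_means(observations, lam):
    """Apply the L1-penalized estimate to each coordinate independently."""
    return [soft_threshold(x, lam) for x in observations]


def support_size(estimate):
    """Number of coordinates kept as nonzero, i.e. selected features."""
    return sum(1 for m in estimate if m != 0.0)


# Hypothetical noisy observations: two large coordinates, the rest near zero.
observations = [3.0, -2.5, 0.25, -0.5, 0.125, 0.0]

# A larger penalty strength removes more coordinates from the estimate.
for lam in (0.0, 0.5, 1.0, 3.0):
    estimate = sparse_means(observations, lam)
    print(lam, support_size(estimate), estimate)
```

In this toy setting a small `lam` keeps coordinates that are plausibly pure noise, while a large `lam` discards everything, which is why the penalty strength should be matched to the noise level.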