The central goal in sparse coding is to learn an overcomplete dictionary that can sparsely represent a given input dataset. However, the costs of storing, transmitting, and processing the learned dictionary can become prohibitive when the data dimension is large. In this paper, we consider the double-sparsity model introduced by Rubinstein et al. (2010b), in which the dictionary is itself the product of a fixed basis and a data-adaptive sparse component. We introduce a simple algorithm for double-sparse coding that is amenable to implementation via neural architectures. We analyze its performance theoretically and demonstrate asymptotic benefits over existing approaches. This constitutes the first practical algorithm for double-sparse coding with rigorous statistical guarantees.
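For reference, the double-sparsity model of Rubinstein et al. (2010b) can be summarized compactly; the display below is a sketch with assumed notation ($y$, $\Phi$, $A$, $x$ and the dimensions shown), not a definition taken verbatim from this work:
\[
y \;\approx\; D\,x \;=\; \Phi A\,x ,
\]
where $y \in \mathbb{R}^n$ is an input signal, $\Phi$ is the fixed base dictionary (e.g., a wavelet or DCT basis), $A$ is a column-wise sparse matrix learned from data so that the effective dictionary $D = \Phi A$ is cheap to store and apply, and $x$ is the sparse code of $y$; the term "double sparsity" refers to both $A$ and $x$ being sparse.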