We consider the "species richness" problem, which arises in applications as diverse as ecology, language acquisition, and terror monitoring. The data consist of the counts of the D species observed out of a total of N species. D is known and typically smaller than N, which is unknown. The goal is to estimate N or, equivalently, the number of unseen species N - D. In practice, many methods do so by truncating the data at a fixed threshold and using only the rare species. Even if the underlying theoretical model is nonparametric, this truncation effectively collapses it to a parametric one. Here we explicitly account for this truncation with a semiparametric model. Namely, we treat abundant species as (nonparametric) outliers or noise that may overlap with the (parametric) rare species, up to a threshold. When the threshold is known, we prove that a modified truncation is needed for the estimator to be statistically efficient. We then propose a model selection heuristic to discover the threshold. We illustrate the approach on both synthetic and real data.
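To make the truncation step concrete, here is a minimal Python sketch of the conventional fixed-threshold approach described above. It is the baseline, not the modified, efficient truncation proposed here. Purely for illustration, it assumes each species' count is Poisson with a common rate λ (a homogeneous parametric model chosen for simplicity). It keeps only species observed between 1 and τ times, fits λ by maximum likelihood on that truncated sample, and adds the implied number of unseen species to D. The function names (`estimate_richness`, `fit_truncated_poisson`, `truncated_mean`) and the threshold argument `tau` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of the naive fixed-threshold truncation baseline,
# assuming a homogeneous Poisson(lam) count for every species.


def _log_weights(lam: float, tau: int) -> List[float]:
    # log(lam^k / k!) for k = 1..tau; the common factor e^{-lam} cancels out
    return [k * math.log(lam) - math.lgamma(k + 1) for k in range(1, tau + 1)]


def _log_sum_exp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def truncated_mean(lam: float, tau: int) -> float:
    """E[X | 1 <= X <= tau] for X ~ Poisson(lam); increasing in lam."""
    w = _log_weights(lam, tau)
    m = max(w)
    e = [math.exp(x - m) for x in w]
    return sum(k * ek for k, ek in zip(range(1, tau + 1), e)) / sum(e)


def fit_truncated_poisson(rare: List[int], tau: int) -> float:
    """MLE of lam from counts restricted to 1..tau.

    The truncated Poisson is an exponential family in log(lam), so the
    score equation is: sample mean = conditional mean. Solve by bisection.
    """
    if not rare:
        raise ValueError("no species observed between 1 and tau times")
    xbar = sum(rare) / len(rare)
    if not 1.0 < xbar < tau:
        raise ValueError("MLE does not exist: need 1 < mean of rare counts < tau")
    lo, hi = 1e-12, 1.0
    while truncated_mean(hi, tau) < xbar:  # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, tau) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_richness(counts: List[int], tau: int) -> Tuple[float, float]:
    """Return (lam_hat, N_hat) using only species observed 1..tau times."""
    observed = [c for c in counts if c > 0]
    rare = [c for c in observed if c <= tau]
    lam = fit_truncated_poisson(rare, tau)
    # Unseen estimate: D_tau * P(X = 0) / P(1 <= X <= tau)
    #                = D_tau / sum_{k=1}^{tau} lam^k / k!
    unseen = len(rare) * math.exp(-_log_sum_exp(_log_weights(lam, tau)))
    # Abundant species (count > tau) are simply counted as present
    return lam, len(observed) + unseen


if __name__ == "__main__":
    # Four singletons, two doubletons, one abundant species (D = 7)
    lam_hat, n_hat = estimate_richness([1, 1, 1, 1, 2, 2, 10], tau=2)
    print(lam_hat, n_hat)  # approximately 1.0 and 11.0
```

As a sanity check, with `tau=2` the fitted rate is λ = 2 f2 / f1 and the unseen estimate reduces algebraically to f1² / (2 f2), where f_k is the number of species seen exactly k times; in the example this gives 4² / (2 · 2) = 4 unseen species. Note how the abundant species enter only through D: this is exactly the kind of hard cutoff whose treatment the semiparametric model above revisits.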