Consider the scenario where an algorithm is given a context and must then select a slate of results to display. For example, the context may be a search query, an advertising slot, or an opportunity to show recommendations. We want to compare many alternative ranking functions that select results in different ways, but online A/B testing with traffic from actual users is expensive. This talk explains how to use traffic that was exposed to a past ranking function to estimate the utility of a hypothetical new ranking function, given reasonable assumptions. We further show how to design the best possible ranking function under the same assumptions. Learning an optimal algorithm to rank search results is a special case. Experimental results on data logged by a real-world e-commerce website are positive.
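As a rough illustration of the kind of estimator involved, here is a minimal sketch of evaluating a hypothetical new ranking function from logged traffic by inverse propensity scoring. This is one standard way to realize the idea, not necessarily the exact estimator presented in the talk; the data layout (`context`, `slate`, logging probability, `reward`) and the function names are assumptions for the example.

```python
def ips_estimate(logs, new_policy_prob):
    """Estimate the expected utility of a hypothetical new ranking
    function from traffic logged under a past ranking function.

    logs: list of (context, slate, p_log, reward) tuples, where p_log
          is the probability that the past (logging) ranking function
          showed that slate for that context.
    new_policy_prob: function (context, slate) -> probability that the
          new ranking function would show that slate.

    NOTE: this is a generic inverse-propensity-scoring sketch, an
    assumption about the estimator's form, not the talk's exact method.
    """
    total = 0.0
    for context, slate, p_log, reward in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the new ranking function is to show the same slate.
        total += reward * new_policy_prob(context, slate) / p_log
    return total / len(logs)


# Toy usage: the past function showed slates "A" and "B" uniformly at
# random; the new function would always show "A", which earned reward 1.
logs = [
    ("query", "A", 0.5, 1.0),
    ("query", "B", 0.5, 0.0),
    ("query", "A", 0.5, 1.0),
    ("query", "B", 0.5, 0.0),
]
new_policy = lambda context, slate: 1.0 if slate == "A" else 0.0
print(ips_estimate(logs, new_policy))  # -> 1.0
```

The key requirement is that the logging probabilities are known and nonzero for any slate the new function might show; the "reasonable assumptions" mentioned above play this kind of role.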