The typical use case of recommendations systems is suggesting items such as videos, songs, or articles to users. In theory, the best judges of the quality and effectiveness of a recommendations system are the users themselves; ideal metrics, for example, describe the intensity of a user's interaction with the system over the long term. In practice, however, user experiments can be noisy, slow, non-repeatable, and confusing. A complementary offline approach can be used to quickly evaluate and optimize new recommendations systems on historical user-generated data. Yet these offline measurements need not translate directly into the sought-after actual increases in user engagement. This talk will describe the blend of offline and online experimentation we use at Netflix to improve upon our recommendatio