In contextual bandits, one observes the payoff of the chosen arm but not of the others. This partial observability makes it difficult to use historical data to evaluate a new arm-selection policy: whenever the new policy chooses an arm different from the one recorded in the data, we simply have no payoff signal for evaluation. It might therefore appear that the only reliable way to evaluate a new policy's per-round payoff is to run it in the real bandit problem, a process that can be both expensive and slow. In this talk, through two case studies (personalized news recommendation and web search), we show how historical data can be collected properly to enable *unbiased* offline evaluation, using statistical tools from causal inference.
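
The abstract does not name a specific estimator, but one standard tool from causal inference for this setting is inverse propensity scoring (IPS): if the logging policy's arm-selection probabilities are recorded at data-collection time, reweighting the logged payoffs gives an unbiased estimate of the new policy's per-round payoff. The sketch below is a minimal illustration under that assumption; the function names and the toy uniform logging policy are hypothetical, not part of the talk.

```python
import numpy as np

def ips_value_estimate(logged_data, new_policy):
    """Estimate the per-round payoff of `new_policy` from logged bandit data.

    Each log entry is (context, chosen_arm, observed_payoff, logging_prob),
    where `logging_prob` is the probability with which the *logging* policy
    chose that arm; recording it at data-collection time is what makes the
    estimate unbiased.
    """
    weighted_payoffs = []
    for context, arm, payoff, prob in logged_data:
        # Importance weight: 1 if the new policy would have picked the same
        # arm, 0 otherwise, rescaled by the logging probability.
        indicator = 1.0 if new_policy(context) == arm else 0.0
        weighted_payoffs.append(indicator * payoff / prob)
    return float(np.mean(weighted_payoffs))


# Toy usage: the logging policy chose among 3 arms uniformly at random
# (logging_prob = 1/3); evaluate a deterministic new policy on the logs.
rng = np.random.default_rng(0)
logs = [(x, int(rng.integers(3)), float(rng.random()), 1.0 / 3.0)
        for x in rng.random(1000)]
new_policy = lambda x: 0 if x < 0.5 else 2
print(ips_value_estimate(logs, new_policy))
```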