Reinforcement learning has had many successes in domains where experience is inexpensive, such as video games and board games. RL algorithms for such domains are often based on policy gradients, which make many noisy policy updates with a small learning rate. We instead examine algorithms that spend more computation per step in an attempt to reduce noise and make larger policy updates; such algorithms are appropriate when experience is more expensive than compute time. In particular, we look at several methods based on approximate policy iteration.
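To make the contrast concrete, the sketch below shows the generic approximate policy iteration loop: an expensive, sample-based policy evaluation step followed by a large (greedy) policy improvement step, rather than many small noisy gradient updates. The random MDP, Monte Carlo evaluation, and greedy improvement here are illustrative assumptions, not the specific methods studied in this work.

```python
# Minimal sketch of approximate policy iteration on a small synthetic MDP.
# Assumptions (not from the paper): random tabular MDP, Monte Carlo policy
# evaluation, greedy policy improvement.
import numpy as np

rng = np.random.default_rng(0)

# Random MDP: S states, A actions, transitions P[s, a, s'] and rewards R[s, a].
S, A, gamma = 10, 3, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(0.0, 1.0, size=(S, A))

def rollout(s, a, policy, horizon=100):
    """Discounted Monte Carlo return starting from (s, a), then following policy."""
    g, discount = 0.0, 1.0
    for _ in range(horizon):
        g += discount * R[s, a]
        discount *= gamma
        s = rng.choice(S, p=P[s, a])
        a = policy[s]
    return g

def evaluate(policy, n_rollouts=20):
    """Approximate policy evaluation: estimate Q^pi by averaging rollouts."""
    Q = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            Q[s, a] = np.mean([rollout(s, a, policy) for _ in range(n_rollouts)])
    return Q

# Approximate policy iteration: a few large, greedy policy updates,
# each preceded by a comparatively expensive evaluation step.
policy = np.zeros(S, dtype=int)
for it in range(10):
    Q = evaluate(policy)            # spend compute here to reduce noise
    new_policy = Q.argmax(axis=1)   # large policy update (greedy w.r.t. Q)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("final policy:", policy)
```

In this regime each iteration consumes many samples per update, but the number of policy updates is small, which is the trade-off favored when experience is cheap relative to the cost of taking many small, noisy steps.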