A central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. We discuss algorithms for learning a model of such an environment directly from sequences of action-observation pairs, and for closing the loop by planning in the learned model. Specifically, we present a spectral (or subspace identification) algorithm for learning the parameters of a Predictive State Representation, and we describe two methods of planning in the learned model.

http://arxiv.org/abs/0912.2385
http://arxiv.org/abs/1011.0041
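The core of the spectral approach can be illustrated with a small sketch. The abstract does not give the algorithm's details, so the following is a hedged illustration in the standard spectral/subspace-identification style for uncontrolled systems: an SVD of a Hankel-style matrix of test-history co-occurrence probabilities yields a low-dimensional subspace `U`, from which transformed state-update operators are computed. The toy HMM, the use of exact (rather than empirical) moments, and the omission of actions are all simplifying assumptions for illustration, not the paper's setup.

```python
import numpy as np

# Toy 2-state, 2-observation HMM standing in for the environment
# (assumed for illustration; actions are omitted).
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])           # T[s2, s1] = P(s2 | s1)
O = np.array([[0.9, 0.2],
              [0.1, 0.8]])           # O[o, s]   = P(o | s)
pi = np.array([0.5, 0.5])            # stationary initial distribution
A = [T @ np.diag(O[o]) for o in range(2)]   # A_o[s2, s1] = P(o|s1) P(s2|s1)
one = np.ones(2)

# Low-order moments over tests (future observations) and histories
# (past observations). Here they are computed exactly from the HMM;
# in practice they are empirical estimates from observed sequences.
P1 = np.array([one @ A[t] @ pi for t in range(2)])                    # P(x1)
P21 = np.array([[one @ A[t] @ A[h] @ pi for h in range(2)]
                for t in range(2)])                                   # P(x1=h, x2=t)
P3x1 = [np.array([[one @ A[t] @ A[o] @ A[h] @ pi for h in range(2)]
                  for t in range(2)])
        for o in range(2)]                                            # P(x1=h, x2=o, x3=t)

# Spectral step: SVD of the test-history co-occurrence matrix gives
# the predictive subspace U.
U, _, _ = np.linalg.svd(P21)
k = 2                                # true linear dimension of the system
U = U[:, :k]

# Transformed (observable-operator) parameters of the learned model.
pinv = np.linalg.pinv(U.T @ P21)
b1 = U.T @ P1                                  # initial predictive state
binf = np.linalg.pinv(P21.T @ U) @ P1          # normalization vector
B = [U.T @ P3x1[o] @ pinv for o in range(2)]   # one update operator per observation

def predict(seq):
    """P(o_1, ..., o_n) = binf^T B_{o_n} ... B_{o_1} b1."""
    b = b1
    for o in seq:
        b = B[o] @ b
    return float(binf @ b)
```

With exact moments and `k` equal to the system's true rank, `predict` reproduces the HMM's sequence probabilities; with sampled moments it converges to them as data grows, which is the statistical-consistency property that motivates spectral methods.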