We study the classical setting of full-information sequential prediction/action. The randomized multiplicative weights update has long been used to guarantee a worst-case bound on regret with respect to the best single action in hindsight. Adaptive updates have also recently been proposed to obtain improved second-order regret bounds by reducing unwarranted paranoia about potentially adversarial behavior. In this work, we develop an algorithm that takes advantage of richer stochastic structure while remaining computationally feasible. We obtain improved regret rates against the natural benchmark of offline predictability in more complex stochastic environments, namely finite-memory Markov chains and hidden Markov models. Our approach effectively selects the best natural model of the environment in an online fashion. This translates into improved guarantees on overall reward (equivalently, loss).
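For concreteness, the classical multiplicative weights (Hedge) update in the standard experts setting with $N$ actions and losses $\ell_{t,i} \in [0,1]$ proceeds as follows (a textbook formulation given here for reference; the notation is illustrative and not fixed by this work):
\[
p_{t,i} = \frac{w_{t,i}}{\sum_{j=1}^{N} w_{t,j}}, \qquad w_{t+1,i} = w_{t,i}\, e^{-\eta\, \ell_{t,i}},
\]
which, for a suitable learning rate $\eta = \Theta(\sqrt{\ln N / T})$, guarantees worst-case regret $O(\sqrt{T \ln N})$ with respect to the best single action in hindsight.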