In online learning, a learner repeatedly chooses actions from a fixed set, each with an unknown loss determined by an all-powerful adversary, with the goal of minimizing regret with respect to the best single action. In many realistic settings, however, the losses may depend on the learner's past actions, a dependence that the standard notion of regret does not capture. In this paper, we focus on two important cases: (1) switching between different actions incurs a cost; and, more generally, (2) the losses depend on the learner's actions in a bounded number of previous rounds. We characterize the attainable performance in these settings, under both full-information and bandit feedback, exhibiting interesting new gaps between the two and providing new algorithms as well as lower bounds.