In this talk, we consider an interacting two-agent sequential decision-making problem consisting of a Markov source process, a causal encoder with feedback, and a causal decoder. Motivated by the goal of fostering links between control and information theory, we augment the standard formulation by considering general alphabets and a cost function operating on current and previous symbols. Using dynamic programming, we provide a structural result whereby an optimal scheme exists that operates on appropriate sufficient statistics. We emphasize an example where the decoder alphabet lies in a space of beliefs on the source alphabet, and the additive cost function is a log-likelihood ratio pertaining to sequential information gain. Using the second law of thermodynamics for Markov chains, we strengthen our structural result to show that an optimal scheme always exists where the decoder's decision variable is the true belief. We demonstrate new connections with message-point feedback communication and the nonlinear filter for hidden Markov models.
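The nonlinear filter referenced above is, for a finite-alphabet hidden Markov model, the standard recursive Bayes update of the decoder's belief: propagate the belief through the Markov transition, then reweight by the observation likelihood and normalize. The sketch below illustrates one such belief-update step; the function name, matrices, and toy numbers are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def hmm_belief_update(belief, A, B, obs):
    """One step of the nonlinear (forward) filter for a finite-alphabet HMM.

    belief : current posterior over hidden states, shape (n,)
    A      : transition matrix, A[i, j] = P(x_{t+1} = j | x_t = i)
    B      : emission matrix,   B[j, y] = P(y_t = y | x_t = j)
    obs    : index of the observed symbol y_t
    """
    predicted = belief @ A            # predict: push belief through the Markov chain
    unnorm = predicted * B[:, obs]    # correct: multiply by the observation likelihood
    return unnorm / unnorm.sum()      # normalize back to a probability vector

# Illustrative two-state example (all numbers hypothetical)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
belief = np.array([0.5, 0.5])
belief = hmm_belief_update(belief, A, B, obs=0)
```

Iterating this update over the observation sequence yields the decoder's belief trajectory, which is exactly the sufficient statistic the structural result says the decoder may restrict attention to.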