In a bandit problem, agents that are initially unaware of the stochastic evolution of the environment (the arms) aim to maximize a common objective based on the history of actions and observations. The classical difficulty in a bandit problem is the exploration-exploitation dilemma, which necessitates careful algorithm design to balance gathering new information against exploiting the best available information to achieve optimal performance. The motivation to study bandit problems comes from their diverse applications, including cognitive radio networks, opportunistic spectrum access, network routing, web advertising, and many others. In this talk we provide an agent-centric approach to designing online learning algorithms for bandit problems that accounts for communication, computation, and switching costs.
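As a concrete illustration of the exploration-exploitation dilemma, the sketch below implements the classical UCB1 index policy; this is a standard textbook baseline, not the agent-centric algorithms presented in the talk, and the Bernoulli arm means and horizon are illustrative assumptions.

```python
# A minimal sketch of the exploration-exploitation trade-off via the
# classic UCB1 index policy. Arm means and horizon are illustrative
# assumptions, not taken from the talk.
import math
import random

def ucb1(arm_means, horizon):
    """Pull each arm once, then always pull the arm with the highest UCB index."""
    n_arms = len(arm_means)
    counts = [0] * n_arms        # times each arm has been pulled
    totals = [0.0] * n_arms      # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1          # exploration: try every arm once
        else:
            # index = empirical mean + confidence bonus; the bonus shrinks as
            # an arm is sampled more, gradually shifting play toward exploitation
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                              + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if random.random() < arm_means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        totals[arm] += reward
    return totals, counts

totals, counts = ucb1(arm_means=[0.3, 0.5, 0.7], horizon=10_000)
print(counts)  # the 0.7 arm should dominate the pull counts
```

The confidence bonus is large for rarely pulled arms (encouraging exploration) and decays as evidence accumulates, so play concentrates on the empirically best arm; cost-aware settings such as those in the talk require additionally weighing communication, computation, and switching costs against this information-gathering benefit.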