We consider contextual bandits with stochastic experts, where each expert is a conditional distribution over the choices given a context. We propose upper-confidence bound (UCB) algorithms that minimize regret with respect to the best expert, built on two importance-sampling-based estimators. Both estimators exploit information leakage: samples collected under any expert can be used to estimate the mean reward of every other expert. We derive problem-dependent regret bounds that scale roughly as $O(M \log N \log T / \Delta)$, where $\Delta$ is the gap between the optimal and the second-best expert, $N$ is the total number of experts, and $M$ quantifies the information leakage. Our algorithm shows superior performance on several real-world datasets compared to other state-of-the-art algorithms.
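To make the information-leakage idea concrete, here is a minimal sketch of a plain importance-sampling estimator in a context-free toy setting (the paper's actual estimators and their clipping/variance-control details are not specified here; the two-expert setup, arm means, and sample sizes below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2 stochastic experts over 3 arms (context dropped for
# brevity). Each expert is a known distribution over arms; the true mean
# reward of each arm is unknown to the learner.
experts = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.2, 0.7]])
arm_means = np.array([0.2, 0.5, 0.8])  # ground truth, for checking only

def is_estimate(k, actions, rewards, behavior_probs):
    """Importance-sampling estimate of expert k's mean reward, reusing
    samples drawn under any expert -- the 'information leakage'."""
    weights = experts[k, actions] / behavior_probs
    return float(np.mean(weights * rewards))

# Collect samples under expert 0 only.
actions = rng.choice(3, size=5000, p=experts[0])
rewards = rng.binomial(1, arm_means[actions]).astype(float)
behavior = experts[0, actions]

# Estimate BOTH experts' mean rewards from the same batch of samples.
estimates = [is_estimate(k, actions, rewards, behavior) for k in (0, 1)]
true_means = experts @ arm_means  # [0.32, 0.68]
```

A UCB-style algorithm would add a confidence width to each such estimate and play the expert with the largest upper bound; the quality of this cross-expert reuse (governed by how much the experts' distributions overlap) is what a leakage measure like $M$ captures.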