We consider the problem of designing `Learning to Control' algorithms for stochastic systems when the model parameters are unknown. Two models are considered: Markov Decision Processes and Linear Stochastic systems. A Thompson-sampling based regret-optimization learning framework is developed. The proposed algorithms achieve O(sqrt{T}) which is order-optimal. Numerical evaluation suggests robust performance of the algorithms.