Recent years have witnessed tremendous growth in multimedia applications over wireless systems. Applications such as VoIP and real-time video will continue to take communications beyond text and images, and real-time traffic may eventually dominate network traffic. Motivated by this trend, we study the scheduling of real-time traffic with hard deadlines over a Markov-modeled downlink, where the base station can learn, via feedback, only the channel condition of the currently scheduled user. This joint channel-learning and scheduling problem can be cast as a partially observable Markov decision process. We show that, due to the strict deadline constraint of real-time traffic and the partial observability of the channels, optimal scheduling involves tradeoffs more intrinsic and complicated than the classical `exploitation vs exploration'. Specifically, `exploitation vs exploration' arises at two timescales: one corresponding to the packet lengths and the other to the hard deadlines. In fact, idling adds a new dimension to the action space and leads to an `exploitation vs exploration' tradeoff, at the slower timescale, between the throughput performance of the backlogged users and that of users arriving in the future. For the two-user system, somewhat surprisingly, a simple greedy policy turns out to be optimal despite the complicated nature of the tradeoffs. Extensive numerical results further indicate that the greedy policy is likely to be optimal in the more general setup.
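To make the greedy policy concrete, the following is a minimal sketch of greedy scheduling under partial channel observability. It assumes, for illustration, two-state (Gilbert-Elliott) Markov channels with hypothetical transition probabilities `p01` (bad to good) and `p11` (good to good); the belief-update rule and parameter names are illustrative assumptions, not the exact model of the paper.

```python
# Illustrative sketch: greedy scheduling over two-state Markov channels.
# Assumed model: each user's channel is a Gilbert-Elliott chain with
# P(bad -> good) = p01 and P(good -> good) = p11; the scheduler maintains
# a belief (probability the channel is GOOD) for every user, but observes
# only the scheduled user's true state via feedback.

def belief_update(belief, p01, p11):
    """One-step Markov prediction of P(channel is good)."""
    return belief * p11 + (1.0 - belief) * p01

def greedy_schedule(beliefs):
    """Greedy policy: serve the user with the highest belief of a good channel."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])

def step(beliefs, true_states, p01, p11):
    """Schedule greedily, observe only the scheduled user's channel,
    then propagate all beliefs one step through the Markov chain."""
    u = greedy_schedule(beliefs)
    beliefs = list(beliefs)
    # Feedback reveals the scheduled user's true state (1 = good, 0 = bad);
    # the other user's channel remains unobserved.
    beliefs[u] = float(true_states[u])
    return u, [belief_update(b, p01, p11) for b in beliefs]
```

For example, with beliefs (0.6, 0.4) the greedy policy schedules user 0; if feedback reveals a good channel, user 0's belief is reset to 1 before both beliefs are propagated one step through the chain.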