We consider the problem of model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path of the system is available. We focus on the classical approach of Q-learning, where the goal is to learn the optimal Q-function. We propose Nearest Neighbor Q-Learning, which utilizes a nearest neighbor regression method to learn the Q-function. We provide a finite-sample analysis of the convergence rate of this method. In particular, we establish that it is guaranteed to output an epsilon-accurate estimate of the optimal Q-function with high probability using a number of observations that depends polynomially on 1/epsilon and on the model parameters. To establish our results, we develop a robust version of classical stochastic approximation results; this may be of independent interest.
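To make the core idea concrete, the following is a minimal sketch (not the paper's exact algorithm or step-size schedule): each continuous state observed along the sample path is mapped to its nearest anchor point in a finite set of centers, and a standard tabular Q-learning update is applied at that anchor. All names, the 1-D state space, and the constant learning rate are illustrative assumptions.

```python
import numpy as np

def nearest_neighbor_q_learning(trajectory, centers, n_actions,
                                gamma=0.9, lr=0.5):
    """Illustrative nearest-neighbor Q-learning sketch.

    trajectory: iterable of (s, a, r, s_next) transitions from a single
        sample path, with continuous scalar states s, s_next.
    centers: 1-D array of anchor points discretizing the state space.
    Returns a tabular Q estimate of shape (len(centers), n_actions).
    """
    Q = np.zeros((len(centers), n_actions))
    for (s, a, r, s_next) in trajectory:
        i = np.abs(centers - s).argmin()       # nearest anchor to s
        j = np.abs(centers - s_next).argmin()  # nearest anchor to s_next
        td_target = r + gamma * Q[j].max()     # one-step Bellman target
        Q[i, a] += lr * (td_target - Q[i, a])  # Q-learning update at anchor i
    return Q
```

For example, a single transition from state 0.1 (nearest anchor 0.0) to state 0.9 (nearest anchor 1.0) with reward 1 moves the corresponding Q entry halfway toward the target under `lr=0.5`. The paper's analysis concerns how many such transitions, and how fine an anchor set, are needed for an epsilon-accurate estimate.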