Generating complex discrete distributions remains a challenging problem in machine learning. Existing techniques for generating distributions with many degrees of freedom rely on standard generative models such as Generative Adversarial Networks (GANs), Wasserstein GANs, and their variants. These models optimize a distance between two continuous distributions and therefore do not apply directly to discrete data. We introduce a Discrete Wasserstein GAN (DWGAN) model based on a dual formulation of the Wasserstein distance between two discrete distributions. From this formulation we derive a novel training algorithm and a corresponding network architecture. We report experimental results on both synthetic discrete data and discretized data from MNIST handwritten digits. Beyond the theory, we also demonstrate applications of GANs to image processing tasks such as colorization and style transformation.
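For context, a standard statement of the dual (Kantorovich–Rubinstein) formulation for discrete distributions is sketched below; this is the classical duality on a finite support, not necessarily the exact form derived in the paper. Here $p$ and $q$ are probability vectors on a finite set $\mathcal{X}$, and $d$ is a ground metric on $\mathcal{X}$.

% Kantorovich duality for two discrete distributions p, q on a finite
% support (classical form; the paper's exact derivation may differ).
\[
  W(p, q) \;=\; \max_{f : \mathcal{X} \to \mathbb{R}}
    \sum_{x \in \mathcal{X}} f(x)\,\bigl(p(x) - q(x)\bigr)
  \quad \text{s.t.} \quad
  f(x) - f(y) \,\le\, d(x, y) \;\; \text{for all } x, y \in \mathcal{X}.
\]

In a WGAN-style model, the dual potential $f$ is parameterized by a network (the critic), and the constraint above plays the role of the Lipschitz condition enforced in the continuous Wasserstein GAN.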