In many application areas, the natural representation of data is in the form of an empirical probability distribution. Documents are represented as normalized frequency counts of words, images are represented by color (or other feature) histograms, and speech signals can be represented by spectral histograms. There are many natural and meaningful ways of measuring similarity (or distance) between such distributions, and these define different geometries in which we might wish to analyze collections of distributions for interesting patterns. However, a key practical bottleneck is the high dimensionality of these representations: for example, an 256 x 256 image might be represented by a vector of over 1000 features, and a document might be represented as a sparse vector with hundreds of attributes. Thus, a key problem is: can we reduce the dimensionality of collections of distributions to make data analysis tractable while preserving the distances in the collection ? In this talk, I'll discuss a collection of recent results centered around this theme, that provide both good news (and bad) for dimensionality reduction on distributions in theory and in practice.