The web has changed our lives in many ways, but none are as much fun as our access to large and rich collections of multimedia.  Collections with billions of objects and millions of hours of content are common.  I want to introduce my talk by describing some of the changes the web-scale databases have made to our research. We can now do many tasks, such as tagging content, better by using the web of objects to propagate information.

These large-scale collections have also renewed interest in the very simplest algorithms for processing data.  At the heart of many of these algorithms is finding nearest neighbors.  Locality sensitive hashing is a well-known solution and I will show how to optimize its behavior.  We will show that the optimum parameters for LSH depend only on the distance histograms of the dataset, and not on its dimensionality.  More importantly, our theory helps unify the worlds of space-partioning methods such as k-d trees and randomized algorithms such as LSH.  All of this helps us beat the curse of dimensionality.