One of the basic problems of learning is to separate the available data by assigning each point to one of several clusters. Even given an infinite amount of (unlabeled) data, i.e., the whole underlying probability distribution, it is often unclear how to draw the boundary. In my talk I will discuss certain geometric intuitions that allow one to give a formal definition of optimal clustering, and I will describe some recent work connecting this notion to spectral clustering and to some ideas from semi-supervised learning.
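
For readers unfamiliar with the term, the following is a minimal sketch of standard spectral clustering (not the speaker's notion of optimal clustering), run on a toy two-moons dataset where a straight-line boundary fails; the dataset and parameter choices (n_clusters, affinity, gamma) are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering

# Unlabeled data: two interleaving half-circles; no linear boundary separates them.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Spectral clustering: build an RBF affinity graph over the points, embed them
# using eigenvectors of the graph Laplacian, then cluster in that embedding.
model = SpectralClustering(n_clusters=2, affinity="rbf", gamma=20.0,
                           assign_labels="kmeans", random_state=0)
labels = model.fit_predict(X)

print("cluster sizes:", np.bincount(labels))
```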