We investigate agnostic learning when there is no noise in the labeling function, that is, when the labels are deterministic. We show that, in contrast to other binary classification learning settings, here the sample complexity of learning a hypothesis class is not fully determined by its VC-dimension. We present classes of any VC-dimension $d$ that are learnable from $O(d/\epsilon)$ examples, as well as classes that require samples of size $\Omega(d/\epsilon^2)$. Furthermore, we show that there exist classes for which ERM algorithms are not optimal: while the class can be learned with low sample complexity, any ERM algorithm requires $\Omega(d/\epsilon^2)$ examples. We introduce a new class parameter and show that it provides a full combinatorial characterization of the sample complexity of learning under deterministic labels.
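One way to formalize the deterministic-label agnostic setting described above is sketched below; the notation ($\mathcal{X}$, $D$, $\ell$, $\mathrm{err}_P$) is introduced here for illustration and is not taken from the abstract itself.

\[
  \ell : \mathcal{X} \to \{0,1\},
  \qquad
  \Pr_{(x,y)\sim P}\bigl[\, y = \ell(x) \,\bigr] = 1,
\]
so the joint distribution $P$ over $\mathcal{X}\times\{0,1\}$ has an arbitrary marginal $D$ over $\mathcal{X}$ but a noiseless labeling function $\ell$. Learning remains agnostic with respect to a hypothesis class $\mathcal{H}$ of VC-dimension $d$, since $\ell$ need not belong to $\mathcal{H}$: with high probability over a sample of size $m$, the learner must output $h$ satisfying
\[
  \mathrm{err}_P(h) \;\le\; \min_{h' \in \mathcal{H}} \mathrm{err}_P(h') + \epsilon,
  \qquad
  \mathrm{err}_P(h) := \Pr_{x \sim D}\bigl[\, h(x) \neq \ell(x) \,\bigr].
\]
The results stated above place the required $m$ anywhere between $O(d/\epsilon)$ and $\Omega(d/\epsilon^2)$ depending on the class, rather than at a single rate determined by $d$ alone.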