We present a framework for scalable analysis of Internet host characteristics in a machine learning setting. The framework constructs low-dimensional numerical representations of discoverable hosts on the public Internet from a large database of global scan measurements.