The problem of comparing, classifying, and indexing long textual files drawn from large collections grows increasingly acute as web applications, digital libraries, and genomic studies expand to an unprecedented scale. Techniques established in the past rarely carry over to these contexts. In computational molecular biology, for instance, edit distances become both computationally prohibitive and scarcely meaningful when applied to entire genomes; they are being supplanted by global similarity measures that refer, implicitly or explicitly, to the composition of sequences in terms of their constituent patterns, and that exploit some underlying notion of relative compressibility. Some such techniques are reviewed in this talk.
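
To make the flavor of such measures concrete, one widely cited instance is the normalized compression distance (NCD) of Li et al. and Cilibrasi and Vitányi. The sketch below, in Python with the standard zlib module standing in as an off-the-shelf compressor, illustrates the general principle of compression-based similarity only; it is not necessarily among the specific techniques surveyed in the talk, and the sequences and helper names are illustrative.

    import os
    import zlib

    def compressed_size(s: bytes) -> int:
        # Length of s compressed by zlib at maximum effort; a practical
        # stand-in for the uncomputable Kolmogorov complexity.
        return len(zlib.compress(s, 9))

    def ncd(x: bytes, y: bytes) -> float:
        # Normalized Compression Distance: near 0 when compressing x and y
        # jointly saves much over compressing them separately, i.e. when
        # they share constituent patterns; near 1 when they share none.
        cx, cy = compressed_size(x), compressed_size(y)
        cxy = compressed_size(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    # Toy comparison on genome-like strings: a close variant of a sequence
    # scores markedly lower (more similar) than unrelated random data.
    a = b"ACGTTGCA" * 500
    b = a.replace(b"TTG", b"TAG")   # a with one substitution per repeat
    c = os.urandom(len(a))          # incompressible, unrelated bytes
    print(round(ncd(a, b), 3), round(ncd(a, c), 3))

Because the measure depends only on compressed sizes, it needs no alignment and no model of the mutation process, which is precisely what makes this family of measures attractive for whole-genome comparison.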