In this talk, we investigate the benefits of the side information on the universal compression of a sequence from a mixture of parametric sources. Necessary and sufficient conditions on the mixture parameters are determined such that the side information (i.e., the previous sequences from the mixture) results in a reduction in the average codeword length compared with the universal compression without side information. This reduction is characterized using closed form expressions. It is proved that the optimal universal compression with side information corresponds to the clustering of the side information sequences. Further, a clustering technique is presented to utilize the side information, and its performance on compression is validated using computer simulations on real data traces.