Many learning algorithms rely on the stringent assumption that the learning environment is stationary: the data distribution on which a statistical model is optimized is the same as the one to which the model is applied. Real-world applications are far more complex. For instance, object recognition systems often suffer significant performance degradation when they are trained on one dataset and evaluated on another. We address this important challenge by adapting probabilistic models across different distributions. Our key insight is to discover and exploit intrinsic, hidden structures that are resilient to distribution changes.
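To make the phenomenon concrete, the following is a minimal sketch of the training/deployment mismatch described above, using synthetic Gaussian data and an off-the-shelf classifier; the sampling function, the shift amount, and all parameters are illustrative assumptions, not part of our method. A model fit on a source distribution and scored on a shifted target distribution loses much of its accuracy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shift):
    """Two Gaussian classes in 2-D; `shift` translates the whole domain
    (a hypothetical stand-in for a change of dataset)."""
    X0 = rng.normal(loc=-1.0 + shift, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=+1.0 + shift, scale=1.0, size=(n, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

# Source domain: the distribution the model is optimized on.
Xs, ys = sample(500, shift=0.0)
# Target domain: the shifted distribution the model is applied to.
Xt, yt = sample(500, shift=2.0)

clf = LogisticRegression().fit(Xs, ys)
print("in-domain accuracy:    ", clf.score(Xs, ys))  # high
print("cross-domain accuracy: ", clf.score(Xt, yt))  # degrades sharply
```

The decision boundary learned on the source data is near-optimal there, but once the target distribution moves, many target points fall on the wrong side of that fixed boundary, which is exactly the degradation that motivates adapting models across distributions.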