Consider a group of variables that can help us predict a quantity of interest. If obscuring any subset of these variables completely destroys our ability to make an accurate prediction, we say that these variables exhibit informational synergy: the predictors are more than the sum of their parts. Typical computational models can learn synergistic predictors that are fragile with respect to noise. We show how to optimize learning to find relationships that are minimally synergistic, and we demonstrate the usefulness of this approach with several promising results for problems like covariance estimation and deep learning.
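As an illustrative sketch (not taken from the paper), the XOR relation is the canonical example of informational synergy: two fair binary inputs individually carry zero information about their XOR, yet together they determine it exactly. The helper `mutual_information` below is a hypothetical utility written for this example.

```python
import itertools
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Mutual information (in bits) between the two coordinates of `pairs`,
    treating each listed pair as equally likely."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

# All four equally likely input pairs and the XOR target y = x1 ^ x2.
samples = [(x1, x2, x1 ^ x2) for x1, x2 in itertools.product([0, 1], repeat=2)]

# Each predictor alone carries no information about y...
print(mutual_information([(x1, y) for x1, _, y in samples]))        # 0.0
print(mutual_information([(x2, y) for _, x2, y in samples]))        # 0.0
# ...but jointly they predict y perfectly: 1 bit.
print(mutual_information([((x1, x2), y) for x1, x2, y in samples])) # 1.0
```

Obscuring either input destroys the prediction entirely, which is exactly the fragility the abstract attributes to synergistic predictors.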