As a measure of statistical dependence, mutual information is remarkably general and has several intuitive interpretations. A central and unavoidable difficulty in using mutual information, however, is estimating it from data. In the first part, we will discuss the most popular class of MI estimators, which is based on k-nearest-neighbor statistics, and examine its limitations. In the second part, we will focus on a variational approach to estimating mutual information, show how it can improve the classical feature selection problem in machine learning, and draw connections to recent advances in deep unsupervised learning.
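As a rough illustration of the k-nearest-neighbor family mentioned above, the sketch below implements the Kraskov-Stögbauer-Grassberger (KSG) estimator, variant 1, with NumPy and SciPy. The function name, the choice k=3, and the toy Gaussian example are our own illustrative assumptions, not material from the talk itself.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=3):
    """KSG (variant 1) estimate of I(X; Y) in nats from paired samples.

    x, y : arrays of shape (n_samples, d_x) and (n_samples, d_y)
    k    : number of nearest neighbors in the joint space
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)

    # Distance to the k-th neighbor in the joint space under the max norm;
    # k + 1 because the query point itself is returned at distance 0.
    joint = np.hstack([x, y])
    joint_tree = cKDTree(joint)
    eps = joint_tree.query(joint, k=k + 1, p=np.inf)[0][:, k]

    # Count points strictly inside eps in each marginal space (excluding self).
    x_tree, y_tree = cKDTree(x), cKDTree(y)
    nx = np.array([len(x_tree.query_ball_point(xi, r - 1e-12, p=np.inf)) - 1
                   for xi, r in zip(x, eps)])
    ny = np.array([len(y_tree.query_ball_point(yi, r - 1e-12, p=np.inf)) - 1
                   for yi, r in zip(y, eps)])

    mi = digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
    return max(mi, 0.0)  # MI is non-negative; clip small negative estimates

# Toy check on correlated Gaussians, where the true MI is -0.5 * log(1 - rho^2).
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 1))
y = x + 0.5 * rng.normal(size=(2000, 1))
print(ksg_mutual_information(x, y, k=3))
```

The estimate is non-parametric and adapts to the local density of the samples, which is precisely what makes this family popular; its behavior in high dimensions and under strong dependence is among the limitations discussed in the first part.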