Information, Statistics, and Practical Achievement of Shannon Capacity

Andrew Barron, Yale

Abstract: The interplay of information and statistics occurs in several areas of activity, including the following, from which we review a selection.

1) Information quantities have fundamental roles in the understanding of probabilistic phenomena ranging from probability exponents for laws of large numbers, to monotonic approach in the central limit theorem, to convergence of Markov chain distributions, to martingales.

2) Information theory techniques provide fundamental limits in statistical risk characterization, including minimax risks of function estimation.

3) Statistical modeling principles are informed and analyzed using principles of data compression, including the characterization of likelihood penalties that are both information-theoretically and statistically valid, with implications for maximum likelihood, Bayes, and MDL methods.

4) Statistical principles provide formulation and solution of channel communication problems. Sparse superposition encoding and adaptive successive decoding by iterative term extraction in regression with random design. Communication at rates up to capacity for the additive white Gaussian noise channel subject to a power constraint.
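
As an editorial illustration of item 4, here is a minimal numpy sketch of a sparse superposition code over an AWGN channel, with toy parameters chosen for this note rather than taken from the talk. The codeword superposes one column per section of a random Gaussian dictionary, and decoding greedily extracts one term per pass from the residual, a simplified stand-in for the adaptive successive decoder described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters (assumed for illustration, not from the talk).
L, B, n = 16, 64, 512          # sections, columns per section, block length
N = L * B                      # total number of dictionary columns
sigma, P = 1.0, 15.0           # channel noise level and codeword power

# Random Gaussian dictionary: the "regression with random design" matrix.
X = rng.normal(size=(n, N))

# Message: one column selected per section, all with equal coefficient power.
# Rate = L * log2(B) / n bits per channel use (well below capacity here).
message = rng.integers(0, B, size=L)
c = np.sqrt(P / L)
beta = np.zeros(N)
beta[np.arange(L) * B + message] = c

# Transmit over the additive white Gaussian noise channel.
y = X @ beta + sigma * rng.normal(size=n)

# Iterative term extraction: in each pass, pick the undecoded section whose
# best column is most aligned with the current residual, then subtract it.
decoded = np.full(L, -1)
residual = y.copy()
for _ in range(L):
    z = X.T @ residual / np.linalg.norm(residual)   # normalized test statistics
    best = (-np.inf, -1, -1)                        # (statistic, section, column)
    for l in np.where(decoded < 0)[0]:
        sec = z[l * B:(l + 1) * B]
        j = int(np.argmax(sec))
        if sec[j] > best[0]:
            best = (sec[j], l, j)
    _, l, j = best
    decoded[l] = j
    residual -= c * X[:, l * B + j]

print("section error rate:", np.mean(decoded != message))
```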
 
The Full Monte Carlo: A Live Performance With Stars

Xiao-Li Meng, Harvard

Abstract: Markov chain Monte Carlo (MCMC) methods, which originated in computational physics more than half a century ago, have seen an enormous range of applications in quantitative scientific investigations. This is mainly due to their ability to simulate from the very complex distributions needed by all kinds of statistical models, from bioinformatics to financial engineering to astronomy. The first part of this talk provides an introductory tutorial on the two most frequently used MCMC algorithms: the Gibbs sampler and the Metropolis-Hastings algorithm. Using simple yet non-trivial examples, we demonstrate, via live performance, the good, the bad, and the ugly implementations. Along the way, we reveal both the mathematical challenges in establishing their convergence rates and the statistical thinking underlying their designs, including the secret behind the greatest statistical magic. Audience participation is required, though no prior experience is needed.
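
As a companion to the tutorial portion, here is a minimal numpy sketch of the two algorithms named above on an assumed toy target, a highly correlated bivariate normal; it does not reproduce the specific examples used in the live performance. The printed lag-1 autocorrelations illustrate the "stickiness" that arises when the target's coordinates are strongly dependent.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.95                       # correlation of the toy bivariate normal target

# --- Gibbs sampler: draw each coordinate from its full conditional ---
def gibbs(n_iter, x0=(0.0, 0.0)):
    x, y = x0
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # x | y ~ N(rho*y, 1 - rho^2),  y | x ~ N(rho*x, 1 - rho^2)
        x = rho * y + np.sqrt(1 - rho**2) * rng.normal()
        y = rho * x + np.sqrt(1 - rho**2) * rng.normal()
        out[t] = (x, y)
    return out

# --- Random-walk Metropolis-Hastings: propose, then accept or reject ---
def log_target(z):
    x, y = z
    return -(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))

def metropolis(n_iter, step=0.5, z0=(0.0, 0.0)):
    z = np.array(z0)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        prop = z + step * rng.normal(size=2)        # symmetric proposal
        if np.log(rng.uniform()) < log_target(prop) - log_target(z):
            z = prop                                 # accept; otherwise stay put
        out[t] = z
    return out

g, m = gibbs(5000), metropolis(5000)
# High correlation (rho near 1) makes both chains "sticky": successive draws
# are highly dependent, so the chains explore the target slowly.
print("lag-1 autocorrelation, Gibbs x:", np.corrcoef(g[:-1, 0], g[1:, 0])[0, 1])
print("lag-1 autocorrelation, MH    x:", np.corrcoef(m[:-1, 0], m[1:, 0])[0, 1])
```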

The second part of the talk presents the Ancillarity-Sufficiency Interweaving Strategy (ASIS), a surprisingly simple and effective boosting method for combating some of the serious problems revealed in the first part. The ASIS method was discovered almost by accident during a Ph.D. student's (Yaming Yu's) struggle to fit a Cox process model for detecting changes in the source intensity of photon counts observed by the Chandra X-ray Telescope from a (candidate) neutron/quark star. Yu's method for solving that particular problem turned out to be of considerable generality, which ultimately led to the full formulation of ASIS. The method achieves fast convergence by taking advantage of the "beauty and beast" discordant nature of two MCMC schemes to break the usual "stickiness" of a Markov chain, i.e., its high autodependence and hence slow exploration of the state space. This part of the talk is based on a forthcoming discussion article, Yu and Meng (2011), in the Journal of Computational and Graphical Statistics.
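
The interweaving idea can be sketched on an assumed toy hierarchical model, not the Chandra application: Y_i | theta_i ~ N(theta_i, 1), theta_i | mu ~ N(mu, V) with V treated as known and a flat prior on mu. One iteration draws theta under the sufficient ("centered") augmentation, updates mu, switches to the ancillary ("non-centered") augmentation theta - mu, and updates mu again; this is a minimal sketch of the interweaving mechanism, not the full generality of ASIS.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy model: Y_i | theta_i ~ N(theta_i, 1), theta_i | mu ~ N(mu, V),
# flat prior on mu, V known.
n, V, mu_true = 50, 0.1, 3.0
theta_true = mu_true + np.sqrt(V) * rng.normal(size=n)
Y = theta_true + rng.normal(size=n)

def asis_sampler(n_iter, mu0=0.0):
    mu = mu0
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Step 1 (sufficient / "centered" augmentation): draw theta | mu, Y.
        post_var = 1.0 / (1.0 + 1.0 / V)
        theta = post_var * (Y + mu / V) + np.sqrt(post_var) * rng.normal(size=n)
        # Step 2: draw mu | theta (centered update), since theta_i ~ N(mu, V).
        mu = theta.mean() + np.sqrt(V / n) * rng.normal()
        # Step 3: switch to the ancillary / "non-centered" augmentation.
        theta_tilde = theta - mu
        # Step 4: redraw mu | theta_tilde, Y, since Y_i - theta_tilde_i ~ N(mu, 1).
        mu = (Y - theta_tilde).mean() + np.sqrt(1.0 / n) * rng.normal()
        draws[t] = mu
    return draws

mu_draws = asis_sampler(5000)
# Interweaving the two discordant schemes typically yields draws of mu with
# far lower autocorrelation than either scheme used alone.
print("posterior mean of mu:", mu_draws.mean())
print("lag-1 autocorrelation:", np.corrcoef(mu_draws[:-1], mu_draws[1:])[0, 1])
```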
 
Covariance Matrix Estimation for Time Series

Wei Biao Wu, University of Chicago

Abstract: I will give a tutorial lecture on covariance matrix estimation for time series. Under a short-range dependence condition for a wide class of nonlinear processes of the type described in Wiener (1958), I will show that banded covariance matrix estimates converge in operator norm to the true covariance matrix, with explicit rates of convergence. Such rates are optimal. I will also consider the consistency of the estimate of the inverse covariance matrix. These results are applied to the traditional Wiener-Kolmogorov prediction theory, and error bounds for the finite predictor coefficients are obtained.
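
A minimal numpy sketch of the banding idea, using an assumed AR(1) example rather than anything from the lecture: sample autocovariances are computed from a single realization, kept up to a chosen lag, and zeroed beyond it. The operator-norm errors illustrate the bias-variance tradeoff in the band width (the unbanded estimate, band = n - 1, is not consistent in operator norm).

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy example: a single length-n realization of an AR(1) process,
# whose true autocovariance is gamma(k) = phi^|k| / (1 - phi^2).
n, phi = 500, 0.6
x = np.empty(n)
x[0] = rng.normal() / np.sqrt(1 - phi**2)      # stationary start
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
true_cov = phi ** lags / (1 - phi**2)          # true n x n Toeplitz covariance

# Sample autocovariances from the single realization.
xc = x - x.mean()
gamma_hat = np.array([xc[: n - k] @ xc[k:] / n for k in range(n)])

# Banded estimator: keep sample autocovariances up to lag `band`, zero the rest.
for band in (1, 5, 20, n - 1):
    banded = np.where(lags <= band, gamma_hat[lags], 0.0)
    err = np.linalg.norm(banded - true_cov, 2)  # operator (spectral) norm
    print(f"band = {band:3d}, operator-norm error = {err:.3f}")
```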
 
Sparse Modeling for High Dimensional Data

Bin Yu, UC Berkeley

Abstract: Information technology has enabled the collection of massive amounts of data in science, engineering, social science, finance, and beyond. Statistics is the science of data and is indispensable for extracting useful information from high-dimensional data. After the broad success of statistical machine learning on prediction through regularization, interpretability is gaining attention, with sparsity used as its proxy. Combining the virtues of regularization and sparsity, L1-penalized least squares (e.g., the Lasso) has been intensively studied by researchers from statistics, applied mathematics, and signal processing.
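
For concreteness, here is a minimal coordinate-descent sketch of L1-penalized least squares on assumed toy data (n observations, p > n predictors, a handful of true nonzero coefficients). It illustrates the Lasso objective and the soft-thresholding update; it is not a description of any particular software used in the projects below.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed toy high-dimensional regression: p > n, only k coefficients nonzero.
n, p, k = 100, 300, 5
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:k] = 3.0
y = X @ beta_true + rng.normal(size=n)

def lasso_cd(X, y, lam, n_iter=200):
    """L1-penalized least squares via cyclic coordinate descent:
    minimize (1/2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                       # current residual y - X beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual, then soft-threshold.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

beta_hat = lasso_cd(X, y, lam=0.3)
print("nonzero coefficients found:", np.flatnonzero(beta_hat))
```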

In this tutorial, I would like to give an overview of both the theory and practice of the Lasso and its extensions. I will review theoretical results on the Lasso and on M-estimation with decomposable penalties under high-dimensional statistical models. I will also share experience on using sparse modeling methods in two on-going projects in two very different areas. The first is an on-going collaborative project with the Gallant Neuroscience Lab at Berkeley on understanding the human visual pathway. In particular, sparse models (linear, non-linear, and graphical) have been built to relate natural images to fMRI responses in human primary visual cortex area V1. Issues of model validation will be discussed. The second is the on-going StatNews project with the El Ghaoui group in EECS at Berkeley, where we use sparse methods to derive summaries of newspaper articles on a particular topic. We use human subject experiments to validate our findings.