
Information theory for high throughput sequencing

David Tse, U.C. Berkeley


Extraordinary advances in sequencing technology over the past decade have revolutionized biology and medicine. Many high-throughput sequencing-based assays have been designed to make various biological measurements of interest. A key computational problem is assembly: how to reconstruct, from many millions of short reads, the underlying biological sequence of interest, be it a DNA sequence or a set of RNA transcripts? Traditionally, assembler design has been viewed mainly as a software engineering project: time and memory requirements are the primary concerns, while the assembly algorithms themselves are designed on heuristic grounds with no optimality guarantee. In this talk, we outline an alternative approach to assembler design based on information-theoretic principles. Starting from the question of when the reads contain enough information to reconstruct the sequence, we design near-optimal assembly algorithms that can reconstruct with a minimal amount of read information. We illustrate our approach in two settings: DNA sequencing and RNA sequencing. We report preliminary results from ShannonRNA, an RNA-Seq assembler currently under development, and compare its performance with state-of-the-art software in the field.
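To make the assembly problem concrete, here is a minimal greedy-merge sketch: repeatedly merge the pair of reads with the largest suffix-prefix overlap until one sequence remains. This toy illustration is not the ShannonRNA algorithm or any near-optimal assembler from the talk; function names and the example reads are invented for illustration.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, i, j)
        for i in range(len(reads)):
            for j in range(len(reads)):
                if i != j:
                    k = overlap(reads[i], reads[j])
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]  # splice j onto i past the overlap
        reads = [r for n, r in enumerate(reads) if n not in (i, j)]
        reads.append(merged)
    return reads[0]

reads = ["ATGGC", "GGCAT", "CATTA"]
print(greedy_assemble(reads))  # ATGGCATTA
```

Greedy merging is a classical heuristic with no optimality guarantee, which is exactly the gap the information-theoretic approach in the talk aims to close: characterizing when the reads carry enough information for any algorithm to reconstruct correctly.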

On the interaction between network coding and the physical layer - information-theoretic results and a case study

Muriel Medard, MIT


The question of how, and whether, to join physical-layer and network coding remains actively investigated. In this talk, we present some recent information-theoretic results and an application illustrating a view of this question. We begin by considering whether there are capacity benefits from integrating network and physical-layer coding. Network equivalence implies that it is interference and broadcast effects, rather than the presence of noise, that might lead to such capacity benefits. However, in the regimes of low SNR and of high SNR, where broadcast and interference, respectively, may be the dominant effects, integration appears to provide negligible benefits. We conclude with results from the first network coding chip. The benefits of network and physical-layer coding prove cumulative, even in the absence of coordination between the two.
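The packet-level network coding layered above the physical layer here can be illustrated with the classic butterfly-network example: a relay broadcasts the XOR of two packets, and each receiver recovers the packet it is missing by XORing with the one it already holds. This is a hypothetical toy, not the chip described in the talk.

```python
def xor_packets(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

p1 = b"hello"
p2 = b"world"
coded = xor_packets(p1, p2)  # the single packet the relay broadcasts

# Receiver A already has p2; receiver B already has p1.
assert xor_packets(coded, p2) == p1  # A recovers p1
assert xor_packets(coded, p1) == p2  # B recovers p2
```

One coded transmission serves both receivers, which is the source of the capacity benefit that network coding provides independently of the physical-layer code beneath it.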

Computer perception with deep learning

Yann LeCun, New York University


Pattern recognition tasks, particularly perceptual tasks such as vision and audition, require the extraction of good internal representations of the data prior to classification. Designing feature extractors that turn raw data into representations suitable for a classifier often requires a considerable amount of engineering and domain expertise.

The purpose of the emerging area of 'Deep Learning' is to devise methods that can train entire pattern recognition systems in an integrated fashion, from raw inputs to ultimate outputs, using a combination of labeled and unlabeled samples.

Deep learning systems are multi-stage architectures in which the perceptual world is represented hierarchically. Features in successive stages are increasingly global, abstract, and invariant to irrelevant transformations of the input.

Convolutional networks (ConvNets) are a particular type of deep architecture, somewhat inspired by biology, consisting of multiple stages of filter banks interspersed with non-linear operations and spatial pooling. Deep learning models, particularly ConvNets, have become the record holders on a wide variety of benchmarks and competitions, including object recognition in images, semantic image labeling (2D and 3D), acoustic modeling for speech recognition, drug design, Asian handwriting recognition, pedestrian detection, road-sign recognition, and biological image segmentation. The most recent speech recognition and image analysis systems deployed by Google, IBM, Microsoft, Baidu, NEC and others use deep learning, and many use convolutional networks.
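The stage structure described above (filter bank, then non-linearity, then spatial pooling) can be sketched in a toy one-dimensional form. This is an illustrative simplification, not any of the deployed systems mentioned; the filter and input values are made up.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution: one filter of the 'filter bank' stage."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Pointwise non-linearity applied after filtering."""
    return [max(0.0, x) for x in xs]

def max_pool(xs, size=2):
    """Spatial pooling: keep the max over non-overlapping windows."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

x = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
edge = [1.0, -1.0]  # a simple edge-detecting filter
features = max_pool(relu(conv1d(x, edge)))
print(features)  # [0.0, 1.0, 1.0]
```

Stacking several such stages is what makes the learned features increasingly global, abstract, and invariant, as the abstract describes.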

A number of supervised methods, as well as unsupervised methods based on sparse auto-encoders, for training deep convolutional networks will be presented. Several applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a system that can label every pixel in an image with the category of the object it belongs to (scene parsing), a pedestrian detector, and object localization and detection systems that rank first on the ImageNet Large Scale Visual Recognition Challenge data. Specialized hardware architectures that run these systems in real time will also be described.

Cracking the Brain's Neural Codes

Terry Sejnowski, The Salk Institute, UCSD


There is no one neural code. There are many, each adapted for the signals it must represent and the biological constraints it must contend with. Two different neural codes are not mutually exclusive so long as the same neural hardware can support both. In fact, the more different two codes are, the more easily different aspects of neural processing - encoding, transmission, transformation, and decoding - can be kept independent of one another. This enables different neural codes to coexist within the same neural system, either concurrently as multiplexed codes or as rapidly interchangeable options.

Imaging the connectome

Jeff W. Lichtman, Harvard University


Connectional maps of the brain may have value in developing models of both how the brain works and how it fails when subsets of neurons or synapses are missing or misconnected. Such maps might also provide detailed information about how brain circuits develop and age. I am eager to obtain such maps in neonatal animals because of a longstanding interest in the ways neuromuscular circuitry is modified during early postnatal life, as axonal input to muscle fibers is pruned. Work in my laboratory has focused on obtaining complete wiring diagrams (“connectomes”) of the projections of motor neuron axons in young and adult muscles. Each data set is large, typically made up of hundreds of confocal microscopy image stacks that tile the 3-dimensional volume of a muscle. As a first step toward analyzing these data sets, we developed computer-assisted segmentation approaches and, to make this task easier, second-generation “Brainbow” transgenic mice that in essence segment each axon by a unique fluorescent spectral hue. Once the axons are segmented, we have been able to graph the resulting connectivity matrices. This effort has led to new insights into the developmental processes that help the mammalian nervous system mold itself based on experience. Analysis of these complete muscle connectomes shows a striking single-axis gradient of connectivity that we think is related to the ordered ranking of neural activity in axons (the "size principle" of Henneman). In the brain, however, as opposed to muscle, the density of neuropil is overwhelming, which has precluded the confocal optical approaches that worked in the peripheral nervous system: there are too many neural processes in each optical section. We have thus developed a lossless automated physical sectioning strategy that generates thousands of ultrathin (~25 nm) sections on a firm plastic tape. We have also developed a thin-section scanning electron microscopy approach to visualize these sections at 3 nm lateral resolution. This method makes large-scale serial microscopic analysis of brain volumes more routine. We are now focused on developing an automated pipeline to trace out neural circuits in brains using this technique.