Statistical issues in high-throughput profiling of isoform-specific gene expression
Wednesday, February 23, 2011 - 3:30pm

In mammalian cells, isoforms of a gene can have highly similar sequences yet encode proteins with remarkably different functional roles. Quantifying the cellular abundance of individual isoforms is therefore of significant biological interest. In this talk, we will review methods for profiling isoform-specific gene expression using high-throughput technologies such as microarrays and ultra-high-throughput RNA sequencing (RNA-Seq). We will demonstrate the intrinsic non-identifiability of the isoform deconvolution problem, especially for microarray data. We will introduce a statistical approach for profiling isoform-specific gene expression from RNA-Seq data, which uses a joint Poisson model for estimation and a Bayesian approach for quantifying uncertainty. We will then generalize the method to accommodate paired-end RNA-Seq data, and illustrate its intuitive minimal sufficient statistics and computationally feasible implementation. Time permitting, we will show how Fisher information can be used to quantify the statistical gains from using a paired-end RNA-Seq protocol.
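
As a rough illustration of the ideas in the abstract, and not the speaker's actual implementation, the sketch below treats the read count in each exon region as Poisson with a rate that is linear in the unknown isoform abundances. The matrix layout, the function names `is_identifiable` and `poisson_em`, and the use of a plain EM iteration for the maximum-likelihood estimate are all assumptions made for this example. The Bayesian uncertainty step and the paired-end extension are not covered.

```python
import numpy as np


def is_identifiable(A):
    """Return True when the design matrix has full column rank.

    A[j, i] is the (hypothetical) effective length of region j in isoform i,
    or 0 if isoform i does not contain region j. If the rank is below the
    number of isoforms, different abundance vectors give identical rates.
    """
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_rank(A) == A.shape[1]


def poisson_em(A, counts, n_iter=2000):
    """Maximum-likelihood abundances under counts[j] ~ Poisson((A @ theta)[j]).

    Uses the standard multiplicative EM update for a Poisson linear model.
    Assumes every region is covered by at least one isoform, so that the
    rates stay positive.
    """
    A = np.asarray(A, dtype=float)
    counts = np.asarray(counts, dtype=float)
    theta = np.ones(A.shape[1])      # positive starting point
    col_sums = A.sum(axis=0)         # total effective length per isoform
    for _ in range(n_iter):
        rate = A @ theta             # expected count in each region
        theta = theta * (A.T @ (counts / rate)) / col_sums
    return theta


# Two isoforms over three exons: isoform 1 has exons 1-3, isoform 2 skips exon 2.
A_two = np.array([[100.0, 100.0],
                  [50.0, 0.0],
                  [100.0, 100.0]])

# Three isoforms over two exons: more unknowns than informative regions.
A_three = np.array([[100.0, 100.0, 0.0],
                    [50.0, 0.0, 50.0]])

theta_true = np.array([0.2, 0.5])
counts = A_two @ theta_true          # noise-free counts, for illustration only
theta_hat = poisson_em(A_two, counts)
```

In the first design the exon unique to isoform 1 pins down its abundance, so the two isoforms can be separated. In the second design no estimator can distinguish the three isoforms from region counts alone, which is the kind of non-identifiability the abstract refers to.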