Statistical Methods with Applications in Epigenomics, Metagenomics and Neuroimaging

Thursday, April 12, 2018, 2 pm

Room 202, Caldwell

With the rapid development of biotechnology, increasingly complex data sets are being generated to address extremely complex biological problems, and developing new statistical methods to analyze such data is challenging. In this thesis, I propose a nonparametric hypothesis test and two statistical learning methods to solve biological problems arising in epigenomics, metagenomics, and neuroimaging. First, the proposed test assesses the significance of the interaction term in a bivariate smoothing spline ANOVA model. The derived asymptotic distribution of the test statistic reveals a new version of the Wilks phenomenon, and the power of the test is minimax optimal in the sense of Ingster. I demonstrate the performance of the proposed test by identifying differentially methylated regions in a genome-wide DNA methylation study. Second, I propose a statistical learning method that simultaneously identifies microbial species and estimates their abundances without using reference genomes. I show that the proposed method achieves high accuracy on both simulated data and real metagenomic data related to inflammatory bowel disease (IBD), type 2 diabetes (T2D), and obesity. Third, I develop a model-based dictionary learning (MDL) method that provides an effective and flexible framework for different types of data: continuous, discrete, and categorical. It also provides a general framework for modeling data with spatial or temporal correlation. I demonstrate the performance of the MDL method by studying brain connectivity and by learning cell-type-specific expression profiles from spatial transcriptomic imaging.
