Tags: Colloquium Series

The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.

Recent developments for analyzing droplet-based single cell transcriptomic data
Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes. Newly developed droplet-based technologies enable efficient parallel processing of thousands of single cells, with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these rapid technological advances,…
The design of the National Forest Inventory for the Royal Government of Bhutan
The Ministry of Agriculture and Forests of the Himalayan nation of Bhutan recently released its second report on the forest resources of the land. Planning for its first-ever National Forest Inventory (NFI) had begun in 2009, with the aim of providing a model and comprehensive accounting of the country’s terrestrial resources. Details of the sampling design and its…
Mining Differential Correlation
Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. This talk will describe a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). DCM is a special case of differential analysis for weighted networks, and is distinct from standard analyses of first-order…
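As a minimal illustration of the differential-correlation idea (a simplified sketch, not the DCM algorithm from the talk), the code below simulates data under two conditions, Fisher z-transforms the pairwise correlations, and flags the variable pair whose correlation shifts most. The simulated data and the injected shift are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n samples x p variables under two conditions.
n, p = 200, 5
x_a = rng.standard_normal((n, p))
x_b = rng.standard_normal((n, p))
# Inject a correlation shift between variables 0 and 1 in condition B.
x_b[:, 1] = 0.8 * x_b[:, 0] + 0.6 * x_b[:, 1]

def fisher_z(r):
    """Fisher z-transform of a correlation; variance is roughly 1/(n-3)."""
    return np.arctanh(r)

r_a = np.corrcoef(x_a, rowvar=False)
r_b = np.corrcoef(x_b, rowvar=False)

# z-statistic for the difference of each pairwise correlation.
se = np.sqrt(1.0 / (n - 3) + 1.0 / (n - 3))
z = (fisher_z(r_b) - fisher_z(r_a)) / se
np.fill_diagonal(z, 0.0)

# The pair with the largest |z| is the strongest differential-correlation signal.
i, j = np.unravel_index(np.abs(z).argmax(), z.shape)
print(f"largest correlation shift: variables {i} and {j}, z = {z[i, j]:.2f}")
```

The pairwise test above is the building block; DCM itself searches for whole sets of variables whose within-set correlations differ between conditions.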
D-Optimal Designs for Multinomial Logistic Models
We consider optimal designs for general multinomial logistic models, which cover baseline-category, cumulative, adjacent-categories, and continuation-ratio logit models under proportional odds, non-proportional odds, or partial proportional odds assumptions. We derive the corresponding Fisher information matrices in three different forms to facilitate their calculation, determine the conditions…
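For intuition, the binary logit model (the simplest member of this family) admits a small grid-search sketch of the D-criterion: maximize the log-determinant of the Fisher information over candidate designs at assumed parameter values (local optimality). The grid and parameter values below are illustrative, not from the talk:

```python
import numpy as np
from itertools import combinations

# Assumed parameter values for local D-optimality (illustrative).
beta = np.array([0.0, 1.0])          # intercept and slope
candidates = np.linspace(-3, 3, 13)  # candidate design points on a grid

def fisher_info(points):
    """Fisher information of a uniform-weight design for binary logistic regression."""
    X = np.column_stack([np.ones(len(points)), points])
    eta = X @ beta
    prob = 1.0 / (1.0 + np.exp(-eta))
    w = prob * (1.0 - prob)              # per-point information weight p(1-p)
    return (X * w[:, None]).T @ X / len(points)

# Search all two-point uniform designs for the largest log-determinant.
best = max(combinations(candidates, 2),
           key=lambda d: np.linalg.slogdet(fisher_info(np.array(d)))[1])
print("best two-point design on the grid:", best)
```

On this grid the search lands on a symmetric pair near ±1.54, the known locally D-optimal two-point design for this model; the talk's contribution is doing this analytically for the full multinomial family, where the information matrix is far less tractable.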
Neyman-Pearson Classification Algorithms and NP Receiver Operating Characteristics
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural…
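The type I error constraint can be illustrated with a one-dimensional toy example: choose the decision threshold from held-out class-0 scores so that the empirical type I error stays at or below a target α. This is a simplified sketch of the idea, not the NP algorithms from the talk, and the Gaussian scores are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D scores: class 0 ~ N(0,1), class 1 ~ N(2,1); higher score => predict class 1.
scores0 = rng.normal(0.0, 1.0, 500)   # class 0 (e.g., healthy / non-spam)
scores1 = rng.normal(2.0, 1.0, 500)   # class 1

alpha = 0.05  # desired bound on type I error P(predict 1 | class 0)

# Take the threshold as an upper quantile of held-out class-0 scores,
# so that at most an alpha-fraction of class-0 scores exceed it.
threshold = np.quantile(scores0, 1.0 - alpha)

type1 = np.mean(scores0 > threshold)   # empirical type I error
power = np.mean(scores1 > threshold)   # empirical detection rate on class 1
print(f"threshold={threshold:.2f}, type I error={type1:.3f}, power={power:.3f}")
```

The NP paradigm refines this picture by providing finite-sample high-probability guarantees on the population type I error, rather than just controlling its empirical counterpart as the sketch does.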
Semi-Low-Dimensional Inference Via Bias Correction
We consider statistical inference in a semi-low-dimensional approach to the analysis of high-dimensional data. The relationship between this semi-low-dimensional approach and regularized estimation of high-dimensional objects is parallel to the more familiar one between semi-parametric analysis and nonparametric estimation. Low-dimensional projection methods are used to correct the bias of…
With the rapid development of biotechnology, increasingly complex data sets are generated to address extremely complex biological problems. Developing new statistical methods to analyze such data is challenging. In this thesis, I propose a nonparametric hypothesis test and two statistical learning methods to solve biological problems arising from epigenomics, metagenomics, and neuroimaging. First, the proposed test aims at testing the significance…
Large and complex data are now routinely generated from various sources, for instance, time-course biological studies and social media. Classic nonparametric models, such as smoothing spline ANOVA models, are not well equipped to analyze such large and complex data. To overcome these challenges, I propose novel nonparametric methods under a reproducing kernel Hilbert space framework to (1) significantly reduce the daunting computational costs of…
With the rapid development of technology, increasing amounts of data are being produced in many fields of science, such as biology, neuroscience, and engineering. Inadequate sample size is no longer the bottleneck of modern statistical research. More often, we face data of extremely high dimensionality or data coming from remarkably different sources. How to effectively extract information from the large-scale and high-dimensional data or data…
We discuss optimal designs for the panel mixed logit model, which is commonly used for the analysis of discrete choice experiments. The information matrix that appears in design criteria has no closed-form expression, and evaluating it numerically is computationally difficult. We derive the information matrix and use the resulting form to propose three methods for approximating it. The…