On Surrogate Variable Analysis for High Dimensional Genetics and Genomics Data

The University of Florida

Thursday, September 1, 2016 - 3:30pm

Unwanted variation in hidden variables often degrades the analysis of high-dimensional data, leading to high false discovery rates, low rates of true discoveries, or both. A number of procedures have been proposed to detect and estimate the hidden variables, including principal component analysis (PCA). However, empirical data analysis suggests that PCA is inefficient at identifying hidden variables that affect only a subset of features, albeit with relatively large effects. Surrogate variable analysis (SVA) has been proposed to overcome this limitation, but SVA also suffers some efficiency loss for data with a complicated dependence structure among the hidden variables and the variables of primary interest. In this talk, we will describe an improved PCA procedure for detecting and estimating the hidden variables. Some new applications of the method will also be discussed.



Pharmacy South Building, Room 101