Functional and very high dimension reduction

The Pennsylvania State University

Thursday, December 1, 2016 - 3:30pm

The talk has two components. In the first component, to study the relation between a univariate response and multiple functional covariates, we propose a functional single index model that is semiparametric. The parametric part of the model integrates the linear regression modeling for functional data and the sufficient dimension reduction structure. The nonparametric part of the model further allows the response-index dependence or the link function to be unspecified. We use B-splines to approximate the coefficient function in the functional linear regression model part and reduce the problem to a familiar dimension folding model. We develop a new method to handle the subsequent dimension folding model by using kernel regression in combination with semiparametric treatment. The new method does not impose any special requirement on the inner product between the covariate function and the B-spline bases, and allows efficient estimation of both the index vector and the B-spline coefficients. The estimation method is general and applicable to both continuous and discrete response variables. We further derive asymptotic properties of the class of methods for both the index vector and the coefficient function. We establish the semiparametric optimality, which has not been done before in a semiparametric model where both kernel and B-spline estimation are involved.
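
As a rough illustration of the B-spline reduction step described above, the following sketch assumes (this form is not stated in the abstract) that the index is a weighted sum over covariates of the integrals of X_j(t) times a common coefficient function beta(t). Expanding beta(t) in a B-spline basis then turns the index into a bilinear form alpha' W gamma in a matrix-valued predictor W, which is the kind of structure a dimension folding model works with. All names, sizes and the simulated curves are hypothetical, and the kernel-based semiparametric estimation of the index vector and spline coefficients is not shown.

```python
import numpy as np

def bspline_basis(t, n_basis, degree=3):
    """Evaluate n_basis clamped B-spline basis functions on [0, 1] at the
    points t using the Cox-de Boor recursion (requires n_basis >= degree + 1)."""
    t = np.asarray(t, dtype=float)
    n_interior = n_basis - degree - 1
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    n0 = len(knots) - 1
    # Degree 0: indicators of the half-open knot intervals.
    B = np.zeros((len(t), n0))
    for j in range(n0):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    # Close the last non-empty interval on the right so that t = 1 is covered.
    B[t == 1.0, n_basis - 1] = 1.0
    # Raise the degree one step at a time, skipping zero-width knot spans.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t), n0 - d))
        for j in range(n0 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = B_new
    return B  # shape (len(t), n_basis)

# Hypothetical sizes: n subjects, J functional covariates, T grid points, K basis functions.
rng = np.random.default_rng(0)
n, J, T, K = 50, 3, 101, 6
grid = np.linspace(0.0, 1.0, T)
# Simulated rough curves, stand-ins for observed functional covariates X_ij(t).
X = rng.standard_normal((n, J, T)).cumsum(axis=2) / np.sqrt(T)

B = bspline_basis(grid, K)

# Trapezoid-rule quadrature weights on the grid.
w = np.full(T, grid[1] - grid[0])
w[[0, -1]] *= 0.5

# W[i, j, k] approximates the integral of X_ij(t) * B_k(t) over [0, 1].
W = np.einsum("ijt,t,tk->ijk", X, w, B)

# Assumed parameters: index vector alpha (length J), spline coefficients gamma (length K).
alpha = np.array([1.0, -0.5, 0.25])
gamma = rng.standard_normal(K)

# Folded (bilinear) form of the index: alpha' W_i gamma for each subject i.
index_folded = np.einsum("j,ijk,k->i", alpha, W, gamma)

# The same index computed directly from beta(t) = sum_k gamma_k B_k(t).
beta_on_grid = B @ gamma
index_direct = np.einsum("j,ijt,t->i", alpha, X, w * beta_on_grid)
```

Under these assumptions the two index computations agree up to floating-point error, so once each subject's J-by-K matrix W_i is formed, the functional covariates no longer need to be revisited.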

In the second component, we study large genetic data sets, which are now readily available thanks to technological advances. Compared with data collection, however, statistical analysis remains much cheaper. Thus, secondary analysis of SNP data, that is, re-analyzing existing data in an effort to extract more information, is attractive and cost-effective. We study the relation between gene expression and SNPs through a combination of factor analysis and dimension reduction estimation (FADRE). To take advantage of the flexibility in traditional factor models where the latent factors are not required to be normal, we recommend using semiparametric sufficient dimension reduction methods in the joint estimation of the combined model. The resulting estimator is flexible and has superior performance. We further quantify the asymptotic performance of the parameter estimation and perform inference. The new results enable us to identify, for the first time from GTEx data, statistically significant SNPs in the gene-SNP relation in lung tissues.

Room 306, Statistics Building 1130