We are concerned with how to select significant variables in semi-parametric modeling. Variable selection for semi-parametric regression models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. Thus, it is much more challenging than variable selection for parametric models such as linear models and generalized linear models, because traditional variable selection procedures, including stepwise regression and best subset selection, require model selection for the nonparametric components of each sub-model.
The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.
With the advance of biotechnology, massive "omics" data, such as genomic and proteomic data, are rapidly becoming available in population-based studies of the interplay of genes and environment in causing human diseases. An increasing challenge is how to analyze such high-throughput "omics" data, interpret the results, and make the findings reproducible. We discuss several statistical issues in the analysis of high-dimensional "omics" data in population-based "omics" studies.
Functional data analysis has received considerable recent attention and a number of successful applications have been reported. In this paper, asymptotically simultaneous confidence bands are obtained for the mean function of the functional regression model, using piecewise constant spline estimation. Simulation experiments corroborate the asymptotic theory. The confidence band procedure is illustrated by analyzing the CD4 cell counts of HIV-infected patients. This talk is based on Ma, S., Yang, L. and Carroll, R. (2010).
The sequential Monte Carlo (SMC) methodology has shown great promise in solving a large class of highly complex inference and optimization problems. Although it was originally designed for on-line filtering and smoothing of non-linear, non-Gaussian state space models, it has been shown to be equally powerful in dealing with fixed-dimensional problems, utilizing a sequential decomposition principle. In this talk we discuss issues and efficient implementations of SMC for dealing with high-dimensional distributions that are defined on restricted and ill-shaped spaces.
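For readers unfamiliar with SMC, the on-line filtering setting mentioned above can be illustrated with a minimal bootstrap particle filter. This is only a sketch under stated assumptions, not the implementation discussed in the talk: the random-walk state model, the Gaussian observation model, and the names `bootstrap_filter`, `sigma_x`, and `sigma_y` are all hypothetical choices made for illustration.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, sigma_x=1.0,
                     sigma_y=0.5, seed=0):
    """Return the filtered state mean after each observation.

    Assumed toy model (not from the talk):
        x_t = x_{t-1} + N(0, sigma_x^2)   (state, random walk)
        y_t = x_t     + N(0, sigma_y^2)   (observation)
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Propagate each particle through the state equation.
        particles = [x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight particles by the Gaussian likelihood of the observation,
        # working on the log scale for numerical stability.
        log_w = [-0.5 * ((y - x) / sigma_y) ** 2 for x in particles]
        shift = max(log_w)
        w = [math.exp(lw - shift) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        # Weighted average of particles approximates the filtered mean.
        means.append(sum(wi * xi for wi, xi in zip(w, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means


if __name__ == "__main__":
    print(bootstrap_filter([1.0, 2.0, 3.0, 4.0]))
```

The propagate, weight, and resample steps are the generic building blocks; the restricted and ill-shaped spaces considered in the talk would require different proposal and weighting choices than this toy model uses.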
Advances in DNA sequencing, genotyping, and microarray technologies are providing new opportunities in all areas of biology. Over the past two decades, data volumes have increased and costs have decreased at rates exceeding Moore's law, resulting in ever larger datasets in the hands of increasing numbers of researchers. Thus, the need for new statistical and other analytical tools is increasing tremendously. I will present information about the types of genetic and genomic data that are being collected in general and at UGA specifically.
Penalized splines are a popular method for nonparametric function estimation in partial linear generalized regression models. Constrained versions are presented in this talk, which are useful if the function is known to be increasing or convex. The shape assumptions often fall into the category of a priori knowledge, but occasionally the research question might concern the shape. A model-selection criterion for determining if the constraints hold is shown to have nice large-sample properties and to perform well in small samples. Several applications are presented.
We propose a cross-validated version of the design-based variance estimator of survey estimators, and describe its use in several survey applications. The estimator is based on the same "leave-one-out" principle as traditional cross-validation, but takes the design effects on the variance into account. We apply the cross-validated estimator as a design-based model selection tool for regression estimators, and show that it is effective in minimizing the asymptotic design mean squared error of regression estimators, both those using parametric and those using nonparametric models.
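The leave-one-out principle itself can be sketched in a few lines. The example below is a simplified, weight-based analogue and not the design-based variance estimator proposed in the talk: it treats survey weights as plain case weights, and the function names `weighted_linear_fit` and `loo_weighted_cv` are hypothetical. It compares a mean-only model with a simple linear model by their weighted leave-one-out prediction error.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return ybar - b * xbar, b


def loo_weighted_cv(x, y, w, use_slope=True):
    """Weighted mean of squared leave-one-out prediction errors.

    x, y, w are lists of equal length; w holds (assumed) case weights.
    """
    total = 0.0
    for i in range(len(y)):
        # Drop observation i and refit on the remaining data.
        xs, ys, ws = x[:i] + x[i + 1:], y[:i] + y[i + 1:], w[:i] + w[i + 1:]
        if use_slope:
            a, b = weighted_linear_fit(xs, ys, ws)
        else:
            a = sum(wi * yi for wi, yi in zip(ws, ys)) / sum(ws)
            b = 0.0
        # Score the held-out observation with its own weight.
        total += w[i] * (y[i] - (a + b * x[i])) ** 2
    return total / sum(w)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.0, 3.0, 5.0, 7.0, 9.0]
    w = [1.0] * 5
    print(loo_weighted_cv(x, y, w, use_slope=True))
    print(loo_weighted_cv(x, y, w, use_slope=False))
```

The model with the smaller criterion would be preferred; a design-based version would additionally account for how the sampling design affects the variance, which this sketch does not attempt.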
A penalized polynomial spline method will be introduced for simultaneous model estimation and variable selection in additive models. The proposed method approximates the nonparametric functions by polynomial splines, and minimizes the sum of squared errors subject to an additive penalty on norms of spline functions. This approach sets estimators of certain function components to exactly zero, thus performing variable selection.
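One way to write down a criterion of the kind described above is sketched here. The notation is assumed for illustration and is not taken from the talk: $Y_i$ is the response, $X_{ij}$ the $j$-th covariate of observation $i$, each $g_j$ ranges over a polynomial spline space, $\lVert g_j \rVert$ is a norm of the $j$-th component, and $p_{\lambda}$ is a generic penalty function with tuning parameter $\lambda$. The scaling of the penalty by $n$ is likewise an assumption.

```latex
\[
  \min_{\mu,\, g_1, \dots, g_d}\;
  \sum_{i=1}^{n} \Bigl( Y_i - \mu - \sum_{j=1}^{d} g_j(X_{ij}) \Bigr)^{2}
  \;+\; n \sum_{j=1}^{d} p_{\lambda}\bigl( \lVert g_j \rVert \bigr)
\]
```

When $p_{\lambda}$ is not differentiable at zero, a minimizer can have $\lVert \hat g_j \rVert = 0$ for some $j$, which is how whole components are removed from the model.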
The use of economic and statistical principles has been instrumental in developing many quantitative methodologies in finance, for example the famous Black-Scholes formula, which led to a Nobel Prize in economics. To do research in mathematical finance, it is essential to understand both economic principles and the ever-changing financial activities in the market.
In family studies, canonical discriminant analysis can be used to find linear combinations of phenotypes that exhibit high ratios of between-family to within-family variability. But with large numbers of phenotypes, canonical discriminant analysis may over-fit. To estimate the predicted ratios associated with the coefficients obtained from canonical discriminant analysis, two methods are developed: one is based on bias correction and the other on cross-validation. Because the cross-validation is computationally intensive, an approximation to the cross-validation is also developed.
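The ratio being maximized can be stated compactly. The symbols here are assumed notation rather than the talk's own: $B$ and $W$ denote the between-family and within-family covariance matrices of the phenotype vector, and $a$ is the vector of coefficients of the linear combination.

```latex
\[
  \hat a \;=\; \arg\max_{a \neq 0}\;
  \frac{a^{\top} B\, a}{a^{\top} W a},
  \qquad\text{with maximizers solving}\qquad
  B\, a \;=\; \rho\, W a .
\]
```

The largest generalized eigenvalue $\rho$ equals the maximized ratio in the sample. Because $\hat a$ is chosen to maximize that same sample ratio, the in-sample value tends to overstate the ratio that would be obtained on new families, which is the over-fitting that the bias-correction and cross-validation methods aim to address.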