The statistical analysis of data that have been multiplied by randomly drawn noise variables in order to protect the confidentiality of individual values has recently drawn some attention (Nayak, Sinha, and Zayatz, 2011; Sinha, Nayak, and Zayatz, 2012). If the distribution generating the noise variables has low to moderate variance, then noise-multiplied data have been shown to yield accurate inferences in several typical parametric models under a formal likelihood-based analysis (Klein, Mathew, and Sinha, 2012).
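As a rough illustration of the masking step described above, the following sketch multiplies each value by independent noise with mean 1 and checks that the sample mean is nearly preserved. The Uniform(0.9, 1.1) noise distribution, the lognormal data, and the function name `multiply_by_noise` are assumptions made for this example only. The comparison of means is a simple moment check, not the likelihood-based analysis the abstract refers to.

```python
import random
import statistics


def multiply_by_noise(values, half_width, rng):
    """Mask each value with independent Uniform(1 - h, 1 + h) noise (mean 1).

    The uniform noise distribution is an assumption for illustration;
    the abstract does not specify one.
    """
    low, high = 1.0 - half_width, 1.0 + half_width
    return [x * rng.uniform(low, high) for x in values]


rng = random.Random(0)

# Hypothetical confidential values (e.g. incomes), simulated here.
original = [rng.lognormvariate(10.0, 0.5) for _ in range(20000)]
masked = multiply_by_noise(original, 0.1, rng)

# Because the noise has mean 1 and is independent of the data,
# the masked sample mean estimates the same quantity as the original one.
mean_original = statistics.fmean(original)
mean_masked = statistics.fmean(masked)
relative_gap = abs(mean_masked - mean_original) / mean_original
print(f"relative gap between means: {relative_gap:.5f}")
```

Individual masked values can differ from the originals by up to 10 percent in this setup, while the aggregate mean stays close. Widening `half_width` increases protection at the cost of noisier estimates, which is the trade-off the abstract alludes to when it mentions low to moderate noise variance.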
The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.
Introductory statistics is in need of a radical reconceptualization. This need comes from changes to our culture and from revolutionary changes in technology. We propose a new model for introductory statistics that aims to produce citizen statisticians: citizens capable of critically engaging with data.
This is an exciting and influential time for the field of Statistics in science. Technological advances in genetic, genomic, and the other 'omic sciences are providing large amounts of complex data that are presenting a number of challenges for the biological community. Many of these challenges are deeply rooted statistical issues that involve experimental design.
This talk covers two research topics concerning modern non-standard data analytic situations. In particular, data in the High Dimension, Low Sample Size (HDLSS) setting and data lying on manifolds are analyzed. Both situations are related to statistical image and shape analysis.
Recently there has been interest in asymptotic expansions of the tail probabilities of a variety of processes that are ubiquitous in statistics. However, little to no work has been done when the AR(1) process is built upon extreme value random variables. This process arises when the distribution of the current maximum depends on the previous one. The goal of this dissertation is to explore asymptotic expansions of tail probabilities on this topic, in particular using the Gumbel distribution.
Complex time series with features such as non-linearity, high dimensionality, and functional structure have attracted considerable interest in the statistics community, owing both to the limitations of traditional time series models and to advances in the methodology and theory of nonparametric statistics. In this dissertation, nonparametric models for such complex time series are studied. For modeling financial volatility, we propose estimators, based on spline smoothing, for semiparametric GARCH models with additive autoregressive components linked together by a dynamic coefficient.
In this dissertation, I propose empirical likelihood based methods to address the nonresponse problem and the changepoint detection problem. Both methods avoid the potential model misspecification problems from which existing parametric methods may suffer. Moreover, the proposed imputation method can correct the bias of the estimate of the complete data for distributions with under- or over-dispersion. The empirical likelihood changepoint detection method can also detect changes in parameters other than the population mean.
With the rapid development of second-generation sequencing technologies, RNA-Seq has become a popular tool for transcriptome analysis. It offers the chance to detect novel transcripts by obtaining tens of millions of short reads. After being mapped to the genome and/or to the reference transcripts, RNA-Seq data can be summarized by a tremendous number of short-read counts. The huge number of short-read counts enables researchers to quantify transcripts at ultra-high resolution.
We consider the problem of detecting hotspots in spatial point patterns observed over time while accounting for an inhomogeneous background intensity. For example, in disease surveillance, the interest is often in identifying regions of unusually high incidence rate given a background incidence rate that may be spatially varying due to underlying variation in population density, say. I will present a K-scan method that uses components of the inhomogeneous K function to identify such anomalies or hotspots.
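For readers unfamiliar with the inhomogeneous K function mentioned above, here is a minimal sketch of its standard estimator, in which each pair of points within distance r is weighted by the inverse product of the background intensities at the two points. It is not the K-scan method itself, whose details the abstract does not give. The function name `inhomogeneous_k` is hypothetical, the intensities are assumed known, and edge correction is omitted for brevity.

```python
import math


def inhomogeneous_k(points, intensities, radius, area):
    """Estimate the inhomogeneous K function at one radius.

    points      -- list of (x, y) locations
    intensities -- background intensity at each point (assumed known here)
    radius      -- distance r at which K is evaluated
    area        -- area of the observation window
    No edge correction is applied; this is a simplification.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Count ordered pairs of distinct points within the radius,
            # down-weighting pairs that fall where the background is high.
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                total += 1.0 / (intensities[i] * intensities[j])
    return total / area


# Toy example: three points in a 4 x 4 window with unit background intensity.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
intensities = [1.0, 1.0, 1.0]
k_value = inhomogeneous_k(points, intensities, radius=1.5, area=16.0)
print(k_value)
```

Under this weighting, clustering that merely reflects a dense background population contributes little, so unusually large values point to excess clustering relative to the background.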
We will discuss a new approach to estimation in the classical multivariate linear model that yields estimators of the coefficient matrix with the potential to be substantially less variable asymptotically than the standard estimators. The new approach arises by recognizing that the response vector may contain information that is immaterial to the purpose of estimating the coefficients, but can still introduce substantial extraneous variation into estimation.