Tags: Colloquium Series

The Statistics Department hosts weekly colloquia on a variety of statistical topics, bringing in speakers from around the world.

In Computerized Adaptive Testing (CAT), questions are selected in real time and adapted to the test-taker’s latent ability. While CAT has become popular for many measurement tasks, such as educational testing and patient-reported outcomes, it has been criticized for not allowing examinees to review and revise their answers. Two main concerns regarding response revision in CAT are the deterioration of estimation efficiency, due to…
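The item-selection step of a CAT is often illustrated with a two-parameter logistic (2PL) IRT model, where each item's Fisher information at the current ability estimate is a^2 p (1 - p) and the most informative unadministered item is chosen next. The Python sketch below is a generic illustration of that rule with a hypothetical item bank; it is not the revision-aware procedure discussed in the talk.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL IRT probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of each 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def next_item(theta_hat, a, b, administered):
    """Pick the unused item with maximum information at the current ability estimate."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf   # exclude items already given
    return int(np.argmax(info))

# toy item bank: discrimination a and difficulty b for 50 hypothetical items
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=50)
b = rng.normal(0.0, 1.0, size=50)

theta_hat, administered = 0.0, set()
for _ in range(5):
    j = next_item(theta_hat, a, b, administered)
    administered.add(j)
    # in a real CAT, theta_hat would be re-estimated after observing the response to item j
    print("administer item", j)
```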
In this talk, we will discuss research challenges and opportunities of Fog Computing in Cyber-physical Systems and Security, and present several case studies. We will first present an innovative Real-time In-situ Seismic Imaging (RISI) system design with fog computing. It is a smart sensor network that senses and computes 3D subsurface images continuously and in real time. Instead of collecting data for post-processing, the mesh network…
Linear Gaussian covariance models are Gaussian models with linear constraints on the covariance matrix. Such models arise in stochastic processes from repeated time series data, Brownian motion tree models of phylogenetic data, and network tomography models used for analyzing connections in the Internet. Maximum likelihood estimation in this class of models leads to a non-convex optimization problem that typically has many local maxima. Using…
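For concreteness, in such a model the covariance is constrained to a linear combination Sigma(theta) = theta_1 G_1 + ... + theta_k G_k of fixed symmetric matrices, and the Gaussian log-likelihood (up to constants) is -n/2 [log det Sigma(theta) + tr(S Sigma(theta)^{-1})], which is generally non-convex in theta. The Python sketch below evaluates this objective for a hypothetical two-matrix basis and maximizes it from several starting points; this toy basis may well be well behaved, so the sketch only illustrates the form of the problem, not the method developed in the talk.

```python
import numpy as np
from scipy.optimize import minimize

# basis matrices for a hypothetical two-parameter linear covariance model:
# Sigma(theta) = theta[0] * G1 + theta[1] * G2
p = 4
G1 = np.eye(p)
G2 = np.ones((p, p))

def neg_loglik(theta, S, n):
    """Negative Gaussian log-likelihood (up to constants) at Sigma(theta)."""
    Sigma = theta[0] * G1 + theta[1] * G2
    try:
        L = np.linalg.cholesky(Sigma)        # reject non-positive-definite Sigma
    except np.linalg.LinAlgError:
        return np.inf
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * n * (logdet + np.trace(S @ np.linalg.inv(Sigma)))

# simulate zero-mean data from the model and maximize from several starting points
rng = np.random.default_rng(1)
n = 200
Sigma_true = 1.0 * G1 + 0.5 * G2
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)
S = X.T @ X / n

for start in ([1.0, 0.1], [0.1, 2.0], [3.0, 1.0]):
    res = minimize(neg_loglik, start, args=(S, n), method="Nelder-Mead")
    print(start, "->", np.round(res.x, 3), round(res.fun, 2))
```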
The theory of Compressed Sensing (CS) asserts that an unknown p-dimensional signal x can be accurately recovered from an underdetermined set of n linear measurements with n < p, provided that x is sufficiently sparse. However, in applications, the degree of sparsity ||x||_0 is typically unknown, and the problem of directly estimating ||x||_0 has been a longstanding gap between theory and practice. A closely…
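As background, the classical CS recovery step can be written as basis pursuit, min ||x||_1 subject to Ax = y, which is a linear program. The sketch below solves it with scipy's linprog on simulated data; it illustrates standard sparse recovery only, not the sparsity-estimation procedure that is the subject of the talk.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Basis pursuit: min ||x||_1 s.t. A x = y, as an LP with x = u - v, u, v >= 0."""
    n, p = A.shape
    c = np.ones(2 * p)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(0)
n, p, k = 40, 100, 5                         # n < p measurements, k-sparse signal
A = rng.normal(size=(n, p)) / np.sqrt(n)
x_true = np.zeros(p)
x_true[rng.choice(p, k, replace=False)] = rng.normal(size=k)
y = A @ x_true

x_hat = basis_pursuit(A, y)
print("recovery error:", np.linalg.norm(x_hat - x_true))
print("support size of estimate:", int(np.sum(np.abs(x_hat) > 1e-6)))
```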
I present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of…
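The data-augmentation idea can be sketched directly: append synthetic records whose distribution on the chosen margin matches the prior belief, leave all other variables missing, and let the number of synthetic records control the prior's weight. The pandas sketch below is a minimal, hypothetical illustration of that construction (the augment_with_prior helper and the toy variables are made up), not the full Bayesian latent class machinery.

```python
import numpy as np
import pandas as pd

def augment_with_prior(data, margin_var, prior_probs, n_synthetic):
    """Append roughly n_synthetic records whose `margin_var` distribution matches
    `prior_probs`; every other variable is left missing (NaN).  The prior gets
    stronger as n_synthetic grows."""
    levels = list(prior_probs)
    counts = np.round(np.array([prior_probs[l] for l in levels]) * n_synthetic).astype(int)
    synthetic = pd.DataFrame({col: np.nan for col in data.columns}, index=range(counts.sum()))
    synthetic[margin_var] = np.repeat(levels, counts)
    return pd.concat([data, synthetic], ignore_index=True)

# hypothetical survey data with two categorical items
data = pd.DataFrame({"smoker": ["yes", "no", "no", "yes"],
                     "exercise": ["low", "high", "low", "low"]})

# prior belief P(smoker = yes) = 0.2, encoded with 50 synthetic records
augmented = augment_with_prior(data, "smoker", {"yes": 0.2, "no": 0.8}, n_synthetic=50)
print(augmented.tail())
print(augmented["smoker"].value_counts(normalize=True))
```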
For massive data, subsampling algorithms are a popular way to reduce data volume and computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression.…
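A common template for such algorithms is a two-step scheme: draw a small uniform pilot subsample to get a rough estimate, use it to form informative subsampling probabilities (here taken proportional to |y - p_hat| times the covariate norm), then fit a weighted logistic regression on the resulting subsample. The sketch below follows that generic template with made-up subsample sizes r0 and r; it is an illustration under these assumptions, not the specific algorithms proposed in the paper.

```python
import numpy as np

def logistic_mle(X, y, w=None, iters=25):
    """Weighted logistic regression MLE via Newton's method."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        H = (X * (w * p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(H, grad)
    return beta

def subsample_logistic(X, y, r0=500, r=1000, seed=None):
    """Two-step subsampling sketch: uniform pilot, then probabilities
    proportional to |y - p_hat| * ||x||, fit with inverse-probability weights."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pilot_idx = rng.choice(n, r0, replace=False)
    beta0 = logistic_mle(X[pilot_idx], y[pilot_idx])
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta0))
    scores = np.abs(y - p_hat) * np.linalg.norm(X, axis=1)
    probs = scores / scores.sum()
    idx = rng.choice(n, r, replace=True, p=probs)
    return logistic_mle(X[idx], y[idx], w=1.0 / (n * probs[idx]))

# toy comparison: full-data MLE vs. subsampled approximation
rng = np.random.default_rng(0)
n, d = 100_000, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta_true = np.array([-1.0, 0.5, -0.5, 1.0, 0.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

print("full MLE:      ", np.round(logistic_mle(X, y), 3))
print("subsampled MLE:", np.round(subsample_logistic(X, y, seed=1), 3))
```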
Random effects models play an important role in model-based small area estimation. Random effects account for any lack of fit of a regression model for the population means of small areas on a set of explanatory variables. In Datta, Hall and Mandal (2011, JASA), we showed that if the random effects can be dispensed with through a statistical test, then the model parameters and the small area means can be estimated with substantially greater accuracy. This…
The R-squared statistic, or coefficient of determination, is commonly used to measure the predictive power of a linear model. It is interpreted as the fraction of variation in the response explained by the predictors. Despite its popularity, a direct equivalent measure is not available for nonlinear regression models and for right-censored time-to-event data. In this talk, I will show that in addition to a measure of explained variation,…
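For the linear model case, R-squared is simply one minus the ratio of the residual to the total sum of squares; the short sketch below computes it for a toy least-squares fit. The generalized measure for nonlinear models and right-censored data discussed in the talk is not shown here.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: fraction of variation in y explained by the fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# toy linear fit via least squares
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=200)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("R^2 =", round(r_squared(y, X @ beta), 3))
```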
We consider generalized linear regression with a left-censored covariate due to a lower limit of detection. Complete-case analysis, which eliminates observations with values below the limit of detection, yields valid estimates of the regression coefficients but loses efficiency. Substitution methods are biased, and the maximum likelihood method relies on parametric models for the unobservable tail probability and thus may suffer from model misspecification.…
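A small simulation makes the contrast concrete: when censoring depends only on the covariate, dropping censored observations still gives a consistent slope estimate (at the cost of sample size), while plugging in a constant such as LOD/2 typically does not. The sketch below uses a hypothetical lognormal covariate and simple linear regression; it illustrates the two naive approaches only, not the estimator proposed in the talk.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(0)
n, lod, beta1 = 2000, 0.8, 1.0
x = rng.lognormal(mean=0.0, sigma=0.5, size=n)      # positive biomarker-like covariate
y = 0.5 + beta1 * x + rng.normal(size=n)            # linear outcome model
observed = x >= lod                                  # values below the limit of detection are censored

# complete-case analysis: drop censored observations (consistent here, but fewer data)
slope_cc = ols_slope(x[observed], y[observed])

# substitution: replace censored values with LOD/2 (typically biased)
x_sub = np.where(observed, x, lod / 2.0)
slope_sub = ols_slope(x_sub, y)

print("true slope:         ", beta1)
print("complete-case slope:", round(slope_cc, 3))
print("substitution slope: ", round(slope_sub, 3))
```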
Data-based decision making has always been a fundamental part of banking and finance, and even more so since the 2008 crisis and the heightened regulatory environment that followed. In this presentation, I will describe the role of statistics in risk modeling and management in large banks, covering model development and model assessment. The talk will give a glimpse into different types of data structures, computing/data platforms used for big…