For many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black box supervised learning models (e.g., complex trees, neural networks, boosted trees, random forests, nearest neighbors, local kernel-weighted methods, support vector regression, etc.) in this regard is their lack of interpretability or transparency.
The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.
In Computerized Adaptive Testing (CAT), questions are selected in real time and are adjusted to the test-taker’s latent ability. While CAT has become popular for many measurement tasks, such as educational testing and patient-reported outcomes, it has been criticized for not allowing examinees to review and revise their answers. Two main concerns regarding response revision in CAT are the deterioration of estimation efficiency, due to suboptimal item selection, and the compromise of test validity, due to the potential adoption of deceptive test-taking strategies by the examinees.
In this talk, we will discuss research challenges and opportunities of Fog Computing in Cyber-physical Systems and Security and present several case studies. We will first present an innovative Real-time In-situ Seismic Imaging (RISI) system design with fog computing. It is a smart sensor network that senses and computes 3D subsurface images continuously and in real time.
Linear Gaussian covariance models are Gaussian models with linear constraints on the covariance matrix. Such models arise in stochastic processes from repeated time series data, Brownian motion tree models of phylogenetic data and network tomography models used for analyzing connections in the Internet. Maximum likelihood estimation in this class of models leads to a non-convex optimization problem that typically has many local maxima.
The theory of Compressed Sensing (CS) asserts that an unknown p-dimensional signal x can be accurately recovered from an underdetermined set of n linear measurements with n<p, provided that x is sufficiently sparse.
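As a rough illustration of sparse recovery from n<p measurements, the sketch below uses orthogonal matching pursuit (OMP), a standard greedy method chosen here for brevity; the abstract does not say which recovery algorithm the talk considers. The Gaussian measurement matrix, the dimensions, the sparsity level k, and the function name `omp` are all assumptions made for this example.

```python
import numpy as np

def omp(A, y, k):
    """Greedy sparse recovery: select k columns of A, then least squares on them.

    Illustrative only; not the method of the talk.
    """
    n, p = A.shape
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = 0.0  # never reselect a chosen column
        support.append(int(np.argmax(corr)))
        # Refit by least squares on all columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(p)
    x_hat[support] = coef
    return x_hat

# Assumed toy setup: n = 60 measurements of a 3-sparse signal in p = 100 dimensions.
rng = np.random.default_rng(1)
n, p, k = 60, 100, 3
A = rng.standard_normal((n, p)) / np.sqrt(n)  # columns have roughly unit norm
x = np.zeros(p)
x[[5, 37, 80]] = [1.0, -1.0, 1.0]
y = A @ x  # noise-free measurements

x_hat = omp(A, y, k)
print("selected indices:", np.flatnonzero(x_hat))
print("residual norm:", np.linalg.norm(y - A @ x_hat))
```

When the greedy steps pick the true support, the least-squares refit reproduces y and the residual is zero up to rounding; with too few measurements or a less sparse signal, the selection can fail.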
I present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records.
For massive data, subsampling algorithms are a popular way to reduce data volume and computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities.
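A minimal sketch of leverage-score subsampling for ordinary least squares, as described in the abstract above, might look like the following. The simulated data, the subsample size r, the inverse-probability reweighting, and the function name `leverage_subsample_ols` are assumptions for illustration, not details taken from the talk.

```python
import numpy as np

def leverage_subsample_ols(X, y, r, rng):
    """Approximate the OLS estimate from r rows drawn with leverage-score probabilities."""
    n, _ = X.shape
    # Leverage scores are the squared row norms of Q in a thin QR factorization of X.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    probs = leverage / leverage.sum()
    # Sample rows with replacement according to the leverage-based probabilities.
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Reweight each sampled row by 1 / sqrt(r * prob) before solving least squares.
    w = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta, probs

# Assumed toy data: 5000 observations, 3 predictors, small Gaussian noise.
rng = np.random.default_rng(0)
n, p = 5000, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)          # full-data OLS
beta_sub, probs = leverage_subsample_ols(X, y, r=500, rng=rng)  # 10% subsample
print("full-data OLS:", beta_full)
print("subsampled OLS:", beta_sub)
```

Computing exact leverage scores already requires a factorization of the full design matrix, so in practice they are often approximated; this sketch computes them exactly to keep the idea visible.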
Random effects models play an important role in model-based small area estimation. Random effects account for any lack of fit of a regression model for the population means of small areas on a set of explanatory variables. In Datta, Hall and Mandal (2011, JASA), we showed that if the random effects can be dispensed with through a statistical test, then the model parameters and the small area means can be estimated substantially more accurately. This work is most useful when the number of small areas, m, is moderately large.
The R-squared statistic, or coefficient of determination, is commonly used to measure the predictive power of a linear model. It is interpreted as the fraction of variation in the response explained by the predictors. Despite its popularity, a direct equivalent measure is not available for nonlinear regression models and for right-censored time-to-event data. In this talk, I will show that in addition to a measure of explained variation, another measure of explained prediction error is required to assess the predictive power of a nonlinear model.
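For reference, the usual linear-model R-squared mentioned above is one minus the ratio of the residual sum of squares to the total sum of squares. The short sketch below computes it for a least-squares line and for the mean-only predictor; the toy data and the helper name `r_squared` are assumptions for illustration, and the sketch does not attempt the nonlinear or censored-data measures proposed in the talk.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Assumed toy data: the response is an exact linear function of x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Fit intercept and slope by least squares.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

r2_linear = r_squared(y, X @ coef)                   # close to 1 for an exact fit
r2_mean = r_squared(y, np.full_like(y, y.mean()))    # 0 for the mean-only predictor
print(r2_linear, r2_mean)
```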
We consider generalized linear regression with a left-censored covariate due to a lower limit of detection. Complete case analysis, which eliminates observations with values below the limit of detection, yields valid estimates of the regression coefficients but loses efficiency. Substitution methods are biased, and the maximum likelihood method relies on parametric models for the unobservable tail probability and thus may suffer from model misspecification.