In past decades we have witnessed a revolution in information technology, and its impact on statistical research has been enormous. This talk addresses recent developments and some potential research issues in Business, Industry and Government (BIG) Statistics, with special focus on computer experiments and information systems. An overall introduction and review will be given, followed by specific research opportunities. For each subject, the problem will be introduced, some initial results will be presented, and future research problems will be suggested.
The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.
This lecture is concerned with probability models for distance matrices, which are non-negative symmetric matrices of negative type. Several families of distributions are considered, including Wishart distance matrices and Mahalanobis distance matrices, all derived ultimately from Gaussian matrices by marginalization. The likelihood functions are obtained in a relatively straightforward manner without an explicit representation of the joint density.
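As a small illustration of the "negative type" condition mentioned in this abstract, the sketch below builds a matrix of squared Euclidean distances between Gaussian points and checks numerically that the quadratic form x'Dx is non-positive for vectors x whose entries sum to zero. The choice of squared Euclidean distances, the helper names, and the simulated points are assumptions made for illustration; they are not taken from the lecture and do not reproduce its Wishart or Mahalanobis constructions.

```python
import random

def squared_distance_matrix(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]

def quadratic_form(D, x):
    """Compute x' D x for a square matrix D given as nested lists."""
    n = len(x)
    return sum(x[i] * D[i][j] * x[j] for i in range(n) for j in range(n))

def random_contrast(n, rng):
    """Random vector whose entries sum to zero (up to rounding)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    centre = sum(x) / n
    return [xi - centre for xi in x]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical data: 6 points in 3 dimensions with standard Gaussian coordinates.
points = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
D = squared_distance_matrix(points)

# Negative type: x' D x <= 0 whenever the entries of x sum to zero.
worst = max(quadratic_form(D, random_contrast(6, rng)) for _ in range(1000))
print("largest quadratic form over 1000 random contrasts:", worst)
```

A random search like this can only fail to find a violation; it is a sanity check, not a proof. For squared Euclidean distances the property holds exactly, since x'Dx equals minus twice the squared norm of the x-weighted sum of the points.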
Commonly utilized in educational testing, models within the unidimensional item response theory (IRT) framework locate a student’s overall ability along a latent continuum by modeling the response probabilities to a set of test items as a function of a single continuous latent variable. Diagnostic classification models (DCMs) are an emerging class of models that, in contrast to IRT models, identify the separate components of what students know (distinct skills or abilities called attributes) by modeling response probabilities as a function of a set of categorical latent variables.
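The contrast between the two model families can be sketched with one concrete member of each. The abstract names no specific model, so the two-parameter logistic (2PL) IRT model and the DINA diagnostic model used below are illustrative assumptions, as are the function names and parameter values.

```python
import math

def irt_2pl(theta, a, b):
    """2PL IRT: probability of a correct response given a single
    continuous ability theta, item discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def dcm_dina(alpha, q, guess, slip):
    """DINA model: probability of a correct response given a vector of
    binary attribute indicators alpha and the item's required
    attributes q (one row of a Q-matrix).

    A student who has mastered every required attribute answers
    correctly unless they slip; any other student succeeds only by guessing.
    """
    has_all_required = all(a_k >= q_k for a_k, q_k in zip(alpha, q))
    return 1.0 - slip if has_all_required else guess

# IRT: one number locates the student on a latent continuum.
print(irt_2pl(theta=0.5, a=1.2, b=0.0))

# DCM: a profile of mastered (1) and non-mastered (0) attributes.
print(dcm_dina(alpha=(1, 0), q=(1, 0), guess=0.2, slip=0.1))
print(dcm_dina(alpha=(0, 1), q=(1, 0), guess=0.2, slip=0.1))
```

In the IRT call the response probability varies smoothly with `theta`, whereas in the DINA calls it depends only on which discrete attribute profile the student holds.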
Reliability or survival analysis is traditionally based on time-to-failure data. In high-reliability applications, there is usually a high degree of censoring, which makes reasonable inference difficult. There are a number of alternatives for increasing the efficiency of reliability inference in such cases: accelerated testing, collection and use of extensive covariate information, and the use of multistate and degradation data when available. This talk will focus on the last topic. The first part of the talk deals with degradation data.
The last decade has seen rapid advances in genomic technologies. These technologies have provided researchers with tools to probe the genetic basis of complex diseases and traits. There is a wide gap between these technologies and the development of methods to analyze the massive data they produce, as well as a lack of computing technologies to facilitate the analyses. The analysis and interpretation of the data are exceptionally challenging due to their volume and complexity. This presentation discusses the methods needed to understand these massive amounts of data.
Parametric and nonparametric models are convenient mathematical tools to describe characteristics of data with different degrees of simplification. When a model is to be selected from a number of parametric candidates, differences naturally arise depending on whether the data-generating process is assumed to be parametric or nonparametric. In this talk, in a regression context, we will consider the question of whether and how we can distinguish between parametric and nonparametric situations, and discuss the feasibility of adaptive estimation that handles both parametric and nonparametric scenarios optimally.
The availability of powerful computing equipment has had a dramatic impact on statistical methods and thinking, changing forever the way data are analysed. New data types, larger quantities of data, and new classes of research problem are all motivating new statistical methods. We shall give examples of each of these issues, and discuss the current and future directions of frontier problems in statistics.
We will provide a comprehensive review of the basics of statistical meta-analysis and discuss its relevance for the problem of drawing inference about a common mean of several univariate normal populations with unknown and unequal variances. This problem, which is related to the Behrens-Fisher problem, has many applications, and we will study two real data sets.
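One classical point estimator for the common-mean problem described above is the Graybill-Deal estimator, which weights each sample mean by its sample size divided by its sample variance. The sketch below is a minimal illustration under that assumption; the abstract does not say which estimators the talk covers, and the function name and the toy numbers are made up for the example rather than taken from the talk's data sets.

```python
from statistics import mean, variance

def graybill_deal(samples):
    """Graybill-Deal estimate of a common mean.

    Each sample mean is weighted by n_i / s_i^2, where n_i is the
    sample size and s_i^2 the usual (n - 1 denominator) sample variance.
    """
    weights = [len(s) / variance(s) for s in samples]
    means = [mean(s) for s in samples]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Hypothetical measurements of the same quantity from three laboratories
# with visibly different precision.
labs = [
    [10.1, 9.9, 10.0, 10.2],
    [9.0, 11.5, 10.4],
    [10.3, 9.8, 10.1, 9.9, 10.0],
]
print(graybill_deal(labs))
```

Because the weights use estimated rather than known variances, the estimator's sampling distribution is not simply normal, which is part of what makes inference in this problem delicate.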
Exploring genomic landscapes of different biological endpoints is an important approach for understanding biological processes and disease etiologies. Examples of these endpoints are sequence composition, DNA methylation, histone modifications, and binding sites for different transcription factors. With the completion of the Human Genome Project and advances in high-throughput technologies, tightly spaced measurements have been collected from linear chromosomes to create unbiased maps at the whole-genome scale.
In mammalian cells, isoforms of a gene can have highly similar sequences yet encode proteins with remarkably different functional roles. Quantifying cellular abundance of isoforms is therefore of significant biological interest. In this talk, we will review methods for profiling isoform-specific gene expression using high-throughput technologies such as microarrays and ultra high-throughput RNA sequencing (RNA-Seq). We will show the intrinsic non-identifiability issue involved in the isoform deconvolution problem, especially for microarray data.