We study the similarities and differences between two state-of-the-art large-margin classifiers, DWD and SVM, and propose a unified family of classification machines, the FLexible Assortment MachinE (FLAME), in which SVM and DWD arise as two special cases. The FLAME family clarifies the connections and differences between the SVM and DWD methods, and also improves both by providing a better tradeoff between imbalance sensitivity and high-dimensional data piling. Several asymptotic properties of the FLAME classifiers are investigated.
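As background for the contrast drawn here, a minimal sketch of the individual-margin losses that SVM and DWD minimize. The FLAME family itself is not specified in this abstract, so only the two endpoint losses are shown; the 1/(4u) tail uses the conventional DWD scaling, which makes the loss continuous at u = 1/2.

```python
import numpy as np

def hinge_loss(u):
    """SVM hinge loss: zero beyond the margin, linear inside it."""
    u = np.asarray(u, dtype=float)
    return np.maximum(0.0, 1.0 - u)

def dwd_loss(u):
    """DWD loss (conventional scaling): linear for small margins,
    decaying as 1/(4u) for u > 1/2, so every point retains influence."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= 0.5, 1.0 - u, 1.0 / (4.0 * u))
```

The hinge loss ignores points beyond the margin entirely, while the DWD loss keeps a slowly vanishing pull from every observation; this difference is what drives the contrasting behavior on imbalanced and high-dimensional data.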
The Statistics Department hosts weekly colloquia on a variety of statistical subjects, bringing in speakers from around the world.
In this International Year of Statistics it is appropriate to review the long history of statistics education at the school level, a history that laid the foundation for the great possibilities that lie before us today. These possibilities can best be made realities through the concerted efforts of the entire statistics community, especially the academic community, in curriculum development and teacher education.
We introduce a quantile regression framework for analyzing high-dimensional heterogeneous data. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression.
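For concreteness, a sketch of two standard ingredients of this framework: the quantile check loss, and one common nonconvex penalty (SCAD). The abstract does not name the specific penalty used, so SCAD with the conventional a = 3.7 is an illustrative assumption.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan & Li): linear near zero, then tapering,
    then flat -- a common nonconvex choice that reduces bias on
    large coefficients relative to the lasso's l1 penalty."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    large = lam**2 * (a + 1) / 2
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))
```

Minimizing the check loss at several values of tau, each with its own penalized covariate set, is what lets the relevant covariates differ across segments of the conditional distribution.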
There is ample evidence that, in a wide variety of settings, people switch strategies as they try to solve problems. For example, a student taking a test might solve one arithmetic problem using one method and then use a different method on the next problem. More importantly, experts use different strategies than novices. Both experts and novices switch strategies, but experts use a different mixture of strategies than novices do. Existing psychometric models do not model strategy usage, and so cannot capture this critical dimension of expert-novice differences.
Latent structure models can be developed for the mean level of a space-time count data observation process. The focus is on small area health outcomes observed in fixed spatial units and fixed time periods. We assume a Poisson data-level model with mean parameterized as a weighted mixture of temporal components. Each area has a distribution of weights assigning the area to a component. The model development is within the Bayesian paradigm, and we consider a set of different choices of prior distributions for the weights and temporal components.
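A minimal simulation sketch of this mean structure, with area count means built as weighted mixtures of shared temporal components. The dimensions, Gamma-distributed components, and Dirichlet weights below are illustrative assumptions; the abstract's actual prior choices are not specified.

```python
import numpy as np

rng = np.random.default_rng(0)
n_areas, n_times, n_comp = 5, 12, 3   # hypothetical dimensions

# Shared temporal components lambda[k, t] (e.g., seasonal shapes).
lam = rng.gamma(shape=2.0, scale=3.0, size=(n_comp, n_times))

# Each area gets a simplex-valued weight vector over the components.
w = rng.dirichlet(alpha=np.ones(n_comp), size=n_areas)  # rows sum to 1

# Poisson mean: per-area weighted mixture of the temporal components.
mu = w @ lam                  # shape (n_areas, n_times)
y = rng.poisson(mu)           # simulated small-area counts
```

In a Bayesian fit, the weights `w` and components `lam` would receive the priors discussed in the talk, and the posterior distribution of each area's weights indicates which temporal pattern the area follows.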
This paper considers the problem of optimal false discovery rate control under the linear model $Y = X\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^2 I)$. This is an extension of the normal means model to arbitrary dependence. To solve the problem, we first propose an adjusted z-surrogate that simplifies the original data while capturing the useful information. We show that many commonly used surrogates based on univariate associations are biased and inefficient.
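For background, a sketch of the standard Benjamini-Hochberg step-up procedure applied to two-sided p-values from z-statistics; the paper's adjusted z-surrogate itself is not reproduced here, and this baseline assumes independent standard normal nulls.

```python
import math
import numpy as np

def bh_reject(z, q=0.1):
    """Benjamini-Hochberg step-up at FDR level q for z-statistics,
    using two-sided p-values 2*P(N(0,1) > |z|) = erfc(|z|/sqrt(2))."""
    z = np.asarray(z, dtype=float)
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m      # BH step-up thresholds
    below = p[order] <= thresh
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True                  # reject the k smallest p-values
    return reject
```

The quality of any such procedure hinges on the z-values it is fed, which is why a biased or inefficient surrogate degrades power even when the step-up rule itself is optimal.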
Empirical likelihood is a nonparametric method based on a data-driven likelihood. The flexibility of empirical likelihood facilitates its use in complex settings, which can in turn create computational challenges. Additionally, the Empty Set Problem (ESP), which arises with the empirical estimating equations approach, can hinder estimation: the data cannot satisfy the constraints when the true parameter lies outside their convex hull.
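A sketch of one-dimensional empirical likelihood for a mean, showing how the Empty Set Problem surfaces: when the hypothesized mean lies outside the convex hull (here, the range) of the data, no valid weights exist. The function name and the None-on-ESP convention are illustrative choices, not from the talk.

```python
import numpy as np

def el_logratio_mean(x, mu0):
    """Empirical log-likelihood ratio statistic for a scalar mean mu0.
    Returns None when mu0 is outside the convex hull of the data
    (the Empty Set Problem: no weights can satisfy the constraint)."""
    x = np.asarray(x, dtype=float)
    d = x - mu0
    if d.min() >= 0 or d.max() <= 0:       # ESP: mu0 not strictly inside hull
        return None
    # Solve sum d_i / (1 + lam*d_i) = 0 by bisection; the bracket keeps
    # all weights positive (1 + lam*d_i > 0 for every i).
    lo = -1.0 / d.max() + 1e-12
    hi = -1.0 / d.min() - 1e-12
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        g = np.sum(d / (1.0 + lam * d))    # decreasing in lam
        if g > 0:
            lo = lam
        else:
            hi = lam
    n = len(x)
    w = 1.0 / (n * (1.0 + lam * d))        # empirical likelihood weights
    return -2.0 * np.sum(np.log(n * w))    # Wilks-type statistic
```

When `mu0` equals the sample mean the statistic is zero (uniform weights 1/n are optimal), and it grows as `mu0` moves toward the hull boundary, becoming infeasible outside it.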
We consider problems in statistical inference with two-step, monotone incomplete data drawn from a multivariate normal population.
Sufficient dimension folding is a technique for reducing the dimension of matrix- or array-valued objects while preserving their data structure. In this talk, I consider sufficient dimension folding for the regression mean function when the predictors are matrix- or array-valued. I propose the concept of the central mean folding subspace and two local estimation methods for it: folded outer product of gradients estimation (folded-OPG) and folded minimum average variance estimation (folded-MAVE). The asymptotic properties of folded-MAVE are established.
Recently, a low-cost yet highly sensitive colorimetric sensor array (CSA) for the detection and identification of volatile chemical toxicants has been developed. Classification analysis holds the key to the success of the array in discriminating multiple toxicants. The data output by the CSA take the form of matrices, which renders many traditional classification methods inapplicable. In this talk, I will introduce a matrix discriminant analysis method that can be viewed as a generalization of conventional LDA to data in matrix form.
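One way to see why matrix structure matters, as a hedged illustration (the dimensions and the rank-1 bilinear form below are assumptions, not the talk's method): vectorizing a p-by-q matrix forces LDA to estimate a pq-dimensional discriminant direction, while a bilinear rule of the form u'Xv needs only p + q parameters.

```python
import numpy as np

p, q = 16, 36                 # hypothetical CSA readout: p x q responses

# LDA on vec(X) estimates one discriminant direction of length p*q,
# whereas a rank-1 bilinear rule needs only vectors u (p) and v (q).
vectorized_params = p * q     # 576 parameters for vectorized LDA
bilinear_params = p + q       # 52 parameters for the bilinear rule

def bilinear_score(X, u, v):
    """Rank-1 bilinear discriminant score u' X v for a matrix predictor."""
    return u @ X @ v
```

The reduced parameter count is what makes structure-aware discriminant rules feasible when the number of training samples is small relative to the matrix size.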