High Dimensional Statistical Learning

University of North Carolina at Chapel Hill

Monday, February 21, 2011 - 3:30pm
Attachment: 20110221Huang.pdf (64.63 KB)

In this talk, I will present some new contributions to high dimensional statistical learning, focusing on both classification and clustering.

Classification is one of the central research topics in statistical learning. For binary classification, we propose the Bi-Directional Discrimination (BDD) method, which generalizes linear classifiers from one hyperplane to two or more hyperplanes. BDD provides a compromise between linear and general nonlinear methods: it offers much of the flexibility of a general nonlinear classifier while maintaining the interpretability and lower potential for overfitting of linear classifiers. We discuss the implementation of BDD using the Support Vector Machine and Distance Weighted Discrimination (DWD) methods. The performance and usefulness of BDD are assessed using asymptotics, simulations, and real data. For multiclass classification, we have generalized the DWD method from the binary case to the multiclass case.

Clustering is another important topic in statistical learning. One important issue in clustering is the comparison of clustering results obtained from different clustering algorithms. SigClust is a recently developed, powerful tool for assessing the statistical significance of clusters, such as those found by standard algorithms. SigClust is under continuous development, and we are working in two directions to improve the performance of the SigClust method. The first is to estimate the covariance matrix of the null distribution more accurately. The second is to generalize the two-cluster SigClust method to a proper multi-cluster SigClust method.

This talk is based on joint work with my advisors, J. S. Marron and Yufeng Liu.
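As a rough illustration of the kind of question SigClust addresses, the sketch below runs a simplified Monte Carlo test of whether a two-cluster split is stronger than would be expected from a single Gaussian population. It is not the speaker's implementation. The function names (`two_means_cluster_index`, `sigclust_style_pvalue`) are hypothetical, and the null distribution here uses only per-coordinate sample variances (a diagonal covariance), which is a deliberate simplification and not the covariance estimation discussed in the talk. The test statistic is the 2-means cluster index, the within-cluster sum of squares divided by the total sum of squares.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(group) for col in zip(*group)]


def two_means_cluster_index(points, rng, n_starts=3, n_iter=30):
    """Smallest 2-means cluster index found over a few random starts.

    Cluster index = within-cluster sum of squares / total sum of squares.
    Values near 0 indicate a strong two-cluster split.
    """
    center = centroid(points)
    total_ss = sum(sq_dist(p, center) for p in points)
    best = 1.0
    for _ in range(n_starts):
        centers = rng.sample(points, 2)
        for _ in range(n_iter):
            groups = ([], [])
            for p in points:
                nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                groups[nearest].append(p)
            if not groups[0] or not groups[1]:
                break  # degenerate split; keep the current centers
            new_centers = [centroid(g) for g in groups]
            if new_centers == centers:
                break  # assignments have stabilized
            centers = new_centers
        within_ss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, within_ss / total_ss)
    return best


def sigclust_style_pvalue(points, n_sim=100, seed=0):
    """Monte Carlo p-value for the observed cluster index.

    ASSUMPTION: the null is a single Gaussian with diagonal covariance
    taken from per-coordinate sample variances (a simplification).
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    center = centroid(points)
    sds = [
        math.sqrt(sum((p[j] - center[j]) ** 2 for p in points) / (n - 1))
        for j in range(d)
    ]
    observed = two_means_cluster_index(points, rng)
    hits = 0
    for _ in range(n_sim):
        # The cluster index is translation invariant, so a zero mean suffices.
        null_data = [[rng.gauss(0.0, sds[j]) for j in range(d)] for _ in range(n)]
        if two_means_cluster_index(null_data, rng) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Toy data: two well-separated groups in two dimensions.
    data = [[demo_rng.gauss(-5.0, 1.0), demo_rng.gauss(0.0, 1.0)] for _ in range(30)]
    data += [[demo_rng.gauss(5.0, 1.0), demo_rng.gauss(0.0, 1.0)] for _ in range(30)]
    ci, p = sigclust_style_pvalue(data)
    print("cluster index:", round(ci, 3), "p-value:", round(p, 4))
```

A small p-value suggests that a split this strong would be unlikely under the single-Gaussian null. In high dimensional settings the choice of null covariance matters a great deal, which is why a diagonal shortcut like the one above should be read only as a toy stand-in.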