Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds
Sungkyu Jung
Thursday, August 23, 2012 - 3:30pm

This talk covers two research topics in modern, non-standard data analysis: data in the High Dimension, Low Sample Size (HDLSS) setting and data lying on manifolds. Both settings arise in statistical image and shape analysis.

The first topic is an asymptotic study of the high-dimensional covariance matrix. In particular, we analyze the behavior of its eigenvalues and eigenvectors, which is closely related to Principal Component Analysis (PCA). We investigate the asymptotic behavior of the Principal Component (PC) directions as the dimension tends to infinity with the sample size fixed. We have found mathematical conditions that characterize the consistency and the strong inconsistency of the empirical PC direction vectors. Moreover, we identify conditions under which the empirical PC direction vectors are neither consistent nor strongly inconsistent, and we present the limiting distributions of the angle between the empirical PC direction and its population counterpart. These findings clarify when the use of PCA in the HDLSS context is justified, namely when the conditions for consistency hold.

The second topic concerns methods for analyzing data that lie on curved manifolds, such as features extracted from shapes or images. A common goal in statistical shape analysis is to understand the variation of shapes. As a means of dimension reduction and visualization, PCA-like methods for manifold data are needed. We propose flexible extensions of PCA to manifold data: Principal Arc Analysis and Analysis of Principal Nested Spheres. The methods are implemented for two important types of manifolds. The first is the sample space of the medial representation of shapes, frequently used in image analysis to parameterize the shape of human organs, which naturally forms a curved manifold that we characterize as a direct product manifold. The second is the landmark-based shape space proposed by Kendall. The proposed methods in the dissertation capture major variations along non-geodesic paths. The benefits of the methods are illustrated by several data examples from image and shape analysis.
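
As a rough illustration of the consistency question in the first topic, the sketch below simulates the angle between the first empirical PC direction and its population counterpart when the dimension d is large and the sample size n is small. Everything specific here is an assumption made for illustration and is not taken from the abstract: Gaussian data with known zero mean, a single-spike covariance diag(d^alpha, 1, ..., 1), and the hypothetical helper names pc1_angle_deg and mean_angle. Under these assumptions, one would expect small angles when the spike is strong (large alpha) and angles near 90 degrees when it is weak (small alpha).

```python
import numpy as np


def pc1_angle_deg(d, n, alpha, rng):
    """Angle in degrees between the first sample PC direction and e_1.

    Assumed model (illustrative only): mean-zero Gaussian data with
    covariance diag(d**alpha, 1, ..., 1), so the first population PC
    direction is the first coordinate axis e_1.
    """
    scales = np.ones(d)
    scales[0] = d ** (alpha / 2.0)            # standard deviation of the spiked coordinate
    X = rng.standard_normal((n, d)) * scales  # n observations in d dimensions
    # The leading right singular vector of X is the leading eigenvector
    # of the (uncentered) sample covariance X^T X / n.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    cos = min(abs(vt[0, 0]), 1.0)             # |<v_hat, e_1>|, sign is arbitrary
    return float(np.degrees(np.arccos(cos)))


def mean_angle(d, n, alpha, reps=20, seed=0):
    """Average the angle over `reps` independent simulated data sets."""
    rng = np.random.default_rng(seed)
    return float(np.mean([pc1_angle_deg(d, n, alpha, rng) for _ in range(reps)]))


if __name__ == "__main__":
    n = 10
    for alpha in (0.2, 2.0):
        for d in (50, 500, 2000):
            print(f"alpha={alpha}, d={d}: mean angle = {mean_angle(d, n, alpha):.1f} deg")
```

The sample size is held fixed at n = 10 while d grows, mirroring the HDLSS regime described above. The data are not centered because the mean is assumed known to be zero; with an unknown mean, one would center X first.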