Yi Chen

Friday, April 18, 2014, 1 pm

Cohen Room 230, Statistics Building

PhD Candidate, University of Georgia Department of Statistics

With the development of computing and internet technology, data sets with extremely large numbers of observations are increasingly common. One technique for handling such big data is to aggregate classical data into symbolic data, such as lists, intervals, lists with probabilities, and intervals with probabilities (histograms). Building clustering methods for symbolic data has been an active research area over the past decade. In this dissertation, we first review regression and clustering methods for interval data. Then, we develop a regression approach to single-factor analysis of variance and implement it in R. Finally, the clustering method proposed by Chavent (1998, 2000) is implemented in R and applied to both simulated and real data. The advantages and disadvantages of using different distances for clustering are also discussed.
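As a rough illustration of the aggregation step described in the abstract, the sketch below collapses classical (group, value) observations into one interval per group and compares two intervals with the Hausdorff distance. It is not the dissertation's R code: the function names, the sample records, and the choice of the Hausdorff distance as one example of a distance between intervals are all assumptions made for illustration.

```python
def aggregate_to_intervals(records):
    """Collapse (group, value) observations into one (min, max) interval per group."""
    bounds = {}
    for group, value in records:
        if group in bounds:
            lo, hi = bounds[group]
            bounds[group] = (min(lo, value), max(hi, value))
        else:
            bounds[group] = (value, value)
    return bounds


def hausdorff_distance(a, b):
    """Hausdorff distance between closed intervals a = (a_lo, a_hi) and b = (b_lo, b_hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Hypothetical classical data: one numeric measurement per observation, tagged by group.
records = [
    ("north", 12.0), ("north", 15.5), ("north", 9.8),
    ("south", 20.1), ("south", 18.4),
]

intervals = aggregate_to_intervals(records)
distance = hausdorff_distance(intervals["north"], intervals["south"])
```

Each group is reduced to a single interval-valued observation, so a clustering method for interval data would operate on far fewer records than the original data set contains.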
