

Yi Chen

PhD Candidate, University of Georgia Department of Statistics
Cohen Room 230, Statistics Building

With the development of computing and internet technology, data sets with extremely large numbers of observations are increasingly common. One technique for handling such big data is to aggregate classical data into symbolic data, such as lists, intervals, lists with probabilities, and intervals with probabilities (histograms). Building clustering methods for symbolic data has been an active area over the past decade. In this dissertation, we first review regression and clustering methods for interval data. Then, we develop a regression approach to single-factor analysis of variance and implement it in R. Finally, the clustering method proposed by Chavent (1998, 2000) is coded in R and applied to both simulated and real data. The advantages and disadvantages of using different distances for clustering are also discussed.
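To illustrate the aggregation step, here is a minimal sketch of how classical observations might be collapsed into interval-valued symbolic data and then compared with a distance. It is written in Python for illustration only and is not the dissertation's R code. The function names `aggregate_to_intervals` and `hausdorff` and the toy records are hypothetical. The Hausdorff distance is used as one common choice of distance between intervals, which is not necessarily the distance studied in the dissertation.

```python
from collections import defaultdict


def aggregate_to_intervals(records):
    """Collapse (group, value) pairs into one (min, max) interval per group."""
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)
    return {group: (min(values), max(values)) for group, values in grouped.items()}


def hausdorff(a, b):
    """Hausdorff distance between two closed intervals a = (a_lo, a_hi), b = (b_lo, b_hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Toy classical data (assumed for illustration): each record is (group label, measurement).
records = [("A", 1.0), ("A", 3.5), ("A", 2.0), ("B", 4.0), ("B", 6.0)]

# Each group becomes a single interval-valued observation.
intervals = aggregate_to_intervals(records)

# Distance between the two interval observations.
d = hausdorff(intervals["A"], intervals["B"])
```

Here the five classical observations reduce to two interval observations, one per group. A clustering method for interval data would then work with pairwise distances such as `d` rather than with the original records.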
