Symbolic Data Regression and Clustering Methods

PhD Candidate, University of Georgia Department of Statistics

Friday, April 18, 2014 - 1:00pm

With advances in computing and internet technology, data sets with extremely large numbers of observations are increasingly common. One technique for handling such big data is to aggregate classical data into symbolic data, such as lists, intervals, lists with probabilities, and intervals with probabilities (histograms). Building clustering methods for symbolic data has been an active area of research over the past decade. In this dissertation, we first review regression and clustering methods for interval data. Then, we develop a regression approach to single-factor analysis of variance and implement it in R. Finally, the clustering method proposed by Chavent (1998, 2000) is implemented in R and applied to both simulated and real data. The advantages and disadvantages of using different distances for clustering are also discussed.

Cohen Room 230, Statistics Building
Major Professor(s): Dr. Lynne Billard