Massively large data sets are routine and ubiquitous given modern computing capabilities. What is not so routine is how to analyse these data. One approach is to aggregate the data sets according to some scientific criteria. The resultant data are perforce symbolic data, such as lists, intervals, and histograms. Applications abound, especially in the medical and social sciences. Other data sets (small or large in size) are naturally symbolic-valued, such as species data, data with measurement uncertainties, confidential data, and the like.
Unlike classical data, which are points in p-dimensional space, symbolic data are hypercubes or Cartesian products of distributions in p-dimensional space. We describe such data and how they arise. Through illustrations, we look briefly at some of the differences between classical and symbolic data and their respective methodologies.
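As a rough illustration of how aggregation turns classical points into symbolic data, the following minimal sketch collapses individual records into one interval-valued observation per group, so that each group becomes a rectangle (a hypercube with p = 2) rather than a point. The records, the group labels, and the function name `aggregate_to_intervals` are hypothetical and chosen only for this example; taking the minimum and maximum per variable is one simple aggregation rule among several.

```python
from collections import defaultdict

# Hypothetical classical records: each is a point (age, pulse) in 2-dimensional
# space, labelled by the group used as the aggregation criterion.
records = [
    ("clinic_A", 34, 68),
    ("clinic_A", 51, 75),
    ("clinic_A", 42, 80),
    ("clinic_B", 29, 62),
    ("clinic_B", 63, 90),
]


def aggregate_to_intervals(rows):
    """Collapse classical points into one interval-valued observation per group."""
    grouped = defaultdict(list)
    for group, *values in rows:
        grouped[group].append(values)

    symbolic = {}
    for group, points in grouped.items():
        # zip(*points) transposes the points into one column per variable;
        # each column is summarised by the interval (min, max).
        symbolic[group] = [(min(col), max(col)) for col in zip(*points)]
    return symbolic


intervals = aggregate_to_intervals(records)
for group, box in intervals.items():
    print(group, box)
```

Each resulting observation is a Cartesian product of intervals, one per variable. Summarising each variable by a histogram instead of an interval would be an alternative aggregation that retains more of the within-group distribution.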
More information about Lynne Billard may be found at http://www.stat.uga.edu/people/faculty/lynne-billard