Modern computing capacity generates massive datasets, yet the same computers can struggle to analyze them precisely because of their size. One way to handle this is to aggregate the data according to some meaningful scientific question(s). The resulting datasets are perforce symbolic-valued (e.g., intervals, histograms, or multi-modal values), which necessitates new methodology for their analysis. We focus on histogram-valued observations. From this perspective, we first consider distance and dis/similarity measures. We then consider monothetic and polythetic divisive clustering algorithms for histograms, along with some quality measures. The methods are illustrated and compared through an application.
Keywords: Histogram data; Distance measures; Monothetic clustering; Polythetic clustering
Biography: Dr Billard is a distinguished “University Professor” at the University of Georgia.