Statistical Graphics for Aggregated Symbolic Data
Yoshikazu Yamamoto1, Junji Nakano2
1Department of Computer Science and Electronic Engineering, Tokushima Bunri University, Sanuki, Kagawa, Japan; 2Department of Data Science, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan

Statistical graphics with interactive operations are useful in the first step of statistical data analysis, especially when the data are new to the analyst and large in volume. Data are now often collected by automatic data acquisition systems over computer networks, and they become so large that even high-speed computers require considerable time to draw each individual observation separately. We therefore sometimes “aggregate” the data by grouping them appropriately, reducing their volume without losing too much of the information in the original data. Symbolic data analysis handles each group of data as an “aggregate”, a second-level datum described by several variables that take somewhat complex values such as intervals and histograms. Several static statistical graphics have already been proposed for displaying symbolic data.
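The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `aggregate`, the record layout, the field names `region` and `temp`, and the bin edges are all hypothetical choices made for illustration. Each group of individual records is reduced to an interval and a histogram.

```python
from collections import defaultdict


def aggregate(records, key, variable, bin_edges):
    """Group records by `key` and describe `variable` in each group
    by an interval (min, max) and a histogram over `bin_edges`.

    Bins are half-open [lo, hi), except the last one, which also
    includes its upper edge. Values outside the edges are not counted.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[variable])

    symbolic = {}
    n_bins = len(bin_edges) - 1
    for name, values in groups.items():
        counts = [0] * n_bins
        for v in values:
            for i in range(n_bins):
                is_last = i == n_bins - 1
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                if in_bin or (is_last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        symbolic[name] = {
            "n": len(values),
            "interval": (min(values), max(values)),
            "histogram": counts,
        }
    return symbolic


# Hypothetical individual data: one record per observation.
records = [
    {"region": "A", "temp": 12.0},
    {"region": "A", "temp": 18.5},
    {"region": "A", "temp": 15.0},
    {"region": "B", "temp": 25.0},
    {"region": "B", "temp": 30.0},
]
symbolic = aggregate(records, "region", "temp", [10, 20, 30])
```

Here five individual records become two symbolic objects, one per region, each carrying an interval and a two-bin histogram instead of the raw values.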

We propose interactive operations and dynamic graphics for aggregated symbolic data. Our graphics are based on the parallel coordinate plot and make effective use of color and interactive operations to reveal the features of symbolic data.

The usual parallel coordinate plot displays a real value by its location on an axis. Symbolic data, in contrast, are expressed by aggregated values of a group of individual data, such as intervals or histograms. It is intuitive to use the mean or median of the aggregated symbolic data to specify the location on the axis. However, symbolic data carry more information than location alone, in the form of intervals, boxplots, or histograms. We draw this information using color. For example, the height of a histogram is expressed by different colors along the line segment that represents the symbolic data in our parallel-coordinate-plot-like graphics.
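One way to picture this encoding is the sketch below, which turns a single histogram-valued observation into an axis location plus a list of colored sub-segments. The abstract does not specify the color scale or the exact location rule, so the white-to-blue ramp, the use of bin midpoints for the mean, and the function names `histogram_mean` and `histogram_segments` are assumptions made for illustration only.

```python
def histogram_mean(bin_edges, counts):
    """Approximate the mean of a histogram from its bin midpoints.

    This gives the location of the symbolic value on the axis.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram has no mean")
    mids = [(lo + hi) / 2 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return sum(m * c for m, c in zip(mids, counts)) / total


def histogram_segments(bin_edges, counts):
    """Return (low, high, color) triples, one per bin.

    Taller bins get a more saturated blue; empty bins are white.
    The color ramp is an assumed, illustrative choice.
    """
    peak = max(counts)
    segments = []
    for lo, hi, c in zip(bin_edges[:-1], bin_edges[1:], counts):
        level = c / peak if peak else 0.0
        shade = round(255 * (1 - level))  # 255 = white, 0 = full blue
        segments.append((lo, hi, f"#{shade:02x}{shade:02x}ff"))
    return segments


# Hypothetical histogram-valued observation on one axis.
edges = [0, 10, 20, 30]
counts = [1, 4, 0]
location = histogram_mean(edges, counts)
segments = histogram_segments(edges, counts)
```

A plotting layer would then draw each triple as a short colored stretch of the axis and pass the polyline for this observation through `location`.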

In our graphics, the grouping of the data can be changed interactively and flexibly to produce appropriate symbolic data concepts from a huge amount of individual data.
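As a hedged illustration of why regrouping can be cheap, the sketch below merges two already aggregated groups without revisiting the individual data. It assumes each group is stored as a dictionary with a count `n`, an `interval`, and a `histogram` over shared bin edges. This representation and the function name `merge_groups` are hypothetical and are not taken from the authors' system.

```python
def merge_groups(a, b):
    """Merge two symbolic groups that share the same histogram bins.

    Counts add up, the interval becomes the hull of both intervals,
    and histogram bins are summed element-wise.
    """
    if len(a["histogram"]) != len(b["histogram"]):
        raise ValueError("groups must use the same histogram bins")
    return {
        "n": a["n"] + b["n"],
        "interval": (
            min(a["interval"][0], b["interval"][0]),
            max(a["interval"][1], b["interval"][1]),
        ),
        "histogram": [x + y for x, y in zip(a["histogram"], b["histogram"])],
    }


# Two hypothetical groups, as they might look after a first aggregation.
group_a = {"n": 3, "interval": (12.0, 18.5), "histogram": [3, 0]}
group_b = {"n": 2, "interval": (25.0, 30.0), "histogram": [0, 2]}
merged = merge_groups(group_a, group_b)
```

Splitting a group, in contrast, generally requires going back to the individual data, since an interval or histogram does not record which observation came from which subgroup.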

Keywords: Interactive data visualization; Parallel coordinate plot; Symbolic data

Biography: The presenter is interested in data visualization and user interfaces of statistical analysis software. He teaches software engineering full-time at Tokushima Bunri University, Japan.