The Data Accumulation Method for Symbolic Principal Component Analysis
Manabu Ichino (1), Paula Brito (2)
(1) College of Science and Technology, Tokyo Denki University, Hatoyama, Japan; (2) Faculty of Economics & LIAAD-INESC Porto LA, University of Porto, Porto, Portugal

Quantile representation provides a common framework for representing symbolic data described by variables of different types. The principle is to express the observed variable values by predefined quantiles of the underlying distribution. This common representation allows a unified analysis of the data set that takes all variables into account simultaneously [1-3]. The quantile method for symbolic principal component analysis (PCA) [1, 2] transforms a given (N objects)×(d features) symbolic data table into a {N×(m+1) sub-objects}×(d features) numerical data table, where m is a preselected integer that determines the number of quantiles. Standard PCA can then be applied to the transformed data table. The quantile method for PCA is based on the fact that a monotone property of symbolic objects is characterized by the nesting structure of the Cartesian join regions [1, 2]. In this paper, we present the data accumulation method for symbolic PCA. Given n symbolic data tables of the form (N objects)×(d features), the method first computes the (m+1) quantiles, for a preselected m, for each table, and then accumulates the resulting values for the n tables into a single {n×N×(m+1) sub-objects}×(d features) numerical data table. PCA is then applied to this accumulated table. Since we often encounter periodically summarized data tables, the data accumulation method for symbolic PCA is a useful tool for understanding n data tables as a whole. We present several examples to show the usefulness of the method.
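The transformation and accumulation steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that every feature is interval-valued and that values are uniformly distributed within each interval, so that the quantiles are obtained by linear interpolation between the bounds. The names `quantile_rows` and `accumulate` and the toy data are hypothetical, and the PCA step itself is left to any standard routine.

```python
from typing import List, Tuple

Interval = Tuple[float, float]          # (lower, upper) bounds of one interval value
SymbolicTable = List[List[Interval]]    # N objects x d interval-valued features


def quantile_rows(obj: List[Interval], m: int) -> List[List[float]]:
    """Return the (m + 1) sub-objects (quantile vectors) of one object.

    Assumption: values are uniform inside each interval, so the k/m quantile
    is a linear interpolation between the lower and upper bounds.
    """
    return [[lo + (hi - lo) * k / m for (lo, hi) in obj] for k in range(m + 1)]


def accumulate(tables: List[SymbolicTable], m: int) -> List[List[float]]:
    """Stack the quantile rows of all n tables into one numerical table."""
    rows: List[List[float]] = []
    for table in tables:            # n tables
        for obj in table:           # N objects per table
            rows.extend(quantile_rows(obj, m))
    return rows


# Toy data (hypothetical): n = 2 tables, N = 2 objects, d = 2 features.
tables = [
    [[(0.0, 4.0), (10.0, 20.0)], [(1.0, 3.0), (12.0, 16.0)]],
    [[(0.5, 4.5), (11.0, 21.0)], [(1.5, 3.5), (13.0, 17.0)]],
]
rows = accumulate(tables, m=4)      # n * N * (m + 1) = 20 rows, d = 2 columns
```

The resulting `rows` table is purely numerical, so it can be passed to a standard PCA routine. Within each object the sub-objects are ordered from the lowest to the highest quantile, which keeps the monotone ordering of the quantile vectors visible in the accumulated table.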


[1] Ichino, M. (2008): Symbolic PCA for Histogram-Valued Data. In: Proc. IASC 2008, December 5-8, Yokohama, Japan, 123.

[2] Ichino, M. (2011): The quantile method for symbolic principal component analysis. Statistical Analysis and Data Mining (in press).

[3] Brito, P. and Ichino, M. (2010): Symbolic clustering based on quantile representation. In: Proc. COMPSTAT 2010, August 22-27, Paris, France.

Keywords: PCA; Nesting structure; Quantile method; Data accumulation

Biography: Paula Brito is an Associate Professor at the Faculty of Economics and a member of the Artificial Intelligence and Decision Support Research Group of the University of Porto, Portugal. She holds a doctorate in Applied Mathematics from the University of Paris Dauphine. Her current research interests include multivariate data analysis methods, with particular emphasis on clustering. She has focused on the analysis of multidimensional complex data, known as symbolic data, for which she develops statistical approaches and multivariate analysis methodologies. In this context, she has been involved in two European research projects. She has been an invited speaker at several international conferences and was Chair of COMPSTAT 2008. She is presently Vice-President of the Portuguese Statistical Society and has been Vice-President of the IASC Section of the ISI for the last term.