Principal Component Analysis of Model-Valued Data with Constant Numerical Characteristics
Huiwen Wang1,2, Meiling Chen1,2, Nan Li1,2, Lanhui Wang3
1School of Economics and Management, Beihang University, Beijing, China; 2Research Center of Complex Data Analysis, Beihang University, Beijing, China; 3School of Economics and Management, Beijing Forestry University, Beijing, China

Model-valued data are one of the most important types of symbolic data: each cell of a model-valued data matrix contains a histogram or a distribution function. This paper proposes a new method for Principal Component Analysis (PCA) of model-valued data. The essence of PCA is to reduce the dimensionality of large data sets through the covariance matrix of the data, so the definitions of the numerical characteristics from which that matrix is built, such as the mean, variance and covariance, are central to the method. In existing methods for PCA of model-valued data, the mean is itself defined as a distribution, representing the average of all model-valued observations. However, centering the data with respect to such a mean actually yields residual distributions, and PCA performed on the matrix of residual distributions may therefore fail to capture the essential variation of the original data. In this paper, we define the numerical characteristics of model-valued data as real constants and obtain constant-valued mean, variance, covariance and correlation coefficients. Centering with respect to these constant numerical characteristics shifts each model-valued variable as a whole, preserving the original curves while moving their center of gravity to the origin. This definition is more consistent with the nature of classical centering than existing definitions. Based on the resulting covariance matrix, we then propose PCA of model-valued data with constant numerical characteristics. A simulation study demonstrates the effectiveness of the proposed method.
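The abstract does not give the formulas, so the following Python sketch is only one possible reading under explicit assumptions, not the authors' definitions. It assumes that each cell's distribution is represented by its quantile function on a grid of m levels, that the constant mean of a variable is the average over all observations and levels (the mean of the equal-weight mixture), and that the covariance pairs equal quantile levels across variables. The function names (`constant_characteristics`, `constant_pca`, `residual_distributions`, `is_quantile_function`) are hypothetical, and the distribution-valued mean used for contrast is assumed to be the average quantile function. The sketch shows that subtracting a real constant shifts each distribution without changing its shape, whereas subtracting a distribution-valued mean can produce residuals that are no longer distributions.

```python
import numpy as np


def quantile_grid(m):
    """Midpoint quantile levels t_k = (k - 0.5) / m, k = 1..m (assumed discretization)."""
    return (np.arange(1, m + 1) - 0.5) / m


def is_quantile_function(q):
    """A valid quantile function must be non-decreasing along the last axis."""
    return bool(np.all(np.diff(q, axis=-1) >= 0))


def constant_characteristics(Q):
    """Constant (real-valued) mean vector, covariance and correlation matrices.

    Q has shape (n, p, m): Q[i, j, k] is the quantile function of observation i
    on variable j at level t_k. Pairing equal quantile levels across variables
    in the covariance is an assumption of this sketch.
    """
    n, p, m = Q.shape
    mu = Q.mean(axis=(0, 2))                       # one real constant per variable
    D = Q - mu[:, None]                            # shift every distribution as a whole
    C = np.einsum("ijk,ilk->jl", D, D) / (n * m)   # C[j, l] = mean over i, k of D[i,j,k] * D[i,l,k]
    s = np.sqrt(np.diag(C))
    R = C / np.outer(s, s)                         # constant correlation coefficients
    return mu, C, R


def constant_pca(Q, use_correlation=False):
    """PCA on the constant covariance (or correlation) matrix; scores are curves."""
    mu, C, R = constant_characteristics(Q)
    D = Q - mu[:, None]
    M = C
    if use_correlation:
        D = D / np.sqrt(np.diag(C))[:, None]       # standardize each variable by a constant
        M = R
    eigvals, eigvecs = np.linalg.eigh(M)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = np.einsum("ijk,jh->ihk", D, eigvecs)  # scores[i, h, :] = component h of observation i
    return eigvals, eigvecs, scores


def residual_distributions(Q):
    """Centering by a distribution-valued mean (assumed: the average quantile function)."""
    return Q - Q.mean(axis=0, keepdims=True)


# Simulated data: each cell holds the sorted sample (empirical quantile function) of a normal law.
rng = np.random.default_rng(0)
n, p, m = 50, 3, 100
common = rng.normal(size=(n, 1)) * np.array([1.0, 0.8, 0.0])  # variables 1 and 2 share a factor
loc = common + rng.normal(scale=0.3, size=(n, p))
scale = rng.uniform(0.5, 2.0, size=(n, p))
Q = np.sort(loc[..., None] + scale[..., None] * rng.normal(size=(n, p, m)), axis=-1)

eigvals, eigvecs, scores = constant_pca(Q)
print("share of variance per component:", eigvals / eigvals.sum())

# Uniform laws on [0, 1] and [0, 3] have quantile functions t and 3t. The residual of the
# first is t - 2t = -t, which is decreasing; subtracting the constant mean keeps t - 1 valid.
t = quantile_grid(5)
Q2 = np.stack([t, 3 * t])[:, None, :]
print(is_quantile_function(residual_distributions(Q2)[0]))                     # False
print(is_quantile_function(Q2[0] - constant_characteristics(Q2)[0][:, None]))  # True
```

Because centering uses real constants, each centered cell is still a valid quantile function; the component scores, however, are curves that need not be monotone when the loadings have mixed signs. Setting `use_correlation=True` runs the same analysis on the constant correlation matrix instead.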

Keywords: Model-valued Data; Principal Component Analysis; Constant Numerical Characteristics

Biography: Professor Huiwen Wang is from Beihang University, China. She is currently Dean of the School of Economics and Management, a doctoral advisor, and president of the Research Center of Complex Data Analysis at Beihang University. Her research focuses on complex data analysis, symbolic data analysis and partial least squares. To date, she has published 3 books and more than 60 papers.