A strategy for analyzing the principal components (PCs) of high-dimensional data with a mixed variable structure is proposed. Principal component analysis (PCA) is often applied when the number of variables under consideration is too large to handle directly. The PCs obtained by PCA are used to reduce the dimension of a data set of interrelated original variables: the PCs are uncorrelated linear combinations of these variables, and the leading PCs capture most of the variance. However, ordinary PCA, which treats all variables simultaneously, requires substantial computing resources for high-dimensional data with a large number of variables. In addition, the PCs obtained by ordinary PCA are usually difficult to interpret when the data have a mixed variable structure. We therefore investigate a new strategy for analyzing the PCs in such data. The proposed approach divides all variables into several blocks, performs PCA within each block, and then performs PCA on the combined set of block-wise PCs. We verified the efficiency of this approach on concrete examples, including an educational data set of scholastic ability in Japan and a molecular genetics data set. The educational data set consists of two different types of observations, which together form high-dimensional data with a mixed variable structure: high school record scores and scores on the common college entrance examination. Both the ordinary approach and the new approach to PCA are applied to the educational data set, and the results are compared. The new approach provides a flexible tool for tackling large-scale models in high-dimensional data with a mixed variable structure.
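The two-stage procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `pca` and `block_pca`, the number of PCs kept per block (`k_block`), and the choice to center the data without standardizing or weighting the blocks are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def pca(X, k):
    """Ordinary PCA via SVD of the column-centered data matrix.

    Returns the first k PC scores, their variances, and the loadings.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, Vt.shape[0])
    scores = Xc @ Vt[:k].T                       # projections onto leading PCs
    var = s[:k] ** 2 / (X.shape[0] - 1)          # variance of each PC
    return scores, var, Vt[:k]


def block_pca(X, blocks, k_block, k_final):
    """Hypothetical two-stage (block-wise) PCA.

    Stage 1: run PCA separately on each block of columns and keep k_block PCs.
    Stage 2: run PCA on the concatenated block-wise PC scores.
    """
    block_scores = [pca(X[:, idx], k_block)[0] for idx in blocks]
    Z = np.hstack(block_scores)                  # combined set of block-wise PCs
    return pca(Z, k_final)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))               # synthetic data, 10 variables
    # Hypothetical split, e.g. school-record scores vs. entrance-exam scores
    blocks = [np.arange(0, 6), np.arange(6, 10)]
    scores, var, _ = block_pca(X, blocks, k_block=3, k_final=2)
    print(scores.shape)  # (200, 2)
```

With a single block containing all variables and all PCs retained, the second stage reproduces the PC variances of ordinary PCA. With several blocks and only a few PCs retained per block, each SVD operates on a smaller matrix, which is where savings in computing resources would come from.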
Keywords: principal component analysis; high-dimensional data; mixed variable structure
Biography: I received an M.D. from Chuo University, Japan. My research interest is principal component analysis as a method of multivariate statistical analysis. Recently, I have been investigating a strategy for analyzing the principal components of high-dimensional data with a mixed variable structure.