Principal Component Analysis Applied to Polytomous Quadratic Logistic Regression
Inácio Andruski-Guimarães
DAMAT - Departamento Acadêmico de Matemática, UTFPR - Universidade Tecnológica Federal do Paraná, Curitiba, Paraná, Brazil

It is well known that the logistic regression model is a powerful method, widely applied to model the relationship between a categorical dependent variable and a set of explanatory variables, or covariates, which may be either continuous or discrete. Many papers on logistic regression consider only the model with linear discriminant functions, but there are situations where quadratic discriminant functions are useful and perform better. However, the quadratic logistic regression model requires the estimation of a large number of unknown parameters, since the number of squared and cross-product terms grows with the square of the number of variables, and this leads to computational difficulties when there are many independent variables. Furthermore, a large number of parameters should be avoided because of the risk of over-fitting. For problems with continuous independent variables, this paper proposes using a set of principal components of the explanatory variables in order to reduce the dimension of the problem and the computational cost of parameter estimation in polytomous quadratic logistic regression, without loss of accuracy. Examples on datasets taken from the literature show that the quadratic logistic regression model with principal components is feasible and generally achieves higher correct classification rates than the classical logistic regression model with linear discriminant functions.
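A minimal sketch of the idea, assuming the full quadratic form (intercept, linear, squared and cross-product terms) in each logit, a polytomous model with one reference category, and principal components computed from standardized covariates. Under these assumptions, with K classes and p covariates each of the K - 1 logits carries 1 + p + p(p+1)/2 parameters, so ten covariates and three classes give 132 parameters, while two principal components give only 12. The function names, the plain gradient-ascent fitting routine (used here in place of Newton-Raphson or any other estimator), the choice q = 2 and the synthetic data are all hypothetical illustrations, not the estimation procedure or the datasets of this paper.

```python
import numpy as np

def n_quadratic_params(p, k):
    """Free parameters of a k-class quadratic logistic model on p covariates:
    each of the k - 1 non-reference logits has 1 intercept, p linear terms
    and p(p + 1)/2 squared and cross-product terms."""
    return (k - 1) * (1 + p + p * (p + 1) // 2)

def pca_scores(X, q):
    """Scores of standardized X on its first q principal components, rescaled
    to unit variance (rescaling only reparameterizes the fitted model)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    S = Z @ Vt[:q].T
    return S / S.std(axis=0)

def quadratic_features(S):
    """Design matrix: intercept, linear, squared and cross-product terms."""
    n, q = S.shape
    cols = [np.ones(n)] + [S[:, j] for j in range(q)]
    cols += [S[:, i] * S[:, j] for i in range(q) for j in range(i, q)]
    return np.column_stack(cols)

def fit_multinomial(F, y, k, lr=0.1, iters=3000):
    """Hypothetical fitting routine: multinomial logistic regression by plain
    gradient ascent on the mean log-likelihood; class 0 is the reference
    category (its logit is fixed at 0)."""
    n, m = F.shape
    B = np.zeros((m, k - 1))
    Y = np.eye(k)[y][:, 1:]                    # indicators for classes 1..k-1
    for _ in range(iters):
        eta = np.column_stack([np.zeros(n), F @ B])
        eta -= eta.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(eta)
        P /= P.sum(axis=1, keepdims=True)
        B += lr * F.T @ (Y - P[:, 1:]) / n
    return B

def predict(F, B):
    """Assign each row to the class with the largest logit."""
    eta = np.column_stack([np.zeros(F.shape[0]), F @ B])
    return eta.argmax(axis=1)

# Synthetic example (hypothetical data): 10 correlated covariates driven by
# 2 latent factors; classes are defined by distance from the origin, so the
# class boundaries are quadratic rather than linear.
rng = np.random.default_rng(0)
n, p, k, q = 300, 10, 3, 2
t = rng.normal(size=(n, 2))
X = t @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
r2 = (t ** 2).sum(axis=1)
y = np.digitize(r2, [0.64, 2.25])              # labels 0, 1, 2

F = quadratic_features(pca_scores(X, q))
B = fit_multinomial(F, y, k)
acc = (predict(F, B) == y).mean()
print(n_quadratic_params(p, k), n_quadratic_params(q, k), round(acc, 3))
```

Because the quadratic terms are built from the q component scores rather than from the p original covariates, the number of columns in the design matrix grows with q² instead of p². How many components to retain (for example, by explained variance) is a separate choice that this sketch does not address.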

Keywords: Polytomous Logistic Regression; Quadratic Logistic Regression; Principal Component Analysis; Polytomous Response

Biography: Inácio Andruski Guimarães has a degree in Mathematics from the Pontifícia Universidade Católica do Paraná, in Curitiba, Brazil. He obtained his Ph.D. in Mathematical Programming from the Universidade Federal do Paraná, in Curitiba, Brazil. He currently works at the Universidade Tecnológica Federal do Paraná, in Curitiba, teaching Statistics and Calculus. His research interests are Statistical Pattern Recognition, Multivariate Analysis and Statistical Process Control.