The classical theory of Linear Discriminant Analysis assumes the existence of a non-singular empirical covariance matrix. However, many modern applications work with databases where the number of variables (p) is larger than the number of observations (n). Although there are classification rules specifically designed to tackle these problems (e.g. Tibshirani, Hastie, Narasimhan, and Chu 2003; Fan and Fan 2008; Duarte Silva 2010), their theoretical properties are often not well known. In this presentation, asymptotic bounds on expected error rates for some linear classifiers will be reviewed and extended.
In one of the first attempts to study theoretical properties of classification rules in the large p, small n setting, Bickel and Levina (2004) proposed an asymptotic framework that allows the number of variables to grow faster than the number of observations. Assuming that the vector of mean differences can be estimated consistently as the number of variables grows without limit, these authors showed that the expected error of the naive rule that ignores all sample correlations can approach a constant close to the expected error of the optimal Bayes rule. In this presentation, a framework will be proposed that allows the identification of general conditions under which results of this type can be generalized and tighter asymptotic bounds can be derived.
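As a rough illustration of the naive rule discussed above, the following Python sketch classifies using only the diagonal of the pooled covariance matrix, ignoring all sample correlations. The sample sizes, dimension, and mean-difference vector are hypothetical choices for illustration, not quantities from the analysis in this presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical p > n setup: 30 observations per class, 100 variables,
# with the mean difference concentrated in the first 10 coordinates.
n, p = 30, 100
mu = np.zeros(p)
mu[:10] = 1.0
X1 = rng.standard_normal((n, p))          # class 1: mean 0, identity covariance
X2 = rng.standard_normal((n, p)) + mu     # class 2: mean mu

# Naive (independence) rule: use the diagonal of the pooled covariance
# matrix and ignore every sample correlation.
xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
pooled_var = (X1.var(axis=0, ddof=1) + X2.var(axis=0, ddof=1)) / 2

def naive_rule(x):
    """Assign x to the class whose mean is closer in standardized distance."""
    d1 = np.sum((x - xbar1) ** 2 / pooled_var)
    d2 = np.sum((x - xbar2) ** 2 / pooled_var)
    return 1 if d1 <= d2 else 2

# Estimate the expected error rate on fresh test data from each class.
test1 = rng.standard_normal((500, p))
test2 = rng.standard_normal((500, p)) + mu
err = (np.mean([naive_rule(x) != 1 for x in test1]) +
       np.mean([naive_rule(x) != 2 for x in test2])) / 2
```

Even though the full sample covariance matrix is singular here (p > 2n), the diagonal estimate is always invertible, which is the practical appeal of the naive rule that the Bickel and Levina analysis addresses.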
The analysis discussed in the previous paragraph focuses on the impact of estimating covariances by estimators belonging to a restricted class of well-conditioned matrices, while assuming that the estimation error in the vector of mean differences can be bounded. However, this property does not hold for the vector of sample mean differences without some form of regularization or variable selection. Asymptotic bounds for error rates of linear classification rules that combine a variable selection scheme with an independence-based covariance matrix estimator were derived in Fan and Fan (2008). Here, it will be shown how Fan and Fan's results can be generalized to classification rules based on well-conditioned, but not necessarily diagonal, covariance estimators.
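A loose Python sketch of a rule of this type, with two-sample t-statistic variable selection followed by a well-conditioned, non-diagonal covariance estimate, is given below. The shrinkage weight, sample sizes, and signal structure are illustrative assumptions; this is not the specific estimator analyzed in the presentation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: n << p, keep k variables after selection.
n, p, k = 30, 200, 20
mu = np.zeros(p)
mu[:10] = 1.0                      # true signal in the first 10 variables
X1 = rng.standard_normal((n, p))
X2 = rng.standard_normal((n, p)) + mu

# Step 1: variable selection by the largest two-sample t-statistics
# (a loose sketch in the spirit of independence-based screening).
diff = X2.mean(axis=0) - X1.mean(axis=0)
se = np.sqrt(X1.var(axis=0, ddof=1) / n + X2.var(axis=0, ddof=1) / n)
keep = np.argsort(-np.abs(diff / se))[:k]

# Step 2: a well-conditioned but non-diagonal covariance estimate on the
# selected variables: shrink the pooled sample covariance toward a scaled
# identity (the weight 0.5 is an arbitrary illustrative choice).
Z = np.vstack([X1[:, keep] - X1[:, keep].mean(axis=0),
               X2[:, keep] - X2[:, keep].mean(axis=0)])
S = Z.T @ Z / (2 * n - 2)
alpha = 0.5
S_reg = (1 - alpha) * S + alpha * (np.trace(S) / k) * np.eye(k)

# Linear classification rule on the selected coordinates.
w = np.linalg.solve(S_reg, diff[keep])
midpoint = (X1[:, keep].mean(axis=0) + X2[:, keep].mean(axis=0)) / 2

def classify(x):
    """Linear rule on the selected variables: side of the separating hyperplane."""
    return 2 if (x[keep] - midpoint) @ w > 0 else 1
```

The shrinkage step keeps all eigenvalues of the covariance estimate bounded away from zero, so the rule remains well defined even though the unregularized sample covariance of all p variables is singular.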
Bickel, P.J. and Levina, E. (2004). Some theory for Fisher's linear discriminant function, naive Bayes and some alternatives when there are more variables than observations. Bernoulli, 10, 989-1010.
Duarte Silva, A.P. (2010). Linear Discriminant Analysis with more variables than observations: a not so naive approach. In Locarek-Junge, H. and Weihs, C. (Eds.), Classification as a Tool for Research - Proceedings of the 11th IFCS Biennial Conference and 33rd Annual Conference of the Gesellschaft für Klassifikation. Springer-Verlag, 227-234.
Fan, J. and Fan, Y. (2008). High dimensional classification using features annealed independence rules. The Annals of Statistics, 36, 2605-2637.
Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2003). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statistical Science, 18, 104-117.
Keywords: Asymptotic Analysis; Discriminant Analysis; Expected Misclassification Rate; High-Dimensional Classification
Biography: Pedro Duarte Silva is Associate Professor of Statistics and Data Analysis at the Faculdade de Economia e Gestão of the Portuguese Catholic University.
Dr. Duarte Silva obtained his doctorate in 1994 at the Terry College of Business of the University of Georgia, and for the past 16 years has published extensively in international journals and conferences. His research spans the areas of multivariate data analysis, particularly classification and discriminant analysis methodologies, the development of variable and model selection algorithms for multivariate data analysis, the modeling of multidimensional complex data known as symbolic data, and the study of multiple criteria decision aid methodologies.