Shrinkage Covariance Estimation Incorporating Prior Biological Knowledge (SHIP) with Applications to High-Dimensional Data
Vincent Guillemot, Monika Jelizarow, Arthur Tenenhaus, Anne-Laure Boulesteix
Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Germany; Department SSE, Ecole Supélec, Gif-sur-Yvette, France

In the analysis of “-omic” data, information on the structure of the covariates is broadly available, either from public databases describing gene regulation processes and functional groups, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), or from statistical analyses, for example in the form of partial correlation estimators. The analysis of transcriptomic data might benefit from the incorporation of such a priori information.

We focus on the integration of structured information into statistical analyses in which at least one major step involves the estimation of a (high-dimensional) covariance matrix. More precisely, we revisit the recently proposed “SHrinkage Incorporating Prior” (SHIP) covariance estimation method, which takes into account the group structure of the covariates. In this paper, we propose integrating the SHIP covariance estimator into various multivariate methods such as linear discriminant analysis (LDA), global analysis of covariance (GlobalAncova), and regularized generalized canonical correlation analysis (RGCCA).
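To make the idea of shrinking toward a group-structured target concrete, the following is a minimal sketch, not the authors' implementation. It assumes the generic shrinkage form lam * T + (1 - lam) * S, where S is the sample covariance matrix and T is a target that is nonzero off the diagonal only for pairs of covariates in the same group. The function names, the particular target (mean within-group correlation rescaled by the sample standard deviations), and the hand-picked intensity lam are all illustrative assumptions; how the target and the intensity are chosen in the published method is not reproduced here.

```python
import numpy as np

def group_target(S, groups):
    """Hypothetical group-structured target (an assumption for illustration).

    Diagonal: sample variances. Same-group pairs: the mean sample correlation
    over all same-group pairs, rescaled by the standard deviations.
    Pairs from different groups: zero.
    """
    groups = np.asarray(groups)
    sd = np.sqrt(np.diag(S))              # assumes no constant covariates
    scale = np.outer(sd, sd)
    R = S / scale                         # sample correlation matrix
    same = groups[:, None] == groups[None, :]
    np.fill_diagonal(same, False)         # exclude i == j from the average
    r_bar = R[same].mean() if same.any() else 0.0
    T = np.where(same, r_bar * scale, 0.0)
    T[np.diag_indices_from(T)] = np.diag(S)
    return T

def shrink_covariance(X, groups, lam):
    """Convex combination of the target and the sample covariance.

    X is an (n samples) x (p covariates) array, groups has length p,
    and lam in [0, 1] is a shrinkage intensity fixed by hand here.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    S = np.cov(X, rowvar=False)
    return lam * group_target(S, groups) + (1.0 - lam) * S

# Toy usage: 10 samples, 50 covariates in 5 groups of 10 (simulated data).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
groups = np.repeat(np.arange(5), 10)
sigma_hat = shrink_covariance(X, groups, lam=0.5)
```

In this sketch, lam = 0 returns the sample covariance matrix and lam = 1 returns the target, so the intensity controls how strongly the prior group structure is imposed. The resulting matrix could then be passed to any method that needs a covariance estimate, such as LDA.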

We illustrate the resulting methods in simulations and show that the benefit of integrating prior information through the SHIP estimator depends critically on the importance of the covariance matrix in the method considered. Code for reproducing the analyses will be available from a companion website.

Keywords: Shrinkage covariance estimation; Linear discriminant analysis; Global tests via GlobalAncova; Regularized canonical correlation analysis

Biography: Anne-Laure Boulesteix is an assistant professor of Computational Molecular Medicine at the University of Munich. She obtained her PhD in statistics from the University of Munich in 2005 and has worked in medical research institutions since then. Her research focuses mainly on the statistical analysis of high-dimensional biological data, with a special emphasis on multivariate methods, prediction models, and validation issues.