Accurate and Powerful Multivariate Outlier Detection
Andrea Cerioli, Marco Riani, Francesca Torti
Economia, Sezione di Statistica e Informatica, Università di Parma, Parma, Italy

The forward search is a data-analysis method that divides the data into a good portion that agrees with the postulated model and a set of outliers, if any (Atkinson et al., 2004; Riani et al., 2009). It does so by fitting the model to subsets of the data of increasing size. This paper focuses on the multivariate setting, where the agreement between data and model is measured through Mahalanobis distances. We show that the forward search can strike a balance between the two rival goals of robust statistics: robustness against contamination and efficiency under the postulated multivariate normal model. The properties of the forward search, together with the use of accurate finite-sample distributional results, lead to a detection rule for multivariate outliers that has low swamping with well-behaved data and high power under a variety of contamination schemes. We provide evidence of the statistical properties of the forward search detection rule and compare it with the inferences obtained from the many robust methods available for this problem, including the MCD and multivariate S-estimators. We also illustrate some of the informative graphs that arise naturally as a by-product of the forward search detection rule, using dynamic linked plots from the FSDA Matlab toolbox.
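
The idea of fitting subsets of increasing size and monitoring Mahalanobis distances can be illustrated with a minimal Python sketch. It is not the FSDA implementation and it omits the finite-sample envelopes on which the detection rule relies. The function names (`forward_search`, `mahalanobis_sq`), the initial subset size `m0`, and the median/MAD starting rule are all assumptions made for illustration only.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distances of every row of X from (mean, cov)."""
    diff = X - mean
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)


def forward_search(X, m0):
    """Illustrative forward search; returns the minimum distance outside
    the fitting subset for each subset size m = m0, ..., n - 1."""
    n, _ = X.shape
    # Assumed starting rule: the m0 units closest to the coordinate-wise
    # median, with each coordinate scaled by its MAD.
    center = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - center), axis=0)
    start_dist = np.sum(((X - center) / scale) ** 2, axis=1)
    subset = np.argsort(start_dist)[:m0]

    min_dist = []
    for m in range(m0, n):
        # Fit mean and covariance on the current subset of size m.
        mean = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        d2 = mahalanobis_sq(X, mean, cov)
        # Monitor the smallest distance among units not in the subset.
        outside = np.setdiff1d(np.arange(n), subset)
        min_dist.append(np.sqrt(d2[outside].min()))
        # The next subset consists of the m + 1 units with smallest distances.
        subset = np.argsort(d2)[: m + 1]
    return np.array(min_dist)


# Synthetic example: 100 bivariate normal units plus 10 shifted outliers.
rng = np.random.default_rng(0)
clean = rng.normal(size=(100, 2))
outliers = rng.normal(loc=10.0, scale=0.5, size=(10, 2))
X = np.vstack([clean, outliers])

m0 = 20
min_dist = forward_search(X, m0)
peak_m = m0 + int(np.argmax(min_dist))  # subset size at the largest jump
```

In this synthetic example the monitored distance should stay small while the subset contains only well-behaved units and jump once the nearest unit outside the subset is an outlier. A detection rule would compare the trajectory with distributional envelopes rather than simply locating its maximum.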

References:

Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search. Springer, New York.

Riani, M., Atkinson, A. C. and Cerioli, A. (2009). Finding an Unknown Number of Multivariate Outliers. Journal of the Royal Statistical Society, Ser. B, 71, 447–466.

Keywords: Multivariate outliers; Forward search; High-breakdown estimators; Masking and swamping

Biography: Andrea Cerioli is Professor of Statistics at the University of Parma, Italy. He is the author or coauthor of more than 80 peer-reviewed scientific papers, most of them published in international journals or books. He is Co-Editor of Statistical Methods and Applications, the journal of the Italian Statistical Society, and President of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society.