Multivariate Outlier Detection for Regression – Imputation and Aggregation Weight Calibration by IRLS
Kazumi Wada, Yutaka Abe
Director of Research Office, National Statistics Center, Tokyo, Japan

Outliers occur frequently in survey data. Those that are errors are corrected, but those that are genuine values are left as they are. The latter may spoil regression imputation by ordinary least squares (OLS), and those with large aggregation weights may distort tabulated figures. This paper compares iteratively reweighted least squares (IRLS) with OLS for regression imputation with aggregation weights, using a model that explains enterprise sales by the number of employees. Aggregation weight calibration by the IRLS weight is also discussed. IRLS is easy to compute and robust to outliers in the dependent variable, so its imputed values are more stable than those of OLS in the presence of influential outliers. In addition to the imputed values, IRLS provides a data weight that measures outlyingness. This IRLS weight is useful for adjusting aggregation weights so that extreme values do not have excessive influence on statistical tables.
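The idea can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the Tukey biweight function, the tuning constant 4.685, the MAD-based residual scale, the synthetic data, and the names `irls_fit`, `employees`, `sales`, `agg_w` and `adj_w` are all assumptions made for illustration. The adjustment `1 + (agg_w - 1) * rw` is likewise only one plausible way to calibrate an aggregation weight by the IRLS weight, chosen so that an extreme unit still counts for itself but no longer represents other units.

```python
import numpy as np

def irls_fit(x, y, agg_w=None, c=4.685, max_iter=50, tol=1e-8):
    """Fit y = b0 + b1 * x by IRLS with Tukey biweight (assumed weight function).

    Returns the coefficients and the robustness weights in [0, 1];
    a weight near 0 marks an observation as outlying.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    agg_w = np.ones_like(y) if agg_w is None else np.asarray(agg_w, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    rw = np.ones_like(y)   # robustness weights; all 1 means the first pass is (weighted) OLS
    beta = np.zeros(2)
    for _ in range(max_iter):
        # Weighted least squares with aggregation weight times robustness weight.
        sw = np.sqrt(agg_w * rw)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        resid = y - X @ beta_new
        # Robust residual scale: normalised median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged or scale == 0.0:
            break
        # Tukey biweight: weight falls smoothly to 0 as the residual grows.
        u = resid / (c * scale)
        rw = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return beta, rw

# Synthetic data (hypothetical): sales explained by number of employees,
# with one influential outlier in the dependent variable.
rng = np.random.default_rng(0)
employees = np.arange(1.0, 21.0)
sales = 2.0 + 3.0 * employees + rng.normal(0.0, 0.5, size=20)
sales[-1] += 200.0
agg_w = np.full(20, 10.0)  # hypothetical aggregation weights

X = np.column_stack([np.ones(20), employees])
beta_ols = np.linalg.lstsq(X, sales, rcond=None)[0]   # pulled toward the outlier
beta_irls, rw = irls_fit(employees, sales, agg_w)     # close to the clean relationship

# Illustrative calibration (an assumption, not the paper's formula).
adj_w = 1.0 + (agg_w - 1.0) * rw
```

In this sketch the OLS slope is dragged upward by the single extreme value, while the IRLS slope stays near the slope used to generate the clean data, and the outlier's robustness weight drops to about zero so that its adjusted aggregation weight is close to 1.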

Keywords: Robust regression imputation; Aggregation weight calibration by IRLS

Biography: Kazumi Wada is a researcher in the Director of Research Office, National Statistics Center of Japan, working in the field of multivariate outlier detection for official statistics.