Some important inequality indicators are highly influenced by outliers in the upper tail of the income distribution. Highly influential observations that are correctly measured but can be considered unique in the population need to be downweighted or replaced in the estimation process.
In general, the upper tail of income data behaves differently from the main part of the distribution, so modeling the upper tail with a Pareto distribution is a natural choice. However, if the Pareto distribution is fitted using classical methods, nonrepresentative outliers have a strong influence on the resulting fit. As a remedy, various robust methods for Pareto tail modeling have been introduced in the literature. Nevertheless, none of these methods takes sample weights into account. Therefore, promising methods for fitting a Pareto distribution have been adapted so that they are robust and also incorporate sample weights.
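The sensitivity of a classical fit can be illustrated with a minimal sketch. It assumes a weighted maximum-likelihood (Hill-type) estimator of the Pareto shape parameter above a known threshold; this is a classical, non-robust estimator chosen for illustration and is not one of the adapted robust methods. The function name, threshold, shape value and contamination level are all hypothetical.

```python
import math
import random


def weighted_pareto_shape(incomes, weights, x0):
    """Weighted ML (Hill-type) estimate of the Pareto shape parameter.

    Only observations strictly above the threshold x0 enter the fit.
    This is a classical estimator, shown for illustration only.
    """
    weight_sum = 0.0
    log_sum = 0.0
    for x, w in zip(incomes, weights):
        if x > x0:
            weight_sum += w
            log_sum += w * math.log(x / x0)
    if log_sum <= 0.0:
        raise ValueError("no observations above the threshold x0")
    return weight_sum / log_sum


# Simulated Pareto tail via inverse-transform sampling
# (hypothetical values: threshold 1000, shape 2).
rng = random.Random(42)
x0, true_shape = 1000.0, 2.0
tail = [x0 * (1.0 - rng.random()) ** (-1.0 / true_shape) for _ in range(5000)]
w = [1.0] * len(tail)

shape_clean = weighted_pareto_shape(tail, w, x0)

# Add 50 extreme values to mimic nonrepresentative outliers.
outliers = [x0 * 1e6] * 50
shape_contaminated = weighted_pareto_shape(tail + outliers, w + [1.0] * 50, x0)

print(shape_clean, shape_contaminated)
```

In this sketch about one percent of contaminated observations pulls the estimated shape parameter downwards, which corresponds to a heavier fitted tail. This is the kind of distortion that robust estimators are meant to limit.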
Close-to-reality simulation studies are performed to demonstrate that robust Pareto tail modeling with these adapted methods can reduce the influence of nonrepresentative outliers on inequality indicators such as the quintile share ratio or the Gini coefficient. Moreover, we show that in many cases semi-parametric modeling outperforms trimming and winsorizing methods as well as parametric modeling.
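To make the effect on an indicator concrete, the following sketch computes a weighted quintile share ratio (income share of the top 20% divided by that of the bottom 20%) on toy data, with and without winsorizing. The weighted-quantile convention used here (smallest value whose cumulative weight share reaches p), the toy incomes and the fixed cap are assumptions for illustration, not the definitions or settings used in the simulation studies.

```python
def weighted_quantile(incomes, weights, p):
    """Smallest income whose cumulative weight share is at least p."""
    pairs = sorted(zip(incomes, weights))
    total = sum(wt for _, wt in pairs)
    cum = 0.0
    for x, wt in pairs:
        cum += wt
        if cum / total >= p:
            return x
    return pairs[-1][0]


def quintile_share_ratio(incomes, weights):
    """Weighted income of the top 20% divided by that of the bottom 20%."""
    q20 = weighted_quantile(incomes, weights, 0.2)
    q80 = weighted_quantile(incomes, weights, 0.8)
    bottom = sum(wt * x for x, wt in zip(incomes, weights) if x <= q20)
    top = sum(wt * x for x, wt in zip(incomes, weights) if x > q80)
    return top / bottom


def winsorize(incomes, cap):
    """Replace every value above the (hypothetical, fixed) cap by the cap."""
    return [min(x, cap) for x in incomes]


# Toy data: nine ordinary incomes and one extreme value.
incomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
weights = [1.0] * len(incomes)

qsr_raw = quintile_share_ratio(incomes, weights)
qsr_winsorized = quintile_share_ratio(winsorize(incomes, 10.0), weights)

print(qsr_raw, qsr_winsorized)
```

A single extreme value dominates the raw ratio, while the winsorized ratio reflects the bulk of the data. Semi-parametric tail modeling pursues the same goal by replacing or downweighting tail observations based on a fitted Pareto distribution instead of a fixed cap.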
All methods have been implemented in the R package laeken, which is freely available for the estimation of social inclusion indicators from complex (household) surveys. In addition, calibration and variance estimation methods that can deal with complex sampling designs are included. All these methods are also applicable to large surveys.
We present the ideas behind robustness in semi-parametric modeling, the simulation design used, and the practical implementation in R.
Keywords: Social inclusion indicators; Pareto distribution; Semi-parametric modeling; R
Biography: Matthias Templ is an assistant professor at the Department of Statistics and Probability Theory of the Vienna University of Technology. He is also employed as a researcher in the methods unit of Statistics Austria and founded (together with A. Kowarik and B. Meindl) data-analysis OG, a company for coaching and consulting in statistics. He has a diploma in Mathematics and a PhD in Statistics. His thesis, in the area of imputation and statistical disclosure control, received the award for the best PhD thesis in applied/computational statistics from the Austrian Statistical Society.