Bayes Linear Estimation for Finite Population with Emphasis on Categorical Data
Kelly C.M. Gonçalves, Fernando A.S. Moura, Helio S. Migon
Statistics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

Design-based sample survey theory has been very appealing to official statistics agencies all over the world. As pointed out by Skinner et al. (1989), the main reason is its distribution-free advantage. However, in some specific situations the design-based approach has proved inefficient, providing inadequate predictors. For instance, estimation in small domains and in the presence of non-response cannot be handled by the design-based approach without implicit assumptions, which can be equivalent to assuming a model. Supporters of the design-based approach argue that model-based inference depends heavily on model assumptions that might not be true. On the other hand, interval inference for target population parameters relies on the Central Limit Theorem, which cannot be applied in many practical situations where the sample sizes are not large enough or the independence assumptions about the random variables involved are not realistic.

The question that arises is: is it possible to reconcile both approaches? In the superpopulation model context, Zacks (2002) shows that some design-based estimators can be recovered by using a general regression model approach. Little, in Chapter 4 of Chambers and Skinner (2003), claims that: “careful model specification sensitive to the survey design can address the concerns with model specifications and that Bayesian statistics provides a coherent and unified treatment of descriptive and analytic survey inference”.

In the Bayesian context, another appealing proposal for reconciling the design-based and model-based approaches was presented by O'Hagan (1985) in an unpublished report. O'Hagan's approach is based on the Bayes linear estimator and is therefore distribution-free. This methodology is an alternative to randomization methods and lies midway between two extreme views: procedures based on randomization on the one hand, and procedures based on superpopulation models on the other. His model formulation assumes only second-order exchangeability, which describes prior knowledge about the structures present in the population. He dealt with several population structures, such as stratification and clustering, and showed how some common design-based estimators can be obtained as particular cases of his more general approach.
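
As a rough illustration of how a design-based estimator can emerge as a limiting case, the sketch below computes a Bayes linear estimate of a population total under second-order exchangeability. It is not taken from O'Hagan (1985). The function name and the parameters m, v and c (a common prior mean, variance and pairwise covariance for the population values) are assumptions introduced here, and the closed-form weight follows from applying the standard Bayes linear rule, prior mean plus Cov(theta, y) Var(y)^{-1} (y - E(y)), to this covariance structure.

```python
def bayes_linear_total(sample, N, m, v, c):
    """Bayes linear estimate of a finite population total.

    Assumed second-order exchangeable prior (hypothetical parameter names):
    E(y_i) = m, Var(y_i) = v, Cov(y_i, y_j) = c for i != j, with v > c >= 0.
    """
    n = len(sample)
    ybar = sum(sample) / n
    # Closed form of Cov(y_j, y_s) Var(y_s)^{-1} applied to (y_s - m):
    # each unsampled unit is predicted by shrinking the sample mean
    # towards the prior mean m with weight w.
    w = n * c / (v - c + n * c)
    pred_unsampled = m + w * (ybar - m)
    # Observed part of the total plus predictions for the N - n unsampled units.
    return sum(sample) + (N - n) * pred_unsampled


sample = [2.0, 4.0, 6.0, 8.0]
# Vague prior knowledge about the common mean (c very large, v - c fixed).
vague = bayes_linear_total(sample, N=10, m=0.0, v=1e9 + 1.0, c=1e9)
# No prior correlation between units (c = 0): the sample says nothing
# about unsampled units, which are predicted by the prior mean.
uncorrelated = bayes_linear_total(sample, N=10, m=3.0, v=1.0, c=0.0)
print(vague, uncorrelated)
```

Under these assumptions, letting c grow while v - c stays fixed drives the weight to one, so the estimate approaches N times the sample mean, the usual expansion estimator. Only first and second moments are specified, with no distributional form.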

In this work we set up a general regression framework of which all the cases presented in O'Hagan (1985) are particular cases. We also present a Bayes linear approach for estimating proportions for categorical data. We illustrate our approach with real and simulated data.


Chambers, R.L. and Skinner, C.J. (2003). Analysis of Survey Data. Wiley.

O'Hagan, A. (1985). Bayes linear estimators for finite populations. Technical Report 58, Department of Statistics, University of Warwick.

Skinner, C.J., Holt, D. and Smith, T.M.F. (1989). Analysis of Complex Surveys. Wiley Series in Probability.

Zacks, S. (2002). In the Footsteps of Basu: The Predictive Modelling Approach to Sampling from Finite Population. Sankhyā: The Indian Journal of Statistics, Series A, 64, 532-544.

Keywords: exchangeability; linear model; Bayesian linear prediction

Biography: PhD in Statistics from Southampton University, UK (1994). I am an associate professor at the Federal University of Rio de Janeiro. My current research interests are small area estimation, sample surveys using Bayesian statistics, and hierarchical models (theory and application).