A Hierarchical Bayesian Model for Record Linkage and Population Size Estimation
Andrea Tancredi
Department of Methods and Models for Economics, Territory and Finance, Sapienza University of Rome, Rome, Italy

We propose and illustrate a hierarchical Bayesian approach for matching statistical records observed on different occasions. We consider both multivariate categorical data and multivariate normal data.

We show how our approach can be profitably adopted both in record linkage problems and in capture-recapture setups, where the size of a finite population is the real object of interest. There are at least two important differences between the proposed model-based approach and current practice in record linkage. First, the statistical model is built on the categorical variables actually observed, and the available information is not reduced to 0-1 comparisons. Second, the hierarchical structure of the model allows a two-way propagation of uncertainty between the parameter estimation step and the matching procedure, so that no plug-in estimates are used and the uncertainty is correctly accounted for both in estimating the population size and in performing the record linkage. We illustrate and motivate our proposal through real applications in official and ecological statistics. Finally, we briefly discuss the more general problems of inference with linked data and possible extensions of the model to the multiple-list framework.
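To see why reducing each field to a 0-1 agreement indicator can discard information, consider a deliberately simplified setting. This is an assumption for illustration only, not the model proposed here: one categorical field, no recording errors for true matches, and independent values for non-matches. A true match then agrees on value v with probability p_v, while a non-match agrees on v with probability p_v squared, so the likelihood ratio for agreement on v is 1/p_v. A 0-1 comparison assigns every agreement the same weight, 1/sum(p_v^2). The function names and the example field below are hypothetical.

```python
import math
from collections import Counter


def agreement_log_weights(values):
    """Value-specific log likelihood ratio for two records agreeing on a category.

    Hypothetical simplification: true matches are recorded without error, and
    non-matches carry independent values drawn from the empirical frequencies.
    P(agree on v | match) = p_v, P(agree on v | non-match) = p_v**2, ratio = 1/p_v.
    """
    counts = Counter(values)
    total = len(values)
    return {v: math.log(total / c) for v, c in counts.items()}


def binary_log_weight(values):
    """Single log weight for 'agree' after reduction to a 0-1 comparison.

    Same hypothetical simplification: P(agree | match) = 1 and
    P(agree | non-match) = sum_v p_v**2, so every agreement gets one weight.
    """
    counts = Counter(values)
    total = len(values)
    p_agree_nonmatch = sum((c / total) ** 2 for c in counts.values())
    return math.log(1.0 / p_agree_nonmatch)


if __name__ == "__main__":
    # Hypothetical categorical field observed on 10 records.
    field = ["Rome"] * 8 + ["Rieti"] * 2
    print("value-specific:", agreement_log_weights(field))
    print("0-1 comparison:", binary_log_weight(field))
```

With these frequencies, agreement on the rare value "Rieti" has log weight log 5, which is larger than the single 0-1 weight log(1/0.68), while agreement on the common value "Rome" has log weight log 1.25, which is smaller. Working with the observed categories keeps this distinction; the 0-1 reduction averages it away.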

Keywords: Capture-recapture; Markov chain Monte Carlo; Closed population; Inference with linked data

Biography: Dr Andrea Tancredi is an assistant professor at the Department of Methods and Models for Economics, Territory and Finance, Sapienza University of Rome.