A System Developed for Solving the Matching Problem in the Brazilian Census Evaluation Research
Vinicius L. Xavier, Djalma G. Pessoa, Fabio Figueiredo, Andrea Diniz
Department of Systems Engineering and Computer Science, Graduate School of Engineering (COPPE), Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

To evaluate the Brazilian Census, the post-enumeration method was employed. This method evaluates the coverage error of the Census. To do so, it is necessary to identify the records in the Census that correspond to the same residence in the Post Enumeration Census. A system is therefore needed to perform this work.

The Brazilian Institute of Geography and Statistics developed a system that performs the matching process.

This paper presents some aspects of the matching problem and the system.

The system was developed in the R Statistical Computing Environment. We comment on several aspects of the implementation: the criterion for selecting variables, the string similarity function and its parameters for each variable, the numeric similarity function for age, the use of the expectation-maximization (EM) algorithm, the solution of the assignment problem, and pruning to avoid false-positive pairs.
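As a rough illustration of how these pieces can fit together, the following is a minimal sketch in Python, not the R system itself. Everything in it is an assumption made for illustration: the record fields (`name`, `age`), the use of `difflib` as the string similarity, the linear age similarity with a 5-year tolerance, the fixed weights (which stand in for values an EM step would estimate), the brute-force assignment, and the 0.8 pruning threshold.

```python
from difflib import SequenceMatcher
from itertools import permutations


def string_similarity(a, b):
    """Similarity in [0, 1] between two strings (difflib ratio, an assumed stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def age_similarity(age_a, age_b, tolerance=5):
    """Linear decay: 1.0 for equal ages, 0.0 once the gap reaches `tolerance` years."""
    return max(0.0, 1.0 - abs(age_a - age_b) / tolerance)


def pair_score(rec_a, rec_b, w_name=0.7, w_age=0.3):
    """Weighted score of a candidate pair; the weights are hypothetical, not EM estimates."""
    return (w_name * string_similarity(rec_a["name"], rec_b["name"])
            + w_age * age_similarity(rec_a["age"], rec_b["age"]))


def best_assignment(census, pes):
    """One-to-one assignment maximizing the total score.

    Brute force over permutations, so only suitable for tiny examples;
    assumes len(census) <= len(pes).
    """
    scores = [[pair_score(c, p) for p in pes] for c in census]
    best = max(
        permutations(range(len(pes)), len(census)),
        key=lambda perm: sum(scores[i][j] for i, j in enumerate(perm)),
    )
    return [(i, j, scores[i][j]) for i, j in enumerate(best)]


def prune(pairs, threshold=0.8):
    """Drop assigned pairs whose score is too low to be a credible match."""
    return [(i, j, s) for i, j, s in pairs if s >= threshold]


# Made-up records for one enumeration area (illustrative only).
census = [
    {"name": "Maria Silva", "age": 34},
    {"name": "Joao Souza", "age": 61},
    {"name": "Ana Costa", "age": 12},
]
pes = [
    {"name": "Joao Sousa", "age": 60},
    {"name": "Maria da Silva", "age": 34},
    {"name": "Pedro Lima", "age": 45},
]

matches = prune(best_assignment(census, pes))
for i, j, score in matches:
    print(census[i]["name"], "<->", pes[j]["name"], round(score, 3))
```

In this toy example the assignment step pairs every Census record with some Post Enumeration record, and the pruning step then discards the forced pairing of the third record, which is the kind of false-positive pair pruning is meant to avoid. For realistic sizes the brute-force search would need to be replaced by a proper assignment solver.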

Keywords: Post Enumeration Census; matching; expectation-maximization (EM); assignment

Biography: Vinicius Layter Xavier holds a degree in statistics and is currently a master's student in the Department of Systems Engineering and Computer Science of the Federal University of Rio de Janeiro (UFRJ).