To evaluate the Brazilian Census, the post-enumeration method was employed. This method assesses the coverage error of the Census. To do so, it is necessary to identify the records in the Census that correspond to the same residence in the Post Enumeration Census. A system is therefore needed to carry out this matching.
The Brazilian Institute of Geography and Statistics developed a system that performs the matching process.
This paper presents some aspects of the matching problem and the system.
The system was developed in the R statistical computing environment. Some aspects of the implementation are discussed, such as the criterion for selecting variables, the string similarity function and its parameters for each variable, the numeric similarity function for age, the use of the expectation-maximization (EM) algorithm, the solution of the assignment problem, and pruning to avoid false-positive pairs.
Keywords: Post Enumeration Census; matching; expectation-maximization (EM); assignment
Biography: Vinicius Layter Xavier graduated in statistics and is now a master's degree student in the Systems Engineering and Computer Science Department of the Federal University of Rio de Janeiro (UFRJ).