Dissemination of microdata files must respect the confidentiality pledge under which a statistical agency collects survey data. To protect the confidentiality of respondents, statistical agencies typically perform a two-stage statistical disclosure control procedure. In the first stage, the disclosure risk of each unit is estimated with respect to a disclosure scenario. In the second stage, in order to preserve data utility, a masking method may be applied only to the records considered at risk of re-identification, so that no confidential information about respondents can be retrieved from the disseminated microdata file.
After the removal of direct identifiers, e.g. name and address, indirect identifiers, called key variables, can still allow the disclosure of confidential information about a unit. Most of the key variables recorded in social microdata files are categorical. A particular combination of values of variables such as place of residence, gender, age, citizenship, and marital status may correspond to a unique person in the population. An important problem in statistical disclosure control (SDC) is the estimation of the (number of) sample uniques that are also population uniques. This paper presents extensions of the Poisson log-linear model for estimating a disclosure risk measure in contingency tables. The main contribution is an analysis of smoothing strategies based on a penalised likelihood approach and on the Lauritzen formula for decomposing graphical models. Preliminary results of several tests performed on real data are also presented.
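As an illustration of the notion of sample uniques used above, the following minimal sketch cross-classifies a toy sample by its key variables and lists the records that fall in cells of frequency one. The records, the variable names and the helper `sample_uniques` are hypothetical and are not taken from the paper. The sketch covers only the counting step; it does not implement the Poisson log-linear model or any of the smoothing strategies.

```python
from collections import Counter


def sample_uniques(records, key_variables):
    """Return the records whose key-variable combination occurs once in the sample."""
    def cell(record):
        # A cell of the contingency table is one combination of key-variable values.
        return tuple(record[v] for v in key_variables)

    cell_counts = Counter(cell(r) for r in records)
    return [r for r in records if cell_counts[cell(r)] == 1]


# Hypothetical toy sample (illustrative values only).
KEY_VARIABLES = ["region", "gender", "age_group", "marital_status"]
sample = [
    {"region": "North", "gender": "F", "age_group": "30-39", "marital_status": "married"},
    {"region": "North", "gender": "F", "age_group": "30-39", "marital_status": "married"},
    {"region": "South", "gender": "M", "age_group": "60-69", "marital_status": "widowed"},
    {"region": "South", "gender": "F", "age_group": "20-29", "marital_status": "single"},
]

uniques = sample_uniques(sample, KEY_VARIABLES)
print(len(uniques))  # prints 2
```

A record that is unique in the sample is not necessarily unique in the population, since the population frequencies of the cells are not observed. This is why a model-based estimate of the risk is needed.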
Keywords: Microdata dissemination; Individual disclosure risk; Log-linear models
Biography: Dr Daniela Ichim is a researcher with the National Statistics Institute, Italy.