Issues with Measures of Disclosure Risk
Joe Fred Gonzalez
Office of Research and Methodology, Centers for Disease Control and Prevention, National Center for Health Statistics, Hyattsville, MD, United States

Because protecting the privacy and confidentiality of survey respondents is a growing challenge, measuring the risk of statistical disclosure is an extremely important topic. Both in practice and as a research topic, measuring statistical disclosure risk in microdata and tabular data can be very complex, especially because it is scenario dependent (intruder attacks, spontaneous recognition by data users, the context of the data environment, etc.). During the past several decades there has been extensive research in this area, both theoretical and empirical. Measurement of disclosure risk is a matter of great concern to disclosure review boards (DRBs). Before a DRB can approve the release of Public Use Files (PUFs) or tabular data from a specific survey, it must be confident that the risk of disclosing information about any particular survey respondent (person or establishment) is extremely low. Although not an exhaustive treatise on this extremely complex topic, this paper provides a brief review of some of the risk metrics and disclosure risk factors involved in microdata, for example (Elliott 2001): sampling fractions, level of detail on variables, level of geographical detail, number of key variables available to an intruder, and multilevel considerations (file and record). Data divergence (De Waal and Willenborg 1994) could also drive risk metrics for microdata. Similarly, there are factors that could drive risk metrics for tabular data (Cox 2001). This paper should be relevant to researchers, data analysts, and practitioners who are responsible for conducting sample surveys and who must pledge to protect the confidentiality of respondents in order to collect, analyze, and disseminate survey data.
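
As a rough illustration of how the number of key variables and the level of geographical detail can drive a record-level risk indicator, the sketch below counts sample uniques, that is, records whose combination of key-variable values occurs only once in the file. This is a generic, commonly used indicator and not necessarily one of the metrics reviewed in the paper; the function name `sample_uniques`, the variable names, and the toy records are all hypothetical.

```python
from collections import Counter

def sample_uniques(records, key_vars):
    """Return records whose key-variable combination occurs exactly once.

    Illustrative only: a sample unique is not necessarily unique in the
    population, so this count is a crude screening indicator.
    """
    keys = [tuple(rec[v] for v in key_vars) for rec in records]
    counts = Counter(keys)
    return [rec for rec, k in zip(records, keys) if counts[k] == 1]

# Hypothetical toy microdata file (not real survey data).
records = [
    {"age_group": "30-39", "sex": "F", "county": "A"},
    {"age_group": "30-39", "sex": "F", "county": "B"},
    {"age_group": "30-39", "sex": "M", "county": "A"},
    {"age_group": "30-39", "sex": "M", "county": "A"},
    {"age_group": "40-49", "sex": "F", "county": "A"},
    {"age_group": "40-49", "sex": "F", "county": "A"},
]

# Fewer key variables: every combination is shared by two records.
coarse = sample_uniques(records, ["age_group", "sex"])

# Adding geographical detail splits one group and creates sample uniques.
detailed = sample_uniques(records, ["age_group", "sex", "county"])

print(len(coarse), len(detailed))
```

In this toy file, adding the hypothetical `county` variable to the key turns two previously indistinguishable records into sample uniques. How much such a count says about actual disclosure risk still depends on the sampling fraction and on the intrusion scenario assumed.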

Keywords: Disclosure risk; Microdata; Tabular data; Risk metrics

Biography: Joe Fred Gonzalez, Jr. is a mathematical statistician at the Centers for Disease Control and Prevention, National Center for Health Statistics, Office of Research and Methodology, Hyattsville, MD, USA. He is also an adjunct associate professor at the University of Maryland University College, College Park, MD, USA.