A Review of Regression Procedures for Randomized Response Data, Including Univariate and Multivariate Logistic Regression, the Proportional Odds Model and Item Response Models
Peter van der Heijden¹, Ardo van den Hout², Laurence Frank¹, Maarten Cruyff¹, Ulf Böckenholt³
¹Methodology and Statistics, Utrecht University, Utrecht, Netherlands; ²MRC Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom; ³Kellogg School of Management, Northwestern University, IL, United States

In survey research it is often problematic to ask sensitive questions, because respondents may refuse to answer or may give a socially desirable answer that does not reveal their true status. To solve this problem, Warner (1965) proposed randomized response (RR), in which a chance mechanism conceals why an individual respondent answers yes or no to the question asked. So far, RR has mainly been used to estimate the prevalence of sensitive characteristics. Researchers not uncommonly believe, wrongly, that RR has the drawback that the sensitive characteristic cannot be related to explanatory variables. Here we review the literature on regression procedures for dichotomous RR data. Univariate RR data can be analyzed with a version of logistic regression adapted to data collected by RR, originally worked out by Scheers and Dayton (1988). We then present extensions to repeated cross-sectional data that allow the design used to collect the RR data to change over time. We also review regression procedures for multivariate dichotomous RR data, such as the model of Glonek and McCullagh (1995), a model for the sum score of a set of dichotomous RR variables, and an item response model that assumes a latent variable explains the answers to the RR questions. We end with a discussion of a recent development in the analysis of multivariate RR data, namely models that take into account that some respondents may not follow the RR instructions and instead answer no regardless of the sensitive question asked.
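The adapted logistic regression mentioned above can be illustrated with a minimal sketch. It assumes a forced-response design (a hypothetical die roll: outcome 1 forces "yes", outcome 6 forces "no", otherwise the respondent answers truthfully), under which the probability of an observed yes is P(yes | x) = c + d·π(x) with c = 1/6, d = 4/6 and logit π(x) = b0 + b1·x. Warner's design has the same form, with c = 1 − p and d = 2p − 1. The function names, the single covariate and the simulated data are illustrative assumptions, and fitting by Fisher scoring with step halving is a choice made for this sketch, not necessarily the estimation procedure of Scheers and Dayton (1988).

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loglik(beta, xs, ys, c, d):
    # Log-likelihood of observed RR answers, where P(yes) = c + d * pi(x).
    total = 0.0
    for x, y in zip(xs, ys):
        lam = c + d * sigmoid(beta[0] + beta[1] * x)
        total += math.log(lam) if y else math.log(1.0 - lam)
    return total

def fit_rr_logistic(xs, ys, c, d, iters=100, tol=1e-10):
    """Hypothetical helper: Fisher scoring for logit(pi) = b0 + b1*x under RR."""
    beta = [0.0, 0.0]
    ll = loglik(beta, xs, ys, c, d)
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            pi = sigmoid(beta[0] + beta[1] * x)
            lam = c + d * pi
            dlam = d * pi * (1.0 - pi)          # d lambda / d eta
            w = dlam / (lam * (1.0 - lam))
            g0 += (y - lam) * w                 # score for b0
            g1 += (y - lam) * w * x             # score for b1
            v = dlam * w                        # Fisher information weight
            i00 += v
            i01 += v * x
            i11 += v * x * x
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det     # step = I^{-1} g (2x2 inverse)
        step1 = (-i01 * g0 + i00 * g1) / det
        # Step halving: shrink the step until the log-likelihood does not drop.
        t = 1.0
        while t > 1e-8:
            cand = [beta[0] + t * step0, beta[1] + t * step1]
            new_ll = loglik(cand, xs, ys, c, d)
            if new_ll >= ll:
                break
            t /= 2.0
        else:
            break  # no improving step found; keep the current estimate
        improvement = new_ll - ll
        beta, ll = cand, new_ll
        if improvement < tol:
            break
    return beta, ll

def simulate_forced_response(n, beta, p_yes, p_no, rng):
    # Hypothetical simulation of forced-response answers with one covariate.
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        truth = rng.random() < sigmoid(beta[0] + beta[1] * x)
        u = rng.random()
        if u < p_yes:
            y = 1                  # forced "yes"
        elif u < p_yes + p_no:
            y = 0                  # forced "no"
        else:
            y = int(truth)         # truthful answer
        xs.append(x)
        ys.append(y)
    return xs, ys

rng = random.Random(2024)
p_yes, p_no = 1 / 6, 1 / 6
xs, ys = simulate_forced_response(20000, (-1.0, 1.5), p_yes, p_no, rng)
beta_hat, ll = fit_rr_logistic(xs, ys, c=p_yes, d=1 - p_yes - p_no)
print(beta_hat)
```

Because only a fraction d of the answers carries information about the true status, standard errors are larger than in ordinary logistic regression with the same sample size. Fitting ordinary logistic regression directly to the RR answers would model the observed yes-probability rather than π(x).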

Keywords: Randomized response; Logistic regression; Multivariate randomized response data; Cheating

Biography: Peter van der Heijden is professor of statistics in the Department of Methodology and Statistics at Utrecht University. He has conducted research on randomized response for the past 15 years.