HIV Prevalence Estimation in the Presence of Missing Data: A Bounding Approach
Bruno Arpino1, Elisabetta De Cao1,2, Franco Peracchi2
1Dondena Centre for Research on Social Dynamics, Bocconi University, Milan, Italy; 2SEFEMEQ, Tor Vergata University, Rome, Italy; 3Milano, Italy

Knowing HIV prevalence is crucial for policy makers who plan control programs and interventions. Estimates of HIV prevalence from population-based surveys are usually seriously affected by nonignorable missing data.

The aim of this paper is to assess the uncertainty in HIV prevalence estimated from sample survey data, while avoiding the strong assumptions that are often made to “solve” the identification problem caused by missing data. We adopt a bounding approach. In addition to worst-case bounds, which use only cross-sectional information, we propose dynamic bounds that also exploit longitudinal information on HIV status.
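For a binary outcome Y (HIV positive) and a response indicator R, P(Y=1) = P(Y=1|R=1)P(R=1) + P(Y=1|R=0)P(R=0). Without assumptions, the unobserved P(Y=1|R=0) can be anything in [0, 1], so the worst-case bounds follow by setting every missing status to negative (lower bound) or positive (upper bound). The sketch below computes the sample analogue for one wave; the coding of `status` (1, 0, None) and the function name are illustrative assumptions, and survey weights are ignored.

```python
from typing import Optional, Sequence, Tuple


def worst_case_bounds(status: Sequence[Optional[int]]) -> Tuple[float, float]:
    """No-assumption bounds on P(HIV+) from a single cross-section.

    Hypothetical coding: status[i] is 1 (HIV positive), 0 (HIV negative)
    or None (HIV status missing).
    """
    n = len(status)
    if n == 0:
        raise ValueError("empty sample")
    n_pos = sum(1 for s in status if s == 1)
    n_miss = sum(1 for s in status if s is None)
    lower = n_pos / n              # every missing status assumed negative
    upper = (n_pos + n_miss) / n   # every missing status assumed positive
    return lower, upper


# Usage: 10 respondents, 2 observed positives, 3 missing test results.
print(worst_case_bounds([1, 0, 0, None, None, 0, 0, 0, 1, None]))  # (0.2, 0.5)
```

Note that the width of the interval equals the share of missing HIV status, which is why a high non-response rate makes these bounds uninformative.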

Our dynamic bounds exploit the absorbing nature of HIV infection: a person infected in period t has zero probability of being HIV negative in period t+1, and a person who is HIV negative in period t was also HIV negative in period t−1 with probability one. Without any additional restrictions, the dynamic bounds narrow the interval produced by the worst-case bounds. Instrumental variable and monotone instrumental variable restrictions can then be used to narrow the bounds further.
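The absorbing-state logic can be sketched as follows: a missing status at wave t is known to be positive if the person tested positive at wave t or earlier, and known to be negative if the person tested negative at wave t or later; only the remaining cases are set to 0 or 1 as in the worst-case bounds. This is a simplified illustration under stated assumptions, not the paper's exact estimator: the data layout (`histories`, one list of results per person in time order), the function name, and the choice to raise an error on histories that contradict the absorbing property (for example a positive test followed by a negative one) are all hypothetical, and attrition, deaths, and survey weights are ignored.

```python
from typing import Optional, Sequence, Tuple

Status = Optional[int]  # hypothetical coding: 1 = HIV+, 0 = HIV-, None = missing


def dynamic_bounds(histories: Sequence[Sequence[Status]], t: int) -> Tuple[float, float]:
    """Bounds on P(HIV+) at wave index t, using HIV as an absorbing state.

    histories[i][k] is person i's test result at wave k (waves in time order).
    """
    n = len(histories)
    if n == 0:
        raise ValueError("empty sample")
    n_pos = 0
    n_unknown = 0
    for h in histories:
        # Positive at t or at any earlier wave implies positive at t.
        pos = any(s == 1 for s in h[: t + 1])
        # Negative at t or at any later wave implies negative at t.
        neg = any(s == 0 for s in h[t:])
        if pos and neg:
            raise ValueError(f"history {list(h)} contradicts the absorbing-state assumption")
        if pos:
            n_pos += 1
        elif not neg:
            n_unknown += 1  # status at t still undetermined: worst-case treatment
    return n_pos / n, (n_pos + n_unknown) / n


# Usage: three hypothetical waves (e.g. 2004, 2006, 2008).
H = [
    [1, None, None],
    [None, 1, 1],
    [None, 0, None],
    [0, 0, 0],
    [None, None, None],
]
print(dynamic_bounds(H, 0))  # (0.2, 0.6); worst-case bounds would be (0.2, 0.8)
print(dynamic_bounds(H, 1))  # (0.4, 0.6)
```

In this sketch, for the first tested wave only later negative results carry information, so only the upper bound can move; for later waves, earlier positive results can also raise the lower bound.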

We apply this approach to data from Malawi, the Malawi Diffusion and Ideational Change Project (MDICP). The MDICP is the result of a collaboration between the University of Pennsylvania and the College of Medicine and Chancellor College of the University of Malawi. It is a longitudinal study, with survey rounds in 1998, 2001, 2004, 2006, 2008, and 2010, that collected data in three Malawian districts: Rumphi, Balaka, and Mchinji. We focus in particular on the 2004, 2006, and 2008 waves, which contain HIV test results.

Our results clearly show that the uncertainty caused by missing data is a serious issue. For example, in 2004 HIV status is missing for about 40% of the sample, and the worst-case bounds are 3%-46%. Using only empirical evidence, the dynamic bounds narrow the worst-case bounds, but not enough to yield satisfactory results (e.g., the dynamic bounds for 2004 are 3%-31%).

Plausible instrumental variable restrictions based on interviewer characteristics are effective in narrowing the bounds. For example, in 2004, using interviewer age as an instrument produces bounds of 5%-9%.
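A standard way to formalize an instrumental variable restriction in this setting, following Manski's bounds under mean independence, is to assume that HIV prevalence does not vary across the values z of the instrument (here, interviewer characteristics that may affect response but not HIV status). Prevalence then lies in every cell-specific worst-case interval, so the bounds are the largest cell lower bound and the smallest cell upper bound. The sketch below assumes this formulation; the function name, the interviewer age groups, and the data are hypothetical, and survey weights are ignored.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def iv_bounds(status: Sequence[Optional[int]], z: Sequence[Hashable]) -> Tuple[float, float]:
    """Bounds on P(HIV+) assuming P(HIV+ | Z=z) is the same for every z.

    status[i]: 1 (HIV+), 0 (HIV-), None (missing); z[i]: instrument value,
    e.g. a hypothetical interviewer age group.
    """
    if not status or len(status) != len(z):
        raise ValueError("status and z must be non-empty and of equal length")
    cells: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0, 0])  # [n, n_pos, n_miss]
    for s, zi in zip(status, z):
        cell = cells[zi]
        cell[0] += 1
        if s == 1:
            cell[1] += 1
        elif s is None:
            cell[2] += 1
    # Worst-case bounds within each instrument cell, then intersect them.
    lower = max(n_pos / n for n, n_pos, _ in cells.values())
    upper = min((n_pos + n_miss) / n for n, n_pos, n_miss in cells.values())
    if lower > upper:
        raise ValueError("empty interval: the IV restriction is rejected by the data")
    return lower, upper


# Usage with two hypothetical interviewer age groups.
status = [1, 0, 0, None] + [1, 0, 0, 0, 0, 0, 0, 0, None, None]
group = ["young"] * 4 + ["old"] * 10
print(iv_bounds(status, group))  # (0.25, 0.3)
```

In the same data the pooled worst-case bounds are 2/14 to 5/14 (about 0.14 to 0.36), so intersecting the cell-specific intervals narrows them. An empty intersection would signal that the instrument assumption itself is contradicted by the data.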

Finally, we note that ignoring the missing data on HIV status and computing the so-called “complete-case estimates”, based only on individuals with known HIV status, yields estimates that are close to the lower bounds, whereas the true HIV prevalence could be much higher.
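Why the complete-case estimate sits near the lower bound can be seen from the worst-case bounds. Writing p = P(Y=1|R=1) for the complete-case estimate and m = P(R=0) for the share with missing HIV status, the derivation below compares p with each bound; the numbers in the last line are illustrative only, not results from the paper.

```latex
\begin{align*}
L &= p\,(1-m), \qquad U = p\,(1-m) + m,\\
p - L &= p\,m, \qquad U - p = m\,(1-p),\\
\text{e.g. } p = 0.05,\; m = 0.4 &\;\Rightarrow\; L = 0.03,\; U = 0.43,\; p - L = 0.02,\; U - p = 0.38.
\end{align*}
```

When prevalence among those tested is low, the distance to the lower bound (p m) is small relative to the distance to the upper bound (m(1 − p)), and the complete-case estimate equals true prevalence only if non-response is unrelated to HIV status.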

Keywords: HIV prevalence; Survey non-response; Partial identification; Dynamic bounds

Biography: Elisabetta De Cao is a PhD student in Econometrics and Empirical Economics at Tor Vergata University in Rome and expects to complete her program in June 2011. Her thesis is a collection of three empirical papers on health-related issues. She spent the 2008-2009 academic year and spring 2010 at the University of Pennsylvania. She is currently a research fellow at the Dondena Centre (Bocconi University), where she works on a project that aims to strengthen decision making on the introduction of new vaccines in European countries by developing a methodological framework for evaluating the effectiveness and cost-effectiveness of vaccination programs. Her research interests include health economics, development economics, microeconometrics, applied statistics, and infectious disease modelling.