Demographic Estimates and Projections Using Multiple Data Sources: A Bayesian Approach
John Bryant¹, Patrick Graham²
¹Statistics New Zealand, Christchurch, New Zealand; ²Bayesian Research, Christchurch, New Zealand

Sub-national population estimates and projections guide billions of dollars of public and private expenditure. Users of estimates and projections expect increasingly high levels of accuracy and detail. An obvious way of meeting these expectations is to use administrative data alongside more traditional data sources such as the census. However, incorporating multiple data sources into population estimation and projection methods is difficult. Data sources are often inconsistent with one another or use incompatible demographic and geographic categories. Traditional methods such as the application of scaling factors break down when there is more than one relevant data source, or when errors in data do not follow simple patterns. There has also been limited research on the formal representation of uncertainty in sub-national estimates and projections.

Our presentation describes a project to develop and implement a new Bayesian framework for population estimation and projection. At the core of the framework is a demographic account giving a complete description of births, deaths, migration, and population counts over the estimation and projection periods. The framework consists of a system model and an observational model. The system model describes how the components of the demographic account change over time. It consists of Bayesian hierarchical models for births, deaths, migration, and population counts. The observational model predicts the contents of each data source, such as the vital registration system, tax data, or the census, given the contents of the demographic account. For instance, it links migration numbers from the demographic account to numbers of changes of address in tax data. Missing data are easily handled. Estimation and projection are carried out together.
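
As a rough sketch of this structure, the joint posterior can be written as below. The notation is ours and is an assumption, not taken from the project itself: $A$ is the demographic account, $\theta$ the parameters of the system model, $\phi = (\phi_1, \dots, \phi_K)$ the parameters of the observational model, and $Y_1, \dots, Y_K$ the data sources. The factorisation further assumes that the data sources are conditionally independent given the account.

```latex
\begin{equation*}
p(A, \theta, \phi \mid Y_1, \dots, Y_K)
  \;\propto\;
  p(\theta)\, p(\phi)\,
  \underbrace{p(A \mid \theta)}_{\text{system model}}\,
  \underbrace{\prod_{k=1}^{K} p(Y_k \mid A, \phi_k)}_{\text{observational model}}
\end{equation*}
```

Under this reading, the account $A$ is the only link between the system model and the data, and $p(A \mid \theta)$ places probability only on accounts that satisfy the demographic accounting constraints.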

Inference is carried out using Markov chain Monte Carlo methods. The algorithm alternates between (i) updating the system model, given the demographic account; (ii) updating the observational model, given the data and the demographic account; and (iii) updating the demographic account, given the system model, the observational model, and the data. Step (iii) is the most difficult. The difficulty stems from the demographic accounting constraints, such as the constraint that population at the end of a period equals population at the beginning plus entries minus exits. Our approach has been to update small subsets of cells, randomly generating new entries that conform to the accounting constraints.
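
The flavour of step (iii) can be illustrated with a deliberately simplified sketch. It assumes a single region with no age or sex detail, and the function names (`is_consistent`, `propose_update`) and the `max_step` parameter are hypothetical. It is not the project's algorithm: it only shows one way a proposal can change a small subset of cells while keeping the accounts balanced, and it omits the acceptance step, which would depend on the system and observational models.

```python
import numpy as np


def is_consistent(population, entries, exits):
    """Check the accounting identity and non-negativity of all counts.

    population has one more element than entries and exits:
    population[t + 1] == population[t] + entries[t] - exits[t].
    """
    balanced = np.array_equal(population[1:], population[:-1] + entries - exits)
    non_negative = (
        (population >= 0).all() and (entries >= 0).all() and (exits >= 0).all()
    )
    return bool(balanced and non_negative)


def propose_update(population, entries, exits, rng, max_step=3):
    """Propose a new account that differs from the current one in a few cells.

    Entries in one randomly chosen period are shifted by a random integer,
    and the same shift is carried forward through all later population
    counts so that the accounting identity still holds. Returns the new
    arrays, or None if the proposal would create a negative count.
    """
    t = rng.integers(len(entries))                  # period to perturb
    delta = rng.integers(-max_step, max_step + 1)   # integer shift, may be 0
    new_entries = entries.copy()
    new_population = population.copy()
    new_entries[t] += delta
    new_population[t + 1:] += delta
    if not is_consistent(new_population, new_entries, exits):
        return None
    return new_population, new_entries, exits.copy()


# Toy account for two periods (illustrative numbers only).
population = np.array([100, 105, 103])
entries = np.array([10, 6])
exits = np.array([5, 8])

rng = np.random.default_rng(0)
proposal = propose_update(population, entries, exits, rng)
```

In a full sampler, a proposal of this kind would be accepted or rejected according to the posterior density of the account; a proposal returning `None` would simply be rejected.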

The framework has some important advantages for statistical agencies. It imposes few constraints on the input data. Because it works on cell counts rather than individual records, it avoids many of the practical and legal difficulties that arise with administrative data. It provides indicators of uncertainty for both estimates and projections. It offers the possibility of automating processes that are currently ad hoc and labour-intensive.

Keywords: Official statistics; Demography; Bayesian statistics; Population estimates and projections

Biography: John Bryant is a Senior Research Statistician at Statistics New Zealand. Previous employers have included Mahidol University and Khon Kaen University in Thailand, and the New Zealand Treasury. He has a PhD in Demography from the Australian National University.