This study assesses the accuracy of new imputation methods to be implemented in FAOSTAT, the world's largest database on agriculture, nutrition, fisheries, forestry, food aid and land use, prepared by the FAO. Every effort is made to collect as much agricultural production data as possible from questionnaires completed by national administrations and from official publications. Even so, imputing missing data is an essential part of producing FAOSTAT, because not all data can be obtained from those sources at the required quality. The error assessment presented in this study is part of the FAO's efforts to improve the methods used for that imputation.
We analyze 3000 data points randomly selected from annual FAOSTAT time series on agricultural production output, area harvested and livestock, all obtained from official sources. We treat these observed values as missing and compare the imputed figures with the corresponding actual values. This lets us assess the coverage rate and the mean squared error of the imputations. Different random draws simulate the different settings in which missing values can occur: in the middle and at the margin of time-series gaps, and beyond time-series endpoints. The imputation methods we compare are linear interpolation, trend smoothing, and benchmarking of growth rates on aggregates.
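The evaluation design described above can be sketched as a masking experiment. Known values are hidden, re-imputed and compared with the truth. The abstract does not spell out the exact imputation rules, so the functions below are simplified, hypothetical stand-ins:

- `linear_interpolation` draws a straight line between the observations that bracket the missing year.
- `linear_trend` reads "trend smoothing" as a least-squares linear trend. It requires at least `min_obs` observed years, and that threshold is an assumed value.
- `growth_benchmark` moves the nearest observation to the target year using the growth rate of an aggregate series `agg`. The aggregate could be, for instance, a regional total; that is also an assumption.

"Coverage rate" is interpreted here as the share of cases in which a method can produce an imputation at all. All numbers are illustrative only.

```python
from typing import Callable, Optional, Sequence

# One value per year; None marks a missing (masked) year.
Series = list[Optional[float]]
Case = tuple[Series, int, float]  # (masked series, target year, actual value)


def _nearest(y: Series, t: int, step: int) -> Optional[int]:
    """Index of the nearest observed year before (step=-1) or after (step=+1) t."""
    i = t + step
    while 0 <= i < len(y):
        if y[i] is not None:
            return i
        i += step
    return None


def linear_interpolation(y: Series, t: int) -> Optional[float]:
    """Straight line between the bracketing observations; None beyond an endpoint."""
    lo, hi = _nearest(y, t, -1), _nearest(y, t, +1)
    if lo is None or hi is None:
        return None
    w = (t - lo) / (hi - lo)
    return y[lo] + w * (y[hi] - y[lo])


def linear_trend(y: Series, t: int, min_obs: int = 5) -> Optional[float]:
    """Hypothetical stand-in for trend smoothing: OLS line through all observed
    years, evaluated at t; None if fewer than min_obs observations exist."""
    obs = [(i, v) for i, v in enumerate(y) if v is not None]
    if len(obs) < min_obs:
        return None
    n = len(obs)
    mx = sum(i for i, _ in obs) / n
    my = sum(v for _, v in obs) / n
    sxx = sum((i - mx) ** 2 for i, _ in obs)
    slope = sum((i - mx) * (v - my) for i, v in obs) / sxx
    return my + slope * (t - mx)


def growth_benchmark(y: Series, agg: Sequence[float], t: int) -> Optional[float]:
    """Move the nearest observation s (earlier if any, else later) to year t with
    the aggregate's growth: y[t] = y[s] * agg[t] / agg[s]."""
    s = _nearest(y, t, -1)
    if s is None:
        s = _nearest(y, t, +1)
    if s is None or agg[s] == 0:
        return None
    return y[s] * agg[t] / agg[s]


def mask(full: Sequence[float], gap: range) -> Series:
    """Copy of a complete series with the years in `gap` set to missing."""
    return [None if i in gap else v for i, v in enumerate(full)]


def evaluate(method: Callable[[Series, int], Optional[float]],
             cases: Sequence[Case]) -> tuple[float, float]:
    """Coverage rate (share of cases imputed) and MSE over the imputed cases."""
    sq_errors = []
    for y, t, actual in cases:
        est = method(y, t)
        if est is not None:
            sq_errors.append((est - actual) ** 2)
    coverage = len(sq_errors) / len(cases)
    mse = sum(sq_errors) / len(sq_errors) if sq_errors else float("nan")
    return coverage, mse


# Illustrative numbers only, not FAOSTAT data.
full = [100.0, 104.0, 109.0, 111.0, 118.0, 121.0]
agg = [1000.0, 1050.0, 1090.0, 1120.0, 1170.0, 1210.0]
cases = [
    (mask(full, range(2, 3)), 2, full[2]),  # one-year gap in the middle
    (mask(full, range(5, 6)), 5, full[5]),  # beyond the last endpoint
]
print(evaluate(linear_interpolation, cases))  # (0.5, 2.25)
print(evaluate(linear_trend, cases))
print(evaluate(lambda y, t: growth_benchmark(y, agg, t), cases))
```

The toy run shows why coverage and accuracy are reported separately. Linear interpolation cannot impute beyond the endpoint, so its MSE rests on fewer cases. Growth benchmarking only needs one observed year plus the aggregate.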
Our results show that benchmarking on growth rates is applicable much more often than trend smoothing. For imputations beyond the time-series endpoints (extrapolation), this approach also yields greater accuracy. For imputations within time-series gaps, the differences in accuracy are rather small and in many cases not statistically significant. Which approach produces the more accurate results depends on the variable analyzed and on the gap length. Within time-series gaps, however, the accuracy of linear interpolation is in no case significantly lower than that of either of the other two approaches.
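The abstract does not state which significance test was used to compare the methods. One generic option, shown here purely as an illustration, is a paired sign-flip permutation test. It works on the per-case squared errors of two methods and checks whether their mean squared errors differ by more than chance would suggest. The function name, `n_perm` and `seed` are hypothetical choices.

```python
import random
from typing import Sequence


def paired_permutation_pvalue(sq_err_a: Sequence[float],
                              sq_err_b: Sequence[float],
                              n_perm: int = 10_000,
                              seed: int = 0) -> float:
    """Two-sided sign-flip permutation test of H0: methods A and B have equal MSE.

    sq_err_a[k] and sq_err_b[k] are the squared errors of the two methods on the
    same case k, so only cases where both methods are applicable belong here.
    """
    if len(sq_err_a) != len(sq_err_b) or not sq_err_a:
        raise ValueError("need two non-empty, equally long lists of paired errors")
    d = [a - b for a, b in zip(sq_err_a, sq_err_b)]
    observed = abs(sum(d))  # proportional to |MSE_A - MSE_B|
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under H0 each paired difference is equally likely to have either sign.
        flipped = abs(sum(x if rng.random() < 0.5 else -x for x in d))
        if flipped >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Illustrative squared errors only, not results from the study.
same = paired_permutation_pvalue([1.0, 4.0, 2.0], [1.0, 4.0, 2.0])
far = paired_permutation_pvalue([0.0] * 20, [10.0] * 20)
print(same, far)
```

Pairing the errors case by case removes the variation between series, which is large in agricultural data. Small MSE differences between methods can then still fail to reach significance, as reported for gaps.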
FAO, FAOSTAT, http://faostat.fao.org.
Kohn, R. and Ansley, C.F., 1986, Estimation, Prediction and Interpolation for ARIMA Models with Missing Data, Journal of the American Statistical Association, 81, 751-761.
Pourahmadi, M., 1989, Estimation and Interpolation of Missing Values of a Stationary Time Series, Journal of Time Series Analysis, 10/2, 149-169.
Rubin, D.B., 1996, Multiple Imputation After 18+ Years, Journal of the American Statistical Association, 91/434, 473-489.
Keywords: Imputation; Time series analysis; Error assessment; Agricultural statistics
Biography: ONNO HOFFMEISTER works in the Economics and Social Statistics Division of the Food and Agriculture Organization of the United Nations. He obtained his Ph.D. in Economics at the European University Viadrina of Frankfurt/Odra. He was a research fellow at the German Institute for Economic Research (DIW Berlin) and at the University of Hamburg. He worked as a consultant in capacity-building projects in Lithuania, Bulgaria, Tunisia and Jordan on behalf of the German Ministry of Economics and Labor. Later, he was a Statistical Officer at Eurostat, the statistical office of the European Union. There he was involved in developing the Sustainable Development Indicators Database and in preparing the European Sector Accounts.