Statistical agencies are increasingly researching new methods and strategies to improve the efficiency of statistical production processes. This paper discusses the opportunities offered by selective data editing methods and presents research carried out at the National Statistical Institute of Spain.
Selective editing approaches aim to improve the efficiency of the editing process by selecting only a subset of the questionnaires for manual editing. Although manually editing every questionnaire was once customary, it is now considered inefficient, because most of the editing work has no impact at the aggregate level and can even damage the quality of the data (Granquist, 1997). Selective editing methods are in use in many countries and are the subject of extensive research.
Many approaches to selective editing have been proposed, sometimes under different names (macro-editing, plausibility indicators, significance editing, etc.). These approaches are, however, mostly empirical. In our research, we introduce a theoretical framework to guide the definition of a selective editing strategy (Arbués et al., 2010). We treat the choice of strategy as an optimization problem and present a method for solving stochastic optimization problems with constraints expressed in terms of expectations. In particular, we present a new class of stochastic optimization problems whose solutions belong to an infinite-dimensional space of random variables. We prove the existence of solutions and show that, under convexity conditions, a duality method can be used. The search for a good selective editing strategy is stated as a problem in this class, with the expected workload as the objective function to minimize, subject to quality constraints.
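As an illustration only, a problem of this kind could be written as follows. The notation is assumed here and is not taken from Arbués et al. (2010): R_i is the (possibly randomized) decision to edit record i, Z is the information available before editing, delta_i^(k) is the contribution of an unedited record i to the k-th quality measure, and b_k is the corresponding bound. The linear form of the constraints is one simple case chosen for the sketch.

```latex
% Illustrative notation (assumed, not the authors' own):
%   R_i            selection variable for record i, a function of Z
%   \delta_i^{(k)} loss in quality measure k if record i is left unedited
%   b_k            bound on the expected loss in quality measure k
\begin{aligned}
\min_{R}\quad & \mathbb{E}\Big[\sum_{i=1}^{n} R_i\Big] \\
\text{s.t.}\quad & \mathbb{E}\Big[\sum_{i=1}^{n} (1 - R_i)\,\delta_i^{(k)}\Big] \le b_k,
  \qquad k = 1,\dots,K, \\
& R_i \in [0,1], \quad R_i \ \text{measurable with respect to } Z .
\end{aligned}

% Lagrangian with multipliers \lambda_k \ge 0:
\mathcal{L}(R,\lambda)
  = \mathbb{E}\Big[\sum_{i=1}^{n} R_i \Big(1 - \sum_{k=1}^{K} \lambda_k\,
      \mathbb{E}\big[\delta_i^{(k)} \mid Z\big]\Big)\Big]
  + \sum_{k=1}^{K} \lambda_k \Big(\mathbb{E}\Big[\sum_{i=1}^{n} \delta_i^{(k)}\Big] - b_k\Big)

% For fixed \lambda the minimum is attained record by record:
R_i = 1 \quad \text{whenever} \quad
  \sum_{k=1}^{K} \lambda_k\, \mathbb{E}\big[\delta_i^{(k)} \mid Z\big] > 1 .
```

In this simplified setting the weighted conditional expectation on the left-hand side of the last line plays the role of a score, and the multipliers set the threshold above which a record is sent for manual editing.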
A frequently used approach to selective editing is based on score functions. A score function tries to reflect the importance of editing a particular record, identifying the records that need follow-up. This is useful because editing some records is more likely to improve quality than editing others, either because they are more likely to contain an error or because the error, if present, is likely to have a greater impact on the aggregated data. In our research, we formulate the score function as the solution to the optimization problem with linear constraints.
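The two ingredients named above, the likelihood of an error and its impact on the aggregate, can be combined in a simple score. The following is a minimal sketch of a generic score-function selection, not the optimal score derived in our research. The `Record` fields, the product form of the score, and the greedy stopping rule on a hypothetical `max_residual` bound are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    reported: float    # value reported in the questionnaire
    predicted: float   # anticipated value, e.g. from a previous period (assumed available)
    weight: float      # sampling weight of the unit
    error_prob: float  # assumed probability that the reported value is in error


def score(rec: Record) -> float:
    """Suspicion times impact: expected weighted error left in the total
    if this record is not edited (illustrative product form)."""
    return rec.error_prob * rec.weight * abs(rec.reported - rec.predicted)


def select_for_editing(records: list[Record], max_residual: float) -> list[str]:
    """Pick records in descending score order until the summed score of the
    records left unedited is at most max_residual."""
    ranked = sorted(records, key=score, reverse=True)
    residual = sum(score(r) for r in ranked)
    selected = []
    for rec in ranked:
        if residual <= max_residual:
            break
        selected.append(rec.record_id)
        residual -= score(rec)
    return selected


if __name__ == "__main__":
    records = [
        Record("A", reported=1000.0, predicted=100.0, weight=10.0, error_prob=0.5),
        Record("B", reported=105.0, predicted=100.0, weight=10.0, error_prob=0.1),
        Record("C", reported=300.0, predicted=200.0, weight=5.0, error_prob=0.2),
    ]
    print(select_for_editing(records, max_residual=150.0))
```

With a single linear quality constraint, ranking by score and cutting at a threshold is the natural selection rule; the bound on the residual determines where the cut falls and therefore the editing workload.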
We present the results of experiments with real data and the practical experience of applying these methods in the production of short-term indicators and structural surveys at the Spanish National Statistical Office.
References:
Arbués, I., González, M. and Revilla (2010). A class of stochastic optimization problems with application to selective data editing. Optimization. Taylor & Francis.
Granquist, L. (1997). The new view on editing. International Statistical Review 65.
Keywords: selective data editing; statistical production processes; stochastic optimization; total quality management
Biography: Director General of Methodology, Quality and IT at the National Statistical Institute of Spain (INE), since 2009. Director of Agriculture and Industrial Statistics of the INE (1988-2009).
Associate Professor at Carlos III University of Madrid and at Salamanca University.
Experience in representation and negotiation before various international institutions as a Spanish delegate (EU Council, Eurostat, OECD, United Nations, FAO). Experience collaborating with international organisations in co-operation programmes in the field of statistical production and methodology (Israel, Palestine, Jordan, Hungary, China, Brazil, Argentina, Chile, Uruguay, Paraguay, Peru, Guatemala).
Member of ISI.
Member of the Steering Committee of the UN/ECE Work Session on Data Editing.
Member of the Steering Group for coordinating the UN publication “Statistical data editing, volume 3”.
Organizer of international seminars on statistical methodologies and tools: United Nations Work Session on Statistical Data Editing (Madrid, 2003).
Author of several research papers on data editing, time series modelling and TQM.