We adopt a survey sampling point of view to estimate the mean curve of large databases of functional data. When storage capacities are limited or transmission costs are high, selecting a small fraction of the observations with survey techniques is an interesting alternative to signal compression techniques. Our study is motivated, in this context, by the estimation of the temporal evolution of mean electricity consumption curves. The French operator EDF has planned to install, within a few years, more than 30 million electricity meters, one in each firm and household, that will be able to send individual electricity consumption measurements at very fine time scales. Collecting, saving and analyzing all this information, which can be seen as functional data, would be very expensive, and survey sampling strategies are an attractive way to get accurate estimates at reasonable cost. It is also well known that consumption profiles may depend on covariates such as past aggregated consumption, meteorological characteristics (temperature, etc.) or geographical information (altitude, latitude and longitude).
We compare in this work different ways of taking this auxiliary information into account. The first consists in using simple sampling designs, such as simple random sampling without replacement, together with model-assisted estimators. In a functional context, this can be done by first reducing the dimension of the functional variable of interest through a principal components analysis and then building model-assisted estimators on the principal component scores. The second strategy consists in considering unequal probability sampling designs, such as stratified sampling or probability proportional to size (pps) sampling, that take additional information into account through their sampling weights.
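As a rough illustration of the second strategy, the sketch below computes a Horvitz-Thompson estimate of the mean curve from discretized curves, first under simple random sampling without replacement and then under a stratified design with proportional allocation. The population is simulated, and the names `ht_mean_curve`, `x`, `curves` and the two-strata split are assumptions made for the example, not the setting or code of the study.

```python
import numpy as np

def ht_mean_curve(sample_curves, incl_probs, pop_size):
    """Horvitz-Thompson estimate of the population mean curve.

    sample_curves: (n, T) array, one sampled curve per row.
    incl_probs:    (n,) first-order inclusion probabilities of the sampled units.
    pop_size:      population size N.
    """
    weights = 1.0 / np.asarray(incl_probs, dtype=float)
    curves_arr = np.asarray(sample_curves, dtype=float)
    return (weights[:, None] * curves_arr).sum(axis=0) / pop_size

# Hypothetical population: N curves on T time points, driven by a covariate x.
rng = np.random.default_rng(0)
N, T, n = 1000, 48, 100
x = rng.uniform(1.0, 5.0, size=N)
t = np.linspace(0.0, 1.0, T)
curves = x[:, None] * (1.0 + np.sin(2 * np.pi * t))[None, :]
curves = curves + rng.normal(0.0, 0.3, size=(N, T))

# Simple random sampling without replacement: every unit has probability n / N.
idx = rng.choice(N, size=n, replace=False)
mu_srs = ht_mean_curve(curves[idx], np.full(n, n / N), N)

# Stratified sampling: two strata built from x, proportional allocation.
strata = (x > np.median(x)).astype(int)
parts, probs = [], []
for h in (0, 1):
    members = np.flatnonzero(strata == h)
    n_h = round(n * members.size / N)
    chosen = rng.choice(members, size=n_h, replace=False)
    parts.append(curves[chosen])
    probs.append(np.full(n_h, n_h / members.size))
mu_strat = ht_mean_curve(np.vstack(parts), np.concatenate(probs), N)
```

With equal inclusion probabilities the estimator reduces to the sample mean curve; the stratified design changes only the weights, which is how the auxiliary variable enters the estimate.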
The second question addressed in this work is how to build reliable confidence bands. When consistent estimators of the covariance function of the estimators are easy to build and the mean estimator satisfies a Functional Central Limit Theorem, a fast technique based on simulating Gaussian processes to approximate the distribution of their suprema can be employed. This new approach is compared to bootstrap techniques, which are also natural candidates for building confidence bands and can be adapted to the finite population setting.
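The simulation-based band can be sketched as follows, assuming curves observed on a common grid and simple random sampling without replacement. The covariance estimator, the standardization by the pointwise standard deviation, and the names `srswor_cov` and `sup_band` are choices made for this example; the abstract does not specify the exact construction, so this should be read as one plausible variant rather than the authors' procedure.

```python
import numpy as np

def srswor_cov(sample_curves, pop_size):
    """Estimated covariance of the sample mean curve under SRSWOR.

    Uses the usual finite population correction (1 - n/N) / n applied to
    the sample covariance (divisor n - 1) of the discretized curves.
    """
    n = sample_curves.shape[0]
    s = np.cov(sample_curves, rowvar=False)
    return (1.0 - n / pop_size) * s / n

def sup_band(mu_hat, cov_hat, alpha=0.05, n_sim=5000, rng=None):
    """Confidence band from simulated suprema of a Gaussian process.

    Simulates centered Gaussian vectors with the estimated correlation
    structure, takes the (1 - alpha) quantile c of sup_t |Z(t)|, and
    returns mu_hat -/+ c * sd(t). Assumes every pointwise variance is > 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    sd = np.sqrt(np.diag(cov_hat))
    corr = cov_hat / np.outer(sd, sd)
    z = rng.multivariate_normal(np.zeros(len(mu_hat)), corr, size=n_sim)
    c = np.quantile(np.abs(z).max(axis=1), 1.0 - alpha)
    return mu_hat - c * sd, mu_hat + c * sd, c

# Hypothetical sample: n random-walk curves from a population of size N.
rng = np.random.default_rng(1)
N, n, T = 1000, 100, 24
sample = rng.normal(size=(n, T)).cumsum(axis=1)
mu_hat = sample.mean(axis=0)
lower, upper, c = sup_band(mu_hat, srswor_cov(sample, N), rng=rng)
```

The coefficient `c` is larger than the pointwise normal quantile because the band has to cover the whole curve simultaneously. Only Gaussian vectors are simulated, with no resampling of the data, which is why this approach is fast compared with the bootstrap.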
Keywords: Bootstrap; Design-Based estimation; Functional data analysis; Survey sampling
Biography: Hervé Cardot is a professor at the University of Bourgogne, France.
Alain Dessertaine is a research engineer at Electricité de France, France.
Etienne Josserand is a PhD student at the University of Bourgogne, France.
Pauline Lardin is a PhD student at the University of Bourgogne and an engineer at Electricité de France, France.