Prolegomena to the Theory and Practice of Data Analysis
Peter J. Huber

This talk gives my own, unabashedly idiosyncratic view of what we can learn – or should have learned! – from the past 50 years of doing data analysis and statistics, and of the implications for the near future. Those years saw the advent of computing and, with it, the impact of ever larger data sets. A practicing data analyst had to come to grips with four areas: (1) strategic thinking, (2) dealing with massive, inhomogeneous data sets, (3) providing proper linguistic support to statistical computing, and (4) handling complex, approximate models. I shall single out some crucial issues pertaining to these areas and give some illustrative examples. But what are the implications for statistical teaching? Admittedly, what can be transmitted by classroom teaching is limited, and the high art of data analysis still has to be learned through apprenticeship, anecdotes, and on the job. But the insights we have gained in the past years lend additional importance and urgency to some of the traditionally neglected stepchildren of statistical instruction. In particular, we must teach our students to pay attention to the pitfalls of heterogeneity (such as Simpson's paradox) and of invisible missingness; we must show them the value of graphical tools (such as multidimensional scaling and correspondence analysis) that can be used to create order in the data; and we must teach them the free and creative use of the computer, so that they can deal easily with complex ad hoc models adapted to the data at hand, rather than becoming slaves to the computer by adapting the data and its analysis to the available computer packages.

Keywords: Strategy in Data Analysis; Heterogeneous data; Languages for Data Analysis; Approximate models

Biography: Peter J. Huber started his career in pure mathematics as a topologist but then switched to mathematical statistics. He was Professor of Statistics at ETH Zürich, Harvard and MIT, and spent the last years before his retirement at the University of Bayreuth in Germany. Since then he has worked mostly in Assyriology and astronomical dating. In statistics he is mainly known for his contributions to the theory of robustness, but he was also principal investigator of projects in high-interaction, high-dimensional graphics, and he was involved in applications of statistics in fields such as crystallography, EEG analysis, and human growth curves.