High-throughput technologies are widely used to assay genetic variants, gene and protein expression, and epigenetic modifications. I will discuss sources of variation in these high-throughput data, including both batch effects and biological artifacts. These artifacts become a major problem when they are correlated with an outcome of interest, because they can then lead to incorrect conclusions. Using both published studies and new analyses, I argue that batch effects and other technical and biological artifacts are widespread and critical to address. I will also discuss how dissecting variability in high-throughput biology affects both significance analysis and prediction.
Keywords: Genomics; Multiple testing; Prediction; Next generation sequencing
Biography: Jeffrey Leek received his Ph.D. in biostatistics from the University of Washington. He is currently an assistant professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. His research focuses on (1) identifying, estimating and removing artifacts from high-dimensional microarray and next-generation sequencing data with tools such as surrogate variable analysis, (2) developing practical genomic predictors from simple classifiers that are robust to batch and other effects, and (3) understanding natural variation in gene expression and epigenetics.