Discovering Clustering Structure in High-Dimensional Metabolomic Data
Isobel Claire Gormley
School of Mathematical Sciences, University College Dublin, Ireland

Metabolomics is the quantitative study of metabolites in a biofluid and is widely used in areas such as nutritional biochemistry. Metabolomic data typically take the form of high-dimensional spectra. The spectral peaks relate to specific metabolites, and the height of a peak indicates the abundance of the corresponding metabolite. Metabolite patterns provide insight into the underlying molecular mechanisms of disease. Metabolomic scientists are typically interested in using metabolomic spectra to diagnose and understand metabolomic disease, and in studying the influence of covariates jointly with the spectral data.

The high dimensionality of metabolomic spectra poses statistical challenges; in such situations, latent factor models can be employed to represent high-dimensional data in a lower-dimensional space. Here such models are extended to facilitate joint modeling of metabolomic spectra and covariate data. Additionally, a mixture modeling framework is employed to provide clustering capabilities, aiding the identification of subtypes of metabolomic disease.
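
As a rough illustration of the kind of model described above, the following sketch simulates spectra from a generic mixture of factor analyzers, in which each cluster has its own mean spectrum and loading matrix and each observation is driven by a small number of latent factors. It is a minimal sketch under stated assumptions, not the model presented in this work: the function name `simulate_mfa`, the equal mixing proportions, the Gaussian loadings, and the noise level are all hypothetical choices made for illustration, and the covariate extension is omitted.

```python
import numpy as np

def simulate_mfa(n, p, q, G, seed=0):
    """Draw n spectra of dimension p from a G-component mixture of
    factor analyzers with q latent factors (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    weights = np.full(G, 1.0 / G)                    # mixing proportions (assumed equal)
    means = rng.normal(0.0, 3.0, size=(G, p))        # cluster-specific mean spectra
    loadings = rng.normal(0.0, 1.0, size=(G, p, q))  # cluster-specific loading matrices
    noise_sd = 0.5                                   # idiosyncratic noise level (assumed)

    labels = rng.choice(G, size=n, p=weights)        # latent cluster memberships
    scores = rng.normal(size=(n, q))                 # low-dimensional latent factors
    noise = rng.normal(0.0, noise_sd, size=(n, p))

    # x_i = mu_g + Lambda_g u_i + eps_i, where g is the cluster of observation i
    spectra = means[labels] + np.einsum("npq,nq->np", loadings[labels], scores) + noise
    return spectra, labels, scores

# Example: 50 spectra with 200 spectral bins, 3 latent factors, 2 clusters
spectra, labels, scores = simulate_mfa(n=50, p=200, q=3, G=2)
```

Here each 200-dimensional spectrum is generated from only three latent factors plus a cluster label, which is the sense in which a latent factor representation reduces dimensionality while the mixture structure supplies the clustering.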

Inference is performed within the Bayesian paradigm, where sparsity-inducing priors are employed to handle the high dimensionality of the spectral data. The methodology is illustrated through the analysis of a real metabolomic data set.

Keywords: Metabolomics; Factor analyzers; Clustering; Bayesian methods

Biography: Dr. Gormley was awarded a B.A. in Mathematics and a PhD in Statistics from Trinity College Dublin. She spent some time at the University of Washington, Seattle as a Visiting Scholar before taking up a position as a Lecturer in Statistics at University College Dublin. Her research interests include clustering and classification methods for high-dimensional data in varied application areas.