Using Self-Tuning Diffusion Maps To Reduce Dimensionality and Find Clustering Structure
Rebecca Nugent¹, David Friedenberg²
¹Statistics, Carnegie Mellon University, Pittsburgh, PA, United States; ²Battelle

Diffusion maps are a powerful tool for identifying complicated structure and reducing dimensionality in a wide variety of applications. By representing the connectivity of a data set, diffusion maps project observations into a space in which standard methods can more easily model the structure. These maps rely heavily on the choice of a global tuning parameter that sets the threshold for similarity. However, a single global tuning parameter often fails to capture structure that depends on fluctuations in local density. For example, a dense region embedded in background noise would be difficult to capture with standard diffusion maps. We present a flexible self-tuning diffusion map framework that incorporates local tuning parameters to capture this type of structure when it is present. We also present results suggesting that the self-tuning diffusion map (or similar tools) may reduce the importance of the choice of clustering procedure. Although we focus on illustrating dimension reduction and visualization in a clustering framework, we also present examples from classification and regression. Where appropriate, a cross-validation algorithm is used to choose the local tuning parameters (possibly including the number of clusters). Use of the self-tuning diffusion map can greatly improve the recovery of structure in the presence of varying local density.
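The contrast between one global similarity threshold and local tuning parameters can be illustrated with a small NumPy sketch. It is not the authors' implementation. It builds a Gaussian kernel either with a single global bandwidth `eps` or with per-point scales `sigma_i`, each equal to the distance from point i to its k-th nearest neighbour. That local-scaling rule comes from Zelnik-Manor and Perona's self-tuning spectral clustering and is swapped in here for illustration. The abstract's cross-validation step for choosing local parameters is not shown. The function names `affinity` and `diffusion_map` and the defaults `k=7` and `t=1` are hypothetical choices.

```python
import numpy as np


def affinity(X, eps=None, k=7):
    """Gaussian affinity matrix W (illustrative sketch).

    Global version (eps given):     W_ij = exp(-||x_i - x_j||^2 / eps)
    Self-tuning version (eps None): W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from x_i to its k-th nearest neighbour
    (an assumed local-scaling rule, not necessarily the authors' choice).
    Requires more than k points.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by round-off
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if eps is not None:
        return np.exp(-d2 / eps)
    # Column 0 of each sorted row is the point itself (distance 0)
    sigma = np.sort(np.sqrt(d2), axis=1)[:, k]
    sigma = np.maximum(sigma, 1e-12)  # guard against duplicate points
    return np.exp(-d2 / np.outer(sigma, sigma))


def diffusion_map(X, n_components=2, t=1, eps=None, k=7):
    """Diffusion coordinates lambda_j^t * psi_j for j = 1..n_components.

    t is the number of diffusion steps (use an integer here, because the
    self-tuning kernel can yield small negative eigenvalues).
    """
    W = affinity(X, eps=eps, k=k)
    deg = W.sum(axis=1)
    # A = D^{-1/2} W D^{-1/2} is symmetric and has the same eigenvalues as
    # the Markov transition matrix P = D^{-1} W
    A = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]  # largest eigenvalue (equal to 1) first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]  # right eigenvectors of P
    # Skip the trivial constant eigenvector whose eigenvalue is 1
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.normal(0.0, 0.1, size=(50, 2))   # tight cluster
    noise = rng.uniform(-3, 3, size=(150, 2))    # diffuse background noise
    X = np.vstack([dense, noise])
    Y_global = diffusion_map(X, eps=1.0)  # one global bandwidth
    Y_local = diffusion_map(X, k=7)       # per-point local bandwidths
    print(Y_global.shape, Y_local.shape)  # (200, 2) (200, 2)
```

The sketch diagonalizes the symmetric matrix `A` with `eigh` rather than the non-symmetric Markov matrix `P`. This choice gives real eigenvalues and orthogonal eigenvectors. Rescaling by `D^{-1/2}` then recovers the right eigenvectors of `P`, which serve as diffusion coordinates. In the self-tuning branch, points in the dense cluster get small `sigma_i` and points in the noise get large ones. The effective similarity threshold therefore adapts to local density, which is the behaviour the abstract motivates.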

Keywords: Diffusion maps; Clustering; Tuning parameters; Dimension reduction

Biography: Professor Rebecca Nugent received her Bachelor's in Mathematics, Statistics, and Spanish from Rice University, her Master's in Statistics from Stanford University, and her PhD in Statistics from the University of Washington. She is currently a faculty member in the Department of Statistics at Carnegie Mellon University.