Seriation is a data-analytic tool for finding a permutation of a set of objects that reveals structural information within the set. Seriating variables, cases or categories generally improves visualisations of statistical data, for example by revealing hidden patterns or by making large datasets easier to understand.
There are many different approaches to the problem of seriation, including the use of Travelling Salesperson Problem heuristics and dimension reduction techniques such as multidimensional scaling. Another popular method is “dendrogram seriation”. Dendrogram seriation algorithms rearrange the nodes in a dendrogram in order to obtain a permutation of the leaves (i.e. objects) that optimises a given criterion.
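The idea of dendrogram seriation can be sketched in a few lines of Python. This is an illustrative brute-force sketch, not the DendSer algorithm: it assumes a dendrogram stored as nested pairs, and the names `leaf_orders`, `path_length` and `seriate` are hypothetical. Flipping the two children of any internal node leaves the clustering unchanged but alters the leaf order, so the sketch enumerates every order reachable by such flips and keeps the one that minimises a chosen criterion (here, path length). Practical algorithms avoid this exhaustive search, since a tree with n leaves admits 2^(n-1) orders.

```python
def leaf_orders(tree):
    """Yield every leaf order reachable by flipping children at internal nodes.

    A tree is either a leaf label or a pair (left, right) of subtrees.
    """
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro  # children in their original orientation
            yield ro + lo  # children flipped


def path_length(order, d):
    """Sum of dissimilarities between objects adjacent in the order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def seriate(tree, d, criterion=path_length):
    """Return the reachable leaf order with the smallest criterion value."""
    return min(leaf_orders(tree), key=lambda order: criterion(order, d))


# Toy data (an assumption for illustration): four objects on a line.
x = [0, 1, 3, 7]
d = [[abs(p - q) for q in x] for p in x]

# A dendrogram whose default leaf order is 2, 1, 0, 3.
tree = ((2, (1, 0)), 3)
best = seriate(tree, d)
```

Because `criterion` is an ordinary function argument, a different measure of ordering quality can be supplied without changing the search.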
This presentation describes a new dendrogram seriation algorithm called DendSer. DendSer is more flexible than currently available seriation algorithms because it allows the user either to choose from a variety of seriation criteria or to supply their own. This choice is an important feature of DendSer because different criteria suit different visualisation settings.
Common seriation criteria include finding the shortest path through a set of objects and measuring anti-Robinson form in a symmetric matrix. In this presentation, we examine these criteria and discuss their limitations in data visualisation. We propose two new seriation criteria, “lazy path length” and “banded anti-Robinson form”, and demonstrate their effectiveness in improving a variety of visualisations.
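The two common criteria can be made concrete with a short Python sketch. It rests on assumptions: the function names are hypothetical, and anti-Robinson form is measured here by one common formulation, a count of the triples in which a dissimilarity fails to grow when moving away from the diagonal of the reordered matrix. The newly proposed criteria are not defined in this abstract and are therefore not shown.

```python
def path_length(order, d):
    """Length of the path that visits the objects in the given order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def anti_robinson_events(order, d):
    """Count violations of anti-Robinson form in the reordered matrix.

    For positions i < j < k, anti-Robinson form requires
    d(i, k) >= d(i, j) and d(i, k) >= d(j, k); each failure is one event.
    """
    n = len(order)
    events = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                a, b, c = order[i], order[j], order[k]
                events += d[a][c] < d[a][b]  # violation along the row
                events += d[a][c] < d[b][c]  # violation along the column
    return events


# Toy data (an assumption for illustration): four objects on a line.
x = [0, 1, 3, 7]
d = [[abs(p - q) for q in x] for p in x]

sorted_order = [0, 1, 2, 3]    # objects in their natural order
shuffled_order = [2, 0, 3, 1]  # an arbitrary poor ordering

for order in (sorted_order, shuffled_order):
    print(order, path_length(order, d), anti_robinson_events(order, d))
```

Both criteria take smaller values for the natural order than for the shuffled one, but they measure different things: path length looks only at adjacent objects, whereas the anti-Robinson count looks at every triple.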
Keywords: Visualisation; Seriation; Hierarchical clustering
Biography: Denise Earle completed her undergraduate degree in mathematical studies and statistics at the National University of Ireland Maynooth, where she then completed her Master's degree in mathematics. She completed her PhD in statistics in October 2010 and has been lecturing in mathematics at the National University of Ireland Maynooth for the past year.