Clustering-by-Role: Modelling Ego Networks as mixtures of ERGMs
Michael Salter-Townshend
School of Mathematical Sciences and the Complex and Adaptive Systems Laboratory, University College Dublin, Ireland

Most current work on clustering nodes in network data focuses on finding highly connected subsets of nodes. Many statistical model-based approaches for clustering nodes exist, including the stochastic block model, the mixed membership stochastic block model and the latent position cluster model.

In contrast, clustering nodes by similarity of role within the network has received little or no attention in the statistical literature. We approach this problem using the Exponential (family) Random Graph Model (ERGM) framework (see e.g. Robins et al., 2007). ERGMs model the probability of a network through selected sufficient statistics. Typical choices of statistics include the number of edges, the number of triangles and the number of k-stars for various values of k.
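For reference, the standard ERGM form can be written as below. The notation is an assumption introduced here and is not taken from the talk: y is an observed network, s(y) the vector of chosen sufficient statistics, θ the corresponding parameter vector, and κ(θ) the normalising constant summing over all networks on the same node set.

```latex
\[
  P_{\theta}(Y = y) \;=\; \frac{\exp\{\theta^{\top} s(y)\}}{\kappa(\theta)},
  \qquad
  \kappa(\theta) \;=\; \sum_{y'} \exp\{\theta^{\top} s(y')\}
\]
```

With edges, k-stars and triangles as statistics, s(y) is simply the vector of those counts for the network y.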

We cluster the nodes in a network by role by studying their ego-networks. These are the sub-networks obtained by finding all the alters (nodes to which there is a link) of each node (or ego) and then including all of the links between those alters. Although there is much overlap between these ego-networks, we model them as independent networks arising from a finite mixture of ERGMs. We use an Expectation-Maximisation algorithm to simultaneously learn both the cluster assignments and the ERGM parameters of each role-based cluster. We demonstrate our method on several datasets and discuss the challenges in model validation. This is joint work with Brendan Murphy.

Keywords: Network data; Clustering; Exponential random graph model; Ego-centric network
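The ego-network construction described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes an undirected network stored as a dictionary of neighbour sets, assumes the ego itself is kept in its ego-network, and uses the hypothetical helper names `ego_network` and `summary_statistics`. It computes edge, 2-star and triangle counts only; fitting the mixture of ERGMs by EM is not attempted here.

```python
from itertools import combinations


def ego_network(adj, ego):
    """Sub-network induced by `ego` and its alters (assumes undirected `adj`)."""
    nodes = {ego} | adj[ego]
    # Keep only links whose endpoints both lie inside the ego-network.
    return {v: adj[v] & nodes for v in nodes}


def summary_statistics(sub):
    """Return (edges, 2-stars, triangles) for an undirected adjacency dict."""
    degrees = [len(neighbours) for neighbours in sub.values()]
    edges = sum(degrees) // 2  # each edge is counted from both endpoints
    two_stars = sum(d * (d - 1) // 2 for d in degrees)
    triangles = sum(
        1
        for u, v, w in combinations(sorted(sub), 3)
        if v in sub[u] and w in sub[u] and w in sub[v]
    )
    return edges, two_stars, triangles


# Toy network, invented for illustration only.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

stats = {ego: summary_statistics(ego_network(adj, ego)) for ego in adj}
```

Each ego then has one vector of statistics, for example `stats["a"]` for node `a`. Vectors of this kind are the sort of sufficient statistics an ERGM for each ego-network would use.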

Biography: Michael Salter-Townshend is a postdoctoral researcher in the Clique Strategic Research Cluster at University College Dublin. He completed his PhD in statistics at Trinity College Dublin, where he worked on Bayesian methods in palaeoclimate reconstruction. His current research is on statistical models for the analysis of network data.