The model-based clustering latent space network model represents relational data visually and takes account of several basic network properties. Due to the structure of its likelihood function, the computational cost is of order O(n2), where n is the number of nodes. This makes it infeasible for large networks. We propose an approximation of the log likelihood function. We adapt the case-control idea from epidemiology and construct an approximate case-control log likelihood which is an unbiased estimator of the full log likelihood. Replacing the full likelihood by the case-control likelihood in the MCMC estimation of the latent space model reduces the computational time from O(n2) to O(n), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction data using the case-control likelihood and use the model fitted link probabilities to identify false positive links. This is joint with with Xiaoyue Niu, Peter Hoff and Ka Yee Yeung.
Keywords: Network Data; Clustering; Markov Chain Monte Carlo; Computational Statistics
Biography: Adrian Raftery is the Blumstein-Jordan Professor of Statistics and Sociology at the University of Washington, Seattle. He was the Founding Director of the Center for Statistics and Social Sciences at this University of Washington. His research interests include developing new statistical methods for social, environmental and health sciences.