Determination of the Number of Clusters and Validation of Clustering through the Matrix of Distances
Luis F. Rivera-Galicia
Dep. Estadística, Estructura Económica y O.E.I., Universidad de Alcalá, Alcalá de Henares, Madrid, Spain

Cluster analysis is a popular method for unsupervised classification. Its goal is to partition a dataset into well-separated groups whose elements are similar to one another in some sense. The fundamental problem of cluster analysis is to determine the real number of groups in the dataset.

In this paper, a new clustering method is presented that simultaneously determines the number of groups and the clustering of a dataset. The method is based on graph theory. The dissimilarity data are used to form a dissimilarity graph, in which the vertices are the cases in the dataset and edges connect pairs of elements whose dissimilarity is below a certain threshold. The clusters are then the resulting connected subgraphs.
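The graph construction described above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that "connected subgraphs" means the connected components of the thresholded graph, that an edge requires a dissimilarity strictly below the threshold, and that the threshold is supplied by the user (the abstract does not say how it is chosen). The function name `threshold_graph_clusters` and the toy data are hypothetical.

```python
from itertools import combinations


def threshold_graph_clusters(dissimilarity, threshold):
    """Label each case by its connected component in the thresholded graph.

    Assumption: clusters are plain connected components, and an edge joins
    two cases when their dissimilarity is strictly below `threshold`.
    """
    n = len(dissimilarity)
    parent = list(range(n))  # union-find forest, one node per case

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each pair below the threshold is an edge, so merge the two components.
    for i, j in combinations(range(n), 2):
        if dissimilarity[i][j] < threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy data, made up for illustration: five points on a line.
points = [0.0, 1.0, 1.5, 10.0, 10.5]
D = [[abs(a - b) for b in points] for a in points]

labels = threshold_graph_clusters(D, threshold=2.0)
n_clusters = len(set(labels))  # the number of groups falls out of the graph
```

In this sketch the number of clusters is not fixed in advance: it is simply the number of components obtained for the given threshold, so a smaller threshold yields more, smaller groups.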

The method has been tested on several datasets, and the results have been compared with those of some traditional clustering methods by means of validation indices.

References:

Hardy, A. (1996). On the Number of Clusters. Computational Statistics & Data Analysis, 23, 83-96.

Hartuv, E.; Shamir, R. (2000). A Clustering Algorithm Based on Graph Connectivity. Information Processing Letters, 76, 175-181.

Jain, A.K.; Dubes, R.C. (1988). Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall.

Xu, R.; Wunsch, D. (2005). Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, 16(3), 645-678.

Keywords: Clustering; Graph connectivity; Validity indices

Biography: Luis F. Rivera-Galicia received the Bachelor of Science degree in Mathematics (with options in Computer Science) from the Complutense University of Madrid, Spain, and the Ph.D. degree in Economics from the University of Alcalá, Madrid, Spain, where he is currently Lecturer in Economics and Business Statistics. His main lines of research are unsupervised classification methods and the teaching and learning of statistics in the Social Sciences. He has published several articles in journals and has participated in several national and international conferences. He is currently Vice-Dean of the Faculty of Economics and Business at the University of Alcalá.