ARC Colloquium - Tanday Warnow
From Franeseya Kendrick
views
comments
From Franeseya Kendrick
Title: Well-connected clusters: a valuable but elusive property in community detection methods
Speaker: Tandy Warnow, University of Illinois Urbana-Champaign
Abstract: Community detection, also known as clustering, takes as input a network and returns a decomposition of the vertices into disjoint sets so that each set represents a community. Two natural expectations of these communities are (1) they should be relatively dense compared to the background and (2) they should be well-connected, in the sense of not having small edge cuts. Surprisingly, we find that the output of many well-established clustering approaches (e.g., the Leiden algorithm with either the Constant Potts Model or modularity, Infomap, Markov Clustering, and the use of the Stochastic Block Model) produce clusters that fail even a mild requirement for well-connectedness, and many even produce disconnected clusters.
To address this failure, we have developed the “Connectivity Modifier” (CM), which iteratively removes small edge cuts and re-clusters until all communities detected are well-connected. Our study on real-world networks provides insights into tradeoffs between methods in terms of node coverage and well-connectedness, and our analyses of synthetic networks shows that the CM algorithm improves accuracy in recovering true communities. More generally, our research raises questions about assumptions in mathematical models of networks with ground truth community structure, and reveals significant issues in current methods.