Abstract
Community detection is a branch of network analysis concerned with identifying densely connected subnetworks. Social bookmarking sites aggregate datasets of often hundreds of millions of (document, user, tag) triples which, when interpreted as hyperedges, give rise to special networks called 3-partite, 3-uniform hypergraphs. We identify challenges and opportunities in generalizing community detection, and modularity optimization in particular, to these structures. We introduce two methods for community detection that preserve the hypergraph's special structure to different degrees. A comparison of their performance on synthetic datasets shows the benefits of structure preservation. Furthermore, we introduce a tool for interactive exploration of the community detection results and apply it to examples from real datasets. We find additional evidence for the importance of structure preservation and, more generally, demonstrate how tripartite community detection can help in understanding the structure of social bookmarking data.
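To make the data structure concrete, the following minimal Python sketch shows how bookmarking triples can be read as hyperedges of a 3-partite, 3-uniform hypergraph: every hyperedge contains exactly three vertices, one from each of the document, user, and tag partitions. It is an illustration under stated assumptions, not either of the community detection methods of this paper; the names `Triple` and `build_hypergraph` are hypothetical, and duplicate triples are assumed to be collapsed into a single hyperedge.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """One bookmarking assignment (hypothetical field names)."""
    document: str
    user: str
    tag: str


def build_hypergraph(triples):
    """Treat each triple as a hyperedge with one vertex per partition.

    Vertices are namespaced by partition, so a tag and a user that share
    the same string stay distinct; this is what makes the hypergraph
    3-partite. Every hyperedge has exactly three vertices (3-uniform).
    Assumption: duplicate triples are collapsed into one hyperedge.
    Returns the set of hyperedges and the hyperedge degree of each vertex.
    """
    edges = set()
    degree = Counter()
    for t in triples:
        edge = (("doc", t.document), ("user", t.user), ("tag", t.tag))
        if edge in edges:  # skip repeated triples
            continue
        edges.add(edge)
        degree.update(edge)  # each of the three vertices gains one hyperedge
    return edges, degree


if __name__ == "__main__":
    data = [
        Triple("d1", "alice", "python"),
        Triple("d1", "bob", "python"),
        Triple("d2", "alice", "graphs"),
        Triple("d1", "alice", "python"),  # duplicate of the first triple
    ]
    edges, degree = build_hypergraph(data)
    print(len(edges), degree[("user", "alice")])
```

Running the example collapses the duplicate and yields three hyperedges, with user `alice` incident to two of them. Namespacing vertices by partition is a simple way to keep the three vertex sets disjoint even when identifiers collide across partitions.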
Acknowledgements
The authors are indebted to Wendelin Böhmer for in-depth discussions and, in particular, his insistence on human-readable graphs. We thank Caimei Lu for providing the code of her tripartite clustering approach and Andreas König for the work on crawling Visualize.us. We also thank the anonymous reviewers for valuable remarks on an earlier version. The first author has been supported by the Integrated Graduate Program on Human-Centric Communication at Technische Universität Berlin. The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007–2011] under grant agreement No. 21644 (PetaMedia).