Abstract
Most empirical studies of complex networks return rich but noisy data, as they measure the network structure repeatedly but with substantial errors due to indirect measurements. In this article, we propose a novel framework, called the group-based binary mixture (GBM) modeling approach, to simultaneously conduct network reconstruction and community detection from such rich but noisy data. A generalized expectation-maximization (EM) algorithm is developed for computing the maximum likelihood estimates, and an information criterion is introduced to consistently select the number of communities. The strong consistency properties of the network reconstruction and community detection are established under an assumption on the Kullback-Leibler (KL) divergence; in particular, we do not impose assumptions on the true network structure. It is shown that joint reconstruction with community detection has a synergistic effect, whereby detecting communities can improve the accuracy of the reconstruction. Finally, we illustrate the performance of the approach with numerical simulations and two real examples. Supplementary materials for this article are available online.
Supplementary Materials
Appendix: The detailed steps of Algorithm 1 under some commonly used distributions and all the technical proofs for Theorems 1-6. (pdf)
Code and data: R code and data for reproducing the real data analysis results. (zip)
Acknowledgments
We thank the editor, the associate editor, and the referees for their insightful comments, which greatly improved the article.
Disclosure Statement
No potential conflict of interest was reported by the author(s).