Abstract
This article deals with nonobserved dyads during the sampling of a network and the resulting issues in the inference of the stochastic block model (SBM). We review sampling designs and recover missing at random (MAR) and not missing at random (NMAR) conditions for the SBM. We introduce variants of the variational EM algorithm for inferring the SBM under various sampling designs (MAR and NMAR), all available as an R package. Model selection criteria based on integrated classification likelihood are derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We explore two real-world networks from ethnology (seed circulation network) and biology (protein–protein interaction network), where the interpretations depend considerably on the sampling designs considered. Supplementary materials for this article are available online.
Acknowledgments
The authors thank Sophie Donnet (INRA-MIA, AgroParisTech) and Mahendra Mariadassou (INRA-MaIAGE, Jouy-en-Josas) for their helpful remarks and suggestions. We also thank all members of MIRES for fruitful discussions on network sampling designs and for providing the original problems from social science. In particular, we thank Vanesse Labeyrie (CIRAD-Green) for sharing the seed exchange data and for related discussions on the analysis.