Abstract
Aggregated Relational Data (ARD), formed from “How many X’s do you know?” questions, is a powerful tool for learning important network characteristics from incomplete network data. Compared to traditional survey methods, ARD is attractive because it does not require a sample from the target population and does not ask respondents to reveal their own status. This makes ARD well suited to studying hard-to-reach populations, such as female sex workers, who may be hesitant to reveal their status. From December 2008 to February 2009, the Kiev International Institute of Sociology (KIIS) collected ARD from 10,866 respondents to estimate the size of HIV-related groups in Ukraine. To analyze these data, we propose a new ARD model that incorporates respondent and group covariates in a regression framework and includes a bias term that is correlated between groups. We also introduce a new scaling procedure that uses this correlation structure to further reduce biases. The resulting size estimates of those most at risk of HIV infection can improve the efficiency of the HIV response in Ukraine. Additionally, the proposed model allows us to better understand two network features without the full network data: (a) which characteristics affect whom respondents know, and (b) how knowing someone from one group is related to knowing people from other groups. These features can help researchers better recruit marginalized individuals into prevention and treatment programs. Our proposed model and several existing network scale-up method (NSUM) models are implemented in the networkscaleup R package. Supplementary materials for this article are available online.
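As a point of reference for how ARD yields size estimates, the following is a minimal sketch of the classic Killworth-style scale-up estimator. It is not the proposed model: it omits the respondent and group covariates, the correlated bias term, and the new scaling procedure. Each respondent's network size (degree) is first estimated from the groups of known size, and the size of an unknown group is then scaled from the total number of its members that respondents report knowing. The function names `scale_up_estimates` and `simulate_ard`, the simulated group sizes, and the assumption of Poisson counts with no reporting bias are all hypothetical and chosen only for illustration; they do not reflect the Ukraine data or the networkscaleup API.

```python
import numpy as np


def scale_up_estimates(y, known_sizes, total_pop):
    """Classic (Killworth-style) scale-up estimates; a hypothetical helper.

    y           : (n_respondents, n_groups) counts from "How many X's do you know?";
                  the first len(known_sizes) columns are groups of known size and
                  the remaining columns are groups of unknown size.
    known_sizes : sizes N_k of the known groups.
    total_pop   : total population size N.
    Returns (degrees, unknown_sizes).
    """
    y = np.asarray(y, dtype=float)
    known_sizes = np.asarray(known_sizes, dtype=float)
    n_known = known_sizes.size
    y_known, y_unknown = y[:, :n_known], y[:, n_known:]
    # Degree of respondent i: d_i = N * sum_k y_ik / sum_k N_k over known groups.
    degrees = total_pop * y_known.sum(axis=1) / known_sizes.sum()
    # Size of unknown group u: N_u = N * sum_i y_iu / sum_i d_i.
    unknown_sizes = total_pop * y_unknown.sum(axis=0) / degrees.sum()
    return degrees, unknown_sizes


def simulate_ard(n_resp, group_sizes, total_pop, rng):
    """Simulate ARD from a simple Poisson model with no reporting bias (an assumption)."""
    group_sizes = np.asarray(group_sizes, dtype=float)
    # Heterogeneous network sizes; the log-normal choice is arbitrary.
    degrees = rng.lognormal(mean=5.5, sigma=0.6, size=n_resp)
    # Expected count for respondent i and group k is d_i * N_k / N.
    rates = np.outer(degrees, group_sizes / total_pop)
    return rng.poisson(rates), degrees


if __name__ == "__main__":
    rng = np.random.default_rng(2009)
    N = 1_000_000
    known = [20_000, 50_000, 10_000, 30_000]  # illustrative known-group sizes
    hidden = 15_000                           # illustrative "unknown" group size
    y, _ = simulate_ard(2_000, known + [hidden], N, rng)
    _, est = scale_up_estimates(y, known, N)
    print(f"estimated hidden-group size: {est[0]:.0f} (simulated truth: {hidden})")
```

In this simulation, variation in network size across respondents does not systematically distort the estimate, because the counts for every group scale with the same degree. In real ARD, reporting biases break this simplification, which is what motivates the bias term and scaling procedure described above.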
Supplementary Materials
The supplementary materials contain additional information about the Ukraine data, the bias decomposition procedure, a more detailed explanation of scaling for NSUM models, and a study of the missing data. They also contain details about the proposed Bayesian surrogate residuals, along with additional visualizations, diagnostics, and results from the simulation studies and the Ukraine data analysis.
Acknowledgments
Computational efforts were performed on the Hyalite High Performance Computing System, operated and supported by University Information Technology Research Cyberinfrastructure at Montana State University.
Notes
1 We recognize the importance of understanding and accounting for the substantial number of missing responses in our data. We examined several missing-data diagnostics and present the key findings in Section 4 of the supplementary material. Although some covariates are related to the frequency of missing responses, this relationship is fairly weak and is limited to a few subpopulations, most notably those related to gender and age; it does not involve any of the subpopulations of unknown size. In general, we do not believe that removing respondents with missing data significantly affects our inference. Future work could explore more sophisticated methods for handling missing data in ARD models.