ABSTRACT
As the modern world becomes increasingly digitalized, data from many domains are available to enrich survey data. Linking survey data to other sources requires the consent of the survey respondents. This article compares consent to data linkage requests across seven data domains: administrative data, smartphone usage data, bank data, biomarkers, Facebook data, health insurance data, and sensor data. We experimentally examine three factors of interest to survey designers seeking to maximize consent rates: consent question order, consent question wording, and incentives. Results from a German online sample (n = 3,374) show that respondents are relatively likely to consent to sharing smartphone usage data, Facebook data, and biomarkers, and least likely to share their bank data in a survey. Of the three experimental factors, only consent question order significantly affected consent rates. We additionally investigated interactions between the three experimental manipulations and the seven data domains; only the interaction between data domain and consent question order showed a consistently significant effect.
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary data
Supplemental data for this article can be accessed online at https://doi.org/10.1080/13645579.2023.2173847.
Notes
1. The results of the data linkage procedure for Facebook data are not discussed in this paper.
2. The results of both questions are not discussed in this paper.
3. This measure is not included in the analyses because the number of respondents who agreed to prior requests for additional data was very small (only 367 respondents shared data across all data domains combined).
4. We controlled for gender and education, but not for age, because the former two variables were not equally distributed across the incentive conditions. The device manipulation had to be controlled for because it was used as a screening variable and thus introduced nonresponse bias.
5. A more parsimonious modeling approach would be to include question order as a log-transformed continuous variable instead of a categorical variable. Here, too, the LRT indicates that the model with the interaction term provides a better fit (…). However, since this would not have changed the substantive results and the categorical question order variable is easier to interpret, we retained the previous model.
Additional information
Notes on contributors
Christoph Beuthner
Christoph Beuthner was an Associate Researcher at GESIS. He is a PhD student at the University of Mannheim. His research interests include machine learning and deep learning.
Bernd Weiß
Bernd Weiß is head of the GESIS Panel and deputy head of the Survey Design and Methodology department at GESIS – Leibniz Institute for the Social Sciences. His research interests focus on methods of empirical research in the social sciences, family sociology, and juvenile delinquency. (ORCID: 0000-0002-1176-8408)
Henning Silber
Henning Silber is a Senior Researcher and head of the Survey Operations Team at the Survey Design and Methodology Department at GESIS – Leibniz Institute for the Social Sciences. His research interests include survey methodology, political sociology, and the experimental social sciences. (ORCID: 0000-0002-3568-3257)
Florian Keusch
Florian Keusch is Professor of Social Data Science and Methodology at the University of Mannheim, Germany, and Adjunct Research Professor at the Joint Program in Survey Methodology (JPSM), University of Maryland. His research focuses on nonresponse and measurement error in (mobile) web surveys and digital trace data collection. (ORCID: 0000-0003-1002-4092)
Jette Schröder
Jette Schröder is a Senior Researcher at GESIS – Leibniz Institute for the Social Sciences in Mannheim, Germany. Her research interests include survey methodology, well-being, and family research. (ORCID: 0000-0002-1000-5855)