ABSTRACT
To explore socially desirable responding in telephone surveys, this study examines response latencies in answers to 27 questions in a corpus of 319 audio-recorded voice interviews on iPhones. Response latencies were compared when respondents (a) answered questions on sensitive vs. nonsensitive topics (as classified by online raters); (b) produced more vs. less socially desirable answers; and (c) were interviewed by a professional interviewer or an automated system. Respondents answered questions on sensitive topics more quickly than questions on nonsensitive topics, though patterns varied by question format (categorical, numerical, ordinal). Independent of question sensitivity, respondents gave less socially desirable answers more quickly when answering categorical and ordinal questions but more slowly when answering numerical questions. Respondents answered sensitive questions particularly quickly when asked by interviewers rather than by the automated system. Findings demonstrate that response times can be (differently) revealing about question and response sensitivity in a telephone survey.
Acknowledgments
The authors gratefully acknowledge National Science Foundation grants to authors Schober and Conrad that funded collection of the data set on which these analyses focus (SES-1026225 and SES-1025645); use of the services and facilities of the Population Studies Center at the University of Michigan, funded by NICHD Center Grant P2CHD041028; New School for Social Research faculty research funds to author Schober; and advice from William Hirst, Ai Rene Ong, Paul Schulz, Brady West, and the editors and reviewers.
Disclosure statement
No potential conflict of interest was reported by the authors.
Supplementary material
Supplemental data for this article can be accessed here.
Correction Statement
This article has been republished with minor changes. These changes do not impact the academic content of the article.
Notes
1. Almost all the deviations consisted of omitting the parenthetical text ‘including the recent past that you have already told us about’ for two questions asking about sexual partners since the respondent’s 18th birthday.
2. A transformed set of standardized latency scores was also calculated, following McDaniel and Timm's (Citation1990) standardization procedure, to control for individual differences in responding. Initial analyses carried out on both raw and standardized latencies produced exactly the same pattern of results, and so only the raw latency scores are reported here.
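The standardization described in this note can be illustrated with a minimal sketch. This is not the authors' code, and the exact McDaniel and Timm (1990) procedure may differ in detail; the sketch assumes the common approach of z-scoring each respondent's latencies against that respondent's own mean and standard deviation, so that habitually fast and slow respondents become comparable.

```python
# Hedged sketch: within-respondent standardization of response latencies
# to control for individual differences in baseline response speed.
from statistics import mean, stdev

def standardize_latencies(latencies_by_respondent):
    """Return per-respondent z-scored latencies (raw seconds -> z units)."""
    standardized = {}
    for resp_id, latencies in latencies_by_respondent.items():
        m, s = mean(latencies), stdev(latencies)
        standardized[resp_id] = [(x - m) / s for x in latencies]
    return standardized

# Illustrative (hypothetical) data: R2 is slower overall than R1,
# but after standardization their latencies are on a common scale.
raw = {"R1": [1.2, 0.8, 2.0], "R2": [3.5, 2.9, 4.6]}
z = standardize_latencies(raw)
```

Because the raw and standardized latencies produced the same pattern of results, only the simpler raw scores are reported in the article.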
3. This method of empirically determining question sensitivity goes beyond the less formal method of question selection in Schober et al. (Citation2015).
4. Note that other criteria could have been included to measure question and response sensitivity, for example, the extent to which raters believed that a particular response could harm someone’s reputation, the extent to which a particular response could be legally compromising, the extent to which a respondent might feel shame, etc. Given the burdensome length of the rating task, we selected embarrassment for most people as a plausible proxy that could tap into judgments of group (rather than personal) norms. Whether different criteria (for example, asking about personal norms) would have led to different ratings is, of course, unknown.
5. As detailed in Feuer, Fail, and Schober (2020), different thresholds led to different classifications of question sensitivity. For example, with a 40% threshold four more questions (cigarette smoking, drinking alcohol, sexual orientation, and television watching) were classified as sensitive. Analyses reported here tested our research questions with several different thresholds; the 50% threshold yielded the clearest and most consistent findings, while the 60% threshold classified too few questions as sensitive to allow us to explore our research questions.
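The threshold-based classification this note describes can be sketched as follows. The question names and rating proportions below are illustrative only, not the study's actual data; the sketch assumes a question counts as sensitive when the share of online raters judging it sensitive meets or exceeds the chosen threshold.

```python
# Hedged sketch of threshold-based question-sensitivity classification.
def classify_sensitive(rater_proportions, threshold=0.5):
    """Return the set of questions whose rated-sensitive share >= threshold."""
    return {q for q, p in rater_proportions.items() if p >= threshold}

# Hypothetical rating proportions for four questions.
ratings = {"income": 0.72, "sexual_partners": 0.65,
           "cigarette_smoking": 0.45, "tv_watching": 0.41}

strict = classify_sensitive(ratings, 0.5)  # only the first two qualify
loose = classify_sensitive(ratings, 0.4)   # lowering the threshold adds questions
```

This mirrors how moving from a 50% to a 40% threshold reclassifies additional questions as sensitive.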
6. Using the dependent variable of time until sound rather than time until response, this interaction of response sensitivity and response type is somewhat less clear; the interaction is significant in the ordinal model (robustness check) but not in the linear model.
Additional information
Notes on contributors
Stefanie Fail
Stefanie Fail is the Global Lead for Conversation Experience Design at Nuance Communications, where she designs conversational voice AI interfaces for some of the largest US and global enterprises. She holds a PhD in Psychology from The New School for Social Research.
Michael F. Schober
Michael F. Schober is Professor of Psychology and Vice Provost for Research at The New School. He studies shared understanding—and misunderstanding—in survey interviews, collaborative music-making, and everyday conversation, and how new communication technologies (e.g., text messaging, video chat, automated speech dialogue systems, social media posts) are affecting interaction dynamics.
Frederick G. Conrad
Frederick G. Conrad is Research Professor and Director, Program in Survey Methodology, University of Michigan. His research concerns improving survey data quality (e.g., via texting, two-way video, and virtual interviewers) as well as new data sources (e.g., social media posts and sensor measurement) for possible use in social and behavioral research.