Abstract
Peer effects, in which an individual’s behavior is affected by peers’ behavior, are posited by multiple theories in the social sciences. Randomized field experiments that identify peer effects, however, are often expensive or infeasible, so many studies of peer effects use observational data, which is expected to suffer from confounding. Here we show, in the context of information and media diffusion, that high-dimensional adjustment of a nonexperimental control group (660 million observations) using propensity score models produces estimates of peer effects statistically indistinguishable from those using a large randomized experiment (215 million observations). Compared with the experiment, naive observational estimators overstate peer effects by over 300%, and commonly available variables (e.g., demographics) offer little bias reduction. Adjusting for a measure of prior behaviors closely related to the focal behavior reduces this bias by 91%, while models adjusting for over 3700 past behaviors reduce it further, by over 97%, leaving estimates that are statistically indistinguishable from unbiased. These results demonstrate how detailed records of behavior can improve studies of social influence, information diffusion, and imitation; they are encouraging for the credibility of some studies but also cautionary for studies of peer effects in rare or new behaviors. More generally, these results show how large, high-dimensional datasets and statistical learning can be used to improve causal inference. Supplementary materials for this article are available online.
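The percentages above compare observational estimates against the experimental benchmark. As a minimal sketch of that arithmetic, assuming bias is measured as the signed difference between an observational estimate and the experimental estimate (the article's exact definition may differ), the two quantities could be computed as follows. The function names and all numbers are hypothetical and chosen only for illustration; they are not taken from the article's data.

```python
def relative_bias(estimate: float, experimental: float) -> float:
    """Bias of an observational estimate as a fraction of the experimental estimate."""
    return (estimate - experimental) / experimental


def bias_reduction(naive: float, adjusted: float, experimental: float) -> float:
    """Fraction of the naive estimator's bias that the adjusted estimator removes."""
    naive_bias = naive - experimental
    adjusted_bias = adjusted - experimental
    return 1.0 - adjusted_bias / naive_bias


# Hypothetical effect estimates, for illustration only (not from the article).
experimental = 1.0   # benchmark from a randomized experiment
naive = 4.0          # unadjusted observational estimate
adjusted = 1.09      # observational estimate after adjustment

# A naive estimate four times the benchmark overstates the effect by 300%.
print(f"naive overstatement: {relative_bias(naive, experimental):.0%}")
# Share of the naive bias removed by the adjusted estimator.
print(f"bias reduction: {bias_reduction(naive, adjusted, experimental):.0%}")
```

Under this definition, a bias reduction of 100% means the adjusted estimate coincides with the experimental one, and a value above 100% means the adjustment overshoots in the opposite direction.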
Supplementary Materials
The supplementary materials consist of: (a) a document with additional details about the data, the methods, and additional tests comparing the estimators, and (b) code and aggregated data for replicating the results in the main text.
Acknowledgments
We are grateful to L. Adamic, S. Aral, J. Bailenson, J. H. Fowler, W. H. Hobbs, D. Holtz, G. W. Imbens, S. Messing, C. Nass, M. Nowak, A. B. Owen, A. Peysakhovich, B. Reeves, D. Rogosa, J. Sekhon, A. C. Thomas, and J. Ugander; to participants in seminars at New York University Stern School of Business, Stanford University Graduate School of Business, UC Berkeley Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, University of Chicago Booth School of Business, Columbia University Department of Statistics, and UC Davis Department of Statistics; and to anonymous referees for comments on this work.
Disclosure Statement
D.E. was previously an employee of Facebook while contributing to this research. E.B. has significant financial interests in Facebook, as did D.E. while writing earlier versions of this article. While revisions of this article were under editorial review, D.E. received a grant from Facebook for other research.