Abstract
This study evaluates the spatial patterns of flows generated from geo-located Twitter data to measure human migration. Using geo-located tweets continuously collected in the U.S. from 2013 to 2015, we identified Twitter users who migrated, as indicated by changes in county of residence every two years, and compared the Twitter-estimated county-to-county migration flows with those reported by the U.S. Internal Revenue Service (IRS). To evaluate how well the spatial patterns of Twitter migration flows represent their IRS counterparts, we developed a normalized difference representation index to visualize and identify counties that are over- or under-represented in the Twitter estimates. Further, we applied a multidimensional spatial scan statistic approach based on a Poisson process model to detect pairs of origin and destination regions where the over- or under-representation occurred. The results suggest that Twitter migration flows tend to under-represent the IRS estimates in regions with a large population and over-represent them in metropolitan regions adjacent to tourist attractions. This study demonstrated that geo-located Twitter data could be a sound statistical proxy for measuring human migration. Given that the spatial patterns of Twitter-estimated migration flows vary significantly across geographic space, related studies will benefit from our approach by identifying those regions where data calibration is necessary.
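As a rough illustration of what a normalized difference representation index could look like, the sketch below compares each county's share of Twitter-estimated flow with its share of IRS flow. The abstract does not give the formula, so the form (t - r) / (t + r), the use of shares rather than raw counts, and the function name `normalized_difference` are all assumptions made for illustration, not the authors' definition.

```python
from typing import Dict


def normalized_difference(
    twitter_flows: Dict[str, float], irs_flows: Dict[str, float]
) -> Dict[str, float]:
    """Hypothetical index in [-1, 1] per county.

    ASSUMPTION: the index is (t - r) / (t + r), where t and r are a
    county's shares of total Twitter and IRS migration flow. The paper's
    actual definition may differ.
    """
    t_total = sum(twitter_flows.values())
    r_total = sum(irs_flows.values())
    if t_total <= 0 or r_total <= 0:
        raise ValueError("both data sources need a positive total flow")

    index: Dict[str, float] = {}
    for county in set(twitter_flows) | set(irs_flows):
        # Convert raw counts to shares so the two sources are comparable.
        t = twitter_flows.get(county, 0.0) / t_total
        r = irs_flows.get(county, 0.0) / r_total
        # Positive: over-represented in Twitter; negative: under-represented.
        index[county] = 0.0 if t + r == 0 else (t - r) / (t + r)
    return index


if __name__ == "__main__":
    # Toy counts for two made-up counties, not real data.
    print(normalized_difference({"A": 30, "B": 70}, {"A": 50, "B": 50}))
```

Under this assumed form, values near 0 indicate that the Twitter estimate tracks the IRS figure, while values toward -1 or 1 flag counties where calibration would matter most.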
Acknowledgments
The authors thank the editor and the anonymous reviewers for their constructive comments on earlier versions of the article. Appreciation is extended to Dr. Jennifer Van Hook at The Pennsylvania State University for her many helpful suggestions.
Author contributions
All authors conceived and designed the study and revised the manuscript. Junjun Yin and Yizhao Gao processed the data and coded the methods. Junjun Yin performed the experiments and wrote the original draft. Guangqing Chi contributed to the refinement of the proposed concepts and methods, the writing of the manuscript, and the discussion of the findings.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data and codes availability statement
The data and source code that support the findings of this study are available on figshare at https://doi.org/10.6084/m9.figshare.14759976. The geo-located Twitter data are not publicly available under the Twitter data sharing policy, as precise geo-locations could compromise the privacy of Twitter users. However, a sample collection of 2,000 raw geo-located tweets is included in the shared file.
Additional information
Funding
Notes on contributors
Junjun Yin
Junjun Yin is an Assistant Research Professor in the Computational and Spatial Analysis Core at The Pennsylvania State University, University Park, PA 16802. His research interests include computational geography approaches and the use of geospatial big data to model human-urban environment interactions related to urban environmental sustainability, resilience, and mobility.
Yizhao Gao
Yizhao Gao received his Ph.D. from the Department of Geography and Geographic Information Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801. His research interests include CyberGIS, movement analysis, and spatiotemporal analysis.
Guangqing Chi
Guangqing Chi is a Professor of Rural Sociology, Demography, and Public Health Sciences in the Department of Agricultural Economics, Sociology, and Education and Director of the Computational and Spatial Analysis Core at The Pennsylvania State University, University Park, PA 16802. His research focuses on socio-environmental systems. He aims to understand the interactions between human populations and built and natural environments, and to identify important assets that help vulnerable populations adapt and become resilient to environmental changes, by developing and implementing spatial and big data analytic methods.