Research Articles

An evaluation of geo-located Twitter data for measuring human migration

Junjun Yin, Yizhao Gao & Guangqing Chi
Pages 1830-1852 | Received 10 Jun 2021, Accepted 06 May 2022, Published online: 15 Jun 2022
 

Abstract

This study evaluates the spatial patterns of migration flows derived from geo-located Twitter data as a measure of human migration. Using geo-located tweets continuously collected in the U.S. from 2013 to 2015, we identified Twitter users who migrated, based on changes in their county of residence every two years, and compared the Twitter-estimated county-to-county migration flows with those from the U.S. Internal Revenue Service (IRS). To evaluate how well the spatial patterns of Twitter migration flows represent their IRS counterparts, we developed a normalized difference representation index to visualize and identify counties that are over- or under-represented in the Twitter estimates. Further, we applied a multidimensional spatial scan statistic approach based on a Poisson process model to detect pairs of origin and destination regions where the over- or under-representation occurred. The results suggest that Twitter migration flows tend to under-represent the IRS estimates in regions with a large population and over-represent them in metropolitan regions adjacent to tourist attractions. This study demonstrates that geo-located Twitter data can be a sound statistical proxy for measuring human migration. Because the spatial patterns of Twitter-estimated migration flows vary significantly across geographic space, related studies can use our approach to identify the regions where data calibration is necessary.
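As an illustration only, the idea of a normalized difference index can be sketched as follows. The abstract does not give the formula, so this sketch assumes the common form (a - b) / (a + b), applied to each county's share of total out-migration in the two data sets. The function name `normalized_difference` and the toy counts are hypothetical and are not taken from the article or its shared code.

```python
from collections import defaultdict


def normalized_difference(twitter_flows, irs_flows):
    """Per-county normalized difference between Twitter and IRS out-flow shares.

    ASSUMPTION: the index is (t - i) / (t + i), where t and i are a county's
    share of all out-migrants in the Twitter and IRS data. The article's
    actual definition may differ.

    Both arguments map (origin, destination) county pairs to migrant counts.
    Returns {county: value in [-1, 1]}; positive values mean the county is
    over-represented in the Twitter estimates, negative values the opposite.
    """
    def out_shares(flows):
        # Sum out-migrants per origin county, then convert to shares.
        totals = defaultdict(float)
        for (origin, _dest), count in flows.items():
            totals[origin] += count
        grand_total = sum(totals.values())
        if grand_total == 0:
            return {}
        return {county: v / grand_total for county, v in totals.items()}

    t_shares = out_shares(twitter_flows)
    i_shares = out_shares(irs_flows)

    index = {}
    for county in set(t_shares) | set(i_shares):
        t = t_shares.get(county, 0.0)
        i = i_shares.get(county, 0.0)
        if t + i > 0:  # skip counties with no out-flow in either source
            index[county] = (t - i) / (t + i)
    return index


# Toy example with made-up counts for two hypothetical counties "A" and "B".
twitter = {("A", "B"): 30, ("B", "A"): 10}
irs = {("A", "B"): 100, ("B", "A"): 100}
result = normalized_difference(twitter, irs)
```

Working with shares rather than raw counts keeps the index comparable even though the Twitter sample is far smaller than the IRS records; whether the authors normalize this way is not stated in the abstract.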

Acknowledgments

The authors thank the editor and the anonymous reviewers for their constructive comments on earlier versions of the article. Appreciation is extended to Dr. Jennifer Van Hook at The Pennsylvania State University for her many helpful suggestions.

Author contributions

All authors conceived and designed the study and revised the manuscript. Junjun Yin and Yizhao Gao processed the data and coded methods. Junjun Yin performed experiments and wrote the original draft. Guangqing Chi contributed to the refinement of the proposed concepts, method and manuscript writing, and the discussion of the findings.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data and codes availability statement

The data and source code that support the findings of this study are available on figshare at https://doi.org/10.6084/m9.figshare.14759976. The geo-located Twitter data are not publicly available under the Twitter data sharing policy, as precise geo-locations could compromise the privacy of Twitter users. However, a sample collection of 2000 raw geo-located tweets is included in the shared file.

Additional information

Funding

This research was supported in part by the National Science Foundation [Awards #1541136, #1823633, and #1927827]; the Eunice Kennedy Shriver National Institute of Child Health and Human Development [Award #P2C HD041025]; the USDA National Institute of Food and Agriculture and Multistate Research Project #PEN04623 [Accession #1013257]; and the Social Science Research Institute, Population Research Institute, and Institute for Computational and Data Sciences of the Pennsylvania State University.

Notes on contributors

Junjun Yin

Junjun Yin is an Assistant Research Professor in the Computational and Spatial Analysis Core at The Pennsylvania State University, University Park, PA 16802. His research interests include computational geography approaches and geospatial big data for modeling human-urban environment interactions related to urban environmental sustainability, resilience, and mobility.

Yizhao Gao

Yizhao Gao received his Ph.D. from the Department of Geography and Geographic Information Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801. His research interests include CyberGIS, movement analysis, and spatiotemporal analysis.

Guangqing Chi

Guangqing Chi is a Professor of Rural Sociology, Demography, and Public Health Sciences in the Department of Agricultural Economics, Sociology, and Education and Director of the Computational and Spatial Analysis Core at The Pennsylvania State University, University Park, PA 16802. His research focuses on socio-environmental systems and aims to understand the interactions between human populations and built and natural environments. By developing and implementing spatial and big data analytic methods, he seeks to identify important assets that help vulnerable populations adapt and become resilient to environmental changes.
