Effects of Data Preprocessing Methods on Addressing Location Uncertainty in Mobile Signaling Data

Yang Xu, Xinyu Li, Shih-Lung Shaw, Feng Lu, Ling Yin, & Bi Yu Chen
Pages 515-539 | Received 10 Oct 2019, Accepted 10 Mar 2020, Published online: 28 Jul 2020
 

Abstract

Recent years have witnessed an increasing use of big data in mobility research. Such efforts have led to many insights on the travel behavior and activity patterns of people. Despite these achievements, the data veracity issue and its impact on the processes of knowledge discovery have seldom been discussed. In this research, we investigate the veracity issue of mobile signaling data (MSD) when they are used to characterize human mobility patterns. We first discuss the location uncertainty issues in MSD that would hinder accurate estimations of human mobility patterns, followed by an examination of two existing methods for addressing these issues (clustering-based method and time window–based method). We then propose a new approach that can overcome some of the limitations of these two methods. By applying all three methods to a large-scale mobile signaling data set, we find that the choice of preprocessing methods could lead to changes in the data characteristics. Such changes, which are nontrivial, will further affect the characterization and interpretation of human mobility patterns. By computing four mobility indicators (number of origin–destination trips, number of activity locations, total stay time, and activity entropy) from the outputs of the three methods, we illustrate their varying impacts on individual mobility estimations relevant to location uncertainty issues. Our analysis results call for more attention to the veracity issue in data-driven mobility research and its implications for replicability and reproducibility of geospatial research.
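To make the four mobility indicators concrete, the following minimal sketch computes them for one individual from a list of detected stays. The abstract does not give the paper's exact definitions, so everything here is an assumption for illustration: the function name `mobility_indicators`, the stay tuple layout, counting a trip whenever two consecutive stays are at different locations, and defining activity entropy as the Shannon entropy of the share of stay time at each location.

```python
import math
from collections import defaultdict


def mobility_indicators(stays):
    """Compute four illustrative mobility indicators for one person.

    `stays` is a hypothetical, time-ordered list of
    (location_id, start_minute, end_minute) tuples, i.e. the output one
    would expect from some preprocessing method. All definitions below
    are assumptions, not the paper's formulas.
    """
    # Accumulate the time spent at each distinct activity location.
    stay_time = defaultdict(float)
    for location, start, end in stays:
        stay_time[location] += end - start

    total_stay_time = sum(stay_time.values())

    # Assumed trip definition: a change of location between consecutive stays.
    od_trips = sum(
        1 for prev, nxt in zip(stays, stays[1:]) if prev[0] != nxt[0]
    )

    # Assumed entropy definition: Shannon entropy (natural log) of the
    # proportion of total stay time spent at each location.
    entropy = 0.0
    if total_stay_time > 0:
        for duration in stay_time.values():
            if duration > 0:
                share = duration / total_stay_time
                entropy += share * math.log(1.0 / share)

    return {
        "od_trips": od_trips,
        "activity_locations": len(stay_time),
        "total_stay_time": total_stay_time,
        "activity_entropy": entropy,
    }


# Example: a home-work-home day, with times in minutes since midnight.
day = [("home", 0, 480), ("work", 540, 1020), ("home", 1080, 1440)]
print(mobility_indicators(day))
```

Because every indicator is derived from the stay list, two preprocessing methods that split or merge stays differently will yield different trip counts, location counts, and entropy for the same person, which is the sensitivity the abstract describes.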

Notes

1 When mobile phones are turned on but lose signal (e.g., while traveling underground), no events or records are documented. Once a phone regains signal, either by emerging from underground or by entering an area (e.g., a subway station) where a cell tower signal is available, a CH event is triggered, which indicates the cellphone’s “movement” from one cell antenna to another.

Additional information

Funding

This research was jointly supported by the Research Grants Council of Hong Kong (No. 25610118), the National Natural Science Foundation of China (No. 41801372), the National Key Research and Development Program of China (No. 2016YFB0502104), and the Alvin and Sally Beaman Professorship, Arts and Sciences Excellence Professorship, and James and Catherine Ralston Family Fund at the University of Tennessee, Knoxville.

Notes on contributors

Yang Xu

YANG XU is an Assistant Professor in the Department of Land Surveying and Geo-Informatics at the Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. His research interests include GIScience, human mobility, and urban informatics.

Xinyu Li

XINYU LI is a PhD student in the Department of Land Surveying and Geo-Informatics at the Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. His research interests include spatiotemporal data mining, deep learning, and geospatial artificial intelligence.

Shih-Lung Shaw

SHIH-LUNG SHAW is Alvin and Sally Beaman Professor and Arts and Sciences Excellence Professor in the Department of Geography at the University of Tennessee, Knoxville, TN 37996. His research interests include transportation geography, human dynamics, GIScience, space–time geographic information systems (GIS), and GIS for transportation.

Feng Lu

FENG LU is a Professor at the Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China. His research interests include spatial data modeling, trajectory data mining, complex network analysis, knowledge graphs, and GIS for transportation.

Ling Yin

LING YIN is an Associate Professor in the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong Province 518055, China. Her research interests include human dynamics, spatial epidemic models, space–time GIS, and GIS for transportation.

Bi Yu Chen

BI YU CHEN is a Professor in the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China. His research interests include GIS for transportation, transport geography, and spatiotemporal big data analytics.
