Abstract
Recent years have witnessed an increasing use of big data in mobility research. Such efforts have yielded many insights into people's travel behavior and activity patterns. Despite these achievements, the issue of data veracity and its impact on the process of knowledge discovery has seldom been discussed. In this research, we investigate the veracity of mobile signaling data (MSD) when they are used to characterize human mobility patterns. We first discuss the location uncertainty issues in MSD that can hinder accurate estimation of human mobility patterns and then examine two existing methods for addressing these issues (a clustering-based method and a time window–based method). Next, we propose a new approach that can overcome some of the limitations of these two methods. By applying all three methods to a large-scale mobile signaling data set, we find that the choice of preprocessing method can change the characteristics of the data. These changes are nontrivial and in turn affect how human mobility patterns are characterized and interpreted. By computing four mobility indicators (number of origin–destination trips, number of activity locations, total stay time, and activity entropy) from the outputs of the three methods, we illustrate how the methods' differing treatment of location uncertainty leads to different estimates of individual mobility. Our results call for more attention to the veracity issue in data-driven mobility research and to its implications for the replicability and reproducibility of geospatial research.
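The abstract names the four mobility indicators without defining them. The following Python sketch shows one plausible way to compute them from a user's time-ordered stay records. The `Stay` schema, the `mobility_indicators` function, and the definitions themselves (a trip as a change of location between consecutive stays, activity entropy as the Shannon entropy of stay-time shares with the natural logarithm) are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import List, NamedTuple


class Stay(NamedTuple):
    """One stay record: location ID plus start/end times in minutes (hypothetical schema)."""
    location: str
    start: float
    end: float


def mobility_indicators(stays: List[Stay]) -> dict:
    """Compute four indicators from one user's stays, using assumed definitions."""
    stays = sorted(stays, key=lambda s: s.start)

    # Assumed: an origin-destination trip is a change of location between consecutive stays.
    od_trips = sum(1 for a, b in zip(stays, stays[1:]) if a.location != b.location)

    # Assumed: activity locations are the distinct locations with at least one stay.
    n_locations = len({s.location for s in stays})

    # Aggregate stay time per location; the total is the sum over all stays.
    time_at = defaultdict(float)
    for s in stays:
        time_at[s.location] += s.end - s.start
    total_stay = sum(time_at.values())

    # Assumed: activity entropy is the Shannon entropy (natural log) of stay-time shares.
    entropy = 0.0
    if total_stay > 0:
        for t in time_at.values():
            p = t / total_stay
            if p > 0:
                entropy -= p * math.log(p)

    return {
        "od_trips": od_trips,
        "activity_locations": n_locations,
        "total_stay_time": total_stay,
        "activity_entropy": entropy,
    }
```

For example, a raw sequence A (0–100 min), B (100–105 min), A (105–200 min), in which the brief visit to B is a spurious location switch, yields two trips and two activity locations, whereas the same interval collapsed into a single stay at A yields zero trips, one location, and an entropy of 0. This toy case hints at why the choice of preprocessing method can shift the indicators.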
Notes
1 When a mobile phone is turned on but loses signal (e.g., while traveling underground), no events or records are generated. Once the phone regains signal, either on emerging from underground or on entering an area (e.g., a subway station) where a cell tower signal is available, a CH event is triggered, indicating the phone's "movement" from one cell antenna to another.
Additional information
Funding
Notes on contributors
Yang Xu
YANG XU is an Assistant Professor in the Department of Land Surveying and Geo-Informatics at the Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. His research interests include GIScience, human mobility, and urban informatics.
Xinyu Li
XINYU LI is a PhD Student in the Department of Land Surveying and Geo-Informatics at the Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. His research interests include spatiotemporal data mining, deep learning, and geospatial artificial intelligence.
Shih-Lung Shaw
SHIH-LUNG SHAW is Alvin and Sally Beaman Professor and Arts and Sciences Excellence Professor in the Department of Geography at the University of Tennessee, Knoxville, TN 37996. His research interests include transportation geography, human dynamics, GIScience, space–time geographic information systems (GIS), and GIS for transportation.
Feng Lu
FENG LU is a Professor in the Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China. His research interests include spatial data modeling, trajectory data mining, complex network analysis, knowledge graphs, and GIS for transportation.
Ling Yin
LING YIN is an Associate Professor in the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong Province 518055, China. Her research interests include human dynamics, spatial epidemic models, space–time GIS, and GIS for transportation.
Bi Yu Chen
BI YU CHEN is a Professor in the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China. His research interests include GIS for transportation, transport geography, and spatiotemporal big data analytics.