ABSTRACT
Exploratory analysis is an important tool for formulating hypotheses about data and building data-driven models. To efficiently explore massive movement datasets, researchers and analysts require appropriate exploratory analysis tools. However, there is a lack of appropriate tools for movement data exploration that can handle large data volumes. We therefore propose a novel scalable distributed exploratory analysis model for massive movement datasets with billions of records which we call .
The proposed model is more flexible than classical aggregation approaches that use grids with aggregate statistics, and it can be updated incrementally with large amounts of data. We demonstrate this new model and its implementation in Apache Spark using massive ship and vehicle movement data with up to 3.9 billion records.
Acknowledgements
We would like to thank the Danish Maritime Authority for providing the AIS data and Taxi 31300 for providing the floating car data used in this paper.
Data and codes availability statement
All relevant data were obtained from third parties. The Danish AIS data are available from the Danish Maritime Authority at ftp://ftp.ais.dk/ais_data/. A one-month sample is also available from Figshare under the identifier https://doi.org/10.6084/m9.figshare.11577543. The taxi FCD data are not publicly available due to third-party restrictions. The source code for this article is available from Figshare under https://doi.org/10.6084/m9.figshare.11658906.
Disclosure statement
The authors declare no conflict of interest.
Notes
1. https://spark.apache.org/.
2. https://ambari.apache.org/.
3. https://geomesa.org/.
5. https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/sql/expressions/Aggregator.html.
6. https://qgis.org.
7. ftp://ftp.ais.dk/ais_data/.
8. https://www.postgresql.org.
Additional information
Notes on contributors
Anita Graser
Anita Graser is a data scientist at the Center for Mobility Systems at the AIT Austrian Institute of Technology and a PhD student at the Department of Geoinformatics at the University of Salzburg. Her research interests include GIScience and mobility research, focusing on the analysis of movement data.
Peter Widhalm
Peter Widhalm is a data scientist at the Center for Mobility Systems at the AIT Austrian Institute of Technology. His research interests include machine learning and artificial intelligence for mobility and transport applications.
Melitta Dragaschnig
Melitta Dragaschnig is a research engineer at the Center for Mobility Systems at the AIT Austrian Institute of Technology. Her research focuses on the development of efficient methods for working with data in the mobility and transportation domain.