Efficient distributed algorithms for distance join queries in spark-based spatial analytics systems

Pages 206-250 | Received 28 Dec 2021, Accepted 23 Dec 2022, Published online: 19 Feb 2023

References

  • Ahmadi, Elham, and Mario A. Nascimento. 2016. “K-Closest Pairs Queries in Road Networks.” In MDM Conference, Porto, Portugal, June 13-16, 232–241.
  • Alam, Md. Mahbub, Luís Torgo, and Albert Bifet. 2021. “A Survey on Spatio-temporal Data Analytics Systems.” CoRR abs/2103.09883: 1–44.
  • Böhm, Christian, and Florian Krebs. 2004. “The k-Nearest Neighbour Join: Turbo Charging the KDD Process.” Knowledge and Information Systems 6 (6): 728–749. doi:10.1007/s10115-003-0122-9.
  • Chatzimilioudis, Georgios, Constantinos Costa, Demetrios Zeinalipour-Yazti, Wang-Chien Lee, and Evaggelia Pitoura. 2016. “Distributed In-Memory Processing of All K Nearest Neighbor Queries.” IEEE Transactions on Knowledge and Data Engineering 28 (4): 925–938. doi:10.1109/TKDE.2015.2503768.
  • Corral, Antonio, Yannis Manolopoulos, Yannis Theodoridis, and Michael Vassilakopoulos. 2000. “Closest Pair Queries in Spatial Databases.” In SIGMOD Conference, Dallas, Texas, USA, May 16-18, 189–200.
  • Corral, Antonio, Yannis Manolopoulos, Yannis Theodoridis, and Michael Vassilakopoulos. 2004. “Algorithms for Processing K-closest-pair Queries in Spatial Databases.” Data & Knowledge Engineering 49 (1): 67–104. doi:10.1016/j.datak.2003.08.007.
  • Damji, Jules S., Brooke Wenig, Tathagata Das, and Denny Lee. 2020. Learning Spark: Lightning-fast Data Analysis. 2nd ed. O'Reilly.
  • de Carvalho Castro, Joao Pedro, Anderson Chaves Carniel, and Cristina Dutra de Aguiar Ciferri. 2020. “Analyzing Spatial Analytics Systems Based on Hadoop and Spark: A User Perspective.” Software: Practice and Experience 50 (12): 2121–2144. doi:10.1002/spe.v50.12.
  • Doulkeridis, Christos, and Kjetil Nørvåg. 2014. “A Survey of Large-scale Analytical Query Processing in MapReduce.” VLDB Journal 23 (3): 355–380. doi:10.1007/s00778-013-0319-9.
  • Doulkeridis, Christos, Akrivi Vlachou, Nikos Pelekis, and Yannis Theodoridis. 2021. “A Survey on Big Data Processing Frameworks for Mobility Analytics.” SIGMOD Record 50 (2): 18–29. doi:10.1145/3484622.3484626.
  • Dritsas, Elias, Andreas Kanavos, Maria Trigka, Gerasimos Vonitsanos, Spyros Sioutas, and Athanasios K. Tsakalidis. 2020. “Trajectory Clustering and K-NN for Robust Privacy Preserving K-NN Query Processing in GeoSpark.” Algorithms 13 (8): 182. doi:10.3390/a13080182.
  • Eldawy, Ahmed, and Mohamed F. Mokbel. 2015. “SpatialHadoop: A MapReduce Framework for Spatial Data.” In ICDE Conference, Seoul, South Korea, April 13-17, 1352–1363.
  • Eldawy, Ahmed, and Mohamed F. Mokbel. 2016. “The Era of Big Spatial Data: A Survey.” Foundations and Trends in Databases 6 (3-4): 163–273. doi:10.1561/1900000054.
  • Eldawy, Ahmed, and Mohamed F. Mokbel. 2017. “The Era of Big Spatial Data.” Proceedings of the VLDB Endowment 10 (12): 1992–1995. doi:10.14778/3137765.3137828.
  • Fu, Zishan, Jia Yu, and Mohamed Sarwat. 2019. “Demonstrating GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark.” In SSTD Conference, Vienna, Austria, August 19-21, 186–189.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, and Michael Vassilakopoulos. 2017. “RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study.” In MEDI Conference, Barcelona, Spain, October 4-6, 200–207.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, and Michael Vassilakopoulos. 2019. “MRSLICE: Efficient RkNN Query Processing in SpatialHadoop.” In MEDI Conference, Toulouse, France, October 28-31, 235–250.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, and Michael Vassilakopoulos. 2020. “Improving Distance-Join Query Processing with Voronoi-Diagram Based Partitioning in SpatialHadoop.” Future Generation Computer Systems 111: 723–740. doi:10.1016/j.future.2019.10.037.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, and Michael Vassilakopoulos. 2021. “Enhancing Sedona (formerly GeoSpark) with Efficient k Nearest Neighbor Join Processing.” In MEDI Conference, Tallinn, Estonia, June 21-23, 305–319.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, Michael Vassilakopoulos, and Yannis Manolopoulos. 2016. “Enhancing SpatialHadoop with Closest Pair Queries.” In ADBIS Conference, Prague, Czech Republic, August 28-31, 212–225.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, Michael Vassilakopoulos, and Yannis Manolopoulos. 2018. “Efficient Large-scale Distance-Based Join Queries in SpatialHadoop.” GeoInformatica 22 (2): 171–209. doi:10.1007/s10707-017-0309-y.
  • García-García, Francisco, Antonio Corral, Luis Iribarne, Michael Vassilakopoulos, and Yannis Manolopoulos. 2020. “Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems.” Information Sciences 512: 985–1008. doi:10.1016/j.ins.2019.10.030.
  • Gounaris, Anastasios, and Jordi Torres. 2018. “A Methodology for Spark Parameter Tuning.” Big Data Research 11: 22–32. doi:10.1016/j.bdr.2017.05.001.
  • Kalyvas, Christos, and Manolis Maragoudakis. 2019. “Skyline and Reverse Skyline Query Processing in SpatialHadoop.” Data & Knowledge Engineering 122: 55–80. doi:10.1016/j.datak.2019.04.004.
  • Lee, Taewhi, Kisung Kim, and Hyoung-Joo Kim. 2012. “Join Processing Using Bloom Filter in MapReduce.” In RACS Conference, San Antonio, TX, USA, October 23-26, 100–105.
  • Leong, Hou U., Nikos Mamoulis, and Man Lung Yiu. 2008. “Computation and Monitoring of Exclusive Closest Pairs.” IEEE Transactions on Knowledge and Data Engineering 20 (12): 1641–1654. doi:10.1109/TKDE.2008.85.
  • Lu, Junwen, Guanfeng Liu, and Xianmei Hua. 2020. “Cloud-Based K-Closest Pairs Discovery in Dynamic Cyber-Physical-Social Systems.” IEEE Access 8: 70664–70675. doi:10.1109/Access.6287639.
  • Lu, Wei, Yanyan Shen, Su Chen, and Beng Chin Ooi. 2012. “Efficient Processing of K Nearest Neighbor Joins Using MapReduce.” Proceedings of the VLDB Endowment 5 (10): 1016–1027. doi:10.14778/2336664.2336674.
  • Manolopoulos, Yannis, Alexandros Nanopoulos, Apostolos N. Papadopoulos, and Yannis Theodoridis. 2006. R-Trees: Theory and Applications. Springer.
  • Mavrommatis, George, Panagiotis Moutafis, and Michael Vassilakopoulos. 2017a. “Binary Space Partitioning for Parallel and Distributed Closest-Pairs Query Processing.” International Journal on Advances in Software 10 (3–4): 275–285. https://www.thinkmind.org/articles/soft_v10_n34_2017_10.pdf.
  • Mavrommatis, George, Panagiotis Moutafis, and Michael Vassilakopoulos. 2017b. “Closest-Pairs Query Processing in Apache Spark.” In Cloud Computing Conference, Athens, Greece, February 19-23, 26–31.
  • Mavrommatis, George, Panagiotis Moutafis, Michael Vassilakopoulos, Francisco García-García, and Antonio Corral. 2017. “SliceNBound: Solving Closest Pairs and Distance Join Queries in Apache Spark.” In ADBIS Conference, Nicosia, Cyprus, September 24-27, 199–213.
  • Moutafis, Panagiotis, Francisco García-García, George Mavrommatis, Michael Vassilakopoulos, Antonio Corral, and Luis Iribarne. 2021. “Algorithms for Processing the Group K Nearest-neighbor Query on Distributed Frameworks.” Distributed and Parallel Databases 39 (3): 733–784. doi:10.1007/s10619-020-07317-8.
  • Nodarakis, Nikolaos, Evaggelia Pitoura, Spyros Sioutas, Athanasios K. Tsakalidis, Dimitrios Tsoumakos, and Giannis Tzimas. 2016. “kdANN+: A Rapid AkNN Classifier for Big Data.” Transactions on Large-Scale Data- and Knowledge-Centered Systems 24: 139–168. doi:10.1007/978-3-662-49214-7_5.
  • Pandey, Varun, Andreas Kipf, Thomas Neumann, and Alfons Kemper. 2018. “How Good Are Modern Spatial Analytics Systems?” Proceedings of the VLDB Endowment 11 (11): 1661–1673. doi:10.14778/3236187.3236213.
  • Rigaux, Philippe, Michel Scholl, and Agnès Voisard. 2002. Spatial Databases: With Applications to GIS. Elsevier.
  • Roumelis, George, Michael Vassilakopoulos, Antonio Corral, and Yannis Manolopoulos. 2016. “New Plane-sweep Algorithms for Distance-based Join Queries in Spatial Databases.” GeoInformatica 20 (4): 571–628. doi:10.1007/s10707-016-0246-1.
  • Schiller, Jochen H., and Agnès Voisard. 2004. Location-Based Services. Morgan Kaufmann.
  • Shi, Juwei, Yunjie Qiu, Umar Farooq Minhas, Limei Jiao, Chen Wang, Berthold Reinwald, and Fatma Özcan. 2015. “Clash of the Titans: MapReduce Vs. Spark for Large Scale Data Analytics.” Proceedings of the VLDB Endowment 8 (13): 2110–2121. doi:10.14778/2831360.2831365.
  • Tang, Mingjie, Yongyang Yu, Ahmed R. Mahmood, Qutaibah M. Malluhi, Mourad Ouzzani, and Walid G. Aref. 2020. “LocationSpark: In-memory Distributed Spatial Query Processing and Optimization.” Frontiers in Big Data 3: 30. doi:10.3389/fdata.2020.00030.
  • Tang, Mingjie, Yongyang Yu, Qutaibah M. Malluhi, Mourad Ouzzani, and Walid G. Aref. 2016. “LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data.” Proceedings of the VLDB Endowment 9 (13): 1565–1568. doi:10.14778/3007263.3007310.
  • Velentzas, Polychronis, Antonio Corral, and Michael Vassilakopoulos. 2021. “Big Spatial and Spatio-Temporal Data Analytics Systems.” Transactions on Large-Scale Data- and Knowledge-Centered Systems 47: 155–180. doi:10.1007/978-3-662-62919-2_7.
  • Vu, Tin, Ahmed Eldawy, Vagelis Hristidis, and Vassilis J. Tsotras. 2021. “Incremental Partitioning for Efficient Spatial Data Analytics.” Proceedings of the VLDB Endowment 15 (3): 713–726. doi:10.14778/3494124.3494150.
  • Xie, Dong, Feifei Li, Bin Yao, Gefei Li, Liang Zhou, and Minyi Guo. 2016. “Simba: Efficient In-Memory Spatial Analytics.” In SIGMOD Conference, San Francisco, CA, USA, June 26-July 01, 1071–1085.
  • You, Simin, Jianting Zhang, and Le Gruenwald. 2015. “Large-Scale Spatial Join Query Processing in Cloud.” In ICDE Workshops, Seoul, South Korea, April 13-17, 34–41.
  • Yu, Jia, Zongsi Zhang, and Mohamed Sarwat. 2018. “GeoSparkViz: A Scalable Geospatial Data Visualization Framework in the Apache Spark Ecosystem.” In SSDBM Conference, Bozen-Bolzano, Italy, July 09-11, 15:1–15:12.
  • Yu, Jia, Zongsi Zhang, and Mohamed Sarwat. 2019. “Spatial Data Management in Apache Spark: the GeoSpark Perspective and Beyond.” GeoInformatica 23 (1): 37–78. doi:10.1007/s10707-018-0330-9.
  • Zeidan, Ayman, and Huy T. Vo. 2022. “Efficient Spatial Data Partitioning for Distributed KNN Joins.” Journal of Big Data 9 (1): 77. doi:10.1186/s40537-022-00587-2.
  • Zhang, Hao, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. “In-Memory Big Data Management and Processing: A Survey.” IEEE Transactions on Knowledge and Data Engineering 27 (7): 1920–1948. doi:10.1109/TKDE.2015.2427795.
  • Zhang, Chi, Feifei Li, and Jeffrey Jestes. 2012. “Efficient Parallel kNN Joins for Large Data in MapReduce.” In EDBT Conference, Berlin, Germany, March 27-30, 38–49.
  • Zhao, Xujun, Jifu Zhang, and Xiao Qin. 2018. “kNN-DP: Handling Data Skewness in kNN Joins Using MapReduce.” IEEE Transactions on Parallel and Distributed Systems 29 (3): 600–613. doi:10.1109/TPDS.71.
