
Efficient distributed algorithms for distance join queries in Spark-based spatial analytics systems

Pages 206-250 | Received 28 Dec 2021, Accepted 23 Dec 2022, Published online: 19 Feb 2023
 

ABSTRACT

Apache Sedona (formerly GeoSpark) is a new in-memory cluster computing system for processing large-scale spatial data. It extends the core of Apache Spark to support spatial datatypes, partitioning techniques, spatial indexes, and spatial operations (e.g., spatial range, nearest neighbor, and spatial join queries). However, it does not support Distance-based Join Queries (DJQs) such as the k nearest neighbors join query (kNNJQ) or the k closest pairs query (kCPQ). In this paper, we therefore investigate how to design and implement efficient distributed DJQ algorithms in Apache Sedona, using the most appropriate spatial partitioning and other optimization techniques. We present the results of an extensive set of experiments with real-world datasets, demonstrating that the proposed distributed kNNJQ and kCPQ algorithms are efficient, scalable, and robust in Apache Sedona. Finally, we compare Sedona with other similar cluster computing systems: it shows the best performance for kCPQ and competitive results for kNNJQ.
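To make the difference between the two DJQs concrete, the following is a minimal single-machine sketch of their semantics only, assuming 2D points and Euclidean distance. It is not the distributed Sedona algorithm described in the paper: it is a brute-force O(|P|·|Q|) reference with no partitioning or indexing, and the function names `knn_join` and `k_closest_pairs` are hypothetical. kNNJQ returns k neighbors for every point of P, whereas kCPQ returns only k pairs in total.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn_join(P: List[Point], Q: List[Point], k: int) -> List[Tuple[Point, List[Point]]]:
    """kNNJQ (hypothetical helper): pair each point of P with its k nearest points of Q.

    Brute force over Q for every p; assumes Euclidean distance.
    """
    result = []
    for p in P:
        # The key is evaluated immediately for this p, so the closure is safe.
        nearest = heapq.nsmallest(k, Q, key=lambda q: math.dist(p, q))
        result.append((p, nearest))
    return result


def k_closest_pairs(P: List[Point], Q: List[Point], k: int) -> List[Tuple[float, Point, Point]]:
    """kCPQ (hypothetical helper): the k pairs (p, q) in P x Q with the smallest distance.

    Returns (distance, p, q) triples in ascending order of distance.
    """
    pairs = ((math.dist(p, q), p, q) for p in P for q in Q)
    return heapq.nsmallest(k, pairs, key=lambda t: t[0])
```

For example, with `P = [(0, 0), (5, 5)]` and `Q = [(1, 0), (0, 2), (6, 5)]`, `knn_join(P, Q, 1)` pairs `(0, 0)` with `(1, 0)` and `(5, 5)` with `(6, 5)`, while `k_closest_pairs(P, Q, 2)` returns those same two pairs, each at distance 1.0. A distributed version must avoid this all-pairs comparison, which is why spatial partitioning and pruning matter.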

Disclosure statement

No potential conflict of interest was reported by the author(s).


Additional information

Funding

The work of all authors was funded by the MINECO research project [TIN2017-83964-R] and the Spanish Ministry of Science and Innovation research project [PID2021-124124OB-I00].
