Efficient distributed algorithms for distance join queries in Spark-based spatial analytics systems

Francisco García-García, Dept. of Informatics, University of Almeria, Almeria, Spain

https://orcid.org/0000-0001-7208-1661

Antonio Corral, Dept. of Informatics, University of Almeria, Almeria, Spain. Correspondence: [email protected]

https://orcid.org/0000-0002-0069-4642

Luis Iribarne, Dept. of Informatics, University of Almeria, Almeria, Spain

https://orcid.org/0000-0003-1815-4721

Michael Vassilakopoulos, Dept. of Electrical and Computer Engineering, University of Thessaly, Volos, Greece

https://orcid.org/0000-0003-2256-5523

ABSTRACT

Apache Sedona (formerly GeoSpark) is a new in-memory cluster computing system for processing large-scale spatial data, which extends the core of Apache Spark to support spatial datatypes, partitioning techniques, spatial indexes, and spatial operations (e.g., spatial range, nearest neighbor, and spatial join queries). However, it does not support Distance-based Join Queries (DJQs), such as the k nearest neighbor join query (kNNJQ) or the k closest pairs query (kCPQ). In this paper, we therefore investigate how to design and implement efficient distributed DJQ algorithms in Apache Sedona, using the most appropriate spatial partitioning and other optimization techniques. We present the results of an extensive set of experiments with real-world datasets, which demonstrate that the proposed distributed kNNJQ and kCPQ algorithms are efficient, scalable, and robust in Apache Sedona. Finally, Sedona is compared to other similar cluster computing systems, showing the best performance for kCPQ and competitive results for kNNJQ.
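
To make the two query types named in the abstract concrete, the following is a minimal single-machine sketch of what kNNJQ and kCPQ compute. It is a brute-force reference, not the distributed algorithms proposed in the paper, and it does not use Apache Sedona or Spark. The function names `knn_join` and `k_closest_pairs`, the representation of points as 2D tuples, and the use of Euclidean distance are assumptions made for illustration only.

```python
import heapq
from math import dist  # Euclidean distance, Python 3.8+


def knn_join(P, Q, k):
    """kNNJQ (brute force): for every point p in P, the k points of Q nearest to p."""
    return {p: heapq.nsmallest(k, Q, key=lambda q: dist(p, q)) for p in P}


def k_closest_pairs(P, Q, k):
    """kCPQ (brute force): the k pairs (p, q), p in P and q in Q, with the smallest distance."""
    all_pairs = ((dist(p, q), p, q) for p in P for q in Q)
    # Order by distance only, so ties are not resolved by comparing the points.
    best = heapq.nsmallest(k, all_pairs, key=lambda t: t[0])
    return [(p, q) for _, p, q in best]


# Hypothetical sample data, chosen only to illustrate the two queries.
P = [(0, 0), (10, 10)]
Q = [(1, 0), (0, 2), (8.5, 10), (20, 20)]

neighbors = knn_join(P, Q, 2)        # one result list per point of P
closest = k_closest_pairs(P, Q, 2)   # one global list of k pairs
```

The difference in output shape is the key point: kNNJQ returns k neighbors for each point of the first dataset, whereas kCPQ returns k pairs in total across both datasets. Both sketches examine all |P| x |Q| combinations, which is the cost that spatial partitioning and the other optimizations mentioned in the abstract aim to avoid at scale.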

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Available at https://spark.apache.org/

2 Available at https://sedona.apache.org/download/

3 Available at https://github.com/purduedb/LocationSpark

4 Available at http://www.cs.utah.edu/~dongx/simba/

5 Available at https://github.com/acgtic211/LocationSpark/tree/DJQ

6 See https://github.com/apache/incubator-sedona/

7 See https://sedona.apache.org/setup/overview/

8 Available at https://github.com/acgtic211/incubator-sedona/tree/KNNJ

9 Available at https://github.com/acgtic211/incubator-sedona/tree/KCP

10 Available at http://spatialhadoop.cs.umn.edu/datasets.html

11 Available at https://github.com/apache/incubator-sedona

12 Available at https://github.com/purduedb/LocationSpark

13 Available at https://github.com/locationtech/jts

14 See https://sedona.apache.org/setup/overview/

Additional information

Funding

The work of all authors was funded by the MINECO research project [TIN2017-83964-R] and the Spanish Ministry of Science and Innovation research project [PID2021-124124OB-I00].
