Original Research Article

A mediation system for continuous spatial queries on a unified schema using Apache Spark

Pages 115-141 | Received 29 Aug 2022, Accepted 23 Oct 2023, Published online: 09 Nov 2023

Figures & data

Figure 1. Phases of query planning in SparkSQL (Armbrust et al., Citation2015).

Table 1. Comparison of architectural characteristics of Spark, Storm and Flink (Chintapalli et al., Citation2016; Inoubli et al., Citation2018).

Table 2. The four popular big spatio-temporal systems (Alam et al., Citation2021).

Figure 2. Local and integrated schema.

Table 3. The relations in the integrated schema.

Figure 3. Running query example Q.

Figure 4. System architecture.

Figure 5. Procedure of rewriting queries.

Figure 6. Running query example Q: query syntax tree.

Table 4. Dataframe transformation description.

Figure 7. Running query example Q: Spark application DAG representation (A) and optimized Spark application DAG representation (B).

Table 5. Dataset of building and commune.

Figure 8. Snippet of queries.

Figure 9. λ=1 with four worker nodes.

Figure 10. λ=1 with eight worker nodes.

Figure 11. λ=10 with eight worker nodes.

Figure 12. λ=100 with eight worker nodes.

Table 6. Amount of shuffled data for three queries.

Supplemental material

MS Word file (747.2 KB)

Data availability statement

The data that support the findings of this study are openly available on GitHub at https://github.com/AnnaNgo13/streamgeomed.