
Semantic typing of linked geoprocessing workflows

Pages 113-138 | Received 26 Sep 2016, Accepted 08 Mar 2017, Published online: 04 Apr 2017
 

Abstract

In Geographic Information Systems (GIS), geoprocessing workflows allow analysts to organize their methods on spatial data in complex chains. We propose a method for expressing workflows as linked data, and for semi-automatically enriching them with semantics on the level of their operations and datasets. Linked workflows can be easily published on the Web and queried for types of inputs, results, or tools. Thus, GIS analysts can reuse their workflows in a modular way, selecting, adapting, and recommending resources based on compatible semantic types. Our typing approach starts from minimal annotations of workflow operations with classes of GIS tools, and then propagates data types and implicit semantic structures through the workflow using an OWL typing scheme and SPARQL rules by backtracking over GIS operations. The method is implemented in Python and is evaluated on two real-world geoprocessing workflows, generated with Esri's ArcGIS. To illustrate the potential applications of our typing method, we formulate and execute competency questions over these workflows.
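The propagation idea can be illustrated with a minimal sketch. It uses plain Python in place of the OWL typing scheme and SPARQL rules named above, so it is not the authors' implementation. The tool classes, type names, dataset names, and the `propagate_types` helper are all hypothetical and chosen only for illustration. Starting from tool annotations alone, types are pushed forward along the workflow and also backward from a known output to an untyped input.

```python
# Hypothetical vocabulary: these tool classes and type names are
# illustrative assumptions, not the ontology used in the article.
FIXED_SIGNATURES = {"Buffer": ("VectorData", "PolygonData")}  # tool -> (input type, output type)
TYPE_PRESERVING = {"Clip"}  # tools whose output has the same type as their input

# A workflow as a list of (operation, tool class, input dataset, output dataset).
workflow = [
    ("op1", "Clip", "roads", "clipped_roads"),
    ("op2", "Buffer", "clipped_roads", "road_buffer"),
]


def propagate_types(steps):
    """Infer dataset types from tool annotations until nothing new is found."""
    types = {}
    changed = True
    while changed:
        changed = False
        for _op, tool, src, dst in steps:
            candidates = []
            if tool in FIXED_SIGNATURES:
                in_type, out_type = FIXED_SIGNATURES[tool]
                candidates += [(src, in_type), (dst, out_type)]
            if tool in TYPE_PRESERVING:
                # Forward: a typed input determines the output type.
                if src in types:
                    candidates.append((dst, types[src]))
                # Backward: a typed output determines the input type.
                if dst in types:
                    candidates.append((src, types[dst]))
            for dataset, dtype in candidates:
                if dataset not in types:
                    types[dataset] = dtype
                    changed = True
    return types


inferred = propagate_types(workflow)
for dataset in sorted(inferred):
    print(dataset, "->", inferred[dataset])
```

In this toy workflow, `roads` carries no annotation of its own. Its type is recovered only in a second pass, by going backward from the input type required by the hypothetical `Buffer` signature through the type-preserving `Clip` step.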

Acknowledgments

We would like to thank the anonymous reviewers for their helpful suggestions. We also thank Tom de Jong and our students for providing example workflows and illustrations for this paper. Finally, we are grateful for discussions with Werner Kuhn, Edzer Pebesma, and others who helped us shape these ideas.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. See, e.g., ArcGIS ModelBuilder http://pro.arcgis.com/en/pro-app/help/analysis/geoprocessing and ArcGIS Workflow Manager for Server http://server.arcgis.com/en/workflow-manager

2. See for example the Mapbox API: https://www.mapbox.com/api-documentation

7. Examples include Esri's ArcGIS ModelBuilder and Workflow Manager for Server, Kepler https://kepler-project.org, Taverna http://www.taverna.org.uk, and Orange http://orange.biolab.si.

9. Examples include W3C PROV (https://www.w3.org/TR/prov-overview), OPMW (http://www.opmw.org), and PWO (http://purl.org/spar/pwo).

12. This is comparable to the distinction between dimension and measure in OLAP. See also Sinton's notions of control, fix and measure (Sinton 1978).

16. A blank node is an RDF resource without URI, acting like an unknown or a variable.

18. A signature introduces functions in a formal program together with their types of input and output.

20. The algorithms are implemented in Python in the code base, using the RDFLib library (https://rdflib.readthedocs.io).

21. The full specifications are available online in the code base. Readers unfamiliar with an operation mentioned in the text may wish to consult these resources.

26. This workflow was used in a simplified form in a GIS course, convened by Andrea Ballatore at Birkbeck, University of London.

28. The full specification is available at https://github.com/simonscheider/SemGeoWorkflows/workflows

29. A 64-bit machine with an Intel Core i7-5500U CPU at 2.4 GHz.

30. Note that the implementation http://github.com/RDFLib/OWL-RL by Ivan Herman is not particularly optimized; see http://www.ivan-herman.net/Misc/2008/owlrl.