Harmonisation requirements and capabilities towards a European spatial data infrastructure (ESDI): the HUMBOLDT protected areas scenario

Pages 417-438 | Received 05 Nov 2010, Accepted 27 Apr 2011, Published online: 09 Jun 2011

Abstract

The HUMBOLDT project aims to implement a Framework for the harmonisation of data and services in the geoinformation domain, under the Infrastructure for Spatial Information in Europe (INSPIRE) Directive and in the context of the Global Monitoring for Environment and Security (GMES) Initiative. The two-pronged approach of HUMBOLDT comprises a technical side of software framework development and an application side of scenario testing and validation. One of the HUMBOLDT Application Scenarios designed to demonstrate the capabilities of the Framework covers Protected Areas themes and use cases. It aims to transform geoinformation, managed by park authorities, into a seamless flow that combines multiple information sources from different governance levels (European, national, regional), and exploits this newly combined information for the purposes of planning, management and tourism promotion. The Scenario constitutes a further step towards the integration of monitoring systems envisaged in the Digital Earth vision. The Protected Areas Scenario provides examples of the use of the HUMBOLDT tools in Desktop and Web GIS environments, and sets up a server environment that exploits the HUMBOLDT harmonisation framework, taking user requirements and needs into account and easing the road towards the establishment of an ESDI.

Introduction

Harmonised geoinformation is a basic need for fulfilling the task of creating a Spatial Data Infrastructure (at scales ranging from regional to global) which is reliable and efficient, in which different data sources and different services for discovery, portrayal and retrieval of geodata are a fundamental asset (Annoni and Smits 2003, Bernard et al. 2005).

The Digital Earth vision (Gore 1999) opens new perspectives for all the scientific disciplines and technical sectors linked to geoinformation sharing and utilisation, leading to a better knowledge and management of our planet. A crucial point in the Digital Earth vision is the integration of services, tools and data (Grossner et al. 2008).

At the European level, the road to geoinformation sharing and integration is deeply inscribed into the process of implementing a European Spatial Data Infrastructure (ESDI) that follows the guidelines contained in the INSPIRE (Infrastructure for Spatial Information in Europe) Directive of the European Union (Commission of the EC 2007). The INSPIRE Directive provides a regulatory framework for European Union geodata that aims to enforce the use of best practices and of easy-to-use, integrated interfaces for the benefit of geoinformation users (virtually every citizen of the EU), thus making the creation of an ESDI possible. In this context, an ESDI is to be composed of a set of interoperable, interacting services, following the Service Oriented Architecture (SOA) paradigm (MacKenzie et al. 2006). Such an architecture matches well the distributed responsibilities for service provision and data management in the geoinformation sector. For an SOA to work, it is essential to select or build on a group of interface standards that are mutually interoperable and complementary (Smits and Friis-Christensen 2007). Any new SDI component should therefore be interoperable with the existing services codified by geospatial standardisation organisations, in particular the Open Geospatial Consortium (OGC) at the international level (McKee 2001). Among the OGC standards, Web Map Service (WMS), Web Feature Service (WFS), Web Coverage Service (WCS), Web Processing Service (WPS) and Catalogue Services for the Web (CSW) are the most relevant for the geoinformation domain covered by the HUMBOLDT project and this work. Besides these well-established standards, there are various areas where standardisation has not yet come very far or where there are multiple competing standards (Ziegler and Dittrich 2004).
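As a brief illustration of how clients address the OGC services named above, the following sketch builds key-value-pair GET requests for a WMS and a WFS. The endpoint URL and the feature type name are assumptions made up for this example; only the SERVICE, REQUEST and VERSION parameters and the GetCapabilities and GetFeature operations come from the OGC standards.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://example.org/ows"


def build_ogc_request(base_url, service, request, version, **params):
    """Build a key-value-pair (KVP) GET request URL for an OGC web service."""
    query = {"SERVICE": service, "REQUEST": request, "VERSION": version}
    # KVP parameter names are case-insensitive; upper case is used by convention.
    query.update({key.upper(): value for key, value in params.items()})
    return base_url + "?" + urlencode(query)


# Ask a WMS for its capabilities document (layers, styles, supported CRSs).
wms_capabilities_url = build_ogc_request(BASE_URL, "WMS", "GetCapabilities", "1.3.0")

# Ask a WFS for the features of one feature type (the type name is hypothetical).
wfs_features_url = build_ogc_request(
    BASE_URL, "WFS", "GetFeature", "1.1.0", typeName="pa:ProtectedArea"
)

print(wms_capabilities_url)
print(wfs_features_url)
```

Because every service is reached through the same request pattern, a new component that exposes these interfaces can be added to an existing infrastructure without changes on the client side.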
The HUMBOLDT project focused on enabling harmonisation instances as a whole, especially those not covered by existing standardised procedures; that is to say, from the definition of harmonisation instances to their execution. Its developments must therefore be very flexible with respect to their configuration and modes of deployment, in order to fit into existing Spatial Data Infrastructures. The main advancements and outcomes in enabling and easing geodata harmonisation relate to: management of cross-border geoinformation, enabling of cross-domain geoinformation integration, and enhancement of geodata access.

The data harmonisation process involves both technical and non-technical aspects. This work is mainly centred on the technical and scientific aspects, but a strict distinction between the two can often not be made because of existing interdependencies: a harmonisation process can be executed either on an organisational level (through a common agreement of all parties involved in the harmonisation process) or on a technical level (by providing all involved parties with a tool that supports a harmonised solution).

In recent years, several projects, technical tools and scientific works have dealt with harmonisation issues. An outcome of the RISE project (Eriksson and Hartnor 2006, Portele 2006) was a general data harmonisation methodology that can be applied to spatial data through the development of harmonised product specifications, following steps carried out ‘manually’, that is, by human experts.

Other projects dealt mainly with ontological issues, for example the HarmonISA project (Hall 2006), which aimed at developing a set of tools to semi-automatically integrate different land-use datasets through an expert-driven approach supported by software such as ontology editors.

On the other hand, the application of Model Driven Architecture (MDA) to the definition and implementation of harmonisation processes has given momentum to the development of tools and languages for application schema mapping, one of the main issues to be solved in geodata harmonisation. One example is UGAS-ShapeChange, a tool for converting ISO 19109 UML models into GML Application Schemas. An important step forward in the state of the art for heterogeneous geodata transformation and transfer was marked by INTERLIS (Gnägi et al. 2006), a conceptual schema language (CSL; a customised profile of ISO 19103/19109) based on MDA, as well as a system-neutral transfer format. INTERLIS facilitates data transfer between data stores with different data models via transformation to and from a common data model, using a common interchange format, and it supports both spatial and non-spatial data. Some solutions have also been developed in an integrated web-based frame as OGC-compliant service products useful in cross-border applications, namely the mdWFS (model driven Web Feature Service), which adds a theoretical approach to the on-the-fly handling of schema translation between data models (Donaubauer et al. 2006).

All those efforts in the field of geodata harmonisation have tackled only one harmonisation issue at a time (e.g. schema mapping, harmonised catalogue search services, language translation and ontology). The HUMBOLDT project, and the work described in this paper, delivered a framework that is both a theoretical one and a set of software tools that can handle the harmonisation process as a whole, tackling multiple harmonisation issues as instances of the same harmonisation process.

Within the scope of our work and after a first analysis of harmonisation-related studies and efforts, the following working definition for data harmonisation was taken into account in the HUMBOLDT project environment:

Geodata harmonisation implies and means the possibility to combine data from heterogeneous sources into seamlessly integrated, consistent and unambiguous information products, in an easy and repeatable way, adapted to the end-user's requirements and context. (Schulze Althoff and Giger 2009)

For better comprehension, it is useful to approach the different needs for data harmonisation through a list of undistinguished causes of heterogeneity in spatial data (Villa et al. 2007). Heterogeneity in the case of spatial (geographic, atmospheric, geological) data is for example caused by differences in:

Data format

Data collection procedures/data quality

Spatial reference system

Data/conceptual model: structure and constraints

Metadata model

Nomenclature, classification, taxonomy, terminology/vocabulary, thesaurus, ontology (Tikunov et al. 2008)

Scale, degree/amount of detail, extent (spatial, thematic, temporal)

Portrayal (legend/classification, style)

Processing functions, their parameters and formulas/algorithms

Geodata harmonisation within the HUMBOLDT project

The HUMBOLDT project, built on EU funding and support, contributes to the implementation of an ESDI that integrates the diversity of spatial data available for a multitude of European organisations. The main goal of HUMBOLDT is to enable organisations to document, publish and harmonise their spatial information in a way that is as seamless as possible. The software tools and processes created demonstrate the feasibility and advantages of the INSPIRE Directive. Moreover, the overall outcomes of the project contribute to technological advances in data harmonisation and sharing, in line with the vision of Digital Earth and especially through the joint EU and ESA initiative Global Monitoring for Environment and Security (GMES), which is the main European contribution to GEOSS (European Commission 2008, GEO 2009). The contribution of the project to GMES consists in demonstrating the usefulness of harmonisation capabilities in application areas that are or will be covered by GMES services, especially downstream services.

The approach of the HUMBOLDT project to solving arising harmonisation issues and user needs has a two-pronged structure (see Figure 1) and focuses on integrating concrete application requirements with technical innovations, best practices and research results (Villa et al. 2008).

Figure 1.  The approach of the HUMBOLDT project to geodata harmonisation with its two-pronged structure made up of a technological side and application momentum converging into the implementation of a Framework architecture.

The benefits of spatial data harmonised with HUMBOLDT tools are not limited to reducing the implementation effort and costs of the future ESDI and to easier geodata handling. The technical and scientific outcomes of the project have also brought significant advancement in:

Support to cross-borders geoinformation management.

Enabling cross-application domains geoinformation sharing and integration, thus affecting scientific fields of analysis not only in directly related geosciences, but also bringing a wide benefit to socio-economic studies, statistical analysis, civil protection and security, medical and epidemiological issues (Vanderhaegen and Muro 2005).

Overcoming limitations inherent in spatial data availability, from incompatible data formats to semantic gaps caused by missing data and metadata models.

Enabling access to geospatial services that are currently not available or not usable with existing technological solutions, because of inconsistencies in data definitions and formats or a lack of data documentation and modelling.

Creation of new information through the access to additional data and services, affecting the decision-making process and making it more comprehensive (in the fields of social security, environmental issues and infrastructure planning, for instance).

Enhancing and facilitating of data and services access and distribution.

An essential element of the application driven side of the project is the development of the so-called HUMBOLDT Scenarios, in which the different components of the HUMBOLDT software Framework for geodata harmonisation are applied and tested under realistic conditions and that represent different GMES application fields:

Border Security: Effective Border Control and Security in Rural Areas

Urban Planning: European Urban Management Information Systems

Urban Atlas: Enforcing GMES in Urban Areas Mapping Core Services

Forest: Saxony & Czech Cross-Border Forest Scenario

Protected Areas: Management of Protected Areas

ERiskA: Environmental Risk

Transboundary Catchments: Cross-border Water Basin Management

Ocean: Oil/Contaminants Spill Crisis Impact and Management, expanding the experience done with SeaDataNet (Schaap and Lowry 2010)

Atmosphere: Integration for Atmospheric Data Distribution

The contribution of the HUMBOLDT project to GMES, already mentioned above, is delivered through the HUMBOLDT Scenarios. It is not a direct contribution to current projects or core services, but an indirect one, centred on the benefit that geodata harmonisation brings to the more specialised and local GMES services. For the listed Scenarios, a process analysis has been carried out, which has shown the steps necessary to harmonise data and metadata. A practical demonstration of this Scenario analysis and its results is the main focus of this paper and is described in detail in the following sections, which deal with applications covering the Protected Areas geoinformation domain.

The HUMBOLDT framework for geodata harmonisation

At the core of the development work on geodata harmonisation processes described here stands the HUMBOLDT Framework, a software architecture designed to carry out harmonisation instances and processes. This software framework is the shell for the various data harmonisation application scenarios tested during the project.

The SOAP-based architecture of the HUMBOLDT Framework is centred on a Mediator Service (see Figure 2), a proxy that acts as controller of the service components that are part of the Framework for service integration. It offers a number of standard OGC interfaces such as WMS, WFS or WCS to clients. The HUMBOLDT Mediator Service combines a number of different functionalities and hides them behind standard OGC interfaces. It is a Workflow Engine, capable of executing chains of geoprocessing services, as well as a Feature Portrayal Service, dynamically portraying Features and serving them via the OGC WMS interface. The Mediator Service orchestrates a set of more specialised interfaces that are also integrated in the overall architecture (Fitzner and Reitz 2009), and all together they compose the framework itself, as shown in Figure 2. These components are briefly described below:

Figure 2.  Overview of the HUMBOLDT Framework Components.

The HUMBOLDT GeoModel Editor

An easy-to-use editor for application experts, aimed at collecting all required information on the geodata inputs. The HUMBOLDT GeoModel Editor produces a graphical and a textual representation of the data model containing basic spatial data types. It was implemented on a model-based framework (Eclipse) and is thus able to support so-called vertical mapping, which is the serialisation to transfer standards or other representations (e.g. XMI, GML, INTERLIS, ISO 19131).

The HUMBOLDT Alignment Editor (HALE)

A tool with a rich graphical user interface for defining mappings between concepts in conceptual schemas (application schemas created with the HUMBOLDT GeoModel Editor), as well as for defining transformations between attributes of these schemas. The HUMBOLDT Alignment Editor has several properties that make it stand out from other data transformation definition tools (Reitz and Kuijper 2009).
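To make the idea of an alignment more concrete, the sketch below expresses a mapping between a source and a target schema as plain data and applies it to one record. It is not HALE's internal representation or file format; the attribute names, the code list and the sample values are all invented for illustration.

```python
# Hypothetical source record, as it might come from a regional dataset.
source_feature = {"NOMBRE": " Example Park ", "FIGURA": "Parque Natural", "SUP_HA": 2500.0}

# Hypothetical code list translating source designations to target values.
DESIGNATION_CODES = {"Parque Natural": "naturalPark"}

# Alignment: target attribute -> (source attribute, attribute transformation).
alignment = {
    "siteName": ("NOMBRE", str.strip),
    "designation": ("FIGURA", lambda value: DESIGNATION_CODES.get(value, "unknown")),
    "areaKm2": ("SUP_HA", lambda hectares: hectares / 100.0),  # 1 km2 = 100 ha
}


def apply_alignment(feature, alignment):
    """Build a target-schema record by applying each attribute transformation."""
    return {
        target: transform(feature[source])
        for target, (source, transform) in alignment.items()
    }


target_feature = apply_alignment(source_feature, alignment)
print(target_feature)
```

Keeping the alignment as data, separate from the code that executes it, mirrors the split between defining a mapping in an editor and running it in a transformation service.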

The Workflow Design and Construction Service (WDCS)

The Workflow Design and Construction Service is a tool with a graphical user interface that allows users to register, manage and graphically compose geoprocessing components into workflows. These workflows are schematic representations of the chain of harmonisation processes and are mainly composed of WPS, either encapsulated in the framework or external. The Workflow Frontend therefore offers functionality quite similar to, for example, the GUI of the ArcGIS Model Builder or a BPEL Workflow Designer.
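The notion of composing processing components into a chain can be sketched in a few lines. The example below is a simplified, in-memory stand-in: the step functions and attribute names are hypothetical, and in the Framework each step would be a WPS call rather than a local function.

```python
from functools import reduce


def compose_workflow(*steps):
    """Return a function that runs the given steps in order on a dataset."""
    def run(dataset):
        return reduce(lambda data, step: step(data), steps, dataset)
    return run


def rename_attribute(old, new):
    """Hypothetical step: rename one attribute in every feature."""
    def step(features):
        return [{(new if key == old else key): value for key, value in f.items()}
                for f in features]
    return step


def translate_terms(attribute, dictionary):
    """Hypothetical step: translate the values of one attribute via a dictionary."""
    def step(features):
        return [{**f, attribute: dictionary.get(f[attribute], f[attribute])}
                for f in features]
    return step


workflow = compose_workflow(
    rename_attribute("FIGURA", "designation"),
    translate_terms("designation", {"Parque Natural": "Natural Park"}),
)

result = workflow([{"FIGURA": "Parque Natural"}])
print(result)
```

Each step takes a list of features and returns a new list, so steps can be reordered or replaced without touching the others.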

The Geodatabase/Repository service

Aside from processing tools and transformation components, the Geodatabase and Repository services support enhanced catalogue search through the Information Grounding Service (IGS) and the Model Repository. The IGS is a cascading catalogue in the sense that it holds information on other catalogues and metadata stores, in addition to metadata of data sources. What makes the Information Grounding Service unique is that it returns not only those data sources that directly satisfy a user request but also those that can potentially be transformed to satisfy it. This is a new concept in catalogue service delivery for geospatial information, and it supports a vision in which users formulate a demand, as opposed to being offered only directly usable information. The Model Repository is a service component that allows maintenance of application schemas (e.g. those created with the HUMBOLDT GeoModel Editor) and of mappings between them (e.g. those created with the HUMBOLDT Alignment Editor) for future reuse.
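The search behaviour described for the IGS can be illustrated with a small graph search over registered schema mappings. This is a conceptual sketch under assumed data structures, not the IGS implementation; the dataset names, schema names and mappings are hypothetical.

```python
from collections import deque

# Hypothetical catalogue: dataset name -> application schema it is published in.
catalogue = {
    "parks_es": "schema_es",
    "parks_pt": "schema_pt",
    "parks_eu": "inspire_ps",
    "roads_it": "schema_roads",
}

# Hypothetical registered mappings: source schema -> schemas reachable in one step.
mappings = {
    "schema_es": ["common_pa"],
    "schema_pt": ["common_pa"],
    "common_pa": ["inspire_ps"],
}


def reachable_schemas(start, mappings):
    """Breadth-first search for every schema reachable by chaining mappings."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in mappings.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def search(catalogue, mappings, requested_schema):
    """Return datasets that match directly and datasets that could be transformed."""
    direct, transformable = [], []
    for name, schema in catalogue.items():
        if schema == requested_schema:
            direct.append(name)
        elif requested_schema in reachable_schemas(schema, mappings):
            transformable.append(name)
    return direct, transformable


direct, transformable = search(catalogue, mappings, "inspire_ps")
print(direct, transformable)
```

In this toy catalogue the Spanish and Portuguese datasets are reported as transformable because a chain of two mappings leads from their schemas to the requested one, while the road dataset is not returned at all.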

A set of Transformation Services

The working facade of the harmonisation framework, the Transformation Services are in charge of the actual transformation of the data, following the harmonisation requests made using other components. They mainly consist of WPS, such as:

Coordinate reference system transformation service: The Coordinate Transformation allows transforming coordinates between various geographic reference systems, i.e. geoids and projections.

Conceptual schema transformation service: The Conceptual Schema Transformer is able to apply a schema transformation to a source dataset expressed in a certain Application Schema.

Multiple-representation merging service: The Multiple Representation Merging Service (MRM) is capable of fusing Features of datasets with a spatial overlap, such as along a common border, where water bodies are part of both datasets.

Edge-matching service: The Edge Matching Service aligns edges and points of vector geometries so that they will be gapless.

Language transformation service: The Language Transformation transforms single terms from one language to another and enables the language transformation of datasets.
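As an illustration of what edge matching involves, the sketch below snaps the vertices of one border line onto nearby vertices of a reference line. This simple vertex snapping is a substitute chosen for clarity; it is not the algorithm of the HUMBOLDT Edge Matching Service, and the coordinates and tolerance are made up.

```python
import math


def snap_vertices(line, reference_vertices, tolerance):
    """Move each vertex of `line` onto the nearest reference vertex within `tolerance`."""
    snapped = []
    for x, y in line:
        nearest = min(reference_vertices, key=lambda p: math.hypot(p[0] - x, p[1] - y))
        if math.hypot(nearest[0] - x, nearest[1] - y) <= tolerance:
            snapped.append(nearest)   # close enough: adopt the reference vertex
        else:
            snapped.append((x, y))    # too far away: leave the vertex untouched
    return snapped


# The same border as digitised by two neighbouring data providers (invented values).
border_a = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
border_b = [(0.0, 0.3), (10.2, -0.1), (20.0, 5.0)]

aligned_b = snap_vertices(border_b, border_a, tolerance=0.5)
print(aligned_b)
```

The tolerance decides which discrepancies are treated as digitising noise and which are kept as real differences, so it has to be chosen with the scale and accuracy of the datasets in mind.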

Because the harmonisation software framework is highly modular and flexible, its deployment depends on the user's needs and requirements, and it can cover a broad range of harmonisation issues (e.g. format conversions, multilinguality, schema mapping). Nevertheless, the feasibility and efficiency of the transformation and harmonisation process depend strongly on the availability of a description of the transformation rules at the conceptual schema level. It is therefore of great importance to feed domain and expert knowledge into the implementation of the HUMBOLDT Framework, which is done through continuous interaction between developers and the application experts involved in the HUMBOLDT Scenarios.

An application scenario introduction: Protected Areas

One of the HUMBOLDT Application Scenarios (listed in the previous section) designed to demonstrate the capabilities of the Framework, both as a test bed for harmonisation components in real-world conditions and as user community application momentum, covers Protected Areas themes and use cases. The Protected Areas Scenario exploits the valuable background of activities related to Spatial Data Infrastructures for Protected Sites in the framework of NATURE-GIS and NATURA 2000, which have delivered knowledge and expertise to the implementation of guidelines for INSPIRE Data Specifications on Protected Sites (INSPIRE Drafting Team DS 2010).

The HUMBOLDT Protected Areas Scenario aims to transform geoinformation, managed by park authorities, into a seamless flow that combines multiple information sources from different governance levels (European, national, regional), and exploits this newly combined information for the purposes of planning, management and tourism promotion.

The Protected Areas Scenario Demonstrator Portal has been developed, and one application case for the Scenario is described in this work. During the work, a Desktop and Web GIS environment was set up together with a test server environment, examples of data harmonisation in this domain were created, and the resulting tests with two HUMBOLDT Web Processing Services were documented.

The Protected Areas Scenario is structured into Use Cases that apply different harmonisation instances, related to different requirements and users, and make use of the harmonisation capabilities provided by the HUMBOLDT Framework. In detail, the Use Cases are:

Management of a Protected Area

This use case refers to the management of the area. Users of geographic information are planners and officers, but the management of a Protected Area is a decision-maker's responsibility. The objective is to embed geographic information in a seamless flow that gathers information from all available sources and exploits it for planning and management. The main task is to create plans and manage the protected area.

Tourism valorisation in a Protected Area

This use case refers to the promotion of the area and implies access to geographic information especially by citizens and commercial operators, who are also final users browsing for tourism information. The objective is to embed geographic information in a seamless flow that gathers information from all available sources and exploits it for promotion. The main task is to make the best use of the area so that visitors can enjoy what nature offers.

Applications and use cases developed in the Protected Areas Scenario have a European focus on INSPIRE-compliant data provision, that is, the creation of a new data structure for protected areas datasets based on the INSPIRE schema for Protected Sites. The Scenario delivers this side by side with examples showing a schema harmonisation case that uses a data model specifically created in the Protected Areas Scenario (see Figure 3). The Scenario has explored its relation to the INSPIRE Data Specification for Protected Sites, and Scenario activities took an active part in putting INSPIRE themes into practice, especially the Annex I Protected Sites data theme.

Figure 3.  Protected Areas Conceptual Data Model and UML Schema.

The main harmonisation issues of the Scenario are related to the mapping of different schemas and transformation of the structure and geometry of datasets of Protected Areas. Portuguese, Spanish and Italian datasets used in the Scenario are currently not based on a Data Model. The creation of a Common Protected Areas Target data model (or the use of existing ones like INSPIRE) is a major need for the schema mapping and transformation tasks. Also, high priority is given to spatial and thematic consistency (i.e. seamless and consistent map layers using edge matching techniques).

Following the open-source approach of the HUMBOLDT Framework, all the operations and processing steps within the Protected Areas Scenario have been performed using open-source tools, from the pre-processing of the datasets to the visualisation of final results. A schema of the architecture of the HUMBOLDT components deployed for the Protected Areas Scenario is given in Figure 4.

Figure 4.  Overview of the HUMBOLDT Framework Components architecture, as utilised in Protected Areas Scenario.

The Scenario develops the harmonisation process via active engagement with various stakeholders at the national and transnational levels including national authorities and European agencies. Protected Areas Scenario actors are people/institutions using geodata and geoinformation for preservation, sustainable exploitation, tourism and science/education.

A schematic classification of the actors/users involved in the Protected Areas Scenario is structured as follows (the classes of users are summarised in Table 1):

Table 1. Classification of geoinformation users and characteristics in relation to the HUMBOLDT Framework and Protected Areas Scenario.

End users

The end users are involved in browsing geoinformation (aggregated information: the lowest level of access) or geodata (information elements). They are decision makers, tourism operators or citizens (the latter understood as individuals). This group can be split into two categories.

End users of geodata

They are the decision makers who need to access information in order to take decisions. In general they browse it but do not process it digitally. Their main needs are: (1) access via a friendly user interface in order to browse data, (2) efficient access to and handling of heterogeneous documents, (3) presenting and discussing the debated issues with public administrators or, vice versa, with citizens.

End users of geoinformation

They are the citizens (as persons) who need to access geoinformation for participation/awareness, personal exploitation, education: they are the final recipients of information by the stakeholders of protected areas, who only browse information and do not have any specific technical skill.

Data integrators

The data integrator is responsible for collecting and analysing relevant data and for giving derived information in different forms (verbal, reports, prepared maps) to different user groups (e.g. the End User of geodata). Data integrators are planners/officers and/or scientific researchers, that is, users involved in data processing. Their main needs are: (1) producing plans at the various levels, (2) exchanging information with other departments, (3) reporting to the other levels of responsibility, (4) communicating to citizens, (5) exporting the outputs of models into web services as structured and complex information.

Data providers

They provide data, ‘on catalogue’ or tuned to specific uses. We consider in this class only commercial operators, because Administrations, which play a major role in data providing, are considered as ‘Data custodian’ and not as a mere provider. They can be divided into Geodata providers and Geoservice providers and they must be able, overall, to deal with all kinds of input and release products according to the requested output. They do not represent their specific own needs but have to meet all needs arising from the other actors.

Data custodians

This class includes people or institutions providing data (harmonised or not), adapted to given standards. It includes different kinds of actors and in general can be considered a ‘super-set’ of data providers. In several cases they are responsible not only for data production (as is the case for data providers) but also for the whole life cycle of geodata: production and documentation (metadata), data modelling and compliance with standards, maintenance and update. At the protected areas level, they are mainly producers of the basic maps and/or producers of the thematic geoinformation related to nature conservation. Data Custodians are responsible for the harmonisation of the available data (in case they come from different regions), for the creation of specific application profiles in case of complex and multilateral tasks, and for the creation of web services for the provision of data.

Specifically, the Protected Areas Scenario is intended to provide harmonisation support for the interaction between various levels of work and administration: management bodies, local stakeholders, national authorities, European agencies and cross-border administrative bodies. The needs of these users for harmonised data in the scope of the Protected Areas Scenario, described above, are also summarised in Table 2.

Table 2. Harmonisation issues and needs in the HUMBOLDT Protected Areas Scenario.

The process of data harmonisation is intended to make the information shared by the different data providers interoperable. It is important to distinguish data harmonisation on different levels (conceptual schema, logical schema or physical schema). The Scenario relies on a good and representative catalogue of datasets to enable understanding of harmonisation issues and the use of the HUMBOLDT tools, as well as the need for using and integrating several data layers. A number of interoperability and data harmonisation issues were addressed within the work. The following data harmonisation requirements have been identified by users of protected areas geodata:

Data formats: There is a need to create or modify Web Services (WMS, WFS) with standardised syntax.

Spatial reference systems: There is a need for a common reference system.

Metadata profile: Different metadata profiles have been identified for the data made available for the Scenario. For instance, Portuguese metadata is based on the ISO 19139 standard and Italian metadata uses the ISO 19115 profile.

Conceptual schemas (data models): Since the datasets studied in our Scenario have different data structures, there is a need to create a Common Protected Areas Target data model for heterogeneous data from different protected areas data providers. The approach used is to be as compliant as possible with the Protected Sites INSPIRE data model.

Classification schemas: Datasets have been created using different classification schemes.

Scale/resolution: It is important to be able to deal with the different planning and management levels.

Spatial consistency of data: The geometry of real-world objects must be consistent between different datasets.

Multiple representation of the ‘same’ spatial objects.

Terminology and Multilinguality support.
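Taking the spatial reference system requirement from the list above as an example, bringing datasets into a common reference system means applying a well-defined coordinate operation to every vertex. The sketch below converts WGS 84 longitude/latitude to spherical Web Mercator (EPSG:3857) coordinates. The choice of target system is an assumption made only because its formulas are compact; the source does not state which common system the Scenario adopted.

```python
import math

# WGS 84 semi-major axis in metres, used as sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


# Reproject every vertex of a (made-up) boundary to the common system.
boundary_lonlat = [(-6.5, 41.2), (-6.4, 41.3)]
boundary_xy = [wgs84_to_web_mercator(lon, lat) for lon, lat in boundary_lonlat]
print(boundary_xy)
```

In practice a dedicated projection library would be used, since transformations between national systems usually also involve a datum shift rather than a projection alone.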

A Scenario assessment phase was carried out by both internal users (partners in the project) and external ones, mainly through questionnaires, workshops and online tools. The evaluation criteria mainly concerned usability aspects, and results varied across Scenarios: the Protected Areas Scenario ranked among the best and most easy-to-use application cases, especially for criteria pertaining to understandability, learnability, usefulness and relevance.

In this context, it was a crucial fact that the development of scenario services and data modelling was meant to be compliant with existing standards (INSPIRE, OGC, UML, etc.), which suggests that it would be possible for the external geospatial community to reutilise the existing components and/or extend them for their customised purposes (e.g. introducing new services, adding new data sources, etc.). This makes a strong point in assessing the transferability and efficiency of HUMBOLDT outcomes, especially for the applications delivered through Scenarios.

From the point of view of geodata harmonisation benefits for Protected Areas, results are very promising. They show the high relevance and benefits of the Scenario demonstrators in achieving HUMBOLDT objectives and solving specific harmonisation problems, as well as their usefulness and relevance to the INSPIRE and GMES communities. In conclusion, the HUMBOLDT Scenarios, and the Protected Areas Scenario among them, provide various communities of geodata users with a proof of concept for solving a subset of identified harmonisation problems with the HUMBOLDT Framework, in an easy and accessible way. In addition, the HUMBOLDT Framework and the Scenario demonstrators have established a foundation for solving other major geospatial data harmonisation problems that were tackled during the lifetime of the project, and the architecture is flexible enough to be adapted and reused for other harmonisation issues not yet covered during the HUMBOLDT project.

An application scenario example in practice: Protected Areas schema alignment

The Scenario is tested in three areas: one between Portugal and Spain, covering the Douro River Natural Park in Portugal and the Arribes del Duero Park in Spain; one covering the Protected Areas of the Community of Castile and León, one of the 17 autonomous communities of Spain; and one in Italy, covering the Beigua Regional Park in Liguria. The harmonisation example presented here, which focuses on semiautomatic schema mapping and alignment with the HUMBOLDT Alignment Editor (HALE), uses the Protected Areas dataset for the second site listed above: the natural areas of the Community of Castile and León, in Spain.

One of the main objectives of the HUMBOLDT project is to provide tools to map and transform complex database and application schemas. In this sense, the work of the Protected Areas Scenario has been focused on harmonising Protected Areas data from various countries using the HUMBOLDT Alignment Editor.

The example introduced here makes use of two components of the HUMBOLDT Framework, briefly described in previous sections:

The HUMBOLDT Alignment Editor (HALE), which helps us map and transform complex database and application schemas.

The Conceptual Schema Translation Service (CST), a Web Processing Service for transforming data from one application schema to another.

This application example showed that a necessary first step in the harmonisation process is the definition of a shared and coherent data profile for the Protected Areas domain. The analysis of the information from the Scenario data providers reveals a high level of heterogeneity, which is representative of the situation within the EU context. The multi-disciplinary approach to environmental issues implies the management of seamless geoinformation in different application fields. As a first step towards harmonisation using the HUMBOLDT tools, the objective of the Scenario was therefore the creation of a common data profile and a common data model that comply as far as possible with the INSPIRE specifications.

In the HUMBOLDT Protected Areas Scenario, data from Portugal, Italy and Spain have been used. The Scenario conceptual data model is based on the relevant data covering the test sites, and the INSPIRE Protected Sites data model has been used as a reference. We have tried to make the model, features and attributes as similar as possible to the INSPIRE model. This means that, when a HUMBOLDT Scenario schema attribute (or feature) shares its meaning with an INSPIRE attribute, we have renamed our attribute to match the name used in the INSPIRE model.

Once the target data schema/model is defined, HALE (briefly introduced in the previous section) helps in establishing the mapping rules for the classes and attributes of a source to a target conceptual schema.

The first step in using HALE for schema mapping, a crucial harmonisation task for geodata, is to load the schemas in the HALE Schema Explorer. We start by importing the Protected Areas dataset into HALE as the source schema; in this case the dataset is provided in vector format (shapefiles).

After loading our source schema we can also load our source data, which allows cartographic representations of the reference data for the source schema and of the transformed data to be viewed alongside each other. Once the source schema is loaded, we can load the target schema alongside it; in this example the target is a specifically defined HUMBOLDT Protected Areas schema.

As a second step, after inspecting our schemas, we continue with the mapping of the elements, selecting the elements we want to map in the Schema Explorer. Once the mapping is performed using HALE, the resulting matching table can be used for applying the transformations in the Schema Explorer, after selecting the appropriate mapping function. In this case we used just the ‘Rename Attribute’ function. To use the ‘Rename Attribute’ function, you must select the attribute in the source schema you would like to rename, then select the element in the target schema that the attribute should be copied to. We can also use the ‘Attribute Default Value’ function to fill a field that has no data in the source, such as ‘IUCNCategory.’ When running the ‘Attribute Default Value’ function, a list of available values appears. In this case we choose ‘Protected Landscape/Seascape.’ You can interactively check the results of the matching operations requested. The features in the ‘Transformed Data’ view are already transformed using the alignment mapping. For instance, by expanding the attributes, you can view what value HALE assigns to them. Figure 5 shows an overview of the HALE interface and visualisation tools.

Figure 5.  Example of Schema Mapping and Alignment with HALE, based on Protected Areas data models and attributes. The features in the ‘Transformed Data’ view are already transformed using the alignment mapping. For every attribute, you can view what value HALE assigns to them. In the figure you can see how the data structure has changed for the ‘name’ attribute of a given Protected Area. In this case we use the dataset for the Protected Areas (Red de Espacios Naturales) in the Community of Castile and León, in Spain.
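The two mapping functions described above can be illustrated with a minimal sketch. This is not HALE code and does not reproduce its internal data structures: the function names, the source attribute names (‘NOMBRE’, ‘CODIGO’) and the sample values are assumptions introduced purely for illustration. Only the target attributes ‘name’ and ‘IUCNCategory’ and the default value ‘Protected Landscape/Seascape’ come from the example in the text.

```python
from typing import Any, Callable, Dict, List

Feature = Dict[str, Any]
# A rule reads from the source feature and writes into the target feature.
Rule = Callable[[Feature, Feature], None]


def rename_attribute(source_name: str, target_name: str) -> Rule:
    """Copy a source attribute value into a differently named target attribute."""
    def apply(source: Feature, target: Feature) -> None:
        target[target_name] = source.get(source_name)
    return apply


def attribute_default_value(target_name: str, value: Any) -> Rule:
    """Fill a target attribute that has no counterpart in the source."""
    def apply(source: Feature, target: Feature) -> None:
        target[target_name] = value
    return apply


def transform(source: Feature, rules: List[Rule]) -> Feature:
    """Build a target feature by applying every mapping rule in order."""
    target: Feature = {}
    for rule in rules:
        rule(source, target)
    return target


# Hypothetical alignment: 'NOMBRE' is an assumed source attribute name.
ALIGNMENT: List[Rule] = [
    rename_attribute("NOMBRE", "name"),
    attribute_default_value("IUCNCategory", "Protected Landscape/Seascape"),
]

# Hypothetical source feature; 'CODIGO' is unmapped and therefore dropped.
source_feature = {"NOMBRE": "Arribes del Duero", "CODIGO": "X-01"}
transformed = transform(source_feature, ALIGNMENT)
```

In this sketch, attributes without a mapping rule simply do not appear in the transformed feature, which mirrors the idea that the target schema, not the source, determines the structure of the output.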

After the schema alignment is performed, the mapping rules delivered by HALE are used as input for schema translation and transformation to the target schemas, using the HUMBOLDT Conceptual Schema Translation Service (CST). Its main feature is the ability to apply a schema transformation to a spatial dataset in order to provide another dataset modelled in the target application schema. Hence, CST actually performs the geospatial transformations defined beforehand, for instance with the HUMBOLDT Alignment Editor (HALE). From the design point of view, CST can be accessed in two different ways: as a Web Processing Service (WPS) through any WPS client, such as Snowflake, Jump or uDig, or through the HUMBOLDT Mediator Service (in this case CST 3-4 can be a library or a WPS).
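The separation between defining mapping rules and executing them on a dataset can be sketched as follows. This is an illustrative assumption, not the CST implementation or its WPS interface: the JSON rule format, the `translate` function and the sample features are hypothetical, and the real alignment format exchanged between HALE and CST is not reproduced here.

```python
import json
from typing import Any, Dict, Iterable, List

Feature = Dict[str, Any]

# Hypothetical serialised mapping rules (NOT the real HALE export format).
RULES_JSON = """
[
  {"function": "rename", "source": "NOMBRE", "target": "name"},
  {"function": "default", "target": "IUCNCategory",
   "value": "Protected Landscape/Seascape"}
]
"""


def translate(features: Iterable[Feature],
              rules: List[Dict[str, Any]]) -> List[Feature]:
    """Apply previously defined mapping rules to every feature of a dataset."""
    translated: List[Feature] = []
    for feature in features:
        target: Feature = {}
        for rule in rules:
            if rule["function"] == "rename":
                target[rule["target"]] = feature.get(rule["source"])
            elif rule["function"] == "default":
                target[rule["target"]] = rule["value"]
            else:
                raise ValueError(
                    f"unsupported mapping function: {rule['function']}")
        translated.append(target)
    return translated


# The rules are loaded as data, so the translation step needs no knowledge
# of how (or with which tool) the alignment was authored.
rules = json.loads(RULES_JSON)
source_dataset = [{"NOMBRE": "Area A"}, {"NOMBRE": "Area B"}]  # made-up features
target_dataset = translate(source_dataset, rules)
```

Because the rules are plain data, the same translation step could in principle be exposed either as a library call or behind a web service, which is the design choice the two access modes above reflect.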

Summary and outlook

In the vision pursued under the umbrella of Digital Earth, a crucial issue is the interoperability and integration of services, tools and data across a wide range of domains and uses. In the European context, this vision is put into practice in several geoinformation fields, chief among them the activities related to ESDI implementation, which follow the guidelines set out by the INSPIRE Directive.

The HUMBOLDT project has taken on the task of providing solutions for geodata and geoservices harmonisation, covering the harmonisation concept as a whole.

This paper has described the structure and approach of the HUMBOLDT project, giving a rationale for the HUMBOLDT Framework capabilities and discussing the outcomes of the HUMBOLDT Protected Areas Application Scenario.

The major aim of HUMBOLDT is the implementation of efficient, cost-effective, reliable, generic, interoperable and sustainable solutions for the issue of spatial data harmonisation and integration of geographic services in the framework of an ESDI. This objective is to be reached by putting INSPIRE principles into practice, applying international standards and using, as core reference, the users’ requirements and needs, finally establishing a community of users and developers.

The HUMBOLDT Framework is an architecture of software components and services aimed at managing the harmonisation process of geoinformation within the European context. The methodology of the HUMBOLDT development is based on a dual approach, comprising both a technological and an application side, and on an iterative process of implementation, during which the solutions found are tested and validated with the support of the application side, composed of Scenarios that cover topics which are also of great importance in GMES.

Geoinformation has become more and more relevant in supporting decision making during the last two decades. With the rise of geodatabases and digital information sharing, data models were developed and consistency rules were established. Although this migration has required large investments, it has paid off in more streamlined procedures and higher data quality. This has reduced costs and increased the efficiency of data management within the data production organisations. One clear example is the reduction in delivery times and costs for providing basic maps and cartography when geoservices are used instead of paper map sheets.

It is sometimes claimed that data is used for making decisions and that value is created when decisions are turned into action (Krek and Frank 2000). When building up a European SDI, we therefore face the problem that the data producers bear the major costs of the SDI implementation, while the data users, who receive the benefits, sit in other organisations. As more and more services in the field of data harmonisation become available, the availability and reliability of those data and services can become more efficient and complete.

The HUMBOLDT Protected Areas Scenario applications have demonstrated in practice that the implemented framework for data harmonisation described is a working solution to tackle a variety of geodata harmonisational issues.

Automation is one key aspect in improving the cost efficiency of data management. Unified harmonised data delivered through an open-source, modular and integrable framework, like the software developed and described in this work, is one key element in this automation. Moreover, geospatial data is to be used for decisions, and harmonised data (in this case focused on Protected Sites) provide opportunities for making more efficient decisions (higher level of automation, less uncertainty about the semantics, etc.). In economic terms, we can therefore state that the transaction costs are reduced. Finally, since harmonised data reduce the uncertainty in semantic interpretation, they may also support more reliable decisions. The approach shown in this paper offers a variety of solutions for making such reliable decision making (through the exploitation of heterogeneous data in a single frame) affordable (almost no cost, open source, strong community of developers, INSPIRE compliance, flexibility and availability of multiple solutions).

Those are the main achievements of the HUMBOLDT approach and tools for geodata harmonisation. Most importantly, HUMBOLDT adds to the state of the art in data harmonisation a framework that is both a theoretical one and a set of software tools that can handle the harmonisation process as a whole, tackling multiple harmonisation issues as instances of the same harmonisation process. This is the point that makes HUMBOLDT a relevant advancement in making data harmonisation as smooth as possible for the user.

The outcomes and benefits of the results provided by HUMBOLDT (Framework and Scenarios) are mainly related to the reduction of the implementation effort for the future ESDI, both from a technological point of view and in terms of a cost-effective approach to geodata sharing, and can be summarised briefly in the following points:

Support for cross-border geoinformation management (all over Europe and beyond)

Enabling cross-domain applications through geoinformation sharing and integration (affecting scientific fields of geosciences, social studies, security)

Overcoming limitations in spatial data availability (incompatible data formats, semantic gaps, lacking data and metadata models)

Enabling access to geospatial services not available using current technological solutions (due to inconsistencies in data definitions and formats or lack of data documentation)

Creating new information through the access to additional data and services (thus making decision-making exploiting geospatial data easier)

Enhancing and facilitating data and services access and distribution (affecting technological and commercial sector investments)

Among the HUMBOLDT Scenarios, the Protected Areas Scenario focuses on schema mapping and on the transformation of the structure and geometry of Protected Areas datasets, for the purposes of planning, management and tourism promotion, and demonstrates HUMBOLDT capabilities for geodata harmonisation in the Protected Areas domain.

The HUMBOLDT project highlights challenges for geosciences research, covering topics in data harmonisation at a continental scale. Nonetheless, the greater the challenges to be faced, the greater the benefits that will arise from their solutions: benefits for specialised and non-specialised users of spatial data, for policy-makers, planners and managers, and for European citizens and their organisations, at levels ranging from local to regional to European. The work described here has demonstrated that these benefits can be achieved using the described approach and tools for harmonisation.

Notes on contributors

Paolo Villa is an Environmental Engineer and has a PhD in Geodesy and Geomatics from the Polytechnic of Milan. He specialises in Remote Sensing and Environmental Analyses based on geoinformation. He works at the Institute for Electromagnetic Sensing of the Environment of the National Research Council of Italy on the topic of change detection methodologies and applications using mid-resolution satellite data for urban and flood monitoring studies, hyperspectral data processing and SDI management and implementation, including the field of geodata and geoservices harmonisation. His main expertise covers GMES topics in the context of the European SDI implementation.

Roderic Molina Perez is a Geographer and holds an MSc in Geographic Information Technologies, with more than 10 years of experience as a GIS technician and consultant, both in Italy and Spain. His current work is focused on geodata integration, e-learning and GIS projects at the European level related to the INSPIRE Directive. As Technical Manager he is involved in the development of key projects at GISIG, a non-profit international association on GIS in Italy. Catalan by birth, he currently lives and works in Genoa.

Mario A. Gomarasca is a researcher and expert in environmental information management and Geomatics at the Institute for Electromagnetic Sensing of the Environment of the National Research Council of Italy. He is an Agronomist with expertise in environmental hazard and risk management and currently works on geomatics and geoinformation management and harmonisation. He acquired a specialisation at the International Institute for Geo-Information Science and Earth Observation (ITC), Enschede, The Netherlands (1987), and was Visiting Scientist at the Purdue University Laboratory for the Application of Remote Sensing (LARS).

Emanuele Roccatagliata is Director of the Association GISIG, Geographical Information Systems International Group, promoted in 1992 as a University Enterprise partnership for European co-operation in GIS technology and applications. He graduated in Physics and his role in GISIG is related to the technical aspects of defining and developing the projects, and to the production and revision of training contents. Over the years, he has paid special attention to the themes of nature conservation and protected areas in mountain environments. From 2002 to 2009 he was Secretary-General of ICCOPS, Landscape Natural and Cultural Heritage Observatory, a study centre dedicated to coastal management, heritage and landscape.

Acknowledgements

This paper was partially supported by EC FP6 project HUMBOLDT (Contract SIP5-CT-2006-030962).
