FedIDS: a federated cloud storage architecture and satellite image delivery service for building dependable geospatial platforms

Pages 730-751 | Received 24 Mar 2017, Accepted 05 Jul 2017, Published online: 20 Jul 2017

ABSTRACT

Earth observation satellites produce large amounts of images/data that not only must be processed and preserved in reliable geospatial platforms but also efficiently disseminated among partners/researchers for creating derivative products through collaborative workflows. Organizations can face this challenge in a cost-effective manner by using cloud services. However, outages and violations of integrity/confidentiality associated with this technology could arise. This article presents FedIDS, a suite of cloud-based components for building dependable geospatial platforms. The Fed component enables organizations to build a shared geospatial data infrastructure through the federation of independent cloud resources to withstand outages, whereas IDS avoids violations of integrity/confidentiality of images/data in information sharing and collaboration workflows. A FedIDS prototype, deployed in Spain and Mexico, was evaluated through two case studies: one based on satellite imagery captured by a Mexican antenna and another based on satellite imagery from a European observation mission. The acquisition, storage and sharing of images among users of the federation, the exchange of images between Mexican and Spanish sites and outage scenarios were evaluated. The evaluation revealed the feasibility, reliability and efficiency of FedIDS, in comparison with available solutions, in terms of performance, storage consumption and integrity/confidentiality when sharing images/data in collaborative scenarios.

1. Introduction

Earth observation missions (EOM) manage large amounts of satellite images (Ramapriyan 2017) that must be processed, stored and maintained in a cost-effective, reliable and secure manner. The Earth Observing System Data and Information System (EOSDIS) reported that, in September 2015, its DAACs stored a data volume of 17.5 PB (Behnke et al. 2005), not including NRT products. This trend represents a challenge for spatial data infrastructures (SDI) in terms of storage utilization and bandwidth management.

In addition, SDI must also assume the consumption of resources required to attend external requests produced when the products are published for, or shared with, the scientific community. The sharing of images and information through collaborative patterns built among government agencies, the scientific community and EOMs is key to conducting studies on topics such as the geographical location of species diversity, historical changes of territories and disaster prevention. These collaborative patterns are also quite useful for the scientific community and companies to produce derivative products (maps and image repositories).

Traditionally, the collaboration patterns have been performed through online workflows attended by geoportals using Web and FTP services installed in physical servers at the IT sites of EOMs. This management scheme, however, puts organizations and partners at risk of saturation, as the number of sharing and collaborative patterns has grown at constant rates over the past few years. This trend is expected to continue: the United Nations Operational Satellite Applications Programme (UNOSAT) has observed a constant increment in the number of requests for satellite images (UNOSAT 2012), whereas the Satellite Industry Association (SIA) has found a constant growth of satellite services over the last few years (SIA 2016). The saturation scenarios and the growth management of satellite imagery are problems traditionally solved by constantly updating the IT infrastructure, as is the case of the NASA network of data centers (DAACS 2017), but this situation becomes critical when organizations cannot invest in IT in a constant manner.

Cloud computing is a cost-effective solution that has recently been explored by some agencies to face the saturation and IT management challenges, such as the management and storage of a massive amount of heterogeneous digital EO products (metadata, images and derivative products). Cloud content delivery systems or CDS (CloudFront 2017; OnApp 2017) and cloud storage (Gonzalez et al. 2015; AWS 2017; Azure 2017) are available solutions to outsource the management and storage of digital resources to a cloud service provider. In this outsourcing model, the organizations only pay for consumed resources (pay-as-you-go model) when exchanging EO products with either partners or end-users. ESA developed a hot-standby solution in which two private cloud infrastructures (Germany and Italy partners) have been deployed by a cloud service provider to withstand cloud site failures (Orange 2014). This means that the issues regarding saturation, security and management of the satellite imagery are relegated to cloud providers, which are in charge of handling them by using elastic and on-demand services (Mell and Grance 2011). In this context, cloud computing has also been proposed to manage geoportals (Li et al. 2017) and to centralize services (Lian, Mcguire, and Moore 2017), but it can also be an option, for instance, either for processing workflows of LiDAR data (Li, Hodgson, and Li 2017) or for allocating web-based tools that manage spatial data.

Cloud technology certainly allows organizations to save costs and reduce stress on the IT infrastructure, as well as to provide partners and end-users with an online distribution and sharing platform for the acquisition of digital earth resources at any time and anywhere through any device. However, this technology can put them at risk of losing sovereignty in the management of their digital resources (Chow et al. 2009). This risky situation is known as the vendor lock-in problem (Nasuni 2015; Ponemon 2016; Sloan and Warner 2014). In such a situation, a functional dependency between the organization and the cloud provider is produced, which affects critical aspects of business continuity such as data availability, security/privacy and, paradoxically, business flexibility.

Data availability may be affected by service outages of a public provider (AWS 2017), because organizations cannot get access to their satellite imagery during this time (Sloan and Warner 2014). This situation produces economic losses: a report from the Ponemon Institute (Ponemon 2016) shows that organizations suffered, on average, outages of 200 minutes and that, depending on the scope of the companies, a minute of outage could cost thousands of dollars. When an outage is permanent, because the cloud provider goes out of business (Nirvanix 2013), organizations must migrate data from that provider to a new one (Nasuni 2015). In a migration scenario, a vendor lock-in situation reduces the ability of an organization to adapt its imagery management to changing circumstances. For instance, the feasibility of contracting with another provider will depend on the volume of data controlled by the old provider, which could imply costs and migration time when current growth rates are taken into account (Gantz and Reinsel 2013; Nasuni 2015). Over time, an EOM could experience an accumulation effect, scaling up from a small volume of images to a very large satellite imagery collection. In such a scenario, the migration procedure could take several hours or days depending on the providers involved as well as the size of the satellite imagery (Laatikainen, Mazhelis, and Pasi 2014; Nasuni 2015; Walker, Brisken, and Romney 2010). Moreover, the vendor lock-in dependency introduces privacy issues because the data security of the satellite images is under the control of a third party (public provider), which could also introduce a set of legal issues (Watzel 2014).

This paper is motivated by our collaboration experiences with space agencies and environment departments to provide fast and reliable satellite data delivery. As a result of those collaborations, all participants agreed that there was a need for cost-effective solutions that address the side effects of image accumulation under the control of one single provider (avoiding the vendor lock-in problem). The need for reliability and security mechanisms that provide organizations and their partners with a reasonable assurance of availability of EO products for end-users, even in cloud outage scenarios, was also specifically addressed. The integrity/confidentiality of satellite imagery in organizational sharing and collaboration environments built through cloud services was another major concern.

In this context, we introduce in this paper a suite of cloud-based components named FedIDS, which includes a cloud federated architecture (Fed) and a satellite image delivery service (IDS). Fed is a federation manager service that enables organizations to build dependable geospatial platforms, whereas IDS is a cloud-based image delivery service with which users of the platform can ensure in-house the availability, integrity and confidentiality of their EO products in collaborative scenarios. This means that our solution covers the full chain of satellite data delivery, from producer to consumer, also ensuring point-to-point security, a reliability mechanism for producers and a geospatial data infrastructure built from either EOM sites or IT resources contracted with cloud providers.

We have developed a FedIDS prototype to build a federated geospatial platform that includes resources of two cloud sites (one in Madrid, Spain, and another in northeastern Mexico). We conducted a case study to evaluate the FedIDS prototype based on satellite imagery that includes images captured by an antenna called ERIS located in Mexico (ERIS 2017) and a repository provided by the European Space Astronomy Center (ESAC). As shown in this paper, the evaluation revealed the feasibility of applying FedIDS to the management of satellite imagery in terms of reliability and confidentiality. The evaluation also revealed the frugality of FedIDS when offering reliability and confidentiality functionalities to end-users: FedIDS consumes, in the worst case, only 6% of extra storage capacity to protect EO products, whereas traditional schemes require 66.7% and 200% of capacity to provide the same fault tolerance offered by FedIDS. A performance comparison between FedIDS and a set of security/reliability tools revealed that FedIDS yielded the lowest response time when producers and end-users send/retrieve protected images to/from the federated geospatial platform. This comparative evaluation also revealed that FedIDS improves the service experience observed by organizations and partners, as it performs better than conventional web solutions in sharing and collaborative scenarios. The FedIDS prototype is currently being used in an ongoing project supported by the Mexican Space Agency (AEM, for its Spanish acronym) to build a dependable federated cloud platform in which a set of partners can share with each other, in a secure and resilient manner, satellite images (Terra, Aqua, Landsat) captured by ERIS (2017) as well as derivative products created from these images by disaster prevention projects associated with the space agency. The goal of this project is to enhance the sharing of EO products between the agency and its partners and to foster the diffusion of information about virtual earth through a public-domain platform based on FedIDS.

The outline of this paper is as follows. Section 2 presents the design principles, architecture and major components of FedIDS. Section 3 describes the prototype implemented following these design principles and architecture. The experiments run to test the system and the metrics used are described in Section 4, and the results of the evaluation and the comparison with other solutions are shown in Section 5. Section 6 reviews existing works related to our solution. Finally, Section 7 presents the major conclusions of our work and some future lines of work.

2. FedIDS design principles

In this section, we present the global architecture of FedIDS and the design principles of the Fed and IDS services. We first describe the building of a geospatial data infrastructure by using the Fed service; the reliability and security mechanisms included in IDS for the management of satellite imagery are described in the last part of the section.

2.1. The FedIDS architecture

The FedIDS service has been designed to achieve four basic goals: the first is to avoid handing over the control of a whole satellite imagery collection to a single cloud provider/infrastructure; the second is to establish a secure and reliable environment for partners, agencies and end-users to exchange images and derivative EO products through the cloud; the third is to enable producers of images to ensure in-house the availability, integrity and confidentiality of their EO products before sending them to the cloud; and the last is to improve the service experience of end-users when they acquire images through the federation.

To achieve those goals, the FedIDS architecture provides two main components: a federation manager service (Fed) and a cloud-based IDS. The Fed component was designed to be installed either in the IT infrastructure of EOM sites or in IT resources contracted with cloud providers. This component enables EOMs and their partners to build a shared geospatial data infrastructure through the federation of independent cloud resources. The resulting federated geospatial platform enables them to collaborate in reducing the side effects of EO product accumulation in one single cloud provider and to withstand cloud outages by keeping images available to end-users of EO products. The Fed component also includes a secure sharing service based on publish–subscribe patterns, which enables producers to share EO products with a well-defined set of end-users through the federated platform as well as to establish controls over the workflows performed by end-users when downloading EO products. Fed provides the functionality needed to achieve the first two design goals.

The IDS component has been designed to be installed in the computers of end-users. IDS includes a security mechanism that enables producers to establish access controls over their EO products to avoid violations of confidentiality and integrity. It also includes a reliability mechanism for producers to keep control over the distribution of their EO products as well as to establish availability degrees that allow end-users to retrieve EO products even in outage scenarios. IDS fulfills the remaining design goals.

Figure 1 shows the architecture of FedIDS and a conceptual representation of a federated geospatial cloud platform built by using FedIDS. Figure 1 also shows an example where a set of organizations have shared a portion of their cloud infrastructure resources to build a federated platform. In this example, two organizations (Org_1 and Org_3) are in charge of the management of the satellite images/data captured by two antennas as well as the storage and distribution of these images. Another organization (Org_2) creates derivative products from the images supplied by Org_1 and Org_3 and shares them with other federation members. In this platform, both producers and end-users associated with the federation members can exchange satellite images by using IDS applications. Notice that, although in the example shown in Figure 1 the cloud federated service has been built by using the private cloud infrastructure of a set of partners, the members could also build it by using either only a set of public services (public cloud) or a combination of private and public resources (hybrid cloud).

Figure 1. Global architecture of FedIDS.

2.2. Building a geospatial platform by using Fed architecture

Within the global architecture described above, the Fed component enables a set of organizations called members to collaborate in the building of a secure and reliable federated geospatial platform in which users called producers (for instance, a server receiving images from an antenna) can store and share portions of satellite imagery (catalogs) with end-users associated with the federation members.

In the first stage of the building of a federation, each member registers information such as the cloud storage resources it will share with the federation, the producers/end-users associated with it and the catalogs that will be shared with other federation members. In the second stage, the governance of the federation is defined by using a management architecture. In the last stage of this process, each producer and end-user is provided with IDS applications, which are installed in the computers that they will use to access the federation resources. The Fed architecture includes service management layers such as security, metadata and multi-cloud storage to establish controls over the exchange and management of EO products among the members of the federation. This architecture also enables organizations to establish controls over the consumption of federated resources. A conceptual representation of the Fed architecture, including the exchange of images between producers and end-users through IDS, is depicted in Figure 2. As can be seen, Fed has a layered architecture with three main layers: security, metadata and storage. Their functionality and relations are described below.

Figure 2. A conceptual representation of Fed architecture including the exchange of images between producers and end-users through IDS.

The basic idea is that each time a producer processes new satellite images or products (see producer Org in Figure 2), the IDS application installed in the producer's computer automatically sends a request (upload workflow) to the federation (see IDS applications in Figure 2). When the Fed service confirms that this request has been sent by a valid producer (security layer) and that this user has the rights to launch a publication of that product (metadata layer), IDS automatically transports the images to the federation (storage layer), where they are stored by the repositories management system depending on the type of publication chosen by that producer for each product. If the producer chooses to share the product with other members of the federation, the product is stored in the contingency pool (shared cloud resources). Otherwise, the product is stored in the autonomy pool, which means that only end-users associated with the federation member that stored the product can download it.

At this point, depending on the type of publication, a given set of end-users can acquire the uploaded product through the federation (download workflow) whenever the Fed service verifies that the retrieval requests have been sent by a valid end-user (Federated Tokenization system in the security layer) and that they have the rights to acquire it (Pub/Sub system Org in the metadata layer for the example shown in Figure 2). When the acquisition request has been authorized by the Fed service, the IDS application installed in the end-user's computer automatically retrieves the images from either local or remote storage managed by the federation (storage layer) and stores them in a local folder.
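The upload and download checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FedIDS implementation: the class `FedService`, its fields and its method names are hypothetical, tokens are plain strings, and each pool is modeled as an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FedService:
    """Hypothetical stand-in for one member's Fed service and its three layers."""
    valid_tokens: set = field(default_factory=set)        # security layer
    publish_rights: set = field(default_factory=set)      # metadata layer: (token, product)
    subscriptions: set = field(default_factory=set)       # metadata layer: (token, product)
    autonomy_pool: dict = field(default_factory=dict)     # storage layer, member-only
    contingency_pool: dict = field(default_factory=dict)  # storage layer, shared

    def put(self, token, product, data, share_with_federation):
        """Upload workflow: authenticate, check publication rights, then store."""
        if token not in self.valid_tokens:
            raise PermissionError("unknown producer token")
        if (token, product) not in self.publish_rights:
            raise PermissionError("producer may not publish this product")
        # The type of publication decides which pool receives the product.
        pool = self.contingency_pool if share_with_federation else self.autonomy_pool
        pool[product] = data
        return "contingency" if share_with_federation else "autonomy"

    def get(self, token, product):
        """Download workflow: authenticate, check the subscription, then retrieve."""
        if token not in self.valid_tokens:
            raise PermissionError("unknown end-user token")
        if (token, product) not in self.subscriptions:
            raise PermissionError("end-user is not subscribed to this product")
        for pool in (self.autonomy_pool, self.contingency_pool):
            if product in pool:
                return pool[product]
        raise KeyError(product)
```

In this sketch a request is rejected as early as possible: the security check runs before the metadata check, and storage is only touched once both have passed, mirroring the layer order described in the text.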

Using Fed, we can define inter/intra-publication patterns for different groups of users. Inter-publication means that the users sharing a product are associated with different federation members, while intra-publication means that they are associated with the same federation member. In both cases, different cloud providers might be involved in the sharing operation. This functionality is important as it allows us to share resources between cloud providers and federation members transparently, depending on the configuration installed. In the case of inter-publication of resources, Fed takes care of automatic replication in the different cloud services. To alleviate the overhead of copying, a caching functionality has been included in Fed.

Figure 3 shows an example of inter/intra-publications and the caching functionality. In this example, two producers (P1 and P2), associated with two different federation members, publish products extracted from their antennas. The IDS installed in each producer server (P) sends five products to the cloud resources of its corresponding Fed service. In this example, each member publishes one product by using an inter-publication pattern, whereas the remaining four are published by using an intra-publication pattern. Some end-users (E1) are included in a work-group of the intra-publication, whereas other end-users (E2) are included in an inter-publication pattern.

Figure 3. An example of inter/intra-publications and caching functionality in FedIDS.

As can be seen, the caching method produces three benefits: (i) the transportation of products/catalogs among members of the federated geospatial platform is performed only once, no matter the number of end-users included in the inter-publication; (ii) the costs of caching operations are absorbed by the members where the end-users are registered and (iii) caching does not affect the service experience of end-users, as they only download products through the cloud resources of the members with which they are associated.
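Benefit (i) can be illustrated with a small sketch. The class `MemberCache` and its fields are hypothetical names introduced here for illustration only; the publishing member's storage is modeled as a dictionary and the transfer counter exists only to make the effect visible.

```python
class MemberCache:
    """Hypothetical cache kept by the Fed service of the end-users' member."""

    def __init__(self, remote_store):
        self.remote_store = remote_store  # products held by the publishing member
        self.local_copies = {}            # products already transported to this member
        self.transfers = 0                # inter-member transfers performed so far

    def download(self, product):
        """Serve a product locally, fetching it from the remote member only once."""
        if product not in self.local_copies:
            self.local_copies[product] = self.remote_store[product]
            self.transfers += 1
        return self.local_copies[product]


# Ten end-users of the same member request the same inter-published product.
cache = MemberCache({"MODIS-AQUA-001": b"raster bytes"})
for _ in range(10):
    cache.download("MODIS-AQUA-001")
```

After the loop, `cache.transfers` equals 1: only the first request crosses the boundary between members, and the remaining nine are served from the local copy.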

2.2.1. Contingency plans

The IDS running in the computers of producers and end-users is associated with a main Fed service. In this context, IDS receives a denial of service when either some of the three layers of the Fed service, or the whole Fed service, are not available. To handle this faulty scenario, we have designed a contingency plan that keeps up the flow of Pub/Sub operations and the exchange of products between the producers and end-users within the federation members.

A contingency plan represents a plan B for producers and end-users. This plan describes a protocol that each IDS must follow when a portion of the federation is not available and the IDS cannot get access to any of the modules of the security, metadata or storage layers where it commonly performs Pub/Sub operations. For instance, let us consider that producer 1 commonly sends publications, through IDS 1, to Fed 1 deployed by member 1. Let us also consider that some modules in the layers of Fed 1, or the whole Fed 1, are not available. In such a scenario, IDS is the first component to discover the fault and its action is to deploy plan B; as a result, a negotiation with another Fed service in the federation starts. In FedIDS, each member (i) knows plan B of the IDSs in which it is included, (ii) knows the list of producer and end-user tokens included in that plan and (iii) agrees to receive a well-defined number of operations from the producers/end-users in that list.

The Fed services of the federation members exchange any change made to a given contingency plan, and the IDSs update their information about that plan. The CPM service is only enabled when other members suffer from outage, maintenance or saturation scenarios. When a Fed service recovers a safe status, its IDS applications return to it. When the IDSs have consumed the number of operations agreed for a member in the federation agreement, a reallocation of these applications to another Fed service is invoked so that producers and end-users can still use the service.
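The selection logic of a contingency plan can be sketched as below. This is a simplified illustration under assumptions: the function `choose_fed`, its parameters and the Fed names are hypothetical, availability is given as a precomputed set rather than discovered by probing, and the negotiation between Fed services is reduced to a per-member operation quota.

```python
def choose_fed(main, plan_b, available, remaining_ops):
    """Return the Fed service an IDS should use for its next operation.

    main          -- name of the main Fed service of this IDS
    plan_b        -- ordered list of alternative Fed services (the plan B)
    available     -- set of Fed services that are currently reachable
    remaining_ops -- dict with the operations each alternative still accepts
    """
    # An IDS always returns to its main Fed service once it is available again.
    if main in available:
        return main
    # Otherwise walk plan B, skipping members that are down or whose agreed
    # number of operations has been consumed (reallocation to the next one).
    for fed in plan_b:
        if fed in available and remaining_ops.get(fed, 0) > 0:
            remaining_ops[fed] -= 1
            return fed
    raise RuntimeError("no Fed service available in the contingency plan")


quota = {"fed2": 1, "fed3": 2}
first = choose_fed("fed1", ["fed2", "fed3"], {"fed2", "fed3"}, quota)   # fed1 is down
second = choose_fed("fed1", ["fed2", "fed3"], {"fed2", "fed3"}, quota)  # fed2 quota spent
third = choose_fed("fed1", ["fed2", "fed3"], {"fed1", "fed2", "fed3"}, quota)  # fed1 is back
```

Here the first operation goes to `fed2`, the second is reallocated to `fed3` because the quota agreed with `fed2` is exhausted, and the third returns to `fed1` once it has recovered.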

2.3. Design of IDS service and applications

IDS was designed to send requests to the Fed service in the form of secure and reliable PUT and GET workflows, which, respectively, deliver and retrieve images to/from the federated multi-cloud storage (SCMS). A workflow includes authentication and Pub/Sub stages, where uploads or downloads of images are authorized. It also includes a last stage for the transportation of the images. This stage includes four sub-stages that are performed by IDS in a pipelined manner through a continuous dataflow.

In the transport stage of a PUT workflow, the first sub-stage compresses EO products with a lossless compression algorithm to save storage costs. Schemes based on multi-threading could be used in this sub-stage instead of traditional algorithms (Swanepoel and Van Den Bergh 2017). In the security sub-stage, compressed products are encrypted by using a symmetric cryptosystem, and access control is established to avoid violations of privacy by encrypting the symmetric key with an attribute-based scheme (Bethencourt, Sahai, and Waters 2007), whereas confidentiality and integrity are ensured by using digital signatures and secure envelope techniques (Yanez-Sierra et al. 2015). In the reliability sub-stage, a fault tolerance mechanism based on the information dispersal algorithm (Rabin 1989) splits the encrypted product (EP) of length L into n redundant portions {P1, P2, …, Pn}, each of length L/m. Any m of the n portions suffice to reconstruct the original EP whenever n > m, which allows IDS to withstand up to n − m outages. The last sub-stage is distribution, where the encoded portions are sent to different locations in the federated service, to avoid dependency on the cloud resources of a given member, through a continuous dataflow (Libcurl 2017) that improves the transportation efficiency as well as the service experience of end-users. The GET workflows include the same four sub-stages used in PUT workflows but invoked in inverse order.
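The m-of-n property of the reliability sub-stage can be illustrated with a compact sketch of information dispersal. It is not the FedIDS reliability module: for readability it works over the prime field GF(257) with a Vandermonde matrix, so encoded values can reach 256 and do not fit in one byte, whereas practical implementations typically use GF(2^8). Compression and encryption are omitted, and the function names are assumptions.

```python
P = 257  # prime modulus; every byte value 0..255 is an element of GF(257)


def _row(x, m):
    """Vandermonde row [1, x, x^2, ..., x^(m-1)] mod P."""
    return [pow(x, j, P) for j in range(m)]


def disperse(data, n, m):
    """Split `data` into n portions so that any m of them rebuild it."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    segments = [padded[i:i + m] for i in range(0, len(padded), m)]
    portions = {}
    for x in range(1, n + 1):  # distinct x values give independent rows
        row = _row(x, m)
        portions[x] = [sum(a * b for a, b in zip(row, seg)) % P for seg in segments]
    return portions


def _solve(matrix, rhs):
    """Solve matrix * v = rhs mod P by Gauss-Jordan elimination."""
    m = len(rhs)
    aug = [list(r) + [b] for r, b in zip(matrix, rhs)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # modular inverse via Fermat
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]


def recover(portions, m, length):
    """Rebuild the original bytes from any m portions (dict: x -> values)."""
    xs = sorted(portions)[:m]
    if len(xs) < m:
        raise ValueError("at least m portions are required")
    matrix = [_row(x, m) for x in xs]
    out = bytearray()
    for k in range(len(portions[xs[0]])):
        out.extend(_solve(matrix, [portions[x][k] for x in xs]))
    return bytes(out[:length])


product = b"Landsat5 raster"                         # 15 bytes, illustrative payload
portions = disperse(product, n=5, m=3)               # each portion holds 15 / 3 = 5 values
surviving = {x: portions[x] for x in (2, 4, 5)}      # two locations are unavailable
restored = recover(surviving, m=3, length=len(product))
```

With n = 5 and m = 3, each portion has one third of the product length and `restored` equals `product` even though portions 1 and 3 were lost, which corresponds to tolerating n − m = 2 outages.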

Figure 4(a) depicts a PUT workflow performed in the delivery procedure of a raster image produced by the Landsat5 satellite (P). As can be seen, EO product P is compressed, encrypted and encoded, that is, split in parallel into n redundant portions (Reed, Chen, and Johnson 2011), which are sent to n cloud storage locations through a continuous dataflow. At this point, product P is published in the federation.

Figure 4. An example of transport stage of workflows performed by image delivery system in the computers of producers and end-users: (a) PUT dataflow and (b) GET dataflow.

It is important to note that the execution of these processing dataflows produces a shrinking/expanding effect in memory: the size of product P is reduced by the compression mechanism and then expanded by the encryption and reliability mechanisms. This effect improves storage utilization because the sum of the redundant portions at the end of the dataflow is similar to the size of the original product (P): the extra capacity added by the redundancy of the reliability mechanism and by the control structures of the encryption mechanism is compensated by the compression stage. This effect also reduces the time spent in the encryption and reliability stages, as both mechanisms are applied to the compressed version of the product (P), which shortens their service times and, in turn, the response time observed by producers/consumers.
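The shrinking/expanding effect can be made concrete with a small worked calculation. The function below is an illustrative model, not a measurement: it assumes that dispersal stores n portions of size (compressed size)/m, ignores the control structures added by encryption, and uses a compression ratio and dispersal parameters chosen only for the example.

```python
import math


def stored_fraction(compression_ratio, n, m):
    """Total stored size relative to the original product size.

    compression_ratio -- compressed size / original size (for example 0.6)
    n, m              -- dispersal parameters: n portions, any m rebuild
    Encryption metadata is ignored in this sketch.
    """
    return compression_ratio * n / m


# Illustrative values only: n = 5, m = 3 and a product that compresses to 60%.
with_compression = stored_fraction(0.6, 5, 3)     # about 1.0 times the original size
without_compression = stored_fraction(1.0, 5, 3)  # about 1.67 times the original size
balanced = math.isclose(with_compression, 1.0)
```

Under these assumed values, the redundancy factor n/m ≈ 1.67 is offset by the compression ratio of 0.6, so the portions together occupy roughly the space of the original product, whereas the same dispersal without compression would need about two thirds more.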

Figure 4(b) depicts the retrieval procedure performed by the IDS of an end-user in a GET workflow. In this procedure, the redundant portions are retrieved by IDS and decoded into an encrypted version of product P, which is decrypted, decompressed and stored in the end-user's computer in its original form (P).

The IDS applications include REST functions that enable designers to add IDS service to geoportals and enable consumers/producers to configure their IDS applications through these geoportals. In the case of consumers not having resources (computers) to install an IDS application, an IDS proxy service can be deployed in the geospatial platform for managing the dataflows and the consumers can acquire EO products in a traditional way through a geoportal.

3. The FedIDS prototype

We have developed a FedIDS prototype to build a federation of two cloud sites, one in Spain and another in Mexico, by using commodity components with minimal proprietary requirements. The Spanish cloud was installed in cities near Madrid such as Leganes (SpainCloud) and Colmenarejo (Spain1) by using OpenStack Havana (OpenStack 2017). The Mexican cloud, MexCinvestav-Cloud, was installed in Victoria City, northeastern Mexico, by using OpenStack Icehouse. We also installed a Fed component in a city of central Mexico called San Luis Potosí (Mex2) and another in a southeastern city of Mexico (Mex1), where the ERIS antenna is installed.

Two geospatial platforms were deployed: one called Mexican Cloud Fed, built by using the resources of the MexCinvestav-Cloud, Mex1 and Mex2 sites, and another called Spanish Cloud Fed, built by using the resources of the Spain1-Cloud and Spain1 sites. Figure 5 shows the current status of the federated geospatial platform (FedIDS), the members of that federation as well as the location of the producers and end-users registered in the federation.

Figure 5. A conceptual representation of the deployment of FedIDS prototype.

3.1. Fed service

We developed the Fed service in this prototype by using cloud instances. A cloud instance is a computational seed including an application (Fed component) that can be cloned and launched into the cloud in the form of a virtual computer. This virtual computer is available online and runs the applications included in its seed. We developed two cloud instances including the security and metadata Fed services. One service instance was launched in the Spanish cloud and the other in the Mexican cloud. We also developed a cloud instance including the modules of the storage layer and launched five copies of this instance in the cloud sites of the federation. These instances were configured as a single coherent multi-cloud storage system (SCMS), which includes two autonomy pools (1 pool per member including 5 storage locations) and a shared contingency pool (including 10 storage resources accessible to the 2 members). The characteristics of the cloud instances used in the building of the FedIDS prototype are given in Table 1.

Table 1. FedIDS prototype features.

3.2. IDS application

The IDS component was developed in two versions: one called P-bot, which emulates the behavior of a producer, and another called E-bot, which emulates the behavior of consumers. Both applications automatically produce a Pub/Sub workload and also automatically transport products to the federation. These applications were installed in physical computers in both the Mexican and Spanish Clouds.

Table 1 shows the features of the cloud images and physical computers used to deploy the FedIDS prototype.

The security, metadata and storage modules were developed as web services (PHP and Apache); each service includes its own database developed with PostgreSQL, and the services communicate with each other through a REST API. The services are configured by using an administration GUI. Three versions of these services were developed and made available to the partners. In the first one, the images were encapsulated in containers by using Docker, which means that all dependencies are pre-installed and the only prerequisite is an installation of Docker in the servers of the partners. In the second one, the images were integrated into a virtual machine by using VirtualBox. In the last one, OpenStack images were instantiated for the evaluation of the FedIDS prototype. The services can also be installed as traditional web services if required by partners.

The IDS application and the security modules were developed in Java, whereas the reliability modules were developed in C++. Shared memory functions were developed in IDS to build an in-memory continuous dataflow for the frugality (Java), confidentiality (Java) and reliability (C language) modules. The bots that produce workload of images were also developed in Java.

The prototype also includes a geoportal that was launched as a cloud instance, where consumers can perform search and subscription requests for products uploaded by producers. This geoportal was built by using FedIDS tools and will be used for the dissemination of raster products, metadata and derivative products (maps automatically built from Landsat 5 images).

4. Methodology and experimental evaluation

In order to evaluate the performance of FedIDS prototype, we conducted two case studies based on the exchange of satellite images between Mexican and Spanish cloud sites. Satellite imagery, with format L1 and L2, was provided by the Mexican and European space agencies.

The first case study was based on satellite imagery including products with format L1 captured by an antenna called AEM-ERIS (ERIS 2017) located in Mexico. The satellite imagery included products captured by sensors such as MODIS (AQUA and TERRA) and Landsat L5 and was transported from the ERIS server at the Mex1 member to the federation, specifically to MexCinvestav-Cloud. In the prototype, the products were indexed as catalogs such as MODIS-AQUA, MODIS-TERRA and LandSatl5, which were published so that end-users associated to the Mex1 and Mex2 site members could subscribe to and retrieve them.

The second case study was based on a repository of products with format L2 provided by ESAC, located at Villafranca del Castillo (Spain), which was in charge of the SMOS (Soil Moisture Ocean Salinity) (SMOS 2008) mission in 2009. This mission was created to generate the first global map of Earth salinity. The products were stored in the FedIDS prototype, specifically in the Spanish cloud, and were indexed as the SMOS(L2) catalog, which was published so that end-users could subscribe to and retrieve it.

Table 2 shows the characteristics of both satellite imagery sets and of the products chosen for our experimental evaluation.

Table 2. Characteristics of catalogs and products of AEM-ERIS and ESA satellite imagery.

Finally, the results were compared with solutions available in the literature that offer security and reliability, as well as with a traditional web/cloud-based service.

4.1. Experiments

The case studies were conducted in three stages. In the first stage, a P-bot takes, in a sequential manner, the products of the catalogs of a satellite imagery and sends them to IDS, which automatically performs PUT workflows to ensure them, upload them to the multi-cloud storage (SCMS) and publish them in the federation. In the second stage, the Fed service disperses the products among the cloud resources of the federation members by using the caching functionality. In the last stage, E-bots ask the Fed service for new product publications and IDS automatically launches GET workflows to retrieve them, perform decrypting/decoding tasks and store the recovered products in the folders created on the end-user's computer to manage the catalogs to which this end-user has subscribed. An experiment finishes when all the products in the published catalogs of an imagery have been uploaded by the P-bot and completely downloaded by the E-bots.
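
The publish and retrieve stages above can be illustrated with a minimal in-memory sketch. The `Federation` class, its method names and the bot and product identifiers are hypothetical and are not part of FedIDS; the PUT/GET workflows and the caching stage are deliberately left out, so only the Pub/Sub bookkeeping is shown.

```python
from collections import defaultdict


class Federation:
    """In-memory stand-in for the Pub/Sub side of the Fed service (hypothetical)."""

    def __init__(self):
        self.catalogs = defaultdict(list)  # catalog name -> published product ids
        self.cursors = defaultdict(int)    # (user, catalog) -> products already seen

    def publish(self, catalog, product_id):
        # Stage 1: a producer (P-bot) announces a product after its PUT workflow.
        self.catalogs[catalog].append(product_id)

    def subscribe(self, user, catalog):
        # A consumer registers interest in a catalog, starting from its first product.
        self.cursors[(user, catalog)] = 0

    def poll(self, user, catalog):
        # Stage 3: a consumer (E-bot) asks for publications it has not retrieved yet.
        start = self.cursors[(user, catalog)]
        new_products = self.catalogs[catalog][start:]
        self.cursors[(user, catalog)] = len(self.catalogs[catalog])
        return new_products


fed = Federation()
fed.subscribe("e-bot-1", "SMOS(L2)")
fed.publish("SMOS(L2)", "product-001")
fed.publish("SMOS(L2)", "product-002")
first = fed.poll("e-bot-1", "SMOS(L2)")   # both products are new
second = fed.poll("e-bot-1", "SMOS(L2)")  # nothing new since the last poll
```

In this sketch, an experiment would be considered finished once every subscribed E-bot receives an empty list from `poll` after the P-bot has published its last product.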

4.2. Studied solutions

In this section, we describe the solutions implemented to conduct the experimental evaluation.

  • Phoenix: This is a distributed fault-tolerant web storage solution (Gonzalez and Marcelin-Jimenez 2011) that enables producers/end-users to upload/download products. This solution includes fault tolerance based on information dispersal and can withstand the failure of two out of five servers.

  • WebS: This is a conventional solution implemented on web servers for producers/end-users to upload/download products. Reliability was not developed for this solution.

  • SkyCDS: This is a cloud-based content delivery network solution (Gonzalez et al. 2015) that synchronizes folders on the end-user's computer with a virtual space used by a producer to publish products; as a result, each time an image is uploaded by the producer to the virtual space, SkyCDS retrieves it and stores it in the end-user's folder. SkyCDS also withstands the outage/failure of two out of five cloud resources.

  • FedIDS: This is the implementation of continuous workflows and federated services presented in this article, which can also withstand the failure of two out of five cloud resources as well as the failure of one federation member site.
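
Three of the solutions above tolerate the loss of two out of five storage resources. The following minimal sketch shows how an information dispersal scheme can achieve this, assuming a Rabin-style code with n = 5 fragments of which any m = 3 rebuild the data. The field size, the function names and the fragment layout are illustrative assumptions; this is not the codec used by Phoenix, SkyCDS or FedIDS.

```python
from itertools import combinations

P = 257       # smallest prime above 255, so every byte is a field element
N, M = 5, 3   # 5 fragments, any 3 rebuild the data (tolerates 2 losses)


def encode(data):
    """Split data into N fragments, each holding one field element per M input bytes."""
    padded = data + bytes(-len(data) % M)
    frags = [[] for _ in range(N)]
    for k in range(0, len(padded), M):
        coeffs = padded[k:k + M]  # treated as polynomial coefficients a0, a1, a2
        for i in range(N):
            x = i + 1             # fragment i stores the polynomial value at x
            frags[i].append(sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
    return frags


def solve_mod(matrix, rhs):
    """Gauss-Jordan elimination modulo P for a small square system."""
    n = len(rhs)
    rows = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % P for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(v - f * w) % P for v, w in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def decode(available, length):
    """Rebuild the original bytes from any M surviving fragments (index -> values)."""
    if len(available) < M:
        raise ValueError("not enough fragments to rebuild the data")
    ids = sorted(available)[:M]
    vander = [[pow(i + 1, j, P) for j in range(M)] for i in ids]
    out = bytearray()
    for k in range(len(available[ids[0]])):
        out.extend(solve_mod(vander, [available[i][k] for i in ids]))
    return bytes(out[:length])


data = b"FedIDS satellite product"
frags = encode(data)
# Every choice of 3 surviving fragments out of 5 must rebuild the product.
all_ok = all(
    decode({i: frags[i] for i in keep}, len(data)) == data
    for keep in combinations(range(N), M)
)
```

Each fragment holds one third as many elements as the input, so the five fragments together take about 5/3 of the original size. Each element needs slightly more than one byte in this toy field; a production codec would typically work in GF(2^8) to avoid that.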

4.3. Metrics

The aforementioned solutions were instrumented to capture the following metrics:

  • Response time: This metric collects the time spent by services such as tokenization, Pub/Sub and storage operations as well as by processing operations such as encoding/decoding, encryption/decryption and distribution. This metric reflects the service experience (UX) observed by end-users and producers.

  • Ensuring overhead: This metric captures the cost added to the service experience of end-users and producers.

  • Caching overhead: This metric captures the time spent by caching functionality, which includes the product transport from the cloud storage resources of a given member to the resources of another. This metric represents the cost of inter-publication scenarios described in Section 2.2.

  • Throughput: This metric collects the median amount of data, in MB/s, processed, transported and stored during the PUT/GET end-to-end workflows.
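
As a minimal sketch of how such metrics could be derived from per-product measurements, the following code computes a median response time, a median throughput and a relative overhead. The sample records are made up for illustration, and the overhead formula (relative increase over a baseline configuration) is an assumption, since the article does not state one.

```python
from statistics import median

# Hypothetical per-product measurements: (size in MB, response time in seconds).
samples = [(196.0, 49.0), (216.0, 54.0), (280.0, 70.0), (210.0, 60.0)]


def median_response_time(records):
    # Median of the end-to-end times observed for the products.
    return median(t for _, t in records)


def median_throughput(records):
    # Throughput per product is size / time; the median is taken over products.
    return median(size / t for size, t in records)


def overhead_pct(protected_time, baseline_time):
    # Assumed definition: relative cost added over a baseline configuration.
    return 100.0 * (protected_time - baseline_time) / baseline_time
```

For the sample records, `median_response_time(samples)` gives 57.0 s and `median_throughput(samples)` gives 4.0 MB/s.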

5. Evaluation results

In this section, we describe the analysis of results of the experimental evaluation of both case studies.

5.1. AEM-ERIS: A case study based on satellite imagery captured by ERIS antenna

The response times observed by producers and end-users associated to MexCinvestav-Cloud when uploading and downloading products of the AEM-ERIS catalogs are analyzed in this section.

5.1.1. Service experience of producers and end-users

We first analyzed the impact of the image size on the service experience (UX) observed by producers and end-users, as we intuitively expect that the larger the files, the longer the response time for uploads and downloads.

In the AQUA catalog, the median size of products is 196 MB (92% of products are in the range of 200–300 MB), the median size of products in the TERRA catalog is 216 MB (94.5% of the products in this catalog are between 100 and 300 MB) and the median size of products in the Landsat catalog is 280 MB (this is the size of all the products in this catalog). All the products of the AEM-ERIS imagery include metadata and JPG files.

Figure 6(a) shows the median response time (vertical axis) per image size (horizontal axis) produced by all the configurations when uploading products of the AEM-ERIS catalogs to the federation. In this experiment, the WebS configuration is the reference for managing non-sensitive products, whereas Phoenix and SkyCDS are the references for managing products in a reliable manner. The security module was turned off because the products of the AEM-ERIS catalogs have been labeled with free-access attributes and are considered non-sensitive by AEM.

Figure 6. The response time per product size for all the configurations: (a) upload operations and (b) download operations.

Figure 6(a) shows that FedIDS produces the best performance for producers in this comparison. In fact, FedIDS improves on the response time of the WebS solution by up to 19.9%, SkyCDS by 21.27% and Phoenix by 59%. This means that producers not only can withstand cloud storage outages (two for this configuration) but also can store satellite imagery faster than with a conventional solution such as WebS, which offers no data protection against outages. For instance, a producer in this experiment spent 1:33 h to transport a satellite imagery of 114 products (21 GB) with FedIDS, whereas the same transfer took 1:59 h with WebS, 2:01 h with SkyCDS and 2:12 h with Phoenix. This improvement is significant because FedIDS represents a time-saving solution for agencies and partners to deliver catalogs/products to either end-users or customers. Moreover, FedIDS performs better than both an optimized fault-tolerant solution (SkyCDS) and a traditional fault-tolerant solution based on information dispersal (Phoenix).

Figure 6(b) shows the response time of downloads performed by an end-user with each configuration studied in this experiment. As can be seen, the behavior of all configurations is similar to that observed in Figure 6(a), but FedIDS achieves a larger improvement in comparison to WebS (25.15%), SkyCDS (51.55%) and Phoenix (62.46%). This means that the UX of end-users is not degraded by the decoding procedure performed on their computers; instead, end-users observe an improvement in product retrieval. This is an important result as agencies expect to serve more downloads than uploads in real scenarios, following a Pareto principle (80% downloads and 20% uploads; Gonzalez et al. 2013).

Figure 7 shows the improvement of FedIDS in comparison with the configurations studied in this experiment for both uploads (-Up) and downloads (-Down). The improvement for uploads of products with a mid-large size (L3/L2 with a size between 40 and 120 MB) is better than the median (28.5% WebS, 28.45% SkyCDS and 64% Phoenix) and close to the median (16.39% WebS, 17.98% SkyCDS and 58.09% Phoenix) for raster products of large size (L1 with a size of 400/800 MB). As can be seen, FedIDS is a suitable solution for managing satellite imagery that includes products with different types of format. In the comparison of downloads, FedIDS keeps a performance close to the median and improves performance for large products. FedIDS produced the lowest overhead for uploads (6%) and downloads (0.2%) in comparison to WebS, which means that producers can ensure the confidentiality, privacy, integrity and availability of their products while still observing a performance very close to that of a conventional solution based on web services (WebS). Although SkyCDS and WebS-E show an overhead that could be acceptable in comparison to WebS for uploads (41.22% and 34.97%, respectively), the overhead in download operations for SkyCDS and WebS-E is more than 150% in comparison to traditional WebS. This behavior represents a drawback for both solutions as the service experience of end-users is significantly affected by this overhead.

Figure 7. The improvement percentage of FedIDS in comparison with studied configurations.

5.2. SMOS: a case study of satellite imagery acquired by ESA

In this section, we analyze the service experience of producers and end-users associated to Spain1-Cloud when uploading and downloading products of the SMOS (L2) catalog, which can be considered sensitive products. In these experiments, the privacy, confidentiality and availability of L2 products are ensured by producers before sending them to the cloud federation; as a result, the experiments only consider the SkyCDS and FedIDS configurations, as neither WebS-E nor Phoenix offers security and reliability for products at the same time.

Figure 8(a) shows, on the vertical axis, the response time observed by users uploading 667 products of the ESA-SMOS catalog (horizontal axis). As can be seen, FedIDS yields lower response times than SkyCDS when uploading L2 products (40% in median). This improvement is almost constant for all sizes of uploaded products, and this behavior is consistent with the results previously shown.

Figure 8. The response time for uploads and downloads of products in ESA-SMOS catalog: (a) upload operations and (b) download operations.

Figure 8(b) shows the response time for FedIDS and SkyCDS when downloading L2 products. FedIDS also improves on the performance of SkyCDS (66.4% in median) for downloads, and its performance behavior is constant for all the image sizes. This is an important result as some products could be considered sensitive either by agencies conducting observation missions or by organizations that are in the business of managing and delivering derivative products to end-users (consumers).

5.3. Analysis of performance and storage utilization of federation services

In this section, we describe the performance of FedIDS, in terms of response time, from the point of view of a federation member, and the storage utilization required by FedIDS to ensure EO products. The first aspect we observed was that the servers of the federation members only received requests such as authentication of tokens, Pub/Sub management and metadata verification, because the ensuring of satellite imagery in terms of security and reliability was performed by the computers of producers and end-users. The experimental evaluation revealed that the median times spent by member servers for authentication (security) and Pub/Sub management were 25.5 and 40.3 ms, respectively, for both cloud sites.

FedIDS produces inter-publication operations when end-users associated to the Mexican member subscribe to products of the SMOS (L2) catalog stored in the Spanish cloud, as well as when end-users associated to the Spanish member subscribe to catalogs stored in the Mexican cloud. In this type of operation, the caching functionality of FedIDS moves products between the Spanish and Mexican clouds in advance. This spares end-users the costs of image transportation in the response times of upload and download operations. The size of products moved by the caching functionality was 54 MB in median and the median time spent per image caching was 95 s. The time spent by the Fed service to transport products of the AEM-ERIS catalogs (let us consider Landsat products of 275 MB in median) from MexCinvestav-Cloud to Spain1-Cloud was 7.2 min in median. We consider that the caching overhead is acceptable as it is absorbed by federation members just once and is not added to the response time observed by end-users; as a result, the response times for subscriptions and downloads of products in inter-publication scenarios for producers and end-users are similar to those already reported in Section 5.1.1. Moreover, the contingency plans of the Fed service and the reliability module of the IDS application assume a faulty scenario as the default configuration; as a result, the performance reported in Section 5.1.1 considers the unavailability of cloud storage resources and site outages.
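
As a rough worked example, the caching figures above imply the following effective transfer rates between sites. The helper name is hypothetical, and dividing a median size by a median time is only a back-of-the-envelope estimate, not the median of the per-product rates.

```python
def rate_mb_per_s(size_mb, seconds):
    # Effective rate implied by moving size_mb megabytes in the given time.
    return size_mb / seconds


smos_rate = rate_mb_per_s(54, 95)            # median cached product, median caching time
landsat_rate = rate_mb_per_s(275, 7.2 * 60)  # median Landsat product, 7.2 min transfer
```

Both estimates come out at roughly 0.6 MB/s (about 0.57 and 0.64 MB/s, respectively).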

Figure 9 shows the median throughput observed by end-users of two members associated to the Mexican cloud site (Mex2, at the center of the country, and Mex1, at the southeastern site) when using each option in their contingency plan to download EO products. Plan A indicates that operations are sent to the regular (local) cloud site network. The throughput of Plan A shows the effectiveness of the caching solution, as the response times observed by end-users depend only on their local network and the connection to their private cloud. In Plan B, the member nearest to the failed member (a broker in the federation) receives the requests sent by producers/consumers associated to the failed cloud. The throughput of Plan B shows the costs observed by end-users when FedIDS withstands the unavailability of a whole cloud site. In Plan C, the producers/consumers are served by another broker in the federation (the nearest broker is unavailable). The throughput observed by producers/consumers using Plan C includes the costs of sending requests to another member in the federation. (These costs depend on the time of day at which the failure is detected, the load on the cloud of the hosting member at the moment of the failure, the external network quality and the load sent by the failed member to the hosting member.)
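
The order in which the contingency options are tried can be sketched as a simple selection function. The function name, the broker list format and the availability flags are hypothetical; the article does not describe how FedIDS detects failures or ranks brokers.

```python
def choose_plan(local_up, brokers):
    """Pick a contingency plan.

    brokers is a list of (name, is_up) pairs ordered from nearest to farthest.
    """
    if local_up:
        # Plan A: requests go to the regular (local) cloud site.
        return ("A", "local")
    for rank, (name, is_up) in enumerate(brokers):
        if is_up:
            # The nearest broker answering is Plan B; any farther one is Plan C.
            return ("B" if rank == 0 else "C", name)
    raise RuntimeError("no federation member is reachable")
```

For example, `choose_plan(False, [("Mex1", False), ("Spain1", True)])` falls through to Plan C and returns `("C", "Spain1")`.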

Figure 9. Throughput observed by end-users associated to the Mexican member for each option in their contingency plan.

Figure 10 shows, on the vertical axis, the resulting size of EO products of different sizes (horizontal axis) when FedIDS protects them before sending them to the cloud. We recall that FedIDS ensures EO products for frugality based on compression, reliability based on dispersal coding, and confidentiality, privacy and integrity based on a digital secure envelope. The storage capacity required by products ensured by FedIDS is compared with the storage capacity required by products protected by dispersion (which adds 66% of overhead), compressed products (43% smaller than unprotected products) and unprotected products (WebS). FedIDS only requires 6% more storage than unprotected products (WebS), which means that EOMs can get more protection functionalities for their products with less storage capacity, improving the storage utilization, which is critical for cloud environments.
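
The dispersal overhead quoted above follows from simple arithmetic, sketched below under the assumption that dispersal uses n = 5 fragments of which any m = 3 suffice (which matches a tolerance of two failures out of five). The function names are hypothetical; the replication figure is the one the conclusion of this article reports for comparison.

```python
def dispersal_overhead_pct(n, m):
    # An (n, m) dispersal stores n fragments of size 1/m each: n/m of the original.
    return 100.0 * (n / m - 1)


def replication_overhead_pct(failures_tolerated):
    # Surviving f losses with plain copies needs f extra full replicas.
    return 100.0 * failures_tolerated
```

With these assumptions, `dispersal_overhead_pct(5, 3)` is about 66.7% and `replication_overhead_pct(2)` is 200%, against which the 6% measured for FedIDS can be compared.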

Figure 10. Storage utilization of FedIDS product protection.

5.4. Qualitative comparative analysis of FedIDS with other available solutions

In this section, we compare the features of FedIDS with available solutions that do not include its end-to-end reliability and confidentiality features but that offer other features that FedIDS covers or should cover in the future.

Globus Platform-as-a-Service (Globus 2017) and FACE-IT (Montella et al. 2015) (based on Globus and Galaxy) are solutions that include features such as sharing and transportation of large files by using Pub/Sub patterns, which are also offered by FedIDS. The motivation of this type of solution is that partners/organizations can take advantage of a consolidated Grid infrastructure and technology (including GridFTP) and can relegate the security and reliability protection of files to a third party, which reduces the management and processing costs of the global solution.

Among the features that FedIDS would cover in the future are solutions that enable scientists to build workflows to process and analyze a large volume of data (Taylor et al. 2007; Montella et al. 2015), allowing continuous flow among the involved processes (DevOps 2017; Jenkins 2017).

The following features will be considered in a future version of FedIDS:

  • Workflows through the federation: The design of the FedIDS architecture presented in this paper only considers dataflows for ensuring reliability and confidentiality of images by the IDS application, which runs on the end-user computers. Only reduced workflows are deployed to transport the ensured images to the federation; some partners may therefore require chained processing stages similar to the workflows performed by Triana or FACE-IT.

  • A suite of protocols: IDS applications only include drivers for HTTP(S) and BitTorrent (under development), so the inclusion of GridFTP/mdtmFTP into a suite of protocols would be an option for the IDS service.

The main qualitative benefits of FedIDS, in comparison with previously defined solutions, are the end-to-end applications of IDS that enable end-users (either consumers or producers of satellite imagery) to establish controls over images in-house, including the following benefits:

  • Reliability: In IDS, end-users ensure fault-tolerance (n-m) for image transactions, which in available solutions is relegated to the file management service (e.g. Globus+S3, FACE-IT+Galaxy+Globus, etc.).

  • Control, security and efficient storage utilization: In IDS, end-users can establish controls over the privacy (based on digital secure envelopes), integrity (based on digital signatures) and confidentiality (using attribute-based encryption) in-house. In FedIDS, the organizations take advantage of the end-to-end model to reduce the workload in their cloud resources and the overhead of storage sent to the cloud storage. In models of single cloud (e.g. Globus+S3), a potential vendor lock-in could arise as the producers relegate the ensuring of data to a cloud provider.

The modularity of FedIDS enables organizations to manage contingency pools by using content delivery or sharing tools (e.g. Globus, CDS), without losing control over the security and reliability of their imagery.

6. Related work

The EUMETSAT (2017) (European Organization for the Exploitation of Meteorological Satellites) archive managed catalogs from three generations of satellites, with 1 TB of meteorological products transferred per day. EUMETSAT also requires user intervention and only supports FTP downloads. In contrast to NEREIDS and EUMETSAT, where the users need to check for new products each day, Higuero, Tirado, and De La Fuente (2009) proposed a management scheme based on Pub/Sub patterns implemented as an image delivery system that distributes available contents to interested users, reducing the user work associated to the discovery and transfer of new products. In fact, FedIDS has been inspired by the HIDDRA scheme but it is focused on security and reliability in cloud federation environments, which is not the scope of the HIDDRA solution.

EOSDIS (2017) (Earth Observing System Data and Information System) is the NASA solution to the problem of distributing huge amounts of data to different users. It processes more than 21,000 subscriptions per day by using a publisher/subscriber paradigm to inform users about new products and by using FTP for product transfer. FedIDS differs from EOSDIS in two aspects. First, data transfer is performed through FTP in EOSDIS, whereas in FedIDS a set of protocols is available (FTP included). Second, in FedIDS the producers and end-users collaborate in the coding and decoding of products associated to the reliability and security services, whereas the participation of end-users is not contemplated in EOSDIS.

Transport protocols, inspired mainly by FTP, are a key component in the life cycle of satellite imagery. For instance, GridFTP (2017) (in union with Globus 2017) is a traditional large file transfer solution for grid environments, which increases the throughput per I/O operation through the deployment of multiple concurrent data streams. FDT and mdtmFTP (Zhang et al. 2017) are protocols that also deploy multiple concurrent streams with acceptable performance in dedicated network scenarios (Zhang et al. 2017). In FedIDS, the connection drivers for the transportation of ensured images are exchangeable. This means that different connectors can be added to the IDS service, so administrators can choose those best suited to meet a given requirement of a specific partner. For now, the IDS service only includes connection drivers for HTTP and HTTPS (BitTorrent drivers are under development) to deliver images to the heterogeneous and multiple cloud resources in the federation. These protocols allow the computers of users to establish connections with minimal requirements, without dedicated networks or the management of certificates required by other protocols, which avoids being blocked by firewalls. Nevertheless, in scenarios where a set of partners requires increased throughput, has the necessary resources (dedicated networks and an agreement for managing certificates for transportation) and agrees to use a single file management service such as Globus Online (including reliability based on S3), the use of GridFTP/mdtmFTP would be an option for the IDS service. In FedIDS, the management of files is relegated to local file systems to reduce the requirements to be met by partners in the federation.
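
The idea of exchangeable connection drivers can be sketched as a small registry keyed by protocol scheme. All class and function names below are hypothetical and do not reflect the IDS implementation (which the article states is written in Java); the HTTPS driver is a placeholder that only reports what it would send.

```python
from abc import ABC, abstractmethod


class TransportDriver(ABC):
    """Common interface that every connection driver is assumed to implement."""

    scheme = ""

    @abstractmethod
    def send(self, fragment, destination):
        """Deliver one ensured fragment and return a short delivery report."""


class HttpsDriver(TransportDriver):
    scheme = "https"

    def send(self, fragment, destination):
        # Placeholder: a real driver would issue an HTTPS upload request here.
        return f"https://{destination} <- {len(fragment)} bytes"


DRIVERS = {}


def register(driver):
    # Adding a protocol only requires registering another driver instance.
    DRIVERS[driver.scheme] = driver


def deliver(fragment, destination, scheme="https"):
    if scheme not in DRIVERS:
        raise ValueError(f"no driver registered for {scheme!r}")
    return DRIVERS[scheme].send(fragment, destination)


register(HttpsDriver())
```

Under this design, supporting GridFTP or mdtmFTP would amount to registering one more `TransportDriver` subclass, leaving the callers of `deliver` unchanged.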

Federated networks have been widely studied in grid environments as a way for a set of partners (business units) to perform procedures jointly by sharing data across a unified infrastructure. This type of scheme has recently been proposed for sharing geospatial data through federation of data infrastructure (Gonzalez et al. 2015). A specific implementation joins the grid infrastructure of a set of organizations (DataNet 2017).

MODIS-Azure (Li et al. 2010), NASA Nebula Cloud Computing Platform (Riso and Gretchen 2010) and ESA Helix Nebula (Marx 2013) have shown that cloud storage could be a cost-effective middleware between the FTP servers of observation missions and the scientific community. Today, cloud services are becoming a solution for producers and manufacturers to manage satellite imagery (CloudPlanetLabs 2017; DigitalGlobe 2017; AirbusdsCloud 2017).

Cloud-based federations have also been designed (Goiri, Guitart, and Torres 2011; OnApp 2017) for public cloud storage providers as well as for private clouds (Gonzalez et al. 2013). Cloud content delivery systems or CDS (CloudFront 2017) and cloud storage (AWS 2017; Azure 2017) are available solutions for space agencies and government departments to outsource the management and storage of satellite imagery to a cloud service provider. ESA developed a hot-standby solution in which two private cloud infrastructures (Germany and Italy partners) have been deployed by a cloud service provider to withstand cloud site failures (Esacloud: flexibility to transform space exploration). In recent years, this hot-standby was outsourced to a public cloud provider (Orange 2014).

However, in addition to outsourcing data storage in the aforementioned solutions, the producers and organizations are also outsourcing the control of their data, which could lead to legal and security issues. These issues are addressed in the FedIDS solution.

Multi-cloud storage approaches address the inconveniences of having a single endpoint for uploads/downloads associated to cloud storage solutions (Spillner et al. 2011; Gonzalez and Marcelin-Jimenez 2011; Spillner, Müller, and Schill 2013; Gonzalez et al. 2015). This type of approach produces multiple upload and download endpoints, which minimize the side effects of storage service outages. The data distribution by using multiple cloud resources proposed for public (Spillner, Müller, and Schill 2013) and private (Gonzalez et al. 2013) clouds enables organizations to take advantage of multiple uploads/downloads for reducing the risks of lock-in with a single provider by using information dispersion schemes. This type of solution enables organizations to reconstruct contents that are unavailable during the outage of a set of providers and to outsource data storage in a reliable manner, but it is not focused on the building of workflows or data sharing spaces across the infrastructure of several organizations. FedIDS addresses this type of issue and improves the performance of multi-cloud solutions by using an end-to-end approach where producers and end-users collaborate in the ensuring of products.

7. Conclusion

In this article, we presented the design, development and implementation of a Federated Cloud Storage Architecture (Fed) and Satellite Image Delivery Service (IDS) for Building Dependable Geospatial Platforms. The Fed component enables a set of organizations to build dependable SDI through federation of independent cloud resources, whereas IDS enables the owners of images to establish in-house access control over acquisition and sharing operations to avoid violations of confidentiality and loss of integrity. A federated geospatial platform deployed in two countries (Spain and Mexico) was built by using a FedIDS prototype, which was evaluated through two case studies: the first one based on satellite imagery captured by a Mexican antenna and the second one based on satellite imagery provided by ESA's SMOS mission. The exchange of images between Mexican and Spanish sites and outage scenarios were also evaluated.

The evaluation showed a reduction of workload in the servers of the federation members, which only received requests such as authentication of tokens, Pub/Sub management and metadata verification. In FedIDS, the continuous data flows and in-memory storage mechanisms included in the IDS service compensate for the fact that the computers of producers/end-users absorb the highest costs of ensuring and transporting images. In fact, the evaluation revealed that the service experience of producers/end-users using FedIDS was not degraded; instead, they observed an improvement in the performance of product delivery/retrieval procedures in comparison with the web-based security/reliability solutions considered in the experimental evaluation. The outages of a subset of federation members remain transparent for producers and end-users. The evaluation also revealed the efficiency of FedIDS in terms of reliability (end-users could get/share digital resources in outage scenarios), frugality (FedIDS required only 6% of extra storage to withstand service outages, whereas traditional methods such as dispersion and replication require 66.7% and 200%, respectively) and confidentiality (the owners of digital resources could ensure integrity, privacy and access control for their products in collaborative workflows). In comparison with available solutions, FedIDS reduced the time observed by producers by 40% and by end-users by 60% for delivery and acquisition of digital earth resources. FedIDS is currently being used in an ongoing project supported by a Space Agency to build a dependable federated geospatial platform in which a set of partners can share with each other, in a secure, efficient and resilient manner, satellite images and derivative products created by researchers associated to disaster prevention projects.
The goal of this project is to enhance the sharing of digital resources produced by the agency and its partners and to foster the information diffusion about virtual earth through a public-domain geoportal developed by using FedIDS.

We are currently working on a modular architecture for the deployment of processing workflows in the form of value chains through the federation. We are also considering a processing architecture based on an elastic container-based model similar to Skluzacek, Chard, and Foster (2010) to process a large volume of geospatial data, as well as the inclusion of some available tools in FedIDS as services, such as Globus in our metadata layer and GridFTP/mdtmFTP in the IDS service and the storage/transportation layer.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by CONACYT GRANT Fondo Sectorial Mexican Space Agency-CONACYT Num. [262891] and European Union (EU) under the COST programme Action called “Network for Sustainable Ultrascale Computing (NESUS) [IC1305]”.

References
