A user perspective on future cloud-based services for Big Earth data

Pages 1758-1774 | Received 29 Jun 2021, Accepted 14 Sep 2021, Published online: 27 Sep 2021

ABSTRACT

Cloud-based services introduce a paradigm shift in how users access, process and analyse Big Earth data. A key challenge is to align the current state of how users access, process and analyse the data with the trends and roadmaps that large data organisations lay out. In addition, due to the increased availability of open data, a more diverse user base wants to take advantage of Earth science data, leading to new user requirements. We ran a web-based survey among Big Earth data users to better understand the motivation to migrate to cloud-based services as well as the challenges and opportunities that might arise. Results show an overall interest in moving to cloud-based services but reveal insufficient literacy in cloud systems and a lack of trust due to security concerns and opacity of emerging costs. These gaps demand efforts on three levels. First, cloud services should be targeted at intermediate users instead of policy- and decision-makers, and over-engineered systems with a high level of abstraction should be avoided. Second, more substantial capacity-building efforts are required to decrease the existing gap in cloud skills and uptake. Third, a cloud certification mechanism could help in building up overall trust in cloud-based services.

1. Introduction

The volume, variety, and velocity of data from Earth-observing satellites and Earth system models have increased exponentially, and terabytes of open data are disseminated on a daily basis. This exponential growth of Big Earth data, coupled with technological advancement, leads to a paradigm shift in how and where data is stored as well as how users access, process and use the data. The traditional data-centric approach of Earth Observation data management and analysis, where the download of large volumes of data to local machines is central and the storage and processing of this data is executed locally, hampers the full use of the open Earth data available (Overpeck et al. 2011; Ramachandran et al. 2018; Sudmanns et al. 2019; Gomes, Queiroz, and Ferreira 2020). The ‘Moving Code’ paradigm brings a shift to the traditional data-centric approach. Instead of downloading and copying large volumes of Earth data to local machines, the data is accessed, processed and analysed on cloud-based services where the data is located (Gomes, Queiroz, and Ferreira 2020). This shift ‘brings users (and executable code) to the data’ instead of ‘data to the users’ (Sudmanns et al. 2019). Cloud computing is one of the major current trends in Earth Observation (European Commission 2019; Yang et al. 2019; National Aeronautics and Space Administration 2020). Many publicly funded or commercial cloud infrastructures have evolved in recent years, offering specific services for effectively managing and analysing Big Earth data (Camara et al. 2016; Gorelick et al. 2017; Ramachandran et al. 2018; Gomes, Queiroz, and Ferreira 2020). Cloud infrastructures differ in their level of specialisation, the functionalities they offer and the legal entity of the cloud provider.
Even though these systems allow a very effective and scalable processing of Big Earth data, they demand advanced technical knowledge from users to take full advantage (Gomes, Queiroz, and Ferreira 2020). For this reason, ‘user-centric’ data services have been developed, with the aim of still offering advanced access to and processing of Big Earth data while hiding the underlying technical complexities of cloud-based systems behind a layer of abstraction (Camara et al. 2016; Giuliani et al. 2019; Gomes, Queiroz, and Ferreira 2020). Gomes, Queiroz, and Ferreira (2020) refer to these solutions as ‘Platforms for big EO Data Management and Analysis’ and provide a list of popular platforms as examples. Among them are the Google Earth Engine (Gorelick et al. 2017), data cubes, e.g. the Open Data Cube (Killough 2018), and OpenEO, an open-source interface between EO data infrastructures and front-end applications (Pebesma et al. 2017; Schramm et al. 2021). These platforms differ in the underlying technologies they use, their level of openness and abstraction, and the data and functionalities they offer. Even though these platforms aim at improving the access, management and processing of Big Earth data, the level of abstraction comes at the cost of flexibility. Users are restricted to the types of data and functionalities the platform offers. This limitation becomes challenging, especially when users need to combine different sources of data, e.g. climate data from models and Earth Observation data from satellites (Wagemann et al. 2021).

The challenge with the user-centric paradigm is that the term user is not well-defined and, additionally, users of Big Earth data have diversified through open data policies (European Commission 2019). While CEOS identifies three key stakeholder groups of the Earth data value chain, (i) EO data providers, (ii) Big Data hosts and aggregators and (iii) data users, the European Commission only makes a distinction between intermediate and end users (Committee on Earth Observation Satellites 2019; European Commission 2011). In addition, Ramachandran et al. (2018) classify cloud vendors as data users but consider them a new class of data consumer with distinct requirements. Users are a driving force in the development of new ‘user-centric’ systems, but the different user definitions lead to a diversity of new (cloud-based) systems, all claiming to address user needs. In Europe, cloud uptake is still low and downloading large volumes of data still seems to be the prevailing mode of data access (Carter et al. 2018; European Commission 2020a; Wagemann et al. 2021). However, the European Commission considers cloud services the way forward in order to efficiently store, process and disseminate large volumes of open data and to increase their uptake (European Commission 2020a). For this reason, the EC initiated several publicly funded cloud activities in recent years, including the European Open Science Cloud and the Copernicus Data Information and Access Services. The US government published a long-term high-level strategy to foster cloud adoption in federal agencies as early as 2011 (Kundra 2011).
As a result, the National Aeronautics and Space Administration (NASA), for example, has developed a strategic vision to centralise key components of its Earth Observing System Data and Information System (EOSDIS) in a commercial cloud environment (National Aeronautics and Space Administration 2020). In Europe, the next step is ‘Destination Earth’, a bold multi-year initiative by the EC, which aims to build a ‘digital twin’ of planet Earth to provide forecasts of floods, droughts and fires at a 1 km spatial resolution, a scale many times finer than current models (Voosen 2020). Data at a finer spatial and temporal resolution results in exponentially growing data volumes and forces data organisations to migrate their existing ‘download-based’ systems to new (cloud-based) solutions to be able to effectively manage and disseminate the data also in the future. On the other hand, working with these massive amounts of data and new data systems constitutes an increasing problem for data users. This is one reason why Destination Earth aims at supporting users to co-design a digital twin platform rather than an isolated system (Bauer, Stevens, and Hazeleger 2021). In order to bridge this gap, it is crucial to better understand the current state of Big Earth data users and their future needs. These insights will help to tailor new cloud-based data systems to user needs and develop adequate capacity building activities to empower users to build up necessary skills (NASA 2019).

We ran a web-based survey among users of Big Earth data to analyse the users’ perspective on cloud-based data systems for Big Earth data. The survey aimed to better understand the users’ motivation to migrate to and use cloud-based services in the future and how they wish to work with such systems. This paper is structured as follows: Section 2 outlines how the survey was conceptualized, implemented and analysed. Section 3 presents the survey results, which are then discussed in Section 4. Finally, we provide a conclusion.

2. Methods

We ran a web-based survey on current and future user requirements of Big Earth data (Wagemann et al. 2020). This paper focuses on analysing questions related to the current and future use of data access systems. A total of nine questions are analysed. One question aimed at identifying the types of data access systems currently used and the ones of interest to be used in the future. The future data services section of the questionnaire aimed to identify whether data users are motivated to migrate to cloud-based data services in the future and, if yes, what aspects they consider important. Questions included the motivation to migrate to cloud services, preferences in the legal policy of cloud services, use of cloud services, the importance of security aspects and willingness to pay for cloud services. See Appendix 1 for the survey’s ethics statement.

The survey was implemented with EUSurvey, a tool offered by the European Commission to create online surveys and forms (European Commission 2021). The survey was open for responses during two time periods: the first survey period was between 12 November 2018 and 30 January 2019, during which 213 responses were collected. The second period was between 11 April and 31 May 2019, during which another 18 responses were collected. The selection of the dissemination channels aimed to get the perspectives of different communities and practitioners within the geospatial field who make use of Big Earth data in their applications. Dissemination channels included social media channels, primarily Twitter and LinkedIn, professional mailing lists, and individual experts of stakeholder organisations within the geospatial field. Main hashtags used for social media posts were #earthobservation #geospatial #meteorological #climate #BigData #copernicus and #CloudComputing. The professional mailing lists included mailing lists run by the Google Earth Engine Developers Community, the Open Geospatial Consortium (OGC), the Division on Earth and Space Science Informatics of the European Geosciences Union (ESSI EGU), Copernicus and ECMWF. Additionally, individual experts from ESA, OGC, the Group on Earth Observations (GEO) and organisations of the United Nations working with geospatial data, e.g. the World Food Programme (WFP) and the Food and Agricultural Organisation (FAO), were contacted and asked to distribute the survey within their professional network.

Statistical significance tests were conducted in order to identify potential differences and preferences of cloud service use regionally (between Europe and the USA & Canada) and between different work sectors. We ran a chi-square test and an analysis of the residuals in order to: (i) compare the use pattern of cloud services between Europe and the USA & Canada, (ii) compare the current use of data access systems among work sectors and (iii) compare the legal preference of cloud services between Europe and the USA & Canada and different work sectors. The levels of significance are indicated as follows: p < 0.1(-), p < 0.05(*), p < 0.01(**) and p < 0.001(***).

For the analysis, we used the statistical package R (R Foundation for Statistical Computing 2021), and for visualizations, the R package ‘ggplot2’ (Wickham 2016).
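To illustrate the kind of test described in this section, the following is a minimal sketch of a chi-square test of independence with Pearson residuals and the significance markers listed above. It is not the authors' code: the analysis in the paper was done in R, whereas this sketch is plain Python, and the contingency table uses hypothetical counts chosen only for illustration. The function names are likewise assumptions.

```python
import math

# HYPOTHETICAL counts for illustration only; these are not survey data.
# Rows: two regions; columns: "currently use", "future interest", "no interest".
observed = [
    [30, 50, 20],  # hypothetical region A
    [25, 15, 10],  # hypothetical region B
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts, Pearson residuals)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Pearson residual: (observed - expected) / sqrt(expected).
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    statistic = sum(res ** 2 for row in residuals for res in row)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected, residuals


def chi2_sf(x, dof):
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, half_x = dof / 2.0, x / 2.0
    term = 1.0 / a
    series = term
    for n in range(1, 1000):
        term *= half_x / (a + n)
        series += term
        if term < series * 1e-15:
            break
    lower = series * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def significance_marker(p):
    """Map a p-value to the markers used in the text: (-), (*), (**), (***)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "-"
    return ""


statistic, dof, expected, residuals = chi_square_independence(observed)
p_value = chi2_sf(statistic, dof)
print(f"chi2 = {statistic:.3f}, dof = {dof}, p = {p_value:.4f} {significance_marker(p_value)}")
for row in residuals:
    print([round(res, 2) for res in row])
```

A positive residual marks a cell with more responses than independence would predict and a negative residual marks fewer, which is how the associations reported in the results can be read. In R, `chisq.test` returns comparable quantities.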

3. Results

A total of 231 responses were collected from 37 countries, with the majority coming from central Europe (Germany: n = 53 (23%); United Kingdom: n = 21 (9%); Italy: n = 16 (6.9%)) and the North American continent (United States of America: n = 34 (14.7%); Canada: n = 13 (5.6%)). 70% of the respondents were between 30 and 50 years old. University (n = 106, 45.9%) followed by Government (n = 46, 19.9%) and Established Company (n = 41, 17.7%), were the main work sectors indicated.

3.1. Current and future use of data systems

Downloading data is the prevailing mode of data access (70% of survey respondents) across all work sectors, followed by accessing data through a cloud-computing infrastructure or OGC web services (Figure 1). Users working in research seem to emphasize download services, as more than twice as many use a download service compared to other data access services. Custom APIs, data cubes and virtual research infrastructures are predominantly used at universities and in the private sector and less in government and non-profit/intergovernmental organisations. In general, user-centric data services (data cubes or virtual research infrastructures) are the services least used by survey respondents.

Figure 1. Response to the question ‘How do you currently access large volumes of Big Earth data?’ divided by four work sectors (i) University / Research, (ii) Government, (iii) Private sector and (iv) Non-profit / Intergovernmental organisation. The bar plots indicate absolute numbers (n = 230, one entry was removed due to invalid responses).

Additionally, the survey respondents were asked to indicate which data access service they would like to use in the future and which service they are not interested in. Absolute numbers indicate a stronger future interest in cloud-based services and user-centric services, such as data cube technologies and virtual research infrastructures. These absolute numbers are put in relation by building the ratio between the responses ‘future use of service’ and ‘no interest in this service’. A ratio higher than one indicates an increased interest in this data service in the future, whereas a ratio less than one indicates an amplified disinterest in the service. Table 1 shows the ratios for each service per work sector. Download services and cloud-based services are the two data access modalities survey respondents across all work sectors would like to use in the future. Survey respondents working in government and the private sector further show an interest in using OGC web services. Notable is the high ratio of download services among survey respondents working at universities. This is significant at the 0.01 significance level (p-value: 0.003965**).
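The interest ratio described above can be sketched in a few lines. The counts below are hypothetical and serve only to show how the ratio is built and read; the service names follow the text, while the function name and dictionary layout are assumptions for illustration.

```python
import math

# HYPOTHETICAL response counts per data access service (not survey data).
responses = {
    "download service": {"future_use": 40, "no_interest": 10},
    "cloud-based service": {"future_use": 60, "no_interest": 20},
    "data cube": {"future_use": 15, "no_interest": 30},
}


def interest_ratio(future_use, no_interest):
    """Ratio of 'would like to use in the future' to 'not interested' responses."""
    if no_interest == 0:
        return math.inf  # nobody expressed disinterest
    return future_use / no_interest


ratios = {service: interest_ratio(**counts) for service, counts in responses.items()}

for service, ratio in ratios.items():
    # A ratio above one means interest outweighs disinterest.
    reading = "stronger interest" if ratio > 1 else "stronger disinterest"
    print(f"{service}: ratio = {ratio:.2f} ({reading})")
```

Because the ratio ignores respondents who already use a service, it describes future demand among non-users rather than overall popularity.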

Table 1. Ratio of the response options ‘I would like to use this service in the future’ vs ‘I am not interested in this type of service’ of the question ‘How do you currently or how would you like in the future to access large volumes of Big Earth data?’ A ratio above one (numbers in bold) indicates a stronger interest in using the respective data access system in the future.

Figure 2 shows the data access preferences of survey respondents in Europe (top) and the USA & Canada (bottom). The current uptake of cloud services seems to be higher in the USA & Canada, where more than 50% of survey respondents already access data from a cloud service. In Europe, fewer survey respondents are already using cloud services, but more than half are interested in using cloud-based services in the future.

Figure 2. Data systems (i) currently used (dark-green), (ii) of interest for future use (light-green) and (iii) of no interest (grey). The top graph shows relative frequencies of survey respondents from Europe (n = 112) and the bottom graph shows relative frequencies of survey respondents from the USA & Canada (n = 47).

Table 2 shows that the use pattern of cloud services between Europe and the USA & Canada is significantly different at a 0.1 significance level. Furthermore, the residual analysis shows a positive association with a future use of cloud services in Europe, whereas the USA & Canada are more strongly associated with a current use of cloud services.

Table 2. Chi-square results comparing use pattern of data systems between Europe and the USA & Canada (The use pattern of cloud services is significantly different between Europe and the USA & Canada at the 90% confidence level)

3.2. Motivation to migrate to cloud services

There is an overall interest and motivation to migrate to cloud services in the future (see Figure 3). Almost 70% of the respondents indicated an interest or strong interest, compared to only 10% who are not interested in a change. We asked those who are not interested in migrating to provide a reason for their disinterest. Reasons include a lack of confidence in the systems, cost (it is more economical to process data on internal / local systems), risk of dependency on a commercial cloud provider, institutional policies, no added value foreseen, intermittent internet access, lack of resources as well as security and privacy aspects.

Figure 3. Response in relative frequencies to the question: ‘How much are you interested in migrating your processing tasks to a cloud service (commercial cloud vendor or publicly funded cloud service) in the future?’. The Likert scale had five response options from ‘not at all interested’ to ‘very interested’ (n = 225, 6 respondents did not provide a response).

3.3. Preference of legal policy of cloud services

When asked for a preference regarding the legal policy of the cloud service provider, respondents in general favour publicly funded cloud services (see Figure 4, top). A third of the respondents would prefer the use of a publicly funded general cloud service, e.g. the European Open Science Cloud. Another 22% would prefer a publicly funded specialised cloud, e.g. the Copernicus DIAS reference service WEkEO. In contrast, a quarter of the respondents do not mind the cloud-service policy at all. Six survey respondents did not provide their preference.

Figure 4. Preference of legal policies of cloud services in %, differentiated between sub-groups: regional (Europe (n = 112) and North American continent (n = 47) (top)) and work sectors (bottom) (n = 225, six respondents did not provide a response). P-values of a chi-square test are significant at a 0.001 significance level (regional: 1.078e-05***, work sectors: 0.0004462***).

The preference for a publicly funded cloud service tends to be stronger in Europe compared to the North American continent (Figure 4, top). Among all survey respondents from the North American continent, most indicated no specific preference, followed by a preference for a commercial cloud vendor. In contrast, most survey respondents in Europe prefer a publicly funded general cloud, followed by a publicly funded specialized cloud and no specific preference. This association is reflected in a significant chi-square test comparing the preference of the legal policy between Europe and the USA & Canada (p-value: 1.078e-05***). The residual analysis confirms that the USA & Canada are positively associated with commercial cloud vendors, whereas Europe is positively associated with publicly funded cloud services.

Across work sectors (Figure 4, bottom), the preference for publicly funded clouds is highest in publicly funded work sectors, such as university, government and intergovernmental organisations / non-profits. No specific preference is highest among respondents working in a start-up or an established company. The preference for commercial cloud vendors is lowest among survey respondents working at university. The significance testing confirms an association between work sectors and legal policy of cloud services (p-value: 0.0004462***). The analysis of residuals confirms a negative association between commercial cloud services and users working in research (work sector ‘university’), whereas the work sector ‘Start-up / Established company’ is positively associated with no specific preference of the legal policy (‘I do not mind’ response). In contrast, the work sector government is negatively associated with ‘I do not mind’, which leads to the conclusion that survey respondents in government mind the legal policy of cloud services.

3.4. Use of cloud services

Data access, processing and storage are rated equally high (around three-quarters or even higher) when asked about the use of cloud services. A bit more than half (53%) of the survey respondents indicate that they would use cloud services for all three: data access, storage and processing. Additionally, two out of three of those who would like to use the cloud for data processing would prefer to have pre-defined algorithms and standard processing libraries available in the cloud. Only 3% do not have any application need. Developing own algorithms, deploying own applications in the cloud, provision of operational services and data transfer and migration of cloud services were mentioned as ‘other’ cloud service uses.

3.5. Statements to cloud-based services

We provided a list of five statements related to cloud services and asked the respondents to choose which ones would be true for them (Figure 5). The option to upload own datasets and combine those with other datasets available on the cloud was selected most often, followed by the need to export intermediary or final results out of the cloud. More respondents would like to work on a cloud system collaboratively (e.g. sharing libraries, open data, and contributing by sharing data and workflows) rather than working privately on the cloud. Finally, the geographic location of the cloud service (e.g. whether the cloud server is located in Europe or in the USA) seems to play a less important role.

Figure 5. Absolute (Bars) and relative frequencies (labels on the right – n = 226) of five given statements to cloud services.

3.6. Security aspects of cloud services

We asked the respondents to rate the level of risk (no risk at all, risk, major risk) of five security aspects related to cloud-based services: (i) data integrity (assurance of the accuracy and consistency of a data set over its entire life-cycle), (ii) data breaches (data information is copied, viewed, transmitted, stolen or used by unauthorized individuals), (iii) data loss (data information is destroyed and cannot be recovered), (iv) service unavailability (service is not continuously available) and (v) security of private / restricted data. At least two-thirds of the respondents rated all given security aspects as risk or major risk, with ‘service unavailability’, ‘data loss’ and ‘data security’ being the top three security concerns (Figure 6). Other risks mentioned included, e.g., a changing business model of the cloud vendor, low data transfer rates, general law restrictions related to a different geographic location of the cloud or migration to a different cloud provider.

Figure 6. Security aspects of cloud services and related level of risk. The right plot shows responses of ‘major risk’, ‘risk’ and ‘no risk’. The left plot shows rating for response option ‘Might be a risk, but not important for me’. Bars are absolute numbers, and labels show relative frequencies (n = 228, three respondents did not provide a response).

3.7. Examples of data workflows to be executed in the cloud

A total of 70 respondents (30% out of 231 survey respondents) provided an example of a workflow they would like to execute in the cloud. The responses can be grouped into eight broader categories (see Table 3). The categories with more than ten responses are related to (i) improving the overall efficiency of data handling aspects, including data discovery, access, processing and sharing, (ii) time-series analysis and (iii) automating pre- and post-processing of large volumes of data. Other categories include analysis of global data fields, combining and comparing different data types, modelling, machine-learning applications and preparing, hosting and serving data. See Wagemann et al. (2020) for individual responses to each category.

Table 3. The reported categories are based on free-text responses to the request to provide an example of a data workflow to be executed on a cloud-based service. The 70 responses were grouped into eight broader categories.

3.8. User’s perspective on an estimation of technical cloud requirements

Only one out of four (23.8%, n = 55) survey respondents would be able to estimate the technical requirements needed for data storage and processing in the cloud, such as storage space, RAM and number of CPUs. We categorized the responses into data volume, number of cloud instances, memory and number of CPUs (Figure 7). Most respondents would need access to a data volume between 1 and 10 TB and up to 50 parallel instances to scale processing. However, many highlighted a preference for container-orchestration systems that manage the scaling of applications based on requirements and user loads. Regarding the memory and number of CPUs per instance, either 64 GB or less than 10 GB combined with a lower number of CPUs (<10) is preferred. Other requirements were mentioned, e.g. the Pangeo software stack, PyTorch, TensorFlow, network bandwidth, local SSD storage for caching and a private network.

Figure 7. Technical requirements for data access and processing in the cloud – responses to the open question were categorized in (i) data volume, (ii) number of instances, (iii) memory and (iv) number of CPUs.

3.9. Willingness to pay for cloud services

A total of 226 respondents provided a response to the question ‘Would you be willing to pay for services in the cloud’. Almost half (111 respondents, 49.1%) would make it dependent on the cost of the cloud service, 28.8% (65 respondents) would not be willing to pay for cloud services, and 22.1% (50 respondents) would be willing to pay for services in the cloud.

Those who would be willing to pay for services on the cloud (n = 161, combining respondents who would be willing to pay and those who make it dependent on the cost) were additionally asked to specify the services they would be willing to pay for (see Figure 8). Almost 80% (128 respondents) would pay for processing services, followed by a bit more than two-thirds (67.7%, 109 respondents) who would be willing to pay for data storage. Services respondents would be less likely to pay for are service support and data down- and upload (less than 40%). Other mentions included the willingness to pay in proportion to the revenue made by selling the products or services, or to pay for server space.
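The shares reported for the willingness-to-pay question follow directly from the counts given in the text, and a short sketch makes the arithmetic and the size of the follow-up group explicit. All counts below are taken from the text; the variable names are illustrative assumptions.

```python
# Counts as reported in the text for 'Would you be willing to pay for services in the cloud'.
answers = {"depends on cost": 111, "not willing": 65, "willing": 50}
total = sum(answers.values())  # 226 respondents answered the question

# Share of each answer among all respondents to the question, in percent.
shares = {answer: round(100 * count / total, 1) for answer, count in answers.items()}

# The follow-up questions went to everyone not ruling out payment.
follow_up_n = answers["depends on cost"] + answers["willing"]  # 161

# Counts as reported in the text for the two services most often named.
service_counts = {"processing": 128, "data storage": 109}
service_shares = {
    service: round(100 * count / follow_up_n, 1)
    for service, count in service_counts.items()
}

print(total, shares)
print(follow_up_n, service_shares)
```

Note that the service shares use the follow-up group of 161 as denominator, not all 226 respondents, so they are not directly comparable to the first set of percentages.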

Figure 8. Types of cloud services 161 users are willing to pay for. Bars indicate absolute number, labels on the bottom indicate %.

We further asked those willing to pay for cloud services (n = 161) for the price limit up to which they would pay (Figure 9). Half of them (81 respondents, 50.3%) prefer a cost model based on a monthly / annual subscription fee. 37.3% would be willing to pay 1000 Euro / US Dollars or less.

Figure 9. Price limit up to which 161 survey respondents would be willing to pay for cloud services. Bars indicate absolute number, labels on the right indicate %.

4. Discussion

We presented findings of a user survey conducted among users of Big Earth data with a focus on the current and future interest in using cloud-based services. The survey results primarily reflect the perspective of Big Earth data users in Europe and the USA & Canada. Even though our results are not representative of users of Big Earth data worldwide, we are confident that they provide valuable indications of how Big Earth data users might use cloud-based services in the future. One could question whether the relatively low number of total survey respondents (n = 231) allows for drawing general conclusions about users of Big Earth data. Our number of responses is similar to that of other studies (between ∼100 and ∼300) on open science and open data sharing (Abele-Brehm et al. 2019; Scherp et al. 2020) or capacity-building in Earth Observation (Chasmer, Ryerson, and Coburn 2021). These studies argue that lower response rates still offer valuable insights for a sub-population on a respective topic, as long as the sub-population is well understood. Wagemann et al. (2021) discussed the representativity and sub-population of the survey in detail. In general, they conclude that the total number of users of Big Earth data worldwide and the expected response rate are hard to estimate. However, they argue that the representativity of the data set is underlined by the distribution of age (∼70% are in the age group between 30 and 50) and the distribution of work sectors (∼50% indicated that they work at a university). Both variables are likely to reflect the general distribution of Big Earth data users. Researchers have traditionally been the main user group of Big Earth data, and only with the introduction of open data policies did other sectors start gaining interest. For this reason, it is still expected that a larger percentage of users work in academia (Wagemann et al. 2021).
Further, the survey introduction addressed ‘users working with large volumes of environmental data’ and mentioned that it is about the interest to migrate to future cloud-based services. For this reason, it is reasonable to assume that only users who face challenges in handling and processing Big Earth data and who are in some way interested in infrastructural aspects of data management and processing participated in the survey.

In general, the results of this study reveal a great interest in using cloud-based services in the future (70% of all survey respondents are open to migrating). Given the great success of cloud computing in Europe, the USA (National Aeronautics and Space Administration 2020), and China (Huadong 2018), users in other parts of the world will likely rely on cloud-based services as well, even though regional differences exist. For example, in the USA and Canada, cloud uptake seems to be more advanced than in Europe, where the current use of cloud-based services is still lower than the interest in using them in the future. These findings align with other European-focused surveys and reports on the understanding and uptake of cloud services. Results from the National Initiatives Survey by the European Open Science Cloud (EOSC) consortium show a high future interest in EOSC but a currently low familiarity with EOSC activities (Bodlos et al. 2020). The European Data Strategy by the European Commission (EC) attests, in general, to a low cloud uptake across Europe. However, the EC considers cloud services an essential part of the European data economy (European Commission 2020a). For this reason, Europe has put a strong emphasis on the development of publicly funded cloud solutions in recent years, with cloud initiatives such as the EOSC (Bodlos et al. 2020), the Copernicus Data and Information Access Services (ECMWF 2018) or the European Weather Cloud, a community cloud for the meteorological and climate community (Pappenberger and Palkovic 2020). A new multi-year initiative of the EC called Destination Earth (DestinE) is expected to start in 2021. DestinE aims to construct ‘digital twins’ of the Earth, highly accurate federated cloud-based models that help to better monitor and predict environmental change and human impact (European Commission 2020b; Bauer, Stevens, and Hazeleger 2021). 
In addition, commercial and general-purpose cloud solutions, such as Amazon Web Services, the Google Cloud Platform, and Google Earth Engine, originate primarily from big technology companies based in the USA.

Survey results revealed distinct preferences regarding legal cloud service policies by region and work sector. Survey respondents in Europe prefer publicly funded cloud solutions, whereas survey respondents from the USA and Canada are positively associated with commercial cloud services. These regional preferences can be attributed to different perceptions of privacy, security and trust in cloud computing in Europe and the USA. In Europe, data protection and privacy are considered a fundamental human right and, since 2018, have been strictly regulated under the ‘General Data Protection Regulation (GDPR)’. The USA does not have a coordinated approach. There, data privacy and security are considered more a matter of ‘avoiding harm to people in specific contexts’ than of actively protecting personal data (Pearson and Yee 2013). Public discussions contesting the compliance of US-based technology firms with European data protection and privacy regulations have indeed decreased the overall trust in big technology firms in Europe. A survey among European citizens in 2011 about attitudes on data protection revealed that services offered by public authorities and institutions, including the European Commission, are trusted more than commercial service providers (European Commission 2011). A natural conclusion would be that users from Europe, besides preferring publicly funded cloud services, also place strong emphasis on the geographic location of the cloud server. However, the survey respondents rated the geographic location of a cloud server as the least important aspect. Even though the question was expressed in a more general context and not restricted to security aspects only, the results indicate that users of Big Earth data might prioritise other aspects when it comes to using cloud-based services. It could also mean that users of Big Earth data are simply not aware of the impact local legislation can have on privacy issues and management. 
In any case, the use of a cloud service is always a trade-off between multiple criteria, including security, privacy, compliance, costs and benefits (Pearson and Yee 2013; Ogunlolu and Rajanen 2019). Most likely, if there were two systems offering the same data access and processing functionalities, users from Europe would put more trust in the European cloud service. Due to a lack of alternatives, users trade privacy and security concerns for cloud service benefits, such as advanced data access and computing capabilities (Polyviou and Pouloudi 2015; Ogunlolu and Rajanen 2019).

At the work-sector level, survey results still show a strong need for downloading data and a negative association with commercial cloud services among users working at universities. In general, the private sector (start-ups and established companies) does not seem to adhere to the legal policies as strictly as the public sector. These observations can be explained by the fact that the private sector is often more independent in its choice of technology. In addition, it often has to offer ‘operational’ services and, for this reason, shows a preference for commercial cloud services with clear Service Level Agreements. In contrast, the public sector and universities are often constrained by institutional or research policies, e.g. when research funding is linked to the use of a certain cloud service. Further, the research community’s preference for public clouds can also be attributed to its general tendency to use open(-source) solutions. The preference for downloading data among researchers can most likely be attributed to the wish for maximum control over datasets and processing.

The roadmap in the Big Earth data landscape foresees that, within the next ten years, users of Big Earth data will exclusively access and process open data on cloud-based services. In particular, an arbitrary ad-hoc combination of different heterogeneous data sets will generate the input for Artificial Intelligence (AI) applications (European Commission 2020b; ECMWF 2021). This is ambitious and, in many respects, a paradigm shift for users, considering how users of Big Earth data currently access, process, and analyse data (Bauer, Stevens, and Hazeleger 2021; Wagemann et al. 2021). Despite the general interest in migrating to cloud services in the future, the survey unveils a gap between the skills required to use cloud services and the users’ current skill set. Only one in four survey respondents can specify technical requirements for cloud-based processing. This gap in technical expertise is why user-friendly platforms such as data cubes or Virtual Research Infrastructures have been developed. They introduce a layer of abstraction to hide technical complexities (Camara et al. 2016; Killough 2018; Gomes, Queiroz, and Ferreira 2020). In the example of data cubes, technical complexities are reduced by offering pre-processed data (analysis-ready data (ARD)) that can be analysed with a list of pre-defined application algorithms (Killough 2018). It seems, though, that such systems benefit smaller specialised user groups but are too restrictive in general. The results of our survey further unveil a particular disinterest in such platforms. Cloud services, specifically those that offer Infrastructure-as-a-Service and Platform-as-a-Service functionalities, provide great flexibility but at the same time require an understanding of network administration and system engineering. 
Even though cloud providers also try to hide technical complexities to facilitate the use of these systems, users are usually confronted with questions related to number of instances, storage space, type of VMs, etc. during the configuration process. This leads to the basic question of whether domain users, such as researchers and data scientists, are expected to develop the necessary technical capacities or if it would be better to build interdisciplinary teams bringing together domain researchers and system engineers / network administrators.
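
To make the configuration questions mentioned above more tangible, the following minimal Python sketch shows the kind of parameters a domain user typically has to specify and how they jointly drive the cost. It is a sketch under stated assumptions: the class `CloudRequest`, the function `estimate_monthly_cost` and both unit prices are hypothetical placeholders and do not correspond to any real provider or tariff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudRequest:
    """Hypothetical resource request a domain user would have to fill in."""
    n_instances: int          # number of virtual machines
    vcpus_per_instance: int   # size ("type") of each VM
    storage_gb: float         # persistent storage volume
    hours_per_month: float    # expected run time per instance


# Placeholder unit prices (assumptions for illustration, not real tariffs).
PRICE_PER_VCPU_HOUR = 0.04
PRICE_PER_GB_MONTH = 0.02


def estimate_monthly_cost(req: CloudRequest) -> float:
    """Return a rough monthly cost in arbitrary currency units."""
    compute = (req.n_instances * req.vcpus_per_instance
               * req.hours_per_month * PRICE_PER_VCPU_HOUR)
    storage = req.storage_gb * PRICE_PER_GB_MONTH
    return compute + storage


request = CloudRequest(n_instances=2, vcpus_per_instance=8,
                       storage_gb=500, hours_per_month=100)
print(round(estimate_monthly_cost(request), 2))
```

Even in this simplified form, the user has to estimate four quantities in advance. Real price models contain further components, which illustrates why specifying technical requirements and anticipating costs is difficult without system engineering experience.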

Survey results further reveal an overall scepticism towards cloud security and emerging costs. Two out of three respondents rated the given security aspects related to data breaches or service unavailability as a risk or high risk, and about half of the survey respondents make their willingness to pay for cloud services dependent on the emerging costs. These results indicate that, despite the interest in cloud-based services, a lack of trust and skills currently hinders their adoption (Singh and Chatterjee 2017). In order to improve adoption and acceptance and to build up overall trust in cloud-based services, user capacities need to be strengthened (Giuliani et al. 2019; European Commission 2020a). The challenge is to strike a balance between addressing current user needs and, at the same time, building up capacities for the new generation of data systems, such as Destination Earth (DestinE). For this reason, training activities have to address current user needs and conceptualise how users are supposed to work with systems like DestinE and Big Earth data in the future. Computational notebooks, e.g. from Project Jupyter, have proven to effectively support reproducible data analysis, rapid prototyping and interactive training that appeals to a wide variety of users with their existing skill levels (Abernathey 2018; Barba et al. 2019; Giuliani et al. 2019; Kim and Henke 2021). A JupyterLab / JupyterHub interface deployed in the cloud facilitates data access and processing, with users hardly noticing the cloud environment (Perkel 2018, 2021). The importance survey respondents attach to combining their own data sets with datasets on the cloud, together with their preference for working collaboratively on the cloud, supports the call for stronger efforts in the interoperability of cloud services, as already argued in Wagemann et al. (2021). 
However, progress in interoperability and reproducibility between data services is slow. Besides technical aspects, an organisational and cultural change is required to fully embrace interoperability and reproducibility between (cloud-based) data services (Craglia and Nativi 2018).

5 Conclusion

Results from our web-based survey among Big Earth data users demonstrate a general interest in and motivation to use cloud-based data services in the future. At the same time, the results reveal regional differences between Europe and the USA & Canada, an insufficient literacy in cloud-based services and a lack of trust in terms of security and emerging costs. In order to align the current state of how users access, process, and analyse Big Earth data with the ambitious roadmaps large data organisations lay out for the next ten years, the authors recommend the following actions to cloud service providers, data organisations, and funding bodies:

  • Tailor services to intermediate users: Community and specialised cloud services often target policy- and decision-makers as end-users, which results in over-engineered data services that try to hide any technical complexity. However, the real users of cloud-based services will not be policy- and decision-makers but researchers, data scientists or domain experts, who have the necessary knowledge and skills to gain insights and information from the increasing availability of Big Earth data. These insights are of great value for policy- and decision-makers to make evidence-based decisions.

  • Develop cloud certification standards: A lack of trust in terms of emerging costs and data security hinders a broad adoption of cloud-based services. A trustworthy cloud certification that assigns a quality level to cloud-based services could help to build up overall trust and to increase cloud uptake among users.

  • Invest in capacity-building: More substantial and more tailored capacity-building efforts are required to close the existing (cloud) skills gap. Training activities should follow the principles of reproducibility, collaboration, and open science with the help of interactive tools, such as computational notebooks. In addition, a modular approach with a combination of self-paced and online learning activities will help to accommodate different skill levels and learning preferences.

Ethics approval

The survey was accompanied by an ethics statement compliant with the EU’s 2016 General Data Protection Regulation (GDPR). The authors did not seek additional approval by an ethics committee, as no personalised data were collected and all questions were optional. The data are made available only in anonymised form. Survey participants were informed that the information might be used as anonymised collections of data in scientific publications or presentations.

Acknowledgements

The authors would like to thank all participants of the web-based survey for their time and interest.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The questionnaire and an anonymised version of the survey responses are available on Zenodo under the DOI 10.5281/zenodo.4075058.

References

  • Abele-Brehm, A. E., M. Gollwitzer, U. Steinberg, and F. D. Schönbrodt. 2019. “Attitudes Toward Open Science and Public Data Sharing: A Survey Among Members of the German Psychological Society.” Social Psychology 50 (4): 252–260.
  • Abernathey, R. 2018. Step-by-Step Guide to Building a Big Data Portal.
  • Barba, L., L. J. Barker, D. S. Blank, J. Brown, A. B. Downey, T. George, L. J. Heagy, et al. 2019. Teaching and Learning with Jupyter. https://jupyter4edu.github.io/jupyter-edu-book/.
  • Bauer, P., B. Stevens, and W. Hazeleger. 2021. “A Digital Twin of Earth for the Green Transition.” Nature Climate Change 11: 80–83.
  • Bodlos, A., L. Hönegger, L. Kaczmirek, V. Beckmann, V. Breton, G. Romier, J. van Wezel, et al. 2020. EOSC-Pillar D3.1 Summary report of the EOSC-Pillar National Initiatives Survey.
  • Camara, G., L. F. Assis, G. Ribeiro, K. R. Ferreira, E. Llapa, and L. Vinhas. 2016. “Big Earth Observation Data Analytics: Matching Requirements to System Architectures.” In: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data - BigSpatial ’16. Presented at the 5th ACM SIGSPATIAL International Workshop, Burlingame, California: ACM Press, 1–6.
  • Carter, S., J. Verbesselt, A. Dostalova, W. Wagner, M. Schramm, and M. Mohr. 2018. openEO D08: Preliminary User Requirements report.
  • Chasmer, L. E., R. A. Ryerson, and C. A. Coburn. 2021. “Educating the Next Generation of Remote Sensing Specialists: Skills and Industry Needs in a Changing World.” Canadian Journal of Remote Sensing (AHEAD-OF-PRINT): 1–16. https://doi.org/10.1080/07038992.2021.1925531.
  • Committee on Earth Observation Satellites. 2019. CEOS Analysis Ready Data Strategy - October 2019. Strategy paper No. 1.0.
  • Craglia, M., and S. Nativi. 2018. “Mind the Gap: Big Data vs. Interoperability and Reproducibility of Science.” In Earth Observation Open Science and Innovation, edited by P.-P. Mathieu, and C. Aubrecht, 121–141. Cham: Springer International Publishing.
  • ECMWF. 2018. Copernicus releases DIAS data access platforms on anniversary.
  • ECMWF. 2021. ECMWF Roadmap to 2030.
  • European Commission. 2011. Attitude on Data Protection and Electronic Identity in the European Union. Special Eurobarometer No. 359.
  • European Commission. 2019. Copernicus Market Report - February 2019.
  • European Commission. 2020a. A European strategy for data.
  • European Commission. 2020b. Shaping Europe’s digital future: Destination Earth (DestinE).
  • European Commission. 2021. EU Survey [online]. https://ec.europa.eu/eusurvey/ [Accessed 16 May 2021].
  • Giuliani, G., G. Camara, B. Killough, and S. Minchin. 2019. “Earth Observation Open Science: Enhancing Reproducible Science Using Data Cubes.” Data 4 (4): 147.
  • Gomes, V., G. Queiroz, and K. Ferreira. 2020. “An Overview of Platforms for Big Earth Observation Data Management and Analysis.” Remote Sensing 12 (8): 1253.
  • Gorelick, N., M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore. 2017. “Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone.” Remote Sensing of Environment 202: 18–27.
  • Huadong, G. 2018. “Steps to the Digital Silk Road.” Nature 554 (7690): 25–27.
  • Killough, B. 2018. Overview of the Open Data Cube Initiative. In: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. Presented at the IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia: IEEE, 8629–8632.
  • Kim, B., and G. Henke. 2021. “Easy-to-Use Cloud Computing for Teaching Data Science.” Journal of Statistics and Data Science Education 29 (sup1): S103–S111.
  • Kundra, V. 2011. Federal Cloud Computing Strategy.
  • NASA. 2019. EOSDIS Data in the Cloud: User requirements.
  • National Aeronautics and Space Administration. 2020. Earthdata Cloud Evolution.
  • Ogunlolu, I., and D. Rajanen. 2019. Cloud Computing Adoption in Organizations: A Literature Review and a Unifying Model 2019: 1–12.
  • Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling. 2011. “Climate Data Challenges in the 21st Century.” Science 331 (6018): 700–702.
  • Pappenberger, F., and M. Palkovic. 2020. Progress towards a European Weather Cloud, (Number 165).
  • Pearson, S., and G. Yee. 2013. Privacy and Security for Cloud Computing. London: Springer.
  • Pebesma, E., W. Wagner, M. Schramm, A. Von Beringe, C. Paulik, M. Neteler, J. Reiche, et al. 2017. OpenEO - a Common, Open Source Interface Between Earth Observation Data Infrastructures and Front-End Applications.
  • Perkel, J. M. 2018. “Why Jupyter is Data Scientists’ Computational Notebook of Choice.” Nature 563 (7729): 145–146.
  • Perkel, J. M. 2021. “Ten Computer Codes That Transformed Science.” Nature 589 (7842): 344–348.
  • Polyviou, A., and N. Pouloudi. 2015. Understanding Cloud Adoption Decisions in the Public Sector. In: 2015 48th Hawaii International Conference on System Sciences. Presented at the 2015 48th Hawaii International Conference on System Sciences (HICSS), HI, USA: IEEE, 2085–2094.
  • Ramachandran, R., C. Lynnes, K. Baynes, K. Murphy, J. Baker, J. Kinney, A. Gold, et al. 2018. “Recommendations to Improve Downloads of Large Earth Observation Data.” Data Science Journal 17: 2.
  • R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  • Scherp, G., D. Siegfried, K. Biesenbender, and C. Breuer. 2020. The Role of Open Science in Economics. Results from an Online Survey among Researchers in Economics at German Higher Education Institutions in 2019. Hamburg: Kiel.
  • Schramm, M., E. Pebesma, M. Milenković, L. Foresta, J. Dries, A. Jacob, W. Wagner, et al. 2021. “The openEO API–Harmonising the Use of Earth Observation Cloud Services Using Virtual Data Cube Functionalities.” Remote Sensing 13 (6): 1125.
  • Singh, A., and K. Chatterjee. 2017. “Cloud Security Issues and Challenges: A Survey.” Journal of Network and Computer Applications 79: 88–115.
  • Sudmanns, M., D. Tiede, S. Lang, H. Bergstedt, G. Trost, H. Augustin, A. Baraldi, and T. Blaschke. 2019. “Big Earth Data: Disruptive Changes in Earth Observation Data Management and Analysis?” International Journal of Digital Earth 13 (7): 832–850.
  • Voosen, P. 2020. “Europe Builds 'Digital Twin' of Earth to Hone Climate Forecasts.” Science 370 (6512): 16–17.
  • Wagemann, J., S. Siemen, B. Seeger, and J. Bendix. 2020. User Requirements of Big Earth Data - Survey 2019.
  • Wagemann, J., S. Siemen, B. Seeger, and J. Bendix. 2021. “Users of Open big Earth Data – An Analysis of the Current State.” Computers & Geosciences 104916.
  • Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. 2016. Cham: Springer International Publishing.
  • Yang, C., M. Yu, Y. Li, F. Hu, Y. Jiang, Q. Liu, D. Sha, M. Xu, and J. Gu. 2019. “Big Earth Data Analytics: a Survey.” Big Earth Data 3 (2): 83–107.
