Review Article

Big Earth data analytics: a survey

Pages 83-107 | Received 27 Feb 2019, Accepted 18 Apr 2019, Published online: 17 May 2019

ABSTRACT

Big Earth data are produced from satellite observations, Internet-of-Things, model simulations, and other sources. The data embed unprecedented insights and spatiotemporal stamps of relevant Earth phenomena for improving our understanding of, responding to, and addressing challenges of Earth sciences and applications. In the past years, new technologies (such as cloud computing, big data and artificial intelligence) have gained momentum in addressing the challenges of using big Earth data for scientific studies and geospatial applications that were historically intractable. This paper reviews big Earth data analytics from several aspects to capture the latest advancements in this fast-growing domain. We first introduce the concepts of big Earth data. The architecture, various functionalities, and supporting modules are then reviewed from a generic methodology aspect. Analytical methods supporting the functionalities are surveyed and analyzed in the context of different tools. The driving questions are exemplified through cutting-edge Earth science research and applications. A list of challenges and opportunities is proposed for different stakeholders to collaboratively advance big Earth data analytics in the near future.

1. Introduction

The 21st century has witnessed the increasing availability of information about the Earth surface, atmosphere, ocean, solid earth and beyond through cutting-edge technologies, including new satellite observations, in-situ sensors, Internet of Things, and social sensing contributed by human beings (NASA, Citation2017a). The massive amount of spatiotemporal data obtained can be used to understand our Earth system as a whole for addressing scientific challenges such as climate change and global warming (Faghmous & Kumar, Citation2014), the increasing intensity and frequency of disasters and better preparedness for dealing with them (Yu, Huang, Qin, Scheele, & Yang, Citation2019; Yu, Yang, & Li, Citation2018), and ecological change and reduction of its impacts (e.g. invasive species, Jeltsch et al., Citation2013). The data have also been used for dealing with application challenges such as urban and land planning for sustainable development (Yu, Liu, Wu, Hu, & Zhang, Citation2010), heat islands and transportation congestion in urban areas (Jenerette et al., Citation2016), and (food) security concerns and preparedness for vulnerable populations (Xu et al., Citation2017). To address these challenges, a lifecycle of data processing from collection to visualization is being developed; a workforce is being trained to handle the data efficiently; and relevant policies are being put in place to facilitate the opening, sharing, and transformation of data into actionable knowledge for better decision support (NASA, Citation2017a). Big data was introduced to capture the characteristics, processing, workforce, policies, challenges, and physical-social environment of handling a large amount of data about our home planet. Characterized by the five Vs (https://bigdatawg.nist.gov/): volume, velocity, variety, veracity and value, big data intrinsically reflects the nature of Earth data.

Various research and development efforts have been made to tackle the challenges brought by storing, transmitting, processing, analyzing, managing, and sharing big Earth data. As one of the most prominent systems, for example, Google Earth Engine is a cloud-based platform for planetary-scale geospatial analysis supported by Google’s cloud infrastructure (Gorelick et al., Citation2017). ArcGIS GeoAnalytics Server was released in 2016 to allow users to perform feature analysis using distributed computing (Wright et al., Citation2016). The Science Data Analytics Platform (SDAP), originally funded by National Aeronautics and Space Administration (NASA), was recently launched as an Apache Incubator project, the goal of which is to accelerate the study of the Earth’s physical oceanography (Apache, Citation2017). On top of SDAP, many applications have been developed including the State of the Ocean (SOTO) (PO.DAAC, Citation2017) and the Sea Level Portal (NASA, Citation2017b). The planetary defense framework gateway is developed to support the decision-making on mitigating Near-Earth-Object’s impact on the Earth (Yang et al., Citation2017b). The Chinese Academy of Science also funded a project to integrate Earth science data for addressing Earth science challenges and grand applications (Guo, Citation2019). In the public health domain, a big Earth data analytics framework is proposed to make better-informed health-related decisions by integrating big Earth data and health data (Raghupathi & Raghupathi, Citation2014). Focused on social media data, the Harvard Center for Geographic Analysis (CGA) developed a big spatiotemporal data visualization platform, the Billion Object Platform, with the purpose of lowering barriers for scholars who wish to access large, streaming, spatiotemporal data about the Earth surface (Kakkar, Lewis, Smiley, & Nunez, Citation2017).

With the sizes, locations, algorithms, and designs of data varying across domains, questions have been raised about how Earth science and applications are and will be impacted. Specifically, how do big Earth data and its analytics advance our understanding of the Earth system? How can we utilize our understanding of the Earth system (physics of natural phenomena) to coordinate observations and models to target specific domains (such as climate, natural resource, environment, agriculture, security) for timely data? How can we adopt machine learning and/or artificial intelligence to analyze the data for Earth science and applications? How can scientists assess how well their models perform, leverage the data and analytics, and derive quality models from them for better prediction of the future state of complex systems? How can we develop and sustain social engineering to facilitate stakeholders to collaborate on big Earth data analytics for real science? To address these challenges, initiatives such as Earthdata cloud 2021 (Ramachandran, Lynnes, Bingham, & Quam, Citation2018) and the pre-guide key national project of Earth science big data (Guo, Citation2019) were established to leverage the advancements in big data, computing infrastructure, and analytical tools from both commercial and academic domains. Key to these efforts are the strategies to analyze the big Earth data to facilitate answering those scientific and application questions.

To better capture the recent advancements of big Earth data analytics and illuminate future research directions, we conducted this survey of the landscape of big Earth data analytics.

2. Architecture

Each Earth data system has its own architecture and system approach. A generic multidimensional architecture supporting the lifecycle of transforming data to knowledge is illustrated in Figure 1. This generic architecture contains the technological aspect on the front face, science and application domains on the top face (detailed in Section 4), and stakeholder participation on the right face (detailed in Section 5). The technological aspect (detailed in Section 3) includes infrastructural supports for processing, data stores for archive and access, data analytical methods for information extraction and knowledge generation, and interfaces for user interaction.

Figure 1. A generic system architecture of big Earth data analytics.


2.1. Infrastructural support

Most big Earth data analytical systems have already been or are being migrated to a cloud computing environment for rapid prototyping, result sharing, and reproducible research (Peng, Citation2011). Some choose the private cloud as it allows for full control (Doelitzscher, Sulistio, Reich, Kuijs, & Wolf, Citation2011), but most adopt the public cloud, where a third-party cloud provider performs the updates and maintenance of computing resources (Varia & Mathew, Citation2014). For example, Mapbox uses Landsat on Amazon Web Services to power Landsat-live, a browser-based map that is constantly refreshed with the latest imagery from the Landsat 8 satellite (Yang, Yu, Hu, Jiang, & Li, Citation2017a). An emerging trend is to use a hybrid cloud, a combination of these two paradigms that inherits the advantages of both: sensitive data and systems are kept in a private cloud while public-facing services are provided through the public cloud (Jin et al., Citation2017).

Cloud computing can support sustainable archive, access to different computing node types, virtual desktops, and collaboration on data analytics. But for large-scale, tightly coupled big data analytics or modeling, high-performance computing is still the solution for modeling, colocation of computing and data, data assimilation and inverse problems (Huang et al., Citation2013). For example, NASA has been planning to scale up to support 1.6 Exabytes of climate data with a 0.75 km resolution and global coverage (Lee, Citation2018). This means integrating datasets from the global Goddard Earth Observing System Model (GEOS), Global Modeling and Assimilation Office (GMAO), and other sources with sufficient computing and storage capacity to a) provide data/analytical/knowledge services, b) support artificial intelligence/machine learning/deep learning for inference, and c) engage PB-level data to support comprehensive analytics and data fusion.

Graphics processing unit (GPU) computing has boosted the simulation and analytics of Earth and space phenomena, demonstrating significant speedups over conventional central processing units (CPUs) (Madhukar, Citation2019). For example, the calculation of aerosol optical depth from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data using GPU can be 43 times faster than the one using CPUs (Liu et al., Citation2016). Numerical simulation can also be accelerated using GPU computing. The large-scale simulation of seismic wave propagation on GPU was 45-fold faster than on CPU whilst maintaining accuracy (Okamoto, Takenaka, Nakamura, & Aoki, Citation2013).

Recent computing advancements also distribute some computing tasks to the edge of the infrastructure, for example, the smart things at the edge of the Internet of Things, and the mobile devices of the mobile Internet. These approaches, termed edge computing and mobile computing, conduct early processing or preprocessing of data collected at the sensor side, provide end visualization, and facilitate user interaction.

While the computing infrastructure powers big data analytics, network and security infrastructure as well as monitoring, scheduling, managing, and integration infrastructure enables the computing and analytics to be operated in a smooth, dynamic, safe, and easy-to-use fashion.

2.2. Data sources, ingestion, and store

Another important module of the system architecture is the data store, which is responsible for archiving Earth data and providing access to the archived data. Traditionally, Earth science data can be categorized into atmosphere, ocean, land, hydrology, and socio-economic data according to their disciplines (Acker & Leptoukh, Citation2007). New data sources in the Big Data era have expanded to real-time location tracking, observations of the urban environment, and social media data from citizens (Mayer-Schönberger & Cukier, Citation2013).

Depending on the nature and usage of Earth data, they are traditionally stored in a file system, relational, or No-SQL database. For example, real-time location tracking data are usually stored in a Relational Database Management System (RDBMS) (Tian, Jiang, Chen, Li, & Mu, Citation2014). Several efforts have been made to store geospatial coverages when structured as arrays with an array-based database as the coverages are not well suited to traditional RDBMSs (Baumann, Citation2014). Indices, such as the latest spatiotemporal index (Li et al., Citation2017a), are built on top of the data to accelerate the data access. Some have attempted publishing data in the form of “Linked Data” with the linking technologies provided by the Web to enable the virtual integration of a globally distributed database (Goodwin, Dolbear, & Hart, Citation2008).
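To make the idea of an index built on top of the data concrete, the following is a minimal sketch of a grid-and-time-bin index. It is not the spatiotemporal index of Li et al. (Citation2017a); the class name `GridTimeIndex`, the cell size, and the time-bin width are hypothetical choices made for illustration only.

```python
from collections import defaultdict


class GridTimeIndex:
    """Toy spatiotemporal index: buckets records by (lon cell, lat cell, time bin).

    Illustrative only; cell size and time-bin width are assumed parameters.
    """

    def __init__(self, cell_deg=1.0, time_bin_s=3600):
        self.cell_deg = cell_deg
        self.time_bin_s = time_bin_s
        self.buckets = defaultdict(list)

    def _key(self, lon, lat, t):
        # Floor division maps each coordinate to its bucket number.
        return (int(lon // self.cell_deg),
                int(lat // self.cell_deg),
                int(t // self.time_bin_s))

    def insert(self, lon, lat, t, record):
        self.buckets[self._key(lon, lat, t)].append((lon, lat, t, record))

    def query(self, lon_min, lon_max, lat_min, lat_max, t_min, t_max):
        """Return records inside the box; only overlapping buckets are scanned."""
        kx0, ky0, kt0 = self._key(lon_min, lat_min, t_min)
        kx1, ky1, kt1 = self._key(lon_max, lat_max, t_max)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for kt in range(kt0, kt1 + 1):
                    for lon, lat, t, rec in self.buckets.get((kx, ky, kt), []):
                        # Exact filter, since buckets may extend past the box.
                        if (lon_min <= lon <= lon_max
                                and lat_min <= lat <= lat_max
                                and t_min <= t <= t_max):
                            hits.append(rec)
        return hits
```

A query touches only the buckets that overlap the requested box rather than every record, which is the source of the speedup; production systems typically use more elaborate structures than a fixed grid.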

Data are generally brought into the data store by two methods: real-time import and batch import. Real-time import requires continual input, constant processing, and steady output of data, while batch import is less time-sensitive (Marz & Warren, Citation2015). A good example is an Earth data search engine where a Web crawler constantly collects data over the Internet and stores the data in a pre-defined location (Bambacus et al., Citation2017). The data store supports access to different formats through interoperable APIs (Application Programming Interfaces) and conducts preprocessing including re-projection, upscaling/downscaling, and data fusion.

2.3. Data discovery and analytics

As a prior step to performing any data analytical tasks, traditional data discovery relies on open source technologies such as Solr and Elasticsearch (Nogueras-Iso, Zarazaga-Soria, Béjar, Álvarez, & Muro-Medrano, Citation2005). Metadata of these data are often stored in a full-text search engine (e.g. Apache Lucene) (Jiang, Yang, Xia, & Liu, Citation2016), which can be searched much like the Google search engine. Recent endeavors have started to integrate smart capabilities, e.g. query understanding, ranking, and recommendation, based on artificial intelligence advancements (Jiang et al., Citation2018, Citation2017; Li, Goodchild, & Raskin, Citation2014; Wiegand & García, Citation2007). Common Earth data analytical functions range in complexity from simple numerical functions to raster and vector operations, visualization and exploration, and machine learning. More details of analytical functions are reviewed in Section 3.

Distributed computing technologies are widely adopted across different existing systems (Agrawal, Das, & El Abbadi, Citation2011) for big data analytics. Apache Spark and Hadoop MapReduce are two typical open source distributed solutions for big data analytics. The former is usually much faster as the latter reads and writes from disk more often (Zaharia et al., Citation2012). For example, Li et al. (Citation2016) proposed a workflow to accelerate the Weblog mining process using Spark.
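The map-then-reduce pattern that both frameworks build on can be sketched in plain Python. This does not use the Spark or Hadoop APIs, and it is not the workflow of Li et al. (Citation2016); the log format ("timestamp dataset_id") and the function names are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count requests per dataset within one partition of log lines.

    Assumes each non-empty line looks like "<timestamp> <dataset_id>".
    """
    return Counter(line.split()[1] for line in lines if line.strip())


def merge_counts(a, b):
    """Reduce step: merge two partial counts into one."""
    return a + b


def count_requests(partitions):
    # On a cluster, each partition would be mapped on a different worker;
    # here the partitions are processed one after another.
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())
```

Because the reduce step only needs the small partial counts, the partitions can be processed independently. Keeping such intermediate results in memory instead of writing them to disk between stages is the behavior the paragraph above credits for Spark's speed advantage.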

2.4. Interfaces

The top layer is the interface of the big Earth data analytical systems to support various types of end users: scientists, students, engineers, and decision makers (Nativi et al., Citation2011). Considering their different backgrounds, multiple interfaces (i.e. client libraries, code editor, Web portal) have been designed to allow various users to interact with a variety of system functions such as search, access, analytics, and visualization. For example, Google Earth Engine’s services can be accessed and controlled through an Internet-accessible API and an associated web-based interactive development environment that enables rapid prototyping and visualization of results (Gorelick et al., Citation2017). ArcGIS API for Python, in contrast to arcpy, serves as an analytical library for working with geospatial data and analysis algorithms, powered by web GIS. A new form of service is to provide the climate modeling process as a computational service (Li et al., Citation2017b). Web-based programmable interaction is also becoming a norm (Li et al., Citation2016).

To present a comprehensive survey, the data analytics is detailed in Section 3. The science and application aspect is detailed in Section 4. Section 5 describes the stakeholders and their engagement in the challenges and opportunities of advancing big Earth data analytics.

3. Big Earth data analytics

Big Earth data analytics include the analytical lifecycle of preparing, reducing, analyzing, mining, and visualizing large amounts of spatiotemporal and spectral data, encompassing a variety of data types (Kempler & Mathews, Citation2017). The volume, velocity, variety, and veracity of the acquired data pose grand challenges in processing the data for value (Yang et al., Citation2017a). The analytical process enables the discovery of patterns, correlations, principles, knowledge and other information for better understanding our Earth system and responding to problems induced by global and regional changes (Bhattacharyya & Ivanova, Citation2017). The following sections summarize the literature from different aspects of big Earth data analytics.

3.1. Data preprocessing

Among the massive volume of rapidly streaming big Earth data, relevant data are examined and cleaned by preprocessing the raw data, which may be redundant, inconsistent, and noisy (Bhattacharyya & Ivanova, Citation2017). As an important early phase, preprocessing consumes around 50–80% of the entire time for data analytics (Kempler & Mathews, Citation2017). Data preprocessing mainly focuses on extraction, transformation, quality evaluation, reduction, and augmentation (Theodorou, Jovanovic, Abelló, & Nakuçi, Citation2017; Wang, Hu, Sha, & Han, Citation2017). Table 1 summarizes the reviewed preprocessing methods.

Table 1. Different forms of data preprocessing.

The extraction techniques (e.g. filtering, anomaly detection, ratios, and outlier removal) are applied to identify the data of interest. For example, the noise points should be removed from LiDAR data to improve data quality (Wang et al., Citation2017). Classification and rule learning can be used to identify and aggregate the datasets of interest from various large volumes of datasets (e.g. raster, vector, or social media) for the environment- and human dynamics-related studies (Hu et al., Citation2015; Zhang et al., Citation2017).
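As an illustration of outlier removal, the sketch below filters a list of elevation values with the modified z-score, which is based on the median absolute deviation (MAD). This is one common robust technique, not necessarily the method used in the cited LiDAR work; the function name and the threshold of 3.5 are assumptions.

```python
import statistics


def remove_outliers(values, threshold=3.5):
    """Drop values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |v - median| / MAD, which is less
    sensitive to the outliers themselves than a mean/std-based z-score.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the values equal the median; no basis for filtering.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]
```

For example, applied to `[10.1, 10.3, 9.9, 10.0, 10.2, 55.0]`, the spurious return at 55.0 is discarded while the five ground-level values are kept.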

Data transformation methods (e.g. normalization, smoothing, interpolation, coordinate transformation, time series) are necessary to prepare data for further data analytics in terms of spatiotemporal resolution, data format, coordinate projection, and others (Friedman, Hastie, & Tibshirani, Citation2008; Jenerette et al., Citation2016; Li, Kamarianakis, Ouyang, Turner, & Brazel, Citation2017; Sugumaran, Burnett, & Blinkmann, Citation2012).
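Two of the listed transformations, normalization and interpolation, can be sketched as follows. The function names are hypothetical, and the example assumes a regularly sampled one-dimensional series in which missing observations are marked with `None`.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fill_gaps_linear(series):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None, since filling them would be
    extrapolation rather than interpolation.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled
```

For instance, `fill_gaps_linear([1.0, None, None, 4.0, None])` fills the two interior gaps with 2.0 and 3.0 and leaves the trailing gap untouched.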

Quality evaluation will help inform the accuracy of the analytical results either by improving the data quality (e.g. bias correction) or choosing better data resources (e.g. sensitivity analysis and Taylor Diagram) (Chun & Guldmann, Citation2014; Taylor, Citation2001).

Data reduction seeks to reduce data redundancy and duplication while optimizing data storage. The reduction in big data at the early stage enhances the data management and data quality. Therefore, it improves the indexing, storage, analysis, and visualization operations of big data systems (Rehman et al., Citation2016). The methods for data reduction can be categorized into network theory, redundancy elimination, dimension reduction, and data mining. Methods chosen for reduction depend on the objective of specific Earth data analytics. For example, for the climate dynamics study, Donges, Zou, Marwan, and Kurths (Citation2009) constructed climate networks from the global climatological data set using the linear Pearson correlation coefficient and the nonlinear mutual information as a measure of dynamical similarity between regions. Stateczny and Wlodarczyk-Sielicka (Citation2014) utilized artificial neural networks to reduce big hydrographic data acquired from the deep seas. Lidar data can be processed by using clustering, noise detection, vertex decimation approaches, and feature selections to reduce the data size (Rehman et al., Citation2016).
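The network-based reduction described above can be illustrated with a small sketch that links two regions whenever the absolute Pearson correlation of their time series reaches a threshold. It covers only the linear (Pearson) measure, not the mutual-information measure that Donges et al. also used; the threshold of 0.8 and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.

    Assumes neither series is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(series_by_node, threshold=0.8):
    """Return the set of node pairs whose |correlation| is at least `threshold`."""
    nodes = sorted(series_by_node)
    edges = set()
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            r = pearson(series_by_node[a], series_by_node[b])
            if abs(r) >= threshold:
                edges.add((a, b))
    return edges
```

The full time series are thereby reduced to a sparse set of edges, which can then be analyzed with standard network measures.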

Data augmentation is a common technique used in data mining. It refers to the creation of altered copies of each instance within a training dataset as there is usually a lack of data to train a model, especially to fine-tune the huge number of parameters in a deep neural network. In the context of satellite imagery processing, data sub-samples can be obtained by rotating the original image, changing lighting conditions, cropping differently, etc.
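A minimal sketch of geometric augmentation is given below, representing an image tile as a list of rows. Only rotation and flipping are shown; lighting changes and cropping are omitted, and the function names are hypothetical.

```python
def rotate90(tile):
    """Rotate a 2-D tile (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]


def flip_horizontal(tile):
    """Mirror a 2-D tile left to right."""
    return [row[::-1] for row in tile]


def augment(tile):
    """Return the original tile, its three rotations, and its horizontal flip."""
    out = [tile]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    out.append(flip_horizontal(tile))
    return out
```

Each labeled tile thus yields five training samples that share the same label, which is reasonable for overhead imagery where scene orientation is usually arbitrary.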

3.2. Data analytical methods

After preprocessing, the main focus of data analytics is to reveal hidden patterns, unknown correlations, and other useful information from a large volume of heterogeneous data to facilitate Earth science study. Big Earth data analytics support all aspects of Earth science research, such as hypothesis and data discovery-driven methods, dynamical models, and goal-driven decisions (Kempler & Mathews, Citation2017). The involved methods can be categorized into model simulation and prediction, statistics, machine learning, and deep learning (Table 2).

Table 2. Big Earth data analytical methods.

3.2.1. Model simulation and prediction

Numerical models have long been used to simulate and predict the status of the Earth including the solid Earth, land surface, biosphere, atmosphere, and oceans within a certain period of time. With the increasing availability of Earth observation and sensing data, the predicting capabilities of the numerical models have been improved through assimilating observation with simulation, which significantly improved the simulation accuracy (Courtier et al., Citation1994). For example, air quality forecasting models predict the level of pollutant concentration in the atmosphere that is harmful to human health (Binkowski & Roselle, Citation2003). The general circulation models predict the global circulation of a planetary atmosphere and ocean to better understand the climate and project climate change. Extreme weather events can also be predicted using numerical simulation, such as tropical cyclone and wildfire (Shuman, Citation1989). Beyond physical processes, social processes occurring on the Earth can also be simulated and predicted using agent-based modeling or cellular automata leveraging big data collected through IoTs or social media. For example, transportation problems, such as travel demand and traffic congestion, can be predicted and resolved through agent-based modeling (Bernhardt, Citation2007).
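The principle of assimilating an observation into a simulation can be shown with the simplest possible case: a single scalar state, combined as a variance-weighted average of the model forecast and the observation. This is a textbook scalar update, not the variational scheme of Courtier et al. (Citation1994); the function and parameter names are assumptions.

```python
def assimilate(background, obs, var_background, var_obs):
    """Combine a model forecast with an observation of the same quantity.

    background, obs: forecast value and observed value.
    var_background, var_obs: their error variances (assumed known and
    uncorrelated). Returns the analysis value and its error variance.
    """
    # The gain weights the observation more when the forecast is less certain.
    gain = var_background / (var_background + var_obs)
    analysis = background + gain * (obs - background)
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis
```

With a forecast of 20.0 and an observation of 22.0 of equal error variance 4.0, the analysis is 21.0 with variance 2.0: the combined estimate is more certain than either input, which is how assimilation improves simulation accuracy.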

3.2.2. Traditional statistical methods

Traditional statistical methods, usually based on certain assumptions, are widely used to discover the loose and complex relationships between variables and improve our understanding of the geographical distribution and frequency distribution of big Earth data (Borradaile, Citation2013). They use quantitative methods to sample geospatial datasets, handle orientation data and regionalized variables, study multivariate systems, and identify cyclicity and patterns (Cressie, Citation2015). Simple statistics (e.g. mean and variance) can characterize Earth data. The confidence interval quantifies the uncertainty of the estimation/analytical results. Hypothesis testing provides ways to compare the distributions of two sample datasets. Various correlation, interpolation, and extrapolation methods enable prediction from known values (Kalkhan, Citation2011). Regression methods (e.g. logistic regression, linear/non-linear, hierarchy regression) detect the underlying relationships in the Earth systems (Zhou et al., Citation2017). Clustering methods (e.g. DBSCAN, Density-based Clustering, WaveCluster) group similar spatial objects into classes, benefitting the identification of areas of similar land use in Earth observation data (Han et al., Citation2001). Classification methods (e.g. regression tree, autocorrelation, spectral schemes) assign pixel values from Earth observation to a particular category (e.g. land cover) (Getis, Citation1999). Statistical model reduction methods (e.g. Bayesian, approximation theory, error estimation, stochastic modeling, and Monte Carlo methods) reduce the computational complexity of mathematical models in numerical simulations (Lieberman et al., Citation2010). However, to apply statistics to Earth data effectively, statisticians need to understand how the data are collected, the statistical properties of the estimator (e.g. p-value), and the underlying data distribution (Alpaydin, Citation2014).

3.2.3. Machine learning methods

Evolving from artificial intelligence, machine learning methods develop models that are based on characteristics and features learned from empirical data and can infer unknown problems and discover unknown patterns (Sellars et al., Citation2013). Machine learning methods generally have the advantage over traditional statistical methods in non-linear relationship understanding, and this advantage can be leveraged to model high-dimensional and non-linear data with complex interactions and missing values, which is particularly the case for big Earth data (Thessen, Citation2016). Derived from statistical methods, regression, classification, clustering can also be used as machine learning methods, thus the exact division between machine learning and statistical methods is not always clear. For example, Artificial Neural Networks can produce regression on approximating and predicting ecological conditions (Franceschini et al., Citation2019). Machine learning classifiers including Random Forest, Support Vector Machines, and Bayesian Classifiers can produce the probability of an observation belonging to a specific class of Earth process, such as landslide (Hong et al., Citation2016). Clustering can group observations based on similarity, which is useful in detecting rare events such as fire (Chakraborty & Paul, Citation2010; Khatami et al., Citation2017). Fuzzy inference and some tree-based machine learning methods (e.g. Decision Tree) can extract a set of rules from the observation to make predictions, such as forest cover and change (Sexton et al., 2016).

3.2.4. Deep learning methods

Deep learning methods, evolving from machine learning, offer unique capabilities in extracting and presenting features at different and detailed levels from the Earth data (Manning, Citation2015; LeCun, Bengio, & Hinton, Citation2015). These features and characteristics are extremely important in Earth data classification and segmentation tasks. Due to its more powerful expression and parameter optimization capability, deep learning has achieved great performance in computer vision, natural language processing, recommendation systems, and others (Collobert & Weston, Citation2008; Krizhevsky et al., Citation2012; Schmidhuber, Citation2015). For example, deep convolutional neural networks (CNNs), e.g. AlexNet (Krizhevsky et al., Citation2012), VGGNet (Chatfield et al., Citation2014), and PlacesNet (Zhou et al., Citation2014), can achieve satisfactory results in classifying scenes from high resolution remote sensing imagery into categories such as airport, bridge, desert, forest, and so on. Beyond image classification, objects can be detected and segmented from Earth datasets using deep learning techniques (Cimpoi et al., Citation2015; Girshick et al., Citation2014). Deep learning methods can also help increase the computational efficiency of numerical simulations (e.g. weather prediction) whilst maintaining reasonable accuracy (Wang et al., Citation2018).

We selected popular tools to analyze how they support different big Earth data analytics and compared them (Table 3) in terms of scalability, analytical methods, programming languages, and graphical user interface (GUI).

Table 3. Selected big Earth data analytical tools.

4. Driving sciences and applications

Big Earth data analytics are critical to answering many scientific and application questions. These questions assist us in better understanding and determining the future directions of geosciences. The analytical functions (described in Section 3) provide the capability of extracting knowledge from the raw Earth data, and the knowledge can be used for decision-making and problem-solving. Based on the literature review, we summarized the domains that can directly benefit from big Earth data analytics and the best practices conducted (Table 4). The questions relate to global warming & climate change, natural resources & environment, precision agriculture & land evaluation, hazards & risk, security & defense, and public interests.

Table 4. Domains supported by big Earth data analytics and exemplary methods and best practices.

4.1. Global warming & climate change

Fundamentally, climate science is a field focused on studying large-scale changes in the land, atmosphere, oceans, and cryosphere over long temporal periods (years, decades, centuries) (Faghmous & Kumar, Citation2014). A variety of Earth science data is needed to investigate the response of those features to the climate changes (Guo et al., Citation2015). With the development of Earth observation technology, a large amount of scientific big data have been generated through various aspects of Earth observation, geophysics, geochemistry, geological surveying, and ground sensor networks (Guo et al., Citation2017).

Multiple Earth observation platforms can provide comprehensive, precise, continuous, and varied information to simultaneously and dynamically monitor the Earth surface. The information is obtained from multiscale assimilations (De Lannoy et al., Citation2012), ground-based observation (Dorigo et al., Citation2015), and satellite observation (Liang et al., Citation2013). The multi-source technology offers higher precision and spatiotemporal stability, and extends the data dimension to monitor the dynamics of the Earth surface (Asner et al., Citation2012), including drought (Zhang & Jia, Citation2013), water vapor (Liu et al., Citation2013), land surface temperature (Maimaitiyiming et al., Citation2014), and vegetation (Guay et al., Citation2014). These data can be used to analyze climate change factors and their spatiotemporal patterns (Faghmous & Kumar, Citation2014). Climate science focuses more on understanding natural phenomena systematically than on predicting a certain event, and thus emphasizes the explainability of the data analytical methods (Caldwell et al., Citation2014). Therefore, multi-source, high-dimensional climate datasets are critically needed to explain the spatiotemporal patterns and significant correlations between climate-inducing factors.

4.2. Natural resources & environment

Natural resources have been over-exploited by human kind, causing loss and degradation of habitats and depleting biological diversity (Smil, Citation2013). Human beings, especially the marginalized and vulnerable communities, need to adapt to the rapidly changing environment and its corresponding adverse circumstance, leading to the attention of natural resource conservation and sustainable use of biological diversity (Collen et al., Citation2013). The capability to monitor the impact of biological diversity and global environmental change is crucial to designing effective adaptation and mitigation strategies to prevent further loss of natural resources (Pettorelli et al., Citation2014). This requires the scientific community to obtain datasets and assess the spatiotemporal changes in the distribution of atmospheric, ocean, and land surface conditions, and the distribution and function of the natural resource. Big Earth data are the source for mapping the distribution of natural resources, especially over large areas, including forest cover change (Hansen et al., Citation2013), vegetation cover (Karnieli et al., Citation2013), and biodiversity dynamics (Jeltsch et al., Citation2013; Kuenzer et al., Citation2014).

Monitoring and assessing environmental pollution in the long term requires big Earth data. Satellite observations, for example, have been used to analyze European nighttime lights over 15 years, showing complex patterns of light pollution (Bennie et al., Citation2014), and provide insight into global long-term changes in air, water, and soil pollution (Fingas & Brown, Citation2014; Lehmann et al., Citation2015; Lin et al., Citation2015; Schmidt et al., Citation2015; Van Donkelaar et al., Citation2015).

4.3. Precision agriculture & land evaluation

Precision agriculture, defined as “a management strategy that uses information technology to bring data from multiple sources to bear on decisions associated with crop production” (National Research Council [NRC], Citation1997), requires data at different spectral, spatial, and temporal resolutions to gather information on crop area, type, condition, calendar, and yield (Whitcraft et al., Citation2015). Crop information with high spatiotemporal resolution is required for in-field monitoring (Kross et al., Citation2015). Unmanned aerial vehicle (UAV)-based remote sensing offers a fast and easy way to acquire field data for precision agriculture applications (Candiago et al., Citation2015). UAV platforms (multi-rotors, swinglet, model helicopters, etc.), coupled with imaging, ranging, and positioning sensors, are able to collect multispectral imagery at cm-level resolution and offer great possibilities in the precision farming domain (Bendig et al., Citation2012; Guo et al., Citation2012; Primicerio et al., Citation2012).
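
To make the use of multispectral imagery concrete, the sketch below computes the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), for a small grid of reflectance values. NDVI is chosen here only as a widely used example of a vegetation index; this section does not prescribe a specific index, and the band values and function names are hypothetical.

```python
def ndvi(nir, red):
    """NDVI for one pixel; returns None where the index is undefined."""
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel by pixel to two co-registered bands (lists of rows)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical surface reflectance values in [0, 1], for illustration only.
nir_band = [[0.50, 0.45], [0.30, 0.0]]
red_band = [[0.10, 0.15], [0.30, 0.0]]

grid = ndvi_grid(nir_band, red_band)
for row in grid:
    print(row)
```

Higher values generally indicate denser or healthier vegetation, so a per-field map of such an index is one way imagery can feed decisions on crop condition.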

Land evaluation, defined as “the assessment of land performance considering the economics of the enterprises, the social consequences for the people of the area and the country concerned, and the consequences, beneficial or adverse, for the environment” (George, Citation2005), serves as an essential part of land use planning. EO has enabled spatially and temporally continuous land evaluation with global coverage by providing vegetation productivity and/or loss as proxies (de Jong, de Bruin, Schaepman, & Dent, Citation2011). With big Earth data increasingly available on public cloud platforms such as Google Earth Engine, land evaluation, such as agricultural suitability assessment, can be conducted globally online (Yalew, Van Griensven, & van der Zaag, Citation2016).

4.4. Hazards & risk

Satellite remote sensing technology provides a quantitative means of pre-disaster detection and post-disaster damage assessment to assist response operations (Plank, Citation2014; Skakun et al., Citation2014; Yamazaki & Liu, Citation2016). Visual damage assessment can be used to confirm damaged areas qualitatively (Mas et al., Citation2015). Change detection using time series satellite imagery is widely used for post-disaster damage assessment (Pradhan et al., Citation2016; Skakun et al., Citation2014). Recent access to a wide range of software, very high-resolution satellite imagery, and active and passive sensors facilitates the collection of data and the analysis and mapping of disaster events within a few hours. The application of remote sensing in disaster management follows the trend towards higher-resolution (Ehrlich, Kemper, Blaes, & Soille, Citation2013), multidimensional (Kostyuchenko, Citation2015), and multi-technique (Chini et al., Citation2013; Pradhan et al., Citation2016) analysis. UAV networks serve as the most efficient means of situational awareness, with higher resolution and faster capture and processing times (Ezequiel et al., Citation2014; Kruijff et al., Citation2012; Murphy & Stover, Citation2008; Robinson & Lauf, Citation2013).
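
A very simple form of change detection is image differencing: subtract a pre-event image from a post-event image and flag pixels whose absolute difference exceeds a threshold. The sketch below illustrates that idea under stated assumptions; the pixel values, the threshold, and the function names are hypothetical, and the cited studies rely on considerably more elaborate methods and sensor-specific preprocessing.

```python
def change_mask(before, after, threshold):
    """Flag pixels whose absolute difference between two dates exceeds threshold."""
    if len(before) != len(after):
        raise ValueError("images must have the same number of rows")
    mask = []
    for before_row, after_row in zip(before, after):
        if len(before_row) != len(after_row):
            raise ValueError("rows must have the same number of columns")
        mask.append([abs(a - b) > threshold for a, b in zip(after_row, before_row)])
    return mask


def changed_fraction(mask):
    """Share of flagged pixels, a crude summary of the affected area."""
    total = sum(len(row) for row in mask)
    flagged = sum(1 for row in mask for value in row if value)
    return flagged / total if total else 0.0


# Hypothetical co-registered index values before and after an event.
before = [[0.62, 0.60, 0.58], [0.61, 0.59, 0.57]]
after = [[0.60, 0.21, 0.19], [0.62, 0.58, 0.18]]

mask = change_mask(before, after, threshold=0.2)
print(mask)
print(changed_fraction(mask))
```

In practice the two images would first need radiometric and geometric alignment, and the threshold would be calibrated against reference damage data rather than fixed by hand.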

Passive crowdsourcing via social media has emerged as a tool to communicate information in times of emergency (Kaplan & Haenlein, Citation2010). Social media is increasingly used by both NGOs and government emergency management agencies to determine public sentiment and reaction to an event (Kavanaugh et al., Citation2012). It is evident that the multidirectional flows of communication and information facilitated by online crisis crowdsourcing platforms can make response and recovery efforts more efficient and effective (Roche et al., Citation2013).

4.5. Security & defense

Big Earth data have traditionally been used for security and defense intelligence applications, through crisis management analysis for enhanced surveillance and proactive decision-making. Surveillance satellites, including IKONOS and GeoEye-1, provide high-resolution imagery for crisis monitoring, change detection, and critical location identification (NRC, Citation2013). Unmanned Aerial Systems (UAS) have demonstrated their value for emergency management by providing real-time, data-rich descriptions of the location and movement of a crisis or accident (McMullen et al., Citation2016). Interweaving remote sensing with social sensing constitutes a key advancement in the space and security domain, where useful information can be derived not only from EO products but also from their combination with news articles and user-generated content from social media (Xu et al., Citation2017). Moreover, surveillance videos provide the opportunity to discover valuable information and predict crime from high-velocity datasets. Intelligent systems that facilitate efficient management of surveillance videos have been developed (Xu et al., Citation2016). Distributed storage and computing, together with the retrieval of huge and heterogeneous data, provide multiple optimized strategies to enhance resource utilization and task efficiency.

4.6. Public interests

Big Earth data are closely connected with government administration and people’s daily lives. For example, NASA has set up a dedicated platform for civil Earth observations, the Earth Observing System Data and Information System (EOSDIS), which aims to make federal civil Earth observing data more discoverable, usable, and accessible (NASA, Citation2017a). Small satellites are increasingly developed to monitor specific regions at high spatiotemporal resolution for the purposes of daily life support, education, and news and media communication (Helvajian & Janson, Citation2009).

5. Challenges and opportunities

Through this holistic review of big Earth data analytics, we find broad and deep research on using big Earth data analytics to drive the understanding and use of the Earth system towards a Digital Earth vision for the coming decades. Collaboration is key to moving from data to actionable knowledge by leveraging existing assets, with incentives to facilitate sharing among big Earth data stakeholders. Existing challenges and opportunities include, but are not limited to, data intensity for data scientists, analytical complexity for methodologists, regulatory and cultural complexities for policymakers, system engineering for industry and engineers, and scientific challenges for scientists.

5.1. Data scientists

Big data handling methods have been developed from different aspects and within individual domains. How to integrate these methods to handle practical big Earth data at volumes of hundreds of petabytes is still a grand challenge for the computing infrastructure, analytical methods, and user manipulation of the data and systems.

Knowledge-based smart analytics is becoming a requirement in integrated big Earth data analytics, with knowledge derived from Earth science theories, the data collected, and data and information usage. Applying the discovered knowledge to data discovery, access, analytics, and presentation also drives the adoption of extracted information and knowledge for decision support. Investigations in past years have matured the process and demonstrated early success in using knowledge bases and artificial intelligence methodologies to facilitate big Earth data analytics. This should be a focus of the coming years, as stressed by NASA, the Defense Advanced Research Projects Agency (DARPA), and other agencies and initiatives.

New analytical methods should be developed from three aspects: the acquisition of new data, the emerging computing infrastructure, and, most importantly, the driving needs of science and applications. Innovations from relevant domains should be leveraged with minimized concerns. Methods can be identified and analyzed across diagnostic, descriptive, predictive, and prescriptive analytics.

5.2. Geoscientists

Driving science questions and application challenges are key to the advancement of big Earth data analytics, as the breadth of Earth science and the integration of Earth data with data from other domains can always produce new value for answering new science questions and addressing new application challenges. This aspect includes engaging stakeholders with a compiled list of funded, planned, and existing tools and projects. It would also be preferable to reconsider which new questions can be answered, or which challenges can be addressed, with newly known capabilities.

Spatiotemporal understanding sets up the principle and foundation for integrating big Earth data across sensors, domains, applications, and usages. While it is fairly mature in model and simulation analytics, it needs to be expanded to serve as a core for integrative big Earth data analytics from other aspects for integrated sciences. Spatiotemporal understanding also helps us understand the importance of the health and cost of the integrated system supporting big Earth data analytics.

New ideas, observations, simulations, needs, technologies, analytics, and presentations are interwoven and drive one another's advancement, supporting the overall advancement of big Earth data analytics in order to answer grand scientific questions and address bold geoscience engineering challenges.

5.3. Engineers

While a large number of advanced research projects have been funded to develop new tools and algorithms for big Earth data analytics, the engineering process of fusing the developed technology into operations involves considerable engineering investigation into adopting cloud capability, open source, machine learning, advanced search, cloud optimization, and data transformation. This is where industry can be best leveraged.

Grand system architecting is still in its infancy for dealing with integrated Earth system understanding and application development at the regional, national, and global scales while remaining applicable to daily life and practical problem solving. New architecture methods should be adopted that leverage existing assets, respect the copyright of data and tools and other intellectual property, and curate the lifecycle of data, analytics, and systems. Workflow management and on-demand analytics have the potential to leverage such assets with reusable premises, e.g., compatible cloud containers, workflows, and container/image repositories.

The pricing of big Earth data analytics should be built into the architecture systematically, considering moving big data into and out of the cloud, price changes with Glacier storage and spot instances, doing analytics in the cloud, data-proximal archiving, moving computing next to big data, adopting the cloud for analytics, achieving maximum analytics capability at minimum cost, improving existing analytics, and leveraging external analytics capabilities from commercial and other providers. This aspect also requires monitoring the system and understanding the backbone of the complete architecture.
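
The trade-offs listed above can be explored with a simple cost model. The sketch below compares running analytics next to the data in the cloud with downloading the data for local processing. Every price and workload figure is a placeholder invented for this example, not a quote from any provider, and the cost categories are a deliberately small subset of those a real architecture would track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Unit prices; all values used below are hypothetical placeholders."""
    storage_per_tb_month: float   # frequently accessed storage
    archive_per_tb_month: float   # cold or archival storage
    egress_per_tb: float          # moving data out of the cloud
    on_demand_per_hour: float     # regular compute instance
    spot_per_hour: float          # interruptible (spot) compute instance


def monthly_cost(prices, hot_tb, archive_tb, egress_tb, compute_hours, spot_fraction):
    """Break a monthly bill into storage, egress, and compute components."""
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    storage = hot_tb * prices.storage_per_tb_month + archive_tb * prices.archive_per_tb_month
    egress = egress_tb * prices.egress_per_tb
    blended_rate = (spot_fraction * prices.spot_per_hour
                    + (1.0 - spot_fraction) * prices.on_demand_per_hour)
    compute = compute_hours * blended_rate
    return {"storage": storage, "egress": egress, "compute": compute,
            "total": storage + egress + compute}


prices = Prices(storage_per_tb_month=20.0, archive_per_tb_month=4.0,
                egress_per_tb=90.0, on_demand_per_hour=1.0, spot_per_hour=0.3)

# Scenario A: analytics run next to the data, half of the hours on spot instances.
in_cloud = monthly_cost(prices, hot_tb=100, archive_tb=900, egress_tb=0,
                        compute_hours=1000, spot_fraction=0.5)
# Scenario B: the frequently used data are downloaded and processed elsewhere.
download = monthly_cost(prices, hot_tb=100, archive_tb=900, egress_tb=100,
                        compute_hours=0, spot_fraction=0.0)

print(in_cloud)
print(download)
```

With these placeholder numbers the egress charge dominates the second scenario, which is the kind of effect that motivates moving computing next to the data; real conclusions depend entirely on actual prices and workloads.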

Everything-as-a-service should be considered for a better architecture that can easily leverage assets in a plug-and-play fashion for the integration of data, algorithms, and interoperable apps in the cloud. Ensuring that a big Earth data system complies with relevant security measures is a challenge. For example, it took Amazon Web Services more than 5 years to reach the FedRAMP Moderate security level, and it is still difficult to handle the high security level in an operational public cloud. The interplay between compliance and the openness and usability of big Earth data complicates the systematic process. For example, a large data system under tough security measures faces uneven service and performance, significant interface coordination, limited on-demand capability, fragmentation, discipline-specific support and tools, varied user support, and the need to optimize for archive, search, and distribution.

Computing intensity remains a demand for processing intensive data, especially when complex processes such as modeling and comprehensive analytics are involved. This may take the form of supporting a diverse user community with analyses without egressing data, facilitating multi-data comparison and fusion, supporting batch, interactive, and streaming processing, providing cost constraints and cost-sharing, processing next to data, optimizing for multidisciplinary research, integrating data access and disciplinary support, and bridging with commercial capability and multi-agency needs in a convenient way.

5.4. Policymakers

Big Earth data analytics are often of great practical importance to human society and policy change. The United Nations identified 17 sustainable development goals as society’s most urgent needs, and many of them are related to Earth science studies, including goals focused on water, energy, resilient infrastructure and sustainable industrialization, safe and resilient cities and settlements, climate change, the ocean, and the land (United Nations [UN], Citation2015). To achieve these goals, different policy and management choices should be discussed, and the consequences of these choices should be studied and predicted. With new technologies and the increasing availability of new sensing data, policy solutions will be proposed and verified through standardization, progress monitoring, and new policy indicator integration to better understand the human-natural system and the probable consequences of different policy choices.

Another challenge for policymakers is the understandability and accessibility of scientific research outcomes. Open data, source, system, domain, and other aspects would help in adopting Earth data to address challenges that were not tractable in the past. This requires adoption in technology, policy and law, culture, and other areas. On the technical side, standardized APIs, well-documented systems and source code, and interoperable applications are critical. Policy decisions should be made to facilitate the openness of data for different regions. Culture is a long-term challenge that needs to be addressed in a sustainable, smooth, but concretely advancing fashion. This challenge urges Earth scientists to disseminate science results and to provide actionable, sustainability-relevant knowledge to policymakers.

In conclusion, big Earth data analytics is a fast-evolving domain that contributes directly to the implementation of many initiatives, such as Digital Earth, Digital China, smart cities, and the United Nations’ 17 sustainable development goals. A comprehensive understanding of big Earth data analytics would help us better position solutions and research directions. The concrete advancement of this domain requires the collaboration of various stakeholders to take the drivers from science areas, leverage existing infrastructures, ingest existing and new datasets, and develop new analytical methods.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by the National Science Foundation [OAC-1835507 and IIP-1841520].

References

  • Acker, J. G., & Leptoukh, G. (2007). Online analysis enhances use of NASA earth science data. Eos, Transactions American Geophysical Union, 88(2), 14–17.
  • Agrawal, D., Das, S., & El Abbadi, A. (2011, March). Big data and cloud computing: Current state and future opportunities. Proceedings of the 14th International Conference on Extending Database Technology (pp. 530–533). Uppsala, Sweden: ACM.
  • Ahmad, A., Paul, A., Rathore, M., & Chang, H. (2016). An efficient multidimensional big data fusion approach in machine-to-machine communication. ACM Transactions on Embedded Computing Systems (TECS), 15(2), 39.
  • Alpaydin, E. (2014). Introduction to machine learning. Cambridge, MA: MIT Press.
  • Apache (2017). The Science Data Analytics Platform (SDAP) proposal [online]. Retrieved from https://wiki.apache.org/incubator/SDAPProposal
  • Asner, G. P., Knapp, D. E., Boardman, J., Green, R. O., Kennedy-Bowdoin, T., Eastwood, M., … Field, C. B. (2012). Carnegie Airborne observatory-2: Increasing science data dimensionality via high-fidelity multi-sensor fusion. Remote Sensing of Environment, 124, 454–465.
  • Bambacus, M., Yang, C. P., Leung, R. Y., Barbee, B., Nuth, J. A., Seery, B., … Xu, M. (2017). A Planetary Defense Gateway for Smart Discovery of relevant Information for Decision Support.
  • Batty, M. (2007). Cities and complexity: Understanding cities with cellular automata, agent-based models, and fractals. Cambridge, MA: MIT Press.
  • Baumann, P. (2014). Rasdaman: Array databases boost spatio-temporal analytics. Computing for Geospatial Research and Application (COM. Geo), 2014 Fifth International Conference (p. 54). Washington, DC.
  • Bendig, J., Bolten, A., & Bareth, G. (2012). Introducing a low-cost mini-UAV for thermal-and multispectral-imaging. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 39(B1), 345–349.
  • Bennie, J., Davies, T. W., Duffy, J. P., Inger, R., & Gaston, K. J. (2014). Contrasting trends in light pollution across Europe based on satellite observed night time lights. Scientific Reports, 4, 3789.
  • Bernhardt, K. (2007). Agent-based modeling in transportation. Artificial Intelligence in Transportation, 72(E-C113).
  • Bhattacharyya, S., & Ivanova, D. (2017). Scientific computing and big data analytics: Application in climate science. In S. Mazumder, R. S. Bhadoria & G. C. Deka (Eds.), Distributed computing in big data analytics (pp. 95–106). Cham: Springer.
  • Binkowski, F. S., & Roselle, S. J. (2003). Models‐3 Community Multiscale Air Quality (CMAQ) model aerosol component 1. Model description. Journal of Geophysical Research: Atmospheres, 108(D6).
  • Borradaile, G. J. (2013). Statistics of earth science data: Their distribution in time, space and orientation. Berlin, Germany: Springer Science & Business Media.
  • Caldwell, P. M., Bretherton, C. S., Zelinka, M. D., Klein, S. A., Santer, B. D., & Sanderson, B. M. (2014). Statistical significance of climate sensitivity predictors obtained by data mining. Geophysical Research Letters, 41(5), 1803–1808.
  • Camara, G., Assis, L. F., Ribeiro, G., Ferreira, K. R., Llapa, E., & Vinhas, L. (2016, October). Big earth observation data analytics: Matching requirements to system architectures. Proceedings of the 5th ACM SIGSPATIAL International Workshop on Analytics For Big Geospatial Data (pp. 1–6). Burlingame, CA: ACM.
  • Candiago, S., Remondino, F., De Giglio, M., Dubbini, M., & Gattelli, M. (2015). Evaluating multispectral images and vegetation indices for precision farming applications from UAV images. Remote Sensing, 7(4), 4026–4047.
  • Chakraborty, I., & Paul, T. K. (2010, June). A hybrid clustering algorithm for fire detection in video and analysis with color based thresholding method. In 2010 International Conference on Advances in Computer Engineering (pp. 277–280). Bangalore, India: IEEE.
  • Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014, September 1–5). Return of the devil in the details: Delving deep into convolutional nets. Proceedings of the British Machine Vision Conference, Nottingham, UK.
  • Chini, M., Piscini, A., Cinti, F. R., Amici, S., Nappi, R., & DeMartini, P. M. (2013). The 2011 Tohoku (Japan) Tsunami inundation and liquefaction investigated through optical, thermal, and SAR data. IEEE Geoscience and Remote Sensing Letters, 10(2), 347–351.
  • Chun, B., & Guldmann, J. M. (2014). Spatial statistical analysis and simulation of the urban heat island in high-density central cities. Landscape and Urban Planning, 125, 76–88.
  • Cimpoi, M., Maji, S., & Vedaldi, A. (2015, June 7–12). Deep filter banks for texture recognition and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA. (pp. 3828–3836).
  • Collen, B., Pettorelli, N., Baillie, J. E., & Durant, S. M. (Eds.) (2013). Biodiversity monitoring and conservation: Bridging the gap between global commitment and local action. Cambridge, UK: John Wiley & Sons, Wiley-Blackwell.
  • Collobert, R., & Weston, J. (2008, July). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning (pp. 160–167). Helsinki, Finland: ACM.
  • Courtier, P., Thépaut, J. N., & Hollingsworth, A. (1994). A strategy for operational implementation of 4D‐Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120(519), 1367–1387.
  • Cressie, N. (2015). Statistics for spatial data. Hoboken, NJ: John Wiley & Sons.
  • de Jong, R., de Bruin, S., Schaepman, M., & Dent, D. (2011). Quantitative mapping of global land degradation using Earth observations. International Journal of Remote Sensing, 32(21), 6823–6853.
  • De Lannoy, G. J., Reichle, R. H., Arsenault, K. R., Houser, P. R., Kumar, S., Verhoest, N. E., & Pauwels, V. R. (2012). Multiscale assimilation of advanced microwave scanning radiometer–EOS snow water equivalent and moderate resolution imaging spectroradiometer snow cover fraction observations in Northern Colorado. Water Resources Research, 48, 1.
  • Delen, D., Tomak, L., Topuz, K., & Eryarsoy, E. (2017). Investigating injury severity risk factors in automobile crashes with predictive analytics and sensitivity analysis methods. Journal of Transport & Health, 4, 118–131.
  • Doelitzscher, F., Sulistio, A., Reich, C., Kuijs, H., & Wolf, D. (2011). Private cloud for collaboration and e-Learning services: From IaaS to SaaS. Computing, 91(1), 23–42.
  • Donges, J. F., Zou, Y., Marwan, N., & Kurths, J. (2009). Complex networks in climate dynamics. The European Physical Journal-Special Topics, 174(1), 157–179.
  • Dorigo, W. A., Gruber, A., De Jeu, R. A. M., Wagner, W., Stacke, T., Loew, A., … Kidd, R. (2015). Evaluation of the ESA CCI soil moisture product using ground-based observations. Remote Sensing of Environment, 162, 380–395.
  • Ehrlich, D., Kemper, T., Blaes, X., & Soille, P. (2013). Extracting building stock information from optical satellite imagery for mapping earthquake exposure and its vulnerability. Natural Hazards, 68(1), 79–95.
  • Ezequiel, C. A. F., Cua, M., Libatique, N. C., Tangonan, G. L., Alampay, R., Labuguen, R. T., … Loreto, A. B. (2014, May). UAV aerial imaging applications for post-disaster assessment, environmental management and infrastructure development. Unmanned Aircraft Systems (ICUAS), 2014 International Conference (pp. 274–283). Orlando, FL: IEEE.
  • Faghmous, J. H., & Kumar, V. (2014). A big data guide to understanding climate change: The case for theory-guided data science. Big Data, 2(3), 155–163.
  • Fingas, M., & Brown, C. (2014). Review of oil spill remote sensing. Marine Pollution Bulletin, 83(1), 9–23.
  • Franceschini, S., Tancioni, L., Lorenzoni, M., Mattei, F., & Scardi, M. (2019). An ecologically constrained procedure for sensitivity analysis of artificial neural networks and other empirical models. PloS one, 14(1), e0211445.
  • Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.
  • George, H. (2005). An overview of land evaluation and land use planning at FAO. Rome, Italy: FAO.
  • Getis, A. (1999). Spatial statistics. Geographical Information Systems, 1, 239–251.
  • Ghiringhelli, L. M., Carbogno, C., Levchenko, S., Mohamed, F., Huhs, G., Lüders, M., … Scheffler, M. (2017). Towards efficient data exchange and sharing for big-data driven materials science: Metadata and data formats. Npj Computational Materials, 3(1), 46.
  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014, June 23–28). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA. (pp. 580–587).
  • Goodwin, J., Dolbear, C., & Hart, G. (2008). Geographical linked data: The administrative geography of great britain on the semantic web. Transactions in GIS, 12(s1), 19–30.
  • Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google earth engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27.
  • Guay, K. C., Beck, P. S., Berner, L. T., Goetz, S. J., Baccini, A., & Buermann, W. (2014). Vegetation productivity patterns at high northern latitudes: A multi‐sensor satellite data assessment. Global Change Biology, 20(10), 3147–3158.
  • Guo, H. (2019). Big earth data service sharing platform was issued by Chinese academy of sciences. Retrieved from http://news.sciencenet.cn/htmlnews/2019/1/422131.shtm
  • Guo, H., Liu, Z., Jiang, H., Wang, C., Liu, J., & Liang, D. (2017). Big earth data: A new challenge and opportunity for digital earth’s development. International Journal of Digital Earth, 10(1), 1–12.
  • Guo, L., Turner, A. G., & Highwood, E. J. (2015). Impacts of 20th century aerosol emissions on the South Asian monsoon in the CMIP5 models. Atmospheric Chemistry and Physics, 15(11), 6367–6378.
  • Guo, T., Kujirai, T., & Watanabe, T. (2012). Mapping crop status from an unmanned aerial vehicle for precision agriculture applications. In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 39(B1), 25 August – 01 September, Melbourne, Australia.
  • Han, J., Kamber, M., & Tung, A. K. (2001). Spatial clustering methods in data mining. In H. J. Miller & J. Han (Eds.), Geographic data mining and knowledge discovery (pp. 188–217). Milton Park, Abingdon-on-Thames: Taylor & Francis.
  • Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A. A., Tyukavina, A., … Kommareddy, A. (2013). High-resolution global maps of 21st-century forest cover change. Science, 342(6160), 850–853.
  • Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7(2), 191–205.
  • Helvajian, H., & Janson, S. (2009). Small satellites: Past, present, and future. Reston, VA: American Institute of Aeronautics and Astronautics, Inc.
  • Hong, H., Pradhan, B., Jebur, M. N., Bui, D. T., Xu, C., & Akgun, A. (2016). Spatial prediction of landslide hazard at the Luxi area (China) using support vector machines. Environmental Earth Sciences, 75(1), 40.
  • Hu, H., Wen, Y., Chua, T. S., & Li, X. (2014). Toward scalable systems for big data analytics: A technology tutorial. IEEE Access, 2, 652–687.
  • Hu, Y., Gao, S., Janowicz, K., Yu, B., Li, W., & Prasad, S. (2015). Extracting and understanding urban areas of interest using geotagged photos. Computers, Environment and Urban Systems, 54, 240–254.
  • Huang, Q., Yang, C., Liu, K., Xia, J., Xu, C., Li, J., … Li, Z. (2013). Evaluating open-source cloud computing solutions for geosciences. Computers & Geosciences, 59, 41–52.
  • Jeltsch, F., Bonte, D., Pe’er, G., Reineking, B., Leimgruber, P., Balkenhol, N., … Zurell, D. (2013). Integrating movement ecology with biodiversity research-exploring new avenues to address spatiotemporal biodiversity dynamics. Movement Ecology, 1(1), 6.
  • Jenerette, G. D., Harlan, S. L., Buyantuev, A., Stefanov, W. L., Declet-Barreto, J., Ruddell, B. L., … Li, X. (2016). Micro-scale urban surface temperatures are related to land-cover features and residential heat related health impacts in Phoenix, AZ USA. Landscape Ecology, 31(4), 745–760.
  • Jiang, Y., Li, Y., Yang, C., Hu, F., Armstrong, E. M., Huang, T., … Finch, C. J. (2018). Towards intelligent geospatial data discovery: A machine learning framework for search ranking. International Journal of Digital Earth, 11(9), 956–971.
  • Jiang, Y., Li, Y., Yang, C., Liu, K., Armstrong, E. M., Huang, T., … Finch, C. J. (2017). A comprehensive methodology for discovering semantic relationships among geospatial vocabularies using oceanographic data discovery as an example. International Journal of Geographical Information Science, 31(11), 2310–2328.
  • Jiang, Y., Yang, C., Xia, J., & Liu, K. (2016). Polar CI portal: A cloud-based polar resource discovery engine. In T. C. Vance, N. Merati, C. Yang & M. Yuan (Eds.), Cloud computing in ocean and atmospheric sciences (pp. 163–185). Cambridge, MA: Academic Press.
  • Jin, B., Song, W., Zhao, K., Wei, X., Hu, F., & Jiang, Y. (2017). A high performance, spatiotemporal statistical analysis system based on a Spatiotemporal Cloud Platform. ISPRS International Journal of Geo-Information, 6(6), 165.
  • Kakkar, D., Lewis, B., Smiley, D., & Nunez, A. (2017). The billion object platform (bop): A system to lower barriers to support big, streaming, spatio-temporal data sources. In Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings (Vol. 17, No. 1, p. 15). Boston, MA.
  • Kalkhan, M. A. (2011). Spatial statistics: Geospatial information modeling and thematic mapping. Boca Raton, FL: CRC Press.
  • Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of social media. Business Horizons, 53(1), 59–68.
  • Karnieli, A., Bayarjargal, Y., Bayasgalan, M., Mandakh, B., Dugarjav, C., Burgheimer, J., … Gunin, P. D. (2013). Do vegetation indices provide a reliable indication of vegetation degradation? A case study in the Mongolian pastures. International Journal of Remote Sensing, 34(17), 6243–6262.
  • Kavanaugh, A. L., Fox, E. A., Sheetz, S. D., Yang, S., Li, L. T., Shoemaker, D. J., … Xie, L. (2012). Social media use by government: From the routine to the critical. Government Information Quarterly, 29(4), 480–491.
  • Kempler, S., & Mathews, T. (2017). Earth science data analytics: Definitions, techniques and skills. Data Science Journal, 16.
  • Khatami, A., Mirghasemi, S., Khosravi, A., Lim, C. P., & Nahavandi, S. (2017). A new PSO-based approach to fire flame detection using K-Medoids clustering. Expert Systems with Applications, 68, 69–80.
  • Kostyuchenko, Y. V. (2015). Geostatistics and remote sensing for extremes forecasting and disaster risk multiscale analysis. In Numerical methods for reliability and safety assessment (pp. 439–458). Cham, Switzerland: Springer.
  • Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., & Poeppel, D. (2017). Neuroscience needs behavior: Correcting a reductionist bias. Neuron, 93(3), 480–490.
  • Kravchenko, Y. G., Tishaieva, A. M., & Perkhaliuk, R. I. (2018, May). The convolution neural network for automatic objects detection in earth satellite imagery. 17th International Conference on Geoinformatics-Theoretical and Applied Aspects. Kiev, Ukraine.
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of the Twenty-Sixth Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA (pp. 1097–1105), 3–8 December.
  • Kross, A., McNairn, H., Lapen, D., Sunohara, M., & Champagne, C. (2015). Assessment of RapidEye vegetation indices for estimation of leaf area index and biomass in corn and soybean crops. International Journal of Applied Earth Observation and Geoinformation, 34, 235–248.
  • Kruijff, G. J. M., Pirri, F., Gianni, M., Papadakis, P., Pizzoli, M., Sinha, A., … Priori, F. (2012, November). Rescue robots at earthquake-hit Mirandola, Italy: A field report. In 2012 IEEE international symposium on safety, security, and rescue robotics (SSRR) (pp. 1–8). College Station, TX: IEEE.
  • Kuenzer, C., Ottinger, M., Wegmann, M., Guo, H., Wang, C., Zhang, J., … Wikelski, M. (2014). Earth observation satellite sensors for biodiversity monitoring: Potentials and bottlenecks. International Journal of Remote Sensing, 35(18), 6599–6647.
  • Kumara, B. T., Paik, I., Zhang, J., Siriweera, T. A. S., & Koswatte, K. R. (2015, June). Ontology-based workflow generation for intelligent big data analytics. In Web Services (ICWS), 2015 IEEE International Conference (pp. 495–502). New York, NY: IEEE.
  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
  • Lee, T. D. (2018). NASA HPC for big earth data. Presentation at the NASA Big Earth Data workshop, Annapolis, MD, Mar. 3–5, 2018.
  • Lehmann, A., Giuliani, G., Mancosu, E., Abbaspour, K. C., Sözen, S., Gorgan, D., … Ray, N. (2015). Filling the gap between earth observation and policy making in the Black Sea catchment with enviroGRIDS. Environmental Science & Policy, 46, 1–12.
  • Li, J., Shum, H. P., Fu, X., Sexton, G., & Yang, L. (2016, July). Experience-based rule base generation and adaptation for fuzzy interpolation. In 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (pp. 102–109). Vancouver, BC, Canada: IEEE.
  • Li, S., Dragicevic, S., Castro, F. A., Sester, M., Winter, S., Coltekin, A., … Cheng, T. (2016). Geospatial big data handling theory and methods: A review and research challenges. ISPRS Journal of Photogrammetry and Remote Sensing, 115, 119–133.
  • Li, W., Goodchild, M. F., & Raskin, R. (2014). Towards geospatial semantic search: Exploiting latent semantic relations in geospatial data. International Journal of Digital Earth, 7(1), 17–37.
  • Li, X., Kamarianakis, Y., Ouyang, Y., Turner, B. L., II, & Brazel, A. (2017). On the association between land system architecture and land surface temperatures: Evidence from a Desert Metropolis—Phoenix, Arizona, USA. Landscape and Urban Planning, 163, 107–120.
  • Li, Z., Hu, F., Schnase, J. L., Duffy, D. Q., Lee, T., Bowen, M. K., & Yang, C. (2017a). A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce. International Journal of Geographical Information Science, 31(1), 17–35.
  • Li, Z., Yang, C., Huang, Q., Liu, K., Sun, M., & Xia, J. (2017b). Building Model as a Service to support geosciences. Computers, Environment and Urban Systems, 61, 141–152.
  • Liang, S., Zhao, X., Liu, S., Yuan, W., Cheng, X., Xiao, Z., … Qu, Y. (2013). A long-term Global LAnd Surface Satellite (GLASS) data-set for environmental studies. International Journal of Digital Earth, 6(sup1), 5–33.
  • Lieberman, C., Willcox, K., & Ghattas, O. (2010). Parameter and state model reduction for large-scale statistical inverse problems. SIAM Journal on Scientific Computing, 32(5), 2523–2542.
  • Lin, C., Li, Y., Yuan, Z., Lau, A. K., Li, C., & Fung, J. C. (2015). Using satellite remote sensing data to estimate the high-resolution distribution of ground-level PM 2.5. Remote Sensing of Environment, 156, 117–128.
  • Liu, J., Feld, D., Xue, Y., Garcke, J., Soddemann, T., & Pan, P. (2016). An efficient geosciences workflow on multi-core processors and GPUs: A case study for aerosol optical depth retrieval from MODIS satellite data. International Journal of Digital Earth, 9(8), 748–765.
  • Liu, Z., Wong, M. S., Nichol, J., & Chan, P. W. (2013). A multi‐sensor study of water vapour from radiosonde, MODIS and AERONET: A case study of Hong Kong. International Journal of Climatology, 33(1), 109–120.
  • Madhukar, M. (2019). Earth science [Big] data analytics. In N. Dey, C. Bhatt & A. S. Ashour (Eds.), Big data for remote sensing: Visualization, analysis and interpretation (pp. 99–128). Cham: Springer.
  • Maimaitiyiming, M., Ghulam, A., Tiyip, T., Pla, F., Latorre-Carmona, P., Halik, Ü., … Caetano, M. (2014). Effects of green space spatial pattern on land surface temperature: Implications for sustainable urban planning and climate change adaptation. ISPRS Journal of Photogrammetry and Remote Sensing, 89, 59–66.
  • Manning, C. D. (2015). Computational linguistics and deep learning. Computational Linguistics, 41(4), 701–707. doi:10.1162/COLI_a_00239
  • Manogaran, G., Vijayakumar, V., Varatharajan, R., Kumar, P. M., Sundarasekar, R., & Hsu, C. H. (2018). Machine learning based big data processing framework for cancer diagnosis using hidden Markov model and GM clustering. Wireless Personal Communications, 102(3), 2099–2116.
  • Marz, N., & Warren, J. (2015). Big data: Principles and best practices of scalable realtime data systems. Shelter Island, NY: Manning Publications Co.
  • Mas, E., Bricker, J., Kure, S., Adriano, B., Yi, C., Suppasri, A., & Koshimura, S. (2015). Field survey report and satellite image interpretation of the 2013 Super Typhoon Haiyan in the Philippines. Natural Hazards & Earth System Sciences, 15(4).
  • Matott, L. S., Babendreier, J. E., & Purucker, S. T. (2009). Evaluating uncertainty in integrated environmental models: A review of concepts and tools. Water Resources Research, 45(6), 1–14.
  • Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Boston, MA: Houghton Mifflin Harcourt.
  • McMullen, S. A., McMullen, M. J., Forster, P., Ison, D., & Clark, P. J. (2016, September). Emergency management: Exploring hard and soft data fusion modeling with unmanned aerial systems and non-governmental human intelligence mediums. In Proceedings of SAI Intelligent Systems Conference (pp. 502–520). Cham, Switzerland: Springer.
  • Murphy, R. R., & Stover, S. (2008). Rescue robots for mudslides: A descriptive study of the 2005 La Conchita mudslide response. Journal of Field Robotics, 25(1‐2), 3–16.
  • NASA (2017a). An overview of EOSDIS EARTHDATA. Retrieved from https://earthdata.nasa.gov/about
  • NASA (2017b). Data analysis tool - beta version [online]. Retrieved from https://sealevel.nasa.gov/data/data-analysis-tool
  • National Research Council (NRC). (1997). Precision agriculture in the 21st century: Geospatial and information technologies in crop management. Washington, DC: The National Academies Press. doi:10.17226/5491
  • National Research Council (NRC). (2013). Future US workforce for geospatial intelligence. Washington, DC: National Academies Press.
  • Nativi, S., Khalsa, S., Domenico, B., Craglia, M., Pearlman, J., Mazzetti, P., & Rew, R. (2011). The brokering approach for earth science cyberinfrastructure. In NSF EarthCube white paper. Retrieved from https://www.earthcube.org/document/2011/brokering-approach-earth-science-ci
  • Nelson, A. (2017). Crop pests: Crop-health survey aims to fill data gaps. Nature, 541(7638), 464.
  • Nogueras-Iso, J., Zarazaga-Soria, F. J., Béjar, R., Álvarez, P. J., & Muro-Medrano, P. R. (2005). OGC Catalog Services: A key element for the development of Spatial Data Infrastructures. Computers & Geosciences, 31(2), 199–209.
  • Okamoto, T., Takenaka, H., Nakamura, T., & Aoki, T. (2013). Accelerating large-scale simulation of seismic wave propagation by multi-GPUs and three-dimensional domain decomposition. In D. A. Yuen, L. Wang, X. Chi, L. Johnsson, W. Ge & Y. Shi (Eds.), GPU solutions to multi-scale problems in science and engineering (pp. 375–389). Berlin, Heidelberg: Springer.
  • Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226–1227.
  • Pettorelli, N., Laurance, W. F., O’Brien, T. G., Wegmann, M., Nagendra, H., & Turner, W. (2014). Satellite remote sensing for applied ecologists: Opportunities and challenges. Journal of Applied Ecology, 51(4), 839–848.
  • Plank, S. (2014). Rapid damage assessment by means of multi-temporal SAR—a comprehensive review and outlook to Sentinel-1. Remote Sensing, 6(6), 4870–4906.
  • PO.DAAC (2017). State of the Ocean (SOTO) [online]. Retrieved from https://podaac.jpl.nasa.gov/node/450
  • Pradhan, B., Tehrany, M. S., & Jebur, M. N. (2016). A new semiautomated detection mapping of flood extent from TerraSAR-X satellite image using rule-based classification and Taguchi optimization techniques. IEEE Transactions on Geoscience and Remote Sensing, 54(7), 4331–4342.
  • Primicerio, J., Di Gennaro, S. F., Fiorillo, E., Genesio, L., Lugato, E., Matese, A., & Vaccari, F. P. (2012). A flexible unmanned aerial vehicle for precision agriculture. Precision Agriculture, 13(4), 517–523.
  • Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2(1), 3.
  • Ramachandran, R., Lynnes, C., Bingham, A. W., & Quam, B. M. (2018). Enabling analytics in the cloud for earth science data [online]. Retrieved from https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20180002954.pdf
  • Rehman, M. H., Liew, C. S., Abbas, A., Jayaraman, P. P., Wah, T. Y., & Khan, S. U. (2016). Big data reduction methods: A survey. Data Science and Engineering, 1(4), 265–284.
  • Robinson, W. H., & Lauf, A. P. (2013, January). Resilient and efficient MANET aerial communications for search and rescue applications. In 2013 International Conference on Computing, Networking and Communications (ICNC) (pp. 845–849). San Diego, CA: IEEE.
  • Roche, S., Propeck-Zimmermann, E., & Mericskay, B. (2013). GeoWeb and crisis management: Issues and perspectives of volunteered geographic information. GeoJournal, 78(1), 21–40.
  • Sayood, K. (2017). Introduction to data compression. Burlington, MA: Morgan Kaufmann.
  • Schlapfer, D., Nieke, J., & Itten, K. I. (2007). Spatial PSF nonuniformity effects in airborne pushbroom imaging spectrometry data. IEEE Transactions on Geoscience and Remote Sensing, 45(2), 458–468.
  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
  • Schmidt, A., Leadbetter, S., Theys, N., Carboni, E., Witham, C. S., Stevenson, J. A., … Delaney, L. (2015). Satellite detection, long‐range transport, and air quality impacts of volcanic sulfur dioxide from the 2014–2015 flood lava eruption at Bárðarbunga (Iceland). Journal of Geophysical Research: Atmospheres, 120(18), 9739–9757.
  • Sellars, S., Nguyen, P., Chu, W., Gao, X., Hsu, K. L., & Sorooshian, S. (2013). Computational earth science: Big data transformed into insight. Eos, Transactions American Geophysical Union, 94(32), 277–278.
  • Shuman, F. G. (1989). History of numerical weather prediction at the National Meteorological Center. Weather and Forecasting, 4(3), 286–296.
  • Singh, K., Guntuku, S. C., Thakur, A., & Hota, C. (2014). Big data analytics framework for peer-to-peer botnet detection using random forests. Information Sciences, 278, 488–497.
  • Skakun, S., Kussul, N., Shelestov, A., & Kussul, O. (2014). Flood hazard and flood risk assessment using a time series of satellite images: A case study in Namibia. Risk Analysis, 34(8), 1521–1537.
  • Smil, V. (2013). Harvesting the biosphere: How much we have taken from nature. Cambridge, MA: The MIT Press.
  • Stateczny, A., & Wlodarczyk-Sielicka, M. (2014, July). Self-organizing artificial neural networks into hydrographic big data reduction process. In International Conference on Rough Sets and Intelligent Systems Paradigms (pp. 335–342). Cham, Switzerland: Springer.
  • Sugumaran, R., Burnett, J., & Blinkmann, A. (2012, November). Big 3D spatial data processing using cloud computing environment. In Proceedings of the 1st ACM SIGSPATIAL international workshop on analytics for big geospatial data (pp. 20–22). Redondo Beach, CA: ACM.
  • Taylor, K. E. (2001). Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research: Atmospheres, 106(D7), 7183–7192.
  • Theodorou, V., Jovanovic, P., Abelló, A., & Nakuçi, E. (2017). Data generator for evaluating ETL process quality. Information Systems, 63, 80–100.
  • Thessen, A. (2016). Adoption of machine learning techniques in ecology and earth science. One Ecosystem, 1, e8621.
  • Tian, J., Jiang, Y., Chen, Y., Li, W., & Mu, N. (2014, June). Automated human mobility mode detection based on GPS tracking data. In 2014 22nd International Conference on Geoinformatics (pp. 1–6). Kaohsiung, Taiwan, China: IEEE.
  • United Nations (UN) (2015). Sustainable development goals. Retrieved from https://www.undp.org/content/dam/undp/library/corporate/brochure/SDGs_Booklet_Web_En.pdf
  • Van Donkelaar, A., Martin, R. V., Brauer, M., & Boys, B. L. (2015). Use of satellite observations for long-term exposure assessment of global concentrations of fine particulate matter. Environmental Health Perspectives, 123(2), 135.
  • Varatharajan, R., Manogaran, G., & Priyan, M. K. (2018). A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimedia Tools and Applications, 77(8), 10195–10215.
  • Varia, J., & Mathew, S. (2014). Overview of Amazon Web Services. Amazon Web Services. Retrieved from https://docs.aws.amazon.com/aws-technical-content/latest/aws-overview/aws-overview.pdf
  • Wang, C., Hu, F., Sha, D., & Han, X. (2017). Efficient LiDAR point cloud data managing and processing in a Hadoop-based distributed framework. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4, 121.
  • Wang, Z., Xiao, D., Fang, F., Govindan, R., Pain, C. C., & Guo, Y. (2018). Model identification of reduced order fluid dynamics systems using deep learning. International Journal for Numerical Methods in Fluids, 86(4), 255–268.
  • Whitcraft, A. K., Becker-Reshef, I., & Justice, C. O. (2015). A framework for defining spatially explicit earth observation requirements for a global agricultural monitoring initiative (GEOGLAM). Remote Sensing, 7(2), 1461–1481.
  • Wiegand, N., & García, C. (2007). A task‐based ontology approach to automate geospatial data retrieval. Transactions in GIS, 11(3), 355–376.
  • Wright, D. J., Raad, M., Hoel, E., Park, M., Mollenkopf, A., & Trujillo, R. (2016, December). Feature geo analytics and big data processing: Hybrid approaches for earth science and real-time decision support. In AGU Fall Meeting Abstracts. San Francisco, CA.
  • Xie, H., Zhou, X., Vivoni, E. R., Hendrickx, J. M., & Small, E. E. (2005). GIS-based NEXRAD Stage III precipitation database: Automated approaches for data processing and visualization. Computers & Geosciences, 31(1), 65–76.
  • Xu, Z., Mei, L., Hu, C., & Liu, Y. (2016). The big data analytics and applications of the surveillance system using video structured description technology. Cluster Computing, 19(3), 1283–1292.
  • Xu, Z., Mei, L., Lu, Z., Hu, C., Luo, X., Zhang, H., & Liu, Y. (2017). Multi-modal description of public security events using surveillance and social data. IEEE Transactions on Big Data. doi:10.1109/TBDATA.2017.2656918
  • Xuan, P., Ligon, W. B., Srimani, P. K., Ge, R., & Luo, F. (2017). Accelerating big data analytics on HPC clusters using two-level storage. Parallel Computing, 61, 18–34.
  • Yalew, S. G., Van Griensven, A., & van der Zaag, P. (2016). AgriSuit: A web-based GIS-MCDA framework for agricultural land suitability assessment. Computers and Electronics in Agriculture, 128, 1–8.
  • Yamazaki, F., & Liu, W. (2016, September). Remote sensing technologies for post-earthquake damage assessment: A case study on the 2016 Kumamoto earthquake. In Proceedings of the 6th Asia Conference on Earthquake Engineering (6ACEE), Cebu City, Philippines (pp. 22–24).
  • Yang, C., Yu, M., Hu, F., Jiang, Y., & Li, Y. (2017a). Utilizing cloud computing to address big geospatial data challenges. Computers, Environment and Urban Systems, 61, 120–128.
  • Yang, C. P., Yu, M., Xu, M., Jiang, Y., Qin, H., Li, Y., … Seery, B. (2017b, March). An architecture for mitigating near earth object’s impact to the earth. In 2017 IEEE Aerospace Conference (pp. 1–13). Big Sky, MT: IEEE.
  • Yu, B., Liu, H., Wu, J., Hu, Y., & Zhang, L. (2010). Automated derivation of urban building density information using airborne LiDAR data and object-based method. Landscape and Urban Planning, 98(3), 210–219.
  • Yu, M., Huang, Q., Qin, H., Scheele, C., & Yang, C. (2019). Deep learning for real-time social media text classification for situation awareness–Using Hurricanes Sandy, Harvey, and Irma as case studies. International Journal of Digital Earth, 1–18.
  • Yu, M., Yang, C., & Li, Y. (2018). Big data in natural disaster management: A review. Geosciences, 8(5), 165.
  • Yu, X., Wu, X., Luo, C., & Ren, P. (2017). Deep learning in remote sensing scene classification: A data augmentation enhanced convolutional neural network framework. GIScience & Remote Sensing, 54(5), 741–758.
  • Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., Mccauley, M., … Stoica, I. (2012). Fast and interactive analytics over Hadoop data with Spark. Usenix Login, 37(4), 45–51.
  • Zhang, A., & Jia, G. (2013). Monitoring meteorological drought in semiarid regions using multi-sensor microwave remote sensing data. Remote Sensing of Environment, 134, 12–23.
  • Zhang, X. M., He, G. J., Zhang, Z. M., Peng, Y., & Long, T. F. (2017). Spectral-spatial multi-feature classification of remote sensing big data based on a random forest classifier for land cover mapping. Cluster Computing, 20(3), 2311–2321. doi:10.1007/s10586-017-0950-0
  • Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014). Learning deep features for scene recognition using places database. In Proceedings of the Twenty-eighth Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada (pp. 487–495), 8–13 December.
  • Zhou, W., Wang, J., & Cadenasso, M. L. (2017). Effects of the spatial configuration of trees on urban heat mitigation: A comparative study. Remote Sensing of Environment, 195, 1–12.