Original Research Article

A unified representation method for interdisciplinary spatial earth data

Pages 126-145 | Received 11 Jan 2022, Accepted 14 Jun 2022, Published online: 12 Jul 2022

ABSTRACT

Unified representation of spatial earth data is an essential scientific issue. The analysis and mining of interdisciplinary spatial earth data resources can help discover hidden scientific knowledge, and even reveal the intrinsic relationship among different disciplines. However, the different description methods and inner structures among interdisciplinary spatial earth data bring significant challenges to unified data management and collaborative analysis in earth environment research. To address this issue, this paper proposes a unified representation method for interdisciplinary spatial earth data. First, this paper establishes a general metadata model and realizes the unified description of interdisciplinary data. Second, an entity data organization model is presented, which can realize the unified organization of entity data with different inner structures. Finally, we introduce the Spatial Earth Data Format (SEDF), a data format based on HDF5 for implementing the data organization model of interdisciplinary spatial earth data. Data representation experiments and validation are conducted to verify the availability and practicability of the proposed data representation method. The results suggest the powerful ability to represent spatial earth data efficiently and ensure data integrity, which is convenient for data management and application.

1. Introduction

With the rapid development of computer technology and of ground- and space-based observation technologies, significant discoveries and breakthroughs have been achieved (Li, Wang, Wei, & Lin, 2019; Ye et al., 2015). As an essential means of carrying out scientific research, deepening space exploration activities bring unprecedented opportunities and challenges to the field of geoscience (Guo, 2017; Qiu, Wang, & Ma, 2020). A series of space exploration missions have been set up at home and abroad, and massive spatial earth data resources, including remote sensing satellite images and space environment monitoring data, have been produced and acquired (Wu, Liu, Qiao, & Jie, 2012; Yao et al., 2020). Furthermore, the earth system can be vertically divided into the surface, near space, and near-earth space; this research refers to the monitoring data from these spheres as spatial earth data. These data resources have different representation methods and cover various disciplines, such as geography, atmospheric science, space physics, and astronomy, and they provide fundamental data support for scientific research on the earth and space environment, especially research on resource investigation and space physical processes (Camporeale, 2019; Wang, Jia, Yin, & Tian, 2019).

There is abundant information in the massive amount of multidisciplinary spatial earth data, which hides undiscovered scientific knowledge. Therefore, further mining this knowledge and discovering the scientific laws of the various vertical spheres are current research hotspots, and the first step is to solve the problem of organizing and managing spatial earth data in a unified spatial and temporal framework (Sudmanns et al., 2020). Spatial earth data consist of metadata and entity data. On the one hand, the methods and guidelines for describing data differ between disciplines, resulting in metadata files with diverse structure and content, which hinders interoperability among multidisciplinary data (Wang et al., 2019). On the other hand, the data organization structure and the data format also differ between disciplines, which complicates data processing and analysis (Yan, Chen, Chen, & Liang, 2020). These differences in data representation hinder the unified management, collaborative application, and sharing of spatial earth data. Consequently, realizing the unified representation of heterogeneous data is of great significance.

The primary focus of this study is to solve the unified representation of heterogeneous spatial earth data in interdisciplinary earth environment research, so as to facilitate the data collaborative application and analysis; therefore, a unified representation framework is introduced. First, through an in-depth investigation and research of common metadata standards and specifications in different disciplines, a general metadata model is established using Unified Modelling Language (UML), which is applicable to multiple types of spatial earth data and realizes the unified description of data. Then, a data organization model that is suitable for various spatial earth data and realizes the unified organization of data is built. Furthermore, based on the investigation of the existing scientific data formats, the Spatial Earth Data Format (SEDF) is proposed and designed based on the HDF5 data format. We also implement the conversion between SEDF and other data formats. The experimental results prove that the proposed data representation method can realize the unified representation of spatial earth data, including metadata and entity data, which provides a theoretical basis for unified management and analysis and is significant for interdisciplinary research.

The remainder of this paper is organized as follows: Section 2 describes the background of the research, including the metadata model, data organization model, and the corresponding research works. Section 3 elaborates the main contents of the proposed unified data representation method. Section 4 presents the experiments conducted and the results. Section 5 concludes and discusses the paper.

2. Background

2.1. Metadata model

Metadata is descriptive information about data that helps users understand the data (Duval, Hodgins, Sutton, & Weibel, 2002; Green & Bossomaier, 2003). Establishing metadata models is one of the focuses in the field of data science, and it is also the premise and guarantee of data standardization (Chan & Zeng, 2006). To promote data application, different disciplines often establish their own metadata models with various structures and contents, which brings inconvenience to data exchange, integration, and unified management (Li & Huang, 2017). In the field of geoscience, building geospatial metadata standards has been a research hot spot at home and abroad. National and federal standard organizations, such as the International Organization for Standardization (ISO), Federal Geographic Data Committee (FGDC) and National Aeronautics and Space Administration (NASA), have set up working groups to discuss the formulation of standards from different aspects. Currently, the main geospatial metadata standards include the geographical information metadata standard (ISO 19115) (ISO/TC211, 2019) and the Content Standard for Digital Geospatial Metadata (CSDGM) (NASA, 2002). In space physics and astronomy, the data mainly follow Space Physics Archive Search and Extract (SPASE) (NASA, 2020).

(1) ISO 19115

ISO 19115 was developed by ISO/TC211 and defines how to describe geographical information and services. It uses metadata entities and elements based on UML. There are two parts: ISO 19115-1 (2014) and ISO 19115-2 (2019). The former is the fundamental part for describing geographic information resources and defines a series of metadata elements, properties and their relationships, while the latter augments the former with data acquisition and processing information for geographical resources.

(2) CSDGM

The CSDGM was developed by the FGDC in 1998, with the objective of providing a common set of terminology and definitions for digital geospatial data. It is organized in a hierarchical structure with sections, data elements and compound elements. This standard consists of seven main sections (Identification, Data Quality, Spatial Data Organization, Spatial Reference, Entity and Attribute, Distribution, and Metadata Reference) and three auxiliary sections (Citation, Time Period, and Contact).

(3) SPASE

SPASE is developed by space physics data holding organizations funded by NASA. This model provides a unified description of Heliophysics resources based on the Extensible Markup Language (XML) to help researchers retrieve data of interest. Specifically, it defines a set of terms to describe data including scientific context, source provenance, content and location. This model has been widely used in space physical exploration missions at present.

Based on common metadata standards, researchers have carried out studies under different application scenarios. Fan et al. integrated and managed remote sensing metadata in a distributed data center spatial infrastructure based on ISO 19115 (Fan, Yan, Ma, & Wang, 2017). Takahashi et al. introduced a conceptual model for earth observation data to better manage metadata (Takahashi, Tatedoko, Kinutani, & Yoshikawa, 2009). Gebhardt et al. managed and disseminated spatial data in a web-based information system based on ISO 19115 (Gebhardt et al., 2010).

However, the above metadata standards have not taken attribute information, such as data category and data copyright, into account and the data quality information and spatial information are incomplete. This study concentrates on building a general metadata model that is suitable for spatial earth data from different disciplines to realize the unified description of spatial earth data. While defining the basic attribute information of spatial earth data, the other description is supplemented. The unified and complete description of spatial earth data can be realized through the general metadata model, which is the premise of unified data management.

2.2. Data storage format

The data storage format is the carrier of scientific data that can store and distribute data. The selection of data format depends on the requirements of different disciplines. Several research institutions and organizations construct scientific data formats that mainly include Geo-Tag Image File Format (GeoTIFF), Hierarchical Data Format (HDF), Network Common Data Format (NetCDF) and IONosphere map Exchange format (IONEX).

(1) TIFF and GeoTIFF

Before introducing GeoTIFF, it is necessary to understand the structure of TIFF (Murray & VanRyper, 1996). TIFF was developed by Aldus Corporation and Microsoft for the purpose of providing a public image file format standard. It consists of three parts: file header, file directory and image data. The latest version is TIFF 6.0. Because of its flexible storage pattern and its support for various image modes, TIFF has been increasingly used to store and distribute raster geographic data. However, the format has no fixed structure for storing geospatial information, so when raster data are transferred from one system to another, the user must confirm the geographic location information in advance; thus, limitations still exist in applications such as cartography and mapping.

To address the deficiencies mentioned above, GeoTIFF was introduced (Ritter & Ruth, 1997). Dr. Niles Ritter from NASA’s Jet Propulsion Laboratory encoded geographical information using a series of keys in TIFF format and named the format the GeoTIFF 1.0 standard. The GeoTIFF file structure inherits the TIFF 6.0 standard, so strictly speaking, GeoTIFF is a special type of TIFF. Currently, GeoTIFF is adopted by NASA to store earth science data.

(2) HDF

HDF was created by the NCSA (National Center for Supercomputing Applications), and stores and distributes scientific data in a hierarchical structure to meet the demand of exchanging data among computing platforms (Habermann & Folk, 2014). This format is independent of computer architecture and provides multiple compression algorithms, such as GZIP, LZF and SZIP; thus, the data storage efficiency and transmission speed can be improved. The newest data format version is HDF5 (Koranne, 2011). Compared with HDF4, HDF5 overcomes limitations and supports larger files and more data types; this is the largest difference between HDF5 and other image data formats. Currently, HDF is adopted by NASA and NOAA as their standard data storage format.

HDF5 contains two basic objects, the group and the dataset; other objects include the data space, data type, property, and attribute. The data space defines the rank and dimensions of scientific data. The data type is the expression of the data, such as integer. The property defines the parameters of data blocks and compression. The attribute is an additional description of the scientific dataset.

(3) NetCDF

NetCDF is an array-oriented data format proposed by the scientists of UCAR (University Corporation for Atmospheric Research) (Rew & Davis, 1990). It provides an interface between the application and real-time meteorological data. Due to its flexibility, NetCDF is currently widely used to store and distribute scientific data in various disciplines including atmosphere, ocean and space physics. The newest version is NetCDF 4.0 based on the HDF5 library.

From a mathematical standpoint, the data stored in NetCDF is a single-valued function of multiple variables: f(x, y, z) = value. Here x, y and z represent dimensions, and value is the observation value of the sensor. A dimension can be used to represent a quantity that has actual physical meaning, such as time, elevation, longitude, or latitude. The observation value represents the physical phenomenon, and generally, it is a multi-dimensional dataset.
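The function view above can be made concrete with a small sketch. It uses only the Python standard library and does not touch any NetCDF API; the dimension names, coordinate values, and stored numbers are all illustrative assumptions rather than data from the paper.

```python
from itertools import product

# Hypothetical coordinate axes (the "dimensions"); values are illustrative only.
times = [0, 6]                        # hours
latitudes = [30.0, 35.0]              # degrees north
longitudes = [110.0, 115.0, 120.0]    # degrees east

# The variable is a nested list whose shape follows the dimension order
# (time, latitude, longitude). The stored numbers are synthetic.
variable = [[[float(100 * t + 10 * y + x) for x in range(len(longitudes))]
             for y in range(len(latitudes))]
            for t in range(len(times))]


def f(time, lat, lon):
    """Return the single value addressed by one coordinate per dimension."""
    return variable[times.index(time)][latitudes.index(lat)][longitudes.index(lon)]


# Every coordinate combination maps to exactly one observation value.
table = {coords: f(*coords) for coords in product(times, latitudes, longitudes)}
print(len(table), f(6, 35.0, 120.0))
```

Each key of `table` is one point in the dimension space, which is the sense in which the stored array behaves as a single-valued function.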

(4) IONEX

IONEX is the data exchange format for Total Electron Content (TEC) maps in the ionosphere and is provided by the International GPS Service for Geodynamics (IGS) (Schaer, Gurtner, & Feltens, 1998). It supports the exchange of 2- and 3-dimensional TEC maps given in a geographic grid. The IONEX file consists of two parts: header and data. The former contains the basic information of TEC data, such as the time range, interval, and station identifier. The latter contains practical values of each TEC map in each geographical coordinate.

Based on the existing scientific data formats, researchers have tried to build new data formats to meet the requirements of various disciplines. Some researchers organize data at the conceptual level. Sun et al. proposed a unified geospatial data ontology model to facilitate data integration and sharing (Sun et al., 2019). Zhu et al. introduced a unified representation method for 3-dimensional city models to realize the description of complicated objects (Zhu, Li, & Zhang, 2007). However, these methods pay more attention to the theoretical realization of unified data representation, and obstacles stand in the way of practical application. Others start from the data itself. For instance, Wang et al. provided a general data representation method for scientific data based on XML (Wang et al., 2008). Krischer et al., Könnecke et al. and de Buyl et al. established new data exchange and archival formats based on HDF5, which achieve the efficient storage, organization and application of seismic data; neutron, X-ray and muon experiment data; and molecular data, respectively (de Buyl, Colberg, & Höfling, 2014; Krischer, Smith, Lei, Lefebvre, & Tromp, 2017; Könnecke et al., 2015). Faced with the deficiencies of the Flexible Image Transport System in storing astronomical data, Greenfield et al. proposed a new data format based on Yet Another Markup Language and demonstrated its applicability through experiments (Greenfield, Droettboom, & Bray, 2015). In addition, as a data organization framework for building Digital Earth, the Discrete Global Grid System (DGGS) takes the whole earth as the research object and subdivides the geospace evenly into discrete grids (Sahr, White, & Kimerling, 2003). The DGGS model realizes the unified organization of entity data, and existing work on it focuses on grid encoding, coordinate transformation, precision evaluation systems, etc. (Lin, Zhou, Xu, Zhu, & Lu, 2018; Ma et al., 2021; Wang, Ben, Zhou, & Zheng, 2021).

The above methods start from either metadata or entity data and achieve the unified description or organization of scientific data. However, the spatial earth data in this paper come from various spheres of the earth system and take different forms, such as image, text, and multi-dimensional array; thus, current methods cannot realize a complete unified representation of spatial earth data. Combining data science theory with the requirements of practical application, this paper proposes a unified data representation method that establishes a general metadata model and an entity data organization model, which can solve the unified representation problem in interdisciplinary data management and application.

3. Unified representation method for spatial earth data

The core of the proposed unified representation method in this paper consists of the general metadata model, entity data organization model, and data storage format. By establishing the general metadata model and entity data organization model, this study realizes unified metadata description and entity data organization, respectively. Finally, SEDF is presented to store interdisciplinary spatial earth data.

3.1. General metadata model

To realize the unified description of various spatial earth data, including surface, near-earth space, and near space monitoring data, the first step is to establish the general metadata model (GMM). Spatial earth metadata play an important role during data organization, integration, management, and distribution (Li, Zhang, Zhang, Wang, & Tian, 2016). In this study, the metadata include the attribute information created for the data file, such as the identifier, category, time range, and spatial range (Nogueras-Iso, Zarazaga-Soria, & Muro-Medrano, 2005).

Through the investigation of ISO 19115-2, CSDGM, and SPASE and according to the characteristics of interdisciplinary spatial earth data, this paper designs a description rule based on the Unified Modeling Language, which describes spatial earth data from nine aspects: identification, platform, representation, time, space, quality, copyright, distribution, and extension.

GMM=[Identification, Platform, Representation, Time, Space, Quality, Copyright, Distribution, Extension].
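As a rough illustration of the nine-part rule, the sketch below models one metadata record as a Python dataclass with one field per aspect. The class name `GeneralMetadata`, the helper `empty_sections`, and all element names and values are hypothetical; the paper defines the model in UML, not in code, so this is only one possible reading.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GeneralMetadata:
    """One metadata record holding the nine GMM sections as plain dictionaries."""
    identification: dict = field(default_factory=dict)
    platform: dict = field(default_factory=dict)
    representation: dict = field(default_factory=dict)
    time: dict = field(default_factory=dict)
    space: dict = field(default_factory=dict)
    quality: dict = field(default_factory=dict)
    copyright: dict = field(default_factory=dict)
    distribution: dict = field(default_factory=dict)
    extension: dict = field(default_factory=dict)


def empty_sections(record):
    """Return the names of sections that carry no elements yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Element names and values below are illustrative placeholders.
record = GeneralMetadata(
    identification={"identifier": "example-0001", "category": "remote sensing"},
    time={"start": "2020-01-01", "end": "2020-01-02"},
)
print(empty_sections(record))
```

Keeping every section present, even when empty, is what lets records from different disciplines share one framework: a dataset simply leaves inapplicable sections unfilled.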

The general metadata model can be established using the above rule, as Figure 1 shows.

Figure 1. Structure of the general metadata model.


As can be observed from Figure 1, there are nine UML classes in the model: GMM_Identification, GMM_Platform, GMM_Representation, GMM_Time, GMM_Space, GMM_Quality, GMM_Distribution, GMM_Copyright, and GMM_Extension. Each class in the GMM contains subclasses or elements. Detailed information for each element in the GMM is shown in Table 1.

Table 1. Elements in the general metadata model for spatial earth data.

The general metadata model provides many elements for describing spatial earth data. Compared with other metadata models, in addition to basic information, the proposed model takes spatial information, data quality, and copyright into account and then redefines them. The key distinctions are as follows.

  • From the perspective of data type, interdisciplinary heterogeneous spatial earth data with different formats can be described in the same metadata framework;

  • From the perspective of spatial information, we consider both the geographical boundary and elevation. Because the geographic boundary of spatial earth data is mostly an irregular geometric polygon, the longitude and latitude values cannot be managed structurally. Therefore, while saving all the latitude and longitude values of the data boundaries, this study calculates the minimum enclosing rectangle of the data vector boundary and stores the longitude and latitude values of the four vertices, which ensures the data availability and facilitates data application. In addition, the elevation of spatial environment data is usually a range, while that of surface environment is a definite value. The metadata model also saves the maximum elevation, minimum elevation and average elevation;

  • From the perspective of data quality, in addition to the cloud cover information, the data citation information, such as the citation type and description, is also included. In this study, whether the data are cited by paper, patent, monograph, etc., is taken as a way to evaluate the quality of spatial earth data.

  • From the perspective of data copyright, we define the data owner and the data provider.
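The spatial elements described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the rectangle is taken to be axis-aligned in longitude and latitude, the boundary does not cross the antimeridian, and the average elevation is the arithmetic mean of the supplied samples. The function names and coordinates are hypothetical.

```python
def minimum_enclosing_rectangle(boundary):
    """Return the four (lon, lat) vertices of the axis-aligned rectangle
    that encloses a polygon boundary given as (lon, lat) pairs."""
    lons = [lon for lon, _ in boundary]
    lats = [lat for _, lat in boundary]
    west, east = min(lons), max(lons)
    south, north = min(lats), max(lats)
    # Vertex order: upper-left, upper-right, lower-right, lower-left.
    return [(west, north), (east, north), (east, south), (west, south)]


def elevation_summary(elevations):
    """Return the maximum, minimum, and mean of a list of elevation samples."""
    return {
        "max": max(elevations),
        "min": min(elevations),
        "mean": sum(elevations) / len(elevations),
    }


# Illustrative irregular boundary; the full vertex list would be kept alongside
# the rectangle so that no boundary information is lost.
boundary = [(110.2, 30.1), (112.5, 31.0), (111.0, 33.4), (109.8, 32.0)]
rectangle = minimum_enclosing_rectangle(boundary)
print(rectangle)
print(elevation_summary([100.0, 300.0]))
```

The rectangle gives a fixed number of values that can be stored in structured fields, while the original vertices remain available for exact geometry.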

3.2. Entity data organization model

Faced with large-scale spatial earth data, the core of data collaborative application is to process and analyze data quickly and efficiently. The data processing and analysis are influenced by the data organization method, and the ultimate goal of realizing the unified organization of data is to help scientists focus on scientific research and avoid exerting too much energy into the cumbersome data processing process. According to different application scenarios, realizing the collaborative analysis of space earth data often requires the combination of different types of observation data.

There are independent data management systems and methods for data acquisition and processing in various disciplines. Because data acquisition and processing are complicated, collaborative analysis across multiple disciplines is difficult, resulting in the low utilization efficiency of data resources. Managing different data types in a unified form is better for collaborative application and analysis (Wu, Shen, Wang, & Wu, 2020). This research establishes a data organization model for spatial earth data to realize unified organization. Figure 2 shows the data organization model.

Figure 2. Data organization model of spatial earth data.


In Figure 2, two components are included in the organization model: dimension and observation data. The dimension is the basis for grouping spatial earth data; usually, it is a physical quantity or category, such as the time dimension, data type dimension, and elevation dimension. Once the dimension is determined, spatial earth data can be organized and understood in the corresponding framework. Observation data are the monitoring data in various spheres of the earth system that need to be stored, such as remote sensing data (optical image, radar image, etc.), atmospheric data (wind field, wind temperature, etc.), and ionospheric data (total electron content, electron density, etc.). Moreover, according to the practical application scenario, the stored data can be a dataset or a single data item. Specifically, interdisciplinary research on the earth environment often requires data from different disciplines; in this case, the observation data component is a dataset composed of multiple data items. For some particular studies, such as the extraction of regional feature information or the analysis of the atmospheric environment and total electron content, a single data item can meet the demand; in this case, the observation data component is a single data item.

According to the data organization model, the logical structure is obtained, as Figure 3 shows.

Figure 3. Logical structure of spatial earth data organization model.


The logical structure is organized into two layers: dimension layer and observation data layer. In the dimension layer, spatial earth data can be classified and organized according to dimension information, such as time (time point or time range), data category and elevation. The dimension information is determined by the requirement of the application scenario, which is not limited to the types listed above. In the observation layer, the observation data are stored. This organization method can be understood as a tree structure, and the function of the dimension is similar to the index. After the observation data are stored, they can be located by searching the dimension.
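The tree-and-index idea described above can be sketched with nested dictionaries: each level of the tree is one dimension, and the innermost dictionary holds the observation data. The functions `store` and `locate`, the chosen dimension order (time, then data category), and every value are assumptions made for illustration; the paper leaves the choice of dimensions to the application scenario.

```python
def store(tree, dimension_path, name, data):
    """Place one observation under the node addressed by the dimension values."""
    node = tree
    for key in dimension_path:
        node = node.setdefault(key, {})   # create dimension nodes on demand
    node[name] = data


def locate(tree, dimension_path):
    """Walk the dimension layer and return whatever is stored beneath it."""
    node = tree
    for key in dimension_path:
        node = node[key]
    return node


# Illustrative content: the first dimension is time, the second is data category.
tree = {}
store(tree, ("2020-06-01", "Atmosphere"), "meridional_wind", [1.5, -0.3])
store(tree, ("2020-06-01", "Ionosphere"), "tec", [12.1, 13.4])

print(locate(tree, ("2020-06-01", "Ionosphere")))
print(sorted(locate(tree, ("2020-06-01",))))
```

Searching by a partial path returns the whole subtree, which is how the dimension layer acts as an index for both single data items and datasets.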

3.3. Data storage format

There are two ways to implement the entity data organization model in data format. One is to use the existing data format that provides mature interfaces. This method can ensure data readability and realize data processing and analysis quickly. The other is to develop a new data format and the corresponding basic function library. This method requires considerable human work, materials and time resources. Furthermore, it will take time for users to become familiar with the new data format, which will influence the data processing and scientific research progress. In conclusion, this study chooses the former method to implement the entity data organization model.

This study implements the spatial earth data organization model using a determined data format, which is named as SEDF (Spatial Earth Data Format). SEDF organizes data with a hierarchical structure, and the contents are approximately organized into three sections:

  • Description of interdisciplinary spatial earth data is stored in the metadata group, which initially is an XML file;

  • Observation data, also called entity data, including image, multidimensional array, and text record, are stored in the entity data group, with different data formats being the original form;

  • Other description of spatial earth data that is not contained in the metadata file is stored in other information group.

The storage structure is shown in Figure 4.

Figure 4. Storage structure of SEDF.


(1) HDF5 data structure

Through the analysis of existing scientific data formats, it can be found that space environmental data mainly adopt NetCDF and IONEX format, and surface environment data mainly adopt HDF, GeoTIFF, and NetCDF. HDF is a data format that can store different data types, including multi-dimensional array, image, and text (Poinot, 2010). Because of its features, this paper organizes the spatial earth data using HDF5 and redefines its rule of the group so that the subdirectory of the root group can no longer store entity data directly.

(2) Spatial earth metadata

Although the HDF5 data format can realize the self-description of spatial earth data by defining its attributes, including data type and data space, viewing its metadata information requires professional software, such as HDFView and HDF Explorer, which is cumbersome. When acquiring and applying data, the metadata information generally must be analyzed first. Thus, this research produces a metadata file based on the XML specification and encapsulates it in the HDF5 data format.

There are three parts in the metadata file: file header, XML declaration, and metadata content. The file header indicates that the file is a metadata file of spatial earth data; the XML declaration indicates that the file follows the XML specification, which defines the version and coding format; the metadata content includes attributes, metadata elements, and corresponding values.
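A minimal sketch of producing the XML declaration and the metadata content with Python's standard `xml.etree.ElementTree` module follows. The root tag `GMM`, the section and element tags, and the values are hypothetical, since the paper's schema is not reproduced here. The file header is deliberately not modelled because its layout is not specified in the text.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(sections):
    """Serialise {section: {element: value}} into XML bytes that start with
    an XML declaration stating the version and encoding."""
    root = ET.Element("GMM")
    for section_name, elements in sections.items():
        section = ET.SubElement(root, section_name)
        for element_name, value in elements.items():
            ET.SubElement(section, element_name).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Illustrative values only; tag names are assumptions, not the paper's schema.
xml_bytes = build_metadata_xml({
    "Identification": {"Name": "Landsat 8 example scene", "Category": "remote sensing"},
    "Space": {"WestLongitude": 109.8, "EastLongitude": 112.5},
})
print(xml_bytes.decode("utf-8"))

# Reading the file back needs only a generic XML parser, not HDF5 software.
parsed = ET.fromstring(xml_bytes)
print(parsed.find("Identification/Name").text)
```

Because the result is plain XML, the metadata can be inspected with any text editor or parser before the entity data are opened.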

(3) Spatial earth entity data (set)

Different types of spatial earth entity data (set), such as remote sensing image data, atmospheric data, and ionospheric data, are stored in this part with the expression of a multidimensional array, image, and text data. Entity data (set) are the key to the data organization model, which can be classified and stored according to grouping rules. The unified storage of various spatial earth data provides convenience for collaborative data analysis in different applications.

(4) Other information

Information that is not included in the metadata file but is necessary for data processing, analysis and application is explained in this part. Examples include a note on whether a single file or multiple files are stored, the data application scenario, or other information. The other information group is optional. If there is no additional information used to describe the spatial earth data, this part will not be embodied in the SEDF.
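The three-section layout described in parts (2) to (4) can be mocked up with nested dictionaries standing in for HDF5 groups. This is only a structural sketch: it does not write an HDF5 file, and the group names `metadata`, `entity_data`, and `other_information`, the file name `metadata.xml`, and the function `assemble_sedf` are all hypothetical.

```python
def assemble_sedf(metadata_xml, entity_groups, other_information=None):
    """Build a dict-based stand-in for the SEDF hierarchy.

    metadata_xml      : the XML metadata document as a string
    entity_groups     : {group name: {dataset name: data}}
    other_information : optional {key: text}; the group is omitted when empty
    """
    layout = {
        "metadata": {"metadata.xml": metadata_xml},
        "entity_data": {name: dict(datasets) for name, datasets in entity_groups.items()},
    }
    if other_information:
        layout["other_information"] = dict(other_information)
    return layout


# Illustrative content only.
single = assemble_sedf("<GMM/>", {"Ionosphere": {"tec": [12.1, 13.4]}})
multi = assemble_sedf(
    "<GMM/>",
    {"Atmosphere": {"meridional_wind": [1.5, -0.3]}, "Ionosphere": {"tec": [12.1, 13.4]}},
    other_information={"note": "multiple files stored"},
)
print(sorted(single))
print(sorted(multi))
```

A real implementation would map each dictionary level to an HDF5 group and each leaf to a dataset through an HDF5 library.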

4. Experiment and results

4.1. Experimental data

The experimental data consist of various spatial earth data that include the data in the surface environment, near-earth space, and near space. This study chooses optical remote sensing image data, surface reflectance data, atmospheric wind field data, and ionospheric total electron content data in different formats as typical representatives to conduct experiments. Table 2 presents the details.

Table 2. Detailed information on the experimental data.

4.2. Unified representation of spatial earth data

4.2.1. Metadata description

To verify the unified description capability of the general metadata model for spatial earth data, this study utilizes the proposed metadata model to describe the spatial earth data listed in Table 2. Figure 5 shows the structure and content of one of the spatial earth metadata files, and the other types of metadata files are shown in Figure S1, Appendix A.

Figure 5. Content and structure of the XML metadata file for Landsat 8.


As shown in Figure 5, the unified description of various spatial earth data can be realized with the proposed general metadata model, which demonstrates the effectiveness of the model. From the metadata file, basic information, such as data name, category and spatial range, can be obtained.

4.2.2. Entity data organization

Spatial earth entity data are a numerical reflection of the objective world monitored by sensors, which has significant research value (Sudmanns et al., 2020). To verify the unified organization capability of the data organization model for spatial earth data, this study utilizes the data organization model to organize various spatial earth data in the SEDF. Figure 6 shows the storage structure and content of single and multiple types of spatial earth data using the Panoply Data Viewer (https://www.giss.nasa.gov/tools/panoply/) provided by NASA.

Figure 6. Storage content and structure of the SEDF file.


In Figure 6, “exper.h5” is the data name after format conversion. Based on SEDF, data in formats such as NetCDF, HDF, GeoTIFF and IONEX are converted and stored in groups according to data types. For instance, the “Atmosphere” group stores atmospheric wind field data; the “Ionosphere” group stores ionospheric total electron content data; the “Remote Sensing” group stores surface environment data, including remote sensing image data and products. In summary, single and multiple types of spatial earth data can be organized in a unified framework with SEDF based on the data organization model.

4.3. Validation

After the unified representation of spatial earth data, including metadata and entity data, the data quality should be evaluated. This study validates the data quality from two aspects: data integrity and data visualization.

4.3.1. Data integrity

This research compares all the values stored in the original format with those in SEDF. Taking three sample points of each dataset as an example, the result is shown in Table 3. Specifically, gray values of Landsat 8 remote sensing data, reflectance values of MODIS surface reflectance data, wind field values of meridional wind data and values of total electron content data are compared, respectively.

Table 3. Comparison of values in the original format and SEDF.

It can be observed from Table 3 that, for interdisciplinary spatial earth data, none of the values changed during data conversion, so data accuracy and integrity are ensured.
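
A value-by-value comparison of this kind can be sketched as follows. This is an illustrative check under stated assumptions, not the procedure used in the study: the function names are hypothetical, the inputs are assumed to be flat sequences already read from the original file and from the SEDF file, and the sample numbers are placeholders.

```python
import math


def values_equal(a, b):
    """Exact equality, except that two NaN fill values count as equal."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def find_mismatches(original, converted):
    """Return (index, original value, converted value) for every difference."""
    if len(original) != len(converted):
        raise ValueError("sequences differ in length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, converted))
        if not values_equal(a, b)
    ]


# Placeholder samples, not values from the paper.
unchanged = find_mismatches([0.12, float("nan"), 7.0], [0.12, float("nan"), 7.0])
altered = find_mismatches([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
print(unchanged)
print(altered)
```

An empty result corresponds to the outcome reported above, in which no value differs between the original format and SEDF. Exact equality is used rather than a tolerance because the claim being checked is that the conversion leaves values unchanged.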

4.3.2. Data visualization

After checking the integrity of the spatial earth data, this study validates the data quality from the perspective of data visualization. To display spatial earth data vividly and intuitively, we developed a WebGL-based tool to visualize different types of spatial earth data (Evangelidis, Papadopoulos, Papatheodorou, Mastorokostas, & Hilas, Citation2018). The metadata elements are parsed first, and the visualization effect is shown in Figure 7.

Figure 7. Visualization of spatial earth data.

As can be observed from Figure 7, there is no loss of data features during data visualization, and the efficiency of data rendering is high, which indicates that the unified representation of spatial earth data can ensure data quality.

5. Conclusions and discussion

To overcome the deficiencies existing in the unified representation of interdisciplinary spatial earth data in earth environment research and enhance data management and collaborative analysis, this paper proposes a unified representation method that includes a general metadata model, entity data organization model, and data storage format. Through conducting the unified representation experiments and validation on interdisciplinary spatial earth data, the availability and practicability of the proposed method are proved. The following conclusions can be drawn:

  • By establishing the general metadata model, this study realizes the unified description of spatial earth data in XML, which proves the effectiveness of the model;

  • By building the data organization model, this study achieves the unified organization of spatial earth entity data using SEDF according to the grouping rules, which proves the availability of the model;

  • The validation, including data integrity checking and data visualization, shows that all the values remain unchanged during the data conversion and that SEDF can ensure data accuracy.

The method proposed in this study can realize the unified representation of interdisciplinary spatial earth data from disciplines such as geography, atmospheric science and space physics, which provides a new idea for data management and has practical value. However, the limitations of the proposed method are mainly reflected in the following two aspects. First, due to the complexity of the earth system, the experimental data in this study are still insufficient. Therefore, for future work, we plan to investigate more spatial earth data structures from various disciplines and enrich the data format to verify the applicability of the general metadata model and the entity data organization model. In addition, to further verify its availability, the proposed method will be applied in earth environment research to realize the collaborative analysis of spatial earth data, for example on typhoons and earthquakes.

Acknowledgments

The authors would like to thank USGS Earth Explorer (https://earthexplorer.usgs.gov/), the Level-1 and Atmosphere Archive & Distribution System Distributed Active Archive Center (https://ladsweb.modaps.eosdis.nasa.gov/), NOAA Physical Sciences Laboratory (ftp://ftp.cdc.noaa.gov/) and IGS Data Center of Wuhan University (ftp://igs.gnsswhu.cn/pub/gps/products/) for providing experimental data. The authors would also like to thank the anonymous reviewers and editors for commenting on this paper.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/20964471.2022.2091310.

Additional information

Funding

This work was supported by Open Science-oriented Interoperable Global Earth Observation System of Systems (grant number 2019YFE0126400) and Programme of Cooperation on the Analysis of Carbon Satellites Data (grant number 131211KYSB20180002).

Notes on contributors

Shuang Wang

Shuang Wang is currently pursuing a Ph.D. degree in signal and information processing at the University of Chinese Academy of Sciences. She received her M.S. degree from the School of Information Science and Technology, Beijing Forestry University. Her research interests include data engineering and geospatial data management.

Jian Wang

Jian Wang is a senior engineer at Aerospace Information Research Institute, Chinese Academy of Sciences. He received his Ph.D. degree from Institute of Software, Chinese Academy of Sciences. His research areas include high-performance geographic computing, parallel computing, and remote sensing big data infrastructure.

Qin Zhan

Qin Zhan is an associate professor at Aerospace Information Research Institute, Chinese Academy of Sciences. She received her Ph.D. degree in photogrammetry and remote sensing from Wuhan University. Her research interests include digital earth, big earth data, natural disasters, and data visualization.

Lianchong Zhang

Lianchong Zhang is the deputy director of National Earth Observation Data Center of China (NODA), the deputy director of ChinaGEOSS Data Sharing Network, and an assistant professor at Aerospace Information Research Institute, Chinese Academy of Sciences. He received his Ph.D. degree in signal and information processing from the Institute of Remote Sensing and Digital Earth of Chinese Academy of Sciences in 2019 and completed his postdoctoral training at Aerospace Information Research Institute, Chinese Academy of Sciences in 2022. His main research areas are high-performance remote-sensing image-processing technology and big earth data.

Xiaochuang Yao

Xiaochuang Yao is working in the College of Land Science and Technology, China Agricultural University. He received his Ph.D. degree from China Agricultural University in 2017. His research interests include spatial big data and agricultural applications.

Guoqing Li

Guoqing Li is the director of National Earth Observation Data Center of China (NODA), the director of ChinaGEOSS Data Sharing Network, and a professor at Aerospace Information Research Institute, Chinese Academy of Sciences. He received his Master of Science and Ph.D. degrees from Chinese Academy of Sciences in 1999 and 2005, respectively, majoring in Cartography and Geographic Information System. He also undertook visiting studies at ESA/ESRIN in 2007–2008 and Purdue University in 2010. His main research areas are high-performance remote-sensing image-processing technology and big earth data. His current focus is on next-generation spatial data infrastructure and natural disaster data management.

References

  • Camporeale, E. (2019). The challenge of machine learning in space weather: Nowcasting and forecasting. Space Weather, 17(8), 1166–1207. doi:10.1029/2018SW002061
  • Chan, L., & Zeng, M. (2006). Metadata interoperability and standardization–a study of methodology part I. D-Lib Magazine, 12(6), 1082–9873. doi:10.1045/june2006-chan
  • de Buyl, P., Colberg, P. H., & Höfling, F. (2014). H5MD: A structured, efficient, and portable file format for molecular data. Computer Physics Communications, 185(6), 1546–1553. doi:10.1016/j.cpc.2014.01.018
  • Duval, E., Hodgins, W., Sutton, S., & Weibel, S. L. (2002). Metadata principles and practicalities. D-Lib Magazine, 8(4), 1–10. doi:10.1045/april2002-weibel
  • Evangelidis, K., Papadopoulos, T., Papatheodorou, K., Mastorokostas, P., & Hilas, C. (2018). 3D geospatial visualizations: Animation and motion effects on spatial objects. Computers & Geosciences, 111, 200–212. doi:10.1016/j.cageo.2017.11.007
  • Fan, J., Yan, J., Ma, Y., & Wang, L. (2017). Big data integration in remote sensing across a distributed metadata-based spatial infrastructure. Remote Sensing, 10(1), 7. doi:10.3390/rs10010007
  • Gebhardt, S., Wehrmann, T., Klinger, V., Schettler, I., Huth, J., Künzer, C., & Dech, S. (2010). Improving data management and dissemination in web based information systems by semantic enrichment of descriptive data aspects. Computers & Geosciences, 36(10), 1362–1373. doi:10.1016/j.cageo.2010.03.010
  • Green, D., & Bossomaier, T. (2003). Online GIS and spatial metadata. Electronic Library, 21(3), 266. doi:10.1108/02640470310462489
  • Greenfield, P., Droettboom, M., & Bray, E. (2015). ASDF: A new data format for astronomy. Astronomy & Computing, 12, 240–251. doi:10.1016/j.ascom.2015.06.004
  • Guo, H. (2017). Big earth data: A new frontier in earth and information sciences. Big Earth Data, 1(1–2), 4–20. doi:10.1080/20964471.2017.1403062
  • Habermann, T., & Folk, M. (2014). The hierarchical data format (HDF): A foundation for sustainable data and software. AGU Fall Meeting Abstracts.
  • ISO/TC211. (2019). ISO 19115-2:2019. Geographic information-metadata-Part 2: Extensions for acquisition and processing. ISO/TC 211.
  • Könnecke, M., Akeroyd, F. A., Bernstein, H. J., Brewster, A. S., Campbell, S. I., Clausen, B., … Wuttke, J. (2015). The NeXus data format. Journal of Applied Crystallography, 48(1), 301–305. doi:10.1107/S1600576714027575
  • Koranne, S. (2011). Hierarchical data format 5: HDF5. In Handbook of open source tools (pp. 191–200). Boston, MA: Springer. doi:10.1007/978-1-4419-7719-9-10.
  • Krischer, L., Smith, J., Lei, W., Lefebvre, M., & Tromp, J. (2017). An adaptable seismic data format. Geophysical Journal International, 207(2), 1003–1011. doi:10.1093/gji/ggw319
  • Li, G., Zhang, H., Zhang, L., Wang, Y., & Tian, C. (2016). Development and trend of earth observation data sharing. Journal of Remote Sensing, 20, 979–990. doi:10.11834/jrs.20166173
  • Li, G., & Huang, Z. (2017). Data infrastructure for remote sensing big Data: Integration, management and on-demand service. Journal of Computer Research and Development, 54(2), 267–283. doi:10.7544/issn1000-1239.2017.20160837
  • Li, C., Wang, C., Wei, Y., & Lin, Y. (2019). China’s present and future lunar exploration program. Science, 365(6450), 238–239. doi:10.1126/science.aax9908
  • Lin, B., Zhou, L., Xu, D., Zhu, A., & Lu, G. (2018). A discrete global grid system for earth system modeling. International Journal of Geographical Information Science, 32(4), 711–737. doi:10.1080/13658816.2017.1391389
  • Ma, Y., Li, G., Yao, X., Cao, Q., Zhao, L., Wang, S., & Zhang, L. (2021). A precision evaluation index system for remote sensing data sampling based on hexagonal discrete grids. ISPRS International Journal of Geo-Information, 10(3), 194. doi:10.3390/ijgi10030194
  • Murray, J. D., & VanRyper, W. (1996). Encyclopedia of graphics file formats. Sebastopol: O’Reilly.
  • NASA. (2002). US FGDC content standard for digital geospatial metadata: Extensions for remote sensing metadata. NASA.
  • NASA. (2020). Space physics archive search and extract. NASA.
  • Nogueras-Iso, J., Zarazaga-Soria, F. J., & Muro-Medrano, P. R. (2005). Geographic information metadata for spatial data infrastructures. In Resources, interoperability and information retrieval (pp. 31-88). Berlin, Heidelberg: Springer.
  • Poinot, M. (2010). Five good reasons to use the hierarchical data format. Computing in Science & Engineering, 12(5), 84–90. doi:10.1109/MCSE.2010.107
  • Qiu, J., Wang, Q., & Ma, J. (2020). Deep space exploration technology. Infrared and Laser Engineering, 49(5), 1–10. doi:10.3788/IRLA20201001
  • Rew, R., & Davis, G. (1990). NetCDF: An interface for scientific data access. IEEE Computer Graphics and Applications, 10(4), 76–82. doi:10.1109/38.56302
  • Ritter, N., & Ruth, M. (1997). The GeoTiff data interchange standard for raster geographic images. International Journal of Remote Sensing, 18(7), 1637–1647. doi:10.1080/014311697218340
  • Sahr, K., White, D., & Kimerling, A. J. (2003). Geodesic discrete global grid systems. Cartography and Geographic Information Science, 30(2), 121–134. doi:10.1559/152304003100011090
  • Schaer, S., Gurtner, W., & Feltens, J. (1998). IONEX: The ionosphere map exchange format version 1. In Proceedings of the IGS AC workshop. Darmstadt, Germany.
  • Sudmanns, M., Tiede, D., Lang, S., Bergstedt, H., Trost, G., Augustin, H., … Blaschke, T. (2020). Big earth data: Disruptive changes in earth observation data management and analysis? International Journal of Digital Earth, 13(7), 832–850. doi:10.1080/17538947.2019.1585976
  • Sun, K., Zhu, Y., Pan, P., Hou, Z., Wang, D., Li, W., & Song, J. (2019). Geospatial data ontology: The semantic foundation of geospatial data integration and sharing. Big Earth Data, 3(3), 269–296. doi:10.1080/20964471.2019.1661662
  • Takahashi, A., Tatedoko, M., Kinutani, H., & Yoshikawa, M. (2009). Metadata management for integration and analysis of earth observation data. In Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication (pp. 123–130). Suwon, Korea.
  • Wang, F., Thiel, F., Furrer, D., Vergara-Niedermayr, C., Qin, C., Hackenberg, G., & Wang, M. (2008). An adaptable XML based approach for scientific data management and integration. In Proceedings of SPIE 6919, Medical Imaging 2008: PACS and Imaging Informatics, 69190K (pp. 141–150). San Diego, United States. doi:10.1117/12.773154
  • Wang, L., Jia, M., Yin, D., & Tian, J. (2019). A review of remote sensing for mangrove forests: 1956–2018. Remote Sensing of Environment, 231, 111223. doi:10.1016/j.rse.2019.111223
  • Wang, S., Li, G., Yao, X., Zeng, Y., Pang, L., & Zhang, L. (2019). A distributed storage and access approach for massive remote sensing data in mongodb. ISPRS International Journal of Geo-Information, 8(12), 533. doi:10.3390/ijgi8120533
  • Wang, R., Ben, J., Zhou, J., & Zheng, M. (2021). A generic encoding and operation scheme for mixed aperture three and four hexagonal discrete global grid systems. International Journal of Geographical Information Science, 35(3), 513–555. doi:10.1080/13658816.2020.1763363
  • Wu, W., Liu, W., Qiao, D., & Jie, D. (2012). Investigation on the development of deep space exploration. Science China Technological Sciences, 55(4), 1086–1091. doi:10.1007/s11431-012-4759-z
  • Wu, Z., Shen, Y., Wang, H., & Wu, M. (2020). An ontology-based framework for heterogeneous data management and its application for urban flood disasters. Earth Science Informatics, 13(2), 377–390. doi:10.1007/s12145-019-00439-3
  • Yan, J., Chen, X., Chen, Y., & Liang, D. (2020). Multistep prediction of land cover from dense time series remote sensing images with temporal convolutional networks. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 5149–5161. doi:10.1109/JSTARS.2020.3020839
  • Yao, X., Li, G., Xia, J., Ben, J., Cao, Q., Zhao, L., … Zhu, D. (2020). Enabling the big earth observation data via cloud computing and DGGS: Opportunities and challenges. Remote Sensing, 12(1), 62. doi:10.3390/rs12010062
  • Ye, P., Yang, M., Peng, J., Li, Q., Dong, Y., Zhang, Z., … Zou, L. (2015). Review and prospect of atmospheric entry and earth reentry technology of China deep space exploration. Scientia Sinica Technologica, 45(3), 229–238. doi:10.1360/N092015-00049
  • Zhu, Q., Li, F., & Zhang, Y. (2007). Unified representation of 3D city models. Journal of Changan University, 27(1), 54–58. doi:10.3321/j.issn:1671-8879.2007.01.013