Research Articles

A machine learning approach to extracting spatial information from geological texts in Chinese

Pages 2169-2193 | Received 19 Jul 2021, Accepted 05 Jun 2022, Published online: 15 Jun 2022
 

Abstract

Texts have become an important spatial data resource. Interpretation of unstructured geoscience texts using natural language processing methods can effectively facilitate the discovery and retrieval of geographic information. Yet studies on the extraction of spatial information from textual geoscience data are limited compared to digital geoscience data. In this work, a machine learning approach is proposed for mining spatial relations in Chinese geological texts. The approach views spatial relation extraction as a sequence labeling problem, avoids the division of relation categories, and enables mining fine-grained spatial relations. The extracted geological texts commonly describe three-dimensional spatial relations among regions, strata, and lithologies. The extracted spatial relations are classified into three major categories (topological relations, absolute directional relations and relative directional relations) and 14 subcategories. We validated the proposed model with a test dataset, constructed visual displays of the extracted spatial relations on different topics, and quantified the uncertainty in the process from spatial entity recognition to spatial relation extraction. With the detailed portrayal of these spatial relations, this study provides support for solving theoretical and practical problems of cognition, prediction, decision-making, and evaluation in geoscience.

Acknowledgements

We thank Professor Christophe Claramunt and all reviewers for their careful reading and insightful comments on this article, and Professor May Yuan for her guidance in writing.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data and codes availability statement

The simulation data and codes that support the findings of this study are available at https://doi.org/10.6084/m9.figshare.19326413. ‘dataFlag.py’ uses the BIOES method to label the sentences in relation_data.txt; ‘change_w2id’ converts all labeled sentences to ids and sets the batch size, with the sentences in each batch padded to the same length; ‘model.py’ contains the body of the model we built; and ‘main.py’ is where the model parameters can be adjusted, the input data specified, and the model run. After the model is trained, a new set of data, relationship_data_test.txt, can be input and ‘recognize_relation.py’ used to obtain the relation extraction (RE) result. The geological text data are available from https://geocloud.cgs.gov.cn/. The geological reports that support the practical experiment of this study cannot be made publicly available due to data use restrictions.
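For readers unfamiliar with the preprocessing steps named in this statement, the following is a minimal sketch of BIOES labeling, id conversion, and per-batch padding. It is not the authors' code: the function names (`bioes_tags`, `build_vocab`, `encode`, `pad_batches`), the span format, the `ENT` and `REL` labels, and the `<pad>`/`<unk>` tokens are all assumptions made for illustration, and the published scripts may organize these steps differently.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical span format: (start, end_exclusive, label).
Span = Tuple[int, int, str]


def bioes_tags(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """Assign one BIOES tag per token.

    B = begin, I = inside, E = end of a multi-token span,
    S = single-token span, O = outside any span.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


def build_vocab(sequences: Sequence[Sequence[str]],
                specials: Sequence[str] = ("<pad>", "<unk>")) -> Dict[str, int]:
    """Map every distinct item to an integer id; special tokens come first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for item in seq:
            vocab.setdefault(item, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Convert a sequence to ids, falling back to <unk> for unseen items."""
    unk = vocab["<unk>"]
    return [vocab.get(item, unk) for item in seq]


def pad_batches(encoded: Sequence[List[int]], batch_size: int,
                pad_id: int = 0) -> List[List[List[int]]]:
    """Group sequences into batches, padding each batch to its own max length."""
    batches = []
    for i in range(0, len(encoded), batch_size):
        batch = encoded[i:i + batch_size]
        width = max(len(s) for s in batch)
        batches.append([s + [pad_id] * (width - len(s)) for s in batch])
    return batches


# Illustrative character-level sentence; the spans and labels are assumed.
tokens = list("砂岩位于灰岩之上")
tags = bioes_tags(tokens, [(0, 2, "ENT"), (4, 6, "ENT"), (6, 8, "REL")])
char_vocab = build_vocab([tokens])
batches = pad_batches([encode(tokens, char_vocab)], batch_size=2)
```

Padding each batch only to its own longest sentence, rather than to a global maximum, keeps short batches compact; whether the released scripts do the same is not stated here.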

Additional information

Funding

This work was supported by the National Key Research and Development Program of China [No. 2016YFB0502300] and Geological Survey Project [No. 12120114074001].

Notes on contributors

Deping Chu

Deping Chu is a PhD at China University of Geosciences, Wuhan. His research interests include geoscience big data mining, geoscience knowledge graphs, and geographic information science. His main contributions to the paper are conceptualization, methodology, and writing (original draft).

Bo Wan

Bo Wan is a professor at China University of Geosciences, Wuhan. His research interests include geographic information science, geospatial modeling techniques, and geoscience process simulation. His main contributions to the paper are writing (review and editing) and supervision.

Hong Li

Hong Li is a PhD at China University of Geosciences, Wuhan. Her research interests include geospatial modeling techniques and geoscience process simulation. Her main contributions to the paper are writing (review) and supervision.

Shuai Dong

Shuai Dong is a master at China University of Geosciences, Wuhan. His research interests include text data mining and machine learning. The main contribution to the paper is supervision.

Jinming Fu

Jinming Fu is a master at China University of Geosciences, Wuhan. His research interests include geospatial modeling techniques and machine learning. The main contribution to the paper is supervision.

Yiyang Liu

Yiyang Liu is a master at China University of Geosciences, Wuhan. His research interests include machine learning and data standardization. The main contribution to the paper is supervision.

Kuan Huang

Kuan Huang is a senior engineer at Wuhan Zondy Cyber. His research interests include underground space simulation and visualization. The main contribution to the paper is supervision.

Hui Liu

Hui Liu is a senior engineer at Wuhan Zondy Cyber. His research interests include underground space simulation and visualization. The main contribution to the paper is supervision.
