Abstract
Texts have become an important spatial data resource. Interpretation of unstructured geoscience texts using natural language processing methods can effectively facilitate the discovery and retrieval of geographic information. Yet studies on extracting spatial information from textual geoscience data remain limited compared with those on digital geoscience data. In this work, a machine learning approach is proposed for mining spatial relations in Chinese geological texts. The approach treats spatial relation extraction as a sequence labeling problem, avoids the division of relation categories, and enables the mining of fine-grained spatial relations. The extracted geological texts commonly describe three-dimensional spatial relations among regions, strata, and lithologies. The extracted spatial relations are classified into three major categories (topological relations, absolute directional relations, and relative directional relations) and 14 subcategories. We validated the proposed model with a test dataset, constructed visual displays of the extracted spatial relations on different topics, and quantified the uncertainty in the process from spatial entity recognition to spatial relation extraction. With its detailed portrayal of these spatial relations, this study provides support for solving theoretical and practical problems of cognition, prediction, decision-making, and evaluation in geoscience.
Acknowledgements
We thank Professor Christophe Claramunt and all reviewers for their careful reading and insightful comments on this article, and Professor May Yuan for her guidance in writing.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data and codes availability statement
The simulation data and code that support the findings of this study are available at https://doi.org/10.6084/m9.figshare.19326413. ‘dataFlag.py’ uses the BIOES method to label the sentences in relation_data.txt; ‘change_w2id’ converts all labeled sentences to ids and sets the batch size, with the sentences in each batch padded to the same length; ‘model.py’ contains the body of the model we built; and ‘main.py’ is used to adjust the model parameters, load the input data, and run the model. After the model is trained, a new dataset, relationship_data_test.txt, can be supplied and ‘recognize_relation.py’ used to obtain the relation extraction (RE) results. The geological text data are available from https://geocloud.cgs.gov.cn/. The geological reports that support the practical experiment of this study cannot be made publicly available due to data use restrictions.
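For readers unfamiliar with the preprocessing steps named above, the following is a minimal sketch of BIOES labeling followed by id conversion and per-batch padding. It is not the content of ‘dataFlag.py’ or ‘change_w2id’: the function names, the `ENT` and `REL` labels, the `<pad>` and `<unk>` symbols, and the English example tokens are all hypothetical choices made for illustration.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# A span is (start, end_exclusive, label); spans are assumed not to overlap.
Span = Tuple[int, int, str]


def bioes_tags(n_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Assign one BIOES tag per token: B(egin), I(nside), O(utside), E(nd), S(ingle)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if end - start == 1:
            tags[start] = f"S-{label}"          # single-token span
        else:
            tags[start] = f"B-{label}"          # first token of the span
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # interior tokens
            tags[end - 1] = f"E-{label}"        # last token of the span
    return tags


def build_vocab(sequences: Sequence[Sequence[str]],
                specials: Sequence[str] = ("<pad>", "<unk>")) -> Dict[str, int]:
    """Map every distinct symbol to an integer id; special symbols come first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for item in seq:
            vocab.setdefault(item, len(vocab))
    return vocab


def encode_and_pad(batch: Sequence[Sequence[str]], vocab: Dict[str, int],
                   pad: str = "<pad>", unk: str = "<unk>") -> List[List[int]]:
    """Convert a batch to ids and right-pad every sequence to the batch maximum."""
    max_len = max(len(seq) for seq in batch)
    return [
        [vocab.get(item, vocab[unk]) for item in seq] + [vocab[pad]] * (max_len - len(seq))
        for seq in batch
    ]


def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


if __name__ == "__main__":
    # Hypothetical sentence: two entities linked by a spatial relation word.
    tokens = ["sandstone", "overlies", "the", "limestone", "bed"]
    spans = [(0, 1, "ENT"), (1, 2, "REL"), (3, 5, "ENT")]
    print(list(zip(tokens, bioes_tags(len(tokens), spans))))

    sentences = [tokens, ["limestone", "bed"]]
    vocab = build_vocab(sentences)
    for batch in batches(sentences, batch_size=2):
        print(encode_and_pad(batch, vocab))
```

Padding to the longest sequence within each batch, rather than to a global maximum, keeps short batches compact; whether the released scripts make the same choice is not stated in the source beyond "padded to the same length".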
Additional information
Funding
Notes on contributors
Deping Chu
Deping Chu is a PhD at China University of Geosciences, Wuhan. His research interests include geoscience big data mining, geoscience knowledge graphs, and geographic information science. His main contributions to the paper are conceptualization, methodology, and writing (original draft).
Bo Wan
Bo Wan is a professor at China University of Geosciences, Wuhan. His research interests include geographic information science, geospatial modeling techniques, and geoscience process simulation. His main contributions to the paper are writing (review and editing) and supervision.
Hong Li
Hong Li is a PhD at China University of Geosciences, Wuhan. Her research interests include geospatial modeling techniques and geoscience process simulation. Her main contributions to the paper are writing (review) and supervision.
Shuai Dong
Shuai Dong is a master at China University of Geosciences, Wuhan. His research interests include text data mining and machine learning. The main contribution to the paper is supervision.
Jinming Fu
Jinming Fu is a master at China University of Geosciences, Wuhan. His research interests include geospatial modeling techniques and machine learning. The main contribution to the paper is supervision.
Yiyang Liu
Yiyang Liu is a master at China University of Geosciences, Wuhan. His research interests include machine learning and data standardization. The main contribution to the paper is supervision.
Kuan Huang
Kuan Huang is a senior engineer at Wuhan Zondy Cyber. His research interests include underground space simulation and visualization. The main contribution to the paper is supervision.
Hui Liu
Hui Liu is a senior engineer at Wuhan Zondy Cyber. His research interests include underground space simulation and visualization. The main contribution to the paper is supervision.