ABSTRACT
High performance computing is required for fast geoprocessing of geospatial big data. Using spatial domains to represent computational intensity (CIT) and using domain decomposition for parallelism are prominent strategies when designing parallel geoprocessing applications. Traditional domain decomposition methods are limited in how well they evaluate computational intensity, which often results in load imbalance and poor parallel performance. From a data science perspective, machine learning, a branch of Artificial Intelligence (AI), shows promise for better CIT evaluation. This paper proposes a machine learning approach for predicting computational intensity, followed by an optimized domain decomposition that divides the spatial domain into balanced subdivisions based on the predicted CIT to achieve better parallel performance. The approach provides a reference framework for how various machine learning methods, including feature selection and model training, can be used to predict computational intensity and optimize parallel geoprocessing for different cases. Comparative experiments between the approach and traditional methods were performed using two cases: DEM generation from point clouds and spatial intersection on vector data. The results not only demonstrate the advantage of the approach, but also provide hints on how traditional GIS computation can be improved by machine learning.
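The idea of dividing a domain by predicted CIT rather than by area can be illustrated with a minimal sketch. This is not the decomposition algorithm of the paper: it assumes a one-dimensional domain of grid columns, a hypothetical list `cit_per_column` standing in for the CIT values a trained model would predict, and a simple prefix-sum rule for placing the cuts. The function name `balanced_strips` is likewise invented for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_strips(cit_per_column, n_parts):
    """Split columns into n_parts contiguous strips with similar total CIT.

    cit_per_column: predicted computational intensity of each grid column
    (hypothetical input; in practice it would come from a trained model).
    Assumes 1 <= n_parts <= len(cit_per_column).
    Returns a list of (start, end) half-open column index ranges.
    """
    n_cols = len(cit_per_column)
    prefix = list(accumulate(cit_per_column))  # running total of CIT
    total = prefix[-1]
    bounds = [0]
    for k in range(1, n_parts):
        target = total * k / n_parts
        # Cut just after the first column whose running total reaches
        # the k-th equal share of the overall predicted CIT.
        cut = bisect_left(prefix, target) + 1
        # Keep cuts strictly increasing and leave at least one column
        # for each of the remaining strips.
        cut = max(cut, bounds[-1] + 1)
        cut = min(cut, n_cols - (n_parts - k))
        bounds.append(cut)
    bounds.append(n_cols)
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # One very expensive column followed by nine cheap ones.
    cit = [9, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    strips = balanced_strips(cit, 2)
    loads = [sum(cit[a:b]) for a, b in strips]
    print(strips, loads)
```

In this toy input, an equal-width split into two halves would assign predicted loads of 13 and 5, whereas cutting by cumulative CIT gives each worker a load of 9. A real two-dimensional decomposition would need a more elaborate scheme; the sketch only shows why the predicted CIT, not the area, should drive the cut positions.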
Acknowledgments
We thank the four reviewers and the editor(s) for their constructive comments, which helped improve the quality of the paper. The work was supported by the Major State Research Development Program of China (No. 2017YFB0503704), the National Natural Science Foundation of China (No. 41722109), the Hubei Provincial Natural Science Foundation of China (No. 2018CFA053), and the Wuhan Yellow Crane Talents (Science) Program (2016).
Disclosure statement
No potential conflict of interest was reported by the authors.
Data availability statement
The sample data and code that support the findings of this study are available at https://figshare.com/s/d43f17aaa952c09db007 . The original source files for the second point cloud (Section 4.1) and the vector data are subject to data agreements with Mr. Zheng Huang and SOUTH DIGITAL TECHNOLOGY CO., LTD, respectively, and cannot be shared publicly.
Notes on contributors
Peng Yue
Dr. Peng Yue is a professor at Wuhan University. He serves as the director of the Hubei Province Engineering Center for Intelligent Geoprocessing, the director of the Institute of Geospatial Information and Location Based Services, and the director of the WHU-SOUTHGIS Joint Research Center for Spatio-temporal Big Data. He is the former Chair of the IEEE Geoscience and Remote Sensing Society Earth Science Informatics Technical Committee, Chair of the OGC-China Forum, and a China national committee member of the International Society for Digital Earth. He was awarded the Excellent Young Scientist program of the National Natural Science Foundation of China, the Outstanding Young Scholar of the National Ten Thousand Talents Program, and the Yangtze River Young Scholar program. He has also received several first prizes of Natural Science awards from Hubei Province and the Ministry of Education in China, first place with the Crystal Bull Award in the 2013 Europa Challenge, and the 2017 OpenMI association Award for outstanding contributions.
Fan Gao
Mr. Fan Gao is a Ph.D. student in the School of Remote Sensing and Information Engineering at Wuhan University. His research interests are high performance geoprocessing and GIS.
Boyi Shangguan
Mr. Boyi Shangguan is a Ph.D. student in the School of Remote Sensing and Information Engineering at Wuhan University. His research interests are high performance geoprocessing and GIService.
Zheren Yan
Mr. Zheren Yan is a Ph.D. student in the School of Remote Sensing and Information Engineering at Wuhan University. His research interests are high performance geoprocessing and geoparsing.