ABSTRACT
Missing data is a common problem in the analysis of geospatial information. Existing methods model spatiotemporal dependencies to reduce imputation errors but neglect ease of use in practice. Classical interpolation models are easy to build and apply; however, their imputation accuracy is limited because they cannot capture the spatiotemporal characteristics of geospatial data. Consequently, a lightweight ensemble model was constructed by modelling the spatiotemporal dependencies in a classical interpolation model. Temporally, average correlation coefficients were introduced into a simple exponential smoothing model to automatically select the time window, ensuring that the sample data had the strongest correlation with the missing data. Spatially, Gaussian equivalent and correlation distances were introduced into an inverse distance-weighting model to assign weights to each spatial neighbor and sufficiently reflect changes in the spatiotemporal pattern. Finally, the temporal and spatial estimates of the missing values were aggregated into the final results with an extreme learning machine. Compared to existing models, the proposed model achieves higher imputation accuracy, lowering the mean absolute error by 10.93% to 52.48% on the road network dataset and by 23.35% to 72.18% on the air quality station dataset, and exhibits robust performance under spatiotemporal mutations.
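The two classical building blocks named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `alpha` and `bandwidth` parameters, and the Gaussian weight form exp(-d^2 / (2 h^2)) are assumptions made for illustration, the correlation-based window selection is not shown, and a fixed weighted average stands in for the extreme learning machine that the paper uses to aggregate the two estimates.

```python
import math


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing over a time window.

    Returns the final smoothed level, used here as the temporal
    estimate of the next (missing) value. `alpha` is in (0, 1].
    """
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


def gaussian_idw(neighbor_values, distances, bandwidth):
    """Distance-weighted mean of spatial neighbors.

    Weights follow an assumed Gaussian kernel exp(-d^2 / (2 h^2)),
    so nearer neighbors contribute more to the spatial estimate.
    """
    weights = [math.exp(-(d * d) / (2 * bandwidth * bandwidth)) for d in distances]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, neighbor_values)) / total


def combine(temporal, spatial, w_temporal=0.5):
    """Stand-in aggregator: a fixed weighted average, NOT the paper's ELM."""
    return w_temporal * temporal + (1 - w_temporal) * spatial


if __name__ == "__main__":
    # Hypothetical readings: a short history at the target sensor and
    # simultaneous readings at two neighbors with made-up distances.
    t_hat = exponential_smoothing([42.0, 44.0, 43.0, 45.0], alpha=0.6)
    s_hat = gaussian_idw([46.0, 40.0], distances=[0.5, 2.0], bandwidth=1.0)
    print(combine(t_hat, s_hat))
```

In this sketch the distances could be geographic or, following the abstract, derived from correlations between sensors; the kernel only requires that a smaller distance means a more similar neighbor.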
Acknowledgments
This research is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant number XDA23010202; the Regional Key Project under Science and Technology Service Network Initiative of Chinese Academy of Sciences under Grant number KFJ-STS-QYZD-xxx; and the China Postdoctoral Science Foundation under Grant number 2019M660774. Their support is gratefully acknowledged. We are also grateful to Prof. May Yuan, Prof. Bo Huang and the anonymous referees for their helpful comments and suggestions.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data and codes availability statement
The data and code that support the findings of this study are available on figshare with the identifier https://doi.org/10.6084/m9.figshare.11328584.v2.
Additional information
Notes on contributors
Shifen Cheng
Shifen Cheng is a Ph.D. candidate at the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China. His research interests include spatiotemporal data mining in transportation systems.
Peng Peng
Peng Peng is a postdoctoral researcher at the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. His research interests include spatiotemporal data mining in transportation systems.
Feng Lu
Feng Lu is a Professor with the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. His research interests cover trajectory data mining, computational transportation science and location-based services.