ABSTRACT
Geospatial data conflation aims to match counterpart features from two or more data sources in order to combine and better utilize the information they contain. Because conflation is important in spatial analysis, many approaches to the conflation problem have been proposed, ranging from simple buffer-based methods to probability- and optimization-based models. In this paper, I propose a formal framework for conflation that integrates two powerful tools of geospatial computation: optimization and relational databases. I discuss the connection between relational database theory and conflation, and demonstrate how the conflation process can be formulated and carried out in standard relational databases. I also propose a set of new optimization models that can be used inside relational databases to solve the conflation problem. These models are based on the minimum cost circulation problem, a network flow problem in operations research, and they generalize existing optimal conflation models, which are primarily based on the assignment problem. Computational experiments on comparable datasets show that the proposed conflation method is effective and outperforms existing optimal conflation models by a large margin. Given its generality, the new method may be applicable to other data types and conflation problems.
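The abstract's central modelling idea, that a network flow formulation generalizes assignment-based optimal conflation, can be illustrated with a small Python sketch. It is not the paper's model or its in-database implementation (the pgnetworkflow module). It assumes point features, Euclidean distances rounded to integer costs, and a hypothetical `unmatched_penalty` that prices leaving a feature of A unmatched. All names (`MinCostFlow`, `conflate`, `max_per_b`, `SCALE`) are illustrative. The sketch solves a fixed-value minimum cost flow by successive shortest paths. That problem can be converted to a circulation by adding a return arc from t to s whose flow is fixed at the required value.

```python
import math

SCALE = 100  # assumption: distances rounded to 1/100 of a coordinate unit so costs are integers


class MinCostFlow:
    """Min-cost flow by successive shortest paths (Bellman-Ford on the residual graph)."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # node -> indices of outgoing residual arcs
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        """Add arc u->v; its reverse residual arc is stored at the next index (e ^ 1)."""
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)
        return len(self.to) - 2  # index of the forward arc

    def solve(self, s, t, required):
        """Send `required` units from s to t at minimum total cost."""
        total = 0
        while required > 0:
            dist = [math.inf] * self.n
            via = [-1] * self.n  # residual arc used to reach each node
            dist[s] = 0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v] = dist[u] + self.cost[e]
                            via[v] = e
                            changed = True
                if not changed:
                    break
            if dist[t] == math.inf:
                raise ValueError("required flow is infeasible")
            # Find the bottleneck capacity on the shortest path, then augment along it.
            push, v = required, t
            while v != s:
                push = min(push, self.cap[via[v]])
                v = self.to[via[v] ^ 1]
            v = t
            while v != s:
                e = via[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            required -= push
            total += push * dist[t]
        return total


def conflate(points_a, points_b, unmatched_penalty, max_per_b=1):
    """Match point features of A to B; max_per_b=1 reduces to an assignment with a cutoff."""
    na, nb = len(points_a), len(points_b)
    s, t = 0, na + nb + 1
    net = MinCostFlow(na + nb + 2)
    penalty = round(unmatched_penalty * SCALE)
    for i in range(na):
        net.add_edge(s, 1 + i, 1, 0)  # each A feature can be matched at most once
    for j in range(nb):
        net.add_edge(1 + na + j, t, max_per_b, 0)  # B feature j absorbs up to max_per_b matches
    arcs = {}
    for i, (xa, ya) in enumerate(points_a):
        for j, (xb, yb) in enumerate(points_b):
            d = round(math.hypot(xa - xb, ya - yb) * SCALE)
            if d < penalty:  # pairs no cheaper than the penalty would never be chosen
                arcs[(i, j)] = net.add_edge(1 + i, 1 + na + j, 1, d)
    net.add_edge(s, t, na, penalty)  # bypass arc: each unit here is one unmatched A feature
    total = net.solve(s, t, na)
    matches = sorted(pair for pair, e in arcs.items() if net.cap[e] == 0)
    return matches, total / SCALE
```

With `max_per_b=1`, each feature of B can receive at most one match, as in an assignment-style model with a distance cutoff. Raising the capacity lets one feature absorb several counterparts. For example, `conflate([(0, 0), (2, 0)], [(1, 0)], 5, max_per_b=2)` returns `([(0, 0), (1, 0)], 2.0)`. With `max_per_b=1`, one of the two points is left unmatched at the penalty cost instead.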
Acknowledgments
I would like to thank the three anonymous reviewers for their constructive comments during the review process, which helped improve the content and presentation of this paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data and codes availability statement
The data that support the findings of this study are available on figshare with the identifier 10.6084/m9.figshare.12400724. The back-end database code is included in this archive and is also available as three modules at https://gitee.com/leitl/dbconflation, https://gitee.com/leitl/pggeos, and https://gitee.com/leitl/pgnetworkflow. See the README file at https://gitee.com/leitl/dbconflation for details. The author acknowledges the joint effort with Prof. Zhen Lei at Wuhan University of Technology in developing the pggeos module.
Supplementary material
Supplemental data for this article can be accessed here.
Additional information
Notes on contributors
Ting L. Lei
Ting Lei is an assistant professor in the Department of Geography and Atmospheric Science at the University of Kansas, KS, USA, where he teaches courses in GIS and remote sensing. His research interests include GIS, network and location analysis, geospatial algorithms, spatial databases, operations research, and the application of these fields to the analysis of urban, transportation, and service systems.