Research Articles

Geospatial data conflation: a formal approach based on optimization and relational databases

Pages 2296-2334 | Received 07 May 2019, Accepted 31 May 2020, Published online: 16 Jun 2020
 

ABSTRACT

Geospatial data conflation aims to match counterpart features from two or more data sources so that the information in the data can be combined and better used. Because conflation is important in spatial analysis, different approaches to the problem have been proposed, ranging from simple buffer-based methods to probability- and optimization-based models. In this paper, I propose a formal framework for conflation that integrates two powerful tools of geospatial computation: optimization and relational databases. I discuss the connection between relational database theory and conflation, and demonstrate how the conflation process can be formulated and carried out in standard relational databases. I also propose a set of new optimization models that can be used inside relational databases to solve the conflation problem. These models are based on the minimum cost circulation problem from operations research (a classic network flow problem). They generalize existing optimal conflation models, which are primarily based on the assignment problem. In computational experiments on comparable datasets, the proposed method proves effective and outperforms existing optimal conflation models by a large margin. Given its generality, the new method may be applicable to other data types and conflation problems.
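The abstract contrasts one-to-one assignment models with a more general network flow formulation. Below is a minimal sketch of that idea under assumptions of our own. It is not the author's model, and it does not reproduce the dbconflation or pgnetworkflow code. The sketch assumes point features, Euclidean distance as the match cost, a hypothetical flat `penalty` for leaving a source feature unmatched, and a hypothetical `max_per_b` capacity that lets one target feature absorb several source features. Candidate pairs come from an SQL join in an in-memory SQLite database, which echoes the relational framing. The optimization is a plain successive-shortest-path min-cost flow. It could be restated as a minimum cost circulation by adding a sink-to-source arc whose lower bound equals the required flow.

```python
import math
import sqlite3


class MinCostFlow:
    """Successive-shortest-path min-cost flow on a residual graph (Bellman-Ford)."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # node -> list of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets index e; its residual twin gets index e ^ 1.
        for tail, head, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[tail].append(len(self.to))
            self.to.append(head)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t, required):
        flow, total = 0, 0.0
        while flow < required:
            dist = [math.inf] * self.n
            prev = [-1] * self.n
            dist[s] = 0.0
            # Bellman-Ford, because residual arcs can carry negative cost.
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v] - 1e-12:
                            dist[v] = dist[u] + self.cost[e]
                            prev[v] = e
                            changed = True
                if not changed:
                    break
            if dist[t] == math.inf:
                break  # no augmenting path left
            # Find the bottleneck capacity, then augment along the path.
            push, v = required - flow, t
            while v != s:
                push = min(push, self.cap[prev[v]])
                v = self.to[prev[v] ^ 1]
            v = t
            while v != s:
                self.cap[prev[v]] -= push
                self.cap[prev[v] ^ 1] += push
                v = self.to[prev[v] ^ 1]
            flow += push
            total += push * dist[t]
        return flow, total


def conflate(a, b, radius, penalty, max_per_b=1):
    """Hypothetical helper: match (id, x, y) points in a to points in b."""
    # Candidate generation as a relational join (squared-distance filter).
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fa (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    con.execute("CREATE TABLE fb (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    con.executemany("INSERT INTO fa VALUES (?, ?, ?)", a)
    con.executemany("INSERT INTO fb VALUES (?, ?, ?)", b)
    rows = con.execute(
        "SELECT fa.id, fb.id, fa.x - fb.x, fa.y - fb.y FROM fa JOIN fb "
        "ON (fa.x - fb.x) * (fa.x - fb.x) + (fa.y - fb.y) * (fa.y - fb.y) <= ?",
        (radius * radius,),
    ).fetchall()
    con.close()
    # Nodes: 0 = source, 1 = sink, then one node per feature in a and in b.
    a_node = {fid: 2 + i for i, (fid, _, _) in enumerate(a)}
    b_node = {fid: 2 + len(a) + j for j, (fid, _, _) in enumerate(b)}
    mcf = MinCostFlow(2 + len(a) + len(b))
    for node in a_node.values():
        mcf.add_edge(0, node, 1, 0.0)
        mcf.add_edge(node, 1, 1, penalty)  # bypass arc: feature stays unmatched
    for node in b_node.values():
        mcf.add_edge(node, 1, max_per_b, 0.0)  # max_per_b = 1 gives one-to-one
    match_edges = []
    for aid, bid, dx, dy in rows:
        match_edges.append((aid, bid, len(mcf.to)))  # index of the forward edge
        mcf.add_edge(a_node[aid], b_node[bid], 1, math.hypot(dx, dy))
    _, cost = mcf.solve(0, 1, len(a))
    matches = sorted((aid, bid) for aid, bid, e in match_edges if mcf.cap[e] == 0)
    return matches, cost
```

With `max_per_b=1` the sketch behaves like a one-to-one assignment with an "unmatched" option. For example, `conflate([(1, 0.0, 0.0), (2, 1.0, 0.0)], [(101, 0.4, 0.0)], radius=2.0, penalty=5.0)` returns `[(1, 101)]` at a cost of about 5.4. Raising `max_per_b` to 2 lets both features match 101 at a cost of about 1.0. This is the kind of flexibility a flow model adds over a pure assignment model.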

Acknowledgments

I would like to thank the three anonymous reviewers for their constructive comments during the review process, which helped improve the content and presentation of this paper.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data and codes availability statement

The data that support the findings of this study are available on figshare.com with the identifier 10.6084/m9.figshare.12400724. The back-end database code is included in this archive and is also available as three modules at https://gitee.com/leitl/dbconflation, https://gitee.com/leitl/pggeos, and https://gitee.com/leitl/pgnetworkflow. See the README file in https://gitee.com/leitl/dbconflation for details. The author would like to acknowledge the joint effort with Prof. Zhen Lei at Wuhan University of Technology in developing the pggeos module.

Supplementary material

Supplemental data for this article are available online with the published version.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China [41971334]; University of Kansas [New Faculty General Research Fund].

Notes on contributors

Ting L. Lei

Ting Lei is an assistant professor in the Department of Geography and Atmospheric Science at the University of Kansas, KS, U.S.A., where he teaches courses in GIS and remote sensing. His research interests include GIS, network and location analysis, geospatial algorithms, spatial databases, operations research, and the application of these fields to the analysis of urban, transportation, and service systems.
