Research Articles

Automatic alignment of contemporary vector data and georeferenced historical maps using reinforcement learning

Weiwei Duan, Yao-Yi Chiang, Stefan Leyk, Johannes H. Uhl & Craig A. Knoblock
Pages 824-849 | Received 15 May 2018, Accepted 25 Nov 2019, Published online: 09 Dec 2019
 

ABSTRACT

With large digital map archives becoming available, automatically extracting information from scanned historical maps is needed for many domains that require long-term historical geographic data. Convolutional Neural Networks (CNNs) are powerful techniques that can be used for extracting locations of geographic features from scanned maps if sufficient representative training data are available. Existing spatial data can provide the approximate locations of corresponding geographic features in historical maps and thus be useful to annotate training data automatically. However, the feature representations, publication dates, production scales, and spatial reference systems of contemporary vector data are typically very different from those of historical maps. Hence, such auxiliary data cannot be directly used for annotation of the precise locations of the features of interest in the scanned historical maps. This research introduces an automatic vector-to-raster alignment algorithm based on reinforcement learning to annotate precise locations of geographic features on scanned maps. This paper models the alignment problem using the reinforcement learning framework, which enables informed, efficient searches for matching features without pre-processing steps, such as extracting specific feature signatures (e.g. road intersections). The experimental results show that our algorithm can be applied to various features (roads, water lines, and railroads) and achieve high accuracy.
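To give a concrete sense of what searching for a better alignment with a color-based reward could look like, the following is a minimal sketch and not the authors' algorithm. It replaces the reinforcement learning agent with a plain greedy search over one-pixel shifts, and every name in it (`ACTIONS`, `color_reward`, `greedy_align`, the tolerance `tol`) is an assumption made for illustration. The only idea taken from the text is that a reward can be derived from how well the vector feature overlaps pixels of the expected feature color.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]        # (row, col) pixel coordinates
Color = Tuple[int, int, int]   # RGB
Image = Sequence[Sequence[Color]]

# Hypothetical action set: shift the whole vector feature by one pixel.
ACTIONS: List[Point] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def color_reward(image: Image, points: List[Point], target: Color, tol: int = 30) -> float:
    """Fraction of vector points that fall on pixels close to the target color."""
    hits = 0
    for r, c in points:
        if 0 <= r < len(image) and 0 <= c < len(image[0]):
            pixel = image[r][c]
            if all(abs(p - t) <= tol for p, t in zip(pixel, target)):
                hits += 1
    return hits / len(points)


def shift(points: List[Point], action: Point) -> List[Point]:
    """Translate every point by the given (row, col) offset."""
    dr, dc = action
    return [(r + dr, c + dc) for r, c in points]


def greedy_align(image: Image, points: List[Point], target: Color,
                 max_steps: int = 50) -> Tuple[List[Point], float]:
    """Take the shift that most improves the reward; stop when none does.

    This is a greedy stand-in for a learned policy, not reinforcement learning.
    """
    current = list(points)
    best = color_reward(image, current, target)
    for _ in range(max_steps):
        candidates = [shift(current, a) for a in ACTIONS]
        rewards = [color_reward(image, cand, target) for cand in candidates]
        top = max(rewards)
        if top <= best:
            break
        current = candidates[rewards.index(top)]
        best = top
    return current, best


# Toy example: a 5x5 white "map" with a black vertical line in column 3,
# and a vector line that is misplaced by one pixel (column 2).
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [[BLACK if c == 3 else WHITE for c in range(5)] for r in range(5)]
vector = [(r, 2) for r in range(5)]

aligned, reward = greedy_align(image, vector, BLACK)
print(aligned, reward)
```

A greedy search like this stalls whenever no single shift improves the reward, for example when the vector line starts two or more pixels away from the feature. It is intended only to illustrate the role of a color-based reward, not the search strategy described in the paper.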

Acknowledgments

This material is based on research sponsored in part by the National Science Foundation under Grant Nos. IIS 1563933 (to the University of Colorado at Boulder) and IIS 1564164 (to the University of Southern California).

Disclosure statement

No potential conflict of interest was reported by the authors.

Data and codes availability statement

The source code and data that support the findings in this study are available on figshare. The source code of the vector-to-raster alignment algorithm is available at https://doi.org/10.6084/m9.figshare.10025051.v1. The contemporary vector data and the USGS topographic maps are available at https://doi.org/10.6084/m9.figshare.9999863.v1 and https://doi.org/10.6084/m9.figshare.9999740.v1, respectively.

Notes

1. https://cs.stanford.edu/~roozbeh/pascal-context/.

2. This method is based on our previous work published in the first GeoAI Workshop (Duan et al. 2017a).

3. However, the poor quality of the original scanned maps may violate the assumption that the color for the geographic feature is consistent and distinguishable and may have a negative impact on the accuracy of our algorithm. For example, when the original maps are prone to color bleaching (Leyk and Boesch 2010, Chiang et al. 2014), the color cannot be used as a distinct property for the geographic feature. As a consequence, our reward function, which is designed based on the color property, cannot provide accurate rewards to actions. The color property should be valid and consistent for most of the scanned maps edited in recent decades.

4. The manually aligned vector data are reviewed by researchers who are not on the author list of this paper but are affiliated with the same research institute.

Additional information

Funding

This work was supported by the National Science Foundation [1563933, 1564164].

Notes on contributors

Weiwei Duan

Weiwei Duan is a Ph.D. student majoring in Computer Science at the University of Southern California (USC). She is working on building a computer-vision-based system for extracting information on georeferenced images and storing them in a structured format for analysis. The system localizes geographic objects on images by integrating geospatial information and using limited noisy labeling data. Her research interests are computer vision, knowledge graphs, and machine learning.

Yao-Yi Chiang

Yao-Yi Chiang, Ph.D., is an Associate Professor (Research) in Spatial Sciences, the Director of the Spatial Computing Laboratory, and the Associate Director of the NSF's Integrated Media Systems Center (IMSC) at the University of Southern California (USC). He is also a faculty member in the USC Viterbi Data Science M.S. program. Dr. Chiang received his Ph.D. degree in Computer Science from the University of Southern California and his bachelor's degree in Information Management from the National Taiwan University. His current research combines spatial science theories with computer algorithms to enable the discovery of useful insights from heterogeneous data for solving real-world problems. His research interests include information integration, machine learning, data mining, computer vision, and knowledge graphs. Before USC, Dr. Chiang worked as a research scientist for Geosemble Technologies and Fetch Technologies in California. Geosemble Technologies was founded based on a patent on geospatial data fusion techniques, of which he was a co-inventor.

Stefan Leyk

Stefan Leyk is an Associate Professor at the Department of Geography, University of Colorado Boulder and a Research Fellow at the Institute of Behavioral Science. He is a Geographical Information Scientist with research interests in information extraction, spatio-temporal modeling and socio-environmental systems. In his work he uses various sources of historical spatial data to better understand the evolution of human systems and how the built environment interacts with environmental processes in the context of land use and natural hazards.

Johannes H. Uhl

Johannes H. Uhl received a diploma degree in surveying and geomatics from Karlsruhe University of Applied Sciences, Germany, in 2009 and a double M.Sc. degree in geomatics from Karlsruhe University of Applied Sciences, Germany, and in cartography and geodesy from Polytechnic University of Valencia (UPV), Spain, in 2011. He received his Ph.D. degree in geographic information science from the University of Colorado, Boulder, USA, in 2019. In 2008/2009, he worked as a student intern at the Department of Photogrammetry and Image Analysis (IMF) at the German Aerospace Center, Oberpfaffenhofen, Germany. From 2012 to 2015, he worked as a Geospatial Data Analyst and Software Developer at Pfalzwerke Netz AG, Ludwigshafen, Germany, and from 2016 to 2019 as a Graduate Research Assistant at the Department of Geography, University of Colorado, Boulder, USA. Since 2019, he has been a Post-Doctoral Research Associate at the University of Colorado Population Center (CUPC), Institute of Behavioral Science, at the University of Colorado, Boulder, USA. His research interests include the efficient integration and analysis of large geospatial datasets, spatio-temporal modeling and information extraction based on multi-source geospatial data using machine learning, image processing and statistical analysis, as well as uncertainty quantification and modeling of spatio-temporal data. Dr. Uhl was a recipient of the Best Paper Award at the International Conference of Pattern Recognition Systems (ICPRS) 2017 in Madrid, Spain, and received the Gilbert F. White Fellowship from the University of Colorado in 2018.

Craig A. Knoblock

Craig A. Knoblock is Executive Director of the Information Sciences Institute of the University of Southern California (USC), Research Professor of both Computer Science and Spatial Sciences at USC, and Director of the Data Science Program at USC. He received his Bachelor of Science degree from Syracuse University and his Master’s and Ph.D. from Carnegie Mellon University in computer science. His research focuses on techniques for describing, acquiring, and exploiting the semantics of data. He has worked extensively on source modeling, schema and ontology alignment, entity and record linkage, data cleaning and normalization, extracting data from the Web, and combining all of these techniques to build knowledge graphs. He has published more than 300 journal articles, book chapters, and conference papers on these topics and has received 7 best paper awards on this work. Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Fellow of the Association for Computing Machinery (ACM), past President and Trustee of the International Joint Conference on Artificial Intelligence (IJCAI), and winner of the Robert S. Engelmore Award.
