ABSTRACT
As large digital map archives become available, automatically extracting information from scanned historical maps is needed in many domains that require long-term historical geographic data. Convolutional Neural Networks (CNNs) are powerful techniques that can extract the locations of geographic features from scanned maps if sufficient representative training data are available. Existing spatial data can provide the approximate locations of corresponding geographic features in historical maps and can thus be used to annotate training data automatically. However, the feature representations, publication dates, production scales, and spatial reference systems of contemporary vector data are typically very different from those of historical maps. Hence, such auxiliary data cannot be used directly to annotate the precise locations of the features of interest in the scanned historical maps. This research introduces an automatic vector-to-raster alignment algorithm based on reinforcement learning to annotate precise locations of geographic features on scanned maps. This paper models the alignment problem using the reinforcement learning framework, which enables informed, efficient searches for matching features without pre-processing steps, such as extracting specific feature signatures (e.g. road intersections). The experimental results show that our algorithm can be applied to various features (roads, water lines, and railroads) and achieves high accuracy.
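To make the idea of alignment as a reinforcement learning search more concrete, the following toy sketch may help. It is not the authors' implementation: the abstract does not specify the state, action, or reward design, so everything here is an assumption chosen for illustration. The state is a pixel offset applied to the vector data, the actions are one-pixel translations plus a "stay" action, the reward is the fraction of shifted vector pixels that land on feature-colored pixels, and the learner is plain tabular Q-learning. The names FEATURE, VECTOR, overlap, step, train, and align are hypothetical.

```python
import random

# Hypothetical toy data: pixels that carry the feature color on the "map".
FEATURE = {(x, 12) for x in range(5, 15)} | {(5, y) for y in range(4, 12)}
# Misaligned vector data: the same L shape displaced by (-3, -2) pixels.
VECTOR = [(x - 3, y - 2) for (x, y) in sorted(FEATURE)]

# Assumed action set: stay, right, left, down, up (one pixel each).
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
LIMIT = 5  # offsets are clamped to [-LIMIT, LIMIT] on both axes


def overlap(offset):
    """Fraction of shifted vector pixels that land on feature-colored pixels."""
    dx, dy = offset
    hits = sum((x + dx, y + dy) in FEATURE for x, y in VECTOR)
    return hits / len(VECTOR)


def step(offset, action):
    """Apply a translation action; the reward is the overlap at the new offset."""
    nx = max(-LIMIT, min(LIMIT, offset[0] + action[0]))
    ny = max(-LIMIT, min(LIMIT, offset[1] + action[1]))
    new_offset = (nx, ny)
    return new_offset, overlap(new_offset)


def train(episodes=4000, steps=30, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning; actions are sampled uniformly during training,
    which is acceptable here because Q-learning is off-policy."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = (rng.randint(-LIMIT, LIMIT), rng.randint(-LIMIT, LIMIT))
        for _ in range(steps):
            a = rng.randrange(len(ACTIONS))
            nxt, reward = step(state, ACTIONS[a])
            best_next = max(q.get((nxt, b), 0.0) for b in range(len(ACTIONS)))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def align(q, start=(0, 0), steps=20):
    """Follow the greedy policy from the unaligned position."""
    state = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda b: q.get((state, b), 0.0))
        state, _ = step(state, ACTIONS[a])
    return state


q_table = train()
final_offset = align(q_table)
print(final_offset, overlap(final_offset))
```

In this sketch the agent needs no extracted feature signatures such as road intersections: the color-overlap reward alone guides the search toward the offset that places the vector data on the map feature. A real scanned map would call for a richer state and a more robust color model than this binary pixel set.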
Acknowledgments
This material is based on research sponsored in part by the National Science Foundation under Grant Nos. IIS 1563933 (to the University of Colorado at Boulder) and IIS 1564164 (to the University of Southern California).
Disclosure statement
No potential conflict of interest was reported by the authors.
Data and codes availability statement
The source code and data that support the findings of this study are available on figshare. The source code of the vector-to-raster alignment algorithm is available at https://doi.org/10.6084/m9.figshare.10025051.v1. The contemporary vector data and the USGS topographic maps are available at https://doi.org/10.6084/m9.figshare.9999863.v1 and https://doi.org/10.6084/m9.figshare.9999740.v1, respectively.
Notes
1. https://cs.stanford.edu/~roozbeh/pascal-context/.
2. This method is based on our previous work published in the first GeoAI Workshop (Duan et al. 2017a).
3. However, the poor quality of the original scanned maps may violate the assumption that the color for the geographic feature is consistent and distinguishable and may have a negative impact on the accuracy of our algorithm. For example, when the original maps are prone to color bleaching (Leyk and Boesch 2010, Chiang et al. 2014), the color cannot be used as a distinct property for the geographic feature. As a consequence, our reward function, which is designed based on the color property, cannot provide accurate rewards to actions. The color property should be valid and consistent for most of the scanned maps edited in recent decades.
4. The manually aligned vector data are reviewed by researchers who are not on the author list of this paper but are affiliated with the same research institute.
Additional information
Notes on contributors
Weiwei Duan
Weiwei Duan is a Ph.D. student majoring in Computer Science at the University of Southern California (USC). She is working on building a computer-vision-based system for extracting information from georeferenced images and storing it in a structured format for analysis. The system localizes geographic objects on images by integrating geospatial information and using limited noisy labeling data. Her research interests are computer vision, knowledge graphs, and machine learning.
Yao-Yi Chiang
Yao-Yi Chiang, Ph.D., is an Associate Professor (Research) in Spatial Sciences, the Director of the Spatial Computing Laboratory, and the Associate Director of the NSF's Integrated Media Systems Center (IMSC) at the University of Southern California (USC). He is also a faculty member in the USC Viterbi Data Science M.S. program. Dr. Chiang received his Ph.D. degree in Computer Science from the University of Southern California and his bachelor's degree in Information Management from the National Taiwan University. His current research combines spatial science theories with computer algorithms to enable the discovery of useful insights from heterogeneous data for solving real-world problems. His research interests include information integration, machine learning, data mining, computer vision, and knowledge graphs. Before USC, Dr. Chiang worked as a research scientist for Geosemble Technologies and Fetch Technologies in California. Geosemble Technologies was founded based on a patent on geospatial data fusion techniques, of which he was a co-inventor.
Stefan Leyk
Stefan Leyk is an Associate Professor at the Department of Geography, University of Colorado Boulder and a Research Fellow at the Institute of Behavioral Science. He is a Geographical Information Scientist with research interests in information extraction, spatio-temporal modeling and socio-environmental systems. In his work he uses various sources of historical spatial data to better understand the evolution of human systems and how the built environment interacts with environmental processes in the context of land use and natural hazards.
Johannes H. Uhl
Johannes H. Uhl received a diploma degree in surveying and geomatics from Karlsruhe University of Applied Sciences, Germany, in 2009 and, in 2011, a double M.Sc. degree in geomatics from Karlsruhe University of Applied Sciences, Germany, and in cartography and geodesy from the Polytechnic University of Valencia (UPV), Spain. He received his Ph.D. degree in geographic information science from the University of Colorado, Boulder, USA, in 2019. In 2008/2009, he worked as a student intern at the Department of Photogrammetry and Image Analysis (IMF) at the German Aerospace Center, Oberpfaffenhofen, Germany. From 2012 to 2015, he worked as a Geospatial Data Analyst and Software Developer at Pfalzwerke Netz AG, Ludwigshafen, Germany, and from 2016 to 2019 as a Graduate Research Assistant at the Department of Geography, University of Colorado, Boulder, USA. Since 2019, he has been a Post-Doctoral Research Associate at the University of Colorado Population Center (CUPC), Institute of Behavioral Science, at the University of Colorado, Boulder, USA. His research interests include the efficient integration and analysis of large geospatial datasets; spatio-temporal modeling and information extraction based on multi-source geospatial data using machine learning, image processing, and statistical analysis; and uncertainty quantification and modeling of spatio-temporal data. Dr. Uhl was a recipient of the Best Paper Award at the International Conference of Pattern Recognition Systems (ICPRS) 2017 in Madrid, Spain, and received the Gilbert F. White Fellowship from the University of Colorado in 2018.
Craig A. Knoblock
Craig A. Knoblock is Executive Director of the Information Sciences Institute of the University of Southern California (USC), Research Professor of both Computer Science and Spatial Sciences at USC, and Director of the Data Science Program at USC. He received his Bachelor of Science degree from Syracuse University and his Master’s and Ph.D. degrees in computer science from Carnegie Mellon University. His research focuses on techniques for describing, acquiring, and exploiting the semantics of data. He has worked extensively on source modeling, schema and ontology alignment, entity and record linkage, data cleaning and normalization, extracting data from the Web, and combining all of these techniques to build knowledge graphs. He has published more than 300 journal articles, book chapters, and conference papers on these topics and has received 7 best paper awards for this work. Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Fellow of the Association for Computing Machinery (ACM), past President and Trustee of the International Joint Conference on Artificial Intelligence (IJCAI), and winner of the Robert S. Engelmore Award.