ABSTRACT
Address matching is a crucial step in geocoding, which plays an important role in urban planning and management. The unprecedented development of location-based services has generated a large amount of unstructured address data. Traditional address matching methods mainly focus on the literal similarity of address records and are therefore not applicable to unstructured address data. In this study, we introduce an address matching method based on deep learning to identify the semantic similarity between address records. First, we train a word2vec model to transform the address records into their corresponding vector representations. Next, we apply the enhanced sequential inference model (ESIM), a deep text-matching model, to make local and global inferences to determine whether two addresses match. To evaluate the accuracy of the proposed method, we fine-tune the model with real-world address data from the Shenzhen Address Database and compare the outputs with those of several popular address matching methods. The results indicate that the proposed method achieves a higher matching accuracy for unstructured address records, with its precision, recall, and F1 score (i.e., the harmonic mean of precision and recall) all reaching 0.97 on the test set.
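As a brief illustration of the evaluation metrics named in the abstract, the following minimal sketch computes precision, recall, and the F1 score (their harmonic mean) for binary match labels. The function name `precision_recall_f1` and the six example labels are hypothetical and chosen only for illustration; they are not taken from the study's code or data.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, where 1 means 'match'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted matches
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-matches predicted as matches
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # matches that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for six address pairs (illustrative only).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it equals precision and recall whenever the two coincide, which is consistent with all three metrics being reported at the same value.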
Acknowledgments
We thank Qin Tian for his valuable suggestions on the methodology of this study. We also appreciate the insightful comments from the associate editor, Christophe Claramunt, and all the anonymous reviewers.
Data and code availability statement
The following materials that support the findings of this study are available on Zenodo: the first 498,294 records of the corpus derived from the Shenzhen Address Database, used as part of the corpus for word2vec training (doi: 10.5281/zenodo.3477633); the labelled address dataset for semantic address matching (doi: 10.5281/zenodo.3477007); and the code (doi: 10.5281/zenodo.3476673). The complete corpus from the Shenzhen Address Database cannot be made publicly available, in order to protect personal information and to comply with national policy on data security.
Disclosure statement
No potential conflict of interest was reported by the authors.
Additional information
Funding
Notes on contributors
Yue Lin
Yue Lin is a first-year PhD student in the Department of Geography at The Ohio State University. She received her bachelor’s degree in Geographical Information Science from the School of Resource and Environmental Sciences at Wuhan University. Her research interests include GeoAI, spatial data analysis, and urban geography.
Mengjun Kang
Mengjun Kang is an associate professor in the School of Resource and Environmental Sciences at Wuhan University. His research interests include geocoding, urban addresses, and digital mapping.
Yuyang Wu
Yuyang Wu is an undergraduate student in the School of Geography and Information Engineering at China University of Geosciences.
Qingyun Du
Qingyun Du is a professor and dean in the School of Resource and Environmental Sciences at Wuhan University. His research interests include GIScience and natural language representations of spatial information.
Tao Liu
Tao Liu is a professor in the Faculty of Geomatics at Lanzhou Jiaotong University.