ABSTRACT
Place descriptions are used in everyday communication as a common way to convey spatial information. Processing the information in place descriptions poses multiple significant challenges because these descriptions are written in natural language. In particular, corpora of place descriptions provide a plethora of human spatial knowledge beyond that held in geographical information systems, even if these descriptions refer to the same places in various ways. This article focuses on resolving ambiguous or synonymous place names from place descriptions by exploring the given relationships with other spatial features. It matches place names from multiple descriptions by developing a novel labelled graph matching process that relies solely on the comparison of string, linguistic and spatial similarities between identified places. This process takes unstructured place descriptions as input and produces a composite place graph with qualitative spatial relations from the descriptions. The performance of this novel process exceeds that of current toponym resolution by coping with non-gazetteered places.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. We refer to the general definition and notion: ‘Places are the conceptual entities that enable cognitive structuring of the spatial aspects of reality’ (Bennett and Agarwal 2007), and ‘Places are typically determined by entities in the geographic environment or by relations between entities in the environment rather than by externally imposed coordinates and geometric properties’ (Winter and Freksa 2012).
2. There are exceptions, for example, georeferenced Wikipedia entries.
3. The national topographic database provided by Ordnance Survey, Britain’s national mapping agency.
4. The sensitivity (or recall = true positives / (true positives + false negatives)) is the proportion of actual positives, that is, the correct pairs, that are correctly identified.
5. The specificity (= true negatives / (true negatives + false positives)) is the proportion of negatives that are correctly identified.
6. The precision represents the number of correctly matched pairs (true positives) divided by the total number of pairs matched as belonging to the correct pairs (the sum of true positives and false positives).
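The three measures defined in Notes 4 to 6 can be illustrated with a minimal Python sketch. The function name `match_metrics` and the counts used below are hypothetical and chosen only for illustration; they are not results from the article.

```python
import math


def match_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and precision from confusion counts.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives and false negatives among candidate place pairs.
    """
    def ratio(numerator, denominator):
        # Return NaN when the measure is undefined (zero denominator).
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # Note 4 (recall)
        "specificity": ratio(tn, tn + fp),  # Note 5
        "precision": ratio(tp, tp + fp),    # Note 6
    }


# Hypothetical counts, for illustration only.
metrics = match_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Returning NaN for a zero denominator is one possible design choice; it keeps an undefined measure distinct from a genuine score of zero.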