Research Article

Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency

Pages 856-873 | Received 19 Jul 2017, Accepted 13 Dec 2017, Published online: 26 Dec 2017
 

ABSTRACT

Data-driven GIScience shows a growing interest in deriving spatial information from large text data. In this paper, we quantify and evaluate the relation between text frequency and properties of the geographic setting outside the text by comparing the text frequencies of mountain names with the geomorphometric characteristics of the respective mountains. We focus on some 2000 unique mountain names that appear some 50,000 times in a large compilation of texts on Swiss alpine history. The results for the full data set suggest only a weak relation: only 5–10% of the variation in text frequency is explained by the respective geomorphometric characteristics. However, an analysis at multiple scales allows us to identify a Simpson’s Paradox. What appears to be ‘noise’ in the analysis of all mountains in the whole of Switzerland contains significant local signals. Small spatial extents, found all over Switzerland, can show strong correlations between text frequency and spatial prominence, with up to 90% of the total variation explained. We argue that our findings have practical implications for data-driven GIScience. Retrieving meaningful spatial information from text might only be possible if the spatial scale of analysis reflects the spatial scale described in the input text documents.
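The scale effect described in the abstract can be illustrated with a small, purely synthetic sketch. It is not the authors' method and does not use their data: the region names, the prominence values, the per-region offsets (standing in for how much text coverage a region receives) and the slope are all invented for illustration. The sketch only shows how a relation that is strong within each small extent can look weak once all extents are pooled.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation: share of the variance in ys
    explained by a linear fit on xs."""
    n = len(xs)
    mx, my = fsum(xs) / n, fsum(ys) / n
    sxy = fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = fsum((x - mx) ** 2 for x in xs)
    syy = fsum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# All values below are hypothetical and chosen by hand.
PROMINENCE = [100, 200, 300, 400, 500]   # spatial prominence per mountain
RESIDUAL = [2, -4, 4, -4, 2]             # small local deviation from the trend
REGION_OFFSETS = {"A": 0, "B": 40, "C": 80, "D": 120}  # regional text coverage
SLOPE = 0.1                              # mentions gained per unit of prominence

# Within each region, frequency rises with prominence around a regional baseline.
regions = {
    name: [(p, offset + SLOPE * p + e) for p, e in zip(PROMINENCE, RESIDUAL)]
    for name, offset in REGION_OFFSETS.items()
}

# Local analysis: one fit per small spatial extent.
local_r2 = {name: r_squared(*zip(*points)) for name, points in regions.items()}

# Global analysis: all mountains pooled, regional baselines ignored.
pooled = [point for points in regions.values() for point in points]
pooled_r2 = r_squared(*zip(*pooled))

for name, value in local_r2.items():
    print(f"region {name}: R^2 = {value:.2f}")
print(f"pooled:   R^2 = {pooled_r2:.2f}")
```

In this toy setup the differences between regional baselines dominate the pooled variance, so the pooled fit explains far less than any local fit. The trend keeps the same sign at both scales, so this is a loose illustration of the aggregation effect rather than a full reversal.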

Acknowledgments

We would like to express our special thanks to Ross S. Purves for his helpful comments on an early version of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. The details of the named entity recognition in the current version of the corpus are described in the corpus documentation (in German), available at http://www.textberg.ch/ReleaseNotes/README_Release_151_v01.htm.

2. We refer to these two measures as frequency (i.e. mountain toponym frequency) and prominence (i.e. spatial mountain prominence) in the rest of this discussion.

Additional information

Funding

This research was supported by the University Research Priority Program ‘Language and Space’ at the University of Zurich.
