Are prominent mountains frequently mentioned in text? Exploring the spatial expressiveness of text frequency

Curdin Derungs
Department of Geography, University of Zurich, Zurich, Switzerland; URPP Language and Space, University of Zurich, Zurich, Switzerland
Correspondence: [email protected]

Tanja Samardžić
URPP Language and Space, University of Zurich, Zurich, Switzerland

ABSTRACT

Data-driven GIScience shows a growing interest in deriving spatial information from large text data. In this paper, we quantify, and thus evaluate, the relation between text frequency and properties of the geographic setting outside the text by comparing the text frequencies of mountain names to the geomorphometric characteristics of the respective mountains. We focus on some 2000 unique mountain names that appear some 50,000 times in a large compilation of texts on Swiss alpine history. The results for the full data set suggest only a weak relation: only 5–10% of the variation in text frequency is explained by the respective geomorphometric characteristics. However, an analysis at multiple scales allows us to identify a Simpson’s Paradox. What appears to be ‘noise’ in the analysis of all mountains in the whole of Switzerland shows significant local signals. Small spatial extents, found all over Switzerland, can show considerably strong correlations between text frequency and spatial prominence, with up to 90% of the total variation explained. We argue that our findings have practical implications for data-driven GIScience. Retrieving meaningful spatial information from text might only be possible if the spatial scale of analysis reflects the spatial scale described in the input text documents.
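The scale effect described in the abstract can be illustrated with a minimal sketch. The data below are hypothetical toy values, not taken from the study, and the squared Pearson correlation is used here only as an assumed stand-in for "share of variation explained"; the names `r_squared`, `region_a` and `region_b` are invented for the illustration.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation: share of the variance in ys
    that a straight-line fit on xs accounts for."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data (NOT from the paper): two small regions whose
# mountains differ strongly in prominence but are mentioned about equally often.
regions = {
    "region_a": {"prominence": [100, 200, 300, 400],
                 "frequency": [10, 20, 30, 40]},
    "region_b": {"prominence": [1000, 1100, 1200, 1300],
                 "frequency": [8, 18, 28, 38]},
}

# Local analysis: one correlation per region.
local_r2 = {
    name: r_squared(d["prominence"], d["frequency"])
    for name, d in regions.items()
}

# Global analysis: all mountains pooled into a single sample.
pooled_x = [x for d in regions.values() for x in d["prominence"]]
pooled_y = [y for d in regions.values() for y in d["frequency"]]
pooled_r2 = r_squared(pooled_x, pooled_y)

if __name__ == "__main__":
    for name, value in local_r2.items():
        print(f"{name}: local R^2 = {value:.2f}")
    print(f"pooled: R^2 = {pooled_r2:.2f}")
```

In this toy setting each region shows a near-perfect local relation, while the pooled sample shows a very weak one, because the offset between regions dominates the pooled variance. It only mirrors the qualitative pattern reported in the abstract and says nothing about the actual Swiss data.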

Acknowledgments

We would like to express our special thanks to Ross S. Purves for his helpful comments on an early version of the manuscript.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1. The details of the named entity recognition in the current version of the corpus are described in the corpus documentation (in German), available at http://www.textberg.ch/ReleaseNotes/README_Release_151_v01.htm.

2. We refer to these two measures as frequency (i.e. mountain toponym frequency) and prominence (i.e. spatial mountain prominence) in the rest of this discussion.

Additional information

Funding

This research was supported by the University Research Priority Program ‘Language and Space’ at the University of Zurich.

