Variation in Word Frequency Distributions: Definitions, Measures and Implications for a Corpus-Based Language Typology

Abstract

Word frequencies are central to linguistic studies investigating processing difficulty, learnability, age of acquisition, diachronic transmission and the relative weight given to a concept in society. However, there are few cross-linguistic studies on entire distributions of word frequencies, and even fewer on systematic changes within them. Here, we first define and test an exact measure for the relative difference between distributions: the Normalised Frequency Difference (NFD). We then apply this measure to parallel corpora in 19 languages overall, explaining systematic variation in the frequency distributions within the same language and across different languages. We further establish the NFD between lemmatised and un-lemmatised corpora as a frequency-based measure of the inflectional productivity of a language. Finally, we argue that quantitative measures like the NFD can advance language typology beyond abstract, theory-driven expert judgments, towards more corpus-based, empirical and reproducible analyses.
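
The abstract does not reproduce the formal definition of the NFD, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes that the measure compares two rank-ordered frequency distributions by summing the absolute rank-wise frequency differences and dividing by the total token count of both corpora, with the shorter distribution padded with zeros. The function names `ranked_frequencies` and `normalised_frequency_difference` are hypothetical and do not come from the R package mentioned in the notes.

```python
from collections import Counter
from itertools import zip_longest


def ranked_frequencies(tokens):
    """Return token frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def normalised_frequency_difference(tokens_a, tokens_b):
    """Illustrative NFD under the assumption stated in the text.

    ASSUMPTION: NFD = sum_i |f_i(A) - f_i(B)| / (N_A + N_B), where f_i is
    the frequency at rank i and N is the token count of a corpus. The
    published definition may differ in its details.
    """
    freqs_a = ranked_frequencies(tokens_a)
    freqs_b = ranked_frequencies(tokens_b)
    total_tokens = sum(freqs_a) + sum(freqs_b)
    if total_tokens == 0:
        raise ValueError("both corpora are empty")
    # Pad the shorter distribution with zero frequencies at the missing ranks.
    difference = sum(
        abs(a - b) for a, b in zip_longest(freqs_a, freqs_b, fillvalue=0)
    )
    return difference / total_tokens


# Toy comparison of an un-lemmatised text with its lemmatised counterpart.
surface_forms = "walk walks walked walking walk".split()
lemmas = ["walk"] * len(surface_forms)
nfd_toy = normalised_frequency_difference(surface_forms, lemmas)
```

Under this assumed definition the value is 0 for identical distributions and grows as lemmatisation collapses more distinct word forms into one lemma, which is the intuition behind using it as an indicator of inflectional productivity.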

Notes

1. We made an R package available for NFD calculation and plotting via https://github.com/dimalik/nfd/.

2. Note that we included the ’s genitive under both inflexion and clitics. Theoretically, it should be considered a phrasal clitic, since it attaches not directly to nouns but to noun phrases. In practice, however, it is found mostly on nouns and might be perceived as noun inflexion by learners and speakers.

3. In the upper panels we log-transform the ranks of the distributions, but not the ΔFreq. This exaggerates the visual differences in frequencies somewhat.

4. The POS tags used in the BTagger are the first two letters of the Multext-East morphosyntactic definitions (MSD). See a full list here: http://nl.ijs.si/ME/V4/msd/html/index.html.
