Research Article

Unifying Models for Word Length Distributions Based on Types and Tokens

ABSTRACT

Word length has long been one of the central topics in Quantitative Linguistics. Most existing models were constructed for very specific purposes: each individual model applies only to a specific language, only to token counts, or only to type counts. The present paper takes up the challenge of developing unifying models that account for both type and token frequencies in a moderately large sample of languages (eight Indo-European and two non-Indo-European). We introduce three models that fit all of our data well: the exponentiated Hyper-Poisson, the generalized gamma, and the Sichel distribution. We also discuss the possibility of interpreting the model parameters linguistically.
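To make the type/token distinction concrete, the following is a minimal sketch of how the two kinds of word length counts could be obtained from a text. It is not the authors' procedure: the function name `length_counts`, the toy sentence, and the choice to measure length in letters (via `len`) are all assumptions made for illustration, and the unit of measurement used in the paper is not stated in the abstract.

```python
from collections import Counter


def length_counts(words, length=len):
    """Count word lengths by tokens (every occurrence) and by types (distinct words).

    `length` is a hypothetical hook for the length measure; it defaults to
    the number of letters, which is an assumption, not the paper's choice.
    """
    tokens = [w.lower() for w in words]
    if not tokens:
        raise ValueError("need at least one word")
    types = set(tokens)  # each distinct word form is counted once
    token_counts = Counter(length(w) for w in tokens)
    type_counts = Counter(length(w) for w in types)
    return token_counts, type_counts


def normalize(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}


# Toy sample (invented for illustration): 14 tokens, 8 types.
sample = "the old man saw the small dog and the small dog saw a man".split()
token_counts, type_counts = length_counts(sample)
token_dist = normalize(token_counts)
type_dist = normalize(type_counts)
```

Because frequent words are counted repeatedly in the token distribution but only once in the type distribution, the two empirical distributions generally differ, which is why a model that fits both is a stronger requirement than one that fits only one of them.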

Acknowledgments

The authors would like to thank an anonymous referee for useful comments on earlier versions of the manuscript which helped us to improve the paper considerably.

Disclosure statement

No potential conflict of interest was reported by the authors.
