
The Probability Distribution of Textual Vocabulary in the English Language

Pages 49-70 | Published online: 23 Feb 2016
 

Abstract

The probability of a textual vocabulary is defined as the combined probability of the individual lemmas occurring in a text; these probabilities sum to 1 in the text itself but normally to less than 1 in a different text. If the text is expanded, the probability of the original textual vocabulary would be expected to fall below 1 in the expanded text. However, the present study reveals that as the text continues to expand, the probability of the original textual vocabulary does not decrease monotonically; instead, it quickly reaches a point at which it stabilizes despite further expansion of the text. In addition, the probability of a text's vocabulary occurring in other texts is not affected by the length of those texts. Mathematical models are formulated to capture the distribution of the probability of textual vocabulary in the English language.
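
The definition above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the article: probabilities are estimated as relative frequencies, whitespace-separated tokens stand in for lemmas (no lemmatizer is applied), and the function name `vocabulary_probability` and the toy sentences are hypothetical.

```python
from collections import Counter


def vocabulary_probability(source_tokens, target_tokens):
    """Summed relative frequency, in the target text, of the word types
    that occur in the source text.

    Assumption: relative frequency is used as the probability estimate,
    and raw tokens stand in for lemmas.
    """
    if not target_tokens:
        raise ValueError("target text must not be empty")
    vocabulary = set(source_tokens)    # the textual vocabulary of the source
    counts = Counter(target_tokens)    # token frequencies in the target
    covered = sum(counts[word] for word in vocabulary)
    return covered / len(target_tokens)


# Toy data, invented for illustration only.
original = "the cat sat on the mat".split()
expanded = original + "a dog ran in the park".split()

# Within its own text, the vocabulary accounts for every token.
print(vocabulary_probability(original, original))   # 1.0

# In the expanded text, only 7 of the 12 tokens belong to the original vocabulary.
print(vocabulary_probability(original, expanded))   # 7/12, about 0.583
```

Tracking this value while repeatedly extending `expanded` with further text would be one way to observe the stabilization described in the abstract, although the sketch does not reproduce the study's corpus or models.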

Disclosure statement

No potential conflict of interest was reported by the authors.
