Abstract
The probability of textual vocabulary is defined as the combined probability of the individual lemmas occurring in a text, which sums to 1 within that text but normally to less than 1 in a different text. If the text is expanded, the probability of the original textual vocabulary would be expected to fall below 1 in the expanded text. However, the present study reveals that as the text continues to expand, the probability of the original textual vocabulary does not decrease monotonically; instead, it quickly reaches a point at which it stabilizes despite further expansion of the text. In addition, the probability of a text's vocabulary occurring in other texts is not affected by the length of those texts. Mathematical models are formulated to capture the distribution of the probability of textual vocabulary in the English language.
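As an illustration of the definition only, the quantity can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function name vocabulary_probability and the toy sentences are hypothetical, and the inputs are assumed to be already lemmatized token lists.

```python
from collections import Counter


def vocabulary_probability(source_lemmas, target_lemmas):
    """Summed relative frequency, in the target text, of every lemma
    that occurs in the source text (hypothetical helper)."""
    if not target_lemmas:
        raise ValueError("target text must contain at least one token")
    vocabulary = set(source_lemmas)      # textual vocabulary of the source
    counts = Counter(target_lemmas)      # lemma frequencies in the target
    covered = sum(counts[lemma] for lemma in vocabulary)
    return covered / len(target_lemmas)


# Toy data, assumed to be lemmatized already (illustrative only).
original = "the cat sit on the mat".split()
expanded = original + "a dog chase the cat".split()

p_self = vocabulary_probability(original, original)      # equals 1 by definition
p_expanded = vocabulary_probability(original, expanded)  # below 1 after expansion
```

Measured against its own text, the vocabulary has probability 1. Measured against the expanded text, the value drops because the added tokens include lemmas outside the original vocabulary.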
Disclosure statement
No potential conflict of interest was reported by the authors.