Abstract
Zipf's law states that if the words of a language are sorted in order of decreasing frequency of usage, a word's frequency is inversely proportional to its rank, that is, its sequence number in the list. The Zipf-Mandelbrot law is a more general formula that provides a better fit in the low-rank region. Among the several models that aim to explain this effect, Mandelbrot's model is one of the best known. It derives Zipf's law from the optimization of the information/cost ratio, but leads to an unrealistic view of texts as random character sequences. In this article, a new modification of the model is proposed that is free from this drawback and allows the optimal information/cost ratio to be achieved via language evolution. It is demonstrated that the Zipf-Mandelbrot formula follows from this model, but its two parameters are not independent. As a result, the formula cannot convincingly be fitted to actual word frequency distributions.
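For readers who want the two laws in concrete form, the following is a minimal sketch, assuming the commonly used parameterization in which frequency is proportional to 1 / (rank + shift) ** exponent. The function names, the parameter names `exponent` and `shift`, and the numeric values in the usage lines are illustrative assumptions and are not taken from the article. Setting `shift` to 0 and `exponent` to 1 gives the plain Zipf form.

```python
def zipf_mandelbrot(rank, exponent=1.0, shift=0.0, scale=1.0):
    """Unnormalized Zipf-Mandelbrot frequency: scale / (rank + shift) ** exponent.

    With shift=0 and exponent=1 this reduces to Zipf's law (frequency ~ 1 / rank).
    Parameter names are hypothetical, chosen for this illustration only.
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return scale / (rank + shift) ** exponent


def normalized_frequencies(n_words, exponent=1.0, shift=0.0):
    """Relative frequencies for ranks 1..n_words, scaled so they sum to 1."""
    raw = [zipf_mandelbrot(r, exponent, shift) for r in range(1, n_words + 1)]
    total = sum(raw)
    return [x / total for x in raw]


# Plain Zipf: the second-ranked word is half as frequent as the first.
zipf = normalized_frequencies(1000)

# A positive shift flattens the curve in the low-rank region
# (the value 2.7 is arbitrary, used only to show the effect).
zm = normalized_frequencies(1000, exponent=1.0, shift=2.7)
```

Comparing `zipf[0] / zipf[1]` with `zm[0] / zm[1]` shows how the shift reduces the gap between the top-ranked words while leaving the high-rank tail close to a power law.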
Notes
1. As an example, consider the process known in linguistics by which so-called expressive synonyms change to regular words. A well-known example is Russian glaz “eye”, which initially meant “pebble”, then became expressive for “eye”, and gradually displaced the original word for “eye”, oko, of Indo-European descent. Another example is provided by French tête “head” < testa “crock, pot”, which started as an expressive synonym for “head” and eventually supplanted the original word chef in this sense.