Original Articles

Quantitative linguistics and complex system studies*

Pages 177-228 | Published online: 22 Aug 2008
 

Abstract

Linguistic discourses treated as maximum entropy systems of words, according to the prescriptions of algorithmic information theory (Kolmogorov, Chaitin, & Zurek), are shown to give a natural explanation of Zipf's law with quantitative rigor. The pattern of word frequencies in discourse naturally leads to a distinction between two classes of words: content words (c‐words) and service words (s‐words). A unified entropy model for the two classes of words leads to word frequency distribution functions in accordance with data. The model draws on principles of classical and quantum statistical mechanics and emphasises general principles of classifying, counting and optimising the related costs for coding of sequential symbols, under certain obvious constraints; hence it is likely to be valid for diverse complex systems of nature. Unlike other models of Zipf's law, which require an exponential distribution of word lengths, entropy models based on words as primary symbols do not restrict the word length distribution. It is shown that language exhibits the characteristics of complex adaptive systems (Gell‐Mann, 1994), in which the complexity measure is maximal for a system of intermediate algorithmic entropy, between totally ordered and totally disordered systems. A complexity function, a higher‐order entropy with these properties, is defined for linguistic discourse. Natural discourses indeed seem to have the right mix of order and randomness and a complexity close to maximal.
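As a rough illustration of the quantities the abstract refers to, the following Python sketch computes a rank-frequency list for a text, the Shannon entropy of its empirical word distribution, and a least-squares slope of log frequency against log rank (a slope near -1 is the classic form of Zipf's law). This is not the authors' entropy model: the function names, the whitespace tokenisation, and the use of plain Shannon entropy are assumptions made only for this example.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent.

    Assumption: words are lower-cased, whitespace-separated tokens.
    """
    words = text.lower().split()
    return sorted(Counter(words).values(), reverse=True)


def shannon_entropy(counts):
    """Shannon entropy, in bits per word, of the empirical word distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two ranks. A value close to -1 corresponds to
    frequency proportional to 1/rank.
    """
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

On a real corpus one would pass the full text to `rank_frequency` and then inspect `shannon_entropy` and `zipf_slope` of the result; short texts give noisy slopes, so the fit is only meaningful for reasonably long discourses.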

Notes

We are grateful to Mr. M.S. Nanjundiah for lending us the CD‐ROM (Library of the Future™ Series). We thank Prof. R. Köhler for several excellent suggestions to improve the paper and Prof. G. Altmann for valuable comments. We thank Prof. P. Grassberger for permission to reproduce Figure 1a (Grassberger, 1986); Prof. S. Ramaseshan, Editor, Current Science, and Dr. H.K. Khanna, Editor, Journal of Scientific and Industrial Research, for allowing reproductions of figures from our earlier publications [Figures 4 and 5a, Naranan & Balasubrahmanyan (1992b), and Figure 6, Naranan & Balasubrahmanyan (1993)]. We acknowledge the help of Mr. T.V. Suresh, Dr. Rahul Sinha and Dr. G. Subramoniam in processing the manuscript for e‐mail. Prof. R. Ramachandran, Director, Institute of Mathematical Sciences, Madras, has kindly made available to us the excellent e‐mail facilities of the Institute, and we are grateful to him. We thank Mr. N.M. Malwad for access to Dewey (1923).

Address correspondence to: S. Naranan, 20 A/3, Second Cross Street, Jayaramnagar, Thiruvanmiyur, Madras 600041, India.
