Original Articles

Information Theory and Algorithmic Complexity: Applications to Language Discourses and DNA Sequences as Complex Systems Part II: Complexity of DNA Sequences, Analogy with Linguistic Discourses

 

Abstract

Linguistic discourses and DNA sequences in molecular biology are treated as complex adaptive systems with interacting, coexisting elements of order and randomness. Following Gell-Mann's prescription for the 'effective complexity' of a system, we earlier defined a complexity function C for a linguistic discourse. C depends on two 'order' parameters, x and a, which in turn depend on two kinds of entropy: Shannon entropy and algorithmic (Kolmogorov) entropy. Algorithmic complexity is used to define an Optimum Meaning Preserving Code (OMPC) which, unlike the Shannon entropy, preserves the 'meaning' of a particular word sequence. C tends to 0 for systems of low as well as high order and reaches its maximum (C = 1) for a mixture of order and disorder. The starting point for our analysis is the distribution of word frequencies, Zipf's law, which is a power law, $W(k) = B\,k^{-2}$, where W(k) is the frequency of words occurring k times and B is a constant. In earlier papers, we deduced a modified version of Zipf's law, the Modified Power Law (MPL), which was in better agreement with data from natural languages. The model used the physical principles of maximum entropy and degeneracy from classical and quantum statistical mechanics. The model was extended to speech, treated as a small invariant set of phonemes, to obtain a law similar to the MPL, called the Cumulative Modified Power Law (CMPL), which adequately fits the phoneme rank frequencies. It was shown that the near-maximal value of complexity (~1) is a consequence of Zipf's law. In this paper, we extend the above concepts to DNA sequences treated as strings of symbols from a four-letter alphabet (bases A, G, C, U). The genetic code is examined at three hierarchical levels of codons (64, 26, 21). Codon rank frequencies of 20 different species are shown to follow the CMPL. Entropy, order and complexity parameters for DNA are numerically similar to those obtained for language. Complexity is ~1 for all 20 species, which span a wide range of evolutionary age. Some parameters show significant correlation with the evolutionary age of the species.
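
As a rough illustration of the quantities named in the abstract, the sketch below computes codon rank frequencies and the Shannon entropy of the codon distribution for a toy sequence. It is not the authors' procedure: the CMPL fit and the complexity function C are not specified here and are not reproduced. The function names, the choice of reading frame 0, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def codon_rank_frequencies(seq):
    """Return (codon, relative frequency) pairs, most frequent first.

    Assumption: the sequence is read in frame 0 as non-overlapping
    triplets; any trailing one or two bases are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return [(codon, n / total) for codon, n in counts.most_common()]


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# Hypothetical toy sequence (5 codons), using the A, G, C, U alphabet.
toy = "AUGGCUGCUGCUUAA"
ranked = codon_rank_frequencies(toy)
entropy_bits = shannon_entropy(p for _, p in ranked)

for rank, (codon, freq) in enumerate(ranked, start=1):
    print(rank, codon, round(freq, 3))
print("Shannon entropy (bits):", round(entropy_bits, 4))
```

For the full set of 64 codons the entropy is bounded above by log2(64) = 6 bits, which is reached only when all codons are equally frequent.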
