Information Theory and Algorithmic Complexity: Applications to Linguistic Discourses and DNA Sequences as Complex Systems Part I: Efficiency of the Genetic Code of DNA: Journal of Quantitative Linguistics: Vol 7, No 2

Abstract

The genetic code is a mapping of 64 possible triplet codons from a 4-letter alphabet (A, C, G, U) into 20 amino acids and one STOP signal for protein synthesis. The pattern of degeneracies in codon assignments to amino acids suggests that there is an underlying variable length code, which meets all the optimal coding criteria dictated by Shannon’s information theory. The genetic code can be viewed as an instantaneous prefix code with unique decipherability and compactness. Optimal codon assignments and average code lengths for 20 species from 10 groups in the phylogenetic tree have been determined from the available data on codon usage, using the Huffman algorithm. The average binary code length of the genetic code exceeds the optimal average length of the Huffman code by only 2 to 5%, showing that the genetic code is close to optimal. Functionally, however, the genetic code is a fixed length code of 6 binary bits (3 bases). This provides the needed redundancy (≈25%) for tolerance of errors in the DNA sequences due to mutations. This hybrid character of the genetic code, combining the advantages of variable and fixed length codes, supports a speculation that in the past the genetic code could have been a variable length code which has evolved to its modern version. The DNA sequence bears striking similarities to linguistic discourses from an information-theoretic point of view. Both are complex adaptive systems with intrinsic elements of order and randomness. The complexity parameter, which we had defined earlier for linguistic discourses, is close to maximal in both DNA and natural languages. In this article, the first of two parts, we have focused on the variable length genetic code. In Part II, we deal with the DNA sequence treated as a complex adaptive system and some possible correlations of certain parameters with evolution.
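
The comparison described in the abstract, between an optimal Huffman code and the fixed 6-bit codon length, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names `huffman_code_lengths` and `average_length` are hypothetical, and the uniform usage counts below are a placeholder assumption standing in for the real codon-usage data, which the abstract does not reproduce.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a binary Huffman code.

    `freqs` maps each symbol to a non-negative weight (count or probability).
    """
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {s: 1 for s in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def average_length(freqs, lengths):
    """Weighted mean code length in bits per symbol."""
    total = sum(freqs.values())
    return sum(freqs[s] * lengths[s] for s in freqs) / total


# Assumption: 20 amino acids plus STOP, all equally frequent.
# Real codon-usage tables are species specific and far from uniform.
SYMBOLS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
    "STOP",
]
usage = {s: 1 for s in SYMBOLS}

lengths = huffman_code_lengths(usage)
avg_bits = average_length(usage, lengths)
FIXED_BITS = 6  # 3 bases at 2 bits per base
redundancy = 1 - avg_bits / FIXED_BITS

print(f"Huffman average length: {avg_bits:.3f} bits")
print(f"Redundancy of the 6-bit fixed code: {redundancy:.1%}")
```

Under the uniform placeholder the Huffman average is 94/21, about 4.48 bits, so the 6-bit fixed code carries roughly 25% redundancy relative to it. The figures reported in the abstract rest on measured codon usage for each species and may be defined differently, so this sketch should be read only as an illustration of the kind of calculation involved.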
