Original Articles

Linguistics of Nucleotide Sequences I: The Significance of Deviations from Mean Statistical Characteristics and Prediction of the Frequencies of Occurrence of Words

Pages 1013-1026 | Received 05 Sep 1989, Published online: 15 May 2012
 

Abstract

Mathematical models of the generation of genetic texts appeared simultaneously with the first sequenced DNA. They are used to establish functional and evolutionary relations between genetic texts, to predict the number and distribution of specific sites in a sequence, and to identify “meaningful” words. The present paper deals with two problems:

1) The significance of deviations from the mean statistical characteristics in a genetic text.

Anyone who has addressed himself to the statistical analysis of sequenced DNA is familiar with the question: what deviations from the expected frequencies of occurrence of particular words testify to the “biological” significance of those words? We propose a formula for the variance of the number of occurrences of a word in the text, with allowance for word overlaps, making it possible to assess the significance of deviations from the expected statistical characteristics.

2) A new method for predicting the frequencies of occurrence of particular words in a genetic text using the statistical characteristics of “spaced” L-grams.

The method can be used for predicting the number of restriction sites in human DNA and in planning experiments on the physical mapping and sequencing of the human genome.
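To illustrate the first problem, the following is a minimal sketch of how the mean and variance of a word's occurrence count can be computed with allowance for self-overlaps. It assumes the simplest text model, in which letters are drawn independently with fixed probabilities; the abstract does not state the paper's formula or model, so this is the standard result for that assumed model and not necessarily the authors' expression. The function names and the `probs` dictionary are illustrative choices.

```python
from math import prod


def word_probability(word, probs):
    """Probability of `word` at a fixed position, assuming independent letters."""
    return prod(probs[c] for c in word)


def expected_count(word, n, probs):
    """Expected number of (possibly overlapping) occurrences in a text of length n."""
    positions = max(n - len(word) + 1, 0)
    return positions * word_probability(word, probs)


def count_variance(word, n, probs):
    """Variance of the occurrence count under the independent-letter assumption.

    Start positions at least len(word) apart are independent, so only
    shifts d = 1 .. len(word) - 1 contribute covariance terms.
    """
    m = len(word)
    positions = n - m + 1
    if positions <= 0:
        return 0.0
    p = word_probability(word, probs)
    var = positions * p * (1 - p)
    for d in range(1, m):
        pairs = positions - d  # number of position pairs at shift d
        if pairs <= 0:
            break
        if word[d:] == word[:m - d]:
            # The word can overlap itself at shift d: the joint event is the
            # word followed by its last d letters.
            joint = p * word_probability(word[m - d:], probs)
        else:
            joint = 0.0  # two occurrences at this shift are impossible
        var += 2 * pairs * (joint - p * p)
    return var
```

Given an observed count, a deviation can then be expressed in standard deviations as `(observed - expected_count(word, n, probs)) / count_variance(word, n, probs) ** 0.5`. Self-overlapping words such as "AAA" receive a larger variance than non-overlapping words of the same probability, which is why ignoring overlaps can overstate significance.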
