Abstract

117 Analysis of SNP containing sites in human genome using text complexity estimates

Owing to breakthroughs in DNA sequencing technologies, the amount of available genomic data, including information on natural genome variability, grows exponentially each year. The genomic context of single nucleotide polymorphisms (SNPs) represented in the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/) is of great interest. An association between SNP positions in the human genome and mononucleotide repeats was shown earlier. We studied context dependencies on a broader scale in the human, mouse and rat genomes using several complexity measures. Text complexity is an important mathematical feature for exploring contextual dependencies in nucleotide sequences (Orlov, Te Boekhorst, & Abnizova, 2006). Different complexity measures capture different features of the nucleotide text: linguistic complexity relates to oligonucleotide vocabularies, complexity estimated by Lempel–Ziv compression relates to the structure of repeats in the text, and Shannon entropy reflects variation in nucleotide composition. These algorithms, previously used in the “Complexity” software developed at the Institute of Cytology and Genetics in Novosibirsk (http://wwwmgs.bionet.nsc.ru/mgs/programs/lzcomposer/), have been re-implemented in a computer program supplemented with weight complexity measures and measures of the rotation of monomers. We analyzed nucleotide sequences containing SNPs in the human, mouse and rat genomes using an in-house C++ program that calculates averaged text complexity profiles. We analyzed more than 2.7 million SNP-containing sites (±20 nt) in the human genome, available in the UCSC Genome Browser tables and in the “1000 Genomes” project (http://www.1000genomes.org/data). The presence of low-complexity sites in the regions flanking SNPs in the human genome was demonstrated statistically. The same effect was confirmed for samples of SNPs in the mouse and rat genomes.
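
The three measures named above, and the averaging of a complexity profile over SNP-centered sites, can be illustrated with a minimal Python sketch. It is not the in-house C++ program described in the abstract. The function names, the sliding-window scheme, the `max_k` cutoff, and the particular definitions chosen (linguistic complexity as the ratio of observed to maximum possible distinct k-mers, Lempel–Ziv complexity as the number of phrases in an LZ76-style parsing) are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(text):
    """Shannon entropy (bits per symbol) of the nucleotide composition."""
    counts = Counter(text)
    return sum(-(c / len(text)) * log2(c / len(text)) for c in counts.values())


def linguistic_complexity(text, max_k=3):
    """Distinct k-mers observed, divided by the maximum possible, for k = 1..max_k."""
    observed = possible = 0
    for k in range(1, max_k + 1):
        n_kmers = len(text) - k + 1
        if n_kmers <= 0:
            break
        observed += len({text[i:i + k] for i in range(n_kmers)})
        possible += min(4 ** k, n_kmers)
    return observed / possible


def lempel_ziv_complexity(text):
    """Number of phrases in an LZ76-style parsing (fewer phrases = more repeats)."""
    i, phrases, n = 0, 0, len(text)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from earlier text (overlap allowed).
        while i + length <= n and text[i:i + length] in text[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def averaged_profile(sites, window, measure):
    """Mean of `measure` over all sites at each sliding-window offset.

    Assumes all sites have equal length, e.g. 41 nt for a SNP with +/-20 nt flanks.
    """
    n_offsets = len(sites[0]) - window + 1
    totals = [0.0] * n_offsets
    for site in sites:
        for offset in range(n_offsets):
            totals[offset] += measure(site[offset:offset + window])
    return [total / len(sites) for total in totals]


# Toy usage with made-up sequences (not data from the study).
toy_sites = ["ACGTAAAAAGCT", "TTGCAAAAACGA"]
profile = averaged_profile(toy_sites, window=6, measure=shannon_entropy)
```

A dip in such a profile near the central position would correspond to the low-complexity flanks reported here; whether a dip is significant requires a comparison with control sites, which this sketch does not attempt.
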
The effect of mononucleotide repeats adjacent to SNP positions (Siddle, Goodship, Keavney, & Santibanez-Koref, 2011) was confirmed on new data, including model mammalian genomes. Note that low-complexity profiles carry more information than measures of mononucleotide tracts alone. This effect was found in model genomes (Levitsky, Babenko, & Vershinin, 2014; Matushkin, Levitsky, Orlov, Likhoshvai, & Kolchanov, 2013). Irregularities in the distribution of mutation hot spots in the genome were shown earlier on limited data. The observed lowering of text complexity in the flanks of SNP positions can be explained by an increased frequency of breaks in the DNA double helix at the flanking positions.
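
As a rough illustration of the mononucleotide-repeat effect, the following Python sketch measures the homopolymer runs that directly border the central (SNP) position of a site and reports the share of sites next to a run of a given minimum length. The function names and the `min_run=4` threshold are hypothetical choices for this example and are not taken from Siddle et al. (2011) or from the present study.

```python
def adjacent_run_lengths(site, center):
    """Lengths of the homopolymer runs touching `center` on the left and right."""
    def run_length(start, step):
        if not 0 <= start < len(site):
            return 0
        base, i, length = site[start], start, 0
        # Walk away from the center while the same base repeats.
        while 0 <= i < len(site) and site[i] == base:
            length += 1
            i += step
        return length

    return run_length(center - 1, -1), run_length(center + 1, 1)


def fraction_next_to_repeat(sites, min_run=4):
    """Share of sites whose central position borders a run of >= min_run bases.

    The center is taken as the middle index, so sites should have odd length.
    """
    hits = sum(
        1 for site in sites
        if max(adjacent_run_lengths(site, len(site) // 2)) >= min_run
    )
    return hits / len(sites)


# Toy usage with made-up sequences (not data from the study).
toy_sites = ["AAAATCCGG", "ACGTACGTA"]
share = fraction_next_to_repeat(toy_sites, min_run=4)
```

Comparing this share between SNP-centered sites and control positions is one simple way to test the association; a complexity profile additionally reflects repeats other than homopolymer runs.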

The research has been supported by ICG SB RAS budget project VI.61.1.2 and RFBR 14-04-01906.

References

  • Levitsky, V. G., Babenko, V. N., & Vershinin, A. V. (2014). The roles of the monomer length and nucleotide context of plant tandem repeats in nucleosome positioning. Journal of Biomolecular Structure and Dynamics, 32, 115–126. doi:10.1080/07391102.2012.755796
  • Matushkin, Y. G., Levitsky, V. G., Orlov, Y. L., Likhoshvai, V. A., & Kolchanov, N. A. (2013). Translation efficiency in yeasts correlates with nucleosome formation in promoters. Journal of Biomolecular Structure and Dynamics, 31, 96–102. doi:10.1080/07391102.2012.691366
  • Orlov, Y. L., Te Boekhorst, R., & Abnizova, I. I. (2006). Statistical measures of the structure of genomic sequences: Entropy, complexity, and position information. Journal of Bioinformatics and Computational Biology, 4, 523–536. doi:10.1142/S0219720006001801
  • Siddle, K. J., Goodship, J. A., Keavney, B., & Santibanez-Koref, M. F. (2011). Bases adjacent to mononucleotide repeats show an increased single nucleotide polymorphism frequency in the human genome. Bioinformatics, 27, 895–898. doi:10.1093/bioinformatics/btr067
