Article; Bioinformatics

Towards computational improvement of DNA database indexing and short DNA query searching

Pages 958-967 | Received 31 Oct 2013, Accepted 09 Jul 2014, Published online: 31 Oct 2014

References

  • Kirilov KT, Golshani A, Ivanov IG. Termination codons and stop codon context in bacteria and mammalian mitochondria. Biotechnol Biotechnol Equip. 2013;27(4):4018–4025.
  • Kirilov K, Ivanov I. A programme for determination of codons and codons context frequency of occurrence in sequenced genomes. Biotechnol Biotechnol Equip. 2012;26(5):3310–3314.
  • Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–453.
  • Smith T, Waterman M. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195–197.
  • Lipman DJ, Pearson WR. Rapid and sensitive protein similarity searches. Science. 1985;227(4693):1435–1441.
  • Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410.
  • Blast online tool [Internet]. Rockville Pike, Bethesda: National Library of Medicine; [cited 2014 Apr 20]. Available from: http://blast.ncbi.nlm.nih.gov/Blast.cgi.
  • Stokes WA, Glick BS. MICA: desktop software for comprehensive searching of DNA databases. BMC Bioinform. 2006;7:427.
  • Meek C, Patel JM, Kasetty S. OASIS: an online and accurate technique for local-alignment searches on biological sequences. In: Freytag JC, Lockemann PC, Abiteboul S, Carey MJ, Selinger PG, Heuer A, editors. Proceedings of 29th International Conference on Very Large Data Bases; 2003 Sep 9–12; Berlin: Elsevier Science & Technology; 2003.
  • Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Res. 1999;27(11):2369–2376.
  • Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 2002;30:2478–2483.
  • Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12.
  • Abouelhoda MI, Kurtz S, Ohlebusch E. Replacing suffix trees with enhanced suffix arrays. J Discrete Algorithms. 2004;2:53–86.
  • Kurtz S. Reducing the space requirement of suffix trees. Softw Pract Exp. 1999;29(13):1149–1171.
  • Khan Z, Bloom JS, Kruglyak L, Singh M. A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays. Bioinformatics. 2009;25(13):1609–1616.
  • Vyverman M, De Baets B, Fack V, Dawyndt P. EssaMEM: finding maximal exact matches using enhanced sparse suffix arrays. Bioinformatics. 2013;29(6):802–804.
  • Ferragina P, Manzini G. Opportunistic data structures with applications. In: IEEE, editor. Proceedings of the 41st IEEE Symposium on Foundations of Computer Science; 2000 Nov 12–14; Redondo Beach: IEEE Computer Society; 2000.
  • Grossi R, Vitter JS. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J Comput. 2005;35(2):378–407.
  • Burrows M, Wheeler DJ. A block-sorting lossless data compression algorithm. Palo Alto (CA): Digital Equipment Corporation; 1994. (Technical Report 124).
  • Lippert RA, Mobarry CM, Walenz BP. A space-efficient construction of the Burrows–Wheeler transform for genomic data. J Comput Biol. 2005;12(7):943–951.
  • Lippert RA. Space-efficient whole genome comparisons with Burrows–Wheeler transforms. J Comput Biol. 2005;12(4):407–415.
  • Lam TW, Sung WK, Tam SL, Wong CK, Yiu SM. Compressed indexing and local alignment of DNA. Bioinformatics. 2008;24(6):791–797.
  • Ohlebusch E, Gog S, Kügel A. Computing matching statistics and maximal exact matches on compressed full-text indexes. In: Chávez E, Lonardi S, editors. Proceedings of the 17th Annual Symposium on String Processing and Information Retrieval; 2010 Oct 11–13; Los Cabos: Springer; 2010.
  • Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–580.
  • Rigoutsos I, Floratos A. Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics. 1998;14(1):55–67.
  • Miller C, Gurd J, Brass A. A RAPID algorithm for sequence database comparisons: application to the identification of vector contamination in the EMBL databases. Bioinformatics. 1999;15(2):111–121.
  • Giladi E, Walker MG, Wang JZ, Volkmuth W. SST: an algorithm for finding near-exact sequence matches in time proportional to the logarithm of the database size. Bioinformatics. 2002;18(6):873–877.
  • Ning Z, Cox AJ, Mullikin JC. SSAHA: a fast search method for large DNA databases. Genome Res. 2001;11(10):1725–1729.
  • Reneker J, Shyu C-R. Refined repetitive sequence searches utilizing a fast hash function and cross species information retrievals. BMC Bioinform. 2005;6(1):111.
  • Kent WJ. BLAT – the BLAST-like alignment tool. Genome Res. 2002;12(4):656–664.
  • The European Nucleotide Archive [Internet]. Heidelberg: The European Molecular Biology; [cited 2014 Apr 20]. Available from: http://www.ebi.ac.uk/ena/.
  • Simpson JT, Durbin R. Efficient construction of an assembly string graph using the FM-index. Bioinformatics. 2010;26(12):i367–i373.