Original Articles

Estimation of Sequencing Error Rates Present in Genome Databases

Pages 3302-3305 | Published online: 16 Apr 2014
 

ABSTRACT

The quality of next-generation sequencing data is a major concern in bioinformatics today. Validating sequences, either by re-sequencing or by purely statistical error evaluation, is necessary to ensure that all downstream research using the data produces correct results.

Estimating the error rates in genome databases indicates the level of errors carried over into genome sequences. This matters because such errors accumulate through every subsequent step of sequence analysis. Here we present a way to determine the error level in a genome using two databases: the National Center for Biotechnology Information (NCBI), treated as a verified database, and Resources for Plant Comparative Genomics (PlantGDB), used as the reference. Based on the most conserved regions in every genome, the donor and acceptor splice sites (whose canonical forms are the dinucleotides GT or GC at the donor and AG at the acceptor), we applied statistical methods to derive the NCBI error level for the Oryza sativa (japonica cultivar) genome.
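To make the splice-site idea concrete, here is a minimal Python sketch. It is not the authors' method. It assumes each intron is available as a plain nucleotide string, counts introns whose donor or acceptor dinucleotide is non-canonical, and attaches a Wilson score interval to that fraction. The function names, the choice of the Wilson interval, and the naive conversion to a per-base rate (which assumes every error among the four splice-site bases yields a non-canonical site) are illustrative assumptions.

```python
import math
from typing import Iterable

# Canonical splice-site dinucleotides as stated in the abstract:
# donor GT or GC, acceptor AG.
CANONICAL_DONORS = {"GT", "GC"}
CANONICAL_ACCEPTORS = {"AG"}


def is_canonical(intron: str) -> bool:
    """True if the intron starts with a canonical donor and ends with a canonical acceptor."""
    seq = intron.upper()
    return seq[:2] in CANONICAL_DONORS and seq[-2:] in CANONICAL_ACCEPTORS


def noncanonical_fraction(introns: Iterable[str]) -> tuple[int, int]:
    """Return (number of non-canonical introns, number of introns examined)."""
    total = 0
    bad = 0
    for intron in introns:
        if len(intron) < 4:
            continue  # too short to carry both a donor and an acceptor dinucleotide
        total += 1
        if not is_canonical(intron):
            bad += 1
    return bad, total


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion k/n (z=1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("no observations")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def per_base_error(p_site: float, positions: int = 4) -> float:
    """Naive per-base error rate, ASSUMING every erroneous base among the
    `positions` splice-site bases makes the site non-canonical and that
    no genuine non-canonical introns exist."""
    return 1 - (1 - p_site) ** (1 / positions)


if __name__ == "__main__":
    # Toy introns (hypothetical data): one has a non-canonical AT donor.
    toy = ["GTAAGCAG", "GCAAAAAG", "ATAAAAAC", "GTCCCCAG"]
    bad, total = noncanonical_fraction(toy)
    lo, hi = wilson_interval(bad, total)
    print(bad, total, lo, hi, per_base_error(bad / total))
```

Running the same counts on introns taken from NCBI and from PlantGDB gives two proportions that can then be compared. Note that the naive per-base conversion overestimates the error rate when a genome contains true non-canonical splice sites, such as the rare AT-AC introns.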
