Abstract
Optimal string alignment is used to discover evolutionary relationships or mutations in DNA/RNA or protein sequences. Errors, missing parts or uncertainty in such a sequence can be represented by wild cards, so-called wild bases. This makes an alignment possible even when the data are corrupted or incomplete. Extending pairwise local alignment to DNA/RNA sequences that contain wild cards requires additional calculations in the dynamic programming algorithm and a subsequent best- and worst-case analysis of the wild card positions. In this paper, we propose an algorithm that handles wild cards in the input data, offers a highly flexible set of parameters, and produces both a detailed alignment output and a compact representation of the mutated positions of the alignment. An implementation of the algorithm can be obtained at https://github.com/sysbio-bioinf/swat+ and http://sysbio.uni-ulm.de/?Software:Swat+.
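As an illustration of the idea only, the following minimal sketch computes a Smith-Waterman-style local alignment score over sequences that may contain IUPAC wild bases. It is not the Swat+ implementation: the function names, the scoring values (match 2, mismatch -1, linear gap -2) and the interpretation of best and worst case are assumptions made for this example. Here the best case treats two symbols as matching if their sets of possible bases overlap, and the worst case accepts a match only between two identical unambiguous bases.

```python
# Illustrative sketch, not the Swat+ code. All names and scores are assumptions.

# IUPAC nucleotide codes mapped to the bases each symbol may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pair_score(x, y, match=2, mismatch=-1, best_case=True):
    """Score one aligned pair of symbols under a best- or worst-case reading."""
    a, b = set(IUPAC[x]), set(IUPAC[y])
    if best_case:
        # Optimistic: some assignment of the wild bases makes the pair match.
        compatible = bool(a & b)
    else:
        # Pessimistic: only identical, unambiguous bases are certain to match.
        compatible = len(a) == 1 and a == b
    return match if compatible else mismatch


def local_score(s, t, gap=-2, best_case=True):
    """Return the best local alignment score (linear gap penalty)."""
    prev = [0] * (len(t) + 1)  # previous row of the dynamic programming table
    best = 0
    for x in s:
        cur = [0]
        for j, y in enumerate(t, start=1):
            cell = max(
                0,                                                  # start a new local alignment
                prev[j - 1] + pair_score(x, y, best_case=best_case),  # align x with y
                prev[j] + gap,                                      # gap in t
                cur[j - 1] + gap,                                   # gap in s
            )
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

With these assumed scores, aligning "ACNT" against "ACGT" gives a best-case score of 8 (N is read as G) and a worst-case score of 5 (N is counted as a mismatch), which brackets the score the true, unknown base would yield.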
Acknowledgements
Axel Fürstberger and Markus Maucher contributed equally to this paper.
Funding
This work was funded in part by the German Science Foundation (DFG, SFB1074, Project Z1) and the German Federal Ministry of Education and Research (BMBF) within the framework GERONTOSYS (Forschungskern SyStaR, project ID 0315894A).