Abstract
Genome subsequence assembly plays an important role in personalized medicine and in the study of genome variation. Third-generation sequencing technologies pose crucial problems for subsequence assembly. For example, mapping the subsequences, or finding the identical positions in them, is one of the hard challenges in bioinformatics. In this paper, we introduce a novel methodology, MapReduce Maximum Exact Matches (MR-MEM), which uses the MapReduce programming model to find and map matches between genome subsequences with a parallel suffix and prefix index structure. The proposed technique works by aligning fragments against the reference genome. A fragment subsequence is first matched with the genome to identify the probable matching locations. These candidate locations are then analyzed for complete matches. The best-matching fragment for each location is selected by computing the Hamming distance between the query sequence and the reference genome. The implementation results show that the proposed approach yields faster and accurate alignments, with very few gaps and a very high share of exact alignments.
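The match-then-verify idea described in the abstract can be illustrated with a minimal, single-machine sketch. This is not the authors' MR-MEM implementation: it does not use MapReduce, and a plain k-mer dictionary stands in for the parallel suffix and prefix index. The function names (`hamming`, `build_kmer_index`, `best_location`), the seed length `k`, and the toy sequences are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts.

    Stand-in (assumption) for the suffix/prefix index named in the abstract.
    """
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def best_location(fragment, reference, k=4):
    """Return (start, distance) of the best candidate, or None if no seed hits."""
    index = build_kmer_index(reference, k)

    # Step 1: exact k-mer hits give probable start positions for the fragment.
    candidates = set()
    for offset in range(len(fragment) - k + 1):
        for pos in index.get(fragment[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(fragment):
                candidates.add(start)

    # Step 2: score each candidate window by Hamming distance and keep
    # the smallest (ties go to the leftmost start).
    best = None
    for start in sorted(candidates):
        window = reference[start:start + len(fragment)]
        distance = hamming(fragment, window)
        if best is None or distance < best[1]:
            best = (start, distance)
    return best


reference = "ACGTACGTTAGCCGATAGGCTA"   # toy reference, not real genome data
print(best_location("TAGCCGTT", reference))  # (8, 1): one mismatch at start 8
```

In a MapReduce setting, the candidate search and the scoring would presumably be distributed across mappers and reducers; the sketch only shows the per-fragment logic.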
Acknowledgment
The authors would like to acknowledge the infrastructure support provided by the Parallel Computing and Bioinformatics Lab and Big Data Laboratory, Department of Computer Applications, NIT, Trichy.
Additional information
Notes on contributors
G. Raja
G. Raja received his MTech degree from Bharathidasan University, Trichy. He is currently pursuing his PhD in the Department of Computer Applications, National Institute of Technology, Trichy. His research interests include bioinformatics and big data analytics. Corresponding author. Email: [email protected]
U. Srinivasulu Reddy
U. Srinivasulu Reddy received his PhD from the National Institute of Technology, Trichy, and his MPhil and MCA from Bharathidasan University, Trichy. He is currently working as an assistant professor in the Department of Computer Applications, National Institute of Technology, Trichy. He is a lifetime member of the Computer Society of India (CSI). His research interests include big data analytics, machine learning and bioinformatics. Email: [email protected]