
Maximum Exact Matches for High Throughput Genome Subsequence Assembly


Abstract

Genome subsequence assembly plays an important role in personalized medicine and the study of genome variation. Third-generation sequencing technologies have made subsequence assembly a critical problem. For example, mapping subsequences, or finding the positions at which they are identical, is one of the hard challenges in bioinformatics. In this paper, we introduce a novel methodology, MapReduce Maximum Exact Matches (MR-MEM), which uses the MapReduce programming model to find and map matches between genome subsequences using parallel suffix and prefix index structures. The proposed technique works by aligning fragments against a reference genome. A fragment subsequence is first matched against the genome to identify probable matching locations. These candidate locations are then analyzed for complete matches. The fragment is assigned to the best-matching location, which is determined by computing the Hamming distance between the query sequence and the reference genome. The implementation results show that the proposed approach produces fast and accurate alignments, with very few gaps and a very high proportion of exact alignments.
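The seed-then-verify flow described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' MR-MEM implementation: it replaces the parallel suffix and prefix index with a plain k-mer hash index and omits MapReduce entirely. The function names, the seed length `k`, and the sample sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each length-k substring of the reference to its start positions.

    Assumption: a hash index stands in for the paper's suffix/prefix index.
    """
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_locations(fragment, index, k, ref_len):
    """Seed step: collect reference start offsets implied by exact k-mer hits."""
    candidates = set()
    for offset in range(len(fragment) - k + 1):
        for pos in index.get(fragment[offset:offset + k], ()):
            start = pos - offset
            # Keep only placements where the whole fragment fits in the reference.
            if 0 <= start <= ref_len - len(fragment):
                candidates.add(start)
    return candidates


def best_alignment(fragment, reference, index, k):
    """Verify step: return (start, distance) with the lowest Hamming distance.

    Returns None when no seed hits the reference. Ties go to the leftmost start.
    """
    best = None
    for start in sorted(candidate_locations(fragment, index, k, len(reference))):
        distance = hamming(fragment, reference[start:start + len(fragment)])
        if best is None or distance < best[1]:
            best = (start, distance)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"  # toy sequence, not real data
    index = build_kmer_index(reference, 4)
    print(best_alignment("TAGCCGAT", reference, index, 4))
    print(best_alignment("TAGCCGTT", reference, index, 4))
```

In a MapReduce setting, one plausible arrangement would have mappers emit candidate locations per fragment and reducers pick the minimum-distance location, but the abstract does not specify how the work is divided, so that split is only a guess.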

Acknowledgment

The authors would like to acknowledge the infrastructure support provided by the Parallel Computing and Bioinformatics Lab and Big Data Laboratory, Department of Computer Applications, NIT, Trichy.

Additional information

Notes on contributors

G. Raja

G. Raja received his MTech degree from Bharathidasan University, Trichy. He is currently pursuing his PhD in the Department of Computer Applications, National Institute of Technology, Trichy. His research interests include bioinformatics and big data analytics. Corresponding author. Email: [email protected]

U. Srinivasulu Reddy

U. Srinivasulu Reddy received his PhD from the National Institute of Technology, Trichy, and his MPhil and MCA from Bharathidasan University, Trichy. He is currently working as an assistant professor in the Department of Computer Applications, National Institute of Technology, Trichy. He is a life member of the Computer Society of India (CSI). His research interests include big data analytics, machine learning, and bioinformatics. Email: [email protected]
