119
Views
0
CrossRef citations to date
0
Altmetric
Articles

Fast Pattern Matching in Compressed Text using Wavelet Tree

, ORCID Icon &
Pages 87-99 | Published online: 25 Jul 2017
 

ABSTRACT

With the recent increase in the size of data, compression has become essential tool for handling massive data. Compression is usually more helpful in handling huge data over the networks, large text collections, and storing massive biological data. It is also useful in searching a pattern (of any kind) inside the huge data set. Compressed pattern matching (CPM) is the task of performing string matching in a compressed text without decompressing it. In this type of matching, pattern may or may not be compressed. This paper presents an efficient algorithm (WBTC_WT) for matching a pattern directly inside the compressed text. The proposed algorithm uses the tagged sub optimal code category of algorithm called word-based tagged code (WBTC) and a self-indexed data structure called wavelet tree. WBTC is used for encoding the text and wavelet tree provides fast searching over the compressed text. WBTC_WT removes the problem of false matches, encountered in some of the previous approaches and is able to match arbitrary portion of text without decompressing it. The proposed algorithm is implemented, analyzed, and compared with existing approaches on huge data set. From the simulation results, it is found that our algorithm outperforms the existing algorithms in most of the cases.

ACKNOWLEDGEMENTS

Authors wish to acknowledge Dr. Hari Mohan Singh, Assistant Professor, Department of Computer Science and IT, SHUATS, Naini, Allahabad, India, for his continuous support.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Surya Prakash Mishra

Surya Prakash Mishra is working as an assistant professor at the Department of Computer Science & Information Technology, SHUATS, Naini, Allahabad, India. He received his BSc in PCM from University of Allahabad, Allahabad, India and MCA from, UP Technical University, Lucknow, India. He has more than 10 years of teaching experience at undergraduate and graduate levels. He has published more than seven research papers in International journals and conferences. His research interests include pattern matching, information retrieval, computer algorithms and data compression.

E-mail: [email protected]

Rajesh Prasad

Rajesh Prasad is currently working as an assistant professor at the School of IT and Computing, American University of Nigeria, Yola, Nigeria. He received his MTech in software engineering and PhD in computer science and engineering from MNNIT, Allahabad, India. He has more than 17 years of teaching experience at undergraduate and graduate levels. He has published more than 50 research papers in international journals and conferences. His research interests include pattern matching, information retrieval, data mining, bioinformatics and data compression.

E-mail: [email protected]

Gurmit Singh

Gurmit Singh is working as Professor Emeritus at the Department of Computer Science & Information Technology, SHUATS, Naini, Allahabad, India. He received his BTech in electronics from JNU, New Delhi, India and MTech in electrical engineering from IIT, Kanpur, India. He has more than 20 years of teaching experience at undergraduate and graduate levels. He has published more than 35 research papers in international journals and conferences. His research interests include information retrieval, data communication, image processing and software engineering.

E-mail: [email protected]

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 100.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.