119
Views
0
CrossRef citations to date
0
Altmetric
Articles

Fast Pattern Matching in Compressed Text using Wavelet Tree

, ORCID Icon &
 

ABSTRACT

With the recent increase in the size of data, compression has become essential tool for handling massive data. Compression is usually more helpful in handling huge data over the networks, large text collections, and storing massive biological data. It is also useful in searching a pattern (of any kind) inside the huge data set. Compressed pattern matching (CPM) is the task of performing string matching in a compressed text without decompressing it. In this type of matching, pattern may or may not be compressed. This paper presents an efficient algorithm (WBTC_WT) for matching a pattern directly inside the compressed text. The proposed algorithm uses the tagged sub optimal code category of algorithm called word-based tagged code (WBTC) and a self-indexed data structure called wavelet tree. WBTC is used for encoding the text and wavelet tree provides fast searching over the compressed text. WBTC_WT removes the problem of false matches, encountered in some of the previous approaches and is able to match arbitrary portion of text without decompressing it. The proposed algorithm is implemented, analyzed, and compared with existing approaches on huge data set. From the simulation results, it is found that our algorithm outperforms the existing algorithms in most of the cases.

ACKNOWLEDGEMENTS

Authors wish to acknowledge Dr. Hari Mohan Singh, Assistant Professor, Department of Computer Science and IT, SHUATS, Naini, Allahabad, India, for his continuous support.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Surya Prakash Mishra

Surya Prakash Mishra is working as an assistant professor at the Department of Computer Science & Information Technology, SHUATS, Naini, Allahabad, India. He received his BSc in PCM from University of Allahabad, Allahabad, India and MCA from, UP Technical University, Lucknow, India. He has more than 10 years of teaching experience at undergraduate and graduate levels. He has published more than seven research papers in International journals and conferences. His research interests include pattern matching, information retrieval, computer algorithms and data compression.

E-mail: [email protected]

Rajesh Prasad

Rajesh Prasad is currently working as an assistant professor at the School of IT and Computing, American University of Nigeria, Yola, Nigeria. He received his MTech in software engineering and PhD in computer science and engineering from MNNIT, Allahabad, India. He has more than 17 years of teaching experience at undergraduate and graduate levels. He has published more than 50 research papers in international journals and conferences. His research interests include pattern matching, information retrieval, data mining, bioinformatics and data compression.

E-mail: [email protected]

Gurmit Singh

Gurmit Singh is working as Professor Emeritus at the Department of Computer Science & Information Technology, SHUATS, Naini, Allahabad, India. He received his BTech in electronics from JNU, New Delhi, India and MTech in electrical engineering from IIT, Kanpur, India. He has more than 20 years of teaching experience at undergraduate and graduate levels. He has published more than 35 research papers in international journals and conferences. His research interests include information retrieval, data communication, image processing and software engineering.

E-mail: [email protected]

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.