463
Views
1
CrossRef citations to date
0
Altmetric
Original Article

An improved Urdu stemming algorithm for text mining based on multi-step hybrid approach

ORCID Icon ORCID Icon, , &
Pages 703-723 | Received 02 Jun 2017, Accepted 03 Apr 2018, Published online: 22 May 2018
 

ABSTRACT

Stemming is the basic operation in Natural language processing (NLP) to remove derivational and inflectional affixes without performing a morphological analysis. This practice is essential to extract the root or stem. In NLP domains, the stemmer is used to improve the process of information retrieval (IR), text classifications (TC), text mining (TM) and related applications. In particular, Urdu stemmers utilize only uni-gram words from the input text by ignoring bigrams, trigrams, and n-gram words. To improve the process and efficiency of stemming, bigrams and trigram words must be included. Despite this fact, there are a few developed methods for Urdu stemmers in the past studies. Therefore, in this paper, we proposed an improved Urdu stemmer, using hybrid approach divided into multi-step operation, to deal with unigram, bigram, and trigram features as well. To evaluate the proposed Urdu stemming method, we have used two corpora; word corpus and text corpus. Moreover, two different evaluation metrics have been applied to measure the performance of the proposed algorithm. The proposed algorithm achieved an accuracy of 92.97% and compression rate of 55%. These experimental results indicate that the proposed system can be used to increase the effectiveness and efficiency of the Urdu stemmer for better information retrieval and text mining applications.

Acknowledgements

I would like to express my heartfelt thanks to the Masood Jhandir Research Library, Sardar pur Jhandir Mailsi, District Vehari, Punjab, Pakistan (jhandir.com) for providing us relevant, and important resources required for this research.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.