Original Article

An improved Urdu stemming algorithm for text mining based on multi-step hybrid approach

Pages 703-723 | Received 02 Jun 2017, Accepted 03 Apr 2018, Published online: 22 May 2018
