Computers and Computing

LemmaQuest Lemmatizer: A Morphological Analyzer Handling Nominalization

Rupam Gupta & Anjali G. Jivani
Pages 6946-6954 | Published online: 23 Jan 2022
 

Abstract

This paper addresses the problem of extracting a single lemma from the different morphological variants of a dictionary root word. Popular existing lemmatizers such as the Stanford LemmaProcessor, the spaCy Lemmatizer, LemmaGen, and MorphAdorner generate correct lemmas for singular and plural nouns and for verbs in different tenses, but they fail to derive the correct lemma for derived words, especially nominalized ones. The proposed lemmatizer, LemmaQuest, is designed and implemented to overcome this limitation. LemmaQuest first groups allied morphological variants together: singular and plural nouns, verbs in all tenses, and nominalized words. The groups are formed using a combination of statistical distance measures computed over all possible pairs of input words. A lemma is then generated for each group. The main objective of the proposed model is to extract the correct lemma for a large set of input words in optimized time, which can substantially improve text simplification, keyword extraction, text summarization, and other text mining applications.
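
The group-then-lemmatize idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the distance measures, the grouping threshold, or the rule used to pick a lemma, so everything below is an assumption made for illustration: the two measures (a `difflib` sequence ratio and a shared-prefix ratio), the equal weighting, the `threshold` value, the function names, and the "shortest member" lemma rule are hypothetical and are not LemmaQuest's actual method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def prefix_ratio(a: str, b: str) -> float:
    """Length of the shared prefix divided by the shorter word's length."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / min(len(a), len(b))


def similarity(a: str, b: str) -> float:
    """Equal-weight mix of two measures (assumed; the paper's are not stated)."""
    return 0.5 * SequenceMatcher(None, a, b).ratio() + 0.5 * prefix_ratio(a, b)


def group_words(words, threshold=0.7):
    """Merge every pair of words whose similarity meets the threshold.

    Uses a small union-find so that chains of similar words end up
    in the same group. The threshold is an illustrative assumption.
    """
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    for a, b in combinations(words, 2):  # all possible pairs of input words
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def pick_lemma(group):
    """Stand-in rule: shortest member, ties broken alphabetically."""
    return min(group, key=lambda w: (len(w), w))


if __name__ == "__main__":
    sample = ["argue", "argues", "argued", "argument", "arguments", "cat", "cats"]
    for g in group_words(sample):
        print(pick_lemma(g), "<-", sorted(g))
```

On a toy input like the one in the `__main__` block, inflected forms and a nominalized form of the same root land in one group because they share a long prefix. A real system would need a dictionary lookup or a more careful rule than "shortest member", since the shortest surface form is not always a valid root word.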

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Rupam Gupta

Rupam Gupta completed her Master of Computer Application at North Bengal University after earning a BSc in Chemistry from Calcutta University. She did her MCA project in text mining at the Indian Statistical Institute, Kolkata. She is pursuing a PhD at The Maharaja Sayajirao University of Baroda, Vadodara. She worked as a software consultant at PCL, Mindware, Kolkata for 2.5 years, and as a lecturer cum programmer in the Department of MCA at Chiman Bhai Patel Institute of Technology, Ahmedabad, under Gujarat University, for 4 years. She also worked as an assistant professor at SVIT, Vasad, under Gujarat Technological University, for 15 years. She is currently an assistant professor at ITM University, Baroda. She has presented papers at international conferences and has published papers in Scopus-indexed journals. Her area of specialization is text mining and language processing.

Anjali G. Jivani

Anjali G. Jivani is an associate professor in the Department of Computer Science and Engineering, The Maharaja Sayajirao University of Baroda, Vadodara. She has more than 30 years of teaching experience, and her area of specialization is data mining and database management systems. She is a life member of ISTE and CSI. She has published more than 45 papers with over 560 citations. She has received a number of awards at paper presentations and for the work done during her tenure as head of the Department. She is guiding a number of PhD scholars and is also a DPC member and Board of Studies member at a number of universities, including Gujarat Technological University, Navrachana University, and TeamLease University.
