Abstract
This paper addresses the problem of extracting a single lemma from the different morphological variants of a given dictionary root word. Popular existing lemmatizers such as the Stanford LemmaProcessor, the spaCy Lemmatizer, LemmaGen, and MorphAdorner generate correct lemmas for singular and plural nouns and for verbs in different tenses, but none of them derives the correct lemma for derived words, especially nominalized derived words. The proposed lemmatizer, 'LemmaQuest', is designed and implemented to overcome these limitations. LemmaQuest first creates distinct groups of allied morphed words, such as singular and plural nouns, verbs in all tenses, and nominalized words. These groups are formed using a combination of statistical distance measures computed over all possible pairs of input words. A lemma is then generated for each group. The main objective of the proposed model is to extract the correct lemma for a large set of input words in optimized time, leading to substantial improvements in text simplification, keyword extraction, text summarization, and other text mining applications.
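The group-then-lemmatize pipeline described above can be illustrated with a minimal sketch. The abstract does not name the distance measures, the grouping rule, or the lemma-generation step, so everything here is an assumption chosen for illustration: a normalized Levenshtein distance averaged with a common-prefix distance, a fixed threshold of 0.5, single-link grouping over all word pairs, and the shortest group member as a stand-in for the lemma. The function names (`combined_distance`, `group_words`, `pick_lemma`) are hypothetical and are not taken from LemmaQuest.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic edit distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def prefix_distance(a, b):
    """1 minus the share of the longer word covered by the common prefix."""
    common = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        common += 1
    return 1 - common / max(len(a), len(b))


def combined_distance(a, b, weight=0.5):
    """Assumed combination: weighted mean of two normalized distances."""
    edit = levenshtein(a, b) / max(len(a), len(b))
    return weight * edit + (1 - weight) * prefix_distance(a, b)


def group_words(words, threshold=0.5):
    """Merge every pair of words closer than the (assumed) threshold."""
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if combined_distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


def pick_lemma(group):
    """Placeholder lemma rule: shortest member, ties broken alphabetically."""
    return min(group, key=lambda w: (len(w), w))


words = ["create", "creates", "created", "creation",
         "walk", "walking", "walked"]
groups = group_words(words)
lemmas = {pick_lemma(g): g for g in groups}
```

With this toy input, the nominalized form "creation" is grouped with "create", and the inflected forms of "walk" form a separate group. The shortest-member rule is only a placeholder; a real system would need a proper lemma-generation step for groups in which the root form itself is absent from the input.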
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Rupam Gupta
Rupam Gupta completed her Master of Computer Application at North Bengal University after earning a BSc (Chemistry) from Calcutta University. She completed her MCA project in text mining at the Indian Statistical Institute, Kolkata. She is pursuing a PhD at The Maharaja Sayajirao University of Baroda, Vadodara. She worked as a software consultant at PCL, Mindware, Kolkata for 2.5 years, and as a lecturer-cum-programmer in the Department of MCA at Chiman Bhai Patel Institute of Technology, Ahmedabad, under Gujarat University, for 4 years. She also worked as an assistant professor at SVIT, Vasad, under Gujarat Technological University, for 15 years. She is currently an assistant professor at ITM University, Baroda. She has presented papers at international conferences and has published papers in Scopus-indexed journals. Her area of specialization is text mining and language processing.
Anjali G. Jivani
Anjali G. Jivani is an associate professor in the Department of Computer Science and Engineering, The Maharaja Sayajirao University of Baroda, Vadodara. She has more than 30 years of teaching experience, and her area of specialization is data mining and database management systems. She is a life member of ISTE and CSI. She has published more than 45 papers with over 560 citations. She has received a number of awards for paper presentations and for her work during her tenure as head of the department. She is guiding a number of PhD scholars and is also a DPC member and Board of Studies member at several universities, including Gujarat Technological University, Navrachana University, and TeamLease University. Email: [email protected]