Computers and Computing

Automatic Text Summarization of Konkani Folk Tales Using Supervised Machine Learning Algorithms and Language Independent Features

Jovi D’Silva & Uzzal Sharma
 

Abstract

Automatic text summarization is an emerging field of research in Natural Language Processing. This work is a novel attempt to bring a low-resource language into the domain of automatic text summarization. We use supervised machine learning algorithms to perform single-document extractive summarization of documents in Konkani, a low-resource language. In particular, we propose using language-independent features to train supervised machine learning algorithms on a Konkani dataset devised specifically for this experimentation from books of Konkani folktale literature. We approach automatic text summarization as a binary classification problem: once trained, the algorithms classify sentences by their relevance to generate a summary. The performance of popular linear and non-linear supervised machine learning algorithms is then evaluated using K-fold cross-validation. The system-generated summaries are compared with human-generated summaries to verify their effectiveness. The results show that the linear models perform better than the non-linear models; however, all the models outperformed the baselines. The proposed methodology produces promising summaries without the need for any language-specific domain knowledge.
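The pipeline described in the abstract (language-independent sentence features, a binary relevance classifier, K-fold evaluation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the three features (sentence position, relative length, average term frequency), the naive sentence splitter, the small logistic regression trained by stochastic gradient ascent, and all function names are hypothetical. The abstract does not specify which features or which algorithms were used.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter on ., !, ? and the Devanagari danda (assumed, not from the paper).
    return [p for p in re.split(r"(?<=[.!?\u0964])\s+", text.strip()) if p]

def tokenize(sentence):
    # Whitespace tokens with edge punctuation stripped; no language-specific rules.
    words = (w.strip(".,!?;:\"'()\u0964") for w in sentence.lower().split())
    return [w for w in words if w]

def sentence_features(sentences):
    # Hypothetical language-independent features, each scaled to [0, 1].
    tokens = [tokenize(s) for s in sentences]
    tf = Counter(t for toks in tokens for t in toks)
    max_tf = max(tf.values(), default=1)
    max_len = max((len(t) for t in tokens), default=1) or 1
    n = len(sentences)
    feats = []
    for i, toks in enumerate(tokens):
        position = 1.0 - i / max(n - 1, 1)   # earlier sentences score higher
        length = len(toks) / max_len          # length relative to the longest sentence
        tf_score = sum(tf[t] for t in toks) / (len(toks) * max_tf) if toks else 0.0
        feats.append([position, length, tf_score])
    return feats

def predict_proba(w, x):
    # w holds one weight per feature plus a trailing bias term.
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))              # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    # Logistic regression (a linear model) fitted by per-sample gradient updates.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = label - predict_proba(w, x)
            for j, xj in enumerate(x):
                w[j] += lr * err * xj
            w[-1] += lr * err
    return w

def k_fold_accuracy(X, y, k=3):
    # Plain K-fold: sample i goes to fold i % k; requires len(X) >= k.
    accs = []
    for fold in range(k):
        test = [i for i in range(len(X)) if i % k == fold]
        train = [i for i in range(len(X)) if i % k != fold]
        w = train_logreg([X[i] for i in train], [y[i] for i in train])
        hits = sum((predict_proba(w, X[i]) >= 0.5) == bool(y[i]) for i in test)
        accs.append(hits / len(test))
    return sum(accs) / k

def summarize(sentences, w, k=2):
    # Keep the k sentences rated most relevant, restored to document order.
    feats = sentence_features(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: predict_proba(w, feats[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In use, the binary labels (1 if a sentence appears in a human-written extractive summary, 0 otherwise) would come from an annotated dataset; a call such as `train_logreg(sentence_features(sents), labels)` followed by `summarize(sents, w, k=2)` then yields an extractive summary. A library such as scikit-learn would normally replace the hand-written classifier and fold logic; the pure-Python version is used here only to keep the sketch self-contained.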

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Jovi D’Silva

Jovi D’Silva is presently a research scholar in the Department of Computer Science and Engineering, School of Engineering, Assam Don Bosco University, Guwahati, India. He obtained BCA and MCA degrees from Bangalore University, India, and an MTech in computer science and engineering from Christ University, India. His research area is natural language processing. Email: [email protected]

Uzzal Sharma

Uzzal Sharma obtained his MCA from IGNOU and completed his PhD at Gauhati University. He has over 16 years of experience in academia and industry. His research areas include speech signal processing, software engineering, cyber security and data engineering. He is currently an assistant professor (Stage 2) at Assam Don Bosco University, Guwahati, India. Email: [email protected]
