A simple but effective method for Indonesian automatic text summarisation

Pages 29-43 | Received 21 Feb 2021, Accepted 29 May 2021, Published online: 10 Jun 2021
 

Abstract

Automatic text summarisation (ATS) is an automatic procedure for extracting critical information from a text using a specific algorithm or method; it involves two main approaches, abstractive summarisation and extractive summarisation. Because corpora are scarce, abstractive summarisation performs poorly on ATS tasks for low-resource languages, so researchers commonly apply extractive summarisation to such languages instead. As an emerging branch of extraction-based summarisation, methods based on feature analysis quantify the significance of information by calculating a utility score for each sentence in the article. In this study, we propose a simple but effective extractive method for Indonesian documents based on the Light Gradient Boosting Machine regression model. Four features are extracted: PositionScore, TitleScore, the semantic representation similarity between the sentence and the title of the document, and the semantic representation similarity between the sentence and the centre of the sentence’s cluster. We define a formula for calculating the sentence score as the objective function of the linear regression. Considering the characteristics of Indonesian, we use Indonesian lemmatisation technology to improve the calculation of the sentence score. The results show that our method is more applicable.
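The four per-sentence features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the feature formulas, so everything below is an assumption: the function names are hypothetical, PositionScore is taken to decrease linearly with sentence position, TitleScore is taken to be the share of title words present in the sentence, plain word-count vectors stand in for the semantic representations, and all sentences are treated as a single cluster. Indonesian lemmatisation is not reproduced.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    # Lowercased whitespace tokens with surrounding punctuation stripped.
    # Assumption: stands in for the Indonesian lemmatisation step.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_features(title, sentences):
    # Returns one row per sentence:
    # [position_score, title_score, sim_to_title, sim_to_cluster_centre]
    title_vec = Counter(tokenize(title))
    vecs = [Counter(tokenize(s)) for s in sentences]

    # Assumption: one cluster containing every sentence. The summed vector
    # points in the same direction as the mean, so cosine is unaffected.
    centre = Counter()
    for v in vecs:
        centre.update(v)

    n = len(sentences)
    rows = []
    for i, v in enumerate(vecs):
        position_score = 1.0 - i / n  # assumed form: earlier is higher
        overlap = len(set(v) & set(title_vec))
        title_score = overlap / len(title_vec) if title_vec else 0.0
        rows.append([position_score, title_score,
                     cosine(v, title_vec), cosine(v, centre)])
    return rows


# Illustrative input, not taken from the paper's data.
title = "Banjir melanda Jakarta"
sentences = [
    "Banjir besar melanda Jakarta pada hari Senin.",
    "Warga mengungsi ke tempat aman.",
    "Pemerintah Jakarta mengirim bantuan.",
]
rows = sentence_features(title, sentences)
for sentence, row in zip(sentences, rows):
    print([round(x, 3) for x in row], sentence)
```

Feature rows of this shape, paired with a reference score per sentence, are the kind of input a gradient-boosting regressor could be trained on; the training step and the paper's sentence-score formula are not shown here because the abstract does not specify them.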

Disclosure statement

The authors declare that they have no known potential conflicts of interest that may have influenced the work presented in this paper.

Additional information

Funding

This work was supported by the Natural Science Foundation of China [Grant Number 61572145]; the Guangzhou Science and Technology Plan Project [Grant Number 202009010021]; and the Major Projects of Guangdong Education Department for Foundation Research and Applied Research [Grant Number 2017KZDXM031].