Research Articles

Quantitative analysis of Sesotho sa Leboa part-of-speech taggers


Abstract

The study reported in this article conducts a quantitative analysis of the Sesotho sa Leboa National Centre for Human Language Technology (NCHLT) part-of-speech annotated data set and compares the quality of the NCHLT and CTexT part-of-speech taggers on that data set. Both taggers were developed as part of the NCHLT Text project and are based on Taljard et al.’s fine-grained tagset, which is aligned to the morphological structure of Sesotho sa Leboa. A gold standard data set of 7 153 tokens is used to compare and evaluate the overall performance of the two taggers and to perform a fine-grained error analysis. We find that the NCHLT and CTexT taggers obtain 88.40% and 94.18% accuracy, respectively, and we describe the linguistic nature of the most frequent errors observed in the two taggers.
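
As an illustration of the kind of evaluation summarised above, the following minimal sketch computes token-level accuracy against a gold standard and counts the most frequent (gold tag, predicted tag) confusions. It is not the authors' evaluation code: the function names, the toy tag sequences, and the tag labels are hypothetical placeholders and are not taken from the NCHLT data set or from Taljard et al.'s tagset.

```python
from collections import Counter


def tagging_accuracy(gold_tags, predicted_tags):
    """Return the fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def most_frequent_errors(gold_tags, predicted_tags, top_n=5):
    """Return the top_n most common (gold, predicted) pairs where the tags differ."""
    errors = Counter(
        (g, p) for g, p in zip(gold_tags, predicted_tags) if g != p
    )
    return errors.most_common(top_n)


# Toy example with placeholder tag labels (assumed, not from the real tagset).
gold = ["TAG_A", "TAG_B", "TAG_C", "TAG_B", "TAG_A"]
pred = ["TAG_A", "TAG_B", "TAG_D", "TAG_B", "TAG_B"]

accuracy = tagging_accuracy(gold, pred)
errors = most_frequent_errors(gold, pred)
print(f"accuracy: {accuracy:.2%}")
for (gold_tag, pred_tag), count in errors:
    print(f"gold={gold_tag} predicted={pred_tag} count={count}")
```

Running the same two functions on each tagger's output over the same gold standard would give directly comparable accuracy figures and error lists, assuming both outputs are tokenised identically to the gold standard.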
