Original Articles

A word-class tagset for Setswana

Pages 203-222 | Published online: 12 Nov 2009
 

Abstract

This paper aims to present a general tagset for use in an automatic word-class tagger, functioning largely at the level of word-classes, rather than pure morphological information. In view of the importance of reusability, guidelines and standards for tagsets are identified, concentrating on the standards proposed by the Expert Advisory Group on Language Engineering Standards (EAGLES) within the framework of the European Union language technology initiatives. Certain criteria for both tagsets and tag labels are identified. Thereafter, problems and solutions for tokenisation in Setswana are discussed, with emphasis on the challenge presented by the disjunctive orthography and the agglutinative character of Bantu languages. The bulk of the article is then devoted to the development of a tagset for the various part-of-speech categories of Setswana, as a test for the extent to which the EAGLES standards can be adopted and adjusted to make them suitable for an agglutinating language. The conclusion is that this is indeed possible to a large extent, with minor elaborations necessary, in particular as far as the disjunctively written prefixes of verbs are concerned.
