Abstract
This paper presents a general tagset for use in an automatic word-class tagger that operates largely at the level of word classes rather than on purely morphological information. In view of the importance of reusability, guidelines and standards for tagsets are reviewed, concentrating on the standards proposed by the Expert Advisory Group on Language Engineering Standards (EAGLES) within the framework of the European Union language technology initiatives. Criteria for both tagsets and tag labels are then identified. Thereafter, problems and solutions for tokenisation in Setswana are discussed, with emphasis on the challenge presented by the disjunctive orthography and the agglutinative character of Bantu languages. The bulk of the paper is devoted to the development of a tagset for the various part-of-speech categories of Setswana, as a test of the extent to which the EAGLES standards can be adopted and adjusted to suit an agglutinating language. The conclusion is that this is largely possible, with minor elaborations required, particularly for the disjunctively written prefixes of verbs.