Defining the word


Abstract

In this paper, I propose a definition of the term word that can be applied to all languages using the same criteria. Roughly, a word is defined as a free morph or a clitic or a root plus affixes or a compound plus affixes. The paper relies on earlier definitions of the terms free, morph, affix, clitic, root, and compound, which are summarized here. I briefly compare the proposed definition with Bloomfield’s, I note that it is a shared-core definition, and I say how word-forms differ from lexemes. In the final section, I explain why I think that an unnatural-seeming definition is better than a prototype definition or other options.

Acknowledgements

For helpful comments on an earlier version of this paper, I thank Artemij Keidan, Susanne Michaelis, Ryan M. Nefdt, Andrey Shluinsky, Nina Speransky, Adam Tallman, Dmitrij Zelenskij, and further commentators on Academia.edu.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 E.g., Lyons (1968, 201): “Forms which never occur alone as whole utterances (in some normal situation) are bound forms; forms which may occur alone as utterances are free forms … It will be evident to the reader that this definition applies … to phonological words rather than grammatical words.”

2 Clitics are often thought to be defined in phonological terms, as somehow ‘phonologically dependent’ elements, but there is no general agreement on what phonological dependence means. As is discussed extensively in Haspelmath (2023a), the phonological effects can be of diverse kinds and mostly apply to affixes as well.

3 Clitics are often said to combine with a phrase, but when a nonroot bound morph only ever occurs peripherally to one type of phrase on a content root, it is usually considered an affix (as is the case with complementizer suffixes on verbs in verb-final languages such as Turkish or Japanese, or case suffixes in languages with noun-final nominals such as Lezgian). Such elements are affixes by Definition 5.

4 The term root is sometimes used in an abstract sense (e.g., for the triconsonantal skeletons in Semitic languages), but the definition used here refers to concrete forms.

5 Needless to say, within each particular language, word classes are defined by morphosyntactic criteria. It is only at the level of comparative concepts that we need to appeal to semantic criteria.

6 This requirement is violated even more clearly in German teil-nehmen ‘take part’, because the element teil sometimes occurs postverbally (cf. sie nehmen morgen teil ‘they will take part tomorrow’, vs. sie haben gestern teil-genommen ‘they took part yesterday’). The spelling is not in conformity here with the compound status according to Definition 7.

7 In many Indo-European languages, compounds can consist of a combination of a root and a compound, which is not reflected in Definition 7. For the sake of simplicity, this is not addressed in the present paper. Moreover, parasynthetics (of the type ‘root+root+affix’, e.g., blue-eye-d) are not included either.

8 For the spelling difference (lower-case compound as a comparative concept; upper-case German Compound for the language-particular category), see Haspelmath (2020b, §3).

9 Other terms are grammatical word (Matthews 1974, 32) and morphosyntactic word (Haspelmath 2011, 38). The term word-form comes from the Russian tradition (e.g., Mel’čuk 2006). But note that grammatical word has been used to differentiate between homophonous word-forms (e.g., English Past Tense play-ed vs. Past Participle play-ed, which are the same word-form according to Gebhardt (2023, 83), but two different grammatical words). This usage is non-standard.

10 For example, Breiter (1994) investigates ‘the length of lexemes in Chinese’, and what she means is the length of the lexeme-stem. As there are few inflectional affixes in Chinese, the difference does not matter much, but it is important to be aware that only lexeme-stems are forms (with a measurable length), while lexemes are sets of forms.

11 Occasionally, inflectional affixes occur directly on the root and derivational affixes occur outside of them (e.g., Ancient Greek an-é-bain-on [up-pst-go-1sg] ‘I went up’, where the inflectional past tense prefix occurs closer to the root bain- ‘go’ than the derivational prefix an(a)-). Such cases are not covered by this definition, which concentrates on the core phenomena and thus follows the principle that comparative concepts should be ‘shared-core definitions’ (§3.2).

12 Mugdan (2015, 252) writes at the end: “Where there is room for legitimate disagreement …, the final decision should be guided by overall structural considerations but may often well be a matter of taste.”

13 Ryan Nefdt (p.c.) pointed out to me that there is a tradition in philosophy of thinking about the ontology of words (e.g., Miller 2020). It remains to be seen how that work relates to the work in linguistics that is in my purview here.

14 However, the salience of the term lexeme has been reduced in recent times, as we no longer need lexeme-based dictionaries to look up information about words. Modern technology provides us with easy ways to search for meanings and other properties of words that do not involve the notion of lexeme (which seems to be rooted in the dictionary word; see Haspelmath 2023c).