Abstract
This paper addresses the question of whether grapheme frequencies are directly related to word length. In other words, it discusses a possible interrelation between the frequency of graphemes and the length of linguistic units. Based on different Slovene text types, it is shown that the Altmann-Menzerath law is an adequate theoretical explanation for the supposed interrelation between grapheme frequencies and word length. Furthermore, a linguistic interpretation of the parameters of grapheme frequency models is offered.
Acknowledgements
We would like to thank Gabriel Altmann for the fruitful discussions about this paper and in particular for some mathematical hints.
Notes
1. Although the investigation explicitly concerns the grapheme level, one should keep in mind that, at least for Slavic languages, the grapheme-to-phoneme correspondence is rather shallow (see Kelih, 2008a, for a quantitative analysis of the Slovene grapheme-to-phoneme correspondence). Even if the writing system of a language is considered to be a “secondary” system, it should be noted that, owing to the shallow orthography of Slovene, the results presented here should also be corroborated for the phoneme (“spoken language”) level.
2. All texts are taken from the Graz text database “Quanta” (http://quanta-textdata.uni-graz.at/).
3. Menzerath's law seems to be a general language law (“Konstruktionsgesetz”) rather than a text law; therefore the analysis of word form types is preferred (cf. Altmann & Schwibbe, 1989).
4. In the case of Slovene, the validity of Menzerath's law has been confirmed in only a few cases (cf. Grzybek, 2000, and Kelih, 2008b).