ABSTRACT
Although the equi-complexity of languages has long been an assumption in the study of language, recent research has argued that languages differ in their morphological complexity. Some languages rely more on bound morphology to express grammatical meanings, whereas others rely more on lexical or word-order-based strategies. It remains debated to what extent these differences correlate with demographic factors such as population size and language contact, and how complexity should be measured. In this study, we use information theory, more specifically Kolmogorov complexity, to assess morphological and word-order-based strategies. We apply the procedure to three West Germanic languages, English, Dutch and German, which have been argued to form a continuum along the morphological complexity cline, plausibly due to different rates of demographic upheaval and concomitant language contact. Tracing morphological and word-order-based complexity through time in parallel Bible translations, we show that English is consistently less morphologically complex than its sisters and continues on its path of morphological simplification. We also find a statistically robust trade-off between morphological and word-order complexity. Our results support earlier findings in the literature, with the exception of the difference between Dutch and German, which does not emerge in our results.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Data Availability Statement
The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.8112777.
Notes
1. Transparency-based complexity is sometimes also classified within the relative approach because it conflates both irregularity-based and efficiency-based complexity.
2. For convenience, the concise description is provided in the programming language Python rather than in natural language.
3. However, it does not encompass redundancy-based complexity, nor any of the metrics within the relative approach (for an in-depth explanation see Ehret, 2017, pp. 2–5).
4. We tested whether the morphological complexity measures for Dutch are indeed higher because of differences in word segmentation. We applied the morphological complexity distortion algorithm as explained in section 2.3, but first removed all whitespace from the texts. The resulting morphological complexity measures do not differ markedly from those obtained when the texts are distorted normally (i.e. without removing whitespace beforehand): the positions of Dutch and German are not switched.
5. The Dutch Hernse Bijbel (1360), Delftse Bijbel (1477), and Professorenbijbel (1911) consist solely of a transcript of the Old Testament, while the Dutch Noord-Nederlandse Vertaling van het Nieuwe Testament (1399) and Hamelsveld Bijbel (1790) as well as the German Mentelin Bibel (1460), Kölner Bibel (1478), Mainzer Bibel (1661), and Rosalino Bibel (1781) are transcripts of the New Testament.
6. It is important to note that the word-final approach does not completely exempt roots and stems from being distorted. For instance, words like "candy" are more than three characters long and are therefore still targeted by the distortion algorithm, as any of the characters "n", "d" and "y" could be deleted. While this revised approach is more restrictive than the original one, it is still not aware of any specific morphological rules. Even so, it is worthwhile to compare the two approaches to find out which yields better results.
7. From a processing point of view (relative complexity), a language with strict word order (such as English) is easier to decode than a language with free word order. However, as mentioned earlier in the paper (in Section 2.2), the present methodology is embedded within the absolute view on complexity. This absolute view is mainly concerned with the number of word order rules. In this view, a language with rigid word order is more complex, because such a language has more constraints on the word order rules. This line of reasoning goes back at least to Greenberg (1960), who calculates, as one of his metrics, the proportion of word order links over the total number of nexus.
8. We first considered a model with morphological complexity as the dependent variable and year in interaction with language as the independent variables. However, the size of our dataset (47 observations) does not allow such an elaborate model with two variables and interaction effects.
9. The small number for the estimate is due to the measurement scales: differences in the complexity ratio are noticeable at two decimal places, and the independent variable scales per year. This scale-dependent measure means that the low estimates should not be taken as an indication of a vanishingly small effect. This is corroborated by the sizable R² value for the significant trend in English.
10. We considered testing the claims in the literature on the demographic differences, working with average urban growth data (see also De Smet et al., 2017), using Granger Causality (see Moscoso Del Prado Martín, 2014; Rosemeyer & Van de Velde, 2021). Due to the low granularity of the time series (47 texts and only 7 centuries as time steps), this did not lead to any reliable results.
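The word-final distortion described in note 6 can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the 10% deletion rate, the fixed random seed, and the use of gzip as a stand-in for Kolmogorov complexity are all assumptions made for illustration. Only the restriction to the last three characters of words longer than three characters is taken from the note. Punctuation attached to a word is treated as part of that word, which is a further simplification.

```python
import gzip
import random
import re


def distort_word_final(text, rate=0.10, tail=3, seed=0):
    """Randomly delete characters, but only among the last `tail`
    characters of words longer than `tail` characters.

    `rate`, `tail` and `seed` are hypothetical parameters chosen
    for this sketch, not values confirmed by the paper.
    """
    rng = random.Random(seed)

    def distort(match):
        word = match.group(0)
        if len(word) <= tail:
            return word  # short words are left untouched
        head, end = word[:-tail], word[-tail:]
        # Keep each word-final character with probability 1 - rate.
        kept = "".join(ch for ch in end if rng.random() >= rate)
        return head + kept

    # Substituting on runs of non-whitespace preserves the original spacing.
    return re.sub(r"\S+", distort, text)


def compressed_size(text):
    """Size in bytes of the gzip-compressed text (a proxy for complexity)."""
    return len(gzip.compress(text.encode("utf-8")))


def size_ratio(text, **kwargs):
    """Compressed size of the distorted text over that of the original."""
    return compressed_size(distort_word_final(text, **kwargs)) / compressed_size(text)


if __name__ == "__main__":
    # With rate=1.0 every eligible character is deleted, so "candy" -> "ca".
    print(distort_word_final("candy", rate=1.0))
    print(size_ratio("in the beginning was the word " * 50))
```

As the "candy" example shows, the root is still affected, because the sketch knows nothing about morpheme boundaries. How the resulting ratio is signed and interpreted as a complexity score follows the paper's own definitions, which are not reproduced here.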