An Information-Theoretic Approach to Morphosyntactic Complexity in English, Dutch and German: Journal of Quantitative Linguistics: Vol 0, No 0

ABSTRACT

Though equi-complexity of languages has long been an assumption in the study of language, recent research has argued that languages differ with regard to their morphological complexity. Some languages rely more on bound morphology to express grammatical meanings, whereas other languages rely more on lexical or word-order-based strategies. It is a moot point to what extent these differences correlate with demographic factors such as population size and language contact, and how complexity should be measured. In this study, we use information theory, more specifically Kolmogorov complexity, to assess morphological and word-order-based strategies and apply the procedure to three West Germanic languages, English, Dutch and German, which have been argued to form a continuum along the morphological complexity cline, plausibly due to different rates of demographic upheaval and concomitant language contact. Tracing morphological and word-order-based complexity through time in parallel Bible translations, our results show that English is consistently less morphologically complex than its sisters, continuing on its path of morphological simplification. We see a statistically robust trade-off between morphological and word order complexity. We find support for earlier findings in the literature, with the exception of the difference between Dutch and German, which does not emerge in our results.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

The data that support the findings of this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.8112777.

Notes

1. Transparency-based complexity is sometimes also classified within the relative approach because it conflates both irregularity-based and efficiency-based complexity.

2. For convenience, the concise description is provided in the programming language Python rather than in natural language.

3. However, it does not encompass redundancy-based complexity, nor any of the metrics within the relative approach (for an in-depth explanation see Ehret, 2017, pp. 2–5).

4. We tested whether the morphological complexity measures for Dutch are indeed higher because of differences in word segmentation. We applied the morphological complexity distortion algorithm as explained in section 2.3, but first removed all whitespace from the texts. The resulting morphological complexity measures do not differ markedly from those obtained when the texts are distorted normally (that is, without removing whitespace beforehand): the positions of Dutch and German are not switched.

5. The Dutch Hernse Bijbel (1360), Delftse Bijbel (1477), and Professorenbijbel (1911) consist solely of a transcript of the Old Testament, while the Dutch Noord-Nederlandse Vertaling van het Nieuwe Testament (1399) and Hamelsveld Bijbel (1790) as well as the German Mentelin Bibel (1460), Kölner Bibel (1478), Mainzer Bibel (1661), and Rosalino Bibel (1781) are transcripts of the New Testament.

6. It is important to note that the word-final approach does not completely exempt roots and stems from being distorted. For instance, words like candy are more than three characters long and are therefore still the target of the distortion algorithm, as any of the characters n, d and y could be deleted. While this revised approach is more restrictive than the original one, it is still not aware of any specific morphological rules. Still, it seems interesting to compare the two approaches to find out which renders better results.

7. From a processing point of view (relative complexity), a language with strict word order (such as English) is easier to decode than a language with free word order. However, as mentioned earlier in the paper (in Section 2.2), the present methodology is embedded within the absolute view on complexity. This absolute view is mainly concerned with the number of word order rules. In this view, a language with rigid word order is more complex, because such a language has more constraints on the word order rules. This line of reasoning goes back at least to Greenberg (1960), who calculates, as one of his metrics, the proportion of word order links over the total number of nexus.

8. We first considered a model with morphological complexity as the dependent variable and year in interaction with language as the independent variables. However, the size of our dataset (47 observations) does not allow such an elaborate model with two variables and interaction effects.

9. The small number for the estimate is due to the measurement scales: differences in the complexity ratio are noticeable at two decimal places, and the independent variable scales per year. This scale-dependent measure means that the low estimates should not be taken as an indication of a vanishingly small effect. This is corroborated by the sizable R² value for the significant trend in English.

10. We considered testing the claims in the literature on the demographic differences, working with average urban growth data (see also De Smet et al., 2017), using Granger Causality (see Moscoso Del Prado Martín, 2014; Rosemeyer & Van de Velde, 2021). Due to the low granularity of the time series (47 texts and only 7 centuries as time steps), this did not lead to any reliable results.
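
The distortion variants mentioned in notes 4 and 6 can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is described in section 2.3 of the paper and in the Zenodo deposit); it only illustrates the general idea under stated assumptions. The function names, the use of zlib as the compressor, the per-character deletion probability `p`, and the reading of "word-final" as "the last three characters of words longer than three characters" (inferred from the candy example in note 6) are all assumptions made for illustration.

```python
import random
import re
import zlib


def compressed_size(text):
    """Bytes needed for the zlib-compressed UTF-8 text.

    Compressed size serves as a rough, computable stand-in for
    Kolmogorov complexity (zlib is an assumption of this sketch).
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def strip_whitespace(text):
    """Remove all whitespace, as in the segmentation check of note 4."""
    return "".join(text.split())


def distort_random(text, p, rng):
    """Delete every character independently with probability p."""
    return "".join(ch for ch in text if rng.random() >= p)


def distort_word_final(text, p, rng):
    """Delete characters only among the last three characters of
    words longer than three characters (assumed reading of note 6)."""

    def damage(match):
        word = match.group(0)
        if len(word) <= 3:
            return word  # short words are left intact
        head, tail = word[:-3], word[-3:]
        return head + "".join(ch for ch in tail if rng.random() >= p)

    return re.sub(r"\w+", damage, text)


def distortion_ratio(original, distorted):
    """Compressed size of the distorted text relative to the original."""
    return compressed_size(distorted) / compressed_size(original)


# Hypothetical usage on a toy string; p = 0.10 is an assumed value.
rng = random.Random(42)
sample = "candy is sweet and the children were singing loudly " * 50
print(distortion_ratio(sample, distort_word_final(sample, 0.10, rng)))
print(distortion_ratio(strip_whitespace(sample),
                       distort_random(strip_whitespace(sample), 0.10, rng)))
```

In this sketch a word such as candy keeps its first two characters, while n, d and y may each be deleted, so roots and stems are not fully protected, as note 6 points out. How the resulting ratio is signed and interpreted as a complexity score is left to the paper's own definitions.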

Additional information

Funding

This work was supported by the Research Foundation Flanders (FWO) under Grant number G071719N. We wish to thank Eloisa Ruppert for preliminary work.
