ABSTRACT
Dependency distance (DD) is a well-established measure of syntactic complexity. Previous studies have largely focused on the linear dimension, mostly by means of mean dependency distance (MDD). In the present study, a new quantitative indicator, mean hierarchical dependency distance (MHDD), is proposed to address DD-related issues. Combining MHDD and MDD, the study investigates the syntactic complexity of different texts, using strictly length-controlled sentences from 12 text types in the Freiburg-Brown corpus of American English (Frown). Correlations between MHDD and MDD have been identified, and possible reasons are discussed from mathematical and theoretical perspectives. Mathematically, there are two reasons. One is that the numerator of MHDD coincides with the denominator of MDD, both being (n − 1), where n is the number of words in the sentence. The other is that the denominator of MHDD (the maximum hierarchical layer, MAXHL) and the numerator of MDD (the sum of DDs, SOD) are positively correlated. We believe that it is the positive correlation between SOD and MAXHL that ensures that MDD and MHDD change in the same direction. It is also worth noting that both MAXHL and SOD appear to be minimized within their respective data spectra, which points to a dependency distance minimization (DDM) tendency on the hierarchical dimension.
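The two ratios described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. A sentence is encoded as a hypothetical head array in which `heads[i]` is the 1-based position of the governor of word i+1 and 0 marks the root (a CoNLL-U-like convention). DD is the absolute difference between the positions of a dependent and its governor. Following the abstract and Note 1, every governor-dependent pair is given a hierarchical distance of 1 and the root sits on layer 1, so MDD = SOD/(n − 1) and MHDD = (n − 1)/MAXHL. The function names and the toy sentence are ours.

```python
def layers(heads):
    """Hierarchical layer (HL) of every word: the root is on layer 1 and each
    dependent lies one layer below its governor. Assumes a well-formed tree."""
    hl = []
    for i in range(len(heads)):
        layer, j = 1, i
        while heads[j] != 0:  # climb governor links up to the root
            j = heads[j] - 1
            layer += 1
        hl.append(layer)
    return hl


def sod(heads):
    """Sum of linear dependency distances over the n - 1 non-root words."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)


def mdd(heads):
    """Mean dependency distance: SOD / (n - 1)."""
    n = len(heads)
    if n < 2:
        raise ValueError("need at least one dependency")
    return sod(heads) / (n - 1)


def mhdd(heads):
    """Mean hierarchical dependency distance: (n - 1) / MAXHL, since each of the
    n - 1 governor-dependent pairs contributes a hierarchical distance of 1."""
    n = len(heads)
    if n < 2:
        raise ValueError("need at least one dependency")
    return (n - 1) / max(layers(heads))


# Hypothetical toy sentence: "She quickly read the long report"
# (She -> read, quickly -> read, read = root, the -> report, long -> report, report -> read)
heads = [3, 3, 0, 6, 6, 3]
print(sod(heads), mdd(heads), round(mhdd(heads), 3))  # 9 1.8 1.667
```

For the toy sentence, SOD = 2 + 1 + 2 + 1 + 3 = 9 over n − 1 = 5 dependencies. The deepest words ('the', 'long') sit on layer 3, so MAXHL = 3.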
Acknowledgments
We sincerely thank the reviewers for their insightful, helpful and constructive suggestions, which contributed greatly to the improvement of this paper and will inspire the authors in their future research.
Disclosure Statement
No potential conflict of interest was reported by the authors.
Notes
1. It is worth noting that the calculation of MHDD in this study differs from that of MHD (mean hierarchical distance) proposed by Jing and Liu (2015) in three ways. First, Jing and Liu use the average of all HDs in a sentence measured with reference to the root word, rather than the HD between each governor and its dependents (which is always 1 in the current study). Second, the HL of the root word is ‘0’ in Jing and Liu’s study, but it is defined as ‘1’ here. Third, the denominators differ between the two studies (n vs. MAXHL). In Jing and Liu’s (2015) formulation, MHD averages the HDs of the words in reference to the root, so the MHD for the exemplar sentence is (1+4+3+2+3+4+1+1+3+2+2+3+4+4)/14 = 37/14 = 2.64. Both MHDD and MHD measure syntactic complexity on the hierarchical dimension, with some subtle differences.
2. SOD in our study is exactly what Gildea and Temperley (2010) and Futrell et al. (2015) call ‘D’.
3. This corpus is available at http://www.lancaster.ac.uk/fass/projects/corpus/Frown/.
4. As to the ‘strictly length-controlled sentences’, Jiang and Liu (2015, p. 100) have demonstrated that a linear function, a power function and an exponential function can all be fitted to reflect the relationship between MDD and sentence length (SL). In addition, whether sentences of all lengths or of research-controlled lengths are used does not significantly change the fact that MDD increases only slowly with SL, but research-controlled lengths have two advantages. First, the chosen range of SL yields relatively more concentrated MDD changes, which is more useful for discovering new laws of language. Second, if sentences of all possible lengths are used, the number of sentences becomes small at some lengths, so the DDs at those lengths fluctuate sharply; such fluctuations are unfavourable for discovering language laws and, worse, make the fit of a linear function to the whole treebank less satisfactory. Jiang and Liu also compare their research with Ferrer-i-Cancho and Liu (2014), who reported how the relationship between DD and SL changes across sentences of different lengths in four corpora, and show that both ways of choosing sentences reveal the same regularity in how MDD changes with sentence length; only the parameters of the fitted functions differ.
5. One reason for excluding sentences shorter than 5 words is the anti-DD-minimization effect in short sentences, which is elaborated in Ferrer-i-Cancho and Gómez-Rodríguez (2021).
6. The Zipf-Alekseev function describes well the distributions of language units of different physical lengths. Since the distribution of dependency distance, whether depicted linearly (as by MDD) or hierarchically (as by MHDD), is in essence a kind of length distribution, its patterns are, without exception, well captured by this function. The fitting results for the current treebank data confirm this and are presented in Appendix B for reference.
7. The MNBD fitting results for DD and HDD are presented in Appendices C and D respectively.
8. As the sentence is long, we split it into two halves at the word ‘because’ for clearer display.
9. According to Temperley (2007, p. 267), the ‘predominant direction’ is ‘the general tendency for languages to be consistently head-first or head-last’, and ‘English is mainly right-branching, but has left-branching structures in certain situations’. The predominant direction of the phrase in Figure 10 is right-branching.
10. The 12 text types are classified into 4 macro-domains as in Frown: News (A, B, C), Non-fiction (E, F, G, H), Academic (J) and Fiction (K, L, N, P).
11. Subjects are notated as ‘nsubj’ and objects as ‘obj’ (direct object). Complements are sub-classified as ‘ccomp’ (a clausal complement of a verb) and ‘xcomp’ (an open clausal complement of a verb or adjective, i.e. a predicative or clausal complement without its own subject). Attributives are grouped into ‘amod’ (adjectives as modifiers), ‘compound’ (nouns as modifiers), ‘nmod:poss’ (possessive modifiers) and ‘acl:relcl’ (attributive clauses). Adverbials include ‘advcl’ (adverbial clause modifiers) and ‘advmod’ (adverbial modifiers). Note that attributives, adverbials and complements normally depend on subjects and objects, so the HLs of the attributive groups also rise along with those of the subject and object dependencies.
12. Subjects and objects of the main clause usually lie on lower hierarchical layers because the predicate is on the threshold level; subject- and object-position dependencies on higher layers belong to embedded clauses.
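The contrast drawn in Note 1 between MHD and MHDD can be illustrated with a short Python sketch. The function names are hypothetical, and a sentence is assumed to be a head array in which `heads[i]` is the 1-based position of the governor of word i+1 and 0 marks the root. HDs are counted in Jing and Liu's sense, as the number of arcs between a word and the root (root = 0). They are averaged over the listed non-root values, as in the worked example of Note 1. Because Jing and Liu's formula itself is not reproduced in the note, treat that averaging step as an assumption. The sketch also re-checks the arithmetic of the worked example.

```python
def hierarchical_distances(heads):
    """HD of each non-root word = number of arcs between it and the root."""
    hds = []
    for i, h in enumerate(heads):
        if h == 0:
            continue  # the root word (HD 0) is not listed
        distance, j = 0, i
        while heads[j] != 0:  # climb governor links to the root (assumes a well-formed tree)
            j = heads[j] - 1
            distance += 1
        hds.append(distance)
    return hds


def mean_hd(hds):
    """Average the listed HDs, dividing by their count as in the worked example of Note 1."""
    return sum(hds) / len(hds)


exemplar = [1, 4, 3, 2, 3, 4, 1, 1, 3, 2, 2, 3, 4, 4]  # HD values listed in Note 1
print(sum(exemplar), len(exemplar), round(mean_hd(exemplar), 2))  # 37 14 2.64

# Hypothetical toy sentence "She quickly read the long report"
print(hierarchical_distances([3, 3, 0, 6, 6, 3]))  # [1, 1, 2, 2, 1]
```

Under this study's convention, where the root is on layer 1, each word's HL equals its HD plus 1.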
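The model comparison described in Note 4 (linear, power and exponential functions of SL fitted to MDD) can be sketched as follows. This is an illustrative assumption, not Jiang and Liu's procedure. The power and exponential functions are fitted by ordinary least squares on log-transformed MDD, a common simplification that differs from nonlinear least squares, and goodness of fit is compared by R² on the original scale. The data are synthetic values generated from a known power law, not results from any corpus, and all names are hypothetical.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_mdd_vs_sl(sl, mdd_values):
    """Fit MDD = a + b*SL (linear), a*SL**b (power) and a*exp(b*SL) (exponential).

    The power and exponential fits are log-linearised (OLS on ln MDD), which is a
    simplification of nonlinear least squares."""
    a_lin, b_lin = ols(sl, mdd_values)
    log_mdd = [math.log(y) for y in mdd_values]
    ln_a_pow, b_pow = ols([math.log(x) for x in sl], log_mdd)
    ln_a_exp, b_exp = ols(sl, log_mdd)
    return {
        "linear": (a_lin, b_lin),
        "power": (math.exp(ln_a_pow), b_pow),
        "exponential": (math.exp(ln_a_exp), b_exp),
    }


def predict(model, params, x):
    """Evaluate a fitted model at sentence length x."""
    a, b = params
    if model == "linear":
        return a + b * x
    if model == "power":
        return a * x ** b
    return a * math.exp(b * x)


def r_squared(ys, preds):
    """Coefficient of determination on the original MDD scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic check only: MDD values generated from a known power law over SL 5..30.
sl = list(range(5, 31))
mdd_values = [1.2 * x ** 0.3 for x in sl]
fits = fit_mdd_vs_sl(sl, mdd_values)
for model, params in fits.items():
    preds = [predict(model, params, x) for x in sl]
    print(model, [round(p, 3) for p in params], round(r_squared(mdd_values, preds), 4))
```

On such noise-free synthetic data the power fit recovers the generating parameters. Real MDD-by-SL data will of course be noisier, which is where the choice of sentence-length range discussed in Note 4 matters.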
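For Note 6, the following is a minimal sketch of the Zipf-Alekseev function in the form commonly written in quantitative linguistics, y = a·x^(b + c·ln x). Here x would be a (linear or hierarchical) distance value and y its frequency. Taking logarithms gives ln y = ln a + b·ln x + c·(ln x)², which is linear in ln a, b and c, so the sketch fits it by ordinary least squares via the 3×3 normal equations. This log-linear fit is our assumption; dedicated fitting software may use a different criterion. The function names and the parameter values in the check are hypothetical.

```python
import math


def zipf_alekseev(x, a, b, c):
    """Zipf-Alekseev function y = a * x**(b + c*ln x), for x >= 1."""
    return a * x ** (b + c * math.log(x))


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [val] for row, val in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        sol[r] = (aug[r][3] - sum(aug[r][k] * sol[k] for k in range(r + 1, 3))) / aug[r][r]
    return sol


def fit_zipf_alekseev(xs, ys):
    """Least squares on ln y = ln a + b*ln x + c*(ln x)**2 (log-linearised fit).
    Needs at least three distinct x values and positive y values."""
    rows = [[1.0, math.log(x), math.log(x) ** 2] for x in xs]
    target = [math.log(y) for y in ys]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    ln_a, b, c = solve3(xtx, xty)
    return math.exp(ln_a), b, c


# Synthetic check: frequencies generated from invented parameters for distances 1..15.
xs = list(range(1, 16))
ys = [zipf_alekseev(x, 0.45, -0.6, -0.25) for x in xs]
print([round(p, 4) for p in fit_zipf_alekseev(xs, ys)])
```

Note that at x = 1 the exponent term vanishes (ln 1 = 0), so the fitted a is the predicted frequency of distance 1.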
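One simple way to operationalize the head-first/head-last distinction quoted in Note 9 is to count how many dependencies in a head array have the governor before or after its dependent. This is our assumption for illustration, not Temperley's procedure; it treats head-first (governor-initial) dependencies as right-branching. The head-array convention (`heads[i]` is the 1-based position of the governor of word i+1, 0 for the root), the function name and the toy sentence are hypothetical.

```python
def head_direction_profile(heads):
    """Count dependencies whose governor precedes (head-first) or follows (head-last)
    the dependent; the root (head 0) is skipped."""
    first = sum(1 for i, h in enumerate(heads) if h != 0 and h < i + 1)
    last = sum(1 for i, h in enumerate(heads) if h != 0 and h > i + 1)
    total = first + last
    return {
        "head_first": first,
        "head_last": last,
        "share_head_first": first / total if total else 0.0,
    }


# Hypothetical toy sentence: "She quickly read the long report"
print(head_direction_profile([3, 3, 0, 6, 6, 3]))
```

In this toy sentence only 'report' follows its governor ('read'). The subject, adverb and prenominal modifiers all precede their heads, so the share of head-first dependencies is low (0.2). This illustrates that a single short phrase need not reflect a language's predominant direction.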
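The grouping in Note 11 can be turned into a small lookup table for computing the mean HL of each functional group. The following is a minimal sketch assuming CoNLL-U-style input: a head array (`heads[i]` is the 1-based position of the governor of word i+1, 0 for the root) plus a list of relation labels, with the root on layer 1 as in this study. Relations outside Note 11's groups, such as 'det', are simply skipped. The function names and the toy sentence are hypothetical.

```python
from collections import defaultdict

# Relation labels grouped as in Note 11 (labels follow Universal Dependencies).
GROUPS = {
    "nsubj": "subject",
    "obj": "object",
    "ccomp": "complement", "xcomp": "complement",
    "amod": "attributive", "compound": "attributive",
    "nmod:poss": "attributive", "acl:relcl": "attributive",
    "advcl": "adverbial", "advmod": "adverbial",
}


def hierarchical_layers(heads):
    """HL of every word: root on layer 1, each dependent one layer below its governor."""
    hl = []
    for i in range(len(heads)):
        layer, j = 1, i
        while heads[j] != 0:  # climb to the root (assumes a well-formed tree)
            j = heads[j] - 1
            layer += 1
        hl.append(layer)
    return hl


def mean_layer_by_group(heads, deprels):
    """Mean HL of the words in each functional group; other relations are skipped."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of HLs, count]
    for layer, rel in zip(hierarchical_layers(heads), deprels):
        group = GROUPS.get(rel)
        if group is None:
            continue
        totals[group][0] += layer
        totals[group][1] += 1
    return {group: s / c for group, (s, c) in totals.items()}


# Hypothetical toy sentence: "She quickly read the long report"
heads = [3, 3, 0, 6, 6, 3]
deprels = ["nsubj", "advmod", "root", "det", "amod", "obj"]
print(mean_layer_by_group(heads, deprels))
```

Here the attributive 'long' depends on the object 'report' and so lies one layer below it. This is the pattern Note 11 describes, in which attributive HLs rise with those of subject and object dependencies.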