Research Article

Syntactic Complexity of Different Text Types: From the Perspective of Dependency Distance Both Linearly and Hierarchically

Pages 510-540 | Published online: 09 Dec 2021
 

ABSTRACT

Dependency distance (DD) is a well-established measure of syntactic complexity. Previous studies have focused largely on the linear dimension, mostly by means of mean dependency distance (MDD). In the present study, a new quantitative indicator, mean hierarchical dependency distance (MHDD), is proposed to address DD-related issues. Combining MHDD and MDD, the study investigates the syntactic complexity of different text types, using strictly length-controlled sentences from 12 text types in the Freiburg-Brown corpus of American English (Frown). Correlations between MHDD and MDD are identified, and possible reasons are discussed from mathematical and theoretical perspectives. Mathematically, there are two reasons. One is that the numerator of MHDD coincides with the denominator of MDD, both being (n-1), where n is the number of words in the sentence. The other is that the denominator of MHDD (the maximum hierarchical layer, MAXHL) and the numerator of MDD (the sum of DDs, SOD) are positively correlated. We believe that it is the positive correlation between SOD and MAXHL that makes MDD and MHDD change in the same direction. It is also worth noting that both MAXHL and SOD appear to be minimized within their respective data ranges, which foreshadows a dependency distance minimization (DDM) tendency on the hierarchical dimension.
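The two indices can be made concrete with a small sketch. It takes from the abstract that MDD = SOD/(n-1) and MHDD = (n-1)/MAXHL, and from Note 1 that the root word sits on hierarchical layer 1. It assumes, for illustration only, that a sentence arrives as a CoNLL-U-style list of governor indices (1-based, with 0 marking the root); the function names and the toy parse are hypothetical and are not the authors' code.

```python
from typing import Dict, List


def hierarchical_layers(heads: List[int]) -> List[int]:
    """Hierarchical layer (HL) of every word; the root word is on layer 1.

    heads[i] is the 1-based index of the governor of word i + 1, and 0 marks
    the root (a CoNLL-U-style convention assumed for this sketch).
    """
    n = len(heads)
    layers = []
    for i in range(n):
        layer, h, steps = 1, heads[i], 0
        # Climb the governor chain until the root is reached.
        while h != 0:
            layer += 1
            h = heads[h - 1]
            steps += 1
            if steps > n:
                raise ValueError("heads do not form a tree")
        layers.append(layer)
    return layers


def dependency_measures(heads: List[int]) -> Dict[str, float]:
    """SOD, MDD, MAXHL and MHDD for one sentence of n >= 2 words."""
    n = len(heads)
    if n < 2 or heads.count(0) != 1:
        raise ValueError("need at least two words and exactly one root")
    # Linear dimension: sum of |dependent position - governor position|.
    sod = sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)
    # Hierarchical dimension: each governor-dependent pair is one layer apart,
    # so the n - 1 hierarchical distances sum to n - 1.
    maxhl = max(hierarchical_layers(heads))
    return {"SOD": sod, "MDD": sod / (n - 1), "MAXHL": maxhl, "MHDD": (n - 1) / maxhl}


# Hypothetical UD-style parse of "The cat sat on the mat".
print(dependency_measures([2, 3, 0, 6, 6, 3]))
```

For this toy parse SOD = 1 + 1 + 2 + 1 + 3 = 8, so MDD = 8/5 = 1.6; the deepest words ("The", "on", "the") lie on layer 3, so MAXHL = 3 and MHDD = 5/3 ≈ 1.67.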

Acknowledgments

We sincerely thank the reviewers for their insightful, helpful and constructive suggestions, which greatly improved this paper and will inspire the authors' future research.

Disclosure Statement

No potential conflict of interest was reported by the authors.

Notes

1. The calculation of MHDD in this study differs from the MHD (mean hierarchical distance) proposed by Jing and Liu (2015) in three ways. First, Jing and Liu average the HDs of all words in a sentence, each measured with reference to the root word, rather than the HD between each governor and its dependents (which is always 1 in the current study). Second, the HL of the root word is '0' in Jing and Liu's study but is defined as '1' here. Third, the denominators differ (n vs. MAXHL). The formula of Jing and Liu (2015) is $MHD = \frac{1}{n}\sum_{i=1}^{n} HD_i$, so the MHD for the exemplar sentence is (1+4+3+2+3+4+1+1+3+2+2+3+4+4)/14 = 37/14 ≈ 2.64. Both MHDD and MHD measure syntactic complexity on the hierarchical dimension, with some subtle differences.
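The MHD arithmetic in Note 1 can be checked with a short sketch: the 14 HD values listed for the exemplar sum to 37, which gives the stated 2.64. The helper that derives HDs from a parse assumes a list of 1-based governor indices with 0 for the root, and it skips the root (whose HD would be 0). Because the listed values contain no zero, the root appears not to be counted in n, but that is an inference. All names here are hypothetical.

```python
from typing import List


def hierarchical_distances(heads: List[int]) -> List[int]:
    """HD of each non-root word: number of arcs on its path up to the root.

    heads[i] is the 1-based governor index of word i + 1; 0 marks the root,
    whose own HD would be 0 and is skipped here (an assumption).
    """
    hds = []
    for h in heads:
        if h == 0:
            continue
        hd = 1
        # Follow governors upward until the governor is the root itself.
        while heads[h - 1] != 0:
            hd += 1
            h = heads[h - 1]
            if hd > len(heads):
                raise ValueError("heads do not form a tree")
        hds.append(hd)
    return hds


def mhd(hds: List[int]) -> float:
    """Jing and Liu's MHD = (1/n) * sum of HD_i over the n values given."""
    return sum(hds) / len(hds)


# The 14 HD values listed for the exemplar sentence in Note 1.
EXEMPLAR_HDS = [1, 4, 3, 2, 3, 4, 1, 1, 3, 2, 2, 3, 4, 4]
print(sum(EXEMPLAR_HDS), round(mhd(EXEMPLAR_HDS), 2))  # 37 2.64
```

For a toy parse of "The cat sat on the mat" ([2, 3, 0, 6, 6, 3]), `hierarchical_distances` returns [2, 1, 2, 2, 1], i.e. each word's HL minus 1 under this study's convention.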

2. SOD in our study is exactly what Gildea and Temperley (2010) and Futrell et al. (2015) call 'D'.

4. Regarding the 'strictly length-controlled sentences', Jiang and Liu (2015, p. 100) have demonstrated that a linear function, a power function and an exponential function can all be fitted to the relationship between MDD and SL. Moreover, whether sentences of all lengths or only a range of lengths chosen by the researcher are used does not significantly change the finding that MDD increases only slowly with SL, but restricting the range has two advantages. First, the chosen SL range yields relatively more concentrated MDD changes and is more useful for discovering new language laws. Second, when sentences of all possible lengths are used, the number of sentences at some lengths becomes small, so the DDs at those lengths fluctuate sharply; this hinders the discovery of language laws and, worse, makes the linear function fit the whole treebank less well. Jiang and Liu also compare their research with Ferrer-i-Cancho and Liu (2014), who reported how DD varies with SL in four corpora, and show that both ways of selecting sentences reveal the same regularity of MDD across sentence lengths, with only the function parameters differing.
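To make the three candidate functions in Note 4 concrete, the sketch below fits y = a + bx, y = a·x^b and y = a·e^(bx) to (SL, MDD) pairs. As a simplification it linearizes the power and exponential models with logarithms and applies ordinary least squares; Jiang and Liu's actual fitting procedure is not given here and may be a nonlinear fit instead. The data are synthetic, generated from a known power law purely to show that the fit recovers its parameters; they are not corpus results.

```python
import math
from typing import Callable, Sequence, Tuple

Fit = Tuple[float, float, Callable[[float], float]]


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_linear(sl: Sequence[float], mdd: Sequence[float]) -> Fit:
    a, b = ols(sl, mdd)
    return a, b, lambda x: a + b * x


def fit_power(sl: Sequence[float], mdd: Sequence[float]) -> Fit:
    # y = a * x**b  <=>  ln y = ln a + b * ln x
    ln_a, b = ols([math.log(x) for x in sl], [math.log(y) for y in mdd])
    a = math.exp(ln_a)
    return a, b, lambda x: a * x ** b


def fit_exponential(sl: Sequence[float], mdd: Sequence[float]) -> Fit:
    # y = a * exp(b * x)  <=>  ln y = ln a + b * x
    ln_a, b = ols(list(sl), [math.log(y) for y in mdd])
    a = math.exp(ln_a)
    return a, b, lambda x: a * math.exp(b * x)


def rss(f: Callable[[float], float], xs: Sequence[float], ys: Sequence[float]) -> float:
    """Residual sum of squares on the original (unlogged) scale."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


sl = list(range(5, 16))
mdd = [1.5 * x ** 0.3 for x in sl]  # synthetic power-law data, not corpus values
for name, fit in [("linear", fit_linear), ("power", fit_power), ("exponential", fit_exponential)]:
    a, b, f = fit(sl, mdd)
    print(f"{name}: a={a:.3f}, b={b:.3f}, RSS={rss(f, sl, mdd):.2e}")
```

Log-scale least squares weights relative rather than absolute errors, so its parameters can differ slightly from a direct nonlinear fit on real, noisy data.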

5. One reason to exclude sentences shorter than 5 words is the anti-DD minimization effect, which is elaborated in Ferrer-i-Cancho and Gómez-Rodríguez (2021).
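The exclusion rule in Note 5, combined with length control, amounts to grouping sentences by length and averaging MDD within each retained length. A minimal sketch follows, assuming each sentence is a list of 1-based governor indices with 0 for the root. `min_len=5` follows Note 5; the upper bound `max_len` is a hypothetical parameter, since the SL range actually chosen is not stated in this section.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional


def mdd(heads: List[int]) -> float:
    """Mean dependency distance: SOD / (n - 1) for one sentence."""
    sod = sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)
    return sod / (len(heads) - 1)


def mean_mdd_by_length(
    parses: Iterable[List[int]], min_len: int = 5, max_len: Optional[int] = None
) -> Dict[int, float]:
    """Average MDD per sentence length, keeping only min_len <= SL <= max_len."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for heads in parses:
        sl = len(heads)
        if sl < min_len or (max_len is not None and sl > max_len):
            continue  # e.g. sentences shorter than 5 words are dropped
        groups[sl].append(mdd(heads))
    return {sl: mean(vals) for sl, vals in sorted(groups.items())}


# Toy parses (hypothetical): the 3-word sentence is excluded.
toy = [[2, 0, 2], [0, 1, 2, 3, 4], [2, 3, 0, 6, 6, 3], [2, 0, 2, 3, 4, 5]]
print(mean_mdd_by_length(toy))
```

Here the two 6-word parses have MDDs of 1.6 and 1.0, so their length group averages 1.3, and the single 5-word chain has MDD 1.0.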

6. The Zipf-Alekseev function describes the distributions of language units of different physical lengths well. Since the distribution of dependency distance, whether depicted linearly by MDD or hierarchically by MHDD, is in essence a kind of length distribution, its patterns are likewise well captured by this function. The fitting results for the current treebank data confirm this and are presented in Appendix B for reference.
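For reference, one common form of the Zipf-Alekseev function is y = c·x^(a + b·ln x) (sign conventions for a and b vary across sources). Taking logarithms gives ln y = ln c + a·ln x + b·(ln x)², which is linear in (ln c, a, b), so a quick fit is possible by least squares on the log scale. The sketch below does this in plain Python; it is not the procedure behind Appendix B, which this section does not describe, it requires all y values to be positive, and the parameter values in the example are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def zipf_alekseev(x: float, c: float, a: float, b: float) -> float:
    """y = c * x ** (a + b * ln x)."""
    return c * x ** (a + b * math.log(x))


def solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_zipf_alekseev(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least squares on ln y = ln c + a*ln x + b*(ln x)^2; returns (c, a, b)."""
    rows = [(1.0, math.log(x), math.log(x) ** 2) for x in xs]
    targets = [math.log(y) for y in ys]
    # Normal equations X^T X beta = X^T y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    ln_c, a, b = solve(xtx, xty)
    return math.exp(ln_c), a, b


xs = list(range(1, 11))
ys = [zipf_alekseev(x, 0.6, -0.5, -0.4) for x in xs]  # arbitrary parameters
print(fit_zipf_alekseev(xs, ys))
```

Distance classes with zero frequency cannot be log-transformed, so they would have to be dropped or handled by a direct nonlinear fit.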

7. The MNBD fitting results for DD and HDD are presented in Appendices C and D respectively.

8. Because the sentence is long, we split it into two halves at the word 'because' for clearer display.

9. According to Temperley (2007, p. 267), the 'predominant direction' is 'the general tendency for languages to be consistently head-first or head-last', and 'English is mainly right-branching, but has left-branching structures in certain situations'. The predominant direction of the phrase in Figure 10 is right-branching.
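One simple way to quantify a predominant direction is to count how many dependencies are head-initial (the governor precedes its dependent, the dependency-grammar counterpart of right-branching) versus head-final. The sketch below is an illustrative proxy, not the procedure used for Figure 10; it assumes a list of 1-based governor indices with 0 for the root, and the function name and toy parse are hypothetical.

```python
from typing import List


def head_initial_ratio(heads: List[int]) -> float:
    """Share of dependencies whose governor precedes the dependent.

    heads[i] is the 1-based governor index of word i + 1; 0 marks the root,
    which has no governor and is ignored.
    """
    arcs = [(i + 1, h) for i, h in enumerate(heads) if h != 0]
    head_initial = sum(1 for dep, gov in arcs if gov < dep)
    return head_initial / len(arcs)


# UD-style "The cat sat on the mat": only sat -> mat is head-initial.
print(head_initial_ratio([2, 3, 0, 6, 6, 3]))  # 0.2
```

The ratio depends on the annotation scheme: in UD, prepositions attach to their nouns as head-final 'case' dependents, whereas schemes that make the preposition the head would count that arc as head-initial.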

10. The 12 text types are classified into 4 macro-domains as in Frown: News (A, B, C), Non-fiction (E, F, G, H), Academic (J) and Fiction (K, L, N, P).

11. Subjects are labelled 'nsubj'; objects are labelled 'obj' (direct object). Complements are subdivided into 'ccomp' (a clausal complement of a verb) and 'xcomp' (a clausal complement of an adjective used as a predicative, or a clausal complement without its own subject). Attributives are grouped into 'amod' (adjectives as modifiers), 'compound' (nouns as modifiers), 'nmod:poss' (possessive modifiers) and 'acl:relcl' (attributive clauses). Adverbials include 'advcl' (adverbial clause modifiers) and 'advmod' (adverbial modifiers). Note that attributives, adverbials and complements are normally attached to subjects and objects, so the HLs of the attributive groups also rise along with those of the subject and object dependencies.

12. Subjects and objects of the main clause usually lie on low hierarchical layers (close to the root) because the main predicate is on the first layer; subject- and object-position dependencies on higher layers belong to clauses.
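Notes 11 and 12 compare the hierarchical layers at which different relation types occur. A minimal sketch of that comparison follows. It assumes head-list input (1-based governor indices, 0 for the root), UD-style labels like those in Note 11, and that a dependency's HL is the HL of its dependent word with the root on layer 1; all names and the toy sentence are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hierarchical_layers(heads: List[int]) -> List[int]:
    """HL of each word, with the root word on layer 1 (heads 1-based, 0 = root)."""
    layers = []
    for h in heads:
        layer = 1
        while h != 0:
            layer += 1
            h = heads[h - 1]
            if layer > len(heads):
                raise ValueError("heads do not form a tree")
        layers.append(layer)
    return layers


def mean_hl_by_relation(sentences: Iterable[Tuple[List[int], List[str]]]) -> Dict[str, float]:
    """Mean HL of the dependent word for each relation label across sentences."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # label -> [sum, count]
    for heads, deprels in sentences:
        for layer, rel in zip(hierarchical_layers(heads), deprels):
            totals[rel][0] += layer
            totals[rel][1] += 1
    return {rel: s / c for rel, (s, c) in sorted(totals.items())}


example = [([2, 3, 0, 6, 6, 3], ["det", "nsubj", "root", "case", "det", "obl"])]
print(mean_hl_by_relation(example))
```

In this toy sentence the main-clause subject sits on layer 2, directly below the root predicate, consistent with Note 12; over a treebank, higher mean HLs for 'nsubj' or 'obj' would reflect subjects and objects inside embedded clauses.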

Additional information

Funding

This research was supported by the National Social Science Foundation of China for the project ‘Quantitative Linguistic Research of Text Features in English and Chinese’ under Grant No. [15BYY098] and by the Introduction of Talent Project of Guizhou University ‘A Comparative Study of Chinese and English Text Characteristics based on quantitative linguistic indicators’ under Grant No. [(2017) 020].
