Syntactic Complexity of Different Text Types: From the Perspective of Dependency Distance Both Linearly and Hierarchically: Journal of Quantitative Linguistics: Vol 29, No 4

ABSTRACT

Dependency distance (DD) is a well-established measure of syntactic complexity. Previous studies have largely focused on the linear dimension, mostly by means of mean dependency distance (MDD). In the present study, a new quantitative indicator, mean hierarchical dependency distance (MHDD), is proposed to discuss DD-related issues. Combining MHDD and MDD, the study investigates the syntactic complexity of different texts, using strictly length-controlled sentences of 12 text types from the Freiburg-Brown corpus of American English. Correlations between MHDD and MDD have been identified, and possible reasons are discussed from mathematical and theoretical perspectives. Mathematically, there are two reasons. One is that the numerator of MHDD coincides with the denominator of MDD, both being (n-1), where n is the number of words in the sentence. The other is that the denominator of MHDD (maximum hierarchical layer: MAXHL) and the numerator of MDD (sum of DD: SOD) are positively correlated. We believe that it is the positive correlation of SOD and MAXHL that ensures that MDD and MHDD change in the same direction. It is also worth noting that both MAXHL and SOD seem to be minimized in their respective data spectra, which foreshadows the dependency distance minimization (DDM) tendency on the hierarchical dimension.
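
The two indicators described in the abstract can be illustrated with a minimal sketch. It assumes, following the abstract and Note 1, that MDD = SOD/(n-1), that MHDD = (n-1)/MAXHL, and that the root word sits on hierarchical layer 1. It further assumes that the linear DD of a dependency is the absolute difference between the positions of the dependent and its governor. The function name, the head-array encoding and the five-word toy sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def dependency_measures(heads: List[int]) -> Tuple[float, float, int, int]:
    """Return (MDD, MHDD, SOD, MAXHL) for one sentence.

    heads[i] is the 1-based position of the governor of word i + 1;
    0 marks the root (a hypothetical encoding chosen for this sketch).
    """
    n = len(heads)
    if n < 2:
        raise ValueError("a sentence needs at least two words")

    # SOD: sum of linear distances between each dependent and its governor.
    sod = sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

    def layer(pos: int) -> int:
        # The root is on layer 1; each dependent is one layer below its governor.
        depth = 1
        while heads[pos - 1] != 0:
            pos = heads[pos - 1]
            depth += 1
            if depth > n:
                raise ValueError("heads do not form a tree")
        return depth

    maxhl = max(layer(p) for p in range(1, n + 1))

    mdd = sod / (n - 1)     # linear measure
    mhdd = (n - 1) / maxhl  # hierarchical measure as described in the abstract
    return mdd, mhdd, sod, maxhl


# Toy sentence (hypothetical): "The cat chased a mouse", root = "chased".
toy_heads = [2, 3, 0, 5, 3]
mdd, mhdd, sod, maxhl = dependency_measures(toy_heads)
print(f"SOD = {sod}, MAXHL = {maxhl}")          # SOD = 5, MAXHL = 3
print(f"MDD = {mdd:.2f}, MHDD = {mhdd:.2f}")    # MDD = 1.25, MHDD = 1.33
```

Both measures share the term (n-1), so for sentences of a fixed length they differ only through SOD and MAXHL, which is why the abstract concentrates on the relation between those two quantities.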

Acknowledgments

We sincerely thank the reviewers for their insightful, helpful and constructive suggestions, which contributed greatly to the improvement of this paper and will inspire the authors in their future research.

Disclosure Statement

No potential conflict of interest was reported by the authors.

Notes

1. It is worth noting that the calculation of MHDD in this study differs from that of MHD (mean hierarchical distance) proposed by Jing and Liu (2015) in three ways. First, Jing and Liu use the average value of all HDs in a sentence in reference to the root word, rather than the HD between any governor and its dependents (which is always 1 in the current study). Second, the HL for the root word is ‘0’ in Jing and Liu’s study, but it is defined as ‘1’ here. Third, the denominator differs between the two studies (n vs. MAXHL). The formula of Jing and Liu (2015) is: $\mathrm{MHD} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{HD}_{i}$. So the MHD for the exemplar sentence will be (1+4+3+2+3+4+1+1+3+2+2+3+4+4)/14 = 37/14 = 2.64. Both MHDD and MHD measure syntactic complexity on the hierarchical dimension, with some subtle differences.

2. SOD in our study is exactly what Gildea and Temperley (2010) and Futrell et al. (2015) call ‘D’.

3. This corpus is available at http://www.lancaster.ac.uk/fass/projects/corpus/Frown/.

4. As to the ‘strictly length-controlled sentences’, Jiang and Liu (2015, p. 100) have demonstrated that a linear function, a power function, and an exponential function can all be fitted to the relationship between MDD and sentence length (SL). In addition, whether sentences of all lengths or of research-controlled lengths are used does not significantly change the fact that MDD increases only slowly with SL, but research-controlled lengths have two advantages. First, the range of SL chosen for the study produces relatively more concentrated MDD changes and is more useful for the discovery of new laws of language. Second, if sentences of all possible lengths are used, the number of sentences at some lengths decreases and the DDs of some sentences fluctuate violently, which is unfavourable for discovering language laws; worse still, these violent changes make the linear fit to the whole treebank less desirable. Jiang and Liu also compare their research with Ferrer-i-Cancho and Liu (2014), who reported changes in the relationship between DDs of different sentence lengths and SL from four corpora, and demonstrate that both ways of choosing sentences can manifest the changing regularity of MDD with sentences of different lengths, with only the parameters of the function differing.

5. One reason for excluding sentences with a length below 5 is the anti-DD-minimization effect, which is elaborated in Ferrer-i-Cancho and Gómez-Rodríguez (2021).

6. The Zipf-Alekseev function describes well the distribution of language units of different physical lengths. The distribution of dependency distance, whether depicted linearly by MDD or hierarchically by MHDD, is in essence a kind of length distribution, so its patterns are likewise well captured by this function. The fitting results for the current treebank data confirm this and are presented in Appendix B for reference.

7. The MNBD fitting results for DD and HDD are presented in Appendices C and D respectively.

8. Due to the length of the sentence, we just split it into two halves at the word ‘because’ for clearer display.

9. According to Temperley (2007, p. 267), the ‘predominant direction’ is ‘the general tendency for languages to be consistently head-first or head-last’, and ‘English is mainly right-branching, but has left-branching structures in certain situations’. The predominant direction of the phrase in Figure 10 is right-branching.

10. The 12 text types are classified into 4 macro-domains as in Frown: News (A, B, C), Non-fiction (E, F, G, H), Academic (J) and Fiction (K, L, N, P).

11. Subjects are notated as ‘nsubj’. Objects are ‘obj’ (direct object). Complements are sub-classified as ‘ccomp’ (a clausal complement of a verb) and ‘xcomp’ (clausal complement of an adjective as a predicative or clausal complement without its own subject). Attributives are grouped into ‘amod’ (adjectives as modifiers), ‘compound’ (nouns as modifiers), ‘nmod:poss’ (possessive modifiers) and ‘acl:relcl’ (attributive clauses). Adverbials include ‘advcl’ (adverbial clause modifier) and ‘advmod’ (adverbial modifier). It may be noted that attributives, adverbials and complements are normally parasitic on subjects and objects, so the HLs of the attributive groups also rise in agreement with the subject and object dependencies.

12. Subjects and objects of the main clause usually lie on lower hierarchical layers, as the predicate is on the threshold level; higher-layered subject- and object-position dependencies belong to clauses.
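
The arithmetic in Note 1 can be checked with a short sketch. It assumes only the formula quoted there, MHD = (1/n) Σ HD_i, and the fourteen HD values listed for the exemplar sentence; the function and variable names are hypothetical.

```python
from typing import Sequence


def mean_hierarchical_distance(hds: Sequence[int]) -> float:
    """MHD as quoted in Note 1: the mean of the per-word hierarchical distances."""
    if not hds:
        raise ValueError("at least one HD value is required")
    return sum(hds) / len(hds)


# HD values listed in Note 1 for the exemplar sentence.
exemplar_hds = [1, 4, 3, 2, 3, 4, 1, 1, 3, 2, 2, 3, 4, 4]

mhd = mean_hierarchical_distance(exemplar_hds)
print(f"n = {len(exemplar_hds)}, MHD = {mhd:.2f}")  # n = 14, MHD = 2.64
```

The result agrees with the value of 2.64 reported in Note 1. Unlike MHDD, this measure divides by the number of listed values n rather than by MAXHL.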

Jing, Y., & Liu, H. (2015). Mean hierarchical distance augmenting mean dependency distance. In Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015). Uppsala: Uppsala University.

Gildea, D., & Temperley, D. (2010). Do grammars minimize dependency length? Cognitive Science, 34(2), 286–310. https://doi.org/10.1111/j.1551-6709.2009.01073.x

Futrell, R., Mahowald, K., & Gibson, E. (2015). Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences, 112(33), 10336–10341. https://doi.org/10.1073/pnas.1502134112

Jiang, J., & Liu, H. (2015). The effects of sentence length on dependency distance, dependency direction and the implications — Based on a parallel English-Chinese dependency Treebank. Language Sciences, 50, 93–104. https://doi.org/10.1016/j.langsci.2015.04.002

Ferrer-i-Cancho, R., & Liu, H. (2014). The risks of mixing dependency lengths from sequences of different length. Glottotheory, 5(2), 143–155. https://doi.org/10.1515/glot-2014-0014

Ferrer-i-Cancho, R., & Gómez-Rodríguez, C. (2021). Anti dependency distance minimization in short sequences. A graph theoretic approach. Journal of Quantitative Linguistics, 28(1), 50–76. https://doi.org/10.1080/09296174.2019.1645547

Temperley, D. (2007). Minimization of dependency length in written English. Cognition, 105(2), 300–333. https://doi.org/10.1016/j.cognition.2006.09.011

Additional information

Funding

This research was supported by the National Social Science Foundation of China for the project ‘Quantitative Linguistic Research of Text Features in English and Chinese’ under Grant No. [15BYY098] and by the Introduction of Talent Project of Guizhou University ‘A Comparative Study of Chinese and English Text Characteristics based on quantitative linguistic indicators’ under Grant No. [(2017) 020].
