Research Article

Quantifying Syntactic Complexity in Czech Texts: An Analysis of Mean Dependency Distance and Average Sentence Length Across Genres

Pages 260-273 | Received 13 Apr 2024, Accepted 17 Jun 2024, Published online: 01 Jul 2024
 

ABSTRACT

This study investigates the syntactic complexity of various text-types in the Czech language by analysing two measures: mean dependency distance (MDD), which quantifies the average distance between syntactic heads and their dependents within a sentence, and average sentence length (ASL). Using data from the SYN2020 corpus, a large and balanced collection of contemporary written Czech, we calculate MDD and ASL for different text-types. Our findings reveal distinct patterns in MDD and ASL values across genres, suggesting that syntactic complexity varies among text-types. We observe a clear distinction between fiction and non-fiction, with fiction exhibiting lower MDD and ASL values, indicating a more compact syntactic structure. Non-fiction genres, particularly scientific literature, display higher MDD and ASL values, reflecting more complex syntactic constructions. Journalistic texts, such as newspapers and magazines, fall between fiction and non-fiction in terms of MDD and ASL. These results demonstrate the potential of MDD and ASL as quantitative measures for characterizing and differentiating text-types by their syntactic complexity. Furthermore, our analysis contributes to a deeper understanding of syntactic variation across genres in Czech.
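For reference, one common way to formalise the two measures is shown below. This is a sketch under stated assumptions, not necessarily the exact computation used in the study: MDD is pooled over all dependencies in a text-type (averaging per-sentence MDD values is an alternative that weights sentences differently), each sentence is assumed to have exactly one root, and every token, including punctuation, is assumed to count as a word.

```latex
\[
\mathrm{DD}_i = \operatorname{pos}(\text{dependent}_i) - \operatorname{pos}(\text{head}_i),
\qquad
\mathrm{MDD} = \frac{1}{n - s}\sum_{i=1}^{n - s} \lvert \mathrm{DD}_i \rvert,
\qquad
\mathrm{ASL} = \frac{n}{s}
\]
% n: total number of words; s: number of sentences.
% n - s is the number of dependencies once each sentence's root is excluded.
```

For the single sentence "Petr čte knihu" ('Petr reads a book'), both Petr and knihu depend on the verb čte at a distance of 1, so n = 3, s = 1, MDD = (1 + 1)/2 = 1 and ASL = 3.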

Acknowledgments

This work was supported by the Czech Science Foundation (GAČR), project No. 22-20632S.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Notes

1. The dependency tree also contains additional information, such as the part-of-speech (POS) tags of words (e.g. VERB, NOUN, ADJ) and the types of dependencies (e.g. root: the root of the sentence; nsubj: nominal subject; obj: object; amod: adjectival modifier). In the present study, however, we focus solely on the dependency distances (DD) between heads and their dependents, as our primary interest lies in quantifying syntactic complexity through the MDD measure. Because the root has no head, it has no dependency distance and is therefore excluded. While POS tags and dependency types can provide valuable insights in other contexts, they are not directly relevant to the scope of this study.
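The root exclusion can be made concrete with a minimal sketch that reads dependency trees in the CoNLL-U format and collects |DD| for every non-root word. The file format is an assumption, since the note does not name one, and the helper names dependency_distances and mean_dependency_distance are hypothetical. Punctuation is kept here, which may differ from the study's setup.

```python
def dependency_distances(conllu_text: str) -> list[int]:
    """Collect |DD| for every non-root word in CoNLL-U text (hypothetical helper)."""
    distances = []
    for line in conllu_text.splitlines():
        # Blank lines separate sentences; lines starting with '#' are comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        token_id, head = cols[0], cols[6]  # ID and HEAD columns
        # Skip multiword-token ranges ("1-2") and empty nodes ("5.1");
        # they are not words in the basic dependency tree.
        if "-" in token_id or "." in token_id:
            continue
        # HEAD == 0 marks the root: it has no head, hence no distance.
        if head == "0":
            continue
        # IDs restart at 1 in every sentence, so positions are sentence-internal.
        distances.append(abs(int(token_id) - int(head)))
    return distances


def mean_dependency_distance(conllu_text: str) -> float:
    """Pooled MDD: total |DD| divided by the number of non-root dependencies."""
    distances = dependency_distances(conllu_text)
    return sum(distances) / len(distances) if distances else 0.0


# Toy example (annotation written for illustration, not taken from SYN2020).
ROWS = [
    ["1", "Petr", "Petr", "PROPN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "čte", "číst", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "knihu", "kniha", "NOUN", "_", "_", "2", "obj", "_", "SpaceAfter=No"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# text = Petr čte knihu.\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

if __name__ == "__main__":
    print(dependency_distances(SAMPLE))  # [1, 1, 2]
    print(mean_dependency_distance(SAMPLE))  # 1.3333333333333333
```

The root (čte) contributes no distance, so three dependencies remain and the pooled MDD is 4/3. Dropping punctuation, for example by filtering on the UPOS column, would leave [1, 1] and an MDD of 1.0, which shows how much such token conventions can shift the measure.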

Additional information

Funding

The work was supported by the Grantová Agentura České Republiky [22-20632S].
