ABSTRACT
Using English academic research articles (RAs), this study justifies treating phrases, rather than words, as the direct constituents of clauses. It does so by fitting the Menzerath-Altmann Law (MAL) to two relationships: between sentence length (measured in clauses) and clause length (measured in phrases), and between clause length and phrase length (measured in words). The study was conducted within the framework of dependency grammar, on a self-built corpus of 104,499 tokens. The results indicate that 1) both relationships abide by the MAL, either fully or once the second MAL regime is excluded, regardless of part-genre, suggesting that phrases are the appropriate direct constituents of clauses; 2) an increasing MAL fitting curve is found in the Abstract, which has the shortest mean dependency distance; and 3) clustering analysis of the MAL parameters in dependency structure can effectively differentiate the size-restricted Abstract from the other part-genres at the lower level, i.e., the clause-phrase-word level. The findings also carry three implications: 1) the second regime may be closely related to text characteristics or linguistic unit levels; 2) the dependency distance minimization effect is possibly a counteractive force against the Menzerathian shortening effect; and 3) the competing relationship between clausal and phrasal complexity in RAs is driven by the self-regulating and self-adapting language system.
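As a rough illustration of what fitting the MAL to such length data can look like, the sketch below fits the commonly used truncated form y = a * x^b by least squares on log-transformed values. This is an assumption made for illustration only: the functional form, the function names, and the sample numbers are hypothetical and are not taken from the study, which reports using NLREG for its fitting.

```python
import math


def fit_mal_power(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (ln x, ln y).

    xs: construct lengths (e.g. clause length in phrases), all > 0
    ys: mean constituent lengths (e.g. phrase length in words), all > 0
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b = sxy / sxx                          # slope in log-log space
    a = math.exp(mean_ly - b * mean_lx)    # intercept mapped back to the original scale
    return a, b


def r_squared(xs, ys, a, b):
    """Coefficient of determination of the fitted curve on the original scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical illustration data, NOT values from the study.
clause_len = [1, 2, 3, 4, 5, 6]               # clause length in phrases
phrase_len = [3.9, 3.4, 3.1, 2.9, 2.8, 2.7]   # mean phrase length in words

a, b = fit_mal_power(clause_len, phrase_len)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r_squared(clause_len, phrase_len, a, b):.3f}")
```

Under this simplified form, a negative b corresponds to the usual Menzerathian shortening (longer constructs, shorter constituents), while a positive b would correspond to an increasing fitting curve of the kind reported for the Abstract part-genre.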
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
1. NLREG (Nonlinear Regression and Curve Fitting) is available from https://www.nlreg.com.
2. Mačutek et al. (2017) presented a different picture, in which the fitting curve showed phrase length decreasing monotonically as clause length increased. It should be noted that only main clauses were analyzed in their study. In addition, Mačutek et al. (2021) later pointed out that the statistics in Mačutek et al. (2017) on the phrase length of mono-phrasal clauses should be interpreted with caution because of the Czech language.