ABSTRACT
The present study investigates the relationship between two features of dependencies, namely dependency distances and dependency frequencies. The study is based on the analysis of a parallel dependency treebank that includes 10 Indo-European languages. Two corresponding random dependency treebanks are generated as baselines for comparison. After computing the values of dependency distances and their frequencies in these treebanks, for each language, we fit four functions, namely quadratic, exponential, logarithmic, and power-law functions, to its original and random datasets. The preliminary results show that there is a relation between the two dependency features for all 10 Indo-European languages. The relation can be further formalized as a power-law function, which can distinguish the observed data from the randomly generated datasets.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
1. Hudson’s original measure takes two adjacent words to have distance zero. We prefer the alternative definition where x = y ⟺ d(x,y) = 0, i.e. a word has distance zero from itself, making the measure a metric in the mathematical sense.
2. For more details, see https://github.com/UniversalDependencies/UD_English-PUD.
3. All parameter values in the models were obtained with NLREG (version 6.3). The same applies below.
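
As an illustration of the definition in Note 1 and of the kind of fit referred to in Note 3, the following is a minimal sketch rather than the procedure used in the study. It assumes a hypothetical encoding in which each sentence is a list of 1-based head positions (0 for the root), and it estimates the power-law parameters by ordinary least squares on log-transformed values, which is a simplification and may give different estimates from the nonlinear regression performed by NLREG. All function names are illustrative.

```python
import math
from collections import Counter


def dependency_distances(heads):
    """Return the distance |position - head| for every non-root word.

    `heads` is a hypothetical encoding: heads[i] is the 1-based position
    of the head of word i + 1, and 0 marks the root. Under this
    definition adjacent words have distance 1, and only a word paired
    with itself would have distance 0.
    """
    return [abs(pos - head)
            for pos, head in enumerate(heads, start=1)
            if head != 0]


def distance_frequencies(sentences):
    """Count how often each dependency distance occurs across sentences."""
    counts = Counter()
    for heads in sentences:
        counts.update(dependency_distances(heads))
    return dict(sorted(counts.items()))


def fit_power_law(freqs):
    """Fit f = a * d**b by least squares on (log d, log f).

    Assumption: a log-log linear fit stands in for a nonlinear
    regression; it needs at least two distinct distances and
    strictly positive frequencies.
    """
    xs = [math.log(d) for d in freqs]
    ys = [math.log(f) for f in freqs.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# "The dog chased the cat": The->dog, dog->chased, chased=root,
# the->cat, cat->chased.
example = [2, 3, 0, 5, 3]
print(dependency_distances(example))
print(distance_frequencies([example]))
```

The same frequency table can be computed for a random baseline treebank and passed to `fit_power_law`, so that the fitted parameters of the observed and random data can be compared.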