Original Articles

Dependency-length minimization in natural and artificial languages∗

Pages 256-282 | Published online: 20 Jun 2008
 

Abstract

A wide range of evidence points to a preference for syntactic structures in which dependencies are short. Here we examine the question: what kinds of dependency configurations minimize dependency length? We consider two well-established principles of dependency-length minimization: that dependencies should be consistently right-branching or left-branching, and that shorter dependent phrases should be closer to the head. We also add a third, novel principle: that some “opposite-branching” of one-word phrases is desirable. In a series of computational experiments using unordered dependency trees gathered from written English, we examine the effect of these three principles on dependency length and show that all three contribute significantly to dependency-length reduction. Finally, we present what appears to be the optimal “grammar” for dependency-length minimization.
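As a rough illustration of the quantity being minimized, the sketch below sums head-dependent distances (in words) for two orderings of the same unordered tree. The three-word chain, the word labels, and the function name `total_dependency_length` are hypothetical and chosen only for illustration; they are not taken from the article's experiments.

```python
def total_dependency_length(order, head_of):
    """Sum of head-dependent distances, measured in words.

    order   -- list of words in their linear order
    head_of -- dict mapping each dependent word to its head word
    """
    position = {word: i for i, word in enumerate(order)}
    return sum(abs(position[dep] - position[head])
               for dep, head in head_of.items())


# Hypothetical unordered tree: A heads B, and B heads C.
head_of = {"B": "A", "C": "B"}

consistent = ["A", "B", "C"]  # both dependents follow their heads
mixed = ["A", "C", "B"]       # C precedes its head B, while B follows A

print(total_dependency_length(consistent, head_of))  # 2
print(total_dependency_length(mixed, head_of))       # 3
```

In this toy case the consistently right-branching order yields the shorter total, which is in line with the first principle stated in the abstract.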

Notes

1. Thanks to Daniel Gildea for valuable comments on a draft of this paper.

2. Most dependency grammars either prohibit crossing dependencies completely (Gaifman, 1965; Mel'cuk, 1987) or allow them only under very limited circumstances (Steedman, 1985; Hudson, 1990). There are, however, some well-known examples of crossing dependencies in certain languages such as Dutch (Bresnan et al., 1982; Joshi, 1990).

3. Regarding views on dependencies, see, for example, Jackendoff (1977), Mel'cuk (1987), Pollard and Sag (1987), Hudson (1990), Dryer (1992), Hawkins (1994), Radford (1997), Gibson (1998) and Collins (1999). Another controversial case is co-ordination constructions: some have suggested that the head of a co-ordinate phrase like John and Fred is the first conjunct (Mel'cuk, 1987), others argue that it is the conjunction (Munn, 1993), and still others argue that both conjuncts act as heads, connecting to other words outside the co-ordinate phrase (Hudson, 1990). For the most part, however, the dependency structure of syntactic constructions is clear and unproblematic.

4. Gibson first stated this principle as part of his Syntactic Prediction Locality Theory (1998), which he later modified and renamed the Dependency Locality Theory (2000). As Gibson acknowledges, a number of other theories of syntactic complexity have been put forth over several decades. Gibson argues, however, that no other theory accounts for the range of phenomena that is accounted for by dependency-length minimization (see especially Gibson, 1998, pp. 1–8).

5. Hawkins' research brings together a wide variety of phenomena relating to dependency-length minimization, and the current study draws heavily on his findings. Hawkins' EIC (Early Immediate Constituent) theory argues that language processing is facilitated if the heads of the children within each constituent are clustered together within a short “window”, known as the “constituent recognition domain”; this is advantageous as it provides the parser with “earlier and more rapid access” to the children of the larger constituent (1994, p. 66). While this theory is clearly related to dependency-length minimization, it is not quite the same, and in some cases the two theories make different predictions. For example, in cases where a word has three dependent phrases on the same side, Hawkins' EIC principle predicts only that the longest phrase will be furthest from the head; it predicts no ordering preference between the shorter two phrases, as the constituent recognition domain will be the same size in either case. By contrast, the dependency-length view predicts that the shortest phrase will be closest to the head. A study of this situation in the case of verbs with three adjunct phrases supports the dependency-length view (Temperley, 2007).
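The contrast in note 5 can be worked through numerically. The sketch below is an illustration under simplifying assumptions that are not from the article: the head precedes all three dependent phrases, each phrase is headed by its first word, and the "window" is approximated as the distance from the head to the head of the last dependent phrase. The phrase lengths (1, 3 and 6 words) and all function names are hypothetical.

```python
from itertools import permutations


def head_distances(phrase_lengths):
    """Distance from the head to the first word of each following phrase.

    Assumes the head comes first and each phrase is headed by its first word.
    """
    distances = []
    offset = 1  # the first dependent phrase starts right after the head
    for length in phrase_lengths:
        distances.append(offset)
        offset += length
    return distances


def same_side_dependency_length(phrase_lengths):
    """Total length of the three head-to-phrase dependencies."""
    return sum(head_distances(phrase_lengths))


def window_size(phrase_lengths):
    """Crude stand-in for the recognition window: head to last phrase head."""
    return max(head_distances(phrase_lengths))


lengths = (1, 3, 6)  # hypothetical phrase lengths in words
for order in sorted(permutations(lengths), key=same_side_dependency_length):
    print(order, same_side_dependency_length(order), window_size(order))
```

Under these assumptions, the orders (1, 3, 6) and (3, 1, 6) share the same window of 5 words, but their total dependency lengths are 8 and 10 respectively, so only the dependency-length measure distinguishes between them.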

6. The theory presented here is the “alternate version” of the BDT (Dryer, 1992, p. 116). An earlier version of the theory is phrased not in terms of heads and dependents but in terms of constituent structure, stating that a pair of constituents X and Y will pattern consistently only if X is non-phrasal and Y is phrasal (1992, p. 89). Dryer concludes that the alternate version of the theory is more elegant, but notes that it relies on assumptions about head-dependent relationships that are in some cases controversial.

7. The test set included all sentences in section 00 of the treebank containing 100 words or fewer. One sentence in section 00 exceeded this limit and was excluded.
