Crosslinguistic quantitative syntax: Dependency length minimization and beyond
Richard Futrell (MIT)
Work with Kyle Mahowald and Ted Gibson
Tuesday 17 May 2016, 11:00–12:30
1.17 Dugald Stewart Building
Recently available crosslinguistic dependency corpora, such as those from the Universal Dependencies project, have made it possible to investigate the quantitative syntactic properties of dozens of languages. We use these corpora to test the quantitative predictions of communicative functional explanations for typological universals.
One prominent such theory is dependency length minimization: various psycholinguistic theories have suggested that language is easier to produce and comprehend when distances between words linked in syntactic dependencies are short, and a functional pressure for short dependencies has been proposed as an explanation for word order universals. To investigate this effect at a large scale in dependency corpora, we compare the dependency length of attested sentences to random baseline reorderings of those sentences under a series of linguistically motivated constraints. First, we establish that a dependency length minimization effect exists in over 40 languages by showing that real sentences have shorter dependency lengths than random projective reorderings, under both fixed and free word order baselines. Next, we address the question of whether the observed minimization is the result of utterance-specific speaker choice (usage) or of constraints on possible orders in a language (grammar). To investigate this, we develop probabilistic models to induce linearization grammars from dependency corpora, and compare the attested orders for sentences in the corpora to random grammatical orders for the same sentences under these models. In most languages, attested dependency length is shorter than this baseline as well. Subject to the limitations of the linearization models, this result suggests active dependency length minimization in usage. Finally, we discuss some of the residual variance in dependency length between languages, which cannot be explained by a simple dependency length minimization account. We show that languages exhibit less dependency length minimization when they are head-final, morphologically complex, and/or have high degrees of word order freedom. Explaining these phenomena is a challenge for functional typology.
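As a rough illustration of the random-baseline comparison described above, here is a minimal Python sketch. It is not the authors' implementation. The tree representation (a list of 1-based head positions, with 0 marking the root), the function names, and the sampling procedure are all assumptions made for this example. The sketch covers only a free word order projective baseline, not the fixed word order or grammar-based baselines, and it assumes each sentence has exactly one root.

```python
import random


def dependency_length(heads):
    """Total dependency length of a sentence.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which contributes no dependency.
    """
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)


def random_projective_reordering(heads, rng):
    """Return the heads list of a random projective reordering of the same tree.

    Assumption for this sketch: each head is shuffled together with its
    direct dependents, and each dependent is then expanded into its own
    (recursively shuffled) subtree, so every subtree stays contiguous.
    """
    n = len(heads)
    children = {i: [] for i in range(n + 1)}
    for word, head in enumerate(heads, start=1):
        children[head].append(word)

    def linearize(node):
        items = children[node] + [node]
        rng.shuffle(items)
        order = []
        for item in items:
            if item == node:
                order.append(node)
            else:
                order.extend(linearize(item))
        return order

    root = children[0][0]  # assumes a single-rooted tree
    order = linearize(root)

    # Map old positions to new positions and rewrite the heads accordingly.
    new_pos = {old: new for new, old in enumerate(order, start=1)}
    new_heads = [0] * n
    for old in range(1, n + 1):
        head = heads[old - 1]
        new_heads[new_pos[old] - 1] = 0 if head == 0 else new_pos[head]
    return new_heads


def mean_baseline_length(heads, samples=1000, seed=0):
    """Average dependency length over random projective reorderings."""
    rng = random.Random(seed)
    total = sum(
        dependency_length(random_projective_reordering(heads, rng))
        for _ in range(samples)
    )
    return total / samples


# Toy tree for "John threw out the trash" (illustrative annotation only).
heads = [2, 0, 2, 5, 2]
print(dependency_length(heads))  # 6
print(mean_baseline_length(heads))
```

A dependency length minimization effect would show up as attested sentences having, on average, a lower `dependency_length` than their `mean_baseline_length`. A single toy sentence like the one above says nothing either way; the comparison is only meaningful when aggregated over a corpus.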
