In the past decade, dependencies, i.e., directed labeled graph structures representing hierarchical relations between morphemes, words, and semantic units, have become the near-standard representation in many fields of computational linguistics, including parsing, generation, and machine translation. The linguistic significance of these structures often remains vague, however, and the need to develop common notational and formal grounds is strongly felt by many researchers working in these fields.
Historically, the generative grammatical tradition, which in its origins attempted solely to construct a system that distinguishes grammatical from ungrammatical sentences, left linguistics in a state where the outcome of the grammatical analysis, namely phrase structure, was difficult to connect to deeper (semantic, conceptual) structures. The result was a complete separation between, on one side, Natural Language Processing (NLP), which needed deeper analyses for translation, classification, generation, etc., and, on the other side, generative linguistics, which built structures that grew ever more complex as languages farther removed from English began to be addressed, with the declared goal of modeling Language as a whole. In the second half of the 20th century, only a few linguists, often referring to Lucien Tesnière, continued to describe language in terms of dependency, mainly because they were working on free word order languages, for which phrase structure is obviously not appropriate.
Since the 1990s, NLP has been turning towards dependency analysis, and in the past five years dependency has become quasi-hegemonic: the vast majority of parsers presented at recent NLP conferences are explicitly dependency-based. It seems, however, that the connection between computational linguists and dependency linguists remains sporadic. A very common procedure is that an existing phrase structure treebank is converted into a dependency format that fits the computational linguist's needs, and other researchers then attempt to reproduce this annotation with statistical or rule-based grammars. This is not to say that the situation was any better when parsers still derived phrase structures and linguists discussed “move alpha”. Yet, we believe that the circumstances are different today and that dependency linguists and computational linguists have a lot to share: We know that statistical parsing gives better results if we have a linguistically coherent corpus analysis. We need to know what the differences are between surface and deep dependency. What are the units that appear in dependency analysis? What kind of analysis works for which application? How can dependency structures be linked to the lexicon and to semantics?
Kim Gerdes, Eva Hajičová and Leo Wanner