Abstract

Parsing is a key task in natural language processing. It involves predicting, for each natural language sentence, an abstract representation of the grammatical entities in the sentence and the relations between these entities. This representation provides an interface to compositional semantics and to the notions of “who did what to whom.” The last two decades have seen great advances in parsing English, leading to major leaps also in the performance of applications that use parsers as part of their backbone, such as systems for information extraction, sentiment analysis, text summarization, and machine translation. Attempts to replicate the success of parsing English for other languages have often yielded unsatisfactory results. In particular, parsing languages with complex word structure and flexible word order has been shown to require non-trivial adaptation. This special issue reports on methods that successfully address the challenges involved in parsing a range of morphologically rich languages (MRLs). This introduction characterizes MRLs, describes the challenges in parsing MRLs, and outlines the contributions of the articles in the special issue. These contributions present up-to-date research efforts that address parsing in varied, cross-lingual settings. They show that parsing MRLs addresses challenges that transcend particular representational and algorithmic choices.

1. Parsing MRLs

Parsing is a central task in natural language processing, where a system accepts a sentence in a natural language as input and provides a syntactic representation of the entities and grammatical relations in the sentence as output. The input sentences to a parser reflect language-specific properties (in terms of the order of words, the word forms, the lexical items, and so on), whereas the output abstracts away from these properties in order to yield a structured, formal representation that reflects the functions of the different elements in the sentence.

The best broad-coverage parsing systems to date use statistical models, possibly in combination with hand-crafted grammars. They use machine learning techniques that allow the system to generalize the syntactic patterns characterizing the data. These machine learning methods are trained on a treebank, that is, a collection of natural language sentences which are annotated with their correct syntactic analyses. Based on the patterns and frequencies observed in the treebank, parsing algorithms are designed to suggest and score novel analyses for unseen sentences, and search for the most likely analysis.

The release of a large-scale annotated corpus for English, the Wall Street Journal Penn Treebank (PTB) (Marcus, Santorini, and Marcinkiewicz 1993), led to a significant leap in the performance of statistical parsing for English (Magerman 1995; Collins 1997; Charniak 2000; Charniak and Johnson 2005; Petrov et al. 2006; Huang 2008; Finkel, Kleeman, and Manning 2008; Carreras, Collins, and Koo 2008). At the time of their publication, each of these models improved the state-of-the-art of English parsing, bringing constituency-based parsing performance on the standard test set of the PTB to the level of 92% F1-score using the ParsEval evaluation metrics (Black et al. 1991).

The last decade has seen the development of large-scale annotated treebanks for languages such as Arabic (Maamouri et al. 2004), French (Abeillé, Clément, and Toussenel 2003), German (Uszkoreit 1987; Skut et al. 1997), Hebrew (Sima'an et al. 2001), Swedish (Nivre and Megyesi 2007), and others. The availability of syntactically annotated corpora for these languages had initially raised the hope of attaining the same level of parsing performance on these languages, by simply porting the existing models to the newly available corpora.

Early attempts to apply the aforementioned constituency-based parsing models to other languages have demonstrated that the success of these approaches was rather limited. This observation was confirmed for individual languages such as Czech (Collins et al. 1999), German (Dubey and Keller 2003), Italian (Corazza et al. 2004), French (Arun and Keller 2005), Modern Standard Arabic (Kulick, Gabbard, and Marcus 2006), Modern Hebrew (Tsarfaty and Sima'an 2007), and many more (Tsarfaty et al. 2010).

The same observation was independently confirmed by parallel research efforts on data-driven dependency-based parsing (Kübler, McDonald, and Nivre 2009). Results coming from multilingual parsing evaluation campaigns, such as the CoNLL shared tasks on multilingual dependency parsing, showed significant variation in the results of the same models applied to a range of typologically different languages. In particular, these results demonstrated that the morphologically rich nature of some of those languages makes them inherently harder to parse, regardless of the parsing technique used (Buchholz and Marsi 2006; Nivre et al. 2007a).

Morphologically rich languages (MRLs) express multiple levels of information already at the word level. The lexical information for each word form in an MRL may be augmented with information concerning the grammatical function of the word in the sentence, its grammatical relations to other words, pronominal clitics, inflectional affixes, and so on. In English, many of these notions are expressed implicitly by word order and adjacency: The direct object, for example, is generally the first NP after the verb and thus does not necessarily need an explicit marking. Expressing such functional information morphologically allows for a high degree of word-order variation, since grammatical functions need no longer be strongly associated with syntactic positions. Furthermore, lexical items appearing in different syntactic contexts may be realized in different forms. This leads to a high level of word-form variation and complicates lexical acquisition from small sized corpora.

2. The Overarching Challenges

The complexity of the linguistic patterns found in MRLs was shown to challenge parsing in many ways. For instance, standard models assume that a word always corresponds to a unique terminal in the parse tree. In Arabic, Hebrew, Turkish, and other languages, an input word-token may correspond to multiple terminals. Furthermore, models developed primarily to parse English draw substantial inference based on word-order patterns. Parsing non-configurational languages such as Hungarian may require relying on morphological information to infer equivalent functions. Parsing Czech or German is further complicated by case syncretism, which precludes a deterministic correlation between morphological case and grammatical functions. In languages such as Hungarian or Finnish, the diversity of word forms leads to a high rate of out-of-vocabulary words unseen in the annotated data. MRL parsing is thus often associated with increased lexical data sparseness. An MRL parser requires robust statistical methods for analyzing such phenomena.

Following Tsarfaty et al. (2010) we distinguish three overarching challenges that are associated with parsing MRLs.

(i) The Architectural Challenge. Contrary to English, where the input signal uniquely determines the sequence of tree terminals, word forms in an MRL may contain multiple units of information (morphemes). These morphemes have to be segmented in order to reveal the basic units of analysis. Furthermore, morphological analysis of MRL words may be highly ambiguous, and morphological segmentation may be a non-trivial task for certain languages. Therefore, a parsing architecture for an MRL must contain, at the very least, a morphological component for segmentation and a syntactic component for parsing. The challenge is thus to determine how these two models should be combined in the overall parsing architecture: Should we assume a pipeline architecture, where the morphological segmentation is disambiguated prior to parsing? Or should we construct a joint architecture where the model picks out a parse tree and a segmentation at once?

(ii) The Modeling Challenge. The design of a statistical parsing model requires specifying three formal elements: the formal output representation, the events that can be observed in the data, and the independence assumptions between these events. For an MRL, complex morphosyntactic interactions may impose constraints on the form of events and on their possible combination. In such cases, we may need to incorporate morphological information in the syntactic model explicitly. How should morphological information be treated in the syntactic model: as explicit tree decoration, as hidden variables, or as complex objects in their own right? Which morphological features should be explicitly encoded? Where should we mark morphological features: at the part-of-speech level, at phrase level, on dependency arcs? How do morphological and syntactic events interact, and how can we exploit these interactions for inferring correct overall structures?

(iii) The Lexical Challenge. A parsing model for MRLs requires recognizing the morphological information in each word form. Due to the high level of morphological variation, however, data-driven systems are not guaranteed to observe all morphological variants of a word form in a given annotated corpus. How can we assign correct morphological signatures to the lexical items in the face of such extreme data spareseness? When devising a model for parsing MRLs, one may want to make use of whatever additional resources one has access to—morphological analyzers, unlabeled data, and lexica—in order to extend the coverage of the parser and obtain robust and accurate predictions.

3. Contributions of this Special Issue

This special issue draws attention to the different ways in which researchers working on parsing MRLs address the challenges described herein. It contains six studies discussing parsing results for six languages, using both constituency-based and dependency-based frameworks (cf. Table 1). The first three studies (Seeker and Kuhn; Fraser et al.; Kallmeyer and Maier) focus on parsing European languages and deal with phenomena that lie within their flexible phrase ordering and rich morphology, including problems posed by case syncretism. The next two papers (Goldberg and Elhadad; Marton et al.) focus on Semitic languages and study the application of general-purpose parsing algorithms (constituency-based and dependency-based, respectively) to parsing such data. They empirically show gaps in performance between different architectures (pipeline vs. joint , gold vs. machine-predicted input), feature choices, and techniques for increasing lexical coverage of the parser. The last paper (Green et al.) is a comparative study on multi-word expression (MWE) recognition via two specialized parsing models applied to both French and Modern Standard Arabic. Let us briefly outline the individual contributions made by each of the articles in this special issue.

Table 1

Contributions to the CL special issue on parsing morphologically rich languages (CL-PMRL).


Constituency-Based
Dependency-Based
Arabic Green, de Marneffe, and Manning 2013 Marton, Habash, and Rambow 2013 
Czech  Seeker and Kuhn 2013 
French Green, de Marneffe, and Manning 2013  
German Kallmeyer and Maier 2013 Seeker and Kuhn 2013 
 Fraser et al. 2013  
Hebrew Goldberg and Elhadad 2013  
Hungarian  Seeker and Kuhn 2013 

Constituency-Based
Dependency-Based
Arabic Green, de Marneffe, and Manning 2013 Marton, Habash, and Rambow 2013 
Czech  Seeker and Kuhn 2013 
French Green, de Marneffe, and Manning 2013  
German Kallmeyer and Maier 2013 Seeker and Kuhn 2013 
 Fraser et al. 2013  
Hebrew Goldberg and Elhadad 2013  
Hungarian  Seeker and Kuhn 2013 

Seeker and Kuhn present a comparative study of dependency parsing for three European MRLs from different typological language families: German (Germanic), Czech (Slavonic), and Hungarian (Finno-Ugric). Although all these languages possess richer morphological marking than English, there is variation among these languages in terms of the richness of the morphological information encoded in the word forms, and the ambiguity of these morphological markers. Hungarian is agglutinating, that is, morphological markers in Hungarian are non-ambiguous and easy to recognize. German and Czech are fusional languages with different types of case syncretism. Seeker and Kuhn use the Bohnet Parser (Bohnet 2010) to parse all these languages, and show that not using morphological information in the statistical feature model is detrimental. Using gold morphology significantly improves results for all these languages, whereas automatically predicted morphology leads to smaller improvements for the fusional languages, relative to the agglutinating one. To combat this loss in performance, they add linguistic constraints to the decoder, restricting the possible structures. They show that a decoding algorithm which filters out dependency parses that do not obey predicate-argument constraints allows the authors to obtain more substantial gains from morphology.

Fraser et al. also focus on parsing German, though in a constituency-based setting. They use a PCFG-based unlexicalized chart parser (Schmid 2004) along with a set of manual treebank annotations that bring the treebank grammar performance to the level of automatically predicted states learned by Petrov et al. (2006). As in the previous study, syncretism is shown to cause ambiguity that hurts parsing performance. To combat this added ambiguity, they use external information sources. In particular, they show different ways of using information from monolingual and bilingual data sets in a re-ranking framework for improving parsing accuracy. The bilingual approach is inspired by machine translation studies and exploits the variation in marking the same grammatical functions differently across languages for increasing the confidence of a disambiguation decision in one language by observing a parallel non-ambiguous structure in the other one.

These two studies use German corpora stripped of discontinuous constituents in order to benchmark their parsers. In each of these cases, the discontinuities are converted into pure tree structures, thus ignoring the implied long distance dependencies. Kallmeyer and Maier propose an alternative approach for parsing such languages by presenting an overall solution for parsing discontinuous structures directly. They present a parsing model based on Probabilistic Linear Context-Free Rewriting Systems (PLCFRS), which implements many of the technological advances that were developed in the context of parsing with PCFGs. In particular, they present a decoding algorithm based on weighted deductive CKY parsing, and use it in conjunction with PLCFRS parameters directly estimated from treebank data. Because PLCFRS is a powerful formalism, the parser needs to be tuned for speed. The authors present several admissible heuristics that facilitate faster A* parsing. The authors present parsing results that are competitive with constituency-based parsing of German while providing invaluable information concerning discontinuous constituents and long distance dependencies.

Goldberg and Elhadad investigate constituency parsing for Modern Hebrew (Semitic), a language which is known to have a very rich and ambiguous morphological structure. They empirically show that an application of the split-and-merge general-purpose model of Petrov et al. (2006) for parsing Hebrew does not guarantee accurate parsing in and of itself. In order to obtain competitive parsing performance, they address all three challenges we have noted. In order to deal with the problem of word segmentation (the architectural challenge), they extend the chart-based decoder of Petrov et al. with a lattice-based decoder. In order to handle morphological marking patterns (the modeling challenge), they refine the initial treebank with particularly targeted state-splits, and add a set of linguistic constraints that act as a filter ruling out trees that violate agreement. Finally, they add information from an external wide-coverage lexicon to combat lexical sparseness (the lexical challenge). They show that the contribution of these different methods is cumulative, yielding state-of-the-art results on constituency parsing of Hebrew.

Marton et al. study dependency parsing of Modern Standard Arabic (Semitic) and attend to the same challenges. They show that for two transition-based parsers, MaltParser (Nivre et al. 2007b) and EasyFirst (Goldberg and Elhadad 2010), controlling the architectural and modeling choices leads to similar effects. For instance, when comparing parsing performance on gold and machine-predicted input conditions, they show that rich informative tag sets are preferred in gold conditions, but smaller tag sets are preferred in machine-predicted conditions. They further isolate a set of morphological features which leads to significant improvements in the machine-predicted condition, for both frameworks. They also show that function-based morphological features are more informative than surface-based features, and that performance loss that is due to errors in part-of-speech tagging may be restored by training the model on a joint set of trees encoding gold tags and machine-predicted tags. At the same time, undirected parsing of EasyFirst shows better accuracy, possibly due to the flexiblity in phrase ordering. The emerging insight is that tuning morphological information inside general-purpose parsing systems is of crucial importance for obtaining competitive performance.

Focusing on Modern Standard Arabic (Semitic) and French (Romance), the last article of this special issue, by Green et al., may be seen as an applications paper, treating the task of MWE recognition as a side effect of a joint model for parsing and MWE identification. The key problem here is knowing what to consider a minimal unit for parsing, and how to handle parsing in realistic scenarios where MWEs have not yet been identified. The authors present two parsing models for such a task: a factored model including a factored lexicon that integrates morphological knowledge into the Stanford Parser word model (Klein and Manning 2003), and a Dirichlet Process Tree Substitution Grammar based model (Cohn, Blunsom, and Goldwater 2010). The latter can be roughly described as Data Oriented Parsing (Bod 1992; Bod, Scha, and Sima'an 2003) in a Bayesian framework, extended to include specific features that ease the extraction of tree fragments matching MWEs. Interestingly, those very different models do provide the same range of performance when confronted with predicted morphology input. Additional important challenges that are exposed in the context of this study concern the design of experiments for cross-linguistic comparison in the face of delicate asymmetries between the French and Arabic data sets.

4. Conclusion

This special issue highlights actively studied areas of research that address parsing MRLs. Most approaches described in this issue rely on extending existing parsing models to address three overarching challenges. The joint parsing and segmentation architecture scenario can be addressed by extending a general-purpose CKY decoder into a lattice-based decoder. The modeling challenge may be addressed by explicitly marking morphological features as syntactic state-splits, by modeling discontinuities in the formal syntactic representation directly, by incorporating hard-coded linguistic constraints as filters, and so on. The lexical challenge can be addressed by using external resources such as a wide-coverage lexicon for analyzing unknown words, and the use of additional monolingual and bilingual data in order to obtain robust statistics in the face of extreme sparseness.

An empirical observation reflected in the results presented here is that languages which we refer to as MRLs exhibit their own cross-lingual variation and thus should not be treated as a single, homogeneous class of languages. Some languages show richer morphology than others; some languages possess more flexible word ordering than others; some fusional languages show syncretism (coarse-grained underspecified markers) whereas others use a large set of fine-grained and unambiguous morphological markers. The next challenge would then be to embrace these variations, and investigate whether the typological properties of languages can inform us more directly concerning the adequate methods that can be used to effectively parse them.

As the next research goal, we then set out to obtain a deeper understanding of how annotation choices paired up with modeling choices systematically correlate with parsing performance for different languages. Further work in the line of the studies presented here is required in order to draw relevant generalizations. Furthermore, the time is ripe for another multilingual parser evaluation campaign, which would encourage the community to develop parsing systems that can easily be transferred from one language type to another. By compiling these recent contributions, we hope to encourage not only the development of novel systems for parsing individual MRLs, but also to facilitate the search for more robust, generic cross-linguistic solutions.

Acknowledgements

As guest editors of this special issue, we wish to thank the regular members and the guest members of the Computational Linguistics editorial board for their thorough work, which allowed us to assemble this special issue of high quality contributions in the emerging field of parsing MRLs. We also want to thank Marie Candito, Jennifer Foster, Yoav Goldberg, Ines Rehbein, Lamia Tounsi, and Yannick Versley for their contribution to the initial proposal for this special issue. Finally, we want to express our gratitude to Robert Dale and Suzy Howlett for their invaluable support throughout the editorial process.

References

References
Abeillé
,
Anne
,
Lionel
Clément
, and
François
Toussenel
.
2003
.
Building a treebank for French
.
In Anne Abeillé, editor
,
Treebanks
.
Kluwer
,
Dordrecht
, pages
165
188
.
Arun
,
Abhishek
and
Frank
Keller
.
2005
.
Lexicalization in crosslinguistic probabilistic parsing: The case of French
. In
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
, pages
306
313
,
Ann Arbor, MI
.
Black
,
Ezra
,
Steven
Abney
,
Dan
Flickinger
,
Claudia
Gdaniec
,
Ralph
Grishman
,
Philip
Harrison
,
Donald
Hindle
,
Robert
Ingria
,
Frederick
Jelinek
,
Judith
Klavans
,
Mark
Liberman
,
Mitchell
Marcus
,
Salim
Roukos
,
Beatrice
Santorini
, and
Tomek
Strzalkowski
.
1991
.
A procedure for quantitatively comparing the syntactic coverage of English grammars
.
Speech Communication
,
33
(1, 2)
:
306
311
.
Bod
,
Rens
.
1992
.
A computational model of language performance: Data oriented parsing
. In
Proceedings of the 14th Conference on Computational linguistics-Volume 3
, pages
855
859
,
Nantes
.
Bod
,
Rens
,
Remko
Scha
, and
Khalil
Sima'an
, editors.
2003
.
Data-Oriented Parsing
.
CSLI
,
Stanford, CA
.
Bohnet
,
Bernd
.
2010
.
Top accuracy and fast dependency parsing is not a contradiction
. In
Proceedings of CoLing
, pages
89
97
,
Sydney
.
Buchholz
,
Sabine
and
Erwin
Marsi
.
2006
.
CoNLL-X shared task on multilingual dependency parsing
. In
Proceedings of the Tenth Conference on Computational Language Learning (CoNLL)
, pages
149
164
,
New York, NY
.
Carreras
,
Xavier
,
Michael
Collins
, and
Terry
Koo
.
2008
.
TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing
. In
Proceedings of the Twelfth Conference on Computational Natural Language Learning (CoNLL)
, pages
9
16
,
Manchester
.
Charniak
,
Eugene
.
2000
.
A maximumentropy-inspired parser
. In
Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL)
, pages
132
139
,
Seattle, WA
.
Charniak
,
Eugene
and
Mark
Johnson
.
2005
.
Coarse-to-fine n-best parsing and maxent discriminative reranking
. In
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005)
, pages
173
180
,
Ann Arbor, MI
.
Cohn
,
Trevor
,
Phil
Blunsom
, and
Sharon
Goldwater
.
2010
.
Inducing treesubstitution grammars
.
The Journal of Machine Learning Research
,
11
:
3053
3096
.
Collins
,
Michael
.
1997
.
Three generative, lexicalized models for statistical parsing
. In
Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics
, pages
16
23
,
Madrid
.
Collins
,
Michael
,
Jan
Hajič
,
Lance
Ramshaw
, and
Christoph
Tillmann
.
1999
.
A statistical parser for Czech
. In
Proceedings of the 37th Annual Meeting of the ACL
, pages
505
512
,
College Park, MD
.
Corazza
,
Anna
,
Alberto
Lavelli
,
Giogio
Satta
, and
Roberto
Zanoli
.
2004
.
Analyzing an Italian treebank with state-of-the-art statistical parsers
. In
Proceedings of the Third Workshop on Treebanks and Linguistic Theories (TLT 2004)
, pages
39
50
,
Tübingen
.
Dubey
,
Amit
and
Frank
Keller
.
2003
.
Probabilistic parsing for German using sister-head dependencies
. In
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
, pages
96
103
,
Ann Arbor, MI
.
Finkel
,
Jenny Rose
,
Alex
Kleeman
, and
Christopher D.
Manning
.
2008
.
Efficient, feature-based, conditional random field parsing
. In
Proceedings of ACL-08: HLT
, pages
959
967
,
Columbus, OH
.
Goldberg
,
Yoav
and
Michael
Elhadad
.
2010
.
An efficient algorithm for easy-first non-directional dependency parsing
. In
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
, pages
742
750
,
Los Angeles, CA
.
Huang
,
Liang
.
2008
.
Forest reranking: Discriminative parsing with non-local features
. In
Proceedings of ACL-08: HLT
, pages
586
594
,
Columbus, OH
.
Klein
,
Dan
and
Christopher D.
Manning
.
2003
.
Accurate unlexicalized parsing
. In
Proceedings of the 41st Annual Meeting on Association for Computational Linguistics
, pages
423
430
,
Sapporo
.
Kübler
,
Sandra
,
Ryan
McDonald
, and
Joakim
Nivre
.
2009
.
Dependency Parsing
.
Number 2 in Synthesis Lectures on Human Language Technologies
.
Morgan & Claypool Publishers
.
Kulick
,
Seth
,
Ryan
Gabbard
, and
Mitchell
Marcus
.
2006
.
Parsing the Arabic treebank: Analysis and improvements
. In
Proceedings of the 5th International Workshop on Treebanks and Linguistic Theories (TLT)
, pages
31
42
,
Prague
.
Maamouri
,
Mohamed
,
Anne
Bies
,
Tim
Buckwalter
, and
Wigdan
Mekki
.
2004
.
The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus
. In
Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools
, pages
102
109
,
Cairo
.
Magerman
,
David M.
1995
.
Statistical decision-tree models for parsing
. In
Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics
, pages
276
283
,
Cambridge, MA
.
Marcus
,
Mitchell
,
Beatrice
Santorini
, and
Mary Ann
Marcinkiewicz
.
1993
.
Building a large annotated corpus of English: The Penn Treebank
.
Computational Linguistics
,
19
(2)
:
313
330
.
Nivre
,
Joakim
,
Johan
Hall
,
Sandra
Kübler
,
Ryan
McDonald
,
Jens
Nilsson
,
Sebastian
Riedel
, and
Deniz
Yuret
.
2007a
.
The CoNLL 2007 shared task on dependency parsing
. In
Proceedings of the CoNLL 2007 Shared Task. Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
, pages
915
932
,
Prague
.
Nivre
,
Joakim
,
Johan
Hall
,
Jens
Nilsson
,
Atanas
Chanev
,
Gülşen
Eryiǧit
,
Sandra
Kübler
,
Svetoslav
Marinov
, and
Erwin
Marsi
.
2007b
.
MaltParser: A language-independent system for data-driven dependency parsing
.
Natural Language Engineering
,
13
(2)
:
95
135
.
Nivre
,
Joakim
and
Beata
Megyesi
.
2007
.
Bootstrapping a Swedish Treebank using cross-corpus harmonization and annotation projection
. In
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT)
, pages
97
102
,
Bergen
.
Petrov
,
Slav
,
Leon
Barrett
,
Romain
Thibaux
, and
Dan
Klein
.
2006
.
Learning accurate, compact, and interpretable tree annotation
. In
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
, pages
433
440
,
Sydney
.
Schmid
,
Helmut
.
2004
.
Efficient parsing of highly ambiguous context-free grammars with bit vectors
. In
Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004)
, pages
162
168
,
Geneva
.
Sima'an
,
Khalil
,
Alon
Itai
,
Yoad
Winter
,
Alon
Altmann
, and
Noa
Nativ
.
2001
.
Building a tree-bank of Modern Hebrew text
.
Traitement Automatique des Langues
,
42
:
347
380
.
Skut
,
Wojciech
,
Brigitte
Krenn
,
Thorsten
Brants
, and
Hans
Uszkoreit
.
1997
.
An annotation scheme for free word order languages
. In
Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP)
, pages
88
95
,
Washington, D.C.
Tsarfaty
,
Reut
,
Djamé
Seddah
,
Yoav
Goldberg
,
Sandra
Kübler
,
Marie
Candito
,
Jennifer
Foster
,
Yannick
Versley
,
Ines
Rehbein
, and
Lamia
Tounsi
.
2010
.
Statistical parsing of morphologically rich languages (SPMRL): What, how and whither
. In
Proceedings of the NAACL Workshop on Statistical Parsing of Morphologically Rich Languages
, pages
1
12
,
Los Angeles, CA
.
Tsarfaty
,
Reut
and
Khalil
Sima'an
.
2007
.
Three-dimensional parametrization for parsing morphologically rich languages
. In
Proceedings of the Tenth International Conference on Parsing Technologies
, pages
156
167
,
Prague
.
Uszkoreit
,
Hans
.
1987
.
Word Order and Constituent Structure in German
.
CSLI
,
Stanford, CA
.

Author notes

*

Uppsala University, Department of Linguistics and Philology, Box 635, 75126 Uppsala, Sweden. E-mail: tsarfaty@stp.lingfil.uu.se.

**

Inria's Alpage project & Université Paris Sorbonne, Maison de la Recherche, 28 rue Serpentes, 75006 Paris, France. E-mail: djame.seddah@paris-sorbonne.fr.

Indiana University, Department of Linguistics, Memorial Hall 322, Bloomington IN-47405, USA. E-mail: skuebler@indiana.edu.

Uppsala University, Department of Linguistics and Philology, Box 635, 75126 Uppsala, Sweden. E-mail: joakim.nivre@lingfil.uu.se.