Abstract

Linguistic knowledge plays an important role in phrase movement in statistical machine translation. To efficiently incorporate linguistic knowledge into phrase reordering, we propose a new approach: Linguistically Annotated Reordering (LAR). In LAR, we build hard hierarchical skeletons and inject soft linguistic knowledge from source parse trees to nodes of hard skeletons during translation. The experimental results on large-scale training data show that LAR is comparable to boundary word-based reordering (BWR) (Xiong, Liu, and Lin 2006), which is a very competitive lexicalized reordering approach. When combined with BWR, LAR provides complementary information for phrase reordering, which collectively improves the BLEU score significantly.

To further understand the contribution of linguistic knowledge in LAR to phrase reordering, we introduce a syntax-based analysis method to automatically detect constituent movement in both reference and system translations, and summarize syntactic reordering patterns that are captured by reordering models. With the proposed analysis method, we conduct a comparative analysis that not only provides the insight into how linguistic knowledge affects phrase movement but also reveals new challenges in phrase reordering.

This content is only available as a PDF.

Author notes

*

1 Fusionopolis Way #21-01 Connexis Singapore 138632. E-mail: dyxiong@i2r.a-star.edu.sg.

**

1 Fusionopolis Way #21-01 Connexis Singapore 138632. E-mail: mzhang@i2r.a-star.edu.sg.

1 Fusionopolis Way #21-01 Connexis Singapore 138632. E-mail: aaiti@i2r.a-star.edu.sg.

1 Fusionopolis Way #21-01 Connexis Singapore 138632. E-mail: hli@i2r.a-star.edu.sg.