Skip Nav Destination
1-1 of 1
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Publisher: Journals Gateway
Computational Linguistics (2015) 41 (1): 119–147.
Published: 01 March 2015
FIGURES | View All (14)
AbstractView article PDF
Manually annotated corpora are indispensable resources, yet for many annotation tasks, such as the creation of treebanks, there exist multiple corpora with different and incompatible annotation guidelines. This leads to an inefficient use of human expertise, but it could be remedied by integrating knowledge across corpora with different annotation guidelines. In this article we describe the problem of annotation adaptation and the intrinsic principles of the solutions, and present a series of successively enhanced models that can automatically adapt the divergence between different annotation formats. We evaluate our algorithms on the tasks of Chinese word segmentation and dependency parsing. For word segmentation, where there are no universal segmentation guidelines because of the lack of morphology in Chinese, we perform annotation adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank. For dependency parsing, we perform annotation adaptation from the Penn Chinese Treebank to a semantics-oriented Dependency Treebank, which is annotated using significantly different annotation guidelines. In both experiments, automatic annotation adaptation brings significant improvement, achieving state-of-the-art performance despite the use of purely local features in training.