Publisher: Journals Gateway
Computational Linguistics (2004) 30 (3): 253–276.
Published: 01 September 2004
Abstract
Corpus-based statistical parsing relies on large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes using sample selection to find helpful training examples and to reduce the human effort spent annotating less informative ones. We consider several criteria for predicting whether an unlabeled item would make a helpful training example. To compare the effect of different predictive criteria, experiments are performed across two syntactic learning tasks and, within the single task of parsing, across two learning models. We find that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.
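As a rough illustration of the uncertainty criterion mentioned in the abstract, the sketch below ranks unlabeled items by the entropy of a model's predicted distribution over candidate analyses and picks the most uncertain ones for annotation. The abstract does not specify how uncertainty is computed, so the entropy measure, the function names (`entropy`, `select_most_uncertain`), and the toy probabilities are all assumptions made for this example, not the article's method.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_uncertain(pool, predict_probs, batch_size):
    """Return the batch_size items from pool with the highest predictive entropy.

    pool          -- iterable of unlabeled items
    predict_probs -- hypothetical callable mapping an item to the model's
                     probability distribution over its candidate analyses
    """
    ranked = sorted(pool, key=lambda item: entropy(predict_probs(item)), reverse=True)
    return ranked[:batch_size]


# Toy stand-in for a parser: each sentence maps to made-up probabilities
# over three candidate parses (illustrative values only).
toy_model = {
    "sentence_a": [0.98, 0.01, 0.01],  # model is confident
    "sentence_b": [0.40, 0.35, 0.25],  # model is very unsure
    "sentence_c": [0.70, 0.20, 0.10],  # model is somewhat unsure
}

chosen = select_most_uncertain(list(toy_model), toy_model.get, batch_size=2)
print(chosen)  # ['sentence_b', 'sentence_c']
```

In a full selection loop, the chosen items would be sent to a human annotator, added to the training set, and the model retrained before the next round of selection.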