Learned representations, and especially latent-variable statistical language model representations, significantly outperform a traditional CRF system on domain adaptation for POS tagging. Percent error is shown for all words and for out-of-vocabulary (OOV) words. The SCL+500bio system was given 500 labeled training sentences from the biomedical domain. 1.8% of tokens in the biomedical test set carry POS tags, such as ‘HYPHENATED’, that are not part of the tagset for the training data; these tokens were labeled incorrectly by every system without access to labeled biomedical data. As a result, an error rate of 1.8 + 3.9 = 5.7% serves as a reasonable lower bound for any system that has never seen labeled examples from the biomedical domain.
| Model | All words (% error) | OOV words (% error) |
|---|---|---|
| Trad-R | 11.7 | 32.7 |
| n-gram-R | 11.7 | 32.2 |
| Lsa-R | 11.6 | 31.1 |
| NB-R | 11.6 | 30.7 |
| ASO | 11.6 | 29.1 |
| SCL | 11.1 | 28.0 |
| Brown-Token-R | 10.0 | 25.2 |
| Hmm-Token-R | 9.5 | 24.8 |
| Web1T-n-gram-R | 6.9 | 24.4 |
| I-Hmm-Token-R | 6.7 | 24.0 |
| Lattice-Token-R | 6.2 | 21.3 |
| SCL+500bio | 3.9 | – |
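To make the caption's lower-bound arithmetic explicit, both terms are taken directly from the caption and the table: the rate of test tokens whose gold tags lie outside the training tagset, plus the all-words error of the best system that did see labeled biomedical data (SCL+500bio):

$$
\underbrace{1.8\%}_{\text{tags outside the training tagset}} \;+\; \underbrace{3.9\%}_{\text{SCL+500bio all-words error}} \;=\; 5.7\%
$$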