Graphical models generally outperform n-gram models by a larger margin on sparse words than on non-sparse words, and by a larger margin on polysemous words than on non-polysemous words. The one exception is NB-R, which performs worse relative to Web1T-n-gram-R on polysemous words than on non-polysemous words. For each graphical-model representation, we show in parentheses the difference in accuracy (in percentage points) between that representation and Web1T-n-gram-R. Accuracies are percentages; the tokens row gives the number of test tokens in each subset. For each representation, the difference in accuracy between the polysemous and non-polysemous subsets is statistically significant at p < 0.01 under a two-tailed Fisher's exact test; the same holds for the sparse vs. non-sparse subsets.
| | polysemous | not polysemous | sparse | not sparse |
|---|---|---|---|---|
| tokens | 159 | 4,321 | 463 | 12,194 |
| Trad-R | 59.5 | 78.5 | 52.5 | 89.6 |
| Web1T-n-gram-R | 68.2 | 85.3 | 61.8 | 94.0 |
| NB-R | 64.5 (−3.7) | 88.7 (+3.4) | 57.8 (−4.0) | 89.4 (−4.6) |
| Hmm-Token-R | 67.9 (−0.3) | 83.4 (−1.9) | 60.2 (−1.6) | 91.6 (−2.4) |
| I-Hmm-Token-R | 75.6 (+7.4) | 85.2 (−0.1) | 62.9 (+1.1) | 94.5 (+0.5) |
| Lattice-Token-R | 70.5 (+2.3) | 86.9 (+1.6) | 65.2 (+3.4) | 94.6 (+0.6) |
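To make the significance test concrete, the sketch below reconstructs the 2×2 contingency table of correct vs. incorrect predictions on the polysemous and non-polysemous subsets from the reported accuracies and token counts, and runs a two-tailed Fisher's exact test with scipy. This is a minimal sketch: rounding accuracies back to integer counts is an assumption for illustration, and the test was presumably run on the raw per-token counts rather than on values recovered from the table.

```python
# Minimal sketch: two-tailed Fisher's exact test comparing a representation's
# accuracy on the polysemous vs. non-polysemous subsets. Counts are
# reconstructed (by rounding) from the accuracies and token counts in the table.
from scipy.stats import fisher_exact

# Subset sizes from the "tokens" row of the table.
N_POLY, N_NONPOLY = 159, 4_321

def poly_vs_nonpoly_pvalue(acc_poly: float, acc_nonpoly: float) -> float:
    """Return the two-tailed Fisher's exact test p-value for the 2x2 table
    of (correct, incorrect) counts, given accuracies in percent."""
    correct_poly = round(acc_poly / 100 * N_POLY)
    correct_nonpoly = round(acc_nonpoly / 100 * N_NONPOLY)
    table = [
        [correct_poly, N_POLY - correct_poly],
        [correct_nonpoly, N_NONPOLY - correct_nonpoly],
    ]
    _, p = fisher_exact(table, alternative="two-sided")
    return p

# e.g. Trad-R: 59.5% accuracy on polysemous vs. 78.5% on non-polysemous tokens.
print(poly_vs_nonpoly_pvalue(59.5, 78.5))  # well below the 0.01 threshold
```

The same construction applies to the sparse vs. non-sparse comparison, substituting the subset sizes 463 and 12,194.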