NER model comparison: F1-score on the test sets after 50 epochs, averaged over 5 runs, over all four entity types in the dataset (PER, ORG, LOC, DATE). Bold marks the top score (scores within one standard error of the top are treated as tied). mBERT and XLM-R are trained in two ways: (1) MeanE, where the mean of the output embeddings of the 12 LM layers initializes a BiLSTM + linear classifier, and (2) FTune, where the LM is fine-tuned end-to-end with a linear classifier. Lang. BERT and Lang. XLM-R (base) are models fine-tuned after language-adaptive fine-tuning.
Lang. | In mBERT? | In XLM-R? | % OOV in Test Entities | CNN-BiLSTM-CRF | mBERT-base MeanE / FTune | XLM-R-base MeanE / FTune | XLM-R-large FTune | Lang. BERT FTune | Lang. XLM-R FTune |
---|---|---|---|---|---|---|---|---|---|
amh | ✗ | ✓ | 72.94 | 52.08 | 0.0 / 0.0 | 63.57 / 70.62 | 76.18 | 60.89 | 77.97 |
hau | ✗ | ✓ | 33.40 | 83.52 | 81.49 / 86.65 | 86.06 / 89.50 | 90.54 | 91.31 | 91.47 |
ibo | ✗ | ✗ | 46.56 | 80.02 | 76.17 / 85.19 | 73.47 / 84.78 | 84.12 | 86.75 | 87.74 |
kin | ✗ | ✗ | 57.85 | 62.97 | 65.85 / 72.20 | 63.66 / 73.32 | 73.75 | 77.57 | 77.76 |
lug | ✗ | ✗ | 61.12 | 74.67 | 70.38 / 80.36 | 68.15 / 79.69 | 81.57 | 83.44 | 84.70 |
luo | ✗ | ✗ | 65.18 | 65.98 | 56.56 / 74.22 | 52.57 / 74.86 | 73.58 | 75.59 | 75.27 |
pcm | ✗ | ✗ | 61.26 | 67.67 | 81.87 / 87.23 | 81.93 / 87.26 | 89.02 | 89.95 | 90.00 |
swa | ✓ | ✓ | 40.97 | 78.24 | 83.08 / 86.80 | 84.33 / 87.37 | 89.36 | 89.36 | 89.46 |
wol | ✗ | ✗ | 69.73 | 59.70 | 57.21 / 64.52 | 54.97 / 63.86 | 67.90 | 69.43 | 68.31 |
yor | ✓ | ✗ | 65.99 | 67.44 | 74.28 / 78.97 | 67.45 / 78.26 | 78.89 | 82.58 | 83.66 |
avg | – | – | 57.50 | 69.23 | 64.69 / 71.61 | 69.62 / 78.96 | 80.49 | 80.69 | 82.63 |
avg (excl. amh) | – | – | 55.78 | 71.13 | 71.87 / 79.88 | 70.29 / 79.88 | 80.97 | 82.89 | 83.15 |
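The MeanE setup in the caption pools a frozen LM's per-layer outputs into fixed token features before classification. A minimal NumPy sketch of that pooling step follows; the shapes and random stand-in tensors are assumptions for illustration, since the real hidden states would come from mBERT or XLM-R:

```python
import numpy as np

# Assumed shapes: 12 transformer layers, a sequence of 8 subword tokens,
# hidden size 768 (as in mBERT-base / XLM-R-base).
n_layers, seq_len, hidden = 12, 8, 768
rng = np.random.default_rng(0)

# Stand-in for the per-layer hidden states of a frozen LM
# (in practice, obtained by running the encoder with all
# hidden states returned, not fine-tuned).
layer_outputs = rng.standard_normal((n_layers, seq_len, hidden))

# MeanE: average the 12 layer outputs per token. These fixed
# vectors would then feed a BiLSTM + linear tag classifier.
mean_embeddings = layer_outputs.mean(axis=0)

print(mean_embeddings.shape)  # (8, 768)
```

In contrast, FTune skips this pooling entirely: the encoder's final-layer outputs go straight into a linear classifier and all encoder weights are updated during training.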