Per-language breakdown of Named Entity Recognition results on the CoNLL and MasakhaNER datasets (labeled F1). Values in parentheses are differences relative to mBert. mBert scores zero on Amharic because its vocabulary has no entries in the Amharic script.
Language | mBert | Canine-C | Canine-C + n-grams |
---|---|---|---|
CoNLL | |||
Dutch | 90.2 | 74.7 (–15.5) | 88.5 (–1.7) |
English | 91.1 | 79.8 (–11.3) | 89.8 (–1.3) |
German | 82.5 | 64.1 (–18.4) | 82.1 (–0.4) |
Spanish | 87.6 | 77.4 (–10.2) | 86.5 (–1.1) |
Macro Avg | 87.8 | 74.0 (–13.8) | 86.7 (–1.1) |
MasakhaNER | |||
Amharic | 0.0 | 44.6 (+44.6) | 50.0 (+50.0) |
Hausa | 89.3 | 76.1 (–13.2) | 88.0 (–1.3) |
Igbo | 84.6 | 75.6 (–9.0) | 85.0 (+0.4) |
Kinyarwanda | 73.9 | 58.3 (–15.6) | 72.8 (–1.1) |
Luganda | 80.2 | 69.4 (–10.8) | 79.6 (–0.6) |
Luo | 75.8 | 63.4 (–12.4) | 74.2 (–1.6) |
Nigerian Pidgin | 89.8 | 66.6 (–23.2) | 88.7 (–1.1) |
Swahili | 87.1 | 72.7 (–14.4) | 83.7 (–3.4) |
Wolof | 64.9 | 60.7 (–4.2) | 66.5 (+1.6) |
Yorùbá | 78.7 | 67.9 (–10.8) | 79.1 (+0.4) |
Macro Avg | 72.4 | 65.5 (–6.9) | 76.8 (+4.3) |
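
The Macro Avg rows can be checked with a few lines of Python. This is a minimal sketch, assuming the macro average is the unweighted mean of the per-language F1 scores and that each parenthesized delta is computed against mBert before rounding. The helper name `macro_average` is illustrative and not from the source. The scores are the MasakhaNER values copied from the table.

```python
# Per-language labeled F1 on MasakhaNER, copied from the table
# (order: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo,
#  Nigerian Pidgin, Swahili, Wolof, Yorùbá).
mbert = [0.0, 89.3, 84.6, 73.9, 80.2, 75.8, 89.8, 87.1, 64.9, 78.7]
canine_c = [44.6, 76.1, 75.6, 58.3, 69.4, 63.4, 66.6, 72.7, 60.7, 67.9]
canine_c_ngrams = [50.0, 88.0, 85.0, 72.8, 79.6, 74.2, 88.7, 83.7, 66.5, 79.1]


def macro_average(scores):
    """Unweighted mean over languages (illustrative helper, an assumption)."""
    return sum(scores) / len(scores)


baseline = macro_average(mbert)
for name, scores in [("Canine-C", canine_c),
                     ("Canine-C + n-grams", canine_c_ngrams)]:
    avg = macro_average(scores)
    # Delta is taken on the unrounded means, then rounded for display.
    print(f"{name}: {avg:.1f} ({avg - baseline:+.1f})")
```

Under these assumptions the n-gram delta comes out as +4.3, matching the table, whereas subtracting the already-rounded averages (76.8 and 72.4) would give +4.4. Because mBert's Amharic score of 0.0 enters the mean like any other language, it lowers the mBert MasakhaNER average considerably.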