Table 1: Benchmark test set F1 scores across different languages and annotation scenarios. Best models in bold. † indicates that for EE the test F1 score is statistically significantly better than SNS-BERT-shortest (p < 0.01) (details in footnote 9). Other pairs between SNS-BERT-shortest and EER-BERT-short/shortest were not significant.

| Approach / Language | eng-c | deu | esp | ned | eng-o | chi | ara | avg |
|---|---|---|---|---|---|---|---|---|
| Gold-BERT-all | 92.7 | 83.9 | 88.3 | 91.1 | 90.7 | 79.4 | 72.9 | 85.6 |
| Gold-SNS-BERT-all | 91.1 | 82.3 | 87.9 | 89.5 | 89.7 | 77.1 | 62.1 | 82.8 |
| Non-Native Speaker Scenario (NNS): Recall=50%, Precision=90% | | | | | | | | |
| Raw-BERT-all | 81.9 | 69.1 | 71.2 | 70.1 | 68.0 | 61.9 | 52.8 | 67.9 |
| Raw+CD-BERT-all | 86.3 | 78.4 | 79.9 | 77.2 | 80.9 | 64.9 | 60.1 | 75.4 |
| CBL-LSTM-all | 79.2 | 38.4 | 54.6 | 48.2 | 67.9 | 53.5 | 39.4 | 54.5 |
| CBL-BERT-all | 84.8 | 77.5 | 78.7 | 75.3 | 76.3 | 68.9 | 61.9 | 74.8 |
| SNS-BERT-all | 86.0 | 77.0 | 80.8 | 77.9 | 81.5 | 66.4 | 56.0 | 75.1 |
| EER-BERT-all | 88.0 | 77.3 | 80.9 | 76.9 | 84.5 | 66.6 | 56.6 | 75.8 |
| Exploratory Expert Scenario (EE): 1,000 Annotations | | | | | | | | |
| Raw-BERT-all | 0.4 | 2.6 | 0.7 | 0.0 | 0.4 | 2.4 | 5.3 | 1.7 |
| Raw-BERT-short | 44.1 | 37.2 | 44.4 | 0.0 | 28.4 | 32.4 | 15.4 | 28.8 |
| Raw-BERT-shortest | 80.7 | 65.4 | 73.0 | 69.1 | 67.5 | 57.1 | 42.0 | 65.0 |
| Raw+CD-BERT-shortest | 82.4 | 67.9 | 76.6 | 70.0 | 68.9 | 58.3 | 43.9 | 66.9 |
| CBL-LSTM-all | 60.2 | 27.5 | 41.2 | 33.3 | 23.1 | 29.9 | 15.3 | 32.9 |
| CBL-LSTM-shortest | 67.8 | 20.1 | 36.2 | 26.7 | 42.0 | 24.6 | 9.7 | 32.4 |
| CBL-BERT-all | 36.4 | 52.8 | 40.9 | 52.5 | 22.4 | 29.3 | 20.8 | 36.4 |
| CBL-BERT-short | 43.7 | 64.7 | 56.4 | 60.8 | 16.0 | 31.2 | 30.2 | 43.3 |
| CBL-BERT-shortest | 80.6 | 65.1 | 74.7 | 71.2 | 28.4 | 53.6 | 39.2 | 59.0 |
| SNS-BERT-all | 59.5 | 63.8 | 70.8 | 70.3 | 14.0 | 28.8 | 0.0 | 43.9 |
| SNS-BERT-short | 64.4 | 62.6 | 70.8 | 64.1 | 40.7 | 46.4 | 0.0 | 49.9 |
| SNS-BERT-shortest | 83.9 | 70.1 | 76.8 | 77.1 | 75.6 | 63.3 | 40.7 | 69.6 |
| EER-BERT-all | 86.3 | 73.2 | 80.2 | 80.2 | 61.2 | 56.2 | 42.9 | 68.6 |
| EER-BERT-short | 89.0 | 72.2 | 76.5 | 80.3 | 75.9 | 61.4 | 46.8 | 71.7 |
| EER-BERT-shortest | 87.3 | 73.6 | 76.5 | 74.2 | 74.0 | 64.3 | 42.1 | 70.3 |
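
The avg column appears to be the unweighted mean of the seven per-language F1 scores, rounded to one decimal. The source does not state this explicitly, so the following is only a minimal sketch that checks the assumption against three rows copied from Table 1. The helper name `macro_average` is hypothetical and used for illustration only.

```python
# Rows copied from Table 1. Scores are in column order:
# eng-c, deu, esp, ned, eng-o, chi, ara. The second element is the reported avg.
rows = {
    "Gold-BERT-all": ([92.7, 83.9, 88.3, 91.1, 90.7, 79.4, 72.9], 85.6),
    "SNS-BERT-shortest (EE)": ([83.9, 70.1, 76.8, 77.1, 75.6, 63.3, 40.7], 69.6),
    "EER-BERT-short (EE)": ([89.0, 72.2, 76.5, 80.3, 75.9, 61.4, 46.8], 71.7),
}


def macro_average(scores):
    """Unweighted mean over languages, rounded to one decimal (an assumption)."""
    return round(sum(scores) / len(scores), 1)


for name, (scores, reported) in rows.items():
    # Print the recomputed mean next to the value reported in the table.
    print(f"{name}: recomputed={macro_average(scores)} reported={reported}")
```

Because the mean is unweighted, each language counts equally regardless of the size of its test set, so a single weak language such as ara pulls the average down noticeably.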