Table 6: Performance of baselines in accuracy (%) on the C3 dataset (*: based on the annotated subset of the test and development sets of C3).
| Method | C3_M Dev | C3_M Test | C3_D Dev | C3_D Test | C3 Dev | C3 Test |
| --- | --- | --- | --- | --- | --- | --- |
| Random | 27.8 | 27.8 | 26.4 | 26.6 | 27.1 | 27.2 |
| Distance-Based Sliding Window (Richardson et al., 2013) | 47.9 | 45.8 | 39.6 | 40.4 | 43.8 | 43.1 |
| Co-Matching (Wang et al., 2018) | 47.0 | 48.2 | 55.5 | 51.4 | 51.0 | 49.8 |
| BERT (Devlin et al., 2019) | 65.6 | 64.6 | 65.9 | 64.4 | 65.7 | 64.5 |
| ERNIE (Sun et al., 2019b) | 63.7 | 63.6 | 67.3 | 64.6 | 65.5 | 64.1 |
| BERT-wwm (Cui et al., 2019) | 66.1 | 64.0 | 64.8 | 65.0 | 65.5 | 64.5 |
| BERT-wwm-ext (Cui et al., 2019) | 67.9 | 68.0 | 67.7 | 68.9 | 67.8 | 68.5 |
| Human Performance* | 96.0 | 93.3 | 98.0 | 98.7 | 97.0 | 96.0 |