Table 8: Performance in accuracy (%) on the DREAM data set. Performance marked by ★ is reported based on 25% annotated questions from the development and test sets.
Method                                                                       Dev   Test
Random                                                                       32.8  33.4
Word Matching (WM) (Yih et al., 2013)                                        41.7  42.0
Sliding Window (SW) (Richardson et al., 2013)                                42.6  42.5
Distance-Based Sliding Window (DSW) (Richardson et al., 2013)                44.4  44.6
Stanford Attentive Reader (SAR) (Chen et al., 2016)                          40.2  39.8
Gated-Attention Reader (GAR) (Dhingra et al., 2017)                          40.5  41.3
Co-Matching (CO) (Wang et al., 2018b)                                        45.6  45.5
Finetuned Transformer LM (FTLM) (Radford et al., 2018)                       55.9  55.5

Our Approaches
DSW++ (DSW w/ Dialogue Structure and ConceptNet Embedding)                   51.4  50.1
GBDT++ (GBDT w/ Features of Dialogue Structure and General World Knowledge)  53.3  52.8
FTLM++ (FTLM w/ Speaker Embedding)                                           57.6  57.4
Ensemble of 3 FTLM++                                                         58.1  58.2
Ensemble of 1 GBDT++ and 3 FTLM++                                            59.6  59.5

Human Performance                                                            93.9  95.5
Ceiling Performance                                                          98.7  98.6