Method | Dev | Test |
---|---|---|
Random | 32.8 | 33.4 |
Word Matching (WM) (Yih et al., 2013) | 41.7 | 42.0 |
Sliding Window (SW) (Richardson et al., 2013) | 42.6 | 42.5 |
Distance-Based Sliding Window (DSW) (Richardson et al., 2013) | 44.4 | 44.6 |
Stanford Attentive Reader (SAR) (Chen et al., 2016) | 40.2 | 39.8 |
Gated-Attention Reader (GAR) (Dhingra et al., 2017) | 40.5 | 41.3 |
Co-Matching (CO) (Wang et al., 2018b) | 45.6 | 45.5 |
Finetuned Transformer LM (FTLM) (Radford et al., 2018) | 55.9 | 55.5 |
Our Approaches: | | |
DSW++ (DSW w/ Dialogue Structure and ConceptNet Embedding) | 51.4 | 50.1 |
GBDT++ (GBDT w/ Features of Dialogue Structure and General World Knowledge) | 53.3 | 52.8 |
FTLM++ (FTLM w/ Speaker Embedding) | 57.6 | 57.4 |
Ensemble of 3 FTLM++ | 58.1 | 58.2 |
Ensemble of 1 GBDT++ and 3 FTLM++ | 59.6 | 59.5 |
Human Performance | 93.9★ | 95.5★ |
Ceiling Performance | 98.7★ | 98.6★ |