Performance gap for supervised TOD systems when trained in different evaluation settings.
Training: Q. Position Predictor | Training: Query Predictor | DSTC2 Ent. F1 (KB) | DSTC2 Ent. F1 (All) | DSTC2 BLEU | CamRest Ent. F1 (KB) | CamRest Ent. F1 (All) | CamRest BLEU |
---|---|---|---|---|---|---|---|
Predicted | Predicted | 0.24±0.02 | 0.41±0.02 | 48.35±1.58 | 0.29±0.03 | 0.41±0.02 | 14.68±0.85 |
Oracle | Predicted | 0.32±0.04 | 0.40±0.02 | 48.52±1.31 | 0.32±0.02 | 0.40±0.02 | 14.17±0.70 |
Predicted | Oracle | 0.32±0.03 | 0.41±0.03 | 48.94±1.80 | 0.37±0.04 | 0.44±0.03 | 14.63±0.82 |
Oracle | Oracle | 0.38±0.03 | 0.41±0.02 | 49.79±1.80 | 0.39±0.04 | 0.45±0.03 | 14.84±0.94 |