Table 9: Performance gap for supervised TOD systems when trained in different evaluation settings.

| Training: Q. Position Predictor | Training: Query Predictor | DSTC2 Ent. F1 (KB) | DSTC2 Ent. F1 (All) | DSTC2 BLEU | CamRest Ent. F1 (KB) | CamRest Ent. F1 (All) | CamRest BLEU |
|---|---|---|---|---|---|---|---|
| Predicted | Predicted | 0.24±0.02 | 0.41±0.02 | 48.35±1.58 | 0.29±0.03 | 0.41±0.02 | 14.68±0.85 |
| Oracle | Predicted | 0.32±0.04 | 0.40±0.02 | 48.52±1.31 | 0.32±0.02 | 0.40±0.02 | 14.17±0.70 |
| Predicted | Oracle | 0.32±0.03 | 0.41±0.03 | 48.94±1.80 | 0.37±0.04 | 0.44±0.03 | 14.63±0.82 |
| Oracle | Oracle | 0.38±0.03 | 0.41±0.02 | 49.79±1.80 | 0.39±0.04 | 0.45±0.03 | 14.84±0.94 |