Table 8: Performance of the benchmark models. "Single turn" means the model is given the gold information for the last turn. Task finish rate is evaluated over 1,000 simulations per goal type. Note that "task finish" does not imply the task is successful, because the system may have provided wrong information. The results show that cross-multi-domain dialogues (CM and CM+T) are challenging for these tasks.
| Model | Metric | S | M | M+T | CM | CM+T | Overall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BERTNLU | Dialogue act F1 | 96.69 | 96.01 | 96.15 | 94.99 | 95.38 | 95.53 |
| BERTNLU (– context) | Dialogue act F1 | 94.55 | 93.05 | 93.70 | 90.66 | 90.82 | 91.85 |
| RuleDST | Joint state accuracy (single turn) | 84.17 | 78.17 | 81.93 | 63.38 | 67.86 | 71.33 |
| TRADE | Joint state accuracy | 71.67 | 45.29 | 37.98 | 30.77 | 25.65 | 36.08 |
| SL policy | Dialogue act F1 | 50.28 | 44.97 | 54.01 | 41.65 | 44.02 | 44.92 |
| SL policy | Dialogue act F1 (delex) | 67.96 | 67.35 | 73.94 | 62.27 | 66.29 | 66.02 |
| Simulator | Joint state accuracy (single turn) | 63.53 | 48.79 | 50.26 | 40.66 | 41.76 | 45.00 |
| Simulator | Dialogue act F1 (single turn) | 85.99 | 81.39 | 80.82 | 75.27 | 77.23 | 78.39 |
| DA Sim | Task finish rate | 76.5 | 49.4 | 33.7 | 17.2 | 15.7 | 34.6 |
| NL Sim (Template) | Task finish rate | 67.4 | 33.3 | 29.1 | 10.0 | 10.0 | 23.6 |
| NL Sim (SC-LSTM) | Task finish rate | 60.6 | 27.1 | 23.1 | 8.8 | 9.0 | 19.7 |
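Several rows above report dialogue act F1. As a hedged illustration of how such a score is typically computed, the sketch below evaluates micro-averaged F1 over sets of dialogue act tuples, one set per turn. The `(intent, domain, slot, value)` tuple shape and the example acts are our assumptions for illustration, not taken from the paper.

```python
# Sketch: micro-averaged dialogue act F1 over a corpus of turns.
# Each turn's acts are represented as a set of tuples; the exact
# tuple granularity here is an illustrative assumption.

def dialogue_act_f1(predictions, references):
    """Micro-F1 between predicted and gold dialogue act sets, per turn."""
    tp = fp = fn = 0
    for pred, gold in zip(predictions, references):
        pred, gold = set(pred), set(gold)
        tp += len(pred & gold)   # acts predicted and present in gold
        fp += len(pred - gold)   # acts predicted but absent from gold
        fn += len(gold - pred)   # gold acts the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a turn where the model predicts one correct act plus one spurious act yields precision 0.5 and recall 1.0, i.e. F1 ≈ 0.667 under this definition.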