Table 2: End-to-end response generation results on MultiWOZ 2.0. ✓ and ✗ denote whether the dialogue act annotation is used in the training process. We list the model sizes of all Transformer-based end-to-end TOD models. Note that we directly use the UBAR result provided by Liu et al. According to the released UBAR code, its authors did not use the standard evaluation metric, so a direct comparison with other methods is unfair. We also ran their code with the released model checkpoint; its combined score is even worse than the result provided by Liu et al. The differences between the OPAL model and the BART model as the initialized TOD model are significant (p < 0.01).

| Model | Model Size | Dialogue Act | Inform | Success | BLEU | Combined |
|---|---|---|---|---|---|---|
| Sequicity (Lei et al., 2018) | – | ✗ | 66.40 | 45.30 | 15.54 | 71.39 |
| HRED-TS (Peng et al., 2019) | – | ✓ | 70.00 | 58.00 | 17.50 | 81.50 |
| DSTC8 Winner (Ham et al., 2020) | 124M | ✓ | 73.00 | 62.40 | 16.00 | 83.50 |
| DAMD (Zhang et al., 2020b) | – | ✓ | 76.40 | 60.40 | 16.60 | 85.00 |
| SimpleTOD (Hosseini-Asl et al., 2020) | 117M | ✓ | 84.40 | 70.10 | 15.01 | 92.26 |
| SOLOIST (Peng et al., 2020a) | 117M | ✗ | 85.50 | 72.90 | 16.54 | 95.74 |
| MinTL-BART (Lin et al., 2020) | 406M | ✗ | 84.88 | 74.91 | 17.89 | 97.78 |
| UBAR (Yang et al., 2021) | 82M | ✗ | 88.20 | 79.50 | 16.43 | 100.28 |
| NCM_B (Liu et al., 2021) | 116M | ✓ | 85.90 | 74.80 | 19.76 | 100.11 |
| NCM_L (Liu et al., 2021) | 292M | ✓ | 86.90 | 76.20 | 20.58 | 102.13 |
| HTER (Santra et al., 2021) | – | ✓ | 91.72 | 75.80 | 19.05 | 102.81 |
| BART | 139M | ✗ | 87.50 | 72.20 | 16.67 | 96.53 |
| OPAL | 139M | ✗ | 89.40 | 81.10 | 18.60 | 103.85 |
| BART_L | 406M | ✗ | 86.20 | 70.30 | 17.01 | 95.26 |
| OPAL_L | 406M | ✗ | 88.00 | 82.80 | 20.80 | 106.20 |
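
The caption does not define the Combined column. The reported values are consistent with the usual MultiWOZ convention, Combined = (Inform + Success) × 0.5 + BLEU, so the sketch below assumes that formula. The function name `combined_score` and the `rows` dictionary are hypothetical and introduced only for illustration; the numbers are copied from the table.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Assumed convention: mean of Inform and Success, plus BLEU."""
    return round((inform + success) * 0.5 + bleu, 2)


# (Inform, Success, BLEU, reported Combined), copied from Table 2.
rows = {
    "Sequicity": (66.40, 45.30, 15.54, 71.39),
    "UBAR": (88.20, 79.50, 16.43, 100.28),
    "OPAL": (89.40, 81.10, 18.60, 103.85),
    "OPAL_L": (88.00, 82.80, 20.80, 106.20),
}

for name, (inform, success, bleu, reported) in rows.items():
    computed = combined_score(inform, success, bleu)
    print(f"{name}: computed={computed:.2f} reported={reported:.2f}")
```

Not every row reproduces exactly under this assumption: BART gives 96.52 against a reported 96.53, possibly because the reported scores were computed from unrounded values, and the DSTC8 Winner row gives 83.70 against a reported 83.50. Treat the check as a sanity test rather than a definition of the metric.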