End-to-end response generation results on MultiWOZ 2.0. ✓ and ✗ denote whether dialogue act annotations are used during training. Model sizes are listed for the Transformer-based end-to-end TOD models. Note that we directly use the UBAR result reported by Liu et al. According to the released UBAR code, the authors did not use the standard evaluation metric, so a direct comparison with other methods would be unfair. We also ran their code with the released model checkpoint and obtained a combined score even lower than the result reported by Liu et al. The improvements of OPAL over BART, each used to initialize the TOD model, are statistically significant (p < 0.01).
Model | Model Size | Dialogue Act | Inform | Success | BLEU | Combined |
---|---|---|---|---|---|---|
Sequicity (Lei et al., 2018) | – | ✗ | 66.40 | 45.30 | 15.54 | 71.39 |
HRED-TS (Peng et al., 2019) | – | ✓ | 70.00 | 58.00 | 17.50 | 81.50 |
DSTC8 Winner (Ham et al., 2020) | 124M | ✓ | 73.00 | 62.40 | 16.00 | 83.50 |
DAMD (Zhang et al., 2020b) | – | ✓ | 76.40 | 60.40 | 16.60 | 85.00 |
SimpleTOD (Hosseini-Asl et al., 2020) | 117M | ✓ | 84.40 | 70.10 | 15.01 | 92.26 |
SOLOIST (Peng et al., 2020a) | 117M | ✗ | 85.50 | 72.90 | 16.54 | 95.74 |
MinTL-BART (Lin et al., 2020) | 406M | ✗ | 84.88 | 74.91 | 17.89 | 97.78 |
UBAR (Yang et al., 2021) | 82M | ✗ | 88.20 | 79.50 | 16.43 | 100.28 |
NCMB (Liu et al., 2021) | 116M | ✓ | 85.90 | 74.80 | 19.76 | 100.11 |
NCML (Liu et al., 2021) | 292M | ✓ | 86.90 | 76.20 | 20.58 | 102.13 |
HTER (Santra et al., 2021) | – | ✓ | 91.72 | 75.80 | 19.05 | 102.81 |
BART | 139M | ✗ | 87.50 | 72.20 | 16.67 | 96.53 |
OPAL | 139M | ✗ | 89.40 | 81.10 | 18.60 | 103.85 |
BART-L | 406M | ✗ | 86.20 | 70.30 | 17.01 | 95.26 |
OPAL-L | 406M | ✗ | 88.00 | 82.80 | 20.80 | 106.20 |
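The Combined column follows the standard MultiWOZ convention, Combined = (Inform + Success) / 2 + BLEU. As a minimal sanity check, the sketch below recomputes it for a few rows of the table. The function name `combined_score` is hypothetical and only illustrates the arithmetic; it is not taken from any released evaluation code.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Standard MultiWOZ combined score: (Inform + Success) / 2 + BLEU."""
    return 0.5 * (inform + success) + bleu


# A few rows copied from the table: (inform, success, bleu, reported combined).
rows = {
    "Sequicity": (66.40, 45.30, 15.54, 71.39),
    "UBAR": (88.20, 79.50, 16.43, 100.28),
    "OPAL": (89.40, 81.10, 18.60, 103.85),
    "OPAL-L": (88.00, 82.80, 20.80, 106.20),
}

for name, (inform, success, bleu, reported) in rows.items():
    score = combined_score(inform, success, bleu)
    print(f"{name:10s} computed={score:.2f} reported={reported:.2f}")
```

Small mismatches in the last digit for other rows can arise when the published Inform, Success, and BLEU values were themselves rounded before the combined score was reported.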