Table 3: MultiWOZ 2.1 test set results. TRADE (Wu et al., 2019) results are from the public implementation. "Joint Goal" (Budzianowski et al., 2018) is average dialogue state exact-match, "Dialogue" is average dialogue-level exact-match, and "Prefix" is the average number of turns before an incorrect prediction. Within each column, the best result is boldfaced, along with all results that are not significantly worse (p < 0.05, paired permutation test). Moreover, all of "Dataflow," "inline refer," and "inline both" have higher dialogue accuracy than TRADE (p < 0.005).
                Joint Goal   Dialogue   Prefix
Dataflow           .467        .220      3.07
inline refer       .447        .202      2.97
inline both        .467        .205      2.90
TRADE              .454        .168      2.73
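To make the three metrics concrete, the sketch below shows one way they could be computed from per-dialogue gold and predicted states, together with a simple paired permutation test of the kind referenced in the caption. The data representation (dialogue states as sets of slot-value pairs), the function names, and the test implementation are illustrative assumptions, not the evaluation code used for the table.

    # Minimal sketch of the Table 3 metrics, assuming each dialogue is a pair
    # (gold_states, pred_states) of per-turn dialogue states, each state a
    # frozenset of slot-value pairs. Not the authors' evaluation code.
    import random
    from typing import Sequence

    def turn_metrics(gold: Sequence[frozenset], pred: Sequence[frozenset]):
        """Per-dialogue: turn-level matches, dialogue-level match, prefix length."""
        matches = [g == p for g, p in zip(gold, pred)]
        dialogue_match = all(matches)
        # "Prefix": number of turns before the first incorrect prediction.
        prefix = next((i for i, ok in enumerate(matches) if not ok), len(matches))
        return matches, dialogue_match, prefix

    def corpus_metrics(dialogues):
        """dialogues: list of (gold_states, pred_states) pairs, one per dialogue."""
        turn_hits, dial_hits, prefixes = [], [], []
        for gold, pred in dialogues:
            matches, dial_ok, prefix = turn_metrics(gold, pred)
            turn_hits.extend(matches)
            dial_hits.append(dial_ok)
            prefixes.append(prefix)
        joint_goal = sum(turn_hits) / len(turn_hits)    # per-turn exact match
        dialogue_acc = sum(dial_hits) / len(dial_hits)  # dialogue-level exact match
        mean_prefix = sum(prefixes) / len(prefixes)     # avg turns before first error
        return joint_goal, dialogue_acc, mean_prefix

    def paired_permutation_test(scores_a, scores_b, trials=10_000, seed=0):
        """Two-sided paired permutation test on per-item scores (e.g. dialogue match)."""
        rng = random.Random(seed)
        diffs = [a - b for a, b in zip(scores_a, scores_b)]
        observed = abs(sum(diffs))
        hits = 0
        for _ in range(trials):
            # Randomly flip the sign of each paired difference.
            permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
            if abs(permuted) >= observed:
                hits += 1
        return hits / trials  # approximate p-value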