Table 4: Test results for the different methods on the modified-bAbI dialog task. Each entry gives the mean, with the standard deviation in parentheses, over runs of the method on 5 different permutations of the test set. User Accuracy: task success rate for the user; Model ratio: percentage of the time the classifier chooses model M; Final Model Accuracy: accuracy of model M at the end of testing.
Method                 User Accuracy                 Model ratio    Final Model Accuracy
                       Per-turn       Per-dialog                    Per-turn       Per-dialog
Baseline method (M)    81.73 (0)      3.7 (0)        100.0 (0)      81.73 (0)      3.7 (0)

R: 1, 2, -4
M + C*                 92.85 (1.58)   33.48 (10.59)  51.97 (8.22)   81.73 (0)      3.7 (0)
M* + C*                96.28 (1.16)   54.5 (10.72)   64.06 (4.65)   90.83 (0.82)   14.82 (3.7)
Ma* + C*               96.19 (1.21)   54.44 (11.40)  61.14 (6.9)    88.98 (0.34)   10.26 (1.39)

R: 1, 3, -3
M + C*                 91.31 (1.15)   26.50 (7.57)   58.82 (4.62)   81.73 (0)      3.7 (0)
M* + C*                94.67 (1.20)   43.48 (8.80)   70.33 (2.13)   89.27 (0.74)   12.84 (2.22)
Ma* + C*               94.08 (1.0)    38.8 (8.15)    69.69 (6.14)   88.75 (0.91)   11.62 (2.61)
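The caption does not spell out how the per-turn and per-dialog figures or the mean (std) entries are computed. The following is a minimal Python sketch under stated assumptions, not the authors' code: it assumes the usual bAbI dialog convention that a dialog counts as a success only when every turn in it is correct, and it uses the population standard deviation across runs (the source does not say which estimator was used). The `Turn` record and all function names are hypothetical.

```python
"""Hypothetical sketch of the Table 4 metrics (assumptions, not the paper's code)."""
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Turn:
    correct: bool     # assumed: the response shown to the user was correct
    used_model: bool  # assumed: the classifier routed this turn to model M


def per_turn_accuracy(dialogs: list[list[Turn]]) -> float:
    # Percentage of all turns, pooled across dialogs, that were answered correctly.
    turns = [t for d in dialogs for t in d]
    return 100.0 * sum(t.correct for t in turns) / len(turns)


def per_dialog_accuracy(dialogs: list[list[Turn]]) -> float:
    # Assumed convention: a dialog succeeds only if every one of its turns is correct.
    return 100.0 * sum(all(t.correct for t in d) for d in dialogs) / len(dialogs)


def model_ratio(dialogs: list[list[Turn]]) -> float:
    # Percentage of turns for which the classifier chose model M.
    turns = [t for d in dialogs for t in d]
    return 100.0 * sum(t.used_model for t in turns) / len(turns)


def summarize(values: list[float]) -> str:
    # Format as "mean(std)" over runs, e.g. one value per test-set permutation.
    # pstdev (population std) is an assumption; the source does not specify.
    return f"{mean(values):.2f}({pstdev(values):.2f})"


if __name__ == "__main__":
    toy = [
        [Turn(True, True), Turn(True, False)],   # fully correct dialog
        [Turn(True, True), Turn(False, True)],   # one wrong turn, so the dialog fails
    ]
    print(per_turn_accuracy(toy), per_dialog_accuracy(toy), model_ratio(toy))
    print(summarize([1.0, 2.0, 3.0]))
```

On the toy data, 3 of 4 turns are correct but only 1 of 2 dialogs is fully correct, which illustrates why the per-dialog figures in Table 4 are much lower than the per-turn ones.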