Table 7

Evaluation of learned POMDP policies by mean number of dialogue turns, mean accumulated discounted reward in simulation, and accuracy relative to human annotations, compared against baseline policies that either always choose the indicated action (ASK, ENCG, REPEAT) or choose actions uniformly at random (RAND).

                      POMDP           ASK             ENCG            REPEAT          RAND
Dialogue turns        4.23 (2.22)     5.12 (4.04)     11.11 (8.61)    7.29 (6.89)     8.69 (7.20)
Accumulated rewards   51.75 (8.59)    45.27 (13.24)   30.23 (11.99)   42.63 (9.23)    31.59 (11.53)
Accuracy              91.4%           32.5%           36.3%           27.8%           86.2%

Control               POMDP           ASK             ENCG            REPEAT          RAND
Dialogue turns        2.94 (1.53)     2.95 (1.82)     6.53 (4.55)     3.20 (1.95)     4.16 (2.87)
Accumulated rewards   57.64 (8.30)    49.70 (10.47)   45.15 (7.14)    36.16 (11.39)   38.47 (11.12)
Accuracy              96.1%           25.6%           19.5%           14.8%           84.1%

Combined              POMDP           ASK             ENCG            REPEAT          RAND
Dialogue turns        4.03 (2.47)     3.73 (2.53)     9.52 (7.05)     6.28 (6.24)     6.17 (4.76)
Accumulated rewards   50.86 (8.99)    49.34 (10.11)   33.30 (10.79)   42.09 (10.92)   34.42 (11.92)
Accuracy              94.1%           30.8%           32.0%           24.4%           86.7%
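The "accumulated rewards" rows report the discounted return of each dialogue episode. As a minimal sketch of how such a figure is computed, the function below sums per-turn rewards weighted by powers of a discount factor; the discount factor and reward values here are illustrative assumptions, not the paper's actual parameters.

```python
def accumulated_discounted_reward(rewards, gamma=0.95):
    """Discounted return of one episode: sum of gamma**t * r_t over turns t.

    `rewards` lists the per-turn rewards of a single dialogue; `gamma` is an
    assumed discount factor (the paper's value is not given here).
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical episode: a small cost per dialogue turn, then a large
# terminal reward for a correct final action.
episode = [-1.0, -1.0, -1.0, 50.0]
print(accumulated_discounted_reward(episode))  # prints 40.01625
```

Because later rewards are down-weighted, shorter dialogues that reach the terminal reward sooner accumulate more discounted reward, which is why the POMDP policies' lower turn counts go hand in hand with their higher returns in the table.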