Table 7 

Evaluation of learned POMDP policies by mean number of dialogue turns, mean accumulated discounted reward in simulation, and accuracy relative to human annotations, compared with baseline policies that either always choose the indicated action (ASK, ENCG, REPEAT) or choose actions uniformly at random (RAND).

| Group | Measure | POMDP | ASK | ENCG | REPEAT | RAND |
|---|---|---|---|---|---|---|
| Dementia | Dialogue turns | 4.23 (2.22) | 5.12 (4.04) | 11.11 (8.61) | 7.29 (6.89) | 8.69 (7.20) |
| Dementia | Accumulated rewards | 51.75 (8.59) | 45.27 (13.24) | 30.23 (11.99) | 42.63 (9.23) | 31.59 (11.53) |
| Dementia | Accuracy | 91.4% | 32.5% | 36.3% | 27.8% | 86.2% |
| Control | Dialogue turns | 2.94 (1.53) | 2.95 (1.82) | 6.53 (4.55) | 3.20 (1.95) | 4.16 (2.87) |
| Control | Accumulated rewards | 57.64 (8.30) | 49.70 (10.47) | 45.15 (7.14) | 36.16 (11.39) | 38.47 (11.12) |
| Control | Accuracy | 96.1% | 25.6% | 19.5% | 14.8% | 84.1% |
| Combined | Dialogue turns | 4.03 (2.47) | 3.73 (2.53) | 9.52 (7.05) | 6.28 (6.24) | 6.17 (4.76) |
| Combined | Accumulated rewards | 50.86 (8.99) | 49.34 (10.11) | 33.30 (10.79) | 42.09 (10.92) | 34.42 (11.92) |
| Combined | Accuracy | 94.1% | 30.8% | 32.0% | 24.4% | 86.7% |