Evaluation of the learned POMDP policies by mean number of dialogue turns, mean accumulated discounted reward in simulation, and accuracy relative to human annotations, compared with baseline policies that either always choose a single indicated action (ASK, ENCG, REPEAT) or choose actions uniformly at random (RAND).
| Group | Metric | POMDP | ASK | ENCG | REPEAT | RAND |
|---|---|---|---|---|---|---|
| Dementia | Dialogue turns | 4.23 (2.22) | 5.12 (4.04) | 11.11 (8.61) | 7.29 (6.89) | 8.69 (7.20) |
| Dementia | Accumulated rewards | 51.75 (8.59) | 45.27 (13.24) | 30.23 (11.99) | 42.63 (9.23) | 31.59 (11.53) |
| Dementia | Accuracy | 91.4% | 32.5% | 36.3% | 27.8% | 86.2% |
| Control | Dialogue turns | 2.94 (1.53) | 2.95 (1.82) | 6.53 (4.55) | 3.20 (1.95) | 4.16 (2.87) |
| Control | Accumulated rewards | 57.64 (8.30) | 49.70 (10.47) | 45.15 (7.14) | 36.16 (11.39) | 38.47 (11.12) |
| Control | Accuracy | 96.1% | 25.6% | 19.5% | 14.8% | 84.1% |
| Combined | Dialogue turns | 4.03 (2.47) | 3.73 (2.53) | 9.52 (7.05) | 6.28 (6.24) | 6.17 (4.76) |
| Combined | Accumulated rewards | 50.86 (8.99) | 49.34 (10.11) | 33.30 (10.79) | 42.09 (10.92) | 34.42 (11.92) |
| Combined | Accuracy | 94.1% | 30.8% | 32.0% | 24.4% | 86.7% |
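The quantities in the caption can be illustrated with a minimal sketch. It assumes a discount factor `gamma` (0.95 is a placeholder, not a value from the source), treats accuracy as per-turn agreement between the policy's chosen action and the human-annotated action, and reads the parenthesized table values as sample standard deviations. The action set and all helper names (`fixed_action_policy`, `random_policy`, `discounted_return`, `accuracy`, `summarize`) are hypothetical illustrations, not the authors' code.

```python
import random
import statistics

# Hypothetical action set: reuses the table's baseline labels as action names.
ACTIONS = ["ASK", "ENCG", "REPEAT"]


def fixed_action_policy(action):
    """Baseline that ignores the belief state and always returns one action."""
    return lambda belief: action


def random_policy(rng):
    """Baseline that picks an action uniformly at random on every turn."""
    return lambda belief: rng.choice(ACTIONS)


def discounted_return(rewards, gamma=0.95):
    """Accumulated discounted reward sum_t gamma**t * r_t for one dialogue.

    gamma=0.95 is an assumed placeholder, not taken from the source.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def accuracy(predicted, annotated):
    """Fraction of turns where the chosen action matches the human label."""
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == a for p, a in zip(predicted, annotated))
    return matches / len(predicted)


def summarize(values):
    """Format per-dialogue values as 'mean (SD)' using the sample SD.

    Requires at least two values, since statistics.stdev does.
    """
    return f"{statistics.mean(values):.2f} ({statistics.stdev(values):.2f})"


if __name__ == "__main__":
    rng = random.Random(0)
    policy = random_policy(rng)
    chosen = [policy(None) for _ in range(4)]
    print(chosen, accuracy(chosen, ["ASK", "ASK", "ENCG", "REPEAT"]))
    print(summarize([discounted_return([10, 10, 20]), discounted_return([5, 30])]))
```

A fixed-action baseline such as `fixed_action_policy("ASK")` returns the same action on every turn, which is why such policies can only agree with human annotations on the turns where that action happens to be the labeled one.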