Table 1:
Average Reward (and 95% Confidence Interval) for Each Agent, across Both Deterministic and Stochastic Environments.
| Algorithm | Belief-Based | Deterministic Environment, Average Score [95% CI] | Stochastic Environment, Average Score [95% CI] |
|---|---|---|---|
| Q-Learning (ε=0.1) | | 97.79 [97.41, 98.16] | 66.08 [63.28, 68.88] |
| Q-Learning (ε=1 decaying to 0) | | 80.44 [78.96, 81.93] | 65.13 [62.57, 67.68] |
| Bayesian RL | | 99.76 [99.45, 100.00] | 64.39 [60.33, 68.44] |
| Active Inference | | 99.88 [99.64, 100.00] | 98.90 [98.00, 99.79] |
| Active Inference (null model) | | 50.03 [49.70, 50.35] | 50.22 [49.89, 50.22] |

Note: Results are calculated from 200 trials across 500 episodes.
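
The source does not state how the 95% confidence intervals were computed. As an illustration only, the sketch below shows one common way to produce a "mean [lower, upper]" entry from a list of per-trial scores, assuming a normal approximation (mean ± 1.96 standard errors). The function name `mean_ci95` and the example scores are hypothetical and are not taken from the table.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) for a normal-approximation 95% CI.

    Assumption: the interval is mean +/- 1.96 * standard error of the mean.
    The source may have used a different method (e.g. a t-interval).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical per-trial scores, used only to show the output format.
example_scores = [100.0, 98.0, 99.0, 100.0, 97.0]
mean, lower, upper = mean_ci95(example_scores)
print(f"{mean:.2f} [{lower:.2f}, {upper:.2f}]")
```

With 200 trials per agent, a normal approximation and a t-interval would give nearly identical bounds, so the choice between them matters little at this sample size.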
