Table 2: Reward Shaping: Average Score and Number of Moves across 100 Episodes for 100 Agents.

Rewards               Average Score (Average Number of Moves)
G      H      F       Q-Learning* (ε=0.1)   Bayesian RL      Active Inference
0.00   0.00   0.00     0.00 (15.00)         39.94  (9.17)    44.00  (8.67)
0.00   −100   0.00     0.00 (15.00)          0.00 (15.00)     0.00 (15.00)
100    −100   0.00    95.56  (3.53)         99.77  (3.02)    99.52  (3.03)
100    0.00   −10.0   96.00  (3.48)         99.89  (3.00)    99.47  (3.00)
100    −100   −10.0   96.47  (3.42)         99.79  (3.01)    99.58  (3.00)
100    0.00   0.00    95.32  (3.58)         99.74  (3.00)    99.50  (3.07)

Note: *For this experiment, the Q-learning agent is evaluated with ε = 0.0 (i.e., on-policy).
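To make the reward-shaping setup concrete, the following is a minimal sketch of a tabular Q-learning agent on a 4 × 4 FrozenLake-style grid with configurable rewards for the goal (G), hole (H), and frozen (F) tiles, trained with ε = 0.1 and evaluated greedily (ε = 0.0). The grid layout, hyperparameters, episode counts, the 15-move cap, and the treatment of F as a per-step reward are assumptions made for illustration; this is not the authors' implementation, and the Bayesian RL and active inference agents are not shown.

```python
# Illustrative sketch only: tabular Q-learning with (G, H, F) reward shaping
# on a hypothetical 4x4 FrozenLake-style grid. Layout and hyperparameters
# are assumptions, not taken from the paper.
import numpy as np

LAYOUT = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]  # S = start, F = frozen, H = hole, G = goal (assumed map)
N_ROWS, N_COLS = 4, 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, r_goal, r_hole, r_frozen):
    """Deterministic transition; returns (next_state, reward, done)."""
    r, c = divmod(state, N_COLS)
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), N_ROWS - 1)
    c = min(max(c + dc, 0), N_COLS - 1)
    nxt = r * N_COLS + c
    tile = LAYOUT[r][c]
    if tile == "G":
        return nxt, r_goal, True
    if tile == "H":
        return nxt, r_hole, True
    return nxt, r_frozen, False


def train_q_learning(r_goal, r_hole, r_frozen, episodes=5000,
                     alpha=0.1, gamma=0.99, eps=0.1, max_moves=15):
    """Train with epsilon-greedy exploration (eps = 0.1, as in the table header)."""
    rng = np.random.default_rng(0)
    Q = np.zeros((N_ROWS * N_COLS, len(ACTIONS)))
    for _ in range(episodes):
        s, done, moves = 0, False, 0
        while not done and moves < max_moves:
            a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
            s2, rwd, done = step(s, a, r_goal, r_hole, r_frozen)
            Q[s, a] += alpha * (rwd + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
            s, moves = s2, moves + 1
    return Q


def evaluate(Q, max_moves=15):
    """Greedy rollout (eps = 0.0); score 100 if the goal is reached."""
    s = 0
    for moves in range(1, max_moves + 1):
        s, _, done = step(s, int(Q[s].argmax()), 0.0, 0.0, 0.0)
        if done:
            r, c = divmod(s, N_COLS)
            return (100.0 if LAYOUT[r][c] == "G" else 0.0), moves
    return 0.0, max_moves


if __name__ == "__main__":
    # One reward configuration from Table 2: G = 100, H = -100, F = 0.
    Q = train_q_learning(r_goal=100.0, r_hole=-100.0, r_frozen=0.0)
    print(evaluate(Q))  # (score, number of moves) for a single greedy episode
```

In the table, scores are averaged over 100 episodes for each of 100 independently trained agents; the sketch above runs a single agent and a single greedy episode to keep the example short.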
