Figure 4:
Regret (a) and Sharpe ratio (b) under the Rényi bound. (a) Cumulative regret over the 4000 iterations for each agent optimizing a particular Rényi bound. The x-axis denotes the iteration and the y-axis the cumulative regret. (b) Average Sharpe ratio achieved by each agent over the 4000 iterations, for each Rényi bound. The x-axis denotes the iteration and the y-axis the Sharpe ratio. Blue denotes agents optimizing the Rényi bound for α→+∞, orange α=10, green α=2, red α→+1-, purple α=0.5, and brown α→0+. The dashed black line represents the regret under a random policy (i.e., selecting any arm). Each agent was simulated 20 times; 95% confidence intervals are shown. In our simulations, the agents with α→+1- and α=2 performed best.
