MIT Press

Figure 4:

Regret (a) and Sharpe ratio (b) under the Rényi bound. (a) Cumulative regret across the 4000 iterations for each agent optimizing a particular Rényi bound. The $x$-axis denotes the iteration and the $y$-axis the accompanying cumulative regret. (b) Average Sharpe ratio achieved by an agent across the 4000 iterations, for each Rényi bound. The $x$-axis denotes the iteration and the $y$-axis the Sharpe ratio. Blue denotes agents optimizing the Rényi bound for $\alpha \to +\infty$, orange for $\alpha = 10$, green for $\alpha = 2$, red for $\alpha \to 1^{-}$, purple for $\alpha = 0.5$, and brown for $\alpha \to 0^{+}$. The dashed black line represents regret under a random policy (i.e., selecting an arm at random). Each agent was simulated 20 times, with 95% confidence intervals. In our simulations, the agents with $\alpha \to 1^{-}$ and $\alpha = 2$ obtained the best performance.
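The quantities plotted in the figure can be illustrated with a minimal sketch. It is not the authors' code and rests on assumptions the caption does not state: regret at each step is taken as the gap between the best arm's expected reward and the chosen arm's expected reward, the Sharpe ratio is the mean return divided by its sample standard deviation (zero risk-free rate), and the 95% interval uses a normal approximation across runs. All function names and the three-arm reward values are hypothetical.

```python
import numpy as np


def cumulative_regret(expected_rewards, chosen_arms):
    """Running sum of (best expected reward - expected reward of chosen arm).

    Assumes the per-arm expected rewards are known, which holds only in simulation.
    """
    expected_rewards = np.asarray(expected_rewards, dtype=float)
    chosen_arms = np.asarray(chosen_arms, dtype=int)
    per_step = expected_rewards.max() - expected_rewards[chosen_arms]
    return np.cumsum(per_step)


def sharpe_ratio(returns):
    """Mean return over sample standard deviation (risk-free rate assumed zero)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1)


def mean_and_ci95(runs):
    """Per-iteration mean and 95% half-width across runs (normal approximation)."""
    runs = np.asarray(runs, dtype=float)
    mean = runs.mean(axis=0)
    sem = runs.std(axis=0, ddof=1) / np.sqrt(runs.shape[0])
    return mean, 1.96 * sem


if __name__ == "__main__":
    # Random-policy baseline: 20 runs of 4000 iterations on a hypothetical 3-armed bandit.
    rng = np.random.default_rng(0)
    arm_means = [0.2, 0.5, 0.8]  # illustrative values, not taken from the paper
    runs = [
        cumulative_regret(arm_means, rng.integers(0, len(arm_means), size=4000))
        for _ in range(20)
    ]
    mean_curve, half_width = mean_and_ci95(runs)
    print(mean_curve[-1], half_width[-1])
```

Under these assumptions a random policy accumulates regret roughly linearly in the number of iterations, which is why it serves as a natural reference line for the learning agents.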
