Simulation setup. (a) Reward contingencies. The illustration represents the chosen (orange) and unchosen (blue) bandits, each with a feedback signal (central number). Below, we state the range of possible outcomes and probabilities. (b) Learning periods. The illustration represents the different lengths of the learning periods and the different outcome combinations potentially received by the agents. (c) Volatility types. The line plots represent the evolution of the two arms' probability across trials in the different volatility conditions.