We considered four ways in which the reward probabilities $p_i$ are set, illustrated schematically in Figure 1c. First, we considered stable environments in which the reward probabilities were constant. We also considered 1-reversal and 3-reversal conditions, in which the payout probabilities were reversed to $1-p_i$ either once in the middle of the task (second display in Figure 1c) or three times at equal intervals (third display in Figure 1c). In the stable, 1-reversal, and 3-reversal conditions, the initial probabilities $p_i$ at the start of the task were sampled at intervals of 0.1 in the range [0.05, 0.95] such that $p_1 \neq p_2$, and we tested all possible combinations of these probabilities (45 probability pairs). Unless otherwise noted, results are averaged across these initial probabilities.
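The sampling and reversal scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the trial count and function names are assumptions.

```python
import itertools

# Candidate initial reward probabilities, spaced 0.1 apart in [0.05, 0.95].
levels = [round(0.05 + 0.1 * k, 2) for k in range(10)]

# All unordered pairs with p1 != p2 -> 45 probability pairs.
pairs = list(itertools.combinations(levels, 2))
assert len(pairs) == 45

def schedule(p, n_trials, n_reversals):
    """Per-trial reward probability for one option when the probability
    flips to 1 - p at n_reversals equally spaced points in the task."""
    probs = []
    for t in range(n_trials):
        # Count how many reversal points trial t has passed.
        k = sum(t >= (r + 1) * n_trials // (n_reversals + 1)
                for r in range(n_reversals))
        probs.append(p if k % 2 == 0 else 1 - p)
    return probs
```

With `n_reversals=1` the flip occurs at the task midpoint; with `n_reversals=3` it occurs at the quarter, half, and three-quarter marks, matching the equal-interval description.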

Table 1:

| Model | Chosen option $i$, $\delta_t^i>0$ | Chosen option $i$, $\delta_t^i<0$ | Unchosen option $j \neq i$, $\delta_t^j>0$ | Unchosen option $j \neq i$, $\delta_t^j<0$ |
|---|---|---|---|---|
| Confirmation model | **$\alpha_C$** | $\alpha_D$ | $\alpha_D$ | **$\alpha_C$** |
| Valence model | **$\alpha_+$** | $\alpha_-$ | **$\alpha_+$** | $\alpha_-$ |
| Hybrid model | **$\alpha_+$** | $\alpha_-$ | $\alpha_=$ | $\alpha_=$ |
| Partial feedback | **$\alpha_+$** | $\alpha_-$ | — | — |

Note: To make the table easier to read, $\alpha_C$ and $\alpha_+$ are highlighted in bold.
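Table 1's learning-rate assignments can be read as a standard delta-rule update in which the learning rate depends on the sign of the prediction error and on whether the option was chosen. Below is a minimal sketch for the confirmation model only, assuming full feedback (an outcome observed for both options) and a two-option task; the function name and interface are illustrative, not the authors' implementation.

```python
def update_q(q, chosen, reward, alpha_c, alpha_d):
    """One confirmation-model update (cf. Table 1): positive prediction
    errors for the chosen option and negative prediction errors for the
    unchosen option are scaled by alpha_C (confirmatory); the opposite
    sign/option combinations are scaled by alpha_D (disconfirmatory).

    q      : list of two value estimates
    chosen : index (0 or 1) of the chosen option
    reward : observed outcome for each option (full feedback)
    """
    q = list(q)  # avoid mutating the caller's values
    for i in (0, 1):
        delta = reward[i] - q[i]
        if i == chosen:
            alpha = alpha_c if delta > 0 else alpha_d
        else:
            alpha = alpha_d if delta > 0 else alpha_c
        q[i] += alpha * delta
    return q
```

The valence and hybrid models differ only in which learning rate each of the four cells receives; the partial-feedback model simply skips the update for the unchosen option.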
