We considered four ways in which the reward probabilities $p_i$ are set, illustrated schematically in Figure 1c. First, we considered stable environments in which the reward probabilities were constant. We also considered 1-reversal and 3-reversal conditions, in which the payout probabilities were reversed to $1-p_i$ once in the middle of the task (second display in Figure 1c) or three times at equal intervals (third display in Figure 1c). In the stable, 1-reversal, and 3-reversal conditions, the initial probabilities $p_i$ at the start of the task were sampled at intervals of 0.1 in the range [0.05, 0.95] such that $p_1 \neq p_2$, and we tested all possible combinations of these probabilities (45 probability pairs). Unless otherwise noted, results are averaged across these initial probabilities.
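The sampling and reversal scheme above can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the original code):

```python
# Sketch of the environment setup: 10 candidate probabilities at
# intervals of 0.1 in [0.05, 0.95], all 45 unordered pairs with
# p1 != p2, and per-trial schedules with reversals to 1 - p.
import itertools

probs = [round(0.05 + 0.1 * k, 2) for k in range(10)]  # 0.05, 0.15, ..., 0.95
pairs = list(itertools.combinations(probs, 2))         # 45 pairs, p1 != p2

def schedule(p1, p2, n_trials, n_reversals):
    """Per-trial reward probabilities (p1, p2). Reversals flip each
    probability p to 1 - p at equal intervals; n_reversals = 0 gives
    the stable condition."""
    block = n_trials // (n_reversals + 1)
    out = []
    for t in range(n_trials):
        if (t // block) % 2 == 1:       # odd blocks are reversed
            out.append((1 - p1, 1 - p2))
        else:
            out.append((p1, p2))
    return out
```

For example, `schedule(0.05, 0.95, 100, 1)` yields $(0.05, 0.95)$ for the first 50 trials and the reversed pair for the last 50.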

Table 1:

Learning Rates in the Confirmation and Alternative Models.

| Model | Chosen option $i$, $\delta_t^i > 0$ | Chosen option $i$, $\delta_t^i < 0$ | Unchosen option $j \neq i$, $\delta_t^j > 0$ | Unchosen option $j \neq i$, $\delta_t^j < 0$ |
|---|---|---|---|---|
| Confirmation model | **$\alpha_C$** | $\alpha_D$ | $\alpha_D$ | **$\alpha_C$** |
| Valence model | **$\alpha_+$** | $\alpha_-$ | **$\alpha_+$** | $\alpha_-$ |
| Hybrid model | **$\alpha_+$** | $\alpha_-$ | $\alpha_=$ | $\alpha_=$ |
| Partial feedback | **$\alpha_+$** | $\alpha_-$ | — | — |

Note: To make the table easier to read, $\alpha_C$ and $\alpha_+$ are highlighted in bold.
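The learning-rate scheme of the confirmation model row in Table 1 can be expressed as a single value update (a minimal sketch under our own naming conventions, not the authors' implementation): for the chosen option, positive prediction errors are confirming and use $\alpha_C$; for the unchosen option, negative prediction errors are the confirming ones.

```python
# Confirmation-model Q-value update (Table 1, first row).
# Confirming errors (chosen & delta > 0, or unchosen & delta < 0)
# use alpha_C; disconfirming errors use alpha_D.
def confirmation_update(q, chosen, outcomes, alpha_C, alpha_D):
    """q: dict option -> value; outcomes: dict option -> observed reward.
    Returns the updated value dictionary."""
    new_q = dict(q)
    for opt, r in outcomes.items():
        delta = r - q[opt]                       # prediction error
        if opt == chosen:
            lr = alpha_C if delta > 0 else alpha_D
        else:
            lr = alpha_D if delta > 0 else alpha_C
        new_q[opt] = q[opt] + lr * delta
    return new_q
```

With $\alpha_C > \alpha_D$, confirming outcomes move values more than disconfirming ones; setting `lr = alpha_C if delta > 0 else alpha_D` for both options instead would recover the valence model row.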
