We considered four ways in which the reward probabilities p_i were set, illustrated schematically in Figure 1c. First, we considered stable environments in which the reward probabilities were constant. We also considered 1-reversal and 3-reversals conditions, in which the payout probabilities were reversed to 1 − p_i once in the middle of the task (second display in Figure 1c) or three times at equal intervals (third display in Figure 1c). In the stable, 1-reversal, and 3-reversals conditions, the initial probabilities p_i at the start of the task were sampled at intervals of 0.1 in the range [0.05, 0.95] such that p_1 < p_2, and we tested all possible combinations of these probabilities (45 probability pairs). Unless otherwise noted, results are averaged across these initial probabilities.
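To make this setup concrete, the following is a minimal sketch in Python of how the initial probability pairs and the reversal schedules could be generated. The function names (initial_probability_pairs, reward_probabilities) and the assumption that reversals split the task into equal segments are ours; they illustrate the design described above rather than reproduce the authors' code.

```python
from itertools import combinations
import numpy as np

def initial_probability_pairs():
    """All 45 initial (p1, p2) pairs with p1 < p2, drawn from
    0.05, 0.15, ..., 0.95 (intervals of 0.1 in [0.05, 0.95])."""
    values = np.round(np.arange(0.05, 1.0, 0.1), 2)  # 10 candidate probabilities
    return list(combinations(values, 2))             # combinations preserve p1 < p2

def reward_probabilities(p, n_trials, n_reversals):
    """Trial-by-trial reward probability for one option: the schedule
    flips p -> 1 - p at n_reversals equally spaced reversal points."""
    schedule = np.full(n_trials, p)
    for k in range(1, n_reversals + 1):
        start = k * n_trials // (n_reversals + 1)
        schedule[start:] = 1.0 - schedule[start:]
    return schedule
```

With n_reversals = 1 this flips the probabilities once at the midpoint of the task; with n_reversals = 3 the task is split into four equal segments with alternating probabilities, matching the second and third displays in Figure 1c.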

Table 1:

Learning Rates in the Confirmation and Alternative Models.

                      Chosen option i            Unchosen option j ≠ i
Model                 δ_t^i > 0    δ_t^i < 0     δ_t^j > 0    δ_t^j < 0
Confirmation model    αC           αD            αD           αC
Valence model         α+           α−            α+           α−
Hybrid model          α+           α−            α=           α=
Partial feedback      α+           α−            —            —

Note: To make the table easier to read, αC and α+ are highlighted in bold.
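To make the update rules in Table 1 concrete, the following is a minimal sketch of one trial of the confirmation model (first row of the table), assuming a two-option bandit with full feedback and delta-rule value updates. The function and variable names (confirmation_update, q, rewards) are ours, introduced for illustration.

```python
import numpy as np

def confirmation_update(q, chosen, rewards, alpha_c, alpha_d):
    """One trial of the confirmation model (Table 1, first row).

    q       : array of value estimates for both options
    chosen  : index of the chosen option
    rewards : outcomes observed for both options (full feedback)
    alpha_c : learning rate for confirmatory prediction errors
    alpha_d : learning rate for disconfirmatory prediction errors
    """
    q = q.copy()
    for i, r in enumerate(rewards):
        delta = r - q[i]  # prediction error for option i
        if i == chosen:
            # Chosen option: a positive error confirms the choice
            alpha = alpha_c if delta > 0 else alpha_d
        else:
            # Unchosen option: a negative error confirms the choice
            alpha = alpha_c if delta < 0 else alpha_d
        q[i] += alpha * delta
    return q
```

Outcomes that confirm the choice (good news about the chosen option, bad news about the unchosen one) are thus weighted by αC, and disconfirming outcomes by αD; the valence, hybrid, and partial feedback models differ only in how the learning rate is assigned in each of the four cases, as laid out in Table 1.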
