We considered four ways in which the reward probabilities $p_i$ are set, illustrated schematically in Figure 1c. First, we considered stable environments in which the reward probabilities were constant. We also considered 1-reversal and 3-reversal conditions, in which the payout probabilities were reversed to $1-p_i$ either once in the middle of the task (second display in Figure 1c) or three times at equal intervals (third display in Figure 1c). In the stable, 1-reversal, and 3-reversal conditions, the initial probabilities $p_i$ at the start of the task were sampled at intervals of 0.1 in the range [0.05, 0.95] such that $p_1 \neq p_2$, and we tested all possible combinations of these probabilities (45 probability pairs). Unless otherwise noted, results are averaged across these initial probabilities.
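The sampling and reversal scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the trial count and function names are assumptions.

```python
import itertools

# Candidate initial reward probabilities, spaced 0.1 apart in [0.05, 0.95].
levels = [round(0.05 + 0.1 * k, 2) for k in range(10)]

# All unordered pairs with p1 != p2 -> 45 probability pairs.
pairs = list(itertools.combinations(levels, 2))
assert len(pairs) == 45

def schedule(p, n_trials, n_reversals):
    """Per-trial reward probability for one option when the probability
    flips to 1 - p at n_reversals equally spaced points in the task."""
    probs = []
    for t in range(n_trials):
        # Count how many reversal points trial t has passed.
        k = sum(t >= (r + 1) * n_trials // (n_reversals + 1)
                for r in range(n_reversals))
        probs.append(p if k % 2 == 0 else 1 - p)
    return probs
```

With `n_reversals=1` the flip occurs at the task midpoint; with `n_reversals=3` it occurs at the quarter, half, and three-quarter marks, matching the equal-interval description.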

Table 1:

| Model | Chosen option $i$, $\delta_t^i>0$ | Chosen option $i$, $\delta_t^i<0$ | Unchosen option $j \neq i$, $\delta_t^j>0$ | Unchosen option $j \neq i$, $\delta_t^j<0$ |
|---|---|---|---|---|
| Confirmation model | **$\alpha_C$** | $\alpha_D$ | $\alpha_D$ | **$\alpha_C$** |
| Valence model | **$\alpha_+$** | $\alpha_-$ | **$\alpha_+$** | $\alpha_-$ |
| Hybrid model | **$\alpha_+$** | $\alpha_-$ | $\alpha_=$ | $\alpha_=$ |
| Partial feedback | **$\alpha_+$** | $\alpha_-$ | — | — |

Note: To make the table easier to read, $\alpha_C$ and $\alpha_+$ are highlighted in bold.
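Table 1's learning-rate assignments can be read as a standard delta-rule update in which the learning rate depends on the sign of the prediction error and on whether the option was chosen. Below is a minimal sketch for the confirmation model only, assuming full feedback (an outcome observed for both options) and a two-option task; the function name and interface are illustrative, not the authors' implementation.

```python
def update_q(q, chosen, reward, alpha_c, alpha_d):
    """One confirmation-model update (cf. Table 1): positive prediction
    errors for the chosen option and negative prediction errors for the
    unchosen option are scaled by alpha_C (confirmatory); the opposite
    sign/option combinations are scaled by alpha_D (disconfirmatory).

    q      : list of two value estimates
    chosen : index (0 or 1) of the chosen option
    reward : observed outcome for each option (full feedback)
    """
    q = list(q)  # avoid mutating the caller's values
    for i in (0, 1):
        delta = reward[i] - q[i]
        if i == chosen:
            alpha = alpha_c if delta > 0 else alpha_d
        else:
            alpha = alpha_d if delta > 0 else alpha_c
        q[i] += alpha * delta
    return q
```

The valence and hybrid models differ only in which learning rate each of the four cells receives; the partial-feedback model simply skips the update for the unchosen option.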
