We considered four ways in which the reward probabilities $p_i$ are set, illustrated schematically in Figure 1c. First, we considered stable environments in which reward probabilities were constant. We also considered 1-reversal and 3-reversal conditions, in which the payout probabilities were reversed to $1-p_i$ once in the middle of the task (second display in Figure 1c) or three times at equal intervals (third display in Figure 1c). In the stable, 1-reversal, and 3-reversal conditions, the initial probabilities $p_i$ at the start of the task were sampled at intervals of 0.1 in the range $[0.05, 0.95]$ such that $p_1 \neq p_2$, and we tested all possible combinations of these probabilities (45 probability pairs). Unless otherwise noted, results are averaged across these initial probabilities.
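The reward schedules described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the trial count, and the use of NumPy are assumptions; only the reversal rule and the 45 initial probability pairs come from the text.

```python
import itertools

import numpy as np


def probability_schedule(p_init, n_trials, n_reversals):
    """Per-trial reward probabilities (n_trials x 2) for one two-armed task.

    Reversals flip each probability p to 1 - p, splitting the task into
    n_reversals + 1 equal segments (n_reversals is 0, 1, or 3 in the text).
    """
    p = np.array(p_init, dtype=float)
    schedule = np.empty((n_trials, 2))
    segment = n_trials // (n_reversals + 1)
    for t in range(n_trials):
        # Reverse payout probabilities at each segment boundary.
        if n_reversals and t > 0 and t % segment == 0 and t // segment <= n_reversals:
            p = 1.0 - p
        schedule[t] = p
    return schedule


# All 45 initial pairs: values 0.05, 0.15, ..., 0.95 with p1 != p2
# (unordered combinations of 10 values taken 2 at a time).
values = np.round(np.arange(0.05, 1.0, 0.1), 2)
pairs = list(itertools.combinations(values, 2))
```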

Table 1:

| Model | Chosen option $i$: $\delta_t^i > 0$ | Chosen option $i$: $\delta_t^i < 0$ | Unchosen option $j \neq i$: $\delta_t^j > 0$ | Unchosen option $j \neq i$: $\delta_t^j < 0$ |
|---|---|---|---|---|
| Confirmation model | **$\alpha_C$** | $\alpha_D$ | $\alpha_D$ | **$\alpha_C$** |
| Valence model | **$\alpha_+$** | $\alpha_-$ | **$\alpha_+$** | $\alpha_-$ |
| Hybrid model | **$\alpha_+$** | $\alpha_-$ | $\alpha_=$ | $\alpha_=$ |
| Partial feedback | **$\alpha_+$** | $\alpha_-$ | — | — |

Note: To make the table easier to read, $\alpha_C$ and $\alpha_+$ are highlighted in bold.
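The learning-rate assignment in Table 1 can be made concrete with a sketch of one trial's value update under the confirmation model with full feedback. This is an illustrative implementation, not the authors' code: the function name and the two-option representation are assumptions; the rule that confirmatory prediction errors (positive for the chosen option, negative for the unchosen one) use $\alpha_C$ while disconfirmatory ones use $\alpha_D$ follows the table.

```python
import numpy as np


def confirmation_update(Q, choice, rewards, alpha_c, alpha_d):
    """One Q-value update under the confirmation model (full feedback).

    Q: length-2 array of value estimates; choice: index of the chosen
    option; rewards: outcomes observed for both options this trial.
    """
    Q = Q.copy()
    for i in (0, 1):
        delta = rewards[i] - Q[i]  # prediction error for option i
        if i == choice:
            # Chosen option: positive errors are choice-confirming.
            alpha = alpha_c if delta > 0 else alpha_d
        else:
            # Unchosen option: negative errors are choice-confirming.
            alpha = alpha_d if delta > 0 else alpha_c
        Q[i] += alpha * delta
    return Q
```

The valence model would instead select $\alpha_+$ for any positive prediction error and $\alpha_-$ for any negative one, regardless of which option was chosen.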
