Hundreds of ERP studies have reported a midfrontal negative-going amplitude shift following negative compared with positive action outcomes. This feedback-related negativity (FRN) effect is typically thought to reflect an early and binary mechanism of action evaluation in the posterior midcingulate cortex. However, in prior research on the FRN effect, the instantaneous value and the long-term value of action outcomes have been perfectly confounded. That is, instantaneously positive outcomes were generally consistent with task goals, whereas instantaneously negative outcomes were inconsistent with task goals. In this study, we disentangled these two outcome aspects in two experiments. Our results reveal an interaction of instantaneous and long-term outcome values. More precisely, our findings strongly suggest that the FRN effect is mainly driven by a reward positivity, which is evoked only by outcomes that possess an instantaneously positive value and also help the organism to reach its long-term goals. These findings add to a recent literature according to which the posterior midcingulate cortex acts as a hierarchical reinforcement learning system and suggest that this system integrates instant and long-term action–outcome values. This, in turn, might be crucial for learning optimal behavioral strategies in a given setting.
For a considerable time, cognitive psychologists and neuroscientists have been interested in the neuronal foundations of performance monitoring, that is, the brain's ability to quickly and efficiently evaluate action and decision outcomes. Although research on this topic has already yielded remarkable insights (Ullsperger, Danielmeier, & Jocham, 2014), many open questions remain. For instance, hundreds of studies have shown that, about 200–350 msec after the onset of negative compared with positive action feedback, a negative-going amplitude shift occurs at frontomedial sites in the scalp-recorded EEG. This difference between positive and negative action feedback is commonly referred to as the feedback-related negativity (FRN) effect (for recent overviews, see Sambrook & Goslin, 2015; Walsh & Anderson, 2012). However, what exactly defines whether a particular action outcome is positive or negative? Imagine, for example, that you are working on an important grant proposal whose deadline is only a few hours away. Although the proposal is still far from ready for submission, you decide to take a break and watch some funny Internet videos. Although the outcome of this decision may be positive in the short run, it may be rather negative with regard to your long-term goals. To date, it is unclear whether the FRN effect reflects the instantaneous value of action and decision outcomes or their long-term value. As we will outline below, an answer to this question would be highly relevant to further our understanding of the functional significance of the FRN effect and its neural generator. In the present work, we therefore addressed this issue in two experiments.
The FRN effect was first described by Miltner, Braun, and Coles (1997), and since then an enormous amount of research has been conducted to uncover the neurocognitive functions underlying it. Today, there is more or less consensus that the FRN effect reflects an early binary distinction between positive and negative outcomes (e.g., Hajcak, Moser, Holroyd, & Simons, 2006; Holroyd, Hajcak, & Larsen, 2006; Yeung & Sanfey, 2004; Gehring & Willoughby, 2002) and that it is generated in the posterior midcingulate cortex (pMCC; e.g., Hauser et al., 2014; Gruendler, Ullsperger, & Huster, 2011; Bellebaum & Daum, 2008; Hewig et al., 2007; Gehring & Willoughby, 2002; Miltner et al., 1997; also see, Warren, Hyman, Seamans, & Holroyd, 2015; Emeric et al., 2008). According to initial accounts of the FRN effect, it mainly arises from neural responses to negative action outcomes (i.e., a punishment or non-reward negativity; e.g., Gehring & Willoughby, 2002; Holroyd & Coles, 2002; Miltner et al., 1997). However, more recent research has indicated that the FRN effect may rather be driven by a reward positivity (Rew-P), which attenuates a default frontomedial N2 component and which is present for positive but not for negative outcomes (Holroyd, Pakzad-Vaezi, & Krigolson, 2008; also see, Frömer, Stürmer, & Sommer, 2016; Gibbons, Schnuerch, & Stahl, 2016; Sambrook & Goslin, 2016; Proudfit, 2015; Varona-Moya, Moris, & Luque, 2015; Becker, Nitsch, Miltner, & Straube, 2014; Kujawa, Smith, Luhmann, & Hajcak, 2013; Kreussel et al., 2012; Luque, López, Marco-Pallares, Càmara, & Rodriguez-Fornells, 2012; Warren & Holroyd, 2012; Foti, Weinberg, Dien, & Hajcak, 2011; Holroyd, Krigolson, & Lee, 2011; Hewig et al., 2007, 2010; Eppinger, Kray, Mock, & Mecklinger, 2008; Holroyd et al., 2008; Cohen, Elger, & Ranganath, 2007; Potts, Martin, Burton, & Montague, 2006).
With regard to the detailed functional meaning of the FRN effect, it has been assumed that it reflects the signaling of reward prediction errors in the pMCC (i.e., an outcome is worse than expected vs. better than expected), which may use these signals to optimize task behavior (e.g., Osinsky, Seeger, Mussel, & Hewig, 2016; Sambrook & Goslin, 2014; Nieuwenhuis, Holroyd, Mol, & Coles, 2004; Holroyd & Coles, 2002). If this function of the pMCC is located on a low hierarchical level of behavioral organization and control, the FRN effect should be sensitive only to the isolated instantaneous value of a single simple action (i.e., whether the instant action outcome is positive vs. negative, irrespective of its long-term value). Such a view would be consistent, for instance, with findings showing that the FRN effect is not sensitive to task correctness of an outcome in terms of counterfactual comparisons (i.e., an obtained small monetary gain might have a positive instantaneous value but would be incorrect with regard to the long-term task if the unchosen option had resulted in a larger gain; e.g., Osinsky, Walter, & Hewig, 2014; Kujawa et al., 2013; Yeung & Sanfey, 2004; Gehring & Willoughby, 2002; but also see Nieuwenhuis, Yeung, Holroyd, Schurger, & Cohen, 2004). However, recent empirical and theoretical work suggests that the pMCC may act as a system of hierarchical reinforcement learning and therefore rather plays an important role in the selection and maintenance of hierarchically more complex behavior (e.g., a sequence of several simple actions to complete a complex task; Holroyd & McClure, 2015; Holroyd & Yeung, 2012; Ribas-Fernandes et al., 2011). For instance, Holroyd and Yeung (2012) presented a model according to which the pMCC selects between extended sequences of behavioral acts to reach a particular goal (e.g., preparing a meal on your own vs. ordering it via telephone) and continuously receives information about task progress from a critic module consisting of the OFC and the ventral striatum. In case of an event that threatens task completion, the pMCC initiates adaptations in control to maintain the system on-task. If, however, an event brings the individual closer to reaching its long-term goal, the pMCC uses this information to learn about optimal behavioral strategies in a given context. Briefly, the pMCC “is more concerned with the selection and maintenance of the task itself than with the minutiae of task execution” (Holroyd & Yeung, 2012, p. 123). Given that the FRN effect reflects the arrival and/or utilization of such task-related outcome information in the pMCC (cf. Ribas-Fernandes et al., 2011), it should be more reflective of the long-term task value of an action outcome rather than the instantaneous, task-unrelated value.
To clarify whether the FRN effect is more reflective of instantaneous or long-term values, it would be necessary to disentangle these two outcome aspects in a single task design. In prior research on the FRN effect, researchers have frequently used simple gambling or guessing tasks, during each trial of which participants can decide between two or more alternatives, leading either to a monetary favorable or unfavorable outcome (e.g., wins and losses; e.g., Mushtaq, Wilkie, Mon-Williams, & Schaefer, 2016; Mussel, Reiter, Osinsky, & Hewig, 2015; Liu, Nelson, Bernat, & Gehring, 2014; Cui, Chen, Wang, Shum, & Chan, 2013; Kreussel et al., 2012; Osinsky, Mussel, & Hewig, 2012; Carlson, Foti, Mujica-Parodi, Harmon-Jones, & Hajcak, 2011; Foti et al., 2011; Walsh & Anderson, 2011; Moser & Simons, 2009; Hajcak et al., 2006; Hajcak, Holroyd, Moser, & Simons, 2005; Holroyd, Larsen, & Cohen, 2004). With no other explicit instruction, the instantaneous value (loss = negative; win = positive) of an outcome in such tasks directly corresponds with its long-term value as the individual will usually intend to make as much money as possible across the task. Thus, in most prior research on the FRN effect, the instantaneous value and the long-term value of an outcome have been perfectly confounded. In the two experiments presented here, we aimed to distinguish between these two outcome aspects by using special task instructions. In both experiments, participants performed a simple guessing task, consisting of two crucial task blocks which differed by instruction. In Experiment 1, the individual played a game in which she/he navigated through a magic maze (see Figure 1). In each trial, she/he was asked to choose one of three doors, directly leading to a passageway with a turnstile or an impasse containing a treasure chest. 
Importantly, the passageway was associated with a monetary loss and, therefore, a negative instantaneous value, whereas the impasse was associated with a monetary win and, hence, a positive instantaneous value. As a more extended task context, in one block participants were instructed to collect as much money as possible (i.e., the standard block), whereas in the other block they were instructed to escape the maze (i.e., the reversed block). Therefore, instantaneous and long-term values converged in the standard block (i.e., positive instant value/task-supportive long-term value; negative instant value/task-unsupportive long-term value) and diverged in the reversed block (i.e., positive instant value/task-unsupportive long-term value; negative instant value/task-supportive long-term value). By comparing the four possible combinations of long-term and instantaneous value, we were able to test whether the FRN effect reflects only a single outcome aspect (i.e., instantaneous outcome value or long-term value), the additive combination of both, or more complex interactions. Whereas we used abstract visual stimuli to signal wins and losses in Experiment 1, we applied natural pictures of bunnies and spiders in Experiment 2. Such pictures probably possess a more inherent instantaneous value in terms of emotional valence than abstract symbols. Moreover, in Experiment 2, we added a third block type, in which no explicit long-term goal was formulated (we call this the task neutral block). By comparing the results of the two experiments, we are able to draw some conclusions with regard to the generalizability of any potential effects of instantaneous and long-term outcome value on the FRN.
Thirty-five individuals from the population of psychology students in Würzburg, Germany, initially responded to an announcement of the study. As six of these individuals did not show up at the experimental session, the final sample consisted of 29 participants (5 men, 24 women, mean age = 21.3 years, age range = 18–29 years). All had normal or corrected-to-normal vision and reported to be free of any mental or neurological disorder. For participation, they received course credit as well as a fixed monetary compensation of €6.10 (see below). Participants gave written informed consent. The study was approved by the local ethics committee and was in accordance with the declaration of Helsinki. After task completion, participants received a full debriefing with regard to the pseudorandom trial outcomes (see below).
Guessing Task and Outcome Ratings
Participants completed a guessing task in which they were asked to imagine navigating through a magic virtual maze (see Figure 1). They were told that this maze consists of numerous rooms, each of which has three doors. They could choose one of the doors by button press. Each of the doors could lead either to a passageway to the next room or to an impasse. Moreover, participants were informed that in each passageway there is a turnstile where they have to pay 5 cents from their game account to pass through. In contrast, each impasse contained a treasure chest with 10 cents, which would be added to their account. Hence, passageways led to an instantaneous monetary loss whereas impasses led to an instantaneous monetary win. Participants were also told that, after entering an impasse, they would return to the last room, but because the maze is magic, the doors would have shuffled. Moreover, they were informed that they only have a limited number of moves before the game ends. Finally and most importantly, in the standard block participants were instructed to find as many treasure chests as possible. In contrast, in the reversed block they were instructed to find as many passageways as possible to escape the maze before the game ends. Thus, the long-term value of the outcomes differed between the two blocks, whereas the instantaneous value was always identical.
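The resulting 2 × 2 design can be summarized as a small lookup. The following is only an illustrative sketch: the task itself was run in Presentation, and all names used here (`INSTANT_VALUE`, `BLOCK_GOAL`, `classify`) are hypothetical labels introduced for this example.

```python
# Hypothetical labels for illustration; not taken from the original task script.
INSTANT_VALUE = {"impasse": "positive", "passageway": "negative"}  # +10 vs. -5 cents
BLOCK_GOAL = {"standard": "impasse", "reversed": "passageway"}     # outcome to be found


def classify(block, outcome):
    """Return (instantaneous value, long-term value) for one trial outcome."""
    if BLOCK_GOAL[block] == outcome:
        long_term = "task-supportive"
    else:
        long_term = "task-unsupportive"
    return INSTANT_VALUE[outcome], long_term


# Print all four combinations of block and outcome.
for block in BLOCK_GOAL:
    for outcome in INSTANT_VALUE:
        print(block, outcome, classify(block, outcome))
```

The instantaneous value depends only on the outcome, whereas the long-term value depends on the combination of outcome and block instruction.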
Each block consisted of 120 critical trials with a pseudorandom outcome order and an equal number of impasses (60) and passageways (60). In addition, at the end of each block a final trial was presented with an outcome consistent with the overall task-block goal (i.e., an impasse in the standard block and an exit sign in the reversed block). Therefore, each participant gained a total monetary outcome of 6.10 Euros during the whole task. Block order was counterbalanced across participants. Each trial started with a blank screen being presented for 250 msec. Afterwards, the three doors were shown until the participant chose one of them by pressing the corresponding button. A central fixation cross was then presented for 1000 msec before the outcome stimulus (i.e., an impasse with a treasure chest or a passageway with a turnstile) was shown for 1500 msec. The task was presented on a 21-in. monitor and was controlled by Presentation experimental software (Neurobehavioral Systems, Inc., Albany, CA). Responses were given on a standard keyboard, using the 〈left〉, 〈up〉, and 〈right〉 buttons. All stimuli were presented on a black background.
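The stated total of 6.10 Euros can be checked with simple arithmetic. The sketch below assumes that the final exit sign in the reversed block carried no monetary value, which is not stated explicitly above; the function and variable names are hypothetical.

```python
WIN_CENTS = 10    # treasure chest in an impasse
LOSS_CENTS = -5   # turnstile fee in a passageway


def block_total_cents(n_wins=60, n_losses=60, final_trial_cents=0):
    """Net earnings in cents for the critical trials of one block plus its final trial."""
    return n_wins * WIN_CENTS + n_losses * LOSS_CENTS + final_trial_cents


# Standard block: the final trial is an impasse (one extra win).
standard = block_total_cents(final_trial_cents=WIN_CENTS)
# Reversed block: the final trial is an exit sign (assumed here to pay nothing).
reversed_block = block_total_cents(final_trial_cents=0)

total_euros = (standard + reversed_block) / 100
print(total_euros)
```

Under this assumption, the two blocks yield 310 and 300 cents, respectively, which matches the fixed compensation reported for all participants.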
Following each block, participants rated the two outcomes on the 5-point valence scale of the Self-Assessment Manikin (Bradley & Lang, 1994; 1 = positive, 3 = neutral, 5 = negative). In addition, they indicated on a 5-point scale which outcome they generally intended to find (1 = passageways, 5 = impasses).
EEG Recordings and Analyses
EEG was recorded with a sampling rate of 250 Hz at 31 scalp positions (Fp1, Fp2, F9, F7, F3, Fz, F4, F8, F10, FC5, FC1, FCz, FC2, FC6, T7, C3, C4, T8, TP9, CP1, CP2, TP10, P7, P3, Pz, P4, P8, PO9, O1, O2, and PO10), using a BrainAmp System (Brain Products GmbH, Gilching, Germany), Fast'nEasy electrode caps (Easy Cap GmbH, Herrsching, Germany), and Brainvision Recorder software (Brain Products GmbH). Recording sites were prepared so that impedances were below 10 kΩ. The online reference was mounted at position Cz, and the ground electrode was placed between positions Fpz and Fz. During recording, the signal was band-pass filtered (0.016–80 Hz).
Offline data processing was conducted using Brainvision Analyzer software. First, data were re-referenced to the mean of electrodes TP9 and TP10 (mastoids), and the former reference at Cz was reinstated as a new data channel. Afterwards, data were further filtered using a 30-Hz (3 dB point) low-pass filter (Butterworth Zero Phase Filter) with a 48-dB/octave roll-off function. An independent component analysis (extended infomax algorithm)-based correction method was then applied to correct for ocular artifacts. Resulting data were segmented around the outcome onset (−200 to 800 msec). Segments with remaining artifacts were rejected if they contained voltage steps of 20 μV/msec or more or if the max–min difference within the segment was equal to or larger than 150 μV. Afterwards, segments were averaged per condition and baseline-corrected, using the −200 to 0 msec time window. At least 20 segments were available per person and condition for averaging.
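The two rejection criteria and the baseline correction can be illustrated with a minimal sketch. It is not the Brainvision Analyzer implementation: it operates on a single channel given as a plain list of microvolt values, it assumes that either criterion alone leads to rejection, and all function names are hypothetical.

```python
SAMPLING_RATE_HZ = 250
MS_PER_SAMPLE = 1000 / SAMPLING_RATE_HZ   # 4 msec between neighboring samples
MAX_STEP_UV_PER_MS = 20.0                 # voltage-step criterion
MAX_RANGE_UV = 150.0                      # max-min criterion


def is_artifact(segment_uv):
    """Flag one single-channel segment (list of microvolt samples) as contaminated."""
    steps = [abs(b - a) / MS_PER_SAMPLE for a, b in zip(segment_uv, segment_uv[1:])]
    too_steep = any(step >= MAX_STEP_UV_PER_MS for step in steps)
    too_wide = (max(segment_uv) - min(segment_uv)) >= MAX_RANGE_UV
    # Assumption: a segment is rejected if either criterion is met.
    return too_steep or too_wide


def baseline_correct(waveform_uv, n_baseline_samples=50):
    """Subtract the mean of the pre-outcome samples (-200 to 0 msec = 50 samples at 250 Hz)."""
    baseline = sum(waveform_uv[:n_baseline_samples]) / n_baseline_samples
    return [value - baseline for value in waveform_uv]
```

At a sampling rate of 250 Hz, a step of 20 μV/msec corresponds to a difference of 80 μV between two neighboring samples.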
Quantification of Variables and Statistical Analysis
Postblock outcome ratings were clearly not normally distributed and were therefore analyzed using the nonparametric Wilcoxon test. For the valence ratings, we isolated the main effect of instantaneous value by aggregating valence ratings for each outcome (impasses/wins, passageways/losses) across both blocks. Conversely, to isolate the main effect of long-term value, we separately aggregated valence ratings for task-supportive and task-unsupportive outcomes across impasses and passageways. Finally, to analyze the interaction between instantaneous and long-term value, we calculated the difference between task-unsupportive and task-supportive outcomes separately for passageways and impasses and also compared these difference scores.
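The aggregation scheme can be sketched for a single participant as follows. Aggregation by the mean is an assumption (the text only says "aggregating"), and the function name, dictionary keys, and example ratings are hypothetical.

```python
def aggregate_valence(r):
    """Aggregate one participant's valence ratings (1 = positive ... 5 = negative).

    r maps (block, outcome) to a rating. Using the mean is an assumption.
    """
    def mean(a, b):
        return (a + b) / 2

    return {
        # Main effect of instantaneous value: collapse each outcome across blocks.
        "impasse": mean(r["standard", "impasse"], r["reversed", "impasse"]),
        "passageway": mean(r["standard", "passageway"], r["reversed", "passageway"]),
        # Main effect of long-term value: collapse across outcome types.
        "supportive": mean(r["standard", "impasse"], r["reversed", "passageway"]),
        "unsupportive": mean(r["standard", "passageway"], r["reversed", "impasse"]),
        # Interaction: unsupportive minus supportive, separately per outcome type.
        "diff_impasse": r["reversed", "impasse"] - r["standard", "impasse"],
        "diff_passageway": r["standard", "passageway"] - r["reversed", "passageway"],
    }


# Made-up ratings for one participant, for illustration only.
example = {
    ("standard", "impasse"): 1, ("standard", "passageway"): 4,
    ("reversed", "impasse"): 3, ("reversed", "passageway"): 2,
}
scores = aggregate_valence(example)
```

Each pair of scores (e.g., `impasse` vs. `passageway`) would then be compared across participants with a paired Wilcoxon test, for example with `scipy.stats.wilcoxon`.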
To quantify outcome-driven adjustments in behavior, we calculated rates of postoutcome switches in choice from trial n to trial n + 1. These rates could vary between 0 (no switching at all) and 1 (switching in choice behavior following each trial) and were analyzed using a 2 × 2 repeated-measures ANOVA with the factors Instantaneous Value (impasse/win, passageway/loss) and Long-term Value (task-supportive, task-unsupportive).
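A minimal sketch of the switch-rate computation for one block is given below. The data layout (parallel lists of choices and outcomes) and the function name are assumptions made for this example.

```python
def switch_rate(choices, outcomes, condition):
    """Proportion of trials with the given outcome that were followed by a different choice.

    choices[n] is the door chosen on trial n; outcomes[n] is the outcome of trial n.
    The last trial of a block has no following trial and is therefore ignored.
    """
    switched = [
        choices[n + 1] != choices[n]
        for n in range(len(choices) - 1)
        if outcomes[n] == condition
    ]
    if not switched:
        return float("nan")  # no eligible trials for this condition
    return sum(switched) / len(switched)


# Made-up trial sequence for illustration only.
choices = ["left", "left", "up", "up", "right"]
outcomes = ["win", "loss", "win", "loss", "win"]
print(switch_rate(choices, outcomes, "win"), switch_rate(choices, outcomes, "loss"))
```

In the actual analysis, each outcome would additionally be labeled as task-supportive or task-unsupportive according to the block, yielding one rate per cell of the 2 × 2 design.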
The FRN was quantified as mean amplitude between 220 and 300 msec at Fz and FCz. Mean FRN amplitudes were analyzed using a 2 × 2 × 2 repeated-measures ANOVA with the factors Electrode (Fz, FCz), Instantaneous Value (impasse/win, passageway/loss), and Long-term Value (task-supportive, task-unsupportive). Critical alpha level was set at .05. For significant effects, partial eta squared (ηp2) values are reported.
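The mean-amplitude measure can be sketched for one averaged waveform as follows. The sketch assumes that the waveform starts at −200 msec, is sampled at 250 Hz, and that both window borders are included; the function names are hypothetical.

```python
SAMPLING_RATE_HZ = 250
SEGMENT_START_MS = -200  # first sample of the outcome-locked segment


def ms_to_index(t_ms):
    """Convert a latency relative to outcome onset into a sample index."""
    return round((t_ms - SEGMENT_START_MS) * SAMPLING_RATE_HZ / 1000)


def mean_amplitude(erp_uv, start_ms=220, end_ms=300):
    """Mean voltage (in microvolts) in the FRN window; both borders are included (assumption)."""
    window = erp_uv[ms_to_index(start_ms): ms_to_index(end_ms) + 1]
    return sum(window) / len(window)
```

With these settings, the 220–300 msec window covers 21 samples per electrode; the same function would be applied to the averaged waveforms at Fz and FCz in each condition.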
Explicit Outcome Ratings
As mentioned above, explicit ratings were analyzed using nonparametric Wilcoxon tests, and therefore, z statistics are reported in this section. With regard to the valence ratings, passageways/losses were generally rated more negative than impasses/wins (z = −3.04, p = .002; see Figure 2). Even more clearly, task-supportive outcomes were rated more positive than task-unsupportive outcomes (z = −4.65, p < .001). Moreover, we detected an interaction effect when comparing passageways and impasses with regard to the difference between task-unsupportive and task-supportive outcomes (z = −2.08, p = .037). Thus, the effect of long-term value was slightly more pronounced for passageways/losses (mean difference between negative and positive long-term value = 2.57) compared with impasses/wins (mean difference between negative and positive long-term value = 2.09).
In addition, participants reported that they intended to find the impasses in the standard block and the passageways in the reversed block (z = 4.79, p < .001; not shown in Figure 2; standard block: M = 4.76, SE = 0.08; reversed block: M = 1.34, SE = 0.16).
Mean postoutcome switching rates are shown in Figure 2. The main effect of Long-term Value was significant (F(1, 28) = 26.78, p < .001, ηp2 = .49), reflecting higher switching rates after task-unsupportive compared with supportive outcomes. The main effect of Instantaneous Value (F(1, 28) = 1.45, p = .24) and the interaction of both factors (F(1, 28) = 0.43, p = .84) were not significant.
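For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Long-term Value on switching rates: F(1, 28) = 26.78.
print(round(partial_eta_squared(26.78, 1, 28), 2))
```

Because the F values are themselves rounded to two decimals, recomputed values may occasionally differ from the reported ones in the last digit.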
Mean FRN values across electrodes Fz and FCz are shown in Figure 2. Outcome-locked ERP waveforms at Fz are presented in Figure 3. The main effect of Electrode was significant, reflecting generally more negative FRN amplitudes at Fz compared with FCz (F(1, 28) = 18.86, p < .001, ηp2 = .40). The main effect of Instantaneous Value was not significant (F(1, 28) = 2.12, p = .16). However, we observed a significant main effect of Long-term Value (F(1, 28) = 5.74, p = .023, ηp2 = .17), which was qualified by a two-way interaction of Long-term Value × Instantaneous Value (F(1, 28) = 4.73, p = .038, ηp2 = .15). Analyses of simple effects revealed that there was a significant effect of Long-term Value for the impasses/wins (F(1, 28) = 8.57, p = .007, ηp2 = .23), reflecting more positive FRN amplitudes when the impasses/wins supported the task goal than when they opposed it. For the passageways/losses, the effect of Long-term Value was not significant (F(1, 28) = 0.76, p = .39). Finally, a significant three-way interaction of Electrode × Long-term Value × Instantaneous Value indicated that this pattern was more pronounced at electrode Fz (F(1, 28) = 9.16, p = .005, ηp2 = .25).
To analyze whether the FRN is linked to behavioral adjustments, we calculated simple correlations for each outcome condition. All these correlations were small (range: −.20 to .29) and did not significantly deviate from zero (all ps > .12). The same applied when calculating correlations between FRN and switching difference scores (task-unsupportive minus task-supportive) for impasses/wins (r = −.20, p = .30) and passageways/losses (r = .17, p = .39).
Discussion Experiment 1
The explicit ratings of the outcomes following each block indicate that participants generally complied with the block instructions. Our electrophysiological analyses clearly show that amplitude in the FRN time range reflects neither the isolated instantaneous value of an action outcome nor its isolated long-term value. Rather, we observed an interaction of the two outcome aspects, which was driven by a modulatory effect of long-term value on the FRN response to instantaneously positive but not negative outcomes. In particular, amplitude in the FRN time range appears to be more positive when an instantaneously positive outcome fits the long-term task goals.
Critically, it could be argued that the abstract visual outcomes used in Experiment 1 do not possess an inherent instantaneous value. Thus, it might be possible that participants volitionally reformulated the instantaneous values of the outcomes in a block-specific fashion. This would also be consistent with the strong main effect of long-term value on explicit valence ratings. We therefore designed a second experiment, using visual stimuli that are more likely to possess an inherent value in terms of emotional valence, that is, pictures of spiders versus pictures of bunnies.
The sample comprised 35 women (mean age = 23.8 years, age range = 19–32 years), some of whom were Psychology students. All were recruited from the Würzburg community population, and none had participated in Experiment 1. All had normal or corrected-to-normal vision and reported to be free of any mental or neurological disorder. Participants received either course credit (Psychology students) or a monetary compensation of 15 Euros. All gave written informed consent. The study was approved by the local ethics committee and was in accordance with the declaration of Helsinki. After task completion, participants received a full debriefing with regard to the pseudorandom trial outcomes (see below).
Guessing Task and Outcome Ratings
The general task design in Experiment 2 was very similar to the one used in Experiment 1. Participants were asked to imagine that they were walking through a new kind of zoo (i.e., the maze zoo) in which only spiders and bunnies live. Moreover, they were told that this zoo consists of numerous rooms, each of which houses a single animal (i.e., either a spider or a bunny). In each trial, they could choose between two doors by button press. Afterwards, they would enter the chosen room and see the animal living there (i.e., the choice outcome). At the end of the room, they would find two other doors leading to the next rooms and so on. They were also informed that after a certain number of rooms they would reach the exit of the zoo, which, however, would not mean that they had seen all of the rooms/animals.
On the basis of prior research, we assumed that spiders compared with bunnies should possess a more negative inherent emotional valence (Öhman & Mineka, 2001). Accordingly, the instantaneous outcome value should be more positive for bunnies compared with spiders. To manipulate the long-term value, we used different instructions in the three task blocks. In the standard block, the participant was asked to imagine that she is working in the zoo and that the zoo director has asked her to find as many bunnies as possible in one walk-through. In contrast, participants were asked to find as many spiders as possible in the reversed block. Finally, in the neutral block they received no instruction to find a specific animal but were asked to imagine that they are just taking a private walk through the zoo.
The block order was counterbalanced across participants. Each block consisted of 80 trials with a pseudorandom outcome order and an equal number of bunnies (40) and spiders (40). Ten spider pictures and 10 bunny pictures were used, with each picture being presented four times per block. Each trial started with a blank screen being presented for 500 msec. Afterwards, the two doors were shown until the participant chose one of them by pressing a respective button. A central fixation cross was then displayed for 750 msec before the animal picture was shown for 1500 msec. The same hardware and software components were used as in Experiment 1. All stimuli were presented on a gray background.
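The composition of one block can be sketched as follows. The constraints of the pseudorandom order are not specified above, so the sketch assumes an unconstrained shuffle with a fixed seed; the function name and the picture numbering are hypothetical.

```python
import random


def build_block(n_pictures=10, repetitions=4, seed=0):
    """Build one block: every picture of each animal category appears `repetitions` times."""
    trials = [
        (animal, picture_id)
        for animal in ("bunny", "spider")
        for picture_id in range(1, n_pictures + 1)
        for _ in range(repetitions)
    ]
    # Assumption: a plain shuffle; the original pseudorandomization may have been constrained.
    random.Random(seed).shuffle(trials)
    return trials


block = build_block()
print(len(block), sum(animal == "bunny" for animal, _ in block))
```

With 10 pictures per category and four presentations each, this yields the 40 bunny and 40 spider trials per block described above.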
Following each block, we asked participants to indicate emotional valence for the two animal categories on the respective scales of the Self-Assessment Manikin (Bradley & Lang, 1994; 1 = positive/not at all arousing, 3 = neutral/somewhat arousing, 5 = negative/very arousing). Moreover, participants indicated on a 5-point scale whether they intended to find the spiders (5) or the bunnies (1).
EEG Recordings and Analyses
The same setups, procedures, and approaches for FRN quantification were used as described for Experiment 1.
Quantification of Variables and Statistical Analysis
Again, explicit ratings were analyzed using nonparametric tests. For valence ratings, we isolated the main effect of Instantaneous Value by aggregating across the three task blocks separately for spiders and bunnies. The main effect of Long-term Value was analyzed by entering the aggregated valence ratings for task-supportive outcomes, task-unsupportive outcomes, and task-neutral outcomes into a Friedman test. Any potential interaction between both factors was analyzed by pairwise comparisons using Wilcoxon tests.
Rates for postoutcome switching in choice behavior were calculated and entered in a 2 × 3 repeated-measures ANOVA with the within-subject factors Instantaneous Value (negative/spiders, positive/bunnies) and Long-term Value (in support of task goal, opposed to task goal, neutral).
FRN amplitudes were entered into a 2 × 2 × 3 repeated-measures ANOVA with the within-subject factors Electrode (Fz, FCz), Instantaneous Value (negative/spiders, positive/bunnies), and Long-term Value (in support of task goal, opposed to task goal, neutral). p values were adjusted using the Greenhouse–Geisser correction when the Mauchly test indicated a violation of the sphericity assumption. In such cases, uncorrected degrees of freedom and epsilon values (ε) are reported. All other statistical procedures and tests were the same as described in Experiment 1.
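Because uncorrected degrees of freedom are reported together with ε, the corrected degrees of freedom can be recovered by multiplying both values by ε. The sketch below illustrates this step only; the function name is hypothetical, and the corrected p value would additionally require the F distribution evaluated at the corrected degrees of freedom.

```python
def greenhouse_geisser_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by the Greenhouse-Geisser epsilon."""
    return epsilon * df_effect, epsilon * df_error


# Example: an effect reported as F(2, 68) with epsilon = .82.
df1, df2 = greenhouse_geisser_df(2, 68, 0.82)
print(round(df1, 2), round(df2, 2))
```

The F value itself is unchanged by the correction; only the reference distribution used for the p value becomes more conservative as ε decreases.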
Explicit Outcome Ratings
Bunnies compared with spiders were generally rated more positive (z = −4.74, p < .001; see Figure 2). Moreover, the main effect of Long-term Value was significant (χ2 = 8.22, p = .016). Pairwise comparisons by means of Wilcoxon tests showed that task-unsupportive outcomes were rated slightly more negative than task-neutral outcomes (z = −2.01, p = .045), whereas there was no significant difference between task-supportive and task-unsupportive outcomes (z = −1.62, p = .11) and also no difference between task-supportive and task-neutral outcomes (z = −0.03, p = .98). When analyzing the effect of long-term value separately for spiders and bunnies, we did not observe a significant effect for bunnies (χ2 = 0.89, p = .64) but did observe one for spiders (χ2 = 9.86, p = .007). Pairwise comparisons revealed that spiders were rated more negative when they were task-unsupportive compared with task-supportive (z = −2.26, p = .024). The other comparisons did not reach statistical significance (all zs > −1.75, all ps > .08).
Finally, the three task blocks significantly differed with regard to what kind of animal participants intended to find, as indicated by a Friedman test (χ2 = 52.07, p < .001). Pairwise comparisons by means of Wilcoxon tests revealed significant differences between all three blocks (all zs < −3.58, all ps < .001). Thus, participants reported the intention to find the bunnies in the standard block, the spiders in the reversed block, and a tendency toward the bunnies in the neutral block (not shown in Figure 2; standard block: M = 4.83, SE = 0.51; reversed block: M = 1.91, SE = 1.42; neutral block: M = 1.94, SE = 1.00).
Mean postoutcome switching rates are shown in Figure 2. Although participants tended to more often switch in choice behavior following spiders compared with bunnies, this difference failed to reach statistical significance (F(1, 34) = 3.76, p = .061). The main effect of Long-term Value was significant (F(2, 68) = 14.62, p < .001, ε = .98, ηp2 = .30). Pairwise comparisons revealed that switching rates were much lower following task-supportive outcomes compared with task-unsupportive (t(34) = 4.68, p < .001) and task-neutral outcomes (t(34) = 4.33, p < .001). The latter two conditions did not differ significantly (t(34) = 0.52, p = .61). Moreover, there was a significant two-way interaction of Instantaneous Value and Long-term Value (F(2, 68) = 4.21, p = .027, ε = .82, ηp2 = .11). Analyses of simple effects revealed that only for task-neutral outcomes switching rates were significantly higher following spiders (mean = 0.52) compared with bunnies (mean = 0.40, t(34) = 2.52, p = .017). The comparison between spiders and bunnies was not significant for task-supportive (t(34) = 0.10, p = .92) and task-unsupportive outcomes (t(34) = 0.88, p = .39).
Mean FRN values across electrodes Fz and FCz are shown in Figure 2. Outcome-locked ERP waveforms at electrode Fz are presented in Figure 4. FRN amplitude was generally more negative at Fz compared with FCz (F(1, 34) = 4.89, p = .034, ηp2 = .13). The main effect of Instantaneous Value was not significant (F(1, 34) = 1.65, p = .21). As in Experiment 1, we observed a significant main effect of Long-term Value (F(2, 68) = 8.16, p = .001, ηp2 = .19), which was further qualified by a significant interaction of Long-term Value × Instantaneous Value (F(2, 68) = 3.69, p = .04, ηp2 = .10). Analyses of simple effects revealed that for instantaneously positive outcomes (i.e., bunnies) there was a significant effect of Long-term Value (F(2, 68) = 14.78, p < .001, ε = .96, ηp2 = .30), whereas this effect was not significant for instantaneously negative outcomes (i.e., spiders; F(2, 68) = 0.51, p = .58, ε = .87). Pairwise comparisons showed that FRN amplitude was substantially reduced when bunnies were task-supportive compared with task-unsupportive (t(34) = 4.42, p < .001) and task-neutral (t(34) = 4.58, p < .001). The latter two conditions did not differ significantly (t(34) = 0.14, p = .89).
As in Experiment 1, we calculated correlations between FRN and switching rates (absolute scores and difference scores), using data from the standard block and the reversed block. Again, all correlations were small (−.24 to .27) and statistically not significant (all ps > .11).
Discussion Experiment 2
In Experiment 2, we replicated the results of Experiment 1, showing that amplitude in the FRN time range reflects neither the isolated instantaneous outcome value nor the mere long-term outcome value. Rather, the FRN amplitudes were subject to a more complex interaction of both outcome aspects. As in Experiment 1, we observed a modulatory influence of long-term task relevance only on the FRN response to instantaneously positive outcomes (i.e., bunnies). For the FRN response to negative outcomes, however, long-term goals appear to be more or less irrelevant, resulting in virtually identical deflections in the FRN time range. Importantly and in contrast to Experiment 1, the results for the explicit ratings did not reveal strong influences of task instructions on the emotional valence of the outcome stimuli. This generally supports our notion that the animal pictures used in Experiment 2 possess a more inherent emotional value than the more abstract symbols used in Experiment 1. As the FRN results in both experiments were very similar, it is rather unlikely that the results in the first experiment were driven by a volitional reformulation of instantaneous outcome values (e.g., the reformulation of a win as something bad in the reversed block).
The obvious similarity of the electrophysiological results in both experiments indicates that the observed interaction effect of instantaneous and long-term values is not restricted to a particular kind of outcome. Instead, in the light of the vastly different outcome stimuli, the strong resemblance between the FRN patterns in Experiments 1 and 2 clearly points to a generic principle of FRN generation and its function in action evaluation. What, then, is this principle? Overall, the observed pattern is consistent with the idea that the FRN effect (i.e., the difference between negative and positive outcomes) results from a Rew-P in response to favorable action outcomes rather than a negativity in response to unfavorable outcomes (Holroyd et al., 2008; also see Frömer et al., 2016; Gibbons et al., 2016; Sambrook & Goslin, 2016; Proudfit, 2015; Varona-Moya, Moris, & Luque, 2015; Becker et al., 2014; Kujawa et al., 2013; Kreussel et al., 2012; Luque et al., 2012; Foti et al., 2011; Holroyd et al., 2011; Hewig et al., 2007, 2010; Eppinger et al., 2008; Cohen et al., 2007; Potts et al., 2006). According to this assumption, the negative-going component in the FRN time range is a default N2 response of the brain, which is suppressed by a Rew-P when the outcome of an action is positive. Importantly, our findings indicate that, for this Rew-P to occur, an instantaneous positive value of an action outcome alone is not sufficient. Apparently, such an instantaneously positive outcome must also possess a positive long-term value, that is, it must help the organism to reach more extended task goals. Conversely, a positive long-term value of an outcome per se is evidently also not sufficient to evoke the Rew-P because, for instantaneously negative outcomes, amplitudes in the FRN time window were insensitive to long-term values in both experiments. Thus, the Rew-P seems to be evoked only by action outcomes that are positive both in an instant and in a long-term, task-related fashion.
As we have outlined in the Introduction, it has recently been assumed that the pMCC selects extended behavioral sequences and maintains the system on-task. Especially for the latter, it may continuously integrate information about task progress, which it receives from a critique module (Holroyd & Yeung, 2012). In addition, there is already first evidence that amplitude differences in the FRN time range reflect the signaling of such evaluative information in the pMCC (Ribas-Fernandes et al., 2011). Our findings add to this literature by specifying the manner of outcome distinction in this process. In particular, it seems as if outcomes that possess both an instantaneous positive value and a long-term positive (i.e., task-supportive) value are distinguished from all other types of action outcomes in a given environment. Given that the pMCC is also crucially involved in hierarchical reinforcement learning (cf. Holroyd & McClure, 2015; Holroyd & Yeung, 2012), such a distinction would be highly functional. In particular, it could foster the learning of an optimal behavioral strategy, that is, a sequence of actions each of which previously had a direct instant positive consequence and also brought the organism closer to accomplishing the task as a whole. Moreover, several prior studies have suggested that negative-going amplitude proportions in the FRN time window are related to subsequent adjustments in behavior (Sallet, Camille, & Procyk, 2013; Van der Helden, Boksem, & Blom, 2010; Cohen & Ranganath, 2007; Hewig et al., 2007; Holroyd & Krigolson, 2007; Yeung & Sanfey, 2004; also see Yasuda, Sato, Miyawaki, Kumano, & Kuboki, 2004). As the negative-going component of the FRN (i.e., the feedback-locked N2) appears to be the default response to an action outcome (cf. Holroyd et al., 2008), this could imply that the pMCC initiates adjustments in task-oriented behavior in a default manner until an outcome occurs that has an instantaneous as well as a long-term positive value. In other words, the pMCC may maintain the system in an exploration state as long as behavioral outcomes in a given task setting are suboptimal and switch the system to an exploitation state when an optimal behavioral strategy has been found. At first glance, the absence of any substantive relation between amplitudes in the FRN time range and behavioral adaptations (i.e., postoutcome switching) in this study appears to be inconsistent with such functioning of the pMCC. However, it should also be noted that, in our tasks, all outcomes were presented in a pseudorandom order and therefore no learning of an adaptive, optimal strategy was possible. This could have obscured the interrelation between the FRN effect/Rew-P and postoutcome adjustments in behavior. Future studies may therefore further investigate the interrelation between the FRN effect/Rew-P and the exploration–exploitation dimension of behavior by considering both instantaneous and long-term values of action outcomes in a task where adaptive behavioral strategies can be learned.
As a potential limitation of our study, the manipulation of long-term compared with instantaneous outcome value might be considered rather weak, especially in Experiment 1. In particular, the instantaneous value of an outcome in our first experiment might have had a higher personal relevance (i.e., losing or winning some money) than the more abstract prospect of losing or winning the game, which had no monetary consequences. This could have reduced the influence of long-term values, especially for the instantaneously negative outcomes. However, it should also be noted that such differences in personal relevance between outcome aspects are rather unlikely for Experiment 2. In addition, the explicit outcome ratings in Experiment 1 clearly indicate that participants generally followed the instruction to achieve the long-term task goal, and this was also the case when task-supportive outcomes had an instantaneous negative value. Thus, it seems rather unlikely that our findings are mainly driven by differences in personal relevance between instantaneous and long-term values. Nevertheless, in future studies on this issue, efforts should be made to control for potential effects of personal relevance, for instance by including more rewarding and/or punishing long-term outcomes (e.g., winning or losing extra money when completing or failing the task).
In summary, our findings support the notion that amplitude variance in the FRN time range is mainly driven by a Rew-P. To the best of our knowledge, this is the first study to show that this Rew-P is only evoked by outcomes that possess an instantaneous positive value and also fit with the individual's long-term goals. Hence, our results add to recent literature according to which the pMCC, as a likely generator of the Rew-P, acts as a hierarchical reinforcement learning system, which selects and maintains extended behavioral sequences to achieve certain goals (e.g., Holroyd & McClure, 2015; Holroyd & Yeung, 2012). In particular, our study suggests that this system may integrate instantaneous and long-term action–outcome values. This, in turn, might be a crucial process in learning the optimal behavioral strategy in a given environmental setting, that is, one that leads to instant as well as long-term rewards. Future research may further investigate this topic by considering instantaneous and long-term outcome values in an orthogonal design that also allows the learning of optimal and suboptimal behavioral strategies.
Reprint requests should be sent to Roman Osinsky, Institute of Psychology, University Osnabrück, Seminarstr. 20, 49074 Osnabrück, Germany, or via e-mail: email@example.com.
Higher objective gains than losses were used as the subjective total value of losses is typically larger than the subjective value of gains (Tversky & Kahneman, 1992).