Frontal oscillatory dynamics in the theta (4–8 Hz) and beta (20–30 Hz) frequency bands have been implicated in cognitive control processes. Here we investigated the changes in coordinated activity within and between frontal brain areas during feedback-based response learning. In a time estimation task, participants learned to press a button after specific, randomly selected time intervals (300–2000 msec) using the feedback after each button press (correct, too fast, too slow). Consistent with previous findings, theta-band activity over medial frontal scalp sites (presumably reflecting medial frontal cortex activity) was stronger after negative feedback, whereas beta-band activity was stronger after positive feedback. Theta-band power predicted learning only after negative feedback, and beta-band power predicted learning after positive and negative feedback. Furthermore, negative feedback increased theta-band intersite phase synchrony (a millisecond resolution measure of functional connectivity) among right lateral prefrontal, medial frontal, and sensorimotor sites. These results demonstrate the importance of frontal theta- and beta-band oscillations and intersite communication in the realization of reinforcement learning.
Medial frontal cortex (MFC) monitors ongoing actions and their outcomes to adjust behavior adaptively (Ridderinkhof, Ullsperger, Crone, & Nieuwenhuis, 2004). More specifically, MFC activity has been related to conflict monitoring (Botvinick, Cohen, & Carter, 2004), cost–benefit analyses (Rushworth, Behrens, Rudebeck, & Walton, 2007; Kennerley, Walton, Behrens, Buckley, & Rushworth, 2006), and the evaluation of outcome history and modification of action values and selection (Nieuwenhuis, Holroyd, Mol, & Coles, 2004; Gehring & Willoughby, 2002; Holroyd & Coles, 2002). Although often considered separate and conflicting, these different functions may be part of a general action optimization learning function (Botvinick, 2007). When an action or outcome is suboptimal and MFC signals a need for adjustment, this also appears to lead to an increase in cognitive control, possibly via the additional recruitment of lateral pFC (Kerns, 2006; Kerns et al., 2004; Ridderinkhof et al., 2004). Lateral pFC is assumed to adjust higher-level decision-making strategies to changing contexts and demands and to integrate information over time (Lee & Seo, 2007; Tanaka et al., 2006; McClure, Laibson, Loewenstein, & Cohen, 2004).
Although the general knowledge about the roles of MFC and lateral pFC in action–outcome learning or feedback-based learning seems considerable, the question remains how these areas communicate the necessity and implementation of cognitive control and behavioral adjustment. Changes in oscillatory activity within brain regions and synchrony of oscillations between brain regions may be important measures of underlying mechanisms (Akam & Kullmann, 2010; Wang, Spencer, Fellous, & Sejnowski, 2010; Gregoriou, Gotts, Zhou, & Desimone, 2009; Womelsdorf et al., 2007; Fries, 2005). Several characteristics of brain oscillations constitute differing and valuable sources of information about the dynamics within and between brain areas: (1) Oscillatory power (amplitude) represents the activation magnitude in a brain area, (2) intertrial phase coherence signifies the timing of oscillations within a brain area over trials, and (3) intersite phase synchrony is the similarity in timing of oscillations in different brain areas.
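The three measures listed above can be made concrete with a minimal sketch. It assumes that a time-frequency decomposition has already produced complex coefficients for one frequency band, stored as an array of shape (trials, time points). The function names and this array layout are illustrative assumptions; the formulas are the commonly used definitions of these measures, not necessarily the exact implementation used in this study.

```python
import numpy as np

def power(tf):
    """Oscillatory power: squared amplitude, averaged over trials.
    tf is a complex array of shape (trials, times)."""
    return np.mean(np.abs(tf) ** 2, axis=0)

def intertrial_phase_coherence(tf):
    """Length of the mean unit phase vector over trials at one site.
    0 means random phases over trials, 1 means identical phases."""
    return np.abs(np.mean(np.exp(1j * np.angle(tf)), axis=0))

def intersite_phase_synchrony(tf_a, tf_b):
    """Consistency over trials of the phase difference between two sites.
    A constant phase lag gives 1, regardless of amplitude at either site."""
    dphi = np.angle(tf_a) - np.angle(tf_b)
    return np.abs(np.mean(np.exp(1j * dphi), axis=0))
```

Note that both phase measures discard amplitude, which is why they can dissociate from power in the analyses reported here.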
Studies into the role of brain oscillations in conflict and reward situations demonstrate the relevance of oscillations in the theta band (4–8 Hz) for processing errors and losses. MFC theta power and intertrial phase coherence increase more after errors or negative performance feedback than after successful trials or positive feedback (Cohen, 2011; Cavanagh, Frank, Klein, & Allen, 2010; Cavanagh, Cohen, & Allen, 2009; Christie & Tata, 2009; Marco-Pallares et al., 2008, 2009; Cohen, Ridderinkhof, Haupt, Elger, & Fell, 2008; Cohen, Elger, & Ranganath, 2007; Trujillo & Allen, 2007; Luu, Tucker, & Makeig, 2004; Luu & Tucker, 2001). Lateral pFC theta power also increases after errors compared with successful trials (Cavanagh et al., 2009, 2010; Luu et al., 2004). Moreover, theta-band intersite phase synchrony between MFC and lateral pFC increases after errors (Cavanagh et al., 2009, 2010), suggesting increased communication between these areas. Frontal oscillatory theta-band activity also predicts posterror slowing and postcorrect speeding (Cavanagh et al., 2009, 2010) and model-derived prediction errors (Cavanagh et al., 2010). Thus, when feedback implies an unfavorable outcome and a possible need to adjust behavior, theta-band oscillations in the frontal network appear to be involved in communicating that message and implementing both task-specific and task-general adjustments.
Whereas theta-band oscillations are involved in processing negative feedback, positive feedback can induce an increase in higher beta or lower gamma power (20–30 Hz; Marco-Pallares et al., 2008, 2009; Cohen et al., 2007). This beta-band activity is often but not always observed, whereas the theta-band oscillations elicited by negative feedback are more consistently observed. The increase in beta-band activity appears to be induced specifically when positive feedback contains valuable information for the subject, for example, a monetary win (Marco-Pallares et al., 2008, 2009) or a correct response in a learning situation (Cohen et al., 2007). This is in line with the recent suggestion by Engel and Fries that beta-band oscillations signal the tendency to maintain “the status quo” (Engel & Fries, 2010), such that increases in beta-band activity promote the existing motor or cognitive set through endogenous top–down influence.
If MFC and pFC oscillations indeed underlie action–outcome learning, they should be related not only to alterations of attention and predictions but also to subsequent behavioral changes. Only a small number of studies investigated the influence of activations in MFC and lateral pFC on next-trial performance (Chase, Swainson, Durham, Benham, & Cools, 2011; Cavanagh et al., 2010; Philiastides, Biele, Vavatzanidis, Kazzer, & Heekeren, 2010; Hester, Barre, Murphy, Silk, & Mattingley, 2008; Cohen & Ranganath, 2007; Yasuda, Sato, Miyawaki, Kumano, & Kuboki, 2004). Although most of these studies suggest that the level of MFC activity scales with the prediction error and predicts behavioral adjustment, a direct link between frontal oscillations and behavioral learning has not yet been reported.
Therefore, in the current study, we investigated how oscillatory dynamics in the theta and beta bands over MFC and lateral pFC (measured through surface EEG) were related to response learning and behavioral adjustment. Consistent with the observation that the level of MFC hemodynamic activity after errors is larger when the next trial is successful compared with when the next trial is another error (Hester et al., 2008), we hypothesized that larger increases in both MFC theta-band power and intertrial phase coherence after errors would also predict learning. Moreover, we expected larger increases in lateral pFC theta-band power and intertrial phase coherence and in theta-band oscillatory intersite phase synchrony between MFC and lateral pFC after errors as an implementation of cognitive control (Cavanagh et al., 2009, 2010). Because beta-band power was hypothesized to signal the importance of the continuation of the current situation, we expected increases in beta power to predict learning from positive feedback.
Results from EEG and MEG source-imaging studies suggest that the theta-band oscillations measured at medial frontal scalp sites might originate from MFC (i.e., anterior cingulate or supplementary motor area; Cohen, 2011; Keil, Weisz, Paul-Jordanov, & Wienbruch, 2010; Christie & Tata, 2009; Luu et al., 2004; Luu & Tucker, 2001). This is confirmed by intracranial measurements from the human MFC (Cohen et al., 2008; Wang, Ulbert, Schomer, Marinkovic, & Halgren, 2005).
In the current study, participants performed a time estimation task in which they had to use feedback to learn the correct time delay after which to press a button. This enabled us to study learning between consecutive trials. Besides MFC and lateral pFC, we added the contralateral cortical motor system as an area of interest because it implements the selected action and may be involved in processing feedback signals (Cohen & Ranganath, 2007). Motor control over the hand muscles induces changes in beta- and gamma-band oscillations (15–80 Hz) within the sensorimotor system and in cortico-muscular coupling (Schoffelen, Oostenveld, & Fries, 2005; Kilner, Baker, Salenius, Hari, & Lemon, 2000; Mima & Hallett, 1999). Cortico-muscular coupling with frontal brain areas varies with the required kind of control (Babiloni et al., 2008). Therefore, we explored whether cortico-muscular intersite phase synchrony might also reflect feedback learning. For comparability with previous studies, the feedback-related negativity (FRN) was also analyzed, an ERP elicited by valenced performance feedback (Holroyd & Coles, 2002; Miltner, Braun, & Coles, 1997).
Twenty-four right-handed first-year psychology students at the University of Amsterdam (11 men; mean age = 21.7 years, range = 18–29 years) participated in this experiment for course credits. They gave informed consent before participation. All procedures were executed in compliance with relevant laws and institutional guidelines and approved by the local ethics committee.
Participants performed a reinforcement learning task (Figure 1A), in which they had to learn when to press a button with the right thumb. The computer randomly selected a target RT between 300 and 2000 msec. Participants were instructed to learn to press the button at this target time by trial-and-error using feedback. A dot was presented until the response was made (maximum 2500 msec). Then 500 msec after the button press, feedback was presented for 1000 msec. A varying intertrial interval of 700–1200 msec separated trials. Responses in a 400 msec window surrounding the target time (±200 msec) were taken as correct. When a new target time was selected, this window decreased by 5% if participants were correct more than 70% of the last 15 trials and increased by 5% if they were incorrect more than 70% of the last 15 trials.
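The correctness criterion and the adaptive window described above can be sketched as follows. This is a minimal illustration, not the task code: the function names are hypothetical, the window boundaries are assumed to be inclusive, and the 5% change is assumed to be relative to the current window width.

```python
def is_correct(rt_ms, target_ms, window_ms=400.0):
    """A response counts as correct when it falls within half the window
    width on either side of the target time (inclusive, by assumption)."""
    return abs(rt_ms - target_ms) <= window_ms / 2.0

def adapt_window(window_ms, last_outcomes, step=0.05, criterion=0.70):
    """Return the window width to use for the next target time.
    last_outcomes holds booleans (True = correct) for the last 15 trials."""
    if not last_outcomes:
        return window_ms
    frac_correct = sum(last_outcomes) / len(last_outcomes)
    if frac_correct > criterion:          # doing well: make the task harder
        return window_ms * (1.0 - step)
    if (1.0 - frac_correct) > criterion:  # struggling: make the task easier
        return window_ms * (1.0 + step)
    return window_ms
```

For example, 12 correct responses in the last 15 trials (80%) would shrink a 400 msec window to 380 msec under these assumptions.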
Feedback consisted of a smiley face (correct), an upward arrow (speed up), or a downward arrow (slow down), and the number of points gained or lost. On correct trials, participants were rewarded two points. This amount increased one point per consecutive correct trial, with a maximum of six points. On incorrect trials participants lost one point. If participants did not respond while the dot was on screen, a message was presented that no response was detected.
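The point scheme can be written as a small function. This sketch assumes that the streak of consecutive correct trials resets to zero after any incorrect trial; the function name and the `streak` argument are hypothetical.

```python
def points_for_trial(correct, streak):
    """Points gained or lost on one trial.
    streak is the number of consecutive correct trials immediately
    preceding this one (assumed to reset after an incorrect trial)."""
    if not correct:
        return -1
    # Two points for a correct trial, plus one per preceding consecutive
    # correct trial, capped at six points.
    return min(2 + streak, 6)
```

Under these assumptions, six consecutive correct trials would earn 2, 3, 4, 5, 6, and 6 points.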
Two counters determined when a new target time was selected. A “success” counter increased on every correct trial and decreased on every incorrect trial but never went below zero. An “error” counter increased on incorrect trials if the “success” counter was at zero and decreased on correct trials if its value was larger than two. If one of the counters reached five, a new target time was selected. Effectively, this meant that participants had to be either correct or incorrect at least five times for a new target time to be selected. During the task, subjects were not notified when the target time changed. New target times were selected at random with the exception that consecutive target windows could not overlap. We refer to a series of trials with the same target time as a “block.” We categorized blocks as learned or nonlearned depending on whether the “success” or “error” counter reached five.
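The two-counter rule can be sketched as a small state machine. The class and method names are hypothetical, and one detail is an assumption: on an incorrect trial, the "success" counter is checked for zero before it is decremented, so a given incorrect trial changes only one of the two counters.

```python
from dataclasses import dataclass

@dataclass
class BlockCounters:
    success: int = 0
    error: int = 0

    def update(self, correct):
        """Update both counters after one trial.
        Returns 'learned' or 'nonlearned' when the block ends, else None."""
        if correct:
            if self.error > 2:       # error counter decreases only above two
                self.error -= 1
            self.success += 1
        else:
            if self.success == 0:    # assumption: checked before decrementing
                self.error += 1
            else:
                self.success -= 1    # never drops below zero
        if self.success >= 5:
            return "learned"
        if self.error >= 5:
            return "nonlearned"
        return None
```

Because an error removes one unit from the "success" counter, a sequence such as correct, correct, incorrect, correct, correct, correct, correct ends the block as learned on the seventh trial rather than the fifth.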
The task consisted of 600 trials. Figure 1B displays behavior from one subject on the first 100 trials to illustrate the dynamics of the task. Self-paced rest breaks were given between target times and at least 50 trials apart. Participants performed 50 trials of training. The entire experiment, including time to set up, lasted approximately 2 hr.
Behavioral Analyses and Experimental Conditions
The last trial of each block was excluded because feedback on the following trial was based on a new target time. Differences in average RT adjustments after each possible feedback (positive, speed up or slow down) were tested with a one-way ANOVA using Statistical Package for the Social Sciences for Windows (Version 15.0; SPSS, Inc., Chicago, IL) software. To confirm that learning only took place in the blocks that were classified as learned blocks, for each block we separated the trials into four bins according to the order of presentation (first quarter of trials in first bin, second quarter of trials in second bin, etc.). For each bin, we computed the difference between the average RT of the trials in the bin and the target window (zero if the RT was within the window). The average difference scores per bin were subjected to a 4 (Bin: 1–4) × 2 (Learning: yes, no) repeated measures ANOVA. Because we expected participants to need some time (and thus, trials) to learn a new target time, we predicted that the difference between average RTs and target window in the learned blocks but not in the nonlearned blocks would decrease over time (bins). Only trials from learned blocks were included in ERP and oscillation analyses. These trials were separated into four conditions: incorrect trials followed by incorrect trials, incorrect trials followed by correct trials, correct trials followed by incorrect trials, and correct trials followed by correct trials.
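The binned difference score used in this analysis can be illustrated with a short sketch. It assumes that the score is the unsigned distance from the bin's average RT to the nearest edge of the target window, and that blocks whose length is not divisible by four are split as evenly as possible; the function name is hypothetical, and a block must contain at least as many trials as bins.

```python
import numpy as np

def bin_difference_scores(rts_ms, target_ms, window_ms=400.0, n_bins=4):
    """Split one block's RTs into n_bins bins by trial order and return,
    per bin, the distance between the mean RT and the target window
    (zero if the mean RT lies within the window)."""
    rts = np.asarray(rts_ms, dtype=float)
    lo = target_ms - window_ms / 2.0
    hi = target_ms + window_ms / 2.0
    scores = []
    for chunk in np.array_split(rts, n_bins):
        mean_rt = chunk.mean()
        scores.append(max(lo - mean_rt, mean_rt - hi, 0.0))
    return scores
```

For a block with target time 800 msec and RTs of 1500, 1400, 1200, 1100, 1000, 900, 850, and 800 msec, this gives scores of 450, 150, 0, and 0 msec, the decreasing pattern expected in a learned block.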
Signal Recording and Processing
Electrophysiological data were recorded with a sampling rate of 2048 Hz from 64 scalp electrodes (EEG), 2 ocular electrodes (VEOG), 2 electrodes on the right abductor pollicis brevis (EMG), and 2 reference electrodes on the earlobes, using a BioSemi Active Two system. Data were preprocessed with the EEGLAB toolbox (Delorme & Makeig, 2004) in Matlab (The MathWorks, Natick, MA). Data were downsampled to 512 Hz, and a 1.0-Hz high-pass filter was applied. All data were re-referenced to the average of the earlobe electrodes. Cue-locked epochs of −1500 msec precue until 4200 msec postcue were extracted (epoch end at least 1200 msec after feedback onset). Baseline correction was applied by aligning the time series to the average amplitude in a −400–200 msec precue interval. Trials with movement or other artifacts in the EEG signal were manually removed. After independent component analysis, blink and noise components were manually selected and removed. The resulting data were converted to current source density (CSD; Kayser & Tenke, 2006) to increase spatial selectivity and minimize volume conduction. CSD acts as a spatial high-pass filter, significantly improves topographical localization of surface EEG (Srinivasan, Winter, Ding, & Nunez, 2007), and is appropriate for intersite phase synchronization analyses (Cavanagh et al., 2009). Nonetheless, it does not solve the inverse problem or allow us to unambiguously determine the origin of the topographical dynamics. After exclusion of nonlearned blocks and trials with blinks or artifacts, the number of trials (SD) per condition was 68.38 (19.24) for incorrect followed by incorrect trials, 82.04 (14.87) for incorrect followed by correct trials, 54.88 (10.08) for correct followed by incorrect trials, and 148.08 (32.78) for correct followed by correct trials. The smallest number of trials in any condition in any subject was 27.
To investigate the FRN, the data per epoch were response-locked (this also means feedback-locking, because the response-feedback delay was a fixed 500 msec). ERP values per condition from channel FCz were averaged over a 200–300 msec postfeedback window and entered into a 2 (Current success: correct, incorrect) × 2 (Next-trial success: correct, incorrect) repeated measures ANOVA. The time window selection was based on existing literature (Holroyd & Coles, 2002; Miltner et al., 1997) and visual inspection of FCz ERPs. Greenhouse–Geisser corrections were applied.
To assess the role of theta-band oscillations in learning, statistical analyses were performed on theta-band power (activation magnitude) and intertrial phase coherence (activation timing) at medial frontal (FCz), left lateral prefrontal (F5), and right lateral prefrontal (F6) electrodes. For both measures, we averaged the signal per condition in the theta band (4–8 Hz). Participant- and condition-specific peaks were detected in a 100–500 msec postfeedback window. This peak-finding procedure allows for individual-specific patterns of brain activity. Average peak values from a 100-msec window (±50 msec from the peak), and peak latencies were entered into 2 (Current success) × 2 (Next-trial success) repeated measures ANOVAs.
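The peak-finding procedure can be sketched as follows for a single participant and condition. The sketch assumes that the peak is the maximum of the band-averaged time course within the search window and that the ±50 msec averaging window may extend beyond the search window; the function name and argument layout are hypothetical.

```python
import numpy as np

def peak_window_average(times_ms, values, search=(100.0, 500.0), half_width=50.0):
    """Find the peak of a band-averaged time course within the search
    window and return (mean value within +/- half_width of the peak,
    peak latency in msec)."""
    times = np.asarray(times_ms, dtype=float)
    vals = np.asarray(values, dtype=float)
    # Indices of samples inside the postfeedback search window.
    idx = np.flatnonzero((times >= search[0]) & (times <= search[1]))
    peak_idx = idx[np.argmax(vals[idx])]
    peak_latency = times[peak_idx]
    # Average over the samples surrounding the detected peak.
    around = np.abs(times - peak_latency) <= half_width
    return vals[around].mean(), peak_latency
```

Averaging around an individually detected peak, rather than over a fixed window, keeps latency differences between participants or conditions from diluting the amplitude estimate.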
On the basis of visual inspection of the time–frequency plots of FCz, we selected a band of 18–24 Hz to assess the role of beta power. Because sharp peaks were not observed in this frequency band, based on visual inspection of average beta power we entered the averaged activity at 0–300 and 300–800 msec postfeedback windows into ANOVAs to investigate both immediate and sustained effects.
Because all responses were made with the right hand and topographical ERP maps (Figure 2) demonstrated a relatively posterior peak during the response, we selected electrode CP3 to represent response-related sensorimotor activity (for analyses with C3, see supplementary material). Theta-band intersite phase synchrony in a 100–500 msec postfeedback window was investigated between medial frontal and left lateral prefrontal (FCz–F5), medial frontal and right lateral prefrontal (FCz–F6), medial frontal and sensorimotor (FCz–CP3), left lateral prefrontal and sensorimotor (F5–CP3), and right lateral prefrontal and sensorimotor electrodes (F6–CP3). Intersite phase synchrony between CP3 and muscle activity was examined in a −100–300 msec periresponse window. Peaks were detected and statistics performed as described for theta power and intertrial phase coherence.
Behavioral Results: Learning
Postfeedback RT adjustments differed significantly with feedback content (F(1, 23) = 832.24, p < .001): A downward arrow led to an average RT increase of 360 msec (SD = 68), an upward arrow led to a decrease of 427 msec (SD = 83), and after positive feedback, there was a minor although significant increase of 22 msec (SD = 25; t(23) = 4.40, p < .001). Of the average of 55.9 blocks (SD = 6.1), more than half (M = 35.3, SD = 6.1) were learned blocks. The difference-score between RTs and target windows of the blocks (Figure 1C) demonstrated a main effect of Learning (F(1, 23) = 18.68, p < .001), a main effect of Bin (F(1, 23) = 586.97, p < .001), and an interaction between Learning and Bin (F(1, 23) = 178.58, p < .001). Whereas in the nonlearned blocks improvement was limited to the first bins and the target window was not learned consistently, in the learned blocks the difference between RTs and target windows approached zero over time (and thus, bins) and learning was achieved.
Topographical CSD maps and condition-specific line plots (Figure 3) confirmed the presence of an FRN between 200 and 300 msec postfeedback, signifying an effect of Current success (FCz: F(1, 23) = 78.29, p < .001). There was no significant effect of Next-trial success (F(1, 23) = 1.73, p = .200) and no significant interaction effect (F(1, 23) = 0.09, p = .772) in this window. In a later time window of 300–500 msec postfeedback, there remained a significant main effect of Current success (F(1, 23) = 88.36, p < .001), although the direction of the effect reversed. In the later time window, no effect of Next-trial success (F(1, 23) = 2.13, p = .158) or interaction effect (F(1, 23) = 2.49, p = .129) was found. Thus, time domain average ERPs did not predict learning.
Oscillations at Medial Frontal Scalp Sites
Learning from Negative Feedback and Activation Magnitude: Theta Power
FCz theta power increased following feedback in all conditions, particularly between 200 and 500 msec (Figure 4). This increase was larger for incorrect than correct trials (F(1, 23) = 39.34, p < .001; Figure 4A) and was larger for next-correct than next-incorrect trials (F(1, 23) = 5.53, p = .028; Figure 4C). A significant interaction effect indicated that the power increase predicted learning only on current incorrect trials (F(1, 23) = 15.76, p = .001). Follow-up t tests confirmed that the increase in theta power predicted next-trial accuracy during current incorrect trials (t(23) = −5.20, p < .001) but not during current correct trials (t(23) = 1.38, p = .180). The latency of the peak of the increase in theta power also differentiated between current incorrect and correct trials: Power peaked earlier for correct than incorrect trials (F(1, 23) = 9.59, p = .005; Figure 4C). There was no effect of Next-trial success on peak latency (F(1, 23) = 0.34, p = .567) and no interaction effect (F(1, 23) = 3.16, p = .089). Thus, postfeedback FCz theta power is stronger and peaks later for current incorrect compared with correct trials and predicts learning from negative feedback.
Learning from Positive Feedback and Beta Power
FCz beta power 0–300 msec postfeedback was significantly larger for next-correct trials than next-incorrect trials (F(1, 23) = 8.00, p = .010; Figure 5). There was no influence of Current success in this time window (F(1, 23) = 0.08, p = .780) and no interaction effect (F(1, 23) = 0.02, p = .881). However, beta power 300–800 msec postfeedback demonstrated effects of both current- and next-trial success: Beta power was larger for current correct than incorrect trials (F(1, 23) = 13.18, p = .001) and larger for next-correct than next-incorrect trials (F(1, 23) = 7.19, p = .013). Again no interaction effect was found (F(1, 23) = 0.02, p = .887). Thus, both early and sustained postfeedback beta power were predictive of next-trial accuracy, but current success only affected the sustained increase.
Activation Timing: Theta Intertrial Phase Coherence
Theta-band intertrial phase coherence at FCz was observed after feedback (Figure 6) and was larger on incorrect than correct trials (F(1, 23) = 12.72, p < .001; Figure 6A). There was a main effect of Next-trial success such that intertrial phase coherence was larger for next-correct than next-incorrect trials (F(1, 23) = 13.42, p = .001; Figure 6C), but no interaction with Current success (F(1, 23) = 0.23, p = .640) as observed for power. The latency of theta intertrial phase coherence differed between current correct and incorrect trials as well: The increase peaked later for incorrect than correct trials (F(1, 23) = 25.67, p < .001). The latency did not predict next-trial behavior (F(1, 23) = 0.61, p = .445), and the interaction between current and next-trial success was only at trend level (F(1, 23) = 2.98, p = .098). Thus, theta intertrial phase coherence was larger and peaked later for current incorrect than correct trials, and a larger increase in intertrial phase coherence predicted learning.
In summary, FCz theta-band oscillations reflected current valence and predicted behavioral adjustments based on negative feedback. Beta-band oscillations were more pronounced after positive feedback and predicted learning as well.
Oscillations at Lateral Prefrontal Scalp Sites
Feedback Processing and Prefrontal Theta Power
Theta power at electrodes F5 and F6 increased after feedback (Figure 7). Theta power at F6 (right side of the head) was significantly larger for incorrect than correct trials (F(1, 23) = 6.09, p = .021; Figure 7A) but did not predict next-trial performance (Next trial: F(1, 23) = 1.63, p = .214; Current trial × Next trial: F(1, 23) = 0.70, p = .413). The latency of the increase in theta power at F6 did not demonstrate significant main effects of Current success (F(1, 23) = 0.77, p = .390) or Next trial success (F(1, 23) = 0.69, p = .417) and no interaction effect (F(1, 23) = 2.14, p = .157). Although the same effect of Current success seems visible in theta power at F5 (left side of the head; Figure 7B), it did not significantly differ across conditions (Current trial: F(1, 23) = 0.91, p = .350; Next trial: F(1, 23) = 0.06, p = .817; Current trial × Next trial: F(1, 23) = 0.00, p = .988) or latency (Current trial: F(1, 23) = 0.22, p = .641; Next trial: F(1, 23) = 0.13, p = .725; Current trial × Next trial: F(1, 23) = 0.05, p = .833).
Differences in Prefrontal Theta Intertrial Phase Coherence
Theta-band intertrial phase coherence was observed at F5 and F6 after feedback (Figure 8). Intertrial phase coherence at F5 predicted next-trial accuracy (F(1, 23) = 6.17, p = .021; Figure 8B) such that intertrial phase coherence was higher on next-correct compared with next-incorrect trials. Intertrial phase coherence at F5 did not reflect current success (F(1, 23) = 2.18, p = .153; Figure 8A) and the interaction between current and next-trial success was only marginally significant (F(1, 23) = 3.20, p = .087). No main effects of Current success (F(1, 23) = 0.93, p = .346) or Next-trial success (F(1, 23) = 0.30, p = .588) on the latency of intertrial phase coherence at F5 were observed. There was a significant interaction effect between Current and Next-trial success (F(1, 23) = 6.54, p = .018).
Intertrial phase coherence at F6 reflected current success: Intertrial phase coherence was larger on incorrect than correct trials (F(1, 23) = 7.76, p = .011; Figure 8C). There was no significant relation with Next-trial behavior (F(1, 23) = 0.00, p = .972; Figure 8D) and no interaction effect (F(1, 23) = 2.43, p = .132). The latency of intertrial phase coherence at F6 demonstrated a marginally significant effect of Next-trial success (F(1, 23) = 3.97, p = .058) with a later peak for next-correct than next-incorrect trials. The latency did not reflect Current accuracy (F(1, 23) = 0.30, p = .590), and there was no interaction effect (F(1, 23) = 0.08, p = .778).
In summary, theta-band oscillatory activity at electrode F6 only responded strongly to current valence. Oscillations in the same band at F5 seemed more related to next-trial success, but the effects in this area were less coherent.
Communication between Areas: Theta-band Intersite Phase Synchrony
Increasing Control after Errors: Intersite Phase Synchrony
Theta-band intersite phase synchrony increased significantly for current incorrect compared with correct trials between FCz and F6 (F(1, 23) = 4.75, p = .040; Figure 9A), between F6 and CP3 (F(1, 23) = 13.18, p = .001; Figure 9B), and between FCz and CP3, although the latter effect was only marginally significant (F(1, 23) = 3.08, p = .093; Figure 9C). No significant differences in intersite phase synchronization were observed between FCz and F5 (F(1, 23) = 1.48, p = .236) or between F5 and CP3 (F(1, 23) = 0.16, p = .694). None of the connections between brain areas predicted behavior on the next trial, and there were no effects on the latency of the intersite phase synchrony. This overall pattern of results was similar when using C3 (see supplementary material). Thus, a larger increase in oscillatory intersite phase synchrony after current incorrect trials was seen only in a network of medial frontal, right lateral prefrontal and left sensorimotor sites.
A single cortical generator projecting to multiple scalp sites could inflate intersite phase synchrony between separate electrodes. If the synchrony dynamics between separate electrodes were caused by a single generator, the absolute difference in phase angle between the signals measured at the separate locations should be either zero (electrodes measure same side of the dipole) or π (electrodes measure opposite sides of the dipole). Therefore, we also computed the absolute phase angle differences between FCz and F6, between FCz and CP3, and between F6 and CP3 at the time and frequency of the participant- and electrode pair-specific peaks in theta-band intersite phase synchrony in a 100–500 msec postfeedback window. t Tests confirmed that the phase angle difference between FCz and F6 was not zero (t(23) = 19.49, p < .001) or π (t(23) = −4.63, p < .001), the phase angle difference between FCz and CP3 was not zero (t(23) = 7.45, p < .001) or π (t(23) = −6.28, p < .001), and the phase angle difference between F6 and CP3 was not zero (t(23) = 11.60, p < .001) or π (t(23) = −6.64, p < .001).
Moreover, we also investigated the peak frequencies of power at the separate electrodes and intersite phase synchrony between the electrodes. Peak frequencies over time in the theta range of power at FCz, power at F6, and intersite phase synchrony between FCz and F6 are presented in Figure 10. Visual inspection clearly demonstrates that the peak frequency of the intersite phase synchrony between the electrodes differs from the peak frequencies of the oscillatory power at the separate electrodes. The fact that the phase angles deviate from zero and π, the differences in peak frequencies between the separate sites and intersite phase synchrony, and the differences in learning-related effects between the separate sites and intersite phase synchrony, together strongly suggest that intersite phase synchrony is the result of functional connectivity between separate brain areas rather than a single generator.
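The phase angle check described above can be sketched as follows. The sketch assumes one phase value per participant and electrode at the synchrony peak, wraps the difference so that its absolute value lies between zero and π, and computes a plain one-sample t statistic against each reference value. The function names are hypothetical, and the study's exact test procedure may differ.

```python
import numpy as np

def abs_phase_difference(phase_a, phase_b):
    """Absolute phase angle difference in radians, wrapped to [0, pi]."""
    diff = np.asarray(phase_a) - np.asarray(phase_b)
    return np.abs(np.angle(np.exp(1j * diff)))

def one_sample_t(x, mu):
    """One-sample t statistic for the mean of x against the value mu."""
    x = np.asarray(x, dtype=float)
    return (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(x.size))
```

If the differences cluster well away from both reference values, the t statistic is positive against zero and negative against π, which is the pattern of signs reported for all three electrode pairs.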
Response Adjustment: Intersite Phase Synchrony between Brain and Muscle
During the response, theta intersite phase synchrony between CP3 and the right thumb muscle increased, and was stronger for correct trials than incorrect trials (F(1, 23) = 19.77, p < .001; Figure 11). When using C3 this effect was numerically in the same direction though not statistically significant (see supplementary material). There was no influence of Next-trial accuracy on the intersite phase synchrony between CP3 and muscle (F(1, 23) = 1.20, p = .285) and no interaction effect (F(1, 23) = 0.11, p = .740).
In summary, theta-band intersite phase synchrony between medial frontal, right lateral prefrontal, and sensorimotor sites was larger after current incorrect trials, whereas intersite phase synchrony between sensorimotor sites and muscle was larger during current correct trials.
Low-level Processing of Feedback in Visual Cortex
Positive feedback (face) and negative feedback (arrow) pictures differed in their visual features. We therefore analyzed the effect of current feedback on theta power, theta intertrial phase coherence and early (peak) beta power over low-level visual areas (time windows and peak/averaging procedures as described for frontal areas, electrodes Oz, O1, O2, PO7, PO8). Theta-band power (F(1, 23) = 6.39, p = .019) and intertrial phase coherence (F(1, 23) = 9.63, p = .005) were significantly larger after positive than negative feedback only at electrode O2. However, the direction of these effects was contrary to the effects of feedback valence on theta-band oscillations over frontal scalp locations. Moreover, because feedback signals in different modalities induce similar medial frontal ERPs (Miltner et al., 1997), differences in low-level feedback features do not seem to affect higher-level processing of the feedback content. Therefore, we ignored the effects of feedback differences on low-level visual processing.
In the current study, we examined the role of medial frontal and lateral prefrontal theta (4–8 Hz) and beta (18–24 Hz) band oscillations in feedback learning. We demonstrated that MFC oscillations were systematically related to the success of postfeedback action adjustments. As hypothesized, theta- and beta-band oscillations over medial frontal scalp sites, respectively, reflected current negative and positive feedback valence. Theta-band oscillations predicted learning from negative feedback, whereas beta-band oscillations predicted learning from positive and negative feedback. Moreover, negative feedback induced an increase in theta-band oscillations in a larger network consisting of right lateral prefrontal, sensorimotor, and medial frontal sites, suggesting that more brain regions are recruited when behavioral adjustments are required (see Figure 12 for schematic overview of results).
Theta-band Oscillations in MFC
Our finding that theta-band activity over medial frontal sites increased more after negative than after positive feedback is in line with previous reports on theta-band oscillations in this area. MFC theta-band oscillations are larger after errors than after correct responses in conflict tasks (Cohen, 2011; Cavanagh et al., 2009; Cohen et al., 2008; Trujillo & Allen, 2007; Luu et al., 2004) and reinforcement learning (Cavanagh et al., 2010) and after losses than after wins in gambling tasks (Marco-Pallares et al., 2008, 2009; Cohen et al., 2007). Increases in theta-band oscillations have also been proposed as the underlying mechanism of the error-related negativity (ERN; Trujillo & Allen, 2007; Luu et al., 2004), an ERP over MFC after an erroneous response that may share similar neurobiological features with the FRN (Holroyd & Coles, 2002). Thus, increases in theta-band activity seem to be an important mechanism for the signaling of undesirable outcomes and a possible need to adjust behavior.
In the current study, theta-band activity over medial frontal sites not only signaled whether an adjustment was needed but predicted the success of the behavioral adjustment as well. To our knowledge this is the first report of a direct relationship between medial frontal theta-band oscillations and learning success, providing a possible neural mechanism for the adjustment process. The likelihood of MFC being involved in the evaluation of outcome history and the modification of action values and selection has widely been recognized (Botvinick, 2007; Nieuwenhuis et al., 2004; Ridderinkhof et al., 2004; Gehring & Willoughby, 2002; Holroyd & Coles, 2002). The importance of MFC for action–outcome learning is supported by single-cell recordings in monkey MFC. For example, MFC neurons seem to represent both desired outcomes and the responses to acquire them (Luk & Wallis, 2009; Matsumoto, Suzuki, & Tanaka, 2003). Activity of MFC neurons also differentiates to-be-learned from already-learned responses (Quilodran, Rothe, & Procyk, 2008; Matsumoto, Matsumoto, Abe, & Tanaka, 2007). Finally, monkeys with MFC lesions are unable to take into account a longer history of actions and outcomes in reinforcement-guided action selection, despite intact immediate behavioral adaptation (Rudebeck et al., 2008; Kennerley et al., 2006).
Imaging studies in humans also support the notion that MFC is an important node in the reinforcement learning network. MFC activity represents the prediction error necessary to compute the appropriate adjustment (Chase et al., 2011; Jessup, Busemeyer, & Brown, 2010; Philiastides et al., 2010; Bellebaum & Daum, 2008; Holroyd & Coles, 2008; Behrens, Woolrich, Walton, & Rushworth, 2007; Cohen, 2007; Brown & Braver, 2005). Furthermore, the sizes of the ERN and FRN not only represent the prediction error but, in some cases, also predict subsequent behavioral change (Philiastides et al., 2010; Cohen & Ranganath, 2007; Yasuda et al., 2004). When feedback is used to correct an error, MFC hemodynamic activity is increased compared with when the error is not corrected (Hester et al., 2008), and individual differences in MFC activation after errors are correlated with learning success. Hence, studies in both monkeys and humans imply that MFC activity is directly relevant for the ability to use feedback for learning.
The only study that investigated the role of frontal theta-band oscillations in learning (Cavanagh et al., 2010) reported correlations of medial frontal theta-band oscillations with posterror slowing but not with behavioral accuracy. However, MFC activity has been related to response–outcome learning, whereas stimulus–outcome learning is commonly attributed to pFC or OFC (Rudebeck et al., 2008; Lee & Seo, 2007; Rushworth et al., 2007). In the current study, the response rather than the stimulus was informative and determined the outcome of the action. Moreover, the design of the current task allowed subjects to use the acquired information for immediate behavioral adjustment. Therefore, accuracy may have relied more on processes in MFC in the current task than in other (probabilistic) learning tasks such as the one used by Cavanagh and colleagues, where the stimulus contains reward information and learned stimulus–outcome associations have to be kept in mind for a longer period.
Influence of Theta-band Oscillations in Other Brain Areas and Motor Output
We also found larger increases in theta activity over lateral prefrontal scalp sites and in theta-band intersite phase synchrony between medial frontal, lateral prefrontal, and sensorimotor sites after negative compared with positive feedback. The increases in lateral prefrontal theta-band activity and in intersite phase synchrony between lateral prefrontal and medial frontal sites mirror the effects after errors that have been reported in a flankers task (Cavanagh et al., 2009) and a probabilistic learning task (Cavanagh et al., 2010) and seem to signal the need to increase cognitive control. Because we found no learning-predictive differences in intersite phase synchronization, this synchrony may be more important for signaling that behavior must be adjusted, whereas the actual adjustment of behavior may be carried out by other mechanisms.
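For readers less familiar with the measure, intersite phase synchrony is commonly quantified as the length of the trial-averaged unit vector of phase differences between two sites. The sketch below illustrates that general definition only; it assumes that phase angles (in radians) have already been extracted at one time-frequency point, and the function name and toy values are hypothetical rather than taken from the analysis pipeline of this study.

```python
import cmath
import math


def intersite_phase_synchrony(phases_a, phases_b):
    """Length of the mean unit vector of phase differences across trials.

    Returns a value between 0 (phase lag varies randomly over trials)
    and 1 (phase lag is identical on every trial).
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two equally long, non-empty phase lists")
    # One unit vector per trial, pointing at the phase difference a - b.
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))


# Toy data (hypothetical): phases at site A differ strongly across trials.
site_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Site B follows site A at a fixed lag of 0.5 rad on every trial.
constant_lag = [p - 0.5 for p in site_a]
# Site B lags by a different amount on every trial (evenly spread lags).
varying_lag = [p - k * math.pi / 3 for k, p in enumerate(site_a)]

print(intersite_phase_synchrony(site_a, constant_lag))
print(intersite_phase_synchrony(site_a, varying_lag))
```

In this toy case the fixed-lag pair yields synchrony close to 1 although neither site is phase-locked across trials, whereas the evenly spread lags yield a value close to 0. The measure therefore indexes the consistency of the phase relationship between sites, not the oscillation strength at either site.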
The increase in intersite phase synchrony between medial frontal and lateral prefrontal sites on the one hand and sensorimotor sites on the other hand has not been reported before. In the current paradigm the representation of the correct action needs to be established and updated to optimize behavior. Because the amount of intersite phase synchrony between medial frontal and sensorimotor sites did not predict learning whereas medial frontal theta activity did, we suggest that differences in theta oscillations in MFC (and maybe also subcortical areas) shape motor plans to improve selection on the next encounter. The increase in theta-band intersite phase synchrony with sensorimotor cortex may then be part of the communication of the adjusted action plan for the next trial (Cohen & Ranganath, 2007), but whether the adjustment is effective and successful depends critically on the input from MFC to the motor system.
Oscillatory communication between areas is more effective at specific phase relationships between areas when input arrives at an optimal phase of the receiving neurons (Gregoriou et al., 2009; Sauseng & Klimesch, 2008; Womelsdorf et al., 2007). Different patterns of MFC theta-band oscillations can represent different action–reward associations (Womelsdorf, Johnston, Vinck, & Everling, 2010) and theta-band intersite phase synchrony in networks of relevant brain areas changes flexibly with task demands (Mizuhara & Yamaguchi, 2007). The fact that not only medial frontal theta power but also intertrial phase coherence predicted learning suggests that coherence in the phase of the oscillations strengthens specific connections and determines the effectiveness of the output of MFC to other areas.
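Intertrial phase coherence can be illustrated in the same way: it is the length of the trial-averaged unit phase vector at a single site. The following minimal sketch again assumes precomputed phase angles in radians and uses hypothetical names and toy values; it illustrates the general definition and is not a description of the procedure used in this study.

```python
import cmath
import math


def intertrial_phase_coherence(phases):
    """Length of the mean unit phase vector across trials at one site.

    Returns a value between 0 (phases uniformly spread over trials)
    and 1 (identical phase on every trial).
    """
    if not phases:
        raise ValueError("need at least one trial")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Toy data (hypothetical): nearly identical phases versus evenly spread phases.
aligned = [0.10, 0.12, 0.09, 0.11]
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]

print(intertrial_phase_coherence(aligned))
print(intertrial_phase_coherence(spread))
```

Because the phase vectors are normalized to unit length, the value is determined by phase angles alone. This is why intertrial phase coherence can carry information that is not contained in power.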
Intersite phase synchrony between sensorimotor sites and thumb muscles increased more during a correct than an incorrect response. Interestingly, this difference in intersite phase synchrony also resided in the theta frequency band rather than the beta or gamma band. The fact that the differences were found in the theta band corroborates the idea that oscillations in this specific frequency band increase effective communication via a chain of phase-dependent synchronies between successive brain areas from pFC to the motor system and finally to the muscle.
Beta-band Oscillations and Learning from Positive Feedback
Positive feedback induced a larger increase in beta-band oscillations than negative feedback. This is in accordance with the increase in beta-band oscillations after positive outcomes that has been reported in gambling tasks (Marco-Pallares et al., 2008, 2009) and probabilistic reinforcement learning (Cohen et al., 2007). Because beta-band oscillations were sensitive to both the magnitude and the probability of gain, Marco-Pallares and colleagues (2008) suggest that oscillations in the beta range may be a neural marker of reward that originates from ventromedial OFC rather than MFC. A possible role of beta-band oscillations might be the synchronization of neural populations over long distances to couple frontal and striatal structures involved in reward processing.
In the current study, beta-band activity was not only responsive to positive feedback but functioned as a learning signal as well. In line with the idea of beta-band oscillations signaling the tendency to maintain “the status quo” (Engel & Fries, 2010), the increase in beta-band oscillations after positive feedback in our study can function as a mechanism to strengthen the current response set over other options, thereby influencing future behavior. This explanation seems even more plausible because the current task required subjects to make the same movement at the same time on the next trial, which may have engaged top–down endogenous control.
We found learning-related increases in both theta and beta frequency bands over medial frontal sites. We suggest that differences in theta-band oscillations underlie learning from errors and guide the adjustment of motor plans and their communication to other brain areas such as the motor system. The increase in control via the synchronization of oscillations in a chain of brain areas to an underlying theta rhythm seems to continue even into the response muscle. Increases in beta-band oscillations appear to signal the necessity of the continuation of the status quo in reward and motor networks after positive feedback.
This study was funded by a Vici grant from the Netherlands Organization for Scientific Research to K. R. R. M. X. C. was supported by a Human Frontiers Science Program Grant.
Reprint requests should be sent to Irene van de Vijver, Department of Psychology, University of Amsterdam, Roetersstraat 15, 1018 WB, Amsterdam, the Netherlands, or via e-mail: email@example.com.