The prefrontal cortex (PFC) plays a central role in our ability to learn arbitrary rules, such as “green means go.” Previous experiments from our laboratory have used conditional association learning to show that slow, gradual changes in PFC neural activity mirror monkeys' slow acquisition of associations. These previous experiments required monkeys to repeatedly reverse the cue–saccade associations, an ability known to be PFC-dependent. To test whether the relationship between PFC neural activity and behavior was due to the reversal requirement, we trained monkeys to learn several new conditional cue–saccade associations without reversing them. Learning-related changes in PFC activity now appeared earlier and more suddenly, in correspondence with similarly early and sudden behavioral improvement. This suggests that learning of conditional associations is linked to PFC activity regardless of whether reversals are required. However, when previous learning does not need to be suppressed, PFC acquires associations more rapidly.
The PFC is the brain area most central to higher order cognition and implicated in neuropsychiatric disorders (for reviews, see Stuss & Knight, 2002; Miller & Cohen, 2001). The foundation of this capacity is PFC's ability to direct the achievement of future goals through the learning of rule-based behavior (Miller, Nieder, Freedman, & Wallis, 2003; Miller & Cohen, 2001; Wallis, Anderson, & Miller, 2001; White & Wise, 1999), such as conditional learning (Petrides, 1985a, 1985b, 1986, 1990). Our laboratory has used conditional association learning paradigms to examine PFC neural activity while arbitrary rules were being acquired (Pasupathy & Miller, 2005; Asaad, Rainer, & Miller, 1998). Monkeys learned to associate each of two cue stimuli with the appropriate behavioral responses (Figure 1A). These stimulus-response associations were then repeatedly reversed after learning. We found that the activity of individual PFC neurons reflected the stimuli, the responses, or their association and that neural activity in PFC showed slow acquisition of the cue–response associations that was in accordance with slow improvements in behavioral performance (Pasupathy & Miller, 2005; Asaad et al., 1998).
In this investigation, we tested whether the slow changes in PFC activity and its correspondence with behavioral improvements might have been due to the use of behavioral reversals. PFC seems particularly critical for the cognitive flexibility needed to deal with reversals. It is involved in suppressing unwanted or inappropriate actions (Donohue, Wendelken, & Bunge, 2008; Xue, Aron, & Poldrack, 2008; Aron & Poldrack, 2006; Aron, Robbins, & Poldrack, 2004; Kelly et al., 2004; Perret, 1974) and is highly engaged during reversal learning (Ghahremani, Monterosso, Jentsch, Bilder, & Poldrack, 2010; Kehagia, Murray, & Robbins, 2010; Xue, Ghahremani, & Poldrack, 2008; Remijnse, Nielen, Uylings, & Veltman, 2005; Cools, Clark, Owen, & Robbins, 2002; O'Doherty, Kringelbach, Rolls, Hornak, & Andrews, 2001). Indeed, a classic observation of neuropsychiatric patients with frontal lobe damage is their ability to learn a single arbitrary rule but difficulty in switching to a new rule (e.g., Demakis, 2003; Milner, 1964). Thus, it is possible that our prior observation of slow changes in PFC activity correlated with slow behavioral improvement was due to the strong dependence of the task on PFC because of the reversals. Hence, we tested PFC activity during conditional association learning without reversals.
Monkeys performed a conditional association learning task without reversals (Figure 1A). Each trial began with the presentation of a fixation spot. Once monkeys fixated for 800 msec, a cue image was presented for 500 msec. The cue was then removed, and the monkeys were required to maintain fixation through a 1000-msec delay period. The fixation light was then extinguished, and two targets appeared on the right and left. Monkeys were required to make a direct saccade to the target associated with the cue object to obtain a reward. Each cue object was uniquely associated with a given response direction (e.g., Cue A → Right, Cue B → Left). After performance reached criterion (at least 30 correct trials of each cue and ≥90% correct over the last 10 trials per cue), a new block of trials was started (Figure 1B). Upon entering each new block, the old images were discarded (never shown again), and two new cue images were presented, again each associated with either the right or the left target (e.g., C → Right, D → Left). Thus, there was no proactive interference between cues—learning the new cues was not affected by the previous cues because they were not reversed and not in conflict.
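The block-advance criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' task-control code: the function name `reached_criterion`, the `(cue, was_correct)` trial format, and the reading of "last 10 trials per cue" as the last 10 presentations of each cue (correct or not) are all assumptions made for the example.

```python
from collections import defaultdict


def reached_criterion(trials, cues=("A", "B"), min_correct=30,
                      window=10, min_fraction=0.9):
    """Return True once every cue in `cues` satisfies the block criterion.

    `trials` is a list of (cue, was_correct) pairs in presentation order.
    The trial format and parameter names are hypothetical.
    """
    outcomes = defaultdict(list)
    for cue, was_correct in trials:
        outcomes[cue].append(bool(was_correct))

    for cue in cues:
        history = outcomes[cue]
        # At least `min_correct` correct trials for this cue so far.
        if sum(history) < min_correct:
            return False
        # At least `min_fraction` correct over this cue's last `window` trials
        # (assumed to mean the last `window` presentations of that cue).
        recent = history[-window:]
        if len(recent) < window or sum(recent) / window < min_fraction:
            return False
    return True
```

Under these assumptions, a block with 30 consecutive correct trials per cue meets the criterion, whereas the same block followed by two errors on one cue does not, because that cue then falls to 8 of its last 10 correct.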
Data were collected from two macaque monkeys (Macaca mulatta) that were cared for in accordance with National Institutes of Health guidelines and the policies of the MIT Committee on Animal Care. Recording wells were positioned over the dorsolateral PFC (Areas 9 and 46) on the basis of images obtained from structural MRIs. Eye movements were recorded using an infrared eye tracking system (Iscan, Burlington, MA) at a sampling rate of 240 Hz. Neural recordings were made using individual, epoxy-coated tungsten electrodes (FHC Inc., Bowdoin, ME). Up to 16 of these electrodes were lowered through the dura each day using custom-built screw microdrives. Electrodes were either driven independently or in pairs. No prescreening of neurons took place. This resulted in an unbiased sample of dorsolateral PFC neurons rather than simply those neurons that may be task related. Waveforms were amplified, digitized, and then stored for off-line sorting. Principal components analysis was subsequently used to sort the waveforms into individual neurons (Offline Sorter; Plexon Inc., Dallas, TX). All well-isolated neurons were accepted for study as long as a minimum of four blocks were completed while the neuron was recorded.
All data analysis procedures were similar to those of a previous study from our laboratory that used a similar conditional association learning paradigm, except with reversals (Pasupathy & Miller, 2005). One-way ANOVAs were computed over four time epochs throughout the trial. The “cue” epoch was analyzed from 100 to 600 msec after cue onset and represented the time when the cue image was present (adjusted for the visual delay to PFC). The “delay” epoch was analyzed for 900 msec starting 100 msec after cue offset and captured the period when no image was physically present on the screen, but the monkey was presumably remembering the cue and/or upcoming saccade direction associated with the cue image. The “saccade” epoch was analyzed for 300 msec centered on saccade onset. Finally, the “reward” epoch was 250 msec starting 50 msec after reward onset. These epochs were used only to select those neurons that showed “direction sensitivity,” that is, those neurons that showed a significant effect of saccade direction (p < .01). However, similar results were obtained when all neurons were used for analysis because the remaining analysis methods examined only the amount of neuronal variance accounted for by direction (see below).
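As an illustration of the four epoch definitions, the sketch below converts event times into analysis windows and counts spikes inside one of them. The function names `epoch_windows` and `spike_count`, the use of msec timestamps on a common clock, and the half-open interval convention are assumptions for the example. The 500-msec cue duration is taken from the task description.

```python
import numpy as np

CUE_DURATION_MS = 500  # cue presentation time stated in the task description


def epoch_windows(cue_on, saccade_on, reward_on):
    """Return (start, stop) times in msec for the four analysis epochs.

    All arguments are event times in msec on a common clock (assumed).
    """
    cue_off = cue_on + CUE_DURATION_MS
    return {
        # 100 to 600 msec after cue onset.
        "cue": (cue_on + 100, cue_on + 600),
        # 900 msec starting 100 msec after cue offset.
        "delay": (cue_off + 100, cue_off + 100 + 900),
        # 300 msec centered on saccade onset.
        "saccade": (saccade_on - 150, saccade_on + 150),
        # 250 msec starting 50 msec after reward onset.
        "reward": (reward_on + 50, reward_on + 50 + 250),
    }


def spike_count(spike_times, window):
    """Count spikes with start <= t < stop (half-open window, an assumption)."""
    start, stop = window
    spike_times = np.asarray(spike_times, dtype=float)
    return int(np.sum((spike_times >= start) & (spike_times < stop)))
```

With cue onset at 0 msec, the cue epoch spans 100 to 600 msec and the delay epoch spans 600 to 1500 msec, so the two windows abut without overlapping. The per-epoch counts would then feed the one-way ANOVA on saccade direction.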
Selection of neurons that were direction sensitive was based on the last 10 correct trials per association before changing blocks. Saccade direction selectivity strength ($R^2$ for the direction factor) was first quantified as the variance for the direction factor ($\sigma^2_{\mathrm{dir}}$) divided by the total variance ($\sigma^2_{\mathrm{tot}}$). This computation was repeated for each neuron across time and across trials by a double sliding window: A 100-msec centered window was slid in 10-msec steps over time (along the x-axis), and an eight-trial window was slid in one-trial steps over the first 30 correct trials per cue per trial block (along the y-axis, resulting in 23 correct trials shown in each figure). Results were collapsed across cues and trial blocks.
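The double sliding window can be sketched as follows for a single neuron and a single block. This is a minimal illustration under stated assumptions rather than the authors' analysis code. The direction effect is computed as the between-direction sum of squares over the total sum of squares. Each trial window is assumed to pool eight correct trials from each of the two cues, so that both saccade directions are represented. The function names and the spike-time input format are hypothetical.

```python
import numpy as np


def direction_r2(rates, directions):
    """Fraction of variance in `rates` explained by the direction factor."""
    rates = np.asarray(rates, dtype=float)
    directions = np.asarray(directions)
    grand_mean = rates.mean()
    ss_total = np.sum((rates - grand_mean) ** 2)
    if ss_total == 0:
        return 0.0  # no variance at all, so nothing to explain
    ss_direction = 0.0
    for d in np.unique(directions):
        group = rates[directions == d]
        ss_direction += group.size * (group.mean() - grand_mean) ** 2
    return float(ss_direction / ss_total)


def sliding_direction_r2(right_trials, left_trials, t_start, t_stop,
                         time_win=100, time_step=10, trial_win=8):
    """Slide a time window and a trial window over one neuron's spikes.

    `right_trials` and `left_trials` are lists of spike-time arrays (msec
    from cue onset), one per correct trial, in order, for the cue mapped to
    each direction. Returns (grid, centers), where grid has one row per
    trial-window position and one column per time-window center.
    """
    centers = np.arange(t_start, t_stop + time_step, time_step)
    n_rows = min(len(right_trials), len(left_trials)) - trial_win + 1
    labels = np.array([1] * trial_win + [0] * trial_win)
    grid = np.zeros((n_rows, centers.size))
    for i in range(n_rows):
        block = right_trials[i:i + trial_win] + left_trials[i:i + trial_win]
        for j, c in enumerate(centers):
            lo, hi = c - time_win / 2, c + time_win / 2
            counts = [np.sum((np.asarray(s) >= lo) & (np.asarray(s) < hi))
                      for s in block]
            grid[i, j] = direction_r2(counts, labels)
    return grid, centers
```

With 30 correct trials per cue and an eight-trial window, the grid has 30 − 8 + 1 = 23 rows, which matches the 23 correct trials shown in each figure.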
Saccade direction selectivity was then quantified as the percentage of each neuron's variance explained by saccade direction (PEVdir). PEVdir for the neural population was computed by averaging the $R^2$ values from each neuron and normalizing by subtracting the population mean $R^2$ during the baseline (fixation) period and then dividing by the population maximum. To examine the time course of direction selectivity with learning, a half-maximum PEVdir was computed on the basis of the highest PEVdir found during any window (any time in any trial) for the neural population. These values determined the rise time of learning. If neural activity on a given trial was too low to reach the half-maximum, the maximum activity on that trial was used instead. Rise times were fit with sigmoid curves.
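The normalization, half-maximum rise times, and sigmoid fit might be sketched as below. Several points are assumptions rather than the authors' method. The baseline window is passed in as a parameter, with a default spanning the 800-msec fixation period. The population maximum is taken after baseline subtraction. Because the exact sigmoid equation is not reproduced in the text, a generic logistic, offset + amplitude / (1 + exp(−slope · (x − midpoint))), is fit by a coarse grid search with a linear least-squares step. All function names are hypothetical.

```python
import numpy as np


def normalize_pev(r2, times, baseline=(-800, 0)):
    """Average R^2 over neurons, subtract baseline mean, divide by maximum.

    `r2` has shape (neurons, trials, time bins); `times` gives bin centers in
    msec from cue onset. The baseline window is an assumed parameter.
    """
    population = r2.mean(axis=0)
    in_baseline = (times >= baseline[0]) & (times < baseline[1])
    population = population - population[:, in_baseline].mean()
    return population / population.max()


def rise_times(pev, times):
    """Time at which each trial first reaches half of the global maximum.

    If a trial never reaches it, the time of that trial's own maximum is used.
    """
    half_max = 0.5 * pev.max()
    result = np.empty(pev.shape[0])
    for i, row in enumerate(pev):
        above = np.flatnonzero(row >= half_max)
        index = above[0] if above.size else int(np.argmax(row))
        result[i] = times[index]
    return result


def fit_sigmoid_grid(x, y, midpoints, slopes):
    """Fit an assumed generic logistic by grid search over midpoint and slope.

    For each candidate pair, offset and amplitude follow from ordinary least
    squares, because the model is linear in those two parameters.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for midpoint in midpoints:
        for slope in slopes:
            s = 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))
            design = np.column_stack([np.ones_like(s), s])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((design @ coef - y) ** 2))
            if best is None or sse < best[0]:
                best = (sse, midpoint, slope, coef[0], coef[1])
    _, midpoint, slope, offset, amplitude = best
    return {"midpoint": float(midpoint), "slope": float(slope),
            "offset": float(offset), "amplitude": float(amplitude)}
```

A negative fitted amplitude corresponds to rise times that shorten with learning. In practice, a nonlinear least-squares routine would likely replace the grid search; the grid is used here only to keep the sketch dependency-free.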
Both monkeys were familiar with the conditional association learning task (Figure 1A). On each new block of trials, two novel cue stimuli (never before seen) were used (Figure 1B). Monkeys had to learn by trial and error which of the two saccade alternatives (left or right) was associated with which of the two cues. Figure 2A shows the monkeys' percentage of correct responses averaged across both cues on all recorded blocks. Thus, for example, Trial 3 on the y-axis refers to the average percentage correct the third time the monkeys saw Cue A and the third time they saw Cue B. Behavioral performance at the beginning of a block was not significantly above chance levels (p = .99), but performance quickly jumped above chance (from about 50% to approximately 60%) by the second time the monkey saw each cue (p < .001) and then gradually improved as the monkeys learned the cue–response associations (Figure 2A). Figure 2B plots the average behavioral reaction time (RT) as a function of correct trials averaged for each cue and block. RTs decreased most sharply over the first four correct trials of each cue but then showed only gradual improvement as the block progressed (Figure 2B). Figure 2B plots correct trials only (RT on incorrect trials is not relevant because many of those trials reflected random guesses, not true choices). Thus, the sharp decrease in RT on the fourth correct trial in Figure 2B corresponds to a later trial in the all-trial plot in Figure 2A (on average, Trial 7 of the all-trial plot in Figure 2A). The decrease in RT therefore lagged behind the monkeys' improvement in percentage correct. This makes sense if the monkeys' RT improved only once they were confident they were making the correct choice.
We examined activity of neurons in the dorsolateral PFC during learning; 192 neurons were recorded across 21 sessions from two monkeys (96 from monkey P and 96 from monkey A). There was an average of 74 (SD = 20) correct trials per block (minimum = 60 based on criterion, see Methods), and monkeys completed between four and nine blocks per recording session (mean = 6 blocks). We identified PFC neurons that showed selectivity for the direction of the forthcoming saccade by performing a one-way ANOVA on the average activity across each of the four analysis epochs (cue, delay, saccade, and reward). This revealed that 37% (71/192) of all PFC neurons were saccade direction selective (i.e., showed a significant difference in activity between trials requiring right vs. left saccades) in at least one of the four epochs (all tests evaluated at p < .01). Previous studies from our laboratory using the identical task (but with reversals) have shown that PFC neurons show a learning-related increase in early trial activity (during and just after the cue) that reflected the forthcoming saccade direction (Pasupathy & Miller, 2005; Asaad et al., 1998). As in these previous studies, we assessed learning-related changes for the saccade direction-selective neurons by using the percentage of variance explained by saccade direction (PEVdir, see Methods) as the measure of neural information.
Figure 3A shows the average population saccade direction information as a function of time within a trial and as a function of correctly performed trials of each cue within a block (averaging across all cues and blocks). Thus, the y-axis in Figure 3A is in correspondence with the RT plot in Figure 2B. By contrast, a given (correct) trial on Figure 3A corresponds to a later trial in the performance level plot in Figure 2A, which uses all trials. Early in learning (correct Trials 1–4), there is little information about the saccade the monkey will (correctly) make at the end of the delay. However, starting at Trial 5, there is a relatively sudden increase in information about the forthcoming saccade direction that continues to gradually increase in strength with learning. As in previous studies, we quantified this by plotting the time at which the average amount of information about saccade direction reached its half maximum on each trial (black circles, Figure 3A and B). On correct Trials 1–4, saccade information was weak, and the half maximum was not reached until an average of 663 msec after cue onset. However, on correct Trial 5, the relatively sharp increase in PEVdir caused the half maximum to be reached sooner after cue onset, where it remained through the course of learning. Note that this corresponds with the decrease in monkeys' RT, which showed the sharpest decrease over correct Trials 1–4 and leveled off by Trial 5. A sigmoid fit to the rise times again indicated two relatively stable phases of activity with a sharp increase after several learning trials (Figure 3B). Although this time to half maximum was eventually stable (starting with correct Trial 5) at an average of 261 msec after cue onset, the overall PEVdir continued strengthening as learning progressed. For example, the mean PEVdir from the last half of the cue period is plotted in Figure 3C.
Note how there is a sharp increase between correct Trials 3 and 5, corresponding to when behavioral RT is sharply decreasing, followed by continued gradual improvement throughout learning. Thus, this sudden change in PFC activity corresponds well with this RT decrease. Indeed, this mean PEVdir was significantly correlated with the decrease in the monkeys' RTs (r = −.93, p < .001). Note, however, that the rapid increase in neural rise times (Figure 3B) and the mean PEVdir (Figure 3C) occur well after the largest jump in the monkeys' percentage of correct performance, which occurred on the second absolute, not correct, trial (Figure 2A). Thus, the sharp change in PFC neural activity corresponds better with the sharp change in the monkeys' RT than with their earlier sharp jump above chance performance.
We recorded neural activity from the dorsolateral PFC while monkeys performed an association learning task that required them to repeatedly learn to associate new visual cues with a rightward or leftward saccade. PFC neurons showed a relatively strong increase in information about the forthcoming saccade after a few correct trials. This was similar to that found in the basal ganglia (BG) during association learning with reversals and in contrast to that found in PFC during reversals, which was much slower and more gradual; it only became relatively strong after 15 correct trials (Pasupathy & Miller, 2005). In previous work with reversals, behavior improved gradually in correspondence with the gradual changes in PFC activity. By contrast, in this study without behavioral reversals, a sharp increase in PFC saccade direction activity with learning occurred much earlier, by the fifth correct trial. However, PFC activity still corresponded with the monkeys' behavior (RTs), which also showed a quick jump in performance followed by gradual improvement.
Association learning with reversals means that during a new trial block, cue–response associations are switched. Therefore, the same cue stimuli are present but are associated with the opposite saccade direction. When monkeys perform this task, their performance drops to near 0% upon a reversal (entry of a new block) because they continue to perform the old associations; performance then jumps back to chance before gradually improving (Pasupathy & Miller, 2005). This reversal means that, in addition to learning the new cue–response associations, monkeys must inhibit the old learned associations that are no longer correct. Therefore, with reversals there is “proactive interference” between the former associations (no longer relevant) and the new associations (currently relevant).
Inhibition of previously relevant behavior and resolution of conflict because of proactive interference is thought to be a major function of PFC (Burke, Takahashi, Correll, Leon Brown, & Schoenbaum, 2009; Aron, Behrens, Smith, Frank, & Poldrack, 2007; Badre & Wagner, 2004, 2005, 2006; Aron et al., 2004; Asahi, Okamoto, Okada, Yamawaki, & Yokota, 2004; Dalley, Theobald, Eagle, Passetti, & Robbins, 2002). Inhibitory control can be lost in neuropsychiatric patients or monkeys with frontal damage or dysfunction (Caycedo, Miller, Kramer, & Rascovsky, 2009; Krueger et al., 2009; Aron & Poldrack, 2005; Meunier, Bachevalier, & Mishkin, 1997; Iversen & Mishkin, 1970). Neural correlates of associative learning without reversals are seen in many brain areas (such as the medial-temporal lobe and BG), and damage to any of these areas can disrupt learning (Bedard & Sanes, 2009; Cohn, McAndrews, & Moscovitch, 2009; Braun et al., 2008; Finke et al., 2008; Aosaki et al., 1994; Murray, Gaffan, & Mishkin, 1993). This is in contrast to reversal learning, which is more dependent on PFC per se. Therefore, in this study, we tested whether simpler forms of associative learning (i.e., without reversals) were also reflected in PFC. We found that they were, that PFC activity was still tightly linked to behavior, and that both neural activity and behavior improved more rapidly than with behavioral reversals.
The authors thank A. Pasupathy, E. G. Antzoulatos, S. L. Brincat, T. J. Buschman, J. R. Cromer, C. Diogo, K. MacCully, L. N. Nation, D. Ouellette, M. V. Puig, J. E. Roy, M. Siegel, and M. Wicherski. This work was supported by NINDS grant 5R29NS035145-04.
Reprint requests should be sent to Earl K. Miller, The Picower Institute for Learning and Memory, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, or via e-mail: firstname.lastname@example.org.
These authors contributed equally to this work.