Abstract

Offering reward during encoding typically leads to better memory [Adcock, R. A., Thangavel, A., Whitfield-Gabrieli, S., Knutson, B., & Gabrieli, J. D. E. Reward-motivated learning: Mesolimbic activation precedes memory formation. Neuron, 50, 507–517, 2006]. Whether such a memory benefit persists when tested in a different task context remains, however, largely understudied [Wimmer, G. E., & Buechel, C. Reactivation of reward-related patterns from single past episodes supports memory-based decision making. Journal of Neuroscience, 36, 2868–2880, 2016]. Here, we ask whether reward at encoding leads to a generalized advantage across learning episodes, a question of high relevance for everyday applications, from education to patient rehabilitation. Although we confirmed that offering monetary reward increased responses in the ventral striatum and pleasantness judgments for pictures used as stimuli, this immediate beneficial effect of reward did not carry over to a subsequent and different picture–location association memory task during which no reward was delivered. If anything, a trend for impaired memory accuracy was observed for the initially high-rewarded pictures as compared to low-rewarded ones. In line with this behavioral trend, fMRI activity in reward (i.e., ventral striatum) and memory (i.e., hippocampus) circuits was reduced during the encoding of new associations involving previously highly rewarded pictures (compared to low-reward pictures). These neural effects extended to new pictures from the same, previously highly rewarded semantic category. Twenty-four hours later, delayed recall of associations involving originally highly rewarded items was accompanied by decreased functional connectivity between the hippocampus and two brain regions implicated in value-based learning, the ventral striatum and the ventromedial PFC. We conclude that acquired reward value elicits a downward value-adjustment signal in the human reward circuit when reactivated in a novel nonrewarded context, with a parallel disengagement of memory–reward (hippocampal–striatal) networks, likely to undermine new associative learning. Although reward is known to promote learning, here we show how it may subsequently hinder hippocampal and striatal responses during new associative memory formation.

INTRODUCTION

Reward is a powerful tool to guide learning. It is usually the case that immediate reward drives various forms of learning (Kringelbach & Berridge, 2016; Seitz, Kim, & Watanabe, 2009; Singer & Frank, 2009; Bouton, 2007), including improving relearning after forgetting (Miendlarzewska, Ciucci, Cannistraci, Bavelier, & Schwartz, 2018). These effects of reward on learning are mediated by interactions between reward and memory networks (Bartra, McGuire, & Kable, 2013; Jocham, Klein, & Ullsperger, 2011; Lisman & Grace, 2005). Reward may in turn become associated with specific contexts in which it was delivered (Loh et al., 2016; Rigoli, Friston, & Dolan, 2016; Palminteri, Khamassi, Joffily, & Coricelli, 2015; Nakahara, Itoh, Kawagoe, Takikawa, & Hikosaka, 2004). In particular, midbrain dopamine neurons can represent context-dependent prediction error (Nakahara et al., 2004), whereas the representation of value in the human ventral striatum and medial/orbital PFC integrates value-relevant information (Bartra et al., 2013) from memory (the hippocampus), current emotional state (the amygdala), and cognitive goals (PFC; Samanez-Larkin & Knutson, 2015; Haber & Knutson, 2010). Whether such context dependency of reward representation could impair, rather than promote, subsequent learning remains unclear (Wimmer, Braun, Daw, & Shohamy, 2014; Wimmer & Shohamy, 2012).

On the one hand, evoking the memory of a rewarding episode is usually associated with positive feelings and may restore a state of reward motivation. For example, effects of extrinsic incentives have been shown to “spill over” and lead to various forms of associative generalization (Miendlarzewska, Bavelier, & Schwartz, 2016), including increased response speed and vigor as in Pavlovian-to-instrumental transfer. This and related processes of memory reactivation have been documented to engage the hippocampus and the surrounding cortices in the medial temporal lobe together with the dopaminergic reward circuit, in particular the ventral striatum and the substantia nigra and ventral tegmental area (SN/VTA; Dudai, Karni, & Born, 2015; Cohen et al., 2014; Dudai, 2012; Wimmer & Shohamy, 2012). Retrieving a rewarding memory episode may also yield a preference for an object explicitly or implicitly associated with that episode (Hütter, Kutzner, & Fiedler, 2013; De Houwer, Thomas, & Baeyens, 2001). In addition, according to the Penumbra hypothesis (Lisman, Grace, & Duzel, 2011), the presence of dopamine at the hippocampal synapses, which may be triggered by recalling a rewarded memory, can enhance new memory formation (Atherton, Dupret, & Mellor, 2015; Thomas, 2015; Redondo & Morris, 2011; Wittmann et al., 2005). In this view, past reward should facilitate learning of new information.

On the other hand, presenting a stimulus for which reward was expected but is no longer offered may induce disappointment. Whereas such a shift in motivation may be particularly detrimental in an educational context, most studies reporting undermining effects of changing reward contexts or contingencies have not involved learning tasks (Ma, Jin, Meng, & Shen, 2014; Chib et al., 2012; Murayama, Matsumoto, Izuma, & Matsumoto, 2010). Similarly, when value-associated information appears in a novel context, participants tend to make economically suboptimal choices because of the lingering of the remembered value: When switched to a new context, participants' choices reflect option values with reference to the previously available alternative options and not in line with objective reward probabilities (Klein, Ullsperger, & Jocham, 2017).

Here, we used fMRI to probe the effects of reward conditioning on subsequent nonrewarded learning of object–location associations (a task known to engage the hippocampus; Bridge & Voss, 2014; Manelis, Reder, & Hanson, 2012; Takashima et al., 2009; Sommer, Rose, Gläscher, Wolbers, & Büchel, 2005). We hypothesized that initially rewarded stimuli trigger a reevaluation process when presented in a different task context in the absence of reward, which would hinder, rather than facilitate, the formation of new associations with the reward-related stimuli. We tested this hypothesis by examining the effects of reward conditioning on the early phase of a subsequent nonrewarded associative learning task, that is, at a time when reward was most likely to still exert lingering effects. We observed that, in such a new, nonrewarded task context after reward conditioning, the ventral striatum and hippocampus were relatively deactivated at encoding of associations with previously highly (vs. low-) rewarded stimuli, and that memory recall of those associations was poorer when tested 24 hr later. Because striatal regions display the characteristics of a prediction error that is used to update the relative value of options, a distinctive feature of this study was to assess whether such prediction error signaling would impair associative learning in a new context in which reward was no longer offered. Moreover, because memory retrieval typically triggers not only the reactivation of specific information about the stimulus but also about similar stimuli in memory (e.g., Horner, Bisby, Bush, Lin, & Burgess, 2015), we also tested whether reward-biased learning would affect stimuli that were semantically related to those previously paired with reward value.

In a two-step procedure, we first conditioned pictures from two distinct semantic categories with two levels of reward (high and low, respectively). In a subsequent object–location associative learning task, we tested how participants learned to associate these pictures to locations in the absence of reward. In this associative learning task, we also included a set of related but new pictures to simultaneously test whether any delayed effect of reward conditioning might transfer to nonconditioned but semantically related pictures.

METHODS

Participant Details

Twenty-five participants took part in the experiment. Data from five participants were not used because of technical problems during acquisition (n = 2), excessive motion (total displacement > 3 mm; n = 1), and noncompliance with the learning task (n = 2). Thus, data from 20 participants were included in the analyses (13 women; age: mean = 24.35 years, SD = 3.71, range = 19–34 years). None of the participants reported a history of neurological, psychiatric, or medical disorders or any current medical problems, and all had normal or corrected-to-normal visual acuity. In addition, all had normal-range scores on the French versions of the Beck Depression Inventory (Beck, Steer, & Brown, 1996; group mean score = 5.5, SD = 4.3) and of state anxiety measured by the State-Trait Anxiety Inventory (Spielberger, 1983; group mean score = 36.9, SD = 7.5). All participants were students of the University of Geneva recruited by advertisements and provided written informed consent for participation. The study protocol was approved by the Ethics Committee of the Geneva University Hospitals, which abides by Helsinki principles.

Method Details

Stimuli

Stimuli used were photographs obtained from an Internet search engine and belonging to two broad semantic categories: the sea and the savanna. The pictures were selected from a large picture dataset (n = 150) based on ratings performed by an independent group of 10 participants. Ratings were performed on five different 5-point Likert scales assessing emotional valence, arousal, familiarity, and also how interesting the content and visual composition of the pictures were.

The conditioning task used 80 pictures (40 from the sea category and 40 from the savanna category). Twenty of these pictures were used for a pleasantness rating task, 36 for the associative learning task (see below), and the remaining 24 served as fillers to further strengthen category-specific conditioning. Another 36 “new” pictures (18 from the sea and 18 from the savanna category) were not presented during conditioning but only used during the associative learning task, to test transfer of conditioning effects. Thus, the stimuli used in the associative learning task formed four separate lists of 18 photos (two lists for each semantic category), each containing the same number of exemplars from the following subcategories: single animal, multiple animals, vehicles, landscapes, human activity, and objects.

All pictures were high-resolution photographs scaled to 512 × 512 pixels. The mean luminosity was equalized to the overall mean using an in-house MATLAB script. Apparent contrast was calculated by dividing the standard deviation of luminance values by the mean luminance of each filtered picture. Luminance values were obtained using ImageJ (Schneider, Rasband, & Eliceiri, 2012). An ANOVA performed on these metrics demonstrated that apparent contrast did not differ significantly between the lists. Spatial frequencies of all pictures were analyzed in eight different bands using the discrete wavelet transform (Delplanque, N'diaye, Scherer, & Grandjean, 2007). Using MANOVA, we determined that the prepared lists did not differ in terms of spatial frequencies. Pictures were presented on a 1024 × 1280 screen and viewed by the participants via a mirror mounted on the head coil.
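For illustration, the apparent-contrast metric described above can be sketched in MATLAB as follows (the original luminance values were obtained with ImageJ, and the in-house equalization script is not reproduced here; the file name is a placeholder):

    % Minimal sketch of the apparent-contrast metric (SD of luminance / mean luminance).
    img = im2double(rgb2gray(imread('sea_001.jpg')));  % placeholder file; grayscale luminance in [0, 1]
    lum = img(:);                                      % vectorize pixel luminance values
    apparentContrast = std(lum) / mean(lum);           % metric used to compare stimulus lists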

Experimental Design

The experiment consisted of two fMRI scanning sessions performed 24 hr apart. The first session was preceded by an instructional training session and comprised two successive tasks: a reward conditioning task followed by a nonrewarded object–location learning task (three cycles). The second session comprised a delayed recall test for locations of pictures learned on the previous day (Figure 1A) as well as an anatomical scan and a proton-density scan.

Figure 1. 

Experimental design. (A) The experiment comprised three tasks: reward conditioning, followed by a nonrewarded associative learning task and delayed recall. The learning task was composed of three learning cycles, each with two encoding and recall runs for the picture–location associations, for a total of 72 pictures (36 per run). Finally, 24 hr later, the participants performed a delayed recall test for the learned picture–location associations. (B) Example of a trial in reward conditioning. The task was to categorize the picture as semantically related to the sea or the savanna. Across trials, one category was consistently rewarded with 10 points (0.50 CHF, HR) and the other with 1 point (0.05 CHF, LR) for a correct response. (C) Trials in the picture–location association learning task. In an encoding trial, a picture appeared in the middle of the screen and moved toward one of six locations on the screen. The participants' task was to memorize the position of each picture. In a recall trial, the participants indicated the remembered location for the picture with one of six buttons on the response pads held in both hands. (D) On every trial of the delayed recall test, the participants were additionally asked to indicate their response confidence (0 = guessing, 3 = confident) and answer a source memory (temporal context) question (“Which run?” 1 | 2 | I don't know).

Reward conditioning task.

The first task was a reward conditioning procedure with 80 unique trials in which participants gained points for correctly assigning a picture to “sea” or “savanna” (40 trial-unique pictures per category) by a button press (Figure 1B). One picture category was associated with high potential reward (HR = 10 points), whereas the other yielded low reward (LR = 1 point). The assignment of a semantic category to a given reward level was counterbalanced across participants. The correspondence of the buttons (left/right and sea/savanna) stayed constant throughout the task for a given participant and was counterbalanced across participants and across reward-category assignments. Participants were informed that points would be converted into real money added to their monetary compensation (10 points = 0.5 CHF) and that the maximum amount they could win in the task was 440 points (22 CHF).

Each trial of the reward conditioning task began with a fixation cross, followed by a cue (a color photograph from either the sea or the savanna semantic category) presented at the center of the screen for 1.5 sec (Figure 1B). After a variable time interval (mean = 2.5 sec, min = 1.5 sec, max = 3.5 sec), a response screen appeared for 1.5 sec during which participants categorized the preceding picture by pressing, with their right hand, the left or right button to select one of the semantic categories written on the left or right part of the screen. Next, a feedback display was presented indicating whether the cue yielded an HR (10 points) or an LR (1 point). The HR feedback was a smiling piggy bank with animated golden coins falling into it; the LR feedback was a sad-looking piggy bank with one silver coin falling into it. Participants were told that successful categorization of a picture in one of the categories (either sea or savanna) would always be associated with an HR whereas the other would always yield an LR, that no reward would be given for incorrect responses, and that this reward scheme would not change during the task. Participants collected points if they responded correctly while the response screen was on, but not if they responded incorrectly, or too early or too late, in which cases feedback was provided (“wrong button,” “too early,” or “too late”). The conditioning task lasted about 18 min. No more than four trials of the same category appeared in a row. Intermediate feedback with the accumulated points appeared four times during the task (after every 20 trials) with the message “You won xx points. Try to win some more!” The final score, converted into CHF, appeared at the end of the task.

To obtain a behavioral index of conditioning strength, participants rated the pleasantness for a subset of 20 pictures (10 from each of the semantic categories) presented one at a time. They moved the cursor on a horizontal scale from “unpleasant” (left) to “pleasant” (right) using the button box. This measure was collected twice: before and after the conditioning task. The distance of the final placing of the cursor from the center of the screen was used as a dependent measure in the computation of a behavioral conditioning index for the HR and LR pictures separately (i.e., average rating after minus before conditioning). This part of the task was not scanned, and time for response was unlimited. These 20 pictures were not used in the subsequent associative learning task. Because of extreme pleasantness scores (>2.5 SDs below the group mean), data from one participant were excluded from the pleasantness analysis (but retained for all other data analyses).
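As an illustration, the behavioral conditioning index described above could be computed as in the following MATLAB sketch (variable names are assumed; this is not the original analysis script):

    % Behavioral conditioning index: mean post- minus pre-conditioning pleasantness.
    % prePleas and postPleas are nPictures x 1 vectors of cursor distances from the
    % screen center (pleasant = positive); isHR flags the high-reward pictures.
    condIndexHR = mean(postPleas(isHR))  - mean(prePleas(isHR));   % HR pictures
    condIndexLR = mean(postPleas(~isHR)) - mean(prePleas(~isHR));  % LR pictures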

Nonrewarded object–location associative learning task.

The second task consisted of three cycles of object–location association learning, each composed of successive encoding and recall runs. Each cycle contained two such encoding–recall run pairs of 36 trials each, so that all 72 unique picture–location associations were presented once per cycle. Because our focus was on the portion of the task in which the effects of conditioning were still lingering, our analysis was performed on the first cycle of the task only.

There were four conditions: 18 pictures that had been HR conditioned in the first task (ConHR), 18 that had been LR conditioned (ConLR), and 18 new pictures from each semantic category that formed the transfer conditions (transfer high-reward [TrHR] and transfer low-reward [TrLR]). The pictures were presented in a semi-randomized order within an encoding or recall run. In rapid event-related fMRI designs, activation of the reward system in the HR condition could spill over to the next low-rewarded trial. Therefore, to isolate and potentiate the likelihood of seeing a differential effect of the HR and LR conditions on memory, HR and LR trials were blocked. Pictures from one condition were grouped into mini-blocks of nine consecutive trials, and mini-blocks were distributed within each cycle such that the order of pictures in a mini-block was randomized from cycle to cycle and between encoding and testing. Such a structure—single presentations of the individual stimuli in a cycle and randomization of the stimuli within a mini-block—allowed us to benefit fully from the advantages of blocked presentations (i.e., potentiating the differential effect of the levels of reward) while preventing possible adaptation effects over the course of a mini-block.

One cycle was composed of four fMRI runs (two for encoding and two for recall; Figure 1A), and each fMRI run contained one mini-block of each of the four conditions, for a total of 36 pictures per run. This temporal separation was necessary because it is difficult to memorize more than 36 associations.

Locations were assigned to the pictures such that, for a given condition, three pictures were associated with one location, evenly distributed across the two blocks. Care was taken that the subcategories (e.g., animals, human activity, vehicles) were shuffled across the possible screen locations so as to prevent the participants from inadvertently relying on location–subcategory patterns. At each encoding/recall run, the order of pictures within the blocks changed. The condition order within the blocks (e.g., ConHR, TrLR, ConLR, TrHR) changed across the runs and cycles. However, at both immediate and delayed recall, blocks within a run appeared in the same order as during the corresponding encoding run. That way, we ensured that a similar amount of time had passed from encoding to recall of each association.

Participants' task was to observe and memorize the placement of each picture on the screen (one of six fixed positions) during the encoding phase and to indicate the remembered position with one of six response buttons during the recall phase (Figure 1C). Participants were explicitly asked to give a response on every trial, even if unsure. We suggested that participants form stories as an encoding strategy (such as “people on the beach went North-East”); however, no formal control over strategies was applied. They were also told that the object–location assignment was random and that, although the order of trials differed between the cycles, each picture–location association was unique and remained the same throughout the task.

Participants were trained outside the scanner to learn to associate the screen locations (six dots) with buttons of the response boxes they held in both hands on a version of the task with a separate set of black-and-white drawings (not used in the main task). Once in the scanner, the same training version of the task was repeated to facilitate the visuomotor mapping of the picture locations onto the motor response in a supine position.

The tasks were programmed using Cogent toolbox (Cogent 2000, v.1.32, www.vislab.ucl.ac.uk/cogent_2000) implemented in MATLAB v7.9 (R2009b, The MathWorks, Inc.).

Object–location delayed recall.

Twenty-four hours after the first session, participants came back to the laboratory to perform a delayed memory test that lasted about 25 min (Figure 1A). Each trial required three successive responses: an object–location decision, a confidence rating for that response, and a source judgment (Figure 1D). For the source memory question, we used the temporal separation between the first and second runs of 36 pictures in the learning on the day before, to test the participants' memory of temporal context in which an association was presented. At delayed recall, all trials were shuffled randomly.

At the beginning of each trial, the picture was displayed at the center of the screen with six white dots indicating the possible locations. For the first object–location response, as during the first session, participants had 1 sec to deliberate and 2 sec to respond by selecting a location using one of the six buttons. The screen subsequently changed to display “How confident are you?” with four response options: “Confident = 3 | Rather certain = 2 | Somewhat sure = 1 | Guessing = 0.” Participants had 3 sec to respond using their right-hand button box. Finally, the screen changed to display the source memory question “Which run of the study phase?” with three response options: “First | Second | Don't know” for 5 sec (Figure 1D). The “don't know” source option was offered to reduce potential contamination by guessing on the source decision, as has been implemented in similar studies (e.g., Duarte, Henson, Knight, Emery, & Graham, 2010). Each trial lasted on average 11.5 sec. Confidence responses were used as a covariate in the analyses of the location responses (see the fMRI Data Analysis section below). Source memory responses are not reported in detail here. On average, about 2% of trials were excluded from analysis because of a lack of a timely response on the location question.

All tasks were conducted inside the MRI scanner with continuous MRI data acquisition throughout the tasks, except for the pleasantness rating, which was not scanned. Once in the MRI scanner, participants were given noise-dampening earplugs and headphones as well as four-button MRI-compatible response boxes (Current Designs Inc.) in their right and left hand.

At the end of the procedure, participants were debriefed and asked for their personal preference about the semantic categories and for the category associated with HR in the conditioning task. All 20 participants correctly remembered the HR category, and 14 reported having no general preference between the sea and the savanna.

Psychometric Questionnaires

After the termination of the experiment, participants filled out the Behavioral Inhibition and Activation Scale (Carver & White, 1994). The three Behavioral Activation Scale (BAS) subscales include items related to the pursuit of appetitive goals (BAS drive), the inclination to seek out new rewarding situations (BAS fun seeking), and positive affect/excitability (BAS reward responsiveness [RR]; e.g., “When good things happen to me, it affects me strongly”). Because the subscale RR of the BAS was found to correlate with the connectivity between striatal and sensory regions (DelDonno et al., 2017) and with shorter RTs in conditions of HR motivation (Chaillou, Giersch, Hoonakker, Capa, & Bonnefond, 2017), we included it as a covariate in our analyses.

MRI Data Acquisition Parameters

A 3-T whole-body MRI scanner (TIM Trio) with the product 32-channel head coil was used in the experiment. Earplugs were used to attenuate scanner noise, and head movement was restricted using memory foam pillows. Functional images were acquired using a multiplexed EPI sequence (Feinberg et al., 2010) with repetition time (TR) = 650 msec, echo time (TE) = 30 msec, flip angle = 50°, 36 slices, 64 × 64 pixels, 3 × 3 mm voxel size, and 3.9-mm slice spacing. The multiband acceleration factor was 4, and parallel acquisition technique was not used. A high-resolution structural T1 scan and a proton-density weighted scan were acquired at the end of the second scanning session. Structural images were acquired with a T1-weighted 3-D sequence (magnetization prepared rapid gradient echo; TR = 1900 msec, TE = 2.27 msec, flip angle = 9°, parallel acquisition technique factor = 2, 256 × 256 × 192 voxels, 1 × 1 × 1 mm voxel size).

In addition, a proton-density scan was acquired to visualize the SN/VTA in the midbrain (based on the procedure described by D'Ardenne, McClure, Nystrom, & Cohen, 2008), with the following parameters: 20 axial slices with no gap, TR = 6000 msec, TE = 8.4 msec, flip angle = 149°, field of view = 205 mm, matrix = 205 × 205 × 60, and voxel size = 0.8 × 0.8 × 3.0 mm (Schott et al., 2006). Structural proton density–weighted images from each participant were normalized to the standard Montreal Neurological Institute (MNI) template and averaged across participants.

Statistical Analyses

fMRI Data Preprocessing

EPI images were preprocessed using SPM software SPM8 (Wellcome Trust Centre for Neuroimaging) implemented in MATLAB R2012a (The MathWorks, Inc.). To avoid T1 saturation effects, image acquisition for each run started after 10 dummy volumes had been recorded. Functional images were spatially realigned to the mean of the images, coregistered to the anatomical scan, spatially normalized to the standard MNI EPI template, and spatially smoothed with an isotropic 8-mm FWHM Gaussian kernel (Friston et al., 1994).

Inspection of motion parameters obtained after image realignment using ArtRepair (Mazaika, Hoeft, Glover, & Reiss, 2009) revealed that all but one participant's total motion was less than 3 mm (n = 1 was excluded from analysis). Selected participants' (n = 20) structural volumes were normalized to the MNI T1 template before creating a mean T1 image used for visualization in reported figures.

fMRI Data Analysis

Data were analyzed using SPM toolbox (v12, Wellcome Department of Cognitive Neurology, www.fil.ion.ucl.ac.uk/spm) implemented in MATLAB v.8.3 (The MathWorks, Inc.).

Separate first-level models were built for the conditioning task, the first cycle of the nonrewarded object–location learning task, and the delayed recall of the learning task. Functional data were analyzed by convolving the onset of each event with a hemodynamic response function. The six movement parameters estimated during realignment were also included to capture residual (linear) movement artifacts. The sets of voxel values obtained from the different contrasts constituted maps of t statistics. The individual summary statistical images were entered into a second-level analysis corresponding to a flexible factorial design. Above-threshold activations are reported using SPM12's default whole-brain FWE correction at p < .05 with a minimum cluster size of 5 contiguous voxels. Where noted, ROI-based analyses with small volume correction (SVC) were carried out to complement the whole-brain FWE-corrected results.

The general linear model (GLM) for the conditioning task included the cue and feedback onsets, separately for HR and LR. For the nonrewarded object–location learning task, the GLM included eight event types, corresponding to the four conditions (ConHR, ConLR, TrHR, and TrLR) for the encoding and recall phases, separately. The same recall conditions were modeled for the delayed recall test. Each regressor modeled the BOLD activity corresponding to the onset of a picture, with the addition of a parametric modulator representing time (linearly descending with trial count). We included this time modulation because the initial effect of reward conditioning was expected to undergo extinction as the items were repeatedly presented without reward.

The recall regressors also included a second modulator, that is, the scaled Euclidean distance to target (DTT) for the particular response. Euclidean DTT in pixels was simply divided by the absolute maximum possible DTT. As a result of the scaling, the modulator takes values between 0 and 1, where 0 is a correct answer and 1 is the absolute maximum error. Group effects were investigated using separate second-level flexible factorial models for encoding and recall, with correction for condition variance nonsphericity.
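For clarity, the scaled DTT modulator can be written as in the following MATLAB sketch (variable names are assumed; this is not the original analysis code):

    % Scaled distance-to-target (DTT) modulator for one recall trial.
    % respXY and targetXY are 1 x 2 pixel coordinates of the chosen and correct
    % locations; maxDTT is the largest possible distance between two screen locations.
    dtt       = sqrt(sum((respXY - targetXY).^2));  % Euclidean distance in pixels
    dttScaled = dtt / maxDTT;                       % 0 = correct response, 1 = maximum error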

The participant-level covariate RR (z-scored RR; subscale of BAS) was included in the second-level fMRI analyses of data acquired in the first session because we found it had a significant influence on the behavioral results. The focus of the analysis of Cycle 1 of the learning task was on the effect of reward level, with critical contrasts between conditions of HR and LR (ConHR + TrHR > ConLR + TrLR).

First-level GLM of the data from the second session (acquired 24 hr later) included the following regressors: location recall with two parametric modulators, DTT and response confidence (per each condition), and the onset of the source memory question.

Regions of Interest

Building on the literature describing the interactions between reward learning and memory, we selected three ROIs to examine in detail the interaction of the reward system with the spatial learning system. Specifically, we focused on the interaction of the bilateral hippocampus with the SN/VTA (Ripollés et al., 2016; Adcock, Thangavel, Whitfield-Gabrieli, Knutson, & Gabrieli, 2006) and with bilateral ventral striatum regions (Wimmer, Daw, & Shohamy, 2012; Wimmer & Shohamy, 2012). Hippocampus masks (left and right) were defined from WFU-Pickatlas v3.0.5 (Maldjian, Laurienti, Kraft, & Burdette, 2003). A bilateral ventral striatum ROI was defined based on the online meta-analysis tool Neurosynth.org (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011) for studies associated with “reward anticipation” (reverse inference). The image was thresholded with FDR correction at p < .01; visualized in the bspmview toolbox (v. 20151217, Bob Spunt, California Institute of Technology) at t > 5, with a minimum cluster size of 40; and then smoothed with robust smoothing (default settings) to isolate the clusters. The bilateral ventral striatum cluster contained 97 voxels and was saved as a binary image matching the size of the fMRI contrast images. An anatomically defined SN/VTA area was manually delineated in MRIcron (Rorden, Karnath, & Bonilha, 2007) on the group mean of the proton density images, where it can be distinguished from surrounding structures as a bright stripe (after Schott et al., 2006), and saved as a binary mask (64 voxels). These ROIs were used for the functional connectivity analyses. We also extracted signal change from the mesolimbic ROIs defined above (ventral striatum and SN/VTA) to test for a link between activation of these regions during the conditioning task and during the subsequent encoding in the nonrewarded learning task. The same ROIs were used for SVC in the main linear contrasts when indicated; SVC was applied using cluster-level FWE correction.

Functional Connectivity

All connectivity analyses were carried out using CONN Toolbox v.17f (Whitfield-Gabrieli & Nieto-Castanon, 2012) implemented in SPM 12 (www.fil.ion.ucl.ac.uk/spm/) using ROI masks described in ROIs section (left and right hippocampus, bilateral ventral striatum, SN/VTA) as seed regions. This toolbox permits computation of temporal correlations of BOLD signals between selected ROIs, or between selected ROIs to other voxels in the brain, and has been used in earlier functional connectivity studies of reward processing (Alba-Ferrara, Müller-Oehring, Sullivan, Pfefferbaum, & Schulte, 2015; Peciña & Berridge, 2013). Generalized psychophysiological interaction (gPPI; task-modulation effects) analyses were performed using smoothed functional images modeled as zero-duration events with onsets on picture display (in the encoding of the nonrewarded object–location task from Cycle 1) and picture location question (at delayed recall) for each of the four conditions. This analysis used the hemodynamic-response-function-convolved impulse time series, bandpass-unfiltered, as parametric/linear modulators of the connectivity between two ROIs or voxels. The time series of activity from a seed ROI were extracted and plugged into the original GLMs to compute a correlation to test whether this correlation changed as a function of a specific contrast. Briefly, gPPI allows for an analysis of task-associated connectivity without the two-condition constraint necessary for traditional PPI analysis by controlling for the main effects of any number of conditions across the scanning session in a single model. “Task-associated” connectivity can therefore be analyzed independent of task-associated effects on BOLD response. We used the Conn toolbox's (v.17) recommended FDR seed-level correction for the ROI-to-ROI and seed-to-voxel analyses reported in the paper.

gPPI analyses were performed to model voxels whose covariance with the hippocampus, ventral striatum, and SN/VTA was influenced by reward (High, Low) and picture status condition (Conditioned, Transfer). Individual participants' motion parameters and main effects of task condition were modeled as nuisance covariates. Reported connectivity values are the Fisher-transformed correlation coefficients extracted for each condition at the second level. For display purposes, we extracted connectivity values for each participant for each condition in the ROI-to-ROI analysis on data from the recall phase of Cycle 1 (Figure 7). To illustrate mean group connectivity values for clusters detected in the seed-to-voxel analysis of data from Day 2, we extracted connectivity values from an F contrast of all recall conditions within the obtained clusters using an inclusive mask (Figure 7B).
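For reference, the Fisher transform applied to the reported connectivity values corresponds to the following (MATLAB sketch; r stands for a correlation coefficient):

    % Fisher r-to-z transform of a correlation coefficient r (|r| < 1).
    z = atanh(r);   % equivalently 0.5 * log((1 + r) / (1 - r))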

Behavioral Data Analysis

For the conditioning task, only successful trials (with a response executed in time) were included in the analysis. Misses and incorrect responses constituted only 0.56% of all trials.

For the object–location learning task, performance was assessed as RTs and DTT using trial-wise data. For correlations, the mean DTT of a given category per subject was used. Euclidean distance was calculated as √((x_n − x_0)² + (y_n − y_0)²), where n refers to the location on the screen (in pixels) selected by the participant and 0 is the target location. Consequently, the DTT could take six possible values: five values for incorrect responses and zero for a correct response (distance of zero).

To verify learning performance, we measured memory accuracy (the number of correct responses divided by the total number of trials) for each condition and cycle. The average group performance was well above the chance level of 16.67% throughout the task (reported in Table 1 below). Note that one participant who failed to show improvement from Cycle 1 to Cycle 3 (accuracy of about 25% across the three cycles) was excluded from all analyses.
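In code terms, the accuracy measure amounts to the following (MATLAB sketch; isCorrect is an assumed logical vector of trial outcomes for one condition and cycle):

    % Percentage of correct location responses; chance level is 100/6 ≈ 16.67%
    % because each picture could appear in one of six possible locations.
    accuracy = 100 * sum(isCorrect) / numel(isCorrect);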

Table 1. 
Percentage of Correct Responses (Mean and 95% Confidence Interval)
Past Reward | Picture Status | Cycle 1 | Cycle 2 | Cycle 3 | Day 2
HR | Con | 46.5% (10%) | 66.4% (10%) | 76.4% (11%) | 57.7% (11%)
HR | Tr | 49.0% (11%) | 70.2% (10%) | 80.8% (7%) | 57.9% (8%)
LR | Con | 48.9% (9%) | 66.7% (9%) | 80.4% (8%) | 60.1% (11%)
LR | Tr | 46.2% (10%) | 70.4% (10%) | 78.8% (10%) | 62.4% (11%)

Only data from Cycle 1 and delayed recall 24 hr later are reported in this paper.

RTs for the conditioning and associative learning tasks were log10-transformed before analysis. Misses were excluded from data analysis (0% in the reward conditioning task, 1.6% of all responses in the associative learning task). Responses with RT < 100 msec were regarded as impulsive and excluded from behavioral data analysis (3.9% of responses in the reward conditioning task, 0.002% of responses in the associative learning task). RT and DTT data were analyzed at the single-trial level using linear mixed models (Baayen & Milin, 2010) implemented in SPSS v.22 (IBM SPSS Statistics for Windows, IBM Corp., released 2013), using the restricted maximum likelihood estimation method. A linear mixed model (also known as a random-effects model) accounts for within-participant correlation of repeated measurements through the inclusion of a random intercept for participants. In addition, and unlike traditional repeated-measures analyses, mixed models can handle unbalanced data (e.g., unequal numbers of trials per condition) and trial-level covariates (such as confidence ratings) and do not require normally distributed data. All linear mixed models included the random factor Participant and the fixed factors Reward (high, low) and Picture Status (conditioned, transfer). RR trait scores were z-scored at the group level and included as a participant-level covariate in the behavioral and fMRI analyses of the associative learning task. For the analysis of RT data in the associative learning task at immediate and delayed recall, we additionally separated the trials into those correctly and incorrectly recalled, thus adding the fixed factor Correctness (correct, incorrect). DTT data from the delayed test were analyzed excluding responses reported as guesses (∼12.5%), with confidence as a covariate for the retained nonguessed responses.
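Although the original models were fitted in SPSS, an equivalent trial-level specification can be sketched in MATLAB's fitlme (table and variable names are assumptions, not the original syntax):

    % Hedged sketch of the trial-level linear mixed model (REML estimation).
    % T is a table with one row per trial: logRT, Reward, PictureStatus,
    % Correctness, RR (z-scored trait score), and Participant (grouping variable).
    lme = fitlme(T, ...
        'logRT ~ Reward*PictureStatus*Correctness + RR + (1|Participant)', ...
        'FitMethod', 'REML');
    anova(lme)   % marginal F tests for the fixed effects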

RESULTS

Reward Conditioning

Behavior

The experiment began with a reward conditioning task in which monetary rewards of two levels (HR = 10 points, 0.50 CHF; LR = 1 point, 0.05 CHF) were offered for correct picture-categorization responses (Figure 1B). Reward conditioning success was assessed via pleasantness ratings performed before and after conditioning on a visual analog scale for a subset of pictures (10 for each reward level) and confirmed that preference scores changed as a function of reward level (linear mixed model with fixed factor Reward [high, low] and random factor Participant: F(1, 378) = 4.42, p = .036; Figure 2A). We used the average difference score (postconditioning minus preconditioning preference; Figure 2A) measured on a visual analog scale from “unpleasant” to “pleasant” as a behavioral index of each participant's conditioning strength and found that it correlated positively with individual trait RR, a subscale of the BAS (Carver & White, 1994; Pearson's r = .45, p = .043). There was no main effect of Reward on RTs in the conditioning task (RTs; linear mixed model with fixed factor Reward [high, low]: F(1, 1507) = 0.123, p = .726), but the random factor Participant was significant (Wald's Z = 3.023, p = .003), pointing to significant inter-participant differences. Indeed, when we included RR as a covariate, we found that participants with higher trait RR classified pictures faster in the conditioning task than those with lower trait RR (a main effect of the covariate RR on RTs: F(1, 9.5) = 19.4, p = .002), but no significant interaction with factor Reward (RR × Reward: F(1, 743.1) = 2.275, p = .132).

Figure 2. 

Main results of the reward conditioning task. (A) Change in behavioral ratings of pleasantness of HR and LR pictures because of conditioning (after–before) represented as mean ± 95% confidence interval. (B) Statistical brain maps comparing HR to LR pictures in the reward conditioning task. Clusters in the bilateral ventral striatum (VS), left superficial amygdala (SA), SN/VTA, and vmPFC visualized at whole-brain p(FWE) < .05. * denotes p < .05 for the effect of reward.

fMRI—Reward Conditioning

Because trait RR was found to significantly interact with behavioral performance in the conditioning task, we systematically included it as a covariate in our second-level fMRI models to account for potential individual modulations of the effects of reward on the nonrewarded object–location associative learning task. Comparing HR to LR picture presentation (main regressors) revealed robust activation clusters within the mesocorticolimbic dopaminergic reward circuit, including the SN/VTA, bilateral ventral striatum, left superficial amygdala, and ventromedial PFC (vmPFC; Figure 2B and Table 2). This pattern of results is consistent with previous fMRI studies reporting reward-anticipatory activation primarily in the ventral striatum and the VTA (Garrison, Erdeniz, & Done, 2013; Sescousse, Caldú, Segura, & Dreher, 2013; Liu, Hairston, Schrier, & Fan, 2011; O'Doherty, Buchanan, Seymour, & Dolan, 2006; Knutson, Fong, Adams, Varner, & Hommer, 2001). During the presentation of the reward feedback, significant activation was observed mostly in the visual occipital cortex, likely reflecting perceptual differences between the two types of reward feedback: HR was presented as four moving coins falling into a piggy bank, whereas LR was represented as one coin falling. At a lower threshold (p < .001 uncorrected), activation was also detected in the vmPFC (corrected with an SVC using a functionally defined mask; reported in Table 2). We conclude that the conditioning task successfully induced reward learning, with a higher anticipatory reward response for HR compared to LR pictures.

Table 2. 
Activation Showing Reward Effects during the Conditioning Task at Whole-Brain p(FWE) < .05
Contrast | Region | Peak Z (# Voxels) | Location (x, y, z)
Cue HR > LR | SN/VTA | 7.13 (144) | 10, −20, −16
 | R premotor and motor cortex (hand movement) | 6.26 (196) | 12, −14, 58
 | SMA | 5.93 (54) | −16, 0, 48
 | L amygdala extending to pallidum | 5.8 (64) | −16, −2, −10
 | R putamen/pallidum | 5.6 (60) | 26, 6, 6
 | R postcentral gyrus (somatosensory) | 5.55 (14) | 18, −40, 60
 | R occipitotemporal (visual) regions | 5.55 (47) | 60, −68, −12
 | L postcentral gyrus | 5.49 (22) | −38, −36, 54
 | Middle cingulate gyrus | 5.48 (28) | 4, −4, 36
 | R inferior frontal gyrus (dorsolateral PFC) | 5.47 (47) | 40, 34, 6
 | Mid-cingulate gyrus | 5.43 (7) | −12, −8, 36
 | SMA | 5.39 (19) | −6, −12, 60
 | R middle frontal gyrus | 5.24 (10) | 10, 50, −2
 | L insula | 5.09 (5) | −36, −6, 12
Feedback HR > LR | L visual fusiform gyrus | 5.84 (283) | −22, −84, −18
 | Lingual gyrus | 5.44 (67) | 0, −92, −10
 | R lingual gyrus | 5.05 (29) | 20, −78, −18
 | L occipital lobe | 4.96 (8) | −22, −86, −4
 | vmPFC | 4.62 (121) | −4, 36, −12

Effects of Reward Conditioning on Subsequent Nonrewarded Associative Learning Task

In the learning task that followed the conditioning procedure, the participants learned to assign one spatial location (of six possible locations) to pictures that had been reward conditioned (18 ConHR and 18 ConLR) as well as to new pictures that belonged to the same two semantic categories (referred to as transfer pictures; 18 TrHR and 18 TrLR). We henceforth refer to this factor with two levels—Conditioned and Transfer—as “picture status.” The task was composed of alternating encoding (memorize) and recall (respond) runs (Figure 1A).

Behavior—Associative Learning Task

Main analyses.

Consistent with the effect observed in the conditioning task, trait RR interacted with recall performance for both DTT and RT measures. Specifically, we found that individuals with high RR responded faster and more accurately on trials with pictures previously conditioned with HR compared to LR (negative correlation between RR and mean DTT for ConHR > ConLR Spearman's ρ = −0.46, p = .041; negative correlation between RR and RTs ConHR > ConLR Spearman's ρ = −0.4174, p = .0671; no effects were found when comparing transfer pictures, DTT TrHR > TrLR ρ = −0.15, p = .52; RTs: p = .68). Consequently, RR was used as a covariate in second-level fMRI analyses.

In more detail, RTs were faster for correct responses, F(1, 1375.95) = 42.957, p < .001, and for previously conditioned pictures in comparison to transfer pictures (main effect of Picture Status: F(1, 1364.312) = 5.739, p = .017; mixed model with random factor Participant, fixed factors Reward [high, low], Picture Status [conditioned, transfer], and Correctness [on target, off target], covariate RR, and all two- and three-way interactions of the main factors). We also found a three-way interaction of Picture Status × Correctness × RR, F(1, 1365.789) = 5.164, p = .023. There was also a trend for an interaction between Picture Status and Correctness, F(1, 1372.14) = 3.146, p = .076, because of faster RTs for correct responses to previously conditioned pictures. The main effect of Reward was not significant, F(1, 1377.453) = 2.458, p = .117, nor was the interaction Reward × RR, F(1, 1319.929) = 0.637, p = .425 (p values for other effects > .385). We next split the analysis following the interaction pattern and found that, for directly conditioned but not transfer pictures, the RR covariate was significant, F(1, 87.003) = 4.198, p = .043, and again interacted with Correctness, F(1, 674.393) = 6.593, p = .01, as well as with Correctness and Reward, F(1, 683.628) = 5.084, p = .024 (main effect of Correctness: F(1, 679.1) = 6.738, p = .01; main effect of Reward: F(1, 685.448) = 0.635, p = .426, ns). However, RR had no effect on RTs to transfer pictures, F(1, 39.178) = 0.495, p = .486 (main effect of Correctness: F(1, 689.99) = 52.291, p < .001; main effect of Reward: F(1, 690.956) = 2.793, p = .095, ns).

We found no significant main effect of previous reward conditioning on our DTT in the first cycle of the task considering all 72 pictures (Reward: F(1, 1394.1) = 0.001, p = .975; Picture Status: F(1, 1394.13) = 0.1, p = .75; Reward × Picture Status: F(1, 1394.05) = 2.08, p = .15; RR: F(1, 18.02) = 0.4, p = .536; mixed model with fixed effects Reward [high, low] and Picture Status [conditioned, transfer], interaction Reward × Picture Status, covariate RR, and random effect of Participant).

Behavioral effects of trial position within mini-blocks.

To avoid any progressive change (increase or decrease) of the reward effect within a mini-block, we made sure that each individual picture was presented only once during Cycle 1, thus preventing any habituation effect from the repetition of identical pictures. Yet, blocked presentation of rewarding events could potentially increase expectation (i.e., reduce reward prediction error-like activity) over the course of a mini-block. To ensure that the reward effect was not significantly affected by the succession of pictures of the same reward condition within a mini-block, we analyzed the behavioral data with the trial position in the mini-block at encoding in Cycle 1 (hereafter “Trial Position”) as a covariate.

For the recall data from Cycle 1, the analysis of RT data included fixed factors Reward (high, low), Picture Status (conditioned, transfer), and Correctness (on target, off target); covariate Trial Position; all two-way interaction terms; and random factor Participant. The results revealed no main effect of the covariate and no significant interactions with the covariate: main effect of covariate Trial Position, F(1, 1363.179) = 2.426, p = .12; Reward × Trial Position, F(1, 1363.676) = 0.029, p = .865; Picture Status × Trial Position, F(1, 1363.659) = 0.005, p = .944; and Correctness × Trial Position, F(1, 1367.609) = 3.16, p = .076.

For analyzing the DTT data at recall in Cycle 1, we built a model including fixed factors Reward (high, low) and Picture Status (conditioned, transfer) and covariate Trial Position as well as all interaction terms and random factor Participant. We found no significant effects of Trial Position, F(1, 1367.233) = 1.773, p = .183, and no interactions of Reward × Trial Position, F(1, 1367.823) = 0.008, p = .929, and Picture Status × Trial Position, F(1, 1367.767) = 0.147, p = .702.

In summary, these results suggest that recall performance (RT and DTT) in Cycle 1 did not interact with trial position within the mini-block during encoding in Cycle 1.

fMRI—Associative Learning Task

HR vs. LR.

Comparing brain activity elicited by previously high- versus low-rewarded pictures (ConHR + TrHR > ConLR + TrLR) at encoding of picture–location associations yielded no significant voxels, even at a lenient threshold (p < .001 uncorrected). Interestingly, the opposite contrast (ConHR + TrHR < ConLR + TrLR) revealed robust activation in the bilateral ventral striatum and in the right hippocampus (Figure 3 and Table 3).

Figure 3. 

Effects of reward conditioning on brain activation in a subsequent nonrewarded object–location learning task. (A) Activation of the right hippocampus and the bilateral ventral striatum during encoding of previously LR conditioned and semantically related transfer pictures versus HR conditioned and transfer pictures. Whole-brain contrast (ConLR + TrLR > ConHR + TrHR) corrected with FWE at p < .05. (B) Mean parameter estimates ± 95% confidence interval extracted from the activation clusters shown in A, presented for visualization purpose only. Con = previously conditioned; Tr = transfer.

Table 3. 
fMRI Analysis of the Nonrewarded Associative Learning Task for the Contrast Comparing LR to HR Pictures at Encoding
Contrast | Region | Peak Z (# Voxels) | Location (x, y, z)
Encoding (ConHR + TrHR < ConLR + TrLR) | R OFC | 6.56 (46) | 2, 68, −12
 | R ventral striatum/R putamen | 6.26 (420) | 16, 6, −10
 | L ventral striatum/L putamen | 5.93 (292) | −22, 8, −12; −16, 10, −2; −14, 4, −12
 | Anterior cingulate (motor) cortex | 5.49 (13) | 12, 8, 36
 | L parietal lobule | 5.47 (68) | −62, −40, 44
 | R postcentral gyrus | 5.23 (19) | 56, −30, 60
 | R hippocampus | 5.06 (22) | 22, −24, −20
 | R entorhinal cortex | 4.20 (37) | 20, −4, −30

Clusters reported at whole-brain cluster p(FWE) < .05.

We also compared signal extracted from the a priori defined reward-related ROIs (ventral striatum and SN/VTA) during the conditioning task and the subsequent first encoding of the nonrewarded learning task. In both regions, response to HR feedback during conditioning (contrast HR > LR feedback; Table 2) correlated negatively with the response of the same region during the subsequent learning of locations for the same pictures (ventral striatum: Spearman's ρ = −0.52, p = .017; SN/VTA: Spearman's ρ = −0.63, p = .0036). These results suggest that the more a participant's reward areas responded to high monetary reward during the conditioning task, the lower was their activation when subsequently learning the location of HR conditioned pictures during the nonrewarded object–location task.
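In sketch form, the across-participant correlation reported here corresponds to the following MATLAB call (variable names are assumed; n = 20 values per vector):

    % betaFeedback and betaEncoding are 20 x 1 vectors of contrast estimates extracted
    % from the same ROI (HR > LR feedback in conditioning; encoding of HR pictures in learning).
    [rho, p] = corr(betaFeedback, betaEncoding, 'Type', 'Spearman');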

Reward effects in subsequent encoding cycles of the associative learning task.

We performed additional exploratory whole-brain analyses to assess the reward effect in subsequent encoding cycles, beyond Cycle 1. Comparing the HR and LR for conditioned and transfer pictures revealed a strong decrease of activity in the ventral striatum for Cycle 1 compared to the data for the same contrast from Cycle 2 (Figure 4A). No significant activation was found when directly comparing reward effect in Cycle 2 versus Cycle 3, even at p < .001 uncorrected (Figure 4B). Together, these results indicate that, as expected, the reward-value-related modulation of activity predominated at the beginning of the task as compared to during subsequent repetitions of the cycles.

Figure 4. 

Comparison between cycles. (A) Effect of reward at encoding during Cycle 1 versus Cycle 2. (B) Effect of reward at encoding during Cycle 1 versus Cycle 3. uncorr. = uncorrected.

The main effects of reward in Cycle 2 and in Cycle 3 were evaluated separately (Table 4). In Cycle 2, we did not observe any activation for HR < LR for either conditioned or transfer pictures at the corrected threshold (FWE, p < .05); there was some inferior temporal gyrus activation at p < .01 uncorrected, as illustrated in Figure 5A. In Cycle 3, the contrast HR < LR (Conditioned + Transfer) at p(FWE) < .05 revealed a 7-voxel activation in the left hippocampal area (Figure 5B). As mentioned in the Methods section, all other fMRI analyses of the nonrewarded associative learning task focus solely on Cycle 1.

Table 4. 
Whole-Brain Reward Effect in Cycle 2 and Cycle 3 (FWE p < .05)
Region                                                 Peak Z (# Voxels)   Location (x, y, z)

Encoding (ConHR + TrHR < ConLR + TrLR) in C2
No active voxels at the adopted corrected threshold

Encoding (ConHR + TrHR < ConLR + TrLR) in C3
Visual cortex                                          77                  −42, −86, 12
Hippocampus                                            4.82                −32, −22, −12

Figure 5. 

(A) Effect of LR in Cycle 2 (ConLR + TrLR > ConHR + TrHR) at p < .01, uncorrected (uncorr.). (B) Effect of LR in Cycle 3 (ConLR + TrLR > ConHR + TrHR) at p < .05 FWE.

Picture status during the associative learning task.

We examined the effects of prior reward on the encoding of object–location associations separately for conditioned and transfer pictures. We first conducted a whole-brain FWE analysis. For high versus low conditioned pictures, only the bilateral ventral striatum was significantly less activated, and for high versus low transfer pictures, only the right hippocampus reached significance (Figure 6 and Table 5). Because these effects went in the same direction (HR < LR) for conditioned and transfer pictures in both areas (see Figure 3B), the net effect of reward was stronger when pooling the conditioned and transfer conditions. Indeed, in a second analysis step using an SVC with an a priori defined anatomical bilateral hippocampal mask on the contrast comparing high versus low conditioned pictures (see Methods), we confirmed a cluster in the right hippocampus (Table 5; Figure 6A, left). Applying the same approach with the a priori bilateral functional ventral striatum mask for the transfer pictures (TrHR < TrLR), we identified a cluster of activation (Table 5; Figure 6B, right). Overall, both regions showed similar trends toward reduced activity for HR versus LR pictures for each picture status.

Figure 6. 

Effects of reward conditioning on subsequent nonrewarded object–location learning, for conditioned and transfer pictures separately. For each panel, we illustrate the right hippocampal activation (y = −26) and the bilateral ventral striatum (y = 8). (A) Activation for conditioned pictures only: ConHR < ConLR. (B) Activation for transfer pictures alone: TrHR < TrLR. FWE (p < .05) = whole-brain FWE correction at p < .05; SVC (p < .05) = SVC (cluster FWE p < .05) using a priori ROIs on whole-brain contrasts thresholded at p < .001 uncorrected (see Methods).

Table 5. 
fMRI Analysis Comparing LR to HR Pictures at Encoding for the Conditioned and Transfer Pictures, Separately
Region                                     Peak Z (# Voxels)   Location (x, y, z)

Encoding (ConHR < ConLR)
Orbitofrontal cortex/frontal pole          7.26 (61)           −4, 66, −12; −16, 62, −10
R frontal pole                             6.02 (25)           6, 66, −18
R ventral striatum                         6.15 (200)          16, 8, −10; 20, 6, 4
L ventral striatum                         5.55 (102)          −20, 8, −10; −18, 10, 2
Caudate                                    5.52 (16)           −24, −4, 6
SVC (cluster FWE p < .05) using bilateral hippocampal ROI at whole-brain p < .001:
R hippocampus                              4.38 (35)           36, −14, −16; 40, −20, −22
R anterior CA (hippocampus)                3.56 (2)            26, −12, −24

Encoding (TrHR < TrLR)
R motor cortex                             5.28 (36)           64, −8, 24
                                           4.88 (7)            48, −16, −8
R hippocampus                              4.80 (9)            24, −26, −22
SVC (cluster FWE p < .05) using ventral striatum ROI at whole-brain p < .001:
R ventral striatum                         4.35 (25)           16, 12, −6; 14, 8, −8
L ventral striatum                         3.97 (12)           −14, 10, −8

Clusters reported at whole-brain cluster p(FWE) < .05, unless SVC correction is specified.

Functional connectivity during the associative learning task.

Motivated by previous research reporting stronger connectivity between the SN/VTA, the ventral striatum, and the hippocampus during encoding of reward-paired stimuli (Wimmer & Shohamy, 2012; Wittmann et al., 2005), we tested the functional connectivity between these ROIs using a generalized psychophysiological interaction (gPPI) model implemented in the Conn Toolbox (Whitfield-Gabrieli & Nieto-Castanon, 2012). In a multi-ROI-to-ROI connectivity analysis, we observed reduced connectivity of the ventral striatum ROI with the SN/VTA and with the left hippocampus in the contrast HR > LR (Con + Tr) during encoding at Cycle 1 (main effect of reward for the ventral striatum seed: F(3, 17) = 11.68, p(FDR corrected) = .0009; ventral striatum–SN/VTA connectivity: t(19) = −3.18, p(FDR) = .0075; ventral striatum–left hippocampus connectivity: t(19) = −3.17, p(FDR) = .0075; Figure 7).
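The ROI-to-ROI gPPI estimates were computed within the Conn Toolbox; purely for illustration, the sketch below reproduces only the group-level step (paired t-tests on per-participant connectivity estimates with FDR correction) in Python, using hypothetical placeholder values rather than our data.

```python
# Minimal sketch of the group-level statistics only: paired t-tests on
# per-participant gPPI connectivity estimates (HR vs. LR), FDR-corrected.
# The connectivity estimates themselves come from the Conn toolbox; the
# arrays below are hypothetical placeholders.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n_subjects = 20

# Hypothetical ventral striatum seed connectivity (Fisher-z) per subject and target.
targets = ["SN/VTA", "L hippocampus", "R hippocampus"]
conn_hr = rng.normal(0.10, 0.15, size=(n_subjects, len(targets)))
conn_lr = rng.normal(0.25, 0.15, size=(n_subjects, len(targets)))

t_vals, p_vals = ttest_rel(conn_hr, conn_lr, axis=0)        # HR vs. LR, per target
reject, p_fdr, _, _ = multipletests(p_vals, alpha=0.05, method="fdr_bh")

for name, t, p in zip(targets, t_vals, p_fdr):
    print(f"VS - {name}: t({n_subjects - 1}) = {t:.2f}, p(FDR) = {p:.4f}")
```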

Figure 7. 

Functional brain connectivity during encoding at Cycle 1 (Day 1). (A) Multi ROI-to-ROI functional connectivity with the ventral striatum (VS) during encoding at Cycle 1. (B) Mean connectivity estimates ± 95% confidence interval showing that the SN/VTA and left hippocampus (L Hipp) ROIs were less functionally connected to the VS ROI when encoding the location of HR compared to LR pictures. * denotes p < .05 for the main effect of reward.

Effects of trial position within mini-blocks on reward-related activation during the associative learning task.

Because blocked presentation of rewarding events could potentially increase expectation (i.e., reduce reward prediction error-like activity) over the course of a mini-block, we investigated time-dependent changes in reward effects within a mini-block in the fMRI data. To test whether the overall decrease in fMRI activity for previously rewarded pictures might be partly because of trial position within a mini-block, we performed a control analysis that included separate regressors for the first three, middle three, and last three trials (separately for the directly conditioned HR trials, the TrHR trials, the directly conditioned LR trials, and the TrLR trials). Because this new design incorporated a temporal dimension (three successive triplets of trials), we did not include an additional parametric modulator of time. Next, we extracted the parameter estimates from the ventral striatum ROI (Figure 8). A repeated-measures ANOVA performed on these values, with Reward, Picture Status, and Trial Position as factors, replicated the original findings of less activation for high- than low-rewarded pictures (with the Greenhouse–Geisser correction for nonsphericity; main effect of Reward: F(1, 19) = 7.039, p = .016; no effect of Picture Status: F(1, 19) = 0.001, p = .973; no Reward × Picture Status interaction: F(1, 19) = 0.856, p = .366). Critically, there was no Trial Position × Reward interaction, F(1.582, 30.06) = 1.46, p = .247. Together, these results demonstrate that the average decrease in activity for HR compared to LR pictures was not because of some dynamic change in prediction error unfolding over the course of the mini-blocks.
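For readers wishing to follow the structure of this control analysis, the sketch below illustrates a repeated-measures ANOVA on extracted ventral striatum betas with Reward, Picture Status, and Trial Position as within-subject factors; the data values and column names are hypothetical placeholders, and, unlike the reported analysis, the statsmodels routine shown does not apply the Greenhouse–Geisser correction.

```python
# Minimal sketch (assumed data layout): long-format table of ventral striatum
# betas, one row per participant x reward x picture status x trial-position bin,
# analyzed with a repeated-measures ANOVA. AnovaRM does not apply the
# Greenhouse-Geisser correction used in the reported analysis.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
rows = []
for subject in range(1, 21):
    for reward in ("HR", "LR"):
        for status in ("conditioned", "transfer"):
            for position in ("first", "middle", "last"):
                rows.append({
                    "participant": subject,
                    "reward": reward,
                    "status": status,
                    "position": position,
                    "beta": rng.normal(-0.2 if reward == "HR" else 0.0, 0.5),
                })
df = pd.DataFrame(rows)

anova = AnovaRM(df, depvar="beta", subject="participant",
                within=["reward", "status", "position"]).fit()
print(anova.anova_table)
```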

Figure 8. 

Ventral striatum activity during encoding of Cycle 1 for successive triplets of trials within each mini-block. Extracted beta values from SPM model with mini-blocks divided into three successive bins of three trials each. Mean ± SEM.

Effects of Reward Conditioning at 24-hr Delayed Memory Test

Behavior

Participants returned to the laboratory exactly 24 hr later and were asked to recall the learned picture locations from the nonrewarded associative learning task. On each recall trial, they also indicated the confidence of their response, including a "guess" option. The proportion of "guess" responses did not differ between the picture categories (on average, guesses constituted 12.2 ± 10% [mean ± SD] of all responses; an ANOVA with factors Reward and Picture Status yielded no main effects and no interactions, all ps > .41). Note that, because no effects of the covariate RR were found in the data from the second day of the experiment, this covariate was dropped from these analyses.

For RTs, we additionally included response correctness as a factor in the analysis and found that the responses were faster for HR (main effect of Reward: F(1, 1211.242) = 3.999, p = .046) and for all correct responses (main effect of Correctness: F(1, 1213.433) = 34.227, p < .001; no effect of Picture Status: F(1, 1210.929) = 0.277, p = .599, ns; Picture Status × Correctness: F(1, 1212.425) = 1.725, p = .189, ns; other interactions: ps > .35; mixed model with fixed factors Reward, Picture Status, and Response Correctness [on target, off target], Reward × Picture Status and Reward × Correctness interactions, and random factor Participant).

DTT was analyzed excluding “guess” responses and including a trial-level covariate of level of confidence. We found that responses for HR pictures tended to produce higher DTT (i.e., worse memory performance). This was the case for both originally conditioned HR and TrHR pictures (marginally significant main effect of Reward: F(1, 1213.532) = 3.753, p = .053; exact results for the four conditions, ConLR: M = 0.209, SD = 0.333; ConHR: M = 0.235, SD = 0.348; TrLR: M = 0.203, SD = 0.335; TrHR: M = 0.242, SD = 0.355; main effect of covariate Confidence: F(1, 1228.376) = 248.18, p < .001; no Picture Status × Reward interaction: F(1, 1213.6) = 0.002, p = .966; no effect of Picture Status: F(1, 1213.1) = 0.488, p = .485; mixed model with fixed factors Reward and Picture Status, trial-level covariate Confidence [z scored for the total data subset], and a random intercept for Participant).
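For illustration, the sketch below specifies a linear mixed model of the same general form as the DTT analysis (fixed factors Reward and Picture Status, a trial-level Confidence covariate, and a random intercept per participant); the column names and simulated values are hypothetical placeholders, not our data or analysis code.

```python
# Minimal sketch (assumed column names) of a mixed model of the form used for
# DTT: fixed effects of Reward and Picture Status plus a z-scored trial-level
# Confidence covariate, and a random intercept per participant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for subject in range(1, 21):
    for reward in ("HR", "LR"):
        for status in ("conditioned", "transfer"):
            for _ in range(9):  # hypothetical number of trials per condition
                rows.append({
                    "participant": subject,
                    "reward": reward,
                    "status": status,
                    "confidence_z": rng.normal(),
                    "dtt": abs(rng.normal(0.23 if reward == "HR" else 0.20, 0.1)),
                })
trials = pd.DataFrame(rows)

model = smf.mixedlm("dtt ~ reward * status + confidence_z",
                    data=trials, groups=trials["participant"])
result = model.fit()
print(result.summary())
```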

Effects of trial position within mini-blocks on delayed behavioral performance.

As for immediate recall performance (RT and DTT; see section Effects of Reward Conditioning on Subsequent Nonrewarded Associative Learning Task above), we tested whether trial position within a mini-block at initial encoding may have affected delayed memory performance, given the succession of pictures of the same reward condition within a mini-block. We therefore added Trial Position within the mini-block at encoding as a covariate in the analyses of the RT and DTT data.

The analysis of RT data included fixed factors Reward (high, low), Picture Status (conditioned, transfer), and Correctness (on target, off target) and covariate Trial Position as well as all two-way interaction terms and random factor Participant. This analysis yielded no significant main effect of Trial Position, F(1, 1207.676) = 0.227, p = .634, and no significant interactions of Reward × Trial Position, F(1, 1207.832) = 0.322, p = .57, Picture Status × Trial Position, F(1, 1207.712) = 0.018, p = .892, and Correctness × Trial Position, F(1, 1210.605) = 2.722, p = .099 (other main effects: Reward, F(1, 1208.35) = 2.069, p = .151; Correctness, F(1, 1223.66) = 2.246, p = .134; Picture Status, F(1, 1207.788) = 0.018, p = .893).

For DTT data, we included fixed factors Reward and Picture Status, covariate Trial Position, and random factor Participant. The main effect of the covariate was nonsignificant, F(1, 1211.184) = 0.007, p = .934, as were the interactions (Reward × Trial Position: F(1, 1211.384) = 0.123, p = .726; Picture Status × Trial Position at encoding: F(1, 1211.365) = 0.324, p = .569).

In summary, none of these additional behavioral analyses disclosed any significant interaction of the trial position within the mini-block during encoding in Cycle 1 and subsequent delayed memory recall.

fMRI Connectivity Analysis—24-hr Delayed Memory Test

In the functional connectivity analysis of the fMRI data acquired while participants encoded the associations with previously HR (vs. LR) pictures, we had found reduced coupling of the seed region SN/VTA with the ventral striatum and the left hippocampus. Consequently, we tested the SN/VTA and the left hippocampus as seeds for the functional connectivity analysis during recall on Day 2. We performed a functional seed-to-voxel connectivity analysis (gPPI) on a priori selected hippocampal seed regions. During recall 24 hr after learning, functional connectivity with the left hippocampus was significantly lower for recall of HR versus LR picture locations (ConHR + TrHR < ConLR + TrLR) in two clusters: the vmPFC (peak coordinates: x = 8, y = 32, z = −12; k = 121, cluster p(FDR) = .014) and the right nucleus accumbens (peak: x = 8, y = 10, z = −16; k = 137, cluster p(FDR) = .014; Figure 9). The same analysis with the right hippocampus as seed revealed a main effect of reward in the contrast HR < LR (Con + Tr): connectivity with the right hippocampus was lower for the right orbitofrontal cortex (OFC; k = 172; x = 36, y = 62, z = −4; cluster p(FWE) = .004) and the inferior temporal gyrus (k = 161; x = 54, y = −46, z = −20; cluster p(FWE) = .006) during recall of high- compared to low-value associations. Note that connectivity was significantly lower for the conditions with worse associative memory performance (i.e., HR) and, as in the behavioral data, the effect was general, affecting both the originally rewarded stimuli and those that were semantically related. The SN/VTA seed yielded no significant voxels in the comparison of ConHR versus ConLR pictures at the adopted threshold of cluster-based p(FDR) < .05.
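The seed-to-voxel gPPI maps were estimated in the Conn Toolbox; the sketch below merely illustrates the final descriptive step plotted in Figure 9B, namely Fisher-transforming per-condition correlation coefficients and comparing HR versus LR with a paired t-test, using hypothetical placeholder values rather than our data.

```python
# Minimal sketch: Fisher r-to-z transform of per-condition correlation
# coefficients (as plotted in Figure 9B) and an HR vs. LR paired t-test.
# The r values below are hypothetical placeholders.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
n_subjects = 20

# Hypothetical hippocampus-vmPFC correlation coefficients per participant.
r_hr = np.clip(rng.normal(0.05, 0.15, n_subjects), -0.99, 0.99)
r_lr = np.clip(rng.normal(0.20, 0.15, n_subjects), -0.99, 0.99)

z_hr = np.arctanh(r_hr)   # Fisher r-to-z transform
z_lr = np.arctanh(r_lr)

t_val, p_val = ttest_rel(z_hr, z_lr)
print(f"Hippocampus-vmPFC connectivity, HR vs. LR: "
      f"t({n_subjects - 1}) = {t_val:.2f}, p = {p_val:.4f}")
```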

Figure 9. 

Functional brain connectivity during delayed (24-hr) recall of the object–location task. (A) Reduced functional connectivity during picture–location recall 24 hr after learning for stimuli from previously HR-conditioned category. Activity in the seed region in the left hippocampus (CA) is significantly less correlated with both the vmPFC (+8, +32, −12) and the ventral striatum (+8, +10, −16) for the contrast ConHR + TrHR < ConLR + TrLR. (B) Data represent the group average of Fisher-transformed correlation coefficients extracted for each condition for each participant at the second level for the clusters ventral striatum and vmPFC. Error bars are ±95% confidence intervals. * denotes p < .05 for the main effect of reward.

DISCUSSION

Research on how extrinsic incentives affect learning and decision-making has revealed an intriguing paradox. Although much experimental evidence shows that activation of the reward circuit enhances learning and recognition memory (Igloi, Gaggioni, Sterpenich, & Schwartz, 2015; Gruber, Gelman, & Ranganath, 2014; Murty & Adcock, 2014; Adcock et al., 2006; Wittmann et al., 2005), several studies have identified situations in which offering monetary reward reduces performance, an effect accompanied by reduced activation of the ventral striatum (Murayama et al., 2010).

Our study contributes to deciphering the complex relationship between the timing of reward and its capacity to facilitate or interfere with the establishment of new associations in memory, a relationship of clear importance for practical purposes such as education. In other words, when a previously rewarded item is presented again in a different learning context, does it facilitate or hinder that novel learning? Here, we hypothesized that initially rewarded stimuli trigger a reevaluation process when presented in a different task context in the absence of reward, which would hinder, rather than facilitate, the formation of new associations with the reward-related stimuli. We tested this hypothesis by assessing the effects of reward conditioning on a subsequent nonrewarded associative learning task at a time when reward was most likely to still exert lingering effects. We report a negative reward prediction error-like signal in the ventral striatum, paralleled by reduced hippocampus activity during the encoding of these new associations, an effect that correlated with trait RR. At the behavioral level, we found a trend for worse memory formation for previously highly rewarded (HR) pictures compared to LR pictures 24 hr after picture–location learning, despite an increased preference for ConHR pictures right after the conditioning task. Furthermore, we found indications that these neural and memory effects may affect not only the specific reward-associated items but also new, never-conditioned pictures from the same semantic category.

One previous behavioral study demonstrated a reward-driven enhancement of recognition memory, emerging 24 hr after encoding, for pictures from the same semantic category as rewarded items even though they had not been directly paired with performance-dependent reward (Patil, Murty, Dunsmoor, Phelps, & Davachi, 2017). Here, we report an opposite effect, presumably because of a crucial difference in the timing of the reward association. In contrast to our design, the nonrewarded task for which Patil and colleagues observed a memory enhancement was administered before the reward association phase. The enhancement of recognition memory was thus a retroactive effect of reward on the consolidation process, arising from postencoding interactions between the reward and memory systems. Comparing the study by Patil et al. with ours emphasizes that the timing of the reward association may determine the direction of the interaction between the ventral striatum, the VTA, and the hippocampus, and its consequences for delayed recall accuracy.

Reactivation of Reward Memory in a New Context Leads to Adjustment of Associated Reward Values

First, at encoding during the novel unrewarded picture–location association task, presentation of a previously HR-conditioned picture (now in a different context that no longer predicted reward) resulted in decreased BOLD signal in the ventral striatum and the hippocampus (Figure 3). Human participants have been shown to rely on relative rather than absolute values, with their choices and neural value representations reflecting contextual adaptation to previously learned values in the striatum (Klein et al., 2017) as well as in the SN/VTA (Hétu, Luo, D'Ardenne, Lohrenz, & Montague, 2017). The value signal in the striatum thus displays the characteristics of a relative value prediction error, and we found a similar relative value prediction error-like signal in the ventral striatum. In our task, the consequence of such memory-related relative value coding may have contributed to poorer learning of the novel picture–location associations. Because the ventral striatum has been shown to reflect the current value of a stimulus (Levy & Glimcher, 2012; Jocham et al., 2011; Lim, O'Doherty, & Rangel, 2011), we may interpret this result as a devaluation of the stimulus mediated by the ventral striatum and its functional connectivity with the SN/VTA (Figure 7), suggesting the involvement of the mesocorticolimbic dopaminergic pathway (Hauser, Eldar, & Dolan, 2017).

Downward Value Adjustment Spreads to Semantically Similar Stimuli

In the current experiment, we found that the ventral striatum and hippocampus expressed a negative reward prediction error-like signal for pictures from the entire semantic category (sea or savanna) that had been paired with a high level of reward (Figures 3 and 9).

Previous studies reported that the ventral striatum prediction error during reward learning incorporates information related not only to the predictive cue (here, the specific rewarded pictures) but also to perceptually similar cues (encoded by the hippocampus), accounting for the generalization of reward expectation (Aberg, Doell, & Schwartz, 2015; Gerraty, Davidow, Wimmer, Kahn, & Shohamy, 2014; Kahnt, Park, Burke, & Tobler, 2012). The degree of perceptual generalization at a transfer test depends on dopaminergic transmission and correlates with activation in the hippocampus (Kahnt & Tobler, 2016). Our results suggest that the fMRI signal related to reward memory in the ventral striatum and the hippocampus may also represent a semantic (rather than purely perceptual) dimension of generalization, analogous to the generalization previously reported in the fear-learning domain (Dunsmoor, White, & LaBar, 2011). Moreover, these effects also influenced the connectivity between the SN/VTA and the ventral striatum (Figure 7), which could indicate a role of the dopaminergic circuit in value generalization (Bunzeck, Dayan, Dolan, & Duzel, 2010).

Downward Value Adjustment May Impede Subsequent Associative Memory Formation

At odds with the literature showing a positive modulation of memory by reward, we found that learning of new associations with previously highly rewarded material led, if anything, to worse rather than better memory, and that the interaction between the hippocampus and the ventral striatum appeared to mediate this effect.

Specifically, we found decreased functional connectivity between the ventral striatum and the left hippocampus during encoding of picture–location associations with previously highly rewarded pictures (Figure 7). In addition, delayed recall of associations with pictures carrying a history of HR, 24 hr after encoding, was accompanied by decreased functional connectivity between the left hippocampus and the vmPFC as well as the ventral striatum (Figure 9), structures of the brain's automatic valuation system (Lebreton, Jorge, Michel, Thirion, & Pessiglione, 2009), and between the right hippocampus and the right OFC and inferior temporal gyrus. These results suggest impaired new memory formation when the signal in value-related brain areas decreases because of relative cue-value adjustment, resulting in a decrease in memory accuracy after night-time consolidation, the time when an effect of memory modulation by reward has been shown to emerge (Igloi et al., 2015; Javadi, Tolat, & Spiers, 2015; Adcock et al., 2006). We propose that the magnitude of dopaminergic ventral striatum activation modulated hippocampal activity, consistent with the role of dopamine in memory modulation (Miendlarzewska et al., 2016; Thomas, 2015; Lisman et al., 2011; Lisman & Grace, 2005). To summarize, although activation in the reward circuit was lower for HR-history stimuli (compared to LR stimuli) on both days of the experiment, the behavioral trend emerged only after 24 hr. Such delayed effects are consistent with a large body of evidence indicating that memory consolidation mechanisms, including the lasting modulation of memory retention by reward, may require sleep (Igloi et al., 2015; Javadi et al., 2015).

Limitation

One limitation relates to the design of the study, in which HR and LR trials were grouped into mini-blocks. This structure was chosen to minimize the risk that activation of the reward system in response to an HR trial would spill over to an immediately following LR trial. Yet, blocked presentation of rewarding events could potentially increase expectation (i.e., reduce reward prediction error-like activity) during a mini-block. We therefore analyzed the time course of ventral striatum activity within mini-blocks and found no evidence that the average decrease in activity for HR compared to LR pictures could be because of dynamic changes (e.g., in prediction error) unfolding over the course of the mini-blocks.

Conclusion

Despite the potentially important implications for learning in an educational context, the consequences of past reward on new memory formation have not received much attention in cognitive neuroscience. Our findings go beyond previous human studies of this phenomenon to show that the downward value adjustment after discontinuation of rewards may generalize to semantically related stimuli and that this undermining effect can impact both encoding and postconsolidation recall activity of the hippocampus.

We found that, although participants reported increased subjective preference for ConHR pictures, they showed no learning improvement and reduced ventral striatum and hippocampal activation for these same pictures as well as for semantically similar new exemplars (compared to LR pictures) during the subsequent object–location task and at recall 24 hr later.

Our study provides neural evidence to help explain the paradoxical undermining effect of monetary incentives on subsequent performance that has been observed in human volunteers in some situations (Murayama et al., 2010). We show that the removal of the incentive leads to a reduction in relative reward (negative prediction error) in the ventral striatum.

Author Contributions

Conceptualization, E. M., K. A., S. S.; Methodology, E. M., K. A., D. B., S. S.; Software, E. M. and S. S.; Formal Analysis, E. M.; Investigation, E. M.; Writing–original draft, E. M.; Writing–review & editing, E. M., K. A., S. S., D. B.; Visualization, E. M.; Supervision, S. S., D. B.; Funding Acquisition, S. S.

Funding Information

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (http://dx.doi.org/10.13039/501100001711), Grant numbers: 320030-135653, 320030-159862, 51NF40-104897.

Acknowledgments

This work was supported by the National Center of Competence in Research Affective Sciences financed by the Swiss National Science Foundation (grant number: 51NF40-104897 to S. S.) and hosted by the University of Geneva and by the Swiss National Science Foundation (grant numbers: 320030-159862 and 320030-135653 to S. S.).

Reprint requests should be sent to Sophie Schwartz, Department of Basic Neuroscience, University of Geneva, 211132 Geneva, Switzerland, or via e-mail: Sophie.Schwartz@unige.ch.

REFERENCES

Aberg, K. C., Doell, K. C., & Schwartz, S. (2015). Hemispheric asymmetries in striatal reward responses relate to approach–avoidance learning and encoding of positive–negative prediction errors in dopaminergic midbrain regions. Journal of Neuroscience, 35, 14491–14500.

Adcock, R. A., Thangavel, A., Whitfield-Gabrieli, S., Knutson, B., & Gabrieli, J. D. E. (2006). Reward-motivated learning: Mesolimbic activation precedes memory formation. Neuron, 50, 507–517.

Alba-Ferrara, L., Müller-Oehring, E. M., Sullivan, E. V., Pfefferbaum, A., & Schulte, T. (2015). Brain responses to emotional salience and reward in alcohol use disorder. Brain Imaging and Behavior, 10, 136–146.

Atherton, L. A., Dupret, D., & Mellor, J. R. (2015). Memory trace replay: The shaping of memory consolidation by neuromodulation. Trends in Neurosciences, 38, 560–570.

Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3, 12–28.

Bartra, O., McGuire, J. T., & Kable, J. W. (2013). The valuation system: A coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage, 76, 412–427.

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck Depression Inventory-II (Vol. 78, pp. 490–498). San Antonio, TX: Psychological Corporation.

Bouton, M. E. (2007). Learning and behavior. Sunderland, MA: Sinauer Associates.

Bridge, D. J., & Voss, J. L. (2014). Hippocampal binding of novel information with dominant memory traces can support both memory stability and change. Journal of Neuroscience, 34, 2203–2213.

Bunzeck, N., Dayan, P., Dolan, R. J., & Duzel, E. (2010). A common mechanism for adaptive scaling of reward and novelty. Human Brain Mapping, 31, 1380–1394.

Carver, C. S., & White, T. L. (1994). Behavioral inhibition, behavioral activation, and affective responses to impending reward and punishment: The BIS/BAS scales. Journal of Personality and Social Psychology, 67, 319.

Cohen, N., Pell, L., Edelson, M. G., Ben-Yakov, A., Pine, A., & Dudai, Y. (2014). Peri-encoding predictors of memory encoding and consolidation. Neuroscience & Biobehavioral Reviews, 50, 128–142.

D'Ardenne, K., McClure, S. M., Nystrom, L. E., & Cohen, J. D. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science, 319, 1264–1267.

Delplanque, S., N'diaye, K., Scherer, K., & Grandjean, D. (2007). Spatial frequencies or emotional effects? A systematic measure of spatial frequencies for IAPS pictures by a discrete wavelet analysis. Journal of Neuroscience Methods, 165, 144–150.

Duarte, A., Henson, R. N., Knight, R. T., Emery, T., & Graham, K. S. (2010). Orbito-frontal cortex is necessary for temporal context memory. Journal of Cognitive Neuroscience, 22, 1819–1831.

Dudai, Y. (2012). The restless engram: Consolidations never end. Annual Review of Neuroscience, 35, 227–247.

Dudai, Y., Karni, A., & Born, J. (2015). The consolidation and transformation of memory. Neuron, 88, 20–32.

Dunsmoor, J. E., White, A. J., & LaBar, K. S. (2011). Conceptual similarity promotes generalization of higher order fear learning. Learning & Memory, 18, 156–160.

Feinberg, D., Moeller, S., Smith, S. M., Auerbach, E., Ramanna, S., Gunther, M., et al. (2010). Multiplexed echo planar imaging for sub-second whole brain FMRI and fast diffusion imaging. PLoS One, 5, e15710.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. J. (1994). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2, 189–210.

Garrison, J., Erdeniz, B., & Done, J. (2013). Prediction error in reinforcement learning: A meta-analysis of neuroimaging studies. Neuroscience & Biobehavioral Reviews, 37, 1297–1310.

Gerraty, R. T., Davidow, J. Y., Wimmer, G. E., Kahn, I., & Shohamy, D. (2014). Transfer of learning relates to intrinsic connectivity between hippocampus, ventromedial prefrontal cortex, and large-scale networks. Journal of Neuroscience, 34, 11297–11303.

Gruber, M. J., Gelman, B. D., & Ranganath, C. (2014). States of curiosity modulate hippocampus-dependent learning via the dopaminergic circuit. Neuron, 1–11.

Haber, S. N., & Knutson, B. (2010). The reward circuit: Linking primate anatomy and human imaging. Neuropsychopharmacology, 35, 4–26.

Hauser, T. U., Eldar, E., & Dolan, R. J. (2017). Separate mesocortical and mesolimbic pathways encode effort and reward learning signals. Proceedings of the National Academy of Sciences, U.S.A., 114, E7395–E7404.

Hétu, S., Luo, Y., D'Ardenne, K., Lohrenz, T., & Montague, P. R. (2017). Human substantia nigra and ventral tegmental area involvement in computing social error signals during the ultimatum game. Social Cognitive and Affective Neuroscience, 12, 1972–1982.

Horner, A. J., Bisby, J. A., Bush, D., Lin, W.-J., & Burgess, N. (2015). Evidence for holistic episodic recollection via hippocampal pattern completion. Nature Communications, 6, 7462.

Igloi, K., Gaggioni, G., Sterpenich, V., & Schwartz, S. (2015). A nap to recap or how reward regulates hippocampal–prefrontal memory networks during daytime sleep in humans. eLife, 4, e07903.

Javadi, A., Tolat, A., & Spiers, H. J. (2015). Sleep enhances a spatially mediated generalization of learned values. Learning & Memory, 22, 532–536.

Jocham, G., Klein, T. A., & Ullsperger, M. (2011). Dopamine-mediated reinforcement learning signals in the striatum and ventromedial prefrontal cortex underlie value-based choices. Journal of Neuroscience, 31, 1606–1613.

Kahnt, T., Park, S. Q., Burke, C. J., & Tobler, P. N. (2012). How glitter relates to gold: Similarity-dependent reward prediction errors in the human striatum. Journal of Neuroscience, 32, 16521–16529.

Kahnt, T., & Tobler, P. N. (2016). Dopamine regulates stimulus generalization in the human hippocampus. eLife, 5, 1–20.

Klein, T. A., Ullsperger, M., & Jocham, G. (2017). Learning relative values in the striatum induces violations of normative decision making. Nature Communications, 8, 16033.

Knutson, B., Fong, G. W., Adams, C. M., Varner, J. L., & Hommer, D. (2001). Dissociation of reward anticipation and outcome with event-related fMRI. NeuroReport, 12, 3683–3687.

Kringelbach, M. L., & Berridge, K. C. (2016). Neuroscience of reward, motivation, and drive. In Recent developments in neuroscience research on human motivation (Advances in Motivation and Achievement, Vol. 19, pp. 23–35). Emerald Group Publishing Limited.

Lebreton, M., Jorge, S., Michel, V., Thirion, B., & Pessiglione, M. (2009). An automatic valuation system in the human brain: Evidence from functional neuroimaging. Neuron, 64, 431–439.

Levy, D. J., & Glimcher, P. W. (2012). The root of all value: A neural common currency for choice. Current Opinion in Neurobiology, 22, 1027–1038.

Lim, S.-L., O'Doherty, J. P., & Rangel, A. (2011). The decision value computations in the vmPFC and striatum use a relative value code that is guided by visual attention. Journal of Neuroscience, 31, 13214–13223.

Lisman, J., & Grace, A. (2005). The hippocampal–VTA loop: Controlling the entry of information into long-term memory. Neuron, 46, 703–713.

Lisman, J., Grace, A., & Duzel, E. (2011). A neoHebbian framework for episodic memory; role of dopamine-dependent late LTP. Trends in Neurosciences, 34, 536–547.

Liu, X., Hairston, J., Schrier, M., & Fan, J. (2011). Common and distinct networks underlying reward valence and processing stages: A meta-analysis of functional neuroimaging studies. Neuroscience & Biobehavioral Reviews, 35, 1219–1236.

Loh, E., Kumaran, D., Koster, R., Berron, D., Dolan, R., & Duzel, E. (2016). Context-specific activation of hippocampus and SN/VTA by reward is related to enhanced long-term memory for embedded objects. Neurobiology of Learning and Memory, 134, 65–77.

Maldjian, J. A., Laurienti, P. J., Kraft, R. A., & Burdette, J. H. (2003). An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage, 19, 1233–1239.

Manelis, A., Reder, L. M., & Hanson, S. J. (2012). Dynamic changes in the medial temporal lobe during incidental learning of object–location associations. Cerebral Cortex, 22, 828–837.

Mazaika, P., Hoeft, F., Glover, G. H., & Reiss, A. L. (2009). Methods and software for fMRI analysis for clinical subjects. Neuroimage, 47, S58.

Miendlarzewska, E. A., Bavelier, D., & Schwartz, S. (2016). Influence of reward motivation on human declarative memory. Neuroscience & Biobehavioral Reviews, 61, 156–176.

Miendlarzewska, E. A., Ciucci, S., Cannistraci, C. V., Bavelier, D., & Schwartz, S. (2018). Reward-enhanced encoding improves relearning of forgotten associations. Scientific Reports, 8, 8557.

Murayama, K., Matsumoto, M., Izuma, K., & Matsumoto, K. (2010). Neural basis of the undermining effect of monetary reward on intrinsic motivation. Proceedings of the National Academy of Sciences, U.S.A., 107, 20911–20916.

Murty, V. P., & Adcock, R. A. (2014). Enriched encoding: Reward motivation organizes cortical networks for hippocampal detection of unexpected events. Cerebral Cortex, 24, 2160–2168.

Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., & Hikosaka, O. (2004). Dopamine neurons can represent context-dependent prediction error. Neuron, 41, 269–280.

O'Doherty, J. P., Buchanan, T. W., Seymour, B., & Dolan, R. J. (2006). Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron, 49, 157–166.

Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nature Communications, 6, 8096.

Patil, A., Murty, V. P., Dunsmoor, J. E., Phelps, E. A., & Davachi, L. (2017). Reward retroactively enhances memory consolidation for related items. Learning & Memory, 24, 65–69.

Peciña, S., & Berridge, K. C. (2013). Dopamine or opioid stimulation of nucleus accumbens similarly amplify cue-triggered "wanting" for reward: Entire core and medial shell mapped as substrates for PIT enhancement. European Journal of Neuroscience, 37, 1529–1540.

Redondo, R. L., & Morris, R. G. M. (2011). Making memories last: The synaptic tagging and capture hypothesis. Nature Reviews Neuroscience, 12, 17–30.

Rigoli, F., Friston, K. J., & Dolan, R. J. (2016). Neural processes mediating contextual influences on human choice behaviour. Nature Communications, 7, 12416.

Ripollés, P., Marco-Pallarés, J., Alicart, H., Tempelmann, C., Rodríguez-Fornells, A., & Noesselt, T. (2016). Intrinsic monitoring of learning success facilitates memory encoding via the activation of the SN/VTA–hippocampal loop. eLife, 5, 1–36.

Rorden, C., Karnath, H.-O., & Bonilha, L. (2007). Improving lesion-symptom mapping. Journal of Cognitive Neuroscience, 19, 1081–1088.

Samanez-Larkin, G. R., & Knutson, B. (2015). Decision making in the ageing brain: Changes in affective and motivational circuits. Nature Reviews Neuroscience, 16, 278–289.

Schneider, C. A., Rasband, W. S., & Eliceiri, K. W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nature Methods, 9, 671–675.

Schott, B. H., Seidenbecher, C. I., Fenker, D. B., Lauer, C. J., Bunzeck, N., Bernstein, H.-G., et al. (2006). The dopaminergic midbrain participates in human episodic memory formation: Evidence from genetic imaging. Journal of Neuroscience, 26, 1407–1417.

Seitz, A., Kim, D., & Watanabe, T. (2009). Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron, 61, 700–707.

Sescousse, G., Caldú, X., Segura, B., & Dreher, J. C. (2013). Processing of primary and secondary rewards: A quantitative meta-analysis and review of human functional neuroimaging studies. Neuroscience & Biobehavioral Reviews, 37, 681–696.

Singer, A., & Frank, L. (2009). Rewarded outcomes enhance reactivation of experience in the hippocampus. Neuron, 64, 910–921.

Sommer, T., Rose, M., Gläscher, J., Wolbers, T., & Büchel, C. (2005). Dissociable contributions within the medial temporal lobe to encoding of object–location associations. Learning & Memory, 12, 343–351.

Spielberger, C. (1983). State–trait anxiety inventory. Palo Alto, CA: Consulting Psychologists Press.

Takashima, A., Nieuwenhuis, I. L. C., Jensen, O., Talamini, L. M., Rijpkema, M., & Fernández, G. (2009). Shift from hippocampal to neocortical centered retrieval network with consolidation. Journal of Neuroscience, 29, 10087–10093.

Thomas, S. A. (2015). Neuromodulatory signaling in hippocampus-dependent memory retrieval. Hippocampus, 25, 415–431.

Whitfield-Gabrieli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity, 2, 125–141.

Wimmer, G. E., Braun, E. K., Daw, N. D., & Shohamy, D. (2014). Episodic memory encoding interferes with reward learning and decreases striatal prediction errors. Journal of Neuroscience, 34, 14901–14912.

Wimmer, G. E., & Buechel, C. (2016). Reactivation of reward-related patterns from single past episodes supports memory-based decision making. Journal of Neuroscience, 36, 2868–2880.

Wimmer, G. E., Daw, N., & Shohamy, D. (2012). Generalization of value in reinforcement learning by humans. European Journal of Neuroscience, 35, 1092–1104.

Wimmer, G. E., & Shohamy, D. (2012). Preference by association: How memory mechanisms in the hippocampus bias decisions. Science, 338, 270–273.

Wittmann, B. C., Schott, B. H., Guderian, S., Frey, J. U., Heinze, H.-J., & Düzel, E. (2005). Reward-related FMRI activation of dopaminergic midbrain is associated with enhanced hippocampus-dependent long-term memory formation. Neuron, 45, 459–467.

Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8, 665–670.