## Abstract

With multiple learning and memory systems at its disposal, the human brain can represent the past in many ways, from extracting regularities across similar experiences (incremental learning) to storing rich, idiosyncratic details of individual events (episodic memory). The unique information carried by these neurologically distinct forms of memory can bias our behavior in different directions, raising crucial questions about how these memory systems interact to guide choice and the factors that cause one to dominate. Here, we devised a new approach to estimate how decisions are independently influenced by episodic memories and incremental learning. Furthermore, we identified a biologically motivated factor that biases the use of different memory types—the detection of novelty versus familiarity. Consistent with computational models of cholinergic memory modulation, we find that choices are more influenced by episodic memories following the recognition of an unrelated familiar image but more influenced by incrementally learned values after the detection of a novel image. Together, this work provides a new behavioral tool enabling the disambiguation of key memory behaviors thought to be supported by distinct neural systems while also identifying a theoretically important and broadly applicable manipulation to bias the arbitration between these two sources of memories.

## INTRODUCTION

Decades of research have demonstrated that experiences are encoded within neurally distinct learning and memory systems (Poldrack et al., 2001; Squire & Zola, 1996; Knowlton, Squire, & Gluck, 1994; McDonald & White, 1993), each of which can bias future choices. For example, after reading a compelling article, you could later recall your experience reading a particular finding—an episodic memory (Tulving, 1983)—as well as increase your valuation of the journal itself and, consequently, the likelihood of your searching for new articles within it—incremental reward learning (Sutton & Barto, 1998). This multiple memory system framework has proven powerful, providing a structure for understanding learning and memory disorders (Small, Schobel, Buxton, Witter, & Barnes, 2011; Yassa, Mattfeld, Stark, & Stark, 2011; Frank, Seeberger, & O'reilly, 2004; Knowlton et al., 1994) and driving recent investigation into the distinct learning mechanisms that underlie each separate system. Understanding the interactions between these forms of learning has important implications for how memory guides later behavior. However, most research so far has focused on studying each system in isolation, leaving open important questions about how they interact, how they guide decisions, and which factors bias their use.

Incremental learning of value is often studied by characterizing how values are updated through experience (Daw, 2011)—a form of reinforcement learning thought to depend on dopamine release in the striatum (Schultz, 1998, 2016; Schonberg et al., 2010; Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006; Barto, 1995). Experiments in this field tend to present the same choice options across hundreds of trials. Each choice is reinforced with a reward or loss. The difference between the obtained outcome and the choice's expected value (reward prediction error) is thought to be signaled by dopamine release (Schultz, 1998; Sutton & Barto, 1998), so that unexpected outcomes incrementally nudge the values attributed to each option toward the experienced outcome. In this way, incremental learning extracts the decontextualized value of each choice option by averaging across experiences and discarding the individual episodes.
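The delta-rule update described above can be sketched in a few lines of Python (a generic illustration of incremental learning, not code from this study; the function names and learning rate are ours):

```python
# Incremental (delta-rule) value learning: each outcome nudges the
# option's stored value toward the experienced reward, so individual
# episodes are discarded in favor of a recency-weighted average.
def update_value(value, reward, alpha=0.1):
    prediction_error = reward - value        # "reward prediction error"
    return value + alpha * prediction_error  # nudge toward the outcome

# Repeated rewards of 1.0 pull an initial value of 0.0 upward.
v = 0.0
for _ in range(20):
    v = update_value(v, 1.0)
print(round(v, 3))  # → 0.878, approaching the true average of 1.0
```

Because each update moves the estimate only a fraction `alpha` of the way toward the latest outcome, recent experiences are weighted more heavily than older ones, which is what lets the estimate track a nonstationary environment.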

A separate line of research has focused on episodic memory and its dependence on hippocampal processes (Tulving & Markowitsch, 1998; Cohen & Eichenbaum, 1993; Squire, 1992). This research tends to assess the influence of past experiences on behavior by asking participants to directly reflect on them. Experiments often begin with an encoding phase in which participants are presented with a unique stimulus on each trial. This is followed by a retrieval phase in which new stimuli are intermixed with those from the encoding session (old stimuli). Because old stimuli were only presented once before, they unambiguously cue memory for a single episode, which participants are asked to recall. Damage to the hippocampus robustly impairs both the encoding of these episodic memories and their retrieval (Gilboa et al., 2006; Clark, Broadbent, Zola, & Squire, 2002; Nadel & Moscovitch, 1997; Scoville & Milner, 1957). Analogously, hippocampal fMRI activity has been related to the successful encoding (Ranganath et al., 2004; Davachi, Mitchell, & Wagner, 2003) and retrieval (Eldridge, Knowlton, Furmanski, Bookheimer, & Engel, 2000) of episodic memory in healthy populations, providing converging evidence for the hippocampus' involvement in episodic memory.

Unlike these experimental paradigms, in our everyday experiences, we are constantly tasked with choosing between options that could trigger the retrieval and use of distinct episodic memories as well as incrementally learned values. Which system will dominate in these less prescriptive contexts? Not only is this quandary frequently encountered, it also has important implications. For example, breaking persistent maladaptive behaviors, such as addiction, often requires overriding ingrained habitual responses, presumed to depend on striatal incremental learning (Volkow, Wang, Fowler, Tomasi, & Telang, 2011), with episodic memories of newly learned strategies, presumed to depend on the hippocampus (Tulving & Markowitsch, 1998). Thus, the challenge facing adaptive memory is often not just selecting the right memories but instead biasing behavior toward the most appropriate memory system.

At a neural level, one factor that may arbitrate between these systems is cholinergic regulation of hippocampal-dependent memory. Neuromodulation plays a central role in theories of incremental learning, with dopamine regulating when new learning should occur (Barto, 1995). Cholinergic neuromodulation has also been hypothesized to play a somewhat analogous role in hippocampal memory; it has been argued to bias when the hippocampus engages in the encoding versus the retrieval of episodic memories (Easton, Douchamps, Eacott, & Lever, 2012; Meeter, Murre, & Talamini, 2004; Hasselmo, Wyble, & Wallenstein, 1996). Specifically, neurocomputational models (Meeter et al., 2004; Hasselmo et al., 1996) and empirical findings (Vandecasteele et al., 2014; Douchamps, Jeewajee, Blundell, Burgess, & Lever, 2013) suggest that acetylcholine levels maintain prolonged hippocampal “states,” which last for a few seconds at a time (Pabst et al., 2016; Hasselmo & Fehlau, 2001). The value of these states lies in their ability to accommodate the competing computational demands of memory formation and retrieval (O'Reilly & McClelland, 1994), optimally timing each memory process based on contextual factors. Specifically, the recognition of familiar contexts decreases cholinergic transmission to favor further episodic memory retrieval in the presence of potential memory cues. Conversely, the detection of novel contexts increases cholinergic transmission (Giovannini et al., 2001) to favor episodic memory encoding in the presence of new and unexpected information. We have previously explored the consequences of this hypothesis for human behavior, demonstrating that recent exposure to familiar images increases the ability to later recall unrelated associations (Patil & Duncan, 2018) and increases the likelihood that people use episodic memory cues to make decisions (Duncan & Shohamy, 2016). 
By contrast, recent exposure to novel images increases the subsequent formation of episodic memories, which can later be used to guide future actions (Duncan & Shohamy, 2016).

Our earlier research, however, only assessed the impact of familiarity- and novelty-evoked states on episodic memories; memory cues were unambiguously associated with single past events. Thus, the important question remains—can novelty bias the type of memory used by individuals when they have multiple memory sources at their disposal? The cholinergic framework would predict that episodic memories are more likely to dominate over competing incrementally learned values to guide choice in familiar contexts, because recognizing one familiar cue primes episodic memory to take advantage of other familiar episodic memory cues.

To test this prediction, we modified our episodic decision-making paradigm (Duncan & Shohamy, 2016) so that people are now free to make choices using both individual episodic memories and incrementally learned values. The few established paradigms that assess both hippocampal- and striatal-mediated learning do so by pitting the systems against each other, reinforcing either the hippocampal- or striatal-dependent behavior at different points (Packard, 2009). This approach, however, offers limited insights into how people arbitrate between two sources of memories when both are viable sources of information. By contrast, here we combine elements of incremental reward learning and episodic memory tasks into a single paradigm in which both types of learning are simultaneously reinforced. Specifically, participants chose one of two cards dealt in a computerized card game. Each card had two dimensions: (1) a distinctive object, which was repeated at most once during the experiment, and (2) a deck color, which was present on every trial and probabilistically related to reward distributions. Thus, the participant could either use the distinctive objects to cue episodic memories of a specific card's values or they could use the colored decks by incrementally updating and refining their values across trials. Importantly, we controlled trial sequences and outcomes to decorrelate the object and deck values, allowing us to independently estimate the influence of each on participants' choices.

We used this paradigm to assess whether recent novelty and familiarity can arbitrate between the use of episodic memories and incrementally learned values. First, we demonstrated that people are more likely to select cards using episodic object memories following the successful retrieval and use of other object memories. Conversely, the use of object memories negatively impacted the subsequent use of incrementally learned deck values. These biases were robustly observed in both reward (Experiment 1A) and loss experiments (Experiment 1B), underscoring the power of the manipulation. Second, we further challenged the underlying mechanisms driving the bias using a contextual novelty manipulation; we displayed an unrelated novel or familiar scene image before dealing cards on each trial (Experiment 2). Despite the incidental relationship between the scene images and the task, familiar scenes increased the use of episodic memories of object values at the cost of incrementally learned deck values. This combination of results supports a neurally inspired model of memory in which accessing the contents of one system increases the subsequent accessibility of other unrelated memories stored within the same system.

## EXPERIMENT 1

In Experiments 1A and 1B we designed a new task to disambiguate the contributions of episodic and incremental learning to behavior. On each of a series of trials, participants chose between two cards for the chance to win a monetary reward. Each card had two features: a distinctive object and the color of the deck from which the card was drawn. We designed these features to tap distinct forms of memory. As in our previous work, each object repeated at most once in the task so that using memory for the objects required the rapid acquisition of an object value association (Duncan & Shohamy, 2016). Here, we added a second dimension to this choice problem—two decks (red and blue) that were available on every trial and were probabilistically associated with reward outcomes. This noisy and nonstationary relationship between decks and outcomes could be efficiently extracted by incrementally updating the values of decks across trials, forming a recency-weighted average of value across experience. We first confirmed that participants could employ both types of learning without experiencing trade-offs or interactions between them in both reward (Experiment 1A) and loss (Experiment 1B) experiments. We then tested our prediction that, when people have multiple types of memory at their disposal, recognizing familiar cards would increase the use of episodic memory on the next trial, potentially at the cost of incremental value learning.

### Methods

#### Participants

Sixty-one members of the Columbia University community (30 women, mean age = 20.2 years) participated in the study for pay ($12/hr + bonus earnings). Participants were divided between Experiment 1A (31 participants) and Experiment 1B (30 participants). All participants provided informed consent at the beginning of the study, and all procedures were approved by the Columbia Morningside ethics board.

#### Stimuli

Five hundred forty-three color images of different commonplace objects were used as stimuli. Additionally, two decks of virtual playing cards (red and blue decks) were generated. At any point in the experiment, one deck drew outcomes from a lucky distribution (average outcome = 63¢), whereas the other drew from an unlucky distribution (average outcome = 37¢; Table 1). The left versus right position of the card was randomly assigned for every trial.

Table 1. Distribution of Old Object Card Values by Deck Luckiness

|              | 0¢  | 20¢ | 40¢ | 60¢ | 80¢ | $1  | Mean |
|--------------|-----|-----|-----|-----|-----|-----|------|
| Lucky deck   | 16% | 18% | 16% | 18% | 17% | 15% | 49¢  |
| Unlucky deck | 17% | 16% | 17% | 16% | 16% | 16% | 49¢  |
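As a quick sanity check on Table 1, the mean old-card value implied by the lucky deck's distribution can be computed directly; the 0¢-to-$1 outcome bins are our reading of the table and should be treated as an assumption:

```python
# Expected old-object card value for the lucky deck (Table 1).
# Outcome bins of 0¢ through $1 in 20¢ steps are assumed here.
outcomes = [0.00, 0.20, 0.40, 0.60, 0.80, 1.00]
lucky_probs = [0.16, 0.18, 0.16, 0.18, 0.17, 0.15]

expected = sum(o * p for o, p in zip(outcomes, lucky_probs))
print(f"{expected:.2f}")  # → 0.49, matching the table's 49¢ mean
```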

#### Reinforcement Learning Model

We modified a temporal difference reinforcement learning model (Sutton & Barto, 1998) to derive tailored estimates of each participant's learning of both deck and object values based on their personal history of choices and experienced outcomes. This model assumes that the value, $Q$, of the chosen deck, $d_c$, is updated on every trial, $t$, based on the difference between the expected value of the chosen deck and the obtained reward, $r$. This difference is termed the prediction error, $\delta$:
$\delta_t = r_t - Q_{t-1}^{d_c}$
The degree to which this prediction error updates the value of the chosen deck depends on the learning rate parameter, $\alpha$:
$Q_t^{d_c} = Q_{t-1}^{d_c} + \alpha \delta_t$
As α approaches 1, the value of the chosen deck will approach the most recently received outcome. As α approaches 0, the value of the chosen deck will be minimally updated. Intermediate values of α will result in deck values that integrate across outcomes, reflecting an incremental learning process. We fit α to participants' behavior, restricting its range to be between 0 and 1.
The value of the unchosen deck, du, was not updated:
$Q_t^{d_u} = Q_{t-1}^{d_u}$
The deck values and the participant's history with the object on the card were used to compute the probability, $P$, of selecting the card from the red deck using a softmax (logistic) choice rule:
$P(d_{\mathrm{red}}) = \frac{1}{1 + \exp\left(-\left[\beta_d\left(Q^{\mathrm{red}} - Q^{\mathrm{blue}}\right) + \beta_o V_o + \beta_r O\right]\right)}$
Here, the inverse temperature parameter, $\beta_d$, controls how closely the difference in deck values tracks choices. $\beta_o$ controls the influence of the old object card's value, $V_o$. $V_o$ is positive when the old card is in the red deck, negative when the old card is in the blue deck, and 0 when there is no old card. $\beta_r$ controls how much participants prefer cards with old objects, $O$ (0 = red deck card is new; 1 = red deck card is old). Each $\beta$ parameter was fit to participants' choices with a lower bound of 0.
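Under the definitions above, the update and choice equations can be rendered as a short Python sketch (our illustration; the function and variable names are not from the original code):

```python
import math

def update_chosen_deck(q, reward, alpha):
    """Delta-rule update applied only to the chosen deck's Q-value."""
    return q + alpha * (reward - q)

def choice_prob_red(q_red, q_blue, beta_d, beta_o, beta_r, v_o, o):
    """Softmax probability of choosing the red-deck card.

    v_o: old-object value, positive when the old card is in the red deck,
         negative when it is in the blue deck, 0 when no old card is dealt.
    o:   1 if the red-deck card carries an old object, else 0.
    """
    logit = beta_d * (q_red - q_blue) + beta_o * v_o + beta_r * o
    return 1.0 / (1.0 + math.exp(-logit))

# With equal deck values and no old card, the model is indifferent:
print(choice_prob_red(0.5, 0.5, 3.0, 1.0, 1.0, 0.0, 0))  # → 0.5
```

Note that the old-object terms enter the same logit as the deck-value difference, which is what allows the model to estimate the independent pull of each memory source on a single choice.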

We estimated the four free parameters ($\alpha$, $\beta_d$, $\beta_o$, and $\beta_r$) for each participant by minimizing the sum of the negative log likelihoods of their choices given the estimated probability, $P$, of each choice using constrained nonlinear optimization (fmincon, MATLAB). We repeated the search from five random starting points for all parameters to avoid local optima and selected the iteration that resulted in the lowest negative log likelihood. To generate Q-value regressors, we used the average $\alpha$ across participants to generate trial-by-trial Q-value estimates based on each participant's personal trial sequence. This approach reduces overfitting and the noisiness in individual participants' estimates (Schönberg, Daw, Joel, & O'Doherty, 2007; Daw, O'Doherty, Dayan, Seymour, & Dolan, 2006).
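The fitting procedure could be reproduced along the following lines, substituting scipy's `minimize` for MATLAB's `fmincon`; the trial data format, the upper bounds on the $\beta$ parameters, and the initial deck values are assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, trials):
    """Summed -log P(choice) across one participant's trials.

    trials: list of dicts with keys 'reward', 'chose_red', 'v_o', 'o'
            (a hypothetical data format, not the authors').
    """
    alpha, beta_d, beta_o, beta_r = params
    q = {"red": 0.5, "blue": 0.5}  # assumed initial deck values
    nll = 0.0
    for t in trials:
        logit = (beta_d * (q["red"] - q["blue"])
                 + beta_o * t["v_o"] + beta_r * t["o"])
        p_red = 1.0 / (1.0 + np.exp(-logit))
        p_choice = p_red if t["chose_red"] else 1.0 - p_red
        nll -= np.log(p_choice + 1e-12)  # guard against log(0)
        chosen = "red" if t["chose_red"] else "blue"
        q[chosen] += alpha * (t["reward"] - q[chosen])  # chosen deck only
    return nll

def fit_participant(trials, n_restarts=5, seed=0):
    """Constrained fit repeated from random starting points."""
    rng = np.random.default_rng(seed)
    bounds = [(0, 1), (0, 10), (0, 10), (0, 10)]  # alpha in [0,1]; betas >= 0
    best = None
    for _ in range(n_restarts):
        x0 = [rng.uniform(lo, min(hi, 2.0)) for lo, hi in bounds]
        res = minimize(neg_log_likelihood, x0, args=(trials,), bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best
```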

#### Stimuli

One hundred ninety-five scenes and 460 objects were used as stimuli. Five scenes were randomly assigned to the “familiar” condition, and participants were preexposed to them before starting the experiment. Additionally, two decks of virtual playing cards (red and blue decks) were generated. These decks drew from the same lucky and unlucky distributions as were used in Experiments 1A and 1B (Table 1).

#### Procedure

Participants performed the same card task as in Experiment 1A. The only modifications were the insertion of novel and familiar scenes and a reduction in the total number of trials. Specifically, each choice was preceded by the 1-sec presentation of a novel or familiar scene (referred to for the participants as a “decorative mat”). Participants were told that the purpose of the mat was to prepare them for the upcoming cards. The scene remained on the screen for the subsequent 1.5 sec decision period. Participants were preexposed to the five familiar scenes in a brief task immediately preceding the card task; each scene was presented five times (randomly ordered), and participants were asked to indicate whether each image displayed an indoor or outdoor scene.

We designed the experiment to determine whether incidental contextual novelty influences which type of memory is used to make value-based decisions. A challenge for this aim is that incremental learning, by definition, occurs over repeated experiences; thus, manipulations of incremental learning are best assessed across trials. With this in mind, we divided the experiment into two blocks, which were designed to manipulate the use of episodically learned values, but which could be used to measure how our manipulation impacts both episodic and incremental memory use. Specifically, we used the results of our related prior experiments to position contextual scenes such that their combined influence during object value encoding (trials with two new cards) and retrieval (trials with one old card) would either drive participants toward or away from using episodic memory.

We previously found that novel images enhance object value encoding, whereas familiar images enhance object value retrieval. We used this manipulation to create blocks that enhance either encoding or retrieval of episodic memories to guide choice. We created “proepisodic” blocks by having novel scenes always precede new–new trials, with the aim of enhancing episodic encoding of the new cards (Figure 4A) and having familiar scenes precede old–new trials, with the aim of enhancing the retrieval and use of object value memories. In “antiepisodic” blocks, we did the reverse: Familiar scenes always preceded new–new trials, whereas novel scenes preceded old–new trials. Participants performed one block of each (approximately 150 trials), and the order of blocks was counterbalanced across participants. Because each scene was presented immediately before and concurrently with specific cards, the scenes could additionally become associated with cards in memory, and thus, their content could also prime memories for card values. To avoid this possibility, the trial sequences were designed such that the same scene was never present during both the learning of a card's value and the subsequent retrieval and use of that value. Additionally, when possible, novel/familiar scene status was reversed for pairs of participants such that similar choices for one participant would be made in the opposite experimental condition as another participant.

Figure 4.

Design and results of Experiment 2. (A) Trials were divided into blocks designed to manipulate the use of episodically learned values. In proepisodic blocks, object values were always encoded in the context of novel scenes. These objects were then later repeated in the context of familiar scenes. The antiepisodic blocks had the reverse scene contingencies. Scenes were described as “decorative mats” and were not predictive of card content or value. (B) Object values were more often used in pro- as compared with antiepisodic blocks, whereas deck learning was superior in anti- as compared with proepisodic blocks. The graphs on the left panel plot how well choices (old vs. new object) were predicted by the old object values in each memory block. The graphs in the right panel plot how likely the lucky deck was to be chosen based on the number of trials that elapsed since a reversal in deck luckiness. *p < .05.


We hypothesized that the novel/familiar quality of the scene itself, regardless of the actual content of previous associations with the scene, would influence how people use memory to make choices, with novel scenes enhancing encoding of episodic value and familiar scenes enhancing retrieval of episodic value. In line with the results of Experiments 1A and 1B, we additionally hypothesized that modulating the encoding and use of episodic memory would impact the extent to which participants incrementally learn about deck values.

### Results and Discussion

As with Experiment 1, participants' choices were influenced by both the previously experienced values of object cards (β = 2.05, 95% CI = [1.64, 2.47], p < .00001) and by deck luckiness (β = 0.18, 95% CI = [0.10, 0.26], p < .0001). Additionally, the use of object and deck values was not significantly correlated across participants, r(36) = .08, p = .62, suggesting that participants were able to independently use both types of memories to guide their choices. Object value was a nonsignificant negative predictor of choice in 2 of 38 participants, and these participants were removed from subsequent analyses. Including these participants does not change the pattern of results presented below.

As predicted, the degree to which participants' choices relied on episodic versus incremental learning differed as a function of the experimental block manipulation. A direct comparison of choices in the two conditions revealed that participants were significantly more likely to base their choices on memory for distinct object value experiences in the proepisodic block as compared with the antiepisodic block (β = 0.50, p = .03; Figure 4B). Conversely, in the antiepisodic block, participants were more likely to use incrementally learned deck values to guide their choices (β = −0.013, 95% CI = [−0.03, 0], p = .05; Figure 4B), as measured by a shallower learning curve in the proepisodic block as compared with the antiepisodic block. Thus, when choices can be driven by either episodic or incremental learning, familiar contexts enhance the likelihood that people's choices will be guided by episodically rather than incrementally learned value. Importantly, familiarity was manipulated in a contextual feature, which was incidental to the primary card game; thus, these results unambiguously support a general state mechanism over a selective attention mechanism.

## GENERAL DISCUSSION

With multiple distinct learning and memory systems at its disposal (Poldrack et al., 2001; Squire & Zola, 1996; Knowlton et al., 1994; McDonald & White, 1993), the human brain has the capacity to represent past experiences in several ways, from extracting regularities across similar experiences and distilling decontextualized values to storing rich, idiosyncratic details of individual events. Far from redundant, these different types of memory representations have the potential to drive behavior in different directions, raising crucial questions about how and when different memory systems guide choices. Here, we devised a new approach to estimate the independent influences of episodic memories and incremental learning on people's choices. Designing a task in which both sources of value learning could adaptively guide behavior allowed us to also identify a factor that arbitrates between the use of these different types of memory—recent mnemonic processing. Specifically, people were more likely to use episodic memories on the trials following retrieval of an episodic memory, even when the recently retrieved memory had no bearing on the task at hand.

Our results suggest that the simple act of episodic retrieval primes our brains to retrieve other, ostensibly unrelated episodic memories within the ensuing seconds. The demonstration of this behavioral phenomenon confirms the predictions of cholinergic models of hippocampal function (Easton et al., 2012; Meeter et al., 2004; Hasselmo et al., 1996). According to these models, cholinergic modulation of the hippocampus establishes sustained biases, which shape hippocampal processing for seconds (Pabst et al., 2016; Hasselmo & Fehlau, 2001). Higher cholinergic levels, evoked by detecting novel contexts (Giovannini et al., 2001), are thought to bias the hippocampus toward forming distinctive memories by suppressing the reactivation of related associations. Conversely, lower cholinergic levels, evoked by recognizing familiar contexts, are thought to bias the hippocampus toward memory reactivation, such as the value associated with a particular object card. Thus, this framework predicts that use of episodic memories should be increased following familiarity or in the presence of familiar contexts. Consistent with this prediction, our prior research has demonstrated that recent recognition of familiar images improves people's ability to recall other associations (Patil & Duncan, 2018) and use memories to guide decisions (Duncan & Shohamy, 2016). Our prior research, however, was restricted to testing situations where only episodic memory cues are available to guide behavior. We, thus, provide an important extension here by demonstrating that these same contextual familiarity manipulations have the power to bias the competition between different types of memory in more realistic (and complex) situations where both memory systems vie for behavior control.

We demonstrated this in two ways: First, people were more likely to use episodic object memories on trials following the successful use of another episodic object memory (Experiments 1A and 1B), and second, these memories were also more often used following the incidental presentation of a familiar image (Experiment 2). The second finding provides stronger support for our memory state framework, because multiple factors could contribute to the within-task autocorrelation in object memory use observed in the first experiments. For example, successfully using an object memory may bias attention toward the objects on cards on the next trial (hence, away from deck color). Of note, the magnitude of memory modulation observed in Experiments 1A and 1B was substantially larger than that observed in Experiment 2. This difference suggests that attention-based biases may also contribute to the original effects. However, it should be noted that we manipulated familiarity in very different ways across experiments and that the within-task manipulations employed in the first experiments may have been particularly potent at evoking memory states. Specifically, in this manipulation, preceding trials were coded as “familiar” only when there was evidence that participants recognized and used the object memories. Conversely, in the incidental manipulation, contextual images were coded as “familiar” whenever they were repeated—a procedure that does not account for participants' recognition of the familiarity. Our previous work suggests that the objective presence of familiarity does not elicit episodic memory states as strongly as the subjective recognition of images (Patil & Duncan, 2018; Duncan & Shohamy, 2016; Duncan et al., 2012). 
Thus, the effects observed in the within-task manipulation may have also been strengthened by identifying the subjective mnemonic experience associated with our manipulation, a step that we could not take with the incidental manipulation because we had no behavioral index of the mnemonic experience triggered by the scene image.

Inspired by the cholinergic framework, we yoked recent familiarity manipulations to the episodic dimension of the task (e.g., proepisodic block), treating its impact on incremental learning use as a byproduct of episodic memory modulation. Novelty detection, though, triggers the release of multiple interacting neurochemicals (Avery & Krichmar, 2017; Schomaker & Meeter, 2015; Patel, Rossignol, Rice, & Machold, 2012; Mena-Segovia, Winn, & Bolam, 2008). Of particular relevance to incremental learning, salient, novel stimuli have been shown to elicit dopamine release (Lisman & Grace, 2005; Ljungberg, Apicella, & Schultz, 1992), and in the context of reinforcement learning, new cues evoke fast and robust phasic responses in putative dopamine neurons (Lak, Stauffer, & Schultz, 2016; Stauffer, Lak, Kobayashi, & Schultz, 2016). These early responses are not modulated by the cue's reinforcement history, unlike the smaller value-dependent responses observed a fraction of a second later (Lak et al., 2016). Given dopamine's role in reinforcement learning, value-insensitive novelty/salience responses have sparked hypotheses that novelty could either directly modify value learning by inflating the predicted value of new cues or more generally promote the exploration of these cues, without distorting prediction error computations (Guitart-Masip, Bunzeck, Stephan, Dolan, & Duzel, 2010; Wittmann, Daw, Seymour, & Dolan, 2008; Kakade & Dayan, 2002). Although the current study is not positioned to address the former hypothesis, it is notable that participants were less likely to explore new cards following the recognition of an old card in Experiments 1A and 1B. At first blush, this seems consistent with the hypothesis that novelty promotes exploration, but this interpretation is complicated by participants' general reluctance to explore novel options in these tasks.
Furthermore, increases in postnovelty exploration were not observed in the more controlled Experiment 2, suggesting that contextual novelty is not always sufficient to bias exploration.

Our demonstration that multiple forms of memory contribute to choice also has important implications for understanding basic mechanisms of decision-making. Prompted by the discovery that striatal dopamine release instantiates many of the properties of prediction errors (Bayer & Glimcher, 2005; Schultz, 1998), striatal learning mechanisms have been the primary target of reinforcement learning and decision-making research. More recently, however, there has been growing interest in hippocampal and episodic memory contributions to decision-making (Bornstein, Khaw, Shohamy, & Daw, 2017; Bornstein & Norman, 2017; Murty, FeldmanHall, Hunter, Phelps, & Davachi, 2016; Shadlen & Shohamy, 2016; Peters & Büchel, 2010), often envisioned either as a key component of model-based decision systems (Doll, Shohamy, & Daw, 2015; Wimmer, Daw, & Shohamy, 2012) or as a distinct control system (Gershman & Daw, 2017; Lengyel & Dayan, 2008). The deck dimension of our task was modeled after the sort of "two-armed bandit" tasks routinely used to assess striatal contributions to decision-making. Of note, when participants were given the option of using this source of information alongside episodic memories, they readily used episodic memories. In fact, object values were nearly as reliable a predictor of choice as deck values, despite participants only having a single prior experience with each object. This underscores the importance of incorporating episodic, one-shot learning mechanisms into theories of reinforcement learning. It also raises interesting questions about the balance between these different forms of learning across species. Are humans uniquely predisposed to use episodic over incrementally learned values, and if so, what benefits might be conveyed by this predisposition?

In summary, we used neural models of learning and memory to develop a new memory modulation approach, one that biases which memory system guides behavior. The manipulation we identified, eliciting memory recognition, was found to be contextual in nature and, as such, has the potential to impact behavior broadly. It is additionally notable that our approach appears to enhance rather than inhibit the use of episodic memories. Striatal-dependent incremental learning is thought to be particularly robust (Schwabe & Wolf, 2013), requiring fewer attentional and cognitive resources than episodic memory (Foerde, Knowlton, & Poldrack, 2006). Thus, identifying a factor that can increase the use of more fragile episodic memories has important implications for the common need to overcome striatal habits. Lastly, identifying factors that influence the use of different memory systems is a much-needed complement to recent research investigating the arbitration between these memory systems during learning (Lee, O'Doherty, & Shimojo, 2015). Combined, these learning and use factors will provide new insights into how people adaptively make use of multiple forms of memory and how this arbitration can be restored when it breaks down.

Reprint requests should be sent to Katherine Duncan, 100 St George St., 4th Floor, Sidney Smith Building, Toronto, Ontario M5S 3G3, Canada, or via e-mail: duncan@psych.utoronto.ca.

## REFERENCES

Avery, M. C., & Krichmar, J. L. (2017). Neuromodulatory systems and their interactions: A review of models, theories, and experiments. *Frontiers in Neural Circuits*, 11, 108.

Barto, A. (1995). Adaptive critic and the basal ganglia. In J. C. Houk, J. L. Davis, & D. Beiser (Eds.), *Models of information processing in the basal ganglia* (pp. 215–232). Cambridge, MA: MIT Press.

Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. *Neuron*, 47, 129–141.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. *Journal of Statistical Software*, 67, 1–48.

Bornstein, A. M., Khaw, M. W., Shohamy, D., & Daw, N. D. (2017). Reminders of past choices bias decisions for reward in humans. *Nature Communications*, 8, 15958.

Bornstein, A. M., & Norman, K. A. (2017). Reinstated episodic context guides sampling-based decisions for reward. *Nature Neuroscience*, 20, 997–1003.

Clark, R. E., Broadbent, N. J., Zola, S. M., & Squire, L. R. (2002). Anterograde amnesia and temporally graded retrograde amnesia for a nonspatial memory task after lesions of hippocampus and subiculum. *Journal of Neuroscience*, 22, 4663–4669.

Cohen, N. J., & Eichenbaum, H. (1993). *Memory, amnesia, and the hippocampal system*. Cambridge, MA: MIT Press.

Davachi, L., Mitchell, J. P., & Wagner, A. D. (2003). Multiple routes to memory: Distinct medial temporal lobe processes build item and source memories. *Proceedings of the National Academy of Sciences, U.S.A.*, 100, 2157–2162.

Daw, N. D. (2011). Trial-by-trial data analysis using computational models. In *Decision making, affect, and learning: Attention and performance XXIII* (Vol. 23, pp. 3–38). Oxford: Oxford University Press.

Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. *Nature*, 441, 876–879.

Doll, B. B., Shohamy, D., & Daw, N. D. (2015). Multiple memory systems as substrates for multiple decision systems. *Neurobiology of Learning and Memory*, 117, 4–13.

Douchamps, V., Jeewajee, A., Blundell, P., Burgess, N., & Lever, C. (2013). Evidence for encoding versus retrieval scheduling in the hippocampus by theta phase and acetylcholine. *Journal of Neuroscience*, 33, 8689–8704.

Duncan, K., Sadanand, A., & Davachi, L. (2012). Memory's penumbra: Episodic memory decisions induce lingering mnemonic biases. *Science*, 337, 485–487.

Duncan, K. D., & Shohamy, D. (2016). Memory states influence value-based decisions. *Journal of Experimental Psychology: General*, 145, 1420–1426.

Easton, A., Douchamps, V., Eacott, M., & Lever, C. (2012). A specific role for septohippocampal acetylcholine in memory? *Neuropsychologia*, 50, 3156–3168.

Eldridge, L. L., Knowlton, B. J., Furmanski, C. S., Bookheimer, S. Y., & Engel, S. A. (2000). Remembering episodes: A selective role for the hippocampus during retrieval. *Nature Neuroscience*, 3, 1149–1152.

Foerde, K., Knowlton, B. J., & Poldrack, R. A. (2006). Modulation of competing memory systems by distraction. *Proceedings of the National Academy of Sciences, U.S.A.*, 103, 11778–11783.

Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in Parkinsonism. *Science*, 306, 1940–1943.

Gershman, S. J., & Daw, N. D. (2017). Reinforcement learning and episodic memory in humans and animals: An integrative framework. *Annual Review of Psychology*, 68, 101–128.

Gilboa, A., Winocur, G., Rosenbaum, R. S., Poreh, A., Gao, F., Black, S. E., et al. (2006). Hippocampal contributions to recollection in retrograde and anterograde amnesia. *Hippocampus*, 16, 966–980.

Giovannini, M. G., Rakovska, A., Benton, R. S., Pazzagli, M., Bianchi, L., & Pepeu, G. (2001). Effects of novelty and habituation on acetylcholine, GABA, and glutamate release from the frontal cortex and hippocampus of freely moving rats. *Neuroscience*, 106, 43–53.

Guitart-Masip, M., Bunzeck, N., Stephan, K. E., Dolan, R. J., & Duzel, E. (2010). Contextual novelty changes reward representations in the striatum. *Journal of Neuroscience*, 30, 1721–1726.

Hasselmo, M. E., & Fehlau, B. P. (2001). Differences in time course of ACh and GABA modulation of excitatory synaptic potentials in slices of rat hippocampus. *Journal of Neurophysiology*, 86, 1792–1802.

Hasselmo, M. E., Wyble, B. P., & Wallenstein, G. V. (1996). Encoding and retrieval of episodic memories: Role of cholinergic and GABAergic modulation in the hippocampus. *Hippocampus*, 6, 693–708.

Kakade, S., & Dayan, P. (2002). Dopamine: Generalization and bonuses. *Neural Networks*, 15, 549–559.

Knowlton, B. J., Squire, L. R., & Gluck, M. A. (1994). Probabilistic classification learning in amnesia. *Learning & Memory*, 1, 106–120.

Lak, A., Stauffer, W. R., & Schultz, W. (2016). Dopamine neurons learn relative chosen value from probabilistic rewards. *eLife*, 5, e18044.

Lee, S. W., O'Doherty, J. P., & Shimojo, S. (2015). Neural computations mediating one-shot learning in the human brain. *PLOS Biology*, 13, e1002137.

Lengyel, M., & Dayan, P. (2008). Hippocampal contributions to control: The third way. *Advances in Neural Information Processing Systems*, 20, 889–896.

Lisman, J. E., & Grace, A. A. (2005). The hippocampal-VTA loop: Controlling the entry of information into long-term memory. *Neuron*, 46, 703–713.

Ljungberg, T., Apicella, P., & Schultz, W. (1992). Responses of monkey dopamine neurons during learning of behavioral reactions. *Journal of Neurophysiology*, 67, 145–163.

McDonald, R. J., & White, N. M. (1993). A triple dissociation of memory systems: Hippocampus, amygdala and dorsal striatum. *Behavioral Neuroscience*, 107, 3–22.

Meeter, M., Murre, J. M., & Talamini, L. M. (2004). Mode shifting between storage and recall based on novelty detection in oscillating hippocampal circuits. *Hippocampus*, 14, 722–741.

Mena-Segovia, J., Winn, P., & Bolam, J. P. (2008). Cholinergic modulation of midbrain dopaminergic systems. *Brain Research Reviews*, 58, 265–271.

Murty, V. P., FeldmanHall, O., Hunter, L. E., Phelps, E. A., & Davachi, L. (2016). Episodic memories predict adaptive value-based decision-making. *Journal of Experimental Psychology: General*, 145, 548–558.

Nadel, L., & Moscovitch, M. (1997). Memory consolidation, retrograde amnesia and the hippocampal complex. *Current Opinion in Neurobiology*, 7, 217–227.

O'Reilly, R. C., & McClelland, J. L. (1994). Hippocampal conjunctive encoding, storage, and recall: Avoiding a trade-off. *Hippocampus*, 4, 661–682.

Pabst, M., Braganza, O., Dannenberg, H., Hu, W., Pothmann, L., Rosen, J., et al. (2016). Astrocyte intermediaries of septal cholinergic modulation in the hippocampus. *Neuron*, 90, 853–865.

Packard, M. G. (2009). Exhumed from thought: Basal ganglia and response learning in the plus-maze. *Behavioural Brain Research*, 199, 24–31.

Patel, J. C., Rossignol, E., Rice, M. E., & Machold, R. P. (2012). Opposing regulation of dopaminergic activity and exploratory motor behavior by forebrain and brainstem cholinergic circuits. *Nature Communications*, 3, 1172.

Patil, A., & Duncan, K. (2018). Lingering cognitive states shape fundamental mnemonic abilities. *Psychological Science*, 29, 45–55.

Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. *Nature*, 442, 1042–1045.

Peters, J., & Büchel, C. (2010). Episodic future thinking reduces reward delay discounting through an enhancement of prefrontal-mediotemporal interactions. *Neuron*, 66, 138–148.

Poldrack, R. A., Clark, J., Paré-Blagoev, E. J., Shohamy, D., Creso Moyano, J., Myers, C., et al. (2001). Interactive memory systems in the human brain. *Nature*, 414, 546–550.

Ranganath, C., Yonelinas, A. P., Cohen, M. X., Dy, C. J., Tom, S. M., & D'Esposito, M. (2004). Dissociable correlates of recollection and familiarity within the medial temporal lobes. *Neuropsychologia*, 42, 2–13.

Schomaker, J., & Meeter, M. (2015). Short- and long-lasting consequences of novelty, deviance and surprise on brain and cognition. *Neuroscience & Biobehavioral Reviews*, 55, 268–279.

Schönberg, T., Daw, N. D., Joel, D., & O'Doherty, J. P. (2007). Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. *Journal of Neuroscience*, 27, 12860–12867.

Schonberg, T., O'Doherty, J. P., Joel, D., Inzelberg, R., Segev, Y., & Daw, N. D. (2010). Selective impairment of prediction error signaling in human dorsolateral but not ventral striatum in Parkinson's disease patients: Evidence from a model-based fMRI study. *Neuroimage*, 49, 772–781.

Schultz, W. (1998). Predictive reward signal of dopamine neurons. *Journal of Neurophysiology*, 80, 1–27.

Schultz, W. (2016). Dopamine reward prediction-error signalling: A two-component response. *Nature Reviews Neuroscience*, 17, 183–195.

Schwabe, L., & Wolf, O. T. (2013). Stress and multiple memory systems: From “thinking” to “doing”. *Trends in Cognitive Sciences*, 17, 60–68.

Scoville, W. B., & Milner, B. (1957). Loss of recent memory after bilateral hippocampal lesions. *Journal of Neurology, Neurosurgery, and Psychiatry*, 20, 11–21.

Shadlen, M. N., & Shohamy, D. (2016). Decision making and sequential sampling from memory. *Neuron*, 90, 927–939.

Small, S. A., Schobel, S. A., Buxton, R. B., Witter, M. P., & Barnes, C. A. (2011). A pathophysiological framework of hippocampal dysfunction in ageing and disease. *Nature Reviews Neuroscience*, 12, 585–601.

Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. *Psychological Review*, 99, 195–231.

Squire, L. R., & Zola, S. M. (1996). Structure and function of declarative and nondeclarative memory systems. *Proceedings of the National Academy of Sciences, U.S.A.*, 93, 13515–13522.

Stauffer, W. R., Lak, A., Kobayashi, S., & Schultz, W. (2016). Components and characteristics of the dopamine reward utility signal. *Journal of Comparative Neurology*, 524, 1699–1711.

Sutton, R. S., & Barto, A. G. (1998). *Reinforcement learning: An introduction* (Vol. 1). Cambridge, MA: MIT Press.

Tulving, E. (1983). *Elements of episodic memory*. Oxford: Clarendon Press.

Tulving, E., & Markowitsch, H. J. (1998). Episodic and declarative memory: Role of the hippocampus. *Hippocampus*, 8, 198–204.

Vandecasteele, M., Varga, V., Berényi, A., Papp, E., Barthó, P., Venance, L., et al. (2014). Optogenetic activation of septal cholinergic neurons suppresses sharp wave ripples and enhances theta oscillations in the hippocampus. *Proceedings of the National Academy of Sciences, U.S.A.*, 111, 13535–13540.

Volkow, N. D., Wang, G. J., Fowler, J. S., Tomasi, D., & Telang, F. (2011). Addiction: Beyond dopamine reward circuitry. *Proceedings of the National Academy of Sciences, U.S.A.*, 108, 15037–15042.

Wimmer, G. E., Daw, N. D., & Shohamy, D. (2012). Generalization of value in reinforcement learning by humans. *European Journal of Neuroscience*, 35, 1092–1104.

Wittmann, B. C., Daw, N. D., Seymour, B., & Dolan, R. J. (2008). Striatal activity underlies novelty-based choice in humans. *Neuron*, 58, 967–973.

Yassa, M. A., Mattfeld, A. T., Stark, S. M., & Stark, C. E. L. (2011). Age-related memory deficits linked to circuit-specific disruptions in the hippocampus. *Proceedings of the National Academy of Sciences, U.S.A.*, 108, 8873–8878.

## Author notes

This paper is part of a Special Focus deriving from a symposium at the 2017 Annual Meeting of the Cognitive Neuroscience Society entitled “Memory Neuromodulation: Influences of Learning States on Episodic Memory.”