We can learn from the wisdom of others to maximize success. However, it is unclear how humans take advice to flexibly adapt behavior. On the basis of data from neuroanatomy, neurophysiology, and neuroimaging, a biologically plausible model is developed to illustrate the neural mechanisms of learning from instructions. The model consists of two complementary learning pathways. The slow-learning parietal pathway carries out simple or habitual stimulus–response (S-R) mappings, whereas the fast-learning hippocampal pathway implements novel S-R rules. Specifically, the hippocampus can rapidly encode arbitrary S-R associations, and stimulus-cued responses are later recalled into the basal ganglia-gated pFC to bias response selection in the premotor and motor cortices. The interactions between the two model learning pathways explain how instructions can override habits and how automaticity can be achieved through motor consolidation.
We don't always need to learn things on our own, the hard way, from trial and error. By reading books and chatting with other people, we can immediately learn from the wisdom of others to avoid making mistakes. This way of learning from instruction/advice, unlike observational learning, does not require explicit demonstration and lies at the core of human communication and flexible behavior.
Given that instructional learning is arguably the most sophisticated form of learning seen in the animal kingdom, why does it appear so effortless in comparison with more basic, trial-and-error learning? How does instruction enable us to perform complex novel tasks perfectly on the first attempt? In this article, we address these questions with a biologically informed neural network model that can flexibly recombine old tricks for new tasks via mechanisms of hippocampal fast learning and prefrontal working memory.
Although instructional learning and instruction following happen in almost every human laboratory experiment and in everyday life, learning from instruction has been explored much less from a mechanistic, computational modeling perspective than other domains of learning, such as reinforcement learning. Some research in this area has examined the role of instructional control in category learning (Noelle & Cottrell, 1996), probabilistic reward learning (Li, Delgado, & Phelps, 2011; Walsh & Anderson, 2011; Biele, Rieskamp, & Gonzalez, 2009; Doll, Jacobs, Sanfey, & Frank, 2009), and stimulus–response (S-R) learning (Cohen-Kdoshay & Meiran, 2009; Wenke, Gaschler, Nattkemper, & Frensch, 2009), but instructional learning remains a poorly characterized phenomenon. Because different types of tasks likely engage different brain networks and mechanisms for encoding procedural knowledge in their particular domains, for simplicity we focus our discussion on S-R instructions hereafter.
Multiple sources of neural data (e.g., neuroimaging and electrophysiology) show that implementation of novel S-R mappings involves several cortical and subcortical areas, such as premotor, prefrontal and posterior parietal cortices, hippocampus and striatum (Brass, Wenke, Spengler, & Waszak, 2009; Suzuki, 2007; Casey, Thomas, Davidson, Kunz, & Franzen, 2002). Although it remains unclear exactly how these brain areas coordinate to learn and implement instructions, the lateral pFC appears to be particularly important for instructional control (Hartstra, Kuhn, Verguts, & Brass, 2011; Ruge & Wolfensteller, 2010).
Several human and monkey studies on conditional motor learning (without the instruction component) provide insight into the functional roles of brain regions involved in processing S-R mappings. Whereas the hippocampal system underlies rapid acquisition of novel S-R associations (Brasted, Bussey, Murray, & Wise, 2005; Casey et al., 2002), the posterior parietal cortex subserves transformation of simple and well-learned S-R mappings (Grol, de Lange, Verstraten, Passingham, & Toni, 2006; Corbetta & Shulman, 2002). As for responses, motor preparation is carried out by the premotor cortex and other motor-related areas (Suzuki, 2007; Cavina-Pratesi et al., 2006), and the ACC is involved in monitoring response conflict (Brass et al., 2009; Botvinick, Cohen, & Carter, 2004).
On the basis of the neuroimaging and neurophysiological evidence mentioned above, a neural network model is developed to explore the dynamic interplay among various brain areas for instructional control. Mechanistically, the model explains how S-R instructions suppress habits for new conditional behavior and how automaticity can be achieved through motor consolidation. Next we outline the major model components and their connections in reference to relevant neuroanatomy.
On a large scale, our model of instructional control consists of two complementary learning pathways, as shown in Figure 1. The fast-learning hippocampal–prefrontal pathway processes complex, novel S-R mappings like those learned from instructions, whereas the slow-learning parietal pathway processes simple, habitual S-R mappings. This overall architecture is consistent with findings that the hippocampus and striatum contribute to initial learning and that neocortical learning is gradually shaped by experience (Pasupathy & Miller, 2005; for reviews, see Ashby, Turner, & Horvitz, 2010; McClelland, McNaughton, & O'Reilly, 1995). Note, however, that with a focus on goal-directed control we model both the ventral and dorsomedial striatum (roughly, nucleus accumbens and caudate) but not the dorsolateral striatum (putamen), which also governs habitual behavior (for reviews, see Ashby et al., 2010; Yin & Knowlton, 2006).
Our model helps to answer the following questions: Why is trial-and-error learning so arduous whereas instructed learning appears effortless? How do we successfully perform complex novel tasks on the first attempt? We reason that S-R instructions quickly assemble rather than slowly modify preexisting elements of perceptual and motor knowledge. For example, we can immediately follow the instruction: “press the left button when seeing a triangle; press the right button when seeing a square,” in which the action of button press is a preexisting motor skill and visual recognition of shapes is also an already learned perceptual ability. Note also that understanding the instruction requires a previously learned mapping from language (e.g., the verbal command of “press”) to actual behavior (e.g., the motor execution of “press”).
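The claim that instructions assemble rather than modify preexisting knowledge can be sketched as function composition. In the minimal Python sketch below, plain dictionaries stand in for the learned perceptual and motor mappings (the names `VOCABULARY` and `follow_instruction` and the motor labels are hypothetical). A new instruction table (Condition to Action) is composed with an existing vocabulary (Action to Motor), so new Condition-to-Motor behavior appears without changing the vocabulary itself.

```python
from typing import Dict

# Hypothetical preexisting knowledge: verbal action commands -> motor programs.
VOCABULARY: Dict[str, str] = {
    "press left": "motor:left_button_press",
    "press right": "motor:right_button_press",
}

def follow_instruction(instruction: Dict[str, str],
                       vocabulary: Dict[str, str]) -> Dict[str, str]:
    """Compose Condition->Action (instruction) with Action->Motor (vocabulary).

    The vocabulary is only read, never modified: the new behavior is
    assembled from elements that already exist.
    """
    unknown = set(instruction.values()) - set(vocabulary)
    if unknown:
        # Instructions can only recombine actions the learner already knows.
        raise KeyError(f"unknown action(s): {sorted(unknown)}")
    return {cond: vocabulary[act] for cond, act in instruction.items()}

instruction = {"triangle": "press left", "square": "press right"}
policy = follow_instruction(instruction, VOCABULARY)
print(policy["square"])  # motor:right_button_press
```

The `KeyError` branch reflects the point made in the text: following "press" requires an already learned mapping from the verbal command to the motor act.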
Our proposed model implements the aforementioned instructional control from neural to behavioral levels. Unlike a single-purpose neural network that slowly rewires the whole system to learn a new sensorimotor transformation, this general-purpose instructable model separates movement representations from plan representations and restricts plan updating to the fast-learning hippocampus. Specifically, the pFC together with the BG in the model learns a vocabulary of action items and the corresponding motor responses. The model hippocampus rapidly encodes S-R instructions as action episodes. It can retrieve a stimulus-dependent action command into the dorsolateral pFC as a goal for guiding subsequent behavior, although sufficiently simple S-R rules, such as categorical rather than one-to-one mappings, may be maintained directly in the unmodeled ventrolateral pFC (Hartstra et al., 2011; Bunge, 2004) by verbal working memory instead of episodic memory. The model posterior parietal cortex alone can also carry out well-practiced S-R transformations, and all the response plans output from the posterior parietal cortex (PPC) and pFC compete in the model premotor cortex. Below we discuss the anatomical connections and computational functions of each simulated brain area in the model.
Posterior Parietal Cortex
The PPC receives inputs from visual, auditory, and somatosensory cortices and outputs to motor areas, such as the premotor and primary motor cortices, consistent with the "how" or perception-for-action framework of Goodale and Milner (1992). After extensive learning, the PPC can process highly familiar S-R mappings in a fast, automatic, and unconscious manner (Rossetti et al., 2005; Schindler et al., 2004), and damage to the PPC in humans can produce a variety of sensorimotor deficits such as apraxia (Andersen & Buneo, 2002). In the model, the PPC layer connects the input Condition layer with the output Premotor layer in the parietal learning pathway. It is essentially the hidden layer of a generic three-layer neural network that learns arbitrary input–output transformations.
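To make the "generic three-layer network" concrete, the sketch below trains Condition to PPC (hidden) to Premotor on a localist one-to-one S-R mapping. It is an assumption-laden stand-in: it uses ordinary backpropagation with a tanh hidden layer and a softmax output rather than Leabra, and the layer size, learning rate, and initialization are illustrative choices. Its only purpose is to show that such a pathway must be trained over many repetitions of the whole mapping before it responds correctly.

```python
import numpy as np

def train_parietal(mapping, n_hidden=20, lr=0.05, max_epochs=20000, seed=0):
    """Backprop-train Condition -> PPC (hidden) -> Premotor on a localist mapping.

    mapping[i] is the index of the correct response to condition i.
    Returns the weights and the number of full-batch epochs needed before
    every condition produces its correct response.
    """
    rng = np.random.default_rng(seed)
    targets = np.asarray(mapping)
    n = len(targets)
    X = np.eye(n)                                # localist Condition inputs
    T = np.eye(n)[targets]                       # localist Premotor targets
    W1 = rng.normal(0.0, 1.0, (n, n_hidden))     # Condition -> PPC
    W2 = rng.normal(0.0, 0.1, (n_hidden, n))     # PPC -> Premotor
    for epoch in range(1, max_epochs + 1):
        H = np.tanh(X @ W1)                      # hidden (PPC) activity
        logits = H @ W2
        if np.array_equal(logits.argmax(axis=1), targets):
            return W1, W2, epoch
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)        # softmax over Premotor units
        d_logits = P - T                         # softmax + cross-entropy gradient
        d_hidden = (d_logits @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ d_logits)
        W1 -= lr * (X.T @ d_hidden)
    raise RuntimeError("mapping not learned within max_epochs")

def predict(W1, W2):
    """Premotor winner for each localist condition (X is the identity)."""
    return (np.tanh(W1) @ W2).argmax(axis=1)

mapping = [3, 7, 0, 9, 1, 5, 2, 8, 6, 4]         # hypothetical habitual S-R rules
W1, W2, epochs = train_parietal(mapping)
print(predict(W1, W2).tolist(), epochs)
```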
Hippocampus
The entorhinal cortex provides the major interface for communication between the hippocampus and neocortex through the perirhinal and parahippocampal cortices (Lavenex & Amaral, 2000). Within the hippocampus, there are two major pathways. The perforant pathway, originating from layer II of the entorhinal cortex, links the dentate gyrus, the Cornu Ammonis fields CA3 and CA1, and the subiculum in a long neuronal chain; the direct pathway projects from layer III of the entorhinal cortex directly to CA1, which in turn projects back to layer V of the entorhinal cortex via the subiculum (Duvernoy, 2005).
Depending on the phase of the hippocampal theta rhythm, CA1 is driven mainly by either entorhinal or CA3 inputs for memory encoding and retrieval, respectively (Hasselmo, Bodelon, & Wyble, 2002). For goal-directed decisions, memory retrieved in the hippocampus can be routed to the pFC via subicular, entorhinal, perirhinal, and parahippocampal cortices (Simons & Spiers, 2003; Goldman-Rakic, Selemon, & Schwartz, 1984).
In the model, we simulate both the perforant and direct pathways, except for the subiculum, whose functional role in novelty processing (Lisman & Grace, 2005) is beyond the scope of this article. Computationally, the perforant pathway up to CA3 carries out pattern separation and completion (O'Reilly & McClelland, 1994), such that the model CA3 encodes sparse, conjunctive, and content-addressable memory representations of the sensory inputs to the hippocampus, such as verbally instructed S-R associations. The direct pathway is a three-layer autoencoder network that learns a topographically organized identity mapping from the model EC_in (i.e., entorhinal superficial layers) via CA1 to EC_out (i.e., entorhinal deep layers), so that an episode recalled in CA3 can be reinstated into EC_out via the CA3-to-CA1 projection (Norman & O'Reilly, 2003).
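The core idea of one-shot encoding followed by content-addressable recall can be illustrated with a much simpler memory than the model's hippocampus. The sketch below swaps in a Hopfield-style Hebbian autoassociator over an EC_in-like pattern made of a Condition group and an Action group; it does not model dentate gyrus pattern separation, the CA1 autoencoder, or theta-phase dynamics, and all names are hypothetical. Each instruction is stored after a single presentation, and cueing with the Condition alone completes the missing Action.

```python
import numpy as np

N_COND, N_ACT = 10, 10   # sizes of the Condition and Action unit groups

def episode(cond: int, act: int) -> np.ndarray:
    """Binary pattern with one active unit in each group: [Condition | Action]."""
    x = np.zeros(N_COND + N_ACT)
    x[cond] = 1.0
    x[N_COND + act] = 1.0
    return x

def store(episodes) -> np.ndarray:
    """One-shot Hebbian storage: every episode is presented exactly once."""
    size = N_COND + N_ACT
    W = np.zeros((size, size))
    for cond, act in episodes:
        x = episode(cond, act)
        W += np.outer(x, x)            # co-active units become linked
    np.fill_diagonal(W, 0.0)           # no self-connections
    return W

def recall_action(W: np.ndarray, cond: int):
    """Pattern completion from a partial cue (Condition only, Action blank)."""
    cue = np.zeros(N_COND + N_ACT)
    cue[cond] = 1.0
    action_input = (W @ cue)[N_COND:]
    if action_input.max() == 0.0:
        return None                    # nothing stored for this condition
    return int(np.argmax(action_input))  # winner-take-all within the Action group

instructions = [(c, (3 * c + 1) % 10) for c in range(10)]  # hypothetical S-R rules
W = store(instructions)
print([recall_action(W, c) for c in range(10)])  # [1, 4, 7, 0, 3, 6, 9, 2, 5, 8]
```

Because each condition appears in only one stored episode here, the cue activates exactly one Action unit; overlapping or conflicting episodes would need the pattern separation the text attributes to the perforant pathway.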
Prefrontal Cortex and Basal Ganglia
Although the pFC comprises cytoarchitecturally distinct areas, as a whole it extensively interconnects with sensory, motor, and limbic systems. In particular, the dorsolateral pFC, implicated in working memory function, receives inputs from visual/auditory/somatosensory cortices, hippocampus (mainly via the orbitomedial pFC), and BG (via the thalamus) and outputs to motor areas such as the SMA, FEFs, and premotor cortex (Miller & Cohen, 2001).
In the model, we implement a pFC BG working memory (PBWM) system (Hazy, Frank, & O'Reilly, 2006, 2007; O'Reilly & Frank, 2006) to hold instructed actions recalled from the hippocampus. Specifically, the model pFC has two separate layers, pFC_mnt and pFC_out, to simulate neurons showing either tonic maintenance or phasic motor-related activity. The simplified model BG has distinct go and no-go neurons in every striatal matrix layer for deciding whether to maintain and/or output working memory contents. The pathway from Matrix_mnt to SNrThal_mnt controls pFC_mnt firing, whereas the pathway from Matrix_out to SNrThal_out controls pFC_out firing. The action units of the model EC_out layer project to pFC_mnt, which in turn projects to pFC_out.
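The division of labor between the maintenance gate and the output gate can be summarized as a small state machine. The sketch below is a hypothetical abstraction (the class name `PBWMSketch` and the boolean `go_mnt`/`go_out` arguments are illustrative, not model parameters): each boolean stands for go winning over no-go in Matrix_mnt or Matrix_out, which disinhibits the corresponding SNrThal layer. It shows how an instruction can be held without being acted on, then released to Premotor later.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PBWMSketch:
    """Hypothetical two-gate working memory abstraction (not the PBWM code)."""
    pfc_mnt: Optional[str] = None   # tonically maintained content
    pfc_out: Optional[str] = None   # phasic, motor-related readout

    def step(self, ec_out_action: Optional[str],
             go_mnt: bool, go_out: bool) -> Optional[str]:
        """Advance one step; return what pFC_out sends to Premotor (or None)."""
        if go_mnt and ec_out_action is not None:
            self.pfc_mnt = ec_out_action                 # maintenance gate opens: update
        self.pfc_out = self.pfc_mnt if go_out else None  # output gate
        return self.pfc_out

wm = PBWMSketch()
wm.step("press left", go_mnt=True, go_out=False)   # store the recalled action, do not act
print(wm.step(None, go_mnt=False, go_out=True))    # press left
```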
Dopamine (DA) neurons signal reward prediction errors, which allow the BG to learn go versus no-go decisions for BG-gated working memory operations, such as outputting working memory contents to Premotor during performance but not during instruction memorization. Because appropriate go/no-go behavior depends on the environmental context, the different learning stages (see Methods for more details) are coded with localist representations in the model Context layer, which provides sensory, contextual input to the simulated matrisomes of the striatum, namely Matrix_mnt and Matrix_out.
The reward-predictive firing properties of DA neurons in the substantia nigra pars compacta and ventral tegmental area are simulated using a circuit that explains data on Pavlovian conditioning (Hazy, Frank, & O'Reilly, 2010; O'Reilly, Frank, Hazy, & Watz, 2007). Specifically, the simulated DA neurons increase firing for unexpected rewards and decrease firing for omission of expected rewards to respectively drive the go and no-go pathway neurons in the striatum. In the model simulation, the amount of reward supplied to this DA system is derived from the correctness of final motor responses.
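How a reward prediction error can teach context-dependent gating is sketched below. This is a deliberately simplified scheme, not the Pavlovian DA circuit the model uses: a per-context Rescorla–Wagner-style critic produces the prediction error, DA bursts reinforce whichever pathway (go or no-go) won, and DA dips recruit the opposing pathway. The context names, the reward values of +1 and -1, and the learning rates are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical stage contexts and whether the output gate *should* open in each.
APPROPRIATE = {"memorize": False, "perform": True}

def train_gating(n_trials=20, lr=0.5, alpha=0.2):
    go = defaultdict(float)      # go-pathway strength per context
    nogo = defaultdict(float)    # no-go-pathway strength per context
    value = defaultdict(float)   # critic: expected reward per context
    for _ in range(n_trials):
        for ctx, should_gate in APPROPRIATE.items():
            gate = go[ctx] >= nogo[ctx]                 # go wins ties
            reward = 1.0 if gate == should_gate else -1.0
            da = reward - value[ctx]                    # reward prediction error
            value[ctx] += alpha * da
            burst, dip = max(da, 0.0), max(-da, 0.0)
            if gate:    # burst reinforces go; dip recruits no-go
                go[ctx] += lr * burst
                nogo[ctx] += lr * dip
            else:       # burst reinforces no-go; dip recruits go
                nogo[ctx] += lr * burst
                go[ctx] += lr * dip
    return {ctx: go[ctx] >= nogo[ctx] for ctx in APPROPRIATE}

print(train_gating())  # {'memorize': False, 'perform': True}
```

In the "memorize" context the first, premature gating attempt is punished, the resulting dip strengthens no-go, and the gate stays closed thereafter; in the "perform" context gating is rewarded from the start.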
Anterior Cingulate Cortex
The dorsal ACC interconnects with the PPC, dorsolateral pFC, and motor structures including premotor, supplementary motor, and primary motor areas. Thus, the dorsal ACC is in a position to evaluate or regulate executive control during both stimulus presentation and response selection, although it is engaged most strongly in monitoring and evaluating the outcomes of actions (Botvinick et al., 2004; Paus, 2001).
In the model, the ACC unit receives uniform excitatory projections from all the Premotor units and has a high firing threshold, such that it does not fire unless it receives input from more than two active units in the Premotor layer. In other words, the model ACC monitors the existence of conflicting motor plans, whereas the Premotor layer resolves conflicts, if any, during response selection through winner-take-all competition. Our simplified ACC does not project to any other model layer to further modulate network dynamics, for example, by driving the basal forebrain cholinergic neurons to increase attention for more deliberate task processing (Krichmar, 2008).
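The monitoring versus resolving split can be written as two small functions. In this sketch, which is an illustration under stated assumptions rather than the model's parameterization, the ACC unit sums Premotor activity through uniform weights and compares it with an assumed threshold of 2.0, so that, following the description above, more than two fully active plans are needed to make it fire. Winner-take-all selection is reduced to keeping the most active Premotor unit. As in the model, the detector's output feeds nothing else.

```python
import numpy as np

def acc_fires(premotor, weight=1.0, threshold=2.0):
    """Hypothetical ACC unit: uniform excitatory input from all Premotor units.

    With activities in [0, 1], net input above 2.0 requires more than two
    active units; the threshold value is an illustrative assumption.
    """
    net_input = weight * float(np.sum(premotor))
    return net_input > threshold

def winner_take_all(premotor):
    """Premotor conflict resolution: keep only the most active plan."""
    premotor = np.asarray(premotor, dtype=float)
    out = np.zeros_like(premotor)
    out[np.argmax(premotor)] = 1.0
    return out

premotor = np.array([0.9, 0.8, 0.0, 0.7])    # three partially active plans
print(acc_fires(premotor))                    # True (net input 2.4 > 2.0)
print(acc_fires([0.0, 1.0, 0.0, 1.0]))        # False (2.0 is not above 2.0)
print(winner_take_all(premotor).tolist())     # [1.0, 0.0, 0.0, 0.0]
```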
Our model was implemented using the Leabra framework for simulating biologically realistic neural networks at an intermediate level of detail. Leabra has been applied to explain a wide range of cognitive phenomena and their underlying neural mechanisms (O'Reilly, Hazy, & Herd, in press; O'Reilly & Munakata, 2000). Leabra combines locally computable mechanisms of error-driven and Hebbian associative learning to offer a biologically realistic way of training a hierarchical neural network in a supervised manner, which essentially modifies synaptic connectivity in a network to minimize any discrepancies between expectations (i.e., spontaneous network outputs in the Leabra minus phase) and actual outcomes (i.e., supplied teaching signals in the Leabra plus phase) in the network output layers.
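The minus-phase/plus-phase idea can be stated as a weight-update rule. The function below is a simplified rendering of the classic textbook Leabra rule as we understand it from O'Reilly and Munakata (2000): an error-driven contrastive Hebbian term (plus-phase coproduct minus minus-phase coproduct, with soft weight bounding) mixed with a Hebbian term that moves weights toward sender activity when the receiver is active. Current Leabra implementations differ in detail, and the default `lrate` and `k_hebb` values are illustrative only.

```python
import numpy as np

def leabra_dwt(x_minus, y_minus, x_plus, y_plus, w, lrate=0.01, k_hebb=0.01):
    """Simplified Leabra-style weight change for weights w[i, j] (sender i -> receiver j).

    minus phase: the network's own (expected) activity
    plus phase:  activity with the teaching signal clamped on the outputs
    """
    # Error-driven term: contrast plus-phase and minus-phase coproducts.
    err = np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus)
    # Soft weight bounding keeps weights within [0, 1].
    err = np.where(err > 0, err * (1.0 - w), err * w)
    # Hebbian term: active receivers pull weights toward sender activity.
    hebb = y_plus[None, :] * (x_plus[:, None] - w)
    return lrate * (k_hebb * hebb + (1.0 - k_hebb) * err)

x = np.array([1.0, 0.0])           # sender activity, identical in both phases
y_minus = np.array([1.0, 0.0])     # network's spontaneous guess (minus phase)
y_plus = np.array([0.0, 1.0])      # teaching signal (plus phase)
w = np.full((2, 2), 0.5)
dw = leabra_dwt(x, y_minus, x, y_plus, w, lrate=0.1, k_hebb=0.0)
print(dw)  # weight to the wrong receiver falls, weight to the correct one rises
```

When the two phases agree, the error-driven term vanishes and only the (small) Hebbian component changes the weights.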
Because conditional responses could be learned from instruction or from experience, we distinguished verbal action commands (“Action”) from motor responses (“Motor”) and supplied Condition–Motor, Action–Motor, and Condition–Action pairs as input–output teaching signals to respectively train the model PPC, BG/pFC, and hippocampus when simulating different types of learning (see the first three cases in Figure 2). Model performance was evaluated based on these different teaching signals accordingly (see the italicized labels in Figure 2). For example, during instruction memorization, it was considered an error when the hippocampus failed to output the designated “Action” in response to a specific “Condition” input in a trial. Error-driven, supervised learning was then continued for each type of learning until no error was made for several epochs, where an epoch comprised 10 trials of different input–output pairs. For cortical consolidation, no external teaching signals were provided to the model. Instead, the hippocampal–prefrontal outputs served as internal teaching signals to correct spontaneous outputs from the model parietal cortex.
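The training procedure above (a stage-specific teaching signal, 10-trial epochs, and training until error-free) can be sketched with a stand-in learner. Here a localist linear associator trained by the delta rule replaces the relevant model module, trial order is shuffled each epoch, and `n_clean=3` is a hypothetical value for the unspecified "several epochs"; the `TEACHING_SIGNALS` table only restates which input–output pair trains which module.

```python
import numpy as np

# Which (input, output) pair trains which module, as described in the text.
TEACHING_SIGNALS = {
    "PPC": ("Condition", "Motor"),
    "BG/pFC": ("Action", "Motor"),
    "hippocampus": ("Condition", "Action"),
}

def train_to_criterion(mapping, lr=0.5, n_clean=3, max_epochs=100, seed=0):
    """Train a localist linear associator until n_clean error-free epochs in a row.

    Each epoch presents every input-output pair once (10 trials for 10 pairs);
    a trial is an error when the most active output unit is not the target.
    """
    rng = np.random.default_rng(seed)
    mapping = np.asarray(mapping)
    n = len(mapping)
    W = np.zeros((n, n))                        # W[output, input]
    clean = 0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for i in rng.permutation(n):
            y = W[:, i].copy()                  # response to localist input i
            errors += int(np.argmax(y) != mapping[i])
            target = np.eye(n)[mapping[i]]
            W[:, i] += lr * (target - y)        # delta rule toward the teaching signal
        clean = clean + 1 if errors == 0 else 0
        if clean >= n_clean:
            return W, epoch
    raise RuntimeError("criterion not reached")

mapping = [3, 7, 0, 9, 1, 5, 2, 8, 6, 4]        # hypothetical Condition->Action pairs
W, epochs = train_to_criterion(mapping)
print(epochs)  # 4: one epoch with errors, then three clean epochs
```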
The proposed model used a rate-based coding scheme with a hybrid of localist and distributed representations to process information. Here the neural firing rates in the model were computed from excitatory conductances rather than membrane potentials to better capture the rates of neuronal spiking (O'Reilly et al., in press). Localist/sparse and distributed codes refer to whether certain information is represented by a relatively small or large number of neurons, respectively. Because we abstracted away sensory and motor processes during instructional control, it was difficult to manually specify a realistic pattern of distributed neural activities in the model input/output layers and thus localist representations were used. Otherwise, distributed coding was the default representation throughout the network.
For simplicity, the model input/output layers communicated with each other through one-to-one connections between the same localist representations. These interface layers, listed in Figure 2, included the inputs/outputs of the whole system (i.e., Condition, Action, and Premotor), the hippocampus (i.e., EC_in and EC_out), and the PBWM system (i.e., pFC_mnt and pFC_out). For example, one-to-one connections were set between Condition and the first unit group of EC_in, between Action and the second unit group of EC_in, between the second unit group of EC_out and pFC_mnt, between pFC_mnt and pFC_out, and between pFC_out and Premotor (see also Figure 1B).
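One-to-one connectivity between localist layers amounts to identity blocks placed at the right unit-group offsets. The sketch below assumes, for illustration, that EC_in and EC_out each consist of two 10-unit groups ordered [Condition | Action], and it treats the EC_in to EC_out relay as a perfect identity mapping; the helper `one_to_one` is hypothetical.

```python
import numpy as np

N = 10  # localist units in Condition, Action, and Premotor

def one_to_one(n_send, n_recv, n, send_offset=0, recv_offset=0):
    """W[recv, send] = 1 links sender unit send_offset+k to receiver unit recv_offset+k."""
    W = np.zeros((n_recv, n_send))
    k = np.arange(n)
    W[recv_offset + k, send_offset + k] = 1.0
    return W

# Assumed layout: EC_in and EC_out each hold two unit groups, [Condition | Action].
cond_to_ecin = one_to_one(N, 2 * N, N, recv_offset=0)   # first unit group
act_to_ecin = one_to_one(N, 2 * N, N, recv_offset=N)    # second unit group
ecout_to_pfc_mnt = one_to_one(2 * N, N, N, send_offset=N)
pfc_mnt_to_out = one_to_one(N, N, N)
pfc_out_to_premotor = one_to_one(N, N, N)

condition, action = np.eye(N)[3], np.eye(N)[7]
ec_in = cond_to_ecin @ condition + act_to_ecin @ action
print(np.flatnonzero(ec_in).tolist())   # [3, 17]

ec_out = ec_in                          # idealized identity relay via CA1
premotor = pfc_out_to_premotor @ pfc_mnt_to_out @ ecout_to_pfc_mnt @ ec_out
print(int(premotor.argmax()))           # 7
```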
As a whole, the proposed model is a synthesis of several specialized neural systems. Different brain regions simulated in the model share the same set of computational principles (e.g., lateral inhibition, bidirectional connectivity, and local learning) but become functionally specialized because of different parameterizations (e.g., differences in sparseness of neural codes, learning rate, and balance between error-driven and associative learning). The present work adopts the default structures and parameter values from previous studies that detail these specialized parameterizations for the neocortex (O'Reilly et al., in press), hippocampus (Norman & O'Reilly, 2003), midbrain DA system (Hazy et al., 2010), and prefrontal-BG working memory (O'Reilly & Frank, 2006). For technical reference, the source code of our model and simulations is available for download at grey.colorado.edu/∼tren/instruct/.
The model simulation results are presented below. Each simulation result shows the mean and the standard error of the mean of 20 independent model runs, each of which used randomly generated S-R mappings and initial weights for the network connections. In all the simulations, each of the Condition, Action, and Premotor layers used 10 localist units to respectively represent 10 encountered conditions, 10 verbal actions, and 10 motor outputs (see Methods for more details).
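The reporting scheme (a fresh random one-to-one S-R mapping per run, then mean and standard error over 20 runs) can be sketched as follows. The text does not say whether the SEM used the sample standard deviation; this sketch assumes it did (`ddof=1`). No model results are computed here.

```python
import numpy as np

def random_sr_mapping(rng, n=10):
    """A random one-to-one S-R mapping: condition i -> response mapping[i]."""
    return rng.permutation(n)

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)

rng = np.random.default_rng(0)
mappings = [random_sr_mapping(rng) for _ in range(20)]   # one per independent run

m, sem = mean_sem([1.0, 2.0, 3.0, 4.0, 5.0])              # toy numbers, not model data
print(m, round(sem, 4))                                    # 3.0 0.7071
```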
Each simulation ran through a specific set of stages that are described in Figure 2. Three of the stages, however, were common to all the simulations: The model was pretrained with Action-to-Motor mappings (i.e., from verbal commands to real motor responses) during the vocabulary-building stage and then trained with Condition-to-Action mappings (i.e., S-R rules) during the instruction memorization stage. During the performance stage, it was tested with Condition-to-Motor mappings without any inputs from the Action layer.
Simulation 1: Instruction Following
To simulate basic instruction following, the proposed model was instructed with 10 novel pairs of S-R rules (e.g., if see S, then do R) and evaluated for its success in performing conditional actions (e.g., do R) when encountering a specific condition (e.g., see S). As shown in Figure 3A, the model memorized the S-R rules within a few trials during the instruction memorization stage, and without further practice it made no errors in carrying out these instructions during the performance stage. This perfect performance was attributable to the hippocampal–prefrontal learning pathway, because lesioning the model posterior parietal cortex did not impair instruction following.
Simulation 2: Habit Suppression
S-R instructions can sometimes conflict with previously developed habits, as in S-R compatibility tasks. In such a case, newly instructed S-R associations stored in the hippocampus need to override prepotent responses promoted by old S-R habits (Casey et al., 2002). Although behaviorally people can make few or no errors in following instructions, neurally they must resolve competition between new and old S-R mappings during response selection, as signaled by the dorsal ACC (Botvinick et al., 2004).
To simulate habit suppression, we first trained up the parietal habit pathway on 10 S-R rules before instructing the model on 10 newly remapped S-R rules, which recycled the same set of 10 stimuli and 10 responses in the old rules. Independently training the PPC and hippocampus was achieved by providing teaching signals to the model layers that interfaced with the parietal or hippocampal outputs: Premotor was an output layer and Action provided null inputs during the parietal training, whereas Action was an output layer and Premotor provided null inputs during the hippocampal training (see Figure 2). Such a treatment ensured proper credit assignment during error-driven learning.
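The remapping and the pathway-isolating teaching signals can be sketched as data preparation. The text says only that the new rules recycled the same 10 stimuli and 10 responses; this sketch additionally assumes that every stimulus receives a new response, so that each instructed rule conflicts with its habit. The trial dictionaries are a hypothetical format: `None` marks the layer that provides null input.

```python
import numpy as np

def remap(old, rng):
    """New S-R rules over the same stimuli and responses, none matching the habit.

    Rejection sampling: reshuffle until no stimulus keeps its old response
    (about 37% of random shuffles of 10 items qualify, so this ends quickly).
    """
    old = np.asarray(old)
    while True:
        new = rng.permutation(old)          # same response set, reshuffled
        if np.all(new != old):
            return new

def trials(mapping, pathway):
    """Hypothetical trial specs that isolate one pathway for credit assignment."""
    if pathway == "parietal":       # Premotor is the output; Action input is null
        return [{"Condition": s, "Premotor": int(r), "Action": None}
                for s, r in enumerate(mapping)]
    if pathway == "hippocampal":    # Action is the output; Premotor input is null
        return [{"Condition": s, "Action": int(r), "Premotor": None}
                for s, r in enumerate(mapping)]
    raise ValueError(f"unknown pathway: {pathway}")

rng = np.random.default_rng(0)
habit = rng.permutation(10)
instructed = remap(habit, rng)
habit_trials = trials(habit, "parietal")
instruction_trials = trials(instructed, "hippocampal")
```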
As shown in Figure 3B, during the performance stage, the model successfully employed the newly instructed S-R rules without making any errors because the stronger outputs from the hippocampal–prefrontal pathway outweighed the weaker outputs from the parietal pathway in the Premotor layer, which resolved the competition between these two conflicting motor plans over time using a winner-take-all mechanism. During this response selection process, the model ACC detected incompatible motor plans in the Premotor layer despite the model's errorless performance (Figure 1B).
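The time course of this competition can be illustrated with two leaky, mutually inhibiting Premotor units. In the sketch below, all parameters are assumptions: the habitual plan arrives first but weakly (drive 0.6), the instructed plan arrives 100 steps later but strongly (drive 1.0), and lateral inhibition (1.2) is strong enough that only the instructed plan can survive. The conflict check (two or more units above 0.1) is a simple stand-in, not the model ACC's parameterization. Both plans are briefly co-active, then the instructed plan wins.

```python
import numpy as np

def race(habit_drive=0.6, instructed_drive=1.0, instructed_delay=100,
         inhibition=1.2, dt=0.1, n_steps=300, active=0.1):
    """Two Premotor units [habitual, instructed] with leaky integration and
    mutual inhibition. Returns final activity and a per-step conflict flag."""
    x = np.zeros(2)
    conflict = []
    for step in range(n_steps):
        drive = np.array([habit_drive,
                          instructed_drive if step >= instructed_delay else 0.0])
        net = np.maximum(0.0, drive - inhibition * x[::-1])  # inhibition from the rival
        x = x + dt * (-x + net)                              # leaky integration
        conflict.append(int(np.sum(x > active)) >= 2)        # both plans active?
    return x, conflict

x, conflict = race()
print(int(np.argmax(x)), conflict.index(True), conflict[-1])  # winner, first conflict step, final flag
```

With these values, the first conflict appears a few steps after the instructed plan's onset, and by the end the habitual unit has decayed to near zero.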
Simulation 3: Automaticity
Damage to the hippocampus in primates can lead to anterograde amnesia and temporally graded retrograde amnesia, although novel S-R mappings can still be slowly acquired by the habit learning system in this circumstance (Bayley, Frascino, & Squire, 2005; Wise & Murray, 1999). It is thus believed that novel declarative memory formed in the fast-learning hippocampus is later consolidated in the slow-learning neocortex (McClelland et al., 1995; Alvarez & Squire, 1994). Although the BG has been suggested to train up the neocortex during perceptual and motor consolidation (Ashby et al., 2010), we propose that the posterior hippocampus and the dorsolateral striatal circuit provide teaching signals of declarative and nondeclarative S-R rules, respectively, to the neocortex for developing automaticity.
In the model, the hippocampal–prefrontal pathway is fast at learning but slow at processing, whereas the parietal pathway is slow at learning but fast at processing. This seemingly paradoxical property arises because the hippocampal–prefrontal pathway involves more synaptic connections and more complex machinery than the parietal pathway. As a result, when no teaching signal is supplied from the external environment to the premotor/motor cortices, a deliberately constructed motor plan that is transmitted from the hippocampal–prefrontal pathway to the model premotor cortex can serve as an internal teaching signal to the faster-responding parietal pathway to enable error-driven and/or supervised Hebbian learning. Gradually, the parietal pathway learns to conform its answers to those of the hippocampal–prefrontal pathway during this cortical consolidation process, and automaticity purely through parietal mediation can be achieved in the long run.
To simulate development of automaticity, a memory consolidation stage was introduced before the regular performance stage. During the consolidation stage, no explicit teaching signals were provided from the simulation environment to the model Premotor layer, and thus, the model performance was not evaluated. During the performance stage, all the model prefrontal and hippocampal layers were lesioned to see the pure parietal contributions to the model outputs. As shown in Figure 4, the degree of automaticity in terms of performance accuracy was positively correlated with the duration of consolidation.
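The consolidation logic (an internal teacher, no external feedback, then testing with the hippocampal and prefrontal layers removed) is sketched below with a deliberately simple parietal stand-in: a localist linear associator with random initial weights and a small, assumed learning rate, trained by the delta rule toward the hippocampal–prefrontal plan. It does not model the processing-speed asymmetry described above. Because each condition's weights move monotonically toward the teacher, parietal-only accuracy cannot drop as consolidation lengthens.

```python
import numpy as np

def consolidate(instructed, n_epochs, lr=0.05, seed=0):
    """Parietal-only accuracy after n_epochs of consolidation.

    instructed[i] is the response the hippocampal-prefrontal pathway relays to
    Premotor for condition i; it is the only teaching signal. Accuracy is
    measured from the parietal weights alone, as if pFC and hippocampus
    were lesioned.
    """
    rng = np.random.default_rng(seed)
    targets = np.asarray(instructed)
    n = len(targets)
    W = rng.uniform(0.0, 1.0, (n, n))          # untrained parietal weights W[motor, condition]
    for _ in range(n_epochs):
        for i in range(n):
            teacher = np.eye(n)[targets[i]]    # internal teaching signal
            W[:, i] += lr * (teacher - W[:, i])
    return float(np.mean(W.argmax(axis=0) == targets))

instructed = [2, 0, 1, 4, 3, 6, 5, 8, 7, 9]    # hypothetical instructed rules
for k in (0, 1, 2, 5, 10, 20, 40):
    print(k, consolidate(instructed, k))
```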
We have presented a neural model capable of instructional control based on the relevant brain areas known to be involved. The model treats an instructed S-R task as a combinatorial generalization task, in that a novel combination of perceptual and motor knowledge is the primary learning problem faced by the system. This is a realistic simplification, in that people can only be instructed to perform tasks whose basic elements they already know, and the instructions specify a particular combination of these elements to be performed. Our model shows that these combinations can be easily memorized by the hippocampus and then implemented through top–down cognitive control via the pFC/BG system. This form of cognitive learning can be more rapid and flexible than perceptual-motor learning, which engages the slow-learning neocortex to develop task-specific representations.
Learning occurs in multiple parts of the architecture to support the ability to behave according to instructions. To comprehend the language of action instructions, the model needs a vocabulary-building stage during which the hippocampus learns to perform an identity mapping that relays information from the Action layer to the corresponding motor representations in the pFC layers. Meanwhile, the BG learns to open the execution gate for the pFC to output a motor decision to the Premotor layer. During the instruction memorization stage, the hippocampus associates inputs from the Condition and Action layers and learns each condition–action pair as an episodic pattern. Only after completing all these forms of learning can the model follow instructions flawlessly during the performance stage: using mechanisms of pattern completion, the hippocampus recalls which action to perform based on retrieval cues from the Condition layer, and the downstream pFC either maintains the retrieved motor command in working memory when the BG closes the execution gate or triggers a motor decision in the Premotor layer when the BG opens the execution gate.
Compared with reinforcement learning, instructed learning has several advantages. Reinforcement learning adapts behavior based on the consequences of actions, whereas instructed learning adapts behavior in accordance with instructed action rules. Hence, unlike the slow, retrospective process of trial and error in reinforcement learning, instructed learning tends to be fast, proactive, and errorless. In general, trial-and-error learning increases the likelihood of making the same mistakes in the future, whereas errorless learning avoids strengthening incorrect S-R associations through Hebbian mechanisms (McClelland, 2001). Note, however, that although the errorless model performance is mainly due to perfect recall of instructions from the hippocampus, we do not deny hippocampal involvement in trial-and-error learning, especially when task rules are explicit or deterministic. In fact, S-R studies that used trial-and-error procedures found hippocampal contributions to one-trial learning of correct responses (e.g., Brasted et al., 2005).
Our model differs from others in level of detail and in theorizing about how the pFC, BG, and hippocampus interact with each other. For example, connectionist approaches (e.g., Noelle & Cottrell, 1995, 1996) and various reinforcement learning algorithms have been used to suggest how instructions may interact with the experience-based system subserved by the BG (Li et al., 2011; Walsh & Anderson, 2011; Biele et al., 2009; Doll et al., 2009). However, the prefrontal and hippocampal mechanisms that mediate instructions remain underspecified in these studies compared with ours. Also, in the bilinearity model (Dayan, 2007), hippocampal memory and PBWM underlie rule-based and habitual behavior, respectively, whereas in our model they work in tandem as the instructable pathway, and habitual S-R responses engage little or no prefrontal control.
Our model explains existing data and makes predictions. Because postresponse feedback that defines correct S-R associations is functionally equivalent to preresponse instructions, our model predicts impaired learning and execution of novel conditional responses under both forms of S-R specification when either the pFC or the hippocampus is damaged in humans, as shown by Petrides (1985, 1997). Moreover, whereas the BG plays a major role in habit learning and automaticity (Ashby et al., 2010; Yin & Knowlton, 2006), our model predicts hippocampal and prefrontal involvement in the cortical consolidation of instructed one-to-one S-R mappings. Furthermore, because the parietal pathway mainly supports habitual S-R execution, our model predicts that applying TMS to the PPC will not disrupt the learning or execution of instructed S-R mappings.
Finally, the model proposed here is being extended to address other important questions about instructional learning and control. For example, how does the brain handle hierarchical, probabilistic, or sequential rules other than simple S-R mappings? How do misleading instructions affect learning and performance of a given task? These more complex scenarios engage a larger brain network than the present work and require more detailed modeling of the pFC (Cole, Bagic, Kass, & Schneider, 2010; Bunge, 2004) as well as the BG and motor cortices.
We thank members of the CCN Lab at CU Boulder for helpful comments and discussion. This work was supported by ONR N00014-07-1-0651, ARL RCTA, NIH MH079485 and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of the Interior (DOI) contract number D10PC20021. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI, or the U.S. Government.
Reprint requests should be sent to Tsung-Ren Huang, Department of Psychology and Neuroscience, University of Colorado Boulder, 345 UCB, Boulder, CO 80309, or via e-mail: email@example.com.