The thalamus has traditionally been considered merely a relay of inputs to the cortex, with hierarchically organized cortical circuits serially transforming thalamic signals into cognitively relevant representations. Given the absence of local excitatory connections within the thalamus, the relay notion seemed a reasonable description over the past several decades. Recent advances in experimental approaches and theory provide a broader perspective on the role of the thalamus in cognitively relevant cortical computations and suggest that only a subset of thalamic circuit motifs fits the relay description. Here, we discuss this perspective and highlight the potential role of the thalamus, and specifically the mediodorsal (MD) nucleus, in the dynamic selection of cortical representations through a combination of intrinsic thalamic computations and output signals that change cortical network functional parameters. We suggest that through the contextual modulation of cortical computation, the thalamus and cortex jointly optimize the information-cost trade-off in an emergent fashion. We emphasize that coordinated experimental and theoretical efforts will provide a path to understanding the role of the thalamus in cognition, along with insights that may help augment cognitive capacity in health and disease.
1 Cortico-Centric View of Perceptual and Cognitive Processing
Until recently, cognition in mammalian, avian, and reptilian nervous systems has been viewed as a cortico-centric process, with the thalamus considered to play the mere role of a relay. This classic view, largely driven by the visual hierarchical model of the mammalian cortex (Felleman & Van Essen, 1991), puts the thalamus at the beginning of a feedforward hierarchy. The transmission of information from the thalamus to the early sensory cortex (V1 in the visual system, for example), and the gradually increasing complexity of representations from V2 to MT/IT and eventually the prefrontal cortex (PFC), constitute the core of perceptual representation under the hierarchical model. A recent comparative study of biologically plausible convolutional neural networks (CNNs) and the visual ventral stream emphasizes feature enrichment through a network of hierarchically interconnected computational modules (layers in the neural network and areas in the ventral stream; Yamins et al., 2014).
The strictly static feedforward model has since morphed into a dynamic hierarchical model following discoveries of the role of feedback from higher to lower cortical areas (Heeger, 2017). Such dynamic hierarchies are even considered favorable for recursive optimization, where the overall optimization is achieved by breaking the problem into smaller subproblems and finding the optimum for each. Such recursive optimization need not be confined to a single cortical area, and each of these distributed optimizations may even be solved differently (Marblestone, Wayne, & Kording, 2016). This view of cortical computation is paralleled by the growing use of recurrent neural networks (RNNs) that can capture the dynamics of single neurons or neural populations in a variety of tasks. As such, RNNs can mimic context-dependent prefrontal responses (Mante, Sussillo, Shenoy, & Newsome, 2013) or reproduce the temporal scaling of neural responses in the medial frontal cortex and the caudate nucleus (Wang, Narain, Hosseini, & Jazayeri, 2018). Although it is clear that higher-level cortical feedback reaches all the way down to the thalamus (Wimmer et al., 2015), the main attribute of perception and cognition remains cortico-centric under the umbrella of dynamic hierarchical models or RNN embodiments of cortical cognitive functions. Since the computation carried out by a system should match its computing elements at the appropriate scale (Dehghani, 2018), a mismatch between these presumed computational systems and the underlying circuitry becomes vividly apparent. First, the associative cortex (and not just the sensory cortex) receives thalamic input. Second, certain thalamic territories receive their primary (driving) input from the cortex rather than the sensory periphery, some of which is likely to be highly convergent at the level of single thalamic cells, suggesting at least a cortical modulation of sensory input.
Third, thalamic projections can be broad and diffuse, suggesting a modulatory rather than a relay function.
These points indicate that the thalamus may play a central role in cognitive processes. But what does including the thalamus add that could not be achieved in cortical loops? One advantage that the thalamus could bring into the cortical equation is flexibility. For example, the same sensory cue may have different meanings according to the particular context a subject is in. A recent study (Rikhye, Gilra, & Halassa, 2018) shows that the thalamus may be well suited to reorganize functional connectivity in frontal cortices in response to such contextual changes, allowing for more flexible switching of rule-to-action mapping.
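A toy recurrent sketch makes this flexibility concrete: in the spirit of the RNN models discussed above, a context signal gates which of two input streams the same leaky recurrent unit integrates, so an identical stimulus yields different outputs in different contexts. All weights and parameters below are hand-set assumptions for illustration, not values fitted to data.

```python
import numpy as np

def integrate(vis, aud, context, dt=0.1, tau=1.0, leak=0.1):
    """Leaky recurrent unit; context = +1 attends vision, -1 attends audition."""
    g_vis = 0.5 * (1 + context)          # context gates the input gains
    g_aud = 0.5 * (1 - context)
    r = 0.0
    for v, a in zip(vis, aud):           # integrate the selected stream
        r += dt * (-leak * r + g_vis * v + g_aud * a) / tau
    return r

T = 100
vis = np.full(T, +0.5)                   # visual evidence points one way,
aud = np.full(T, -0.5)                   # auditory evidence the other way

print(integrate(vis, aud, +1) > 0)       # attending vision: positive decision variable
print(integrate(vis, aud, -1) < 0)       # attending audition: negative decision variable
```

The same circuit thus maps one stimulus onto two different decision variables purely through a gain change, which is the kind of contextual rerouting of cortical processing attributed to the thalamus here.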
We suggest that the unique cognitive capability of the thalamocortical system is tightly bound to parallel processing and contextual modulation that are enabled by the diversity of computing nodes (including thalamic and cortical structures) and the complexity of the computing architecture (see Figure 1). Here we focus on the mediodorsal (MD) region of the thalamus, which has been extensively studied, and put it into the context of other thalamic areas that may have related functions. We start with a brief overview of the thalamic architecture, followed by experimental evidence and a computational perspective on the role of the MD thalamus in contextual cognitive computation.
2 Thalamic Architecture: Anatomical and Functional Features
Traditionally, thalamic nuclei (see Figure 2) are defined as collections of neurons that are segregated by gross features such as white matter tracts and various types of tissue staining procedures (Jones, 1981). This gross anatomical classification has been equated with a functional one, where individual thalamic nuclei give rise to a set of defined functions (Jones, 1981, 1985). More recent fine anatomical studies challenge this notion, showing that within individual nuclei, single-cell input-output connectivity patterns are quite variable.
The thalamus lacks lateral excitatory connections and receives inputs from other subcortical structures or the cortex. In fact, a major feature of forebrain expansion across evolution is the invasion of the thalamus by cortical inputs (Grant, Hoerder-Suabedissen, & Molnár, 2012; Rouiller & Welker, 2000). Most (90–95%) afferents to the relay nuclei are not from the sensory organs (Jones, 1985; Sherman & Guillery, 2013). Recent anatomical studies have shown a great diversity of cortical input type, strength, and inferred degree of convergence, even within individual thalamic nuclei (Clasca, Rubio-Garrido, & Jabaudon, 2012; see Figure 3 for an example view of cell and network architecture diversity).
Excitatory inputs mostly arrive as feedback from layer 6 of the cortex, as well as from the brainstem reticular formation. In addition to the diversity of excitatory inputs, thalamic circuits receive a diverse set of inhibitory inputs (note that local GABAergic thalamic interneurons are mostly absent in non-LGN relay nuclei). The two major systems of inhibitory control are the thalamic reticular nucleus (TRN), an inhibitory shell surrounding the excitatory thalamic nuclei, and the extrathalamic inhibitory (ETI) system, a group of inhibitory projections across the fore-, mid-, and hindbrain (see Halassa & Acsády, 2016, for a review of thalamic inhibition). Perhaps a major differentiating feature of these two systems (TRN and ETI) is temporal precision. One of the key characteristics of the thalamus is a lack of direct local loops: only a very small group of inhibitory neurons with local connections exists in the thalamus (Steriade & Llinás, 1988). A mechanistic consequence of this architecture is the differential control of thalamic response gain and selectivity (see Figures 3F and 3G), with the TRN controlling the former, as observed in sensory systems (Pinault, 2004), and the ETI controlling the latter, as observed in motor systems (Urbain & Deschênes, 2007). For example, basal ganglia control of thalamic responses, a form of ETI control, would be implemented through thalamic disinhibition, which depends not only on ETI input but also on a special type of thalamic conductance that enables high-frequency bursting on release from inhibition (Deniau & Chevalier, 1985; Goldberg, Farries, & Fee, 2013).
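The rebound-bursting mechanism mentioned above can be caricatured in a few lines: sustained inhibition de-inactivates a T-type-like conductance (tracked by a variable h), so that release from inhibition, not the inhibition itself, triggers a high-frequency burst. All parameters are illustrative assumptions, not fits to thalamic data.

```python
import numpy as np

def simulate_rebound(inhibition, dt=1.0, tau_h=50.0):
    """inhibition: array of 0/1 per ms bin; returns spike counts per bin."""
    v, h = -65.0, 0.0                          # membrane potential, T-current availability
    spikes = []
    for inh in inhibition:
        target = -80.0 if inh else -60.0       # inhibition hyperpolarizes the cell
        v += dt * (target - v) / 10.0
        h_inf = 1.0 if v < -70.0 else 0.0      # T-current de-inactivates when hyperpolarized
        h += dt * (h_inf - h) / tau_h
        # On release (depolarized with de-inactivated T-current): burst firing
        burst = (not inh) and v > -65.0 and h > 0.5
        spikes.append(3 if burst else 0)       # burst = several spikes per ms bin
        if burst:
            h *= 0.8                           # T-current inactivates with use
    return np.array(spikes)

inh = np.array([1] * 200 + [0] * 50)           # 200 ms of inhibition, then release
rate = simulate_rebound(inh)
print(rate[:200].sum(), rate[200:].sum())      # silent while inhibited, burst after release
```

The point of the sketch is the counterintuitive sign of the control signal: inhibition here primes a later excitatory burst, which is how basal ganglia disinhibition can gate thalamic output.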
Overall, the variety of thalamic inputs (both excitatory and inhibitory), combined with intrinsic thalamic features such as excitability and morphology, will determine the type of intrinsic computations that the thalamus performs. This view appears to be consistent with recent observations of confidence encoding in both sensory (Komura, Nikkuni, Hirashima, Uetake, & Miyamoto, 2013) and motor systems (Jazayeri & Shadlen, 2015; Ma & Jazayeri, 2014).
Thalamic relay nuclei mostly project to the cortical middle layers in a topographic fashion. However, the majority of thalamic structures, such as the mediodorsal nucleus (MD), the posteromedial complex (POm), and the pulvinar, also project more diffusely to the cortical superficial layers (see Figure 3 for an example of thalamic cell and circuit diversity). These diffuse projections seem poorly suited to relay information in a precise manner; rather, they might play a modulatory role in cortical function. Furthermore, a great degree of diversity can be observed at the level of thalamic axonal terminals within the cortex. While the idea of a thalamic relay was consolidated by observing that the main LGN neurons thought to be associated with form vision (the M and P pathways) exhibit spatially compact cortical terminals, recent anatomical studies of individual neurons across the thalamus show a variety of terminal sizes and degrees of spatial spread, and an intricate computational architecture (see Figure 3). This complexity of the architecture and diversity of the computing nodes are among the key factors that set the thalamocortical system apart from other conventional and unconventional computing engines (see Figure 1). Part of the complication in understanding how these anatomical types give rise to different functions is their potential for contacting different sets of excitatory and inhibitory cortical neurons.
Specifically, among thalamic nuclei, the mediodorsal thalamus (MD) seems to have a connectivity pattern that is distinctively different from that of the classic sensory nuclei. Corticothalamic projections to the MD originate from both layers V and VI of the PFC (Giguere & Goldman-Rakic, 1988; Goldman-Rakic & Porrino, 1985; Mitchell, 2015). But in contrast to relay nuclei, cortical input to the MD terminates both extraglomerularly and within the synaptic glomeruli, suggesting that the cortex plays a different role in MD activity (in comparison to the LGN, for example; Schwartz, Dekker, & Goldman-Rakic, 1991). Additionally, optogenetic and in vitro electrophysiology techniques have revealed that the MD not only projects to layer I but also has additional terminations in layer III (Cruikshank et al., 2012). MD projections to the PFC synapse onto both excitatory and inhibitory cells (Cruikshank et al., 2012), where the feedforward inhibition they trigger could play a variety of roles, from regulating dendritic action potentials (Kim, Beierlein, & Connors, 1995) to imposing a narrow temporal window within which excitation can reach the target (Cruikshank, Lewis, & Connors, 2007).
These elaborate input and output connectivities of the thalamocortical architecture point to a nonunitary computational role of the thalamus. Among thalamic nuclei, the MD specifically (and likely other MD-like nuclei) has the architecture necessary for modulatory computational roles rather than the well-known relay functions. In the next sections, we provide the experimental evidence and algorithmic designs pointing to this modulatory role. The idea of the thalamus controlling cortical state parameters is highlighted in Figures 4 and 5 and the next section.
3 Many Facets of Thalamic Computation
It is commonly thought that processes like attention, decision making, and working memory are implemented through distributed computations across multiple cortical regions (Corbetta, 1998; Mesulam, 1990; Scott et al., 2017). However, it is unclear how these computations are coordinated to drive relevant behavioral outputs. From an anatomical standpoint, the thalamus is strategically positioned to perform this function, but relatively little is known about its broad functional engagement in cognition. The thalamic cellular composition and network structure constrain how the cortex receives and processes information. The thalamus is the major input to the cortex, and interactions between the two structures are critical for sensation, action, and cognition (Jones, 1998; Nakajima & Halassa, 2017; Sherman, 2016). Despite recent studies showing that the mammalian thalamus contains several circuit motifs, each with specific input-output characteristics, the thalamus is traditionally viewed as a relay to or between cortical regions (Sherman & Guillery, 2013). Evidence that the thalamus plays an active role in cognition beyond relay stems from (1) the fact that sensory input to the thalamus is quite limited in comparison to input from other structures, such as the cortex and basal ganglia; (2) experimental evidence that a number of nuclei modulate cortical neural processing according to behavioral context; and (3) the fact that lesions of certain nuclei, such as the pulvinar and MD, cause severe attention and memory deficits (Saalmann & Kastner, 2015).
We will discuss how the distinctive anatomical architecture and computational roles of the pulvinar and MD differ from relay nuclei such as the LGN. It is worth mentioning that this view of bona fide thalamic computations is quite distinct from one in which thalamic responses merely reflect their inputs, with only linear changes in response size. This property of reflecting an input (with only slight modification of amplitude) was initially observed in the lateral geniculate nucleus (LGN), which receives inputs from the retina. LGN responses to specific sensory inputs (their receptive fields, RFs) are very similar to those in the retina itself, arguing that little intrinsic computation happens in the LGN beyond gain control. Success in early vision studies (Hubel & Wiesel, 1959, 1962) might have inadvertently led to the LGN relay function being generalized across the thalamus. The strictly feedforward thalamic role in cognition requires reconsideration (Halassa & Kastner, 2017); only a few thalamic territories receive peripheral sensory inputs and project cortically in a localized manner as the LGN does (FitzGibbon et al., 2015; Jones, 1981; Kakei et al., 2001; Raczkowski & Fitzpatrick, 1990; Sherman, 2016).
The largest thalamic structures in mammals, the MD and pulvinar, contain many neurons that receive convergent cortical inputs and project diffusely across multiple cortical layers and regions (Clasca et al., 2012; Rovo, Ulbert, & Acsády, 2012). For example, the primate pulvinar has territories that receive topographic, nonconvergent inputs from the striate cortex (Rovo et al., 2012) and others that receive convergent inputs from nonstriate visual cortical (and frontal) areas (Mathers, 1972). The same nucleus also receives inputs from the superior colliculus (Partlow, Colonnier, & Szabo, 1977), a subcortical region receiving retinal inputs. This suggests that the pulvinar contains multiple input “motifs” based solely on the diversity of its excitatory inputs. Such input diversity is not limited to the pulvinar but is seen within many thalamic nuclei across the mammalian forebrain (Bickford, 2016). Local inactivation of pulvinar neurons reduces neural activity in the primary visual cortex (Purushothaman, Marion, Li, & Casagrande, 2012), suggesting a feedforward role. Recent studies, however, indicate that the pulvinar may have additional roles. For example, pulvinar inactivation was shown to increase low-frequency cortical oscillations (Zhou, Schafer, & Desimone, 2016). Given that such activity is often associated with inattention and sleep, this study suggested that the pulvinar may keep cortical regions in an activated state that allows responsiveness to top-down input from other areas, modulating ongoing activity according to attentional demands. A different study showed that during perceptual decision making, pulvinar neurons encode choice confidence rather than stimulus category (Komura et al., 2013). Together, these findings strongly argue for pulvinar functions beyond relaying information.
In the case of the MD, direct sensory input is limited (Mitchell, 2015), and the diffuse, distributed projections to the cortex (Kuramoto et al., 2017) are poorly suited for information relay. This input-output connectivity suggests different functions. Recent studies (Bolkan et al., 2017; Rikhye et al., 2018; Schmitt et al., 2017) have begun to shed light on the type of computation that MD performs. Taking advantage of the genetic accessibility of the mouse for neuronal manipulations, these studies have revealed that the MD coordinates task-relevant activity in the prefrontal cortex (PFC) in a manner analogous to a contextual signal regulating distinct attractor states within a computing reservoir. Specifically, in a task where animals had to keep a rule in mind over a brief delay period (see Figure 4A), PFC neurons show population-level persistent activity following the rule presentation, a sensory cue that instructs the animal to direct its attention to either vision or audition (Figures 4B and 4C). MD neurons show responses that are devoid of categorical selectivity (see Figure 4D) yet are critical for selective PFC activity; optogenetic MD inhibition diminishes this activity, while MD activation augments it. The conclusion is that MD inputs enhance synaptic connectivity among PFC neurons or may adjust the activity of PFC neurons through selective filtering of the thalamic inputs. In other words, delay-period MD activity maintains rule-selective PFC representations by augmenting local excitatory recurrence (Schmitt et al., 2017).
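A minimal sketch of this proposal treats MD input as a multiplicative gain on PFC recurrent excitation: with the gain high, a brief rule cue leaves persistent delay-period activity (an attractor); with the gain low, the same cue decays away. The single-unit network and all parameters are illustrative assumptions, not a model fitted to the recordings.

```python
import numpy as np

def delay_activity(md_gain, cue_amp=1.0, steps=200, dt=0.1, tau=1.0):
    """Rate of one recurrent PFC unit at the end of the delay period."""
    r, w_rec = 0.0, 1.2                        # recurrent excitation, scaled by MD gain
    for t in range(steps):
        cue = cue_amp if t < 20 else 0.0       # brief rule presentation, then delay
        r += dt * (-r + np.tanh(md_gain * w_rec * r + cue)) / tau
    return r

print(delay_activity(md_gain=1.0) > 0.5)       # MD engaged: rule activity persists
print(delay_activity(md_gain=0.3) < 0.05)      # MD suppressed: rule activity decays
```

The cue itself is identical in both runs; only the effective recurrence changes, mirroring the finding that MD inhibition abolishes, and MD activation augments, rule-selective persistent PFC activity.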
In a related study using a delayed nonmatching-to-sample T-maze working memory task (Bolkan et al., 2017), MD amplification and maintenance of elevated PFC activity predicted correct performance during the subsequent choice phase of the task. Interestingly, MD-dependent increases in PFC activity were much more pronounced during the later rather than the earlier part of the delay.
These findings indicate that the PFC might have to recursively recruit the MD to sustain cortical representations as working memory weakens with time. Together, these studies indicate that PFC cognitive computation cannot be dissociated from MD activity. Further evidence for the critical role of MD-PFC interaction in cognition is the disrupted frontothalamic anatomical and functional connectivity seen in neurodevelopmental disorders (Marenco et al., 2012; Mitelman, Byne, Kemether, Hazlett, & Buchsbaum, 2005; Nair, Treiber, Shukla, Shih, & Müller, 2013; Parnaudeau et al., 2015; Woodward, Giraldo-Chica, Rogers, & Cascio, 2017).
3.1 Can MD Select Cortical Subnetworks Based on Contextual Modulation?
Why would a recurrent network (PFC) computation depend on its interaction with a nonrecurrent (MD), nonrelay network? What computational advantage would such a system have? Using a chemogenetic approach, a recent study suggested that information flow in the MD-PFC network can be unidirectional: while inactivating either the PFC-to-thalamus or the MD-to-cortex pathway impaired recognition of a change in reward value in rats performing a decision-making task, only inactivation of the MD-to-cortex pathway affected the behavioral response to a change in the action-reward relationship (Alcaraz et al., 2018). Given that a sensory stimulus may require a different action depending on the context in which it occurs, the ability to flexibly reroute the active PFC subnetwork to a different output may be crucial. In an architecture like the PFC-MD network, where the MD can modulate PFC functional connectivity, the MD may be well suited to reroute ongoing activity in a context-dependent manner. In fact, in the mouse cognitive task described above (see Figure 4A), a subset of MD neurons showed substantial spike rate modulation during task engagement compared to when the animal was in its home cage (see Figure 4E; Schmitt et al., 2017). In contrast, PFC neurons show very little difference in spike rates when the animal engages in the task. This suggests that different subsets of MD neurons are capable of encoding task contexts, and that each subset could unlock a distinct cortical association; both hypotheses have now been verified experimentally (Rikhye et al., 2018). These MD subsets must be able to shift cortical states dynamically while maintaining selectivity based on the subset of cortical connections they target.
This idea would also fit with the paradigm shift indicating that thalamic neurons exert dynamical control over information relay to cortex (Basso, Uhlrich, & Bickford, 2005; Parnaudeau, Bolkan, & Kellendonk, 2018).
Overall, the anatomical and neurophysiological data show that the thalamic structure, the corticothalamic network circuitry, and the interplay between thalamus and cortex shape the frame within which the thalamus plays the dual role of relay and modulator. Under this framework, different thalamic nuclei carry out a multitude of functions, including information relay. A suggestion of this comparative computational role of the LGN, pulvinar, and MD is depicted in Figure 8. The importance of the (nonrelay) thalamic nuclei's regulatory influence on cortical function is also reflected in the disorders that emerge from thalamic dysfunction. Specifically, lesions to the pulvinar and MD lead to severe attention and memory deficits (Baxter, 2013; Saalmann & Kastner, 2011). The disruption of MD-PFC communication is the likely cause of these cognitive impairments. As mentioned earlier, the back-and-forth interaction between the MD and PFC is necessary during the task acquisition period and is reflected in an increase in (beta-frequency) MD-PFC synchrony (Parnaudeau et al., 2013). In addition, the MD also regulates the neural synchrony of PFC neurons (Saalmann, 2014). Decreasing MD spiking activity (by hyperpolarizing MD neurons) leads to disrupted MD-PFC synchrony and impaired performance in the delayed nonmatch-to-sample task (Parnaudeau et al., 2013). Moreover, patients with schizophrenia show deficits in both beta- and gamma-frequency synchrony (Uhlhaas & Singer, 2010), a significant reduction of MD volume (Alelú-Paz & Giménez-Amaya, 2008; Popken, Bunney, Potkin, & Jones, 2000), and a reduced number of MD neurons (Popken et al., 2000). While it remains unclear whether the loss of MD neurons is primary or secondary to PFC pathology (Popken et al., 2000), it is evident that the MD regulates PFC plasticity, cognitive flexibility (Baxter, 2013), and contextual processing (Rikhye et al., 2018; Schmitt et al., 2017).
4 Is the Thalamus a Read-Write Medium for Cortical Parallel Processing?
The connectivity patterns of relay and nonrelay nuclei point to a dichotomy of algorithmic constraints imposed by these thalamic structures. There are stark contrasts between the thalamocortical and corticothalamic connectivity profiles of sensory (LGN-like) and nonsensory thalamic nuclei. Anatomical tracing and radiographic studies have shown that while relay nuclei have preserved topographic, focal projections to the middle cortical layers, major thalamic structures (MD, pulvinar, POm) project more diffusely to the superficial layers of the cortex (Giguere & Goldman-Rakic, 1988; Krettek & Price, 1977; Mitchell, 2015). Among nonrelay nuclei, the MD shows interesting characteristics, projecting not only to layer I but also to the outer banks of layer III (Cruikshank et al., 2012). Only a small fraction of MD-to-PFC projections end in the middle cortical layers, and more than 90% are modulatory and project diffusely to the superficial layers of the PFC (Viaene, Petrof, & Sherman, 2011a, 2011b; Zikopoulos & Barbas, 2007). In addition, corticothalamic axons to nonrelay nuclei reveal a dual role for cortical influence on the thalamus (Giguere & Goldman-Rakic, 1988; Mitchell, 2015). For example, layer V/VI PFC neurons not only project large numbers of axons directly to the MD but also send collaterals to the thalamic reticular nucleus, thereby indirectly influencing MD activity (Giguere & Goldman-Rakic, 1988; Schwartz et al., 1991; Sherman & Guillery, 2002).
This massive reciprocal connectivity is metabolically very costly for the brain. If the brain were to operate as a simple pattern-matching system, a far less costly feedforward network (similar to the structure suggested by Yamins et al., 2014) would have been more economical. The complex connectivity of the nonrelay thalamus and cortex points to computations beyond pattern matching. This view also matches what we know of cortical computation: it involves multiple sources of expertise (each decoding certain aspects of incoming stimuli), and these expert modules operate in a highly parallel mode. To coordinate these parallel yet convergent cortical processing modules, there is a need for a system that is commonly visible to every module. In addition, these modules should collectively keep track of changes in environmental stimuli and integrate that information over time. This contextual processing requires small modifications of the processed information in individual modules, and handling local constraints may require repeating certain computations until satisfactory results are reached. The connectivity profile we have discussed provides the platform to run such computations.
If the brain were to function as a simple pattern-matching system without wiring and metabolic constraints, evolution could simply have expanded the size and depth of the network to the point that it could memorize a large number of possible patterns, presumably by evolving a deep network. This would be a workable solution, since many systems of interest are described by low-order polynomial Hamiltonians, which neural networks can approximate accurately (Lin, Tegmark, & Rolnick, 2017). But cognition is much more than the template matching and classification achieved by such a network.
The limits of template-matching methods in dealing with invariance (to rotation, translation, and scale) in object recognition quickly became apparent to neuroscientists and in early work on computer vision. One of the early pioneers of AI, Oliver Selfridge, proposed the Pandemonium architecture to overcome this issue (Selfridge, 1959). Selfridge envisioned serially connected layers of distinct demons (an image demon, followed by a set of parallel feature demons, then a set of parallel cognitive demons, and eventually a decision demon) that independently perceive parts of the input before reaching a consensus through the serial accumulation of evidence from parallel processing. This simple feedforward pattern recognition model is in some ways a predecessor of modern connectionist feedforward neural networks, much like those we have already discussed. Despite its simplicity, however, Pandemonium was a leap forward in recognizing that the intensity of (independent, parallel) activity, along with the need for a summation-based inference, is key to moving from simple template matching to a system that has a concept of the processed input. A later extension of this idea was proposed by Allen Newell as the blackboard model: “Metaphorically, we can think of a set of workers, all looking at the same blackboard: each is able to read everything that is on it and to judge when he has something worthwhile to add to it. This conception is just that of Selfridge's Pandemonium: a set of demons independently looking at the total situation and shrieking in proportion to what they see that fits their natures” (Newell, 1962). Blackboard AI systems, built on this model, have a common knowledge base (the blackboard) that is iteratively updated (written to and read from) by a group of knowledge sources (specialist modules) and a control shell (organizing the updates by the knowledge sources; Nii, 1986).
Interestingly, this computational metaphor can also be extended to the interaction between the thalamus and cortex, although, unlike in classic blackboard systems, the thalamic blackboard is not a passive one (Harth, Unnikrishnan, & Pandya, 1987; Mumford, 1991, 1992). Although the active blackboard was initially used as an analogy for LGN computation, the nature of MD connectivity and its communication with the cortex seems much better suited to the types of computations enabled by an active blackboard. Starting with an input, the thalamus, as the common blackboard visible to the processing (cortical) modules, initially presents the problem (input) for parallel processing by the modules. By “module,” we refer to a group of cortical neurons that form a functional assembly, which may or may not be clustered together (in a column, for example). By iteratively reading from (via thalamocortical projections) and writing to (via corticothalamic projections) this active blackboard, expert pattern recognition modules gradually refine their initial guesses based on their internal processing and the updates to the common knowledge. This process continues until the problem is solved (see Figure 5).
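The read-write loop just described can be sketched concretely: a shared board holds the current hypothesis, expert modules repeatedly read it, apply their local correction, and write back, and iteration stops when the board no longer changes. The experts here are toy coordinate-wise denoisers; their form, the target pattern, and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -1.0, 0.5])            # "true" stimulus pattern
blackboard = target + rng.normal(0, 1.0, 3)    # noisy initial presentation

def expert(i, board):
    """Expert i refines coordinate i, nudging it toward its specialized knowledge."""
    refined = board.copy()
    refined[i] += 0.5 * (target[i] - board[i])  # partial local correction per visit
    return refined

for cycle in range(50):                         # iterative read-write cycles
    previous = blackboard.copy()
    for i in range(3):                          # each expert reads and writes in turn
        blackboard = expert(i, blackboard)
    if np.abs(blackboard - previous).max() < 1e-6:
        break                                   # board stabilized: consensus reached

print(np.allclose(blackboard, target, atol=1e-4))
```

No single expert ever sees or solves the whole problem; the solution emerges from repeated partial corrections on the shared medium, which is the computational point of the active-blackboard metaphor.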
This iterative communication between the nonrelay thalamus and cortex suggests that corticothalamic projections return the results of computations (carried out in parallel cortical modules) back to the thalamus. The integration of these revisions into the next thalamic output happens via the synaptic input and dendritic arbors of the nonrelay thalamic neurons. One of the major differences in the synaptic organization of the MD from that of the sensory nuclei is that cortical axons target MD neurons both extraglomerularly and within the synaptic glomeruli (Schwartz et al., 1991). In sensory nuclei, these within-glomeruli synaptic sites are particularly designated for receiving major ascending sensory afferents (Spacek & Lieberman, 1974). In the MD, such large terminals may even engulf multiple synaptic contacts (Pelzer, Horstmann, & Kuner, 2017) and are positioned on the proximal dendrites of thalamic neurons (Schwartz et al., 1991). These within-glomeruli multisynaptic structures exhibit fast kinetics, large postsynaptic currents, and strong short-term depression (Pelzer et al., 2017). Interestingly, the short-term depression is combined with fast recovery after repetitive stimulation (Pelzer et al., 2017), enabling the generation of synaptic activity patterns that can match the frequency of corticothalamic inputs arriving via these large terminals (Steriade & Deschenes, 1984). As a result, these potent synaptic structures provide a platform for PFC inputs to play a much more active role in shaping the thalamic response in the MD than cortical feedback plays in sensory thalamic nuclei (Schwartz et al., 1991).
Distinctive biophysical characteristics of nonrelay thalamocortical projections play a complementary role in the computational scheme echoing an active blackboard. Interestingly, activating the MD does not generate spikes across the population of prefrontal cortical neurons it projects to, whereas activating the LGN generates spikes in the primary visual cortex (Schmitt et al., 2017; Rikhye et al., 2018). Instead, MD activation results in an overall enhancement of inhibitory tone, coupled with enhanced local recurrent connectivity within the PFC. Although thalamocortical feedforward inhibition is also observed in the somatosensory barrel cortex in response to thalamic stimulation (Gabernet, Jadhav, Feldman, Carandini, & Scanziani, 2005), MD-evoked inhibition in the PFC exerts a more powerful inhibitory gain control (Delevich, Tucciarone, Huang, & Li, 2015). This difference is likely rooted in the particular cortical targeting pattern of MD projections. Specifically, the MD directly targets parvalbumin-positive (PV), but not somatostatin-positive (SOM), interneurons in layers I and III (Collins, Anastasiades, Marlin, & Carter, 2018; Delevich et al., 2015; Kuroda, Yokofujita, & Murakami, 1998; Kuroda, Yokofujita, Oda, & Price, 2004; Rotaru, Barrionuevo, & Sesack, 2005), while (for example) the VM only “weakly” activates a variety of layer I interneurons (Cruikshank et al., 2012). In fact, when SOM interneurons are silenced, MD-evoked feedforward inhibition is enhanced (Delevich et al., 2015). As a result, while the VM plays a more important role in the excitation-inhibition balance, the MD plays a modulatory role by varying the integration time window and temporal precision of cortical responses (Collins et al., 2018).
While individual pyramidal neurons harbor broad response dynamics, feedforward inhibition regulates the population dynamics via graded recruitment of individual neurons (Khubieh, Ratté, Lankarany, & Prescott, 2016). Increased conductance, noisy voltage fluctuations, and depolarization are not necessarily exclusive factors that define how changes in the background input (i.e., stimulus changes and their contextual relevance) affect the gain (Cardin, Palmer, & Contreras, 2008; Prescott & De Koninck, 2003). In addition, while sensory thalamocortical synapses (i.e., LGN to visual cortex) onto fast-spiking inhibitory neurons manifest much higher release probability than those onto pyramidal cells, MD projections show similar presynaptic release probability onto inhibitory and excitatory cortical neurons alike (Delevich et al., 2015). The covariation of excitatory and feedforward inhibitory response sets the control for graded recruitment of pyramidal neurons into the population response (Khubieh et al., 2016). This control is itself evoked by prior cortical excitation of the MD, and as a result, the altered activity of PV interneurons in PFC can bias the response toward passive versus flexible contextual processing in a manner that is distinctively different from the observed response of the sensory cortices (Delevich et al., 2015). These mechanisms suggest why the chemogenetic inhibition of the MD leads to impaired working memory and flexible goal-directed behaviors (Parnaudeau et al., 2013, 2015). Similarly, patients with schizophrenia show reduced MD-PFC functional coupling (Mitelman et al., 2005) and deficits in prefrontal PV interneurons (Lewis, Curley, Glausier, & Volk, 2012), highlighting the importance of the modulatory effect of the MD on PFC function (Delevich et al., 2015).
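The graded-recruitment mechanism can be caricatured in a few lines of code. The model below is purely illustrative (the threshold distribution, drive values, and inhibitory gains are arbitrary choices, not fitted to the cited data): feedforward inhibition that covaries with the excitatory drive subtracts from the effective input and thereby sets what fraction of a pyramidal population is recruited.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical spread of activation thresholds across a pyramidal population.
thresholds = rng.uniform(0.0, 1.0, size=1000)

def recruited_fraction(drive, ffi_gain):
    """Fraction of cells recruited when feedforward inhibition covaries
    with the excitatory drive (matched release probabilities)."""
    effective_drive = drive - ffi_gain * drive
    return np.mean(effective_drive > thresholds)

weak_ffi = recruited_fraction(1.0, 0.2)    # weak inhibition: most cells recruited
strong_ffi = recruited_fraction(1.0, 0.6)  # strong inhibition: fewer cells recruited
```

Scaling the inhibitory gain, as MD-evoked PV recruitment might, grades how many pyramidal cells join the population response without silencing the population outright.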
5 Computational and Metabolic Constraints
To process changing stimuli and shifting contextual cues, and in order to achieve cognitive flexibility, the thalamocortical system needs to harbor temporal buffering. The mechanisms that we describe here point to the ways in which buffering may take place: (1) changes in the stimuli/context are constantly reprocessed by the cortex and the outputs are rewritten to the thalamus, and (2) the thalamus constantly reshapes the cortical population dynamics. The MD changes the mode by which PFC neurons interact with one another, initiating and updating different attractor dynamics underlying distinct cognitive inputs. As a result, the thalamocortical system, collectively and at any instant, keeps an updated description of the stimulus/context over some computational cycles up to the present (see Figure 5). Perhaps the upper bound of these computational cycles is tightly tied to the particulars of the corticothalamic and thalamocortical connectivity and biophysical constraints. However, since we are dealing with a biological system with finite resources, this back-and-forth communication needs to have certain characteristics to provide a viable computational solution. First and foremost, the control of interaction and its scheduling has to have a plausible biological component and should bind solutions as time evolves. Second, to avoid turning into an NP-hard (nondeterministic polynomial-time hardness) problem, there must exist a mechanism that stops this iterative computation once an approximation has been reached (see Figure 5). Here, we propose a specific solution to the first problem and a plausible one for the latter issue. We suggest that phase-dependent contextual modulation serves to deal with the first issue, and a multiobjective optimization of efficiency (computational information gain) and economy (computational cost, that is, metabolic needs and the required time for computation) handles the second issue (see Figure 6).
In both cases, we suggest that the thalamus plays an integral role in conjunction with cortex.
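This iterative read-and-rewrite cycle with a stopping criterion can be sketched abstractly. In the toy loop below, random linear maps stand in for the cortical and thalamic transformations (all sizes, scales, and tolerances are arbitrary assumptions); the loop terminates as soon as successive thalamic outputs stop changing appreciably, that is, once an approximation has been reached rather than iterating indefinitely:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Random maps standing in for thalamus -> cortex and cortex -> thalamus
# transformations; scaled small so the loop is a contraction and settles.
W_ct = rng.normal(scale=0.4 / np.sqrt(n), size=(n, n))
W_tc = rng.normal(scale=0.4 / np.sqrt(n), size=(n, n))

stimulus = rng.normal(size=n)
thalamic_out = stimulus.copy()
tol, max_cycles = 1e-4, 50

for cycle in range(max_cycles):
    cortical = np.tanh(W_ct @ thalamic_out)                    # cortical pass
    updated = 0.5 * stimulus + 0.5 * np.tanh(W_tc @ cortical)  # thalamic rewrite
    if np.linalg.norm(updated - thalamic_out) < tol:
        break  # stop once the approximation has been reached
    thalamic_out = updated
```

Here the stopping rule is a simple convergence test; in the biological setting we suggest the analogous role is played by the phase-dependent and cost-sensitive mechanisms discussed next.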
5.1 Computational Constraints and the Role of the Thalamus in Phase-Dependent Contextual Modulation
As mentioned earlier, we know that hierarchical convolutional neural networks (HCNN), which can recapitulate certain properties of static hierarchical forward models, cannot capture any processes that need to store prior states (Yamins & DiCarlo, 2016). As a result, context-dependent processing can be extremely hard to implement in neural network models (Rigotti, Rubin, Wang, & Fusi, 2010). The most widely used ANNs (feedforward nets, i.e., multilayer perceptrons/deep learning algorithms) face fundamental deficiencies: the ubiquitous training algorithms (such as backpropagation) have no biophysical plausibility, have high computational cost (number of operations and speed), and require millions of examples for the proper adjustment of connection weights. These features render feedforward NNs unsuitable for temporal information processing.
In contrast, recurrent neural networks (RNNs) can universally approximate the state of dynamical systems (Funahashi & Nakamura, 1993), and because of their dynamical memory, they are well suited for contextual computation. If the higher cortical areas were to show some features of RNN-like networks, as manifested by the dynamical response of single neurons (Mante et al., 2013), then we anticipate that the local computation (interaction between neighboring neurons) will be mostly driven by external biases. The thalamic projections could then play the role of bias where they seed the state of the network.
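This bias-seeding role can be illustrated with a toy rate RNN. In the sketch below (a random recurrent matrix with spectral norm kept below 1 so the dynamics settle to a bias-dependent fixed point; all parameters are arbitrary assumptions), the same local "cortical" circuitry lands in different network states depending solely on which constant "thalamic" bias it receives:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Local recurrent "cortical" connectivity; kept weak (spectral norm < 1)
# so each bias drives the network to a unique fixed point.
J = rng.normal(scale=0.35 / np.sqrt(n), size=(n, n))

def settle(bias, steps=300):
    """Iterate the rate dynamics x <- tanh(J x + bias) to steady state."""
    x = np.zeros(n)
    for _ in range(steps):
        x = np.tanh(J @ x + bias)
    return x

bias_a = rng.normal(size=n)  # one hypothetical thalamic drive
bias_b = rng.normal(size=n)  # a different thalamic drive

state_a, state_b = settle(bias_a), settle(bias_b)
# Same circuit, different biases: the network is seeded into distinct states.
```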
From both anatomical studies and electrophysiological investigations (Groh et al., 2014), we know that the thalamus is at a prime position to modify the signal based on the cognitive processing that is happening in the cortex (Bolkan et al., 2017; Schmitt et al., 2017). This thalamic-driven regulation entails "binding in time," since the MD-like thalamus modifies its output to the cortex at a given time and is itself influenced by what was perceived by the cortex at prior times. But how can this "binding in time" avoid locking the thalamic function to a set of inputs at a given time? How can the thalamus constantly be both ahead of the cortex and yet keep track of past information? The secret may be embedded in the nonrecurrent intrinsic structure of the thalamus, the recurrent structure of the higher cortical areas, and the phase-sensitive detection that biases and binds the locally recurrent activity in the cortex with large-scale feedback loops.
To expand the idea further, we revisit some core attributes of cognitive processing. Based on behavioral observations, higher cognition requires efficient computation, time-delay feedback, the capacity to retain information, and contextual computational properties. Such computational cognitive processes surpass the computational capacity of simple RNN-like networks. The essential required properties of a complex cognitive system of this kind are that (1) input should be nonlinearly mapped onto a high-dimensional state, while different inputs map onto different states; (2) slightly different states should map onto identical targets; (3) only the recent past should influence the state, and the network is essentially unaware of the remote past; (4) a phase-locked loop should decode information that is already encoded in time; and (5) the combination of properties 1 to 4 should optimize sensory processing based on the context.
The first three attributes of such a system have close relevance to the constraints and computational properties of higher cortical areas (prefrontal). The same three are also the main features of reservoir computing (RC), namely, the separation property, the approximation property, and fading memory (Jaeger, 2001, 2007; Maass, Natschläger, & Markram, 2002, 2003). Interestingly, an RC system can "nonlinearly" map a lower-dimensional system to a high-dimensional space, facilitating classification of the elements of the low-dimensional space. The last two properties match the structure and computational constraints of a nonrelay thalamic system as a contextual modulator that is phasically changing the input to the RC system. In fact, in an RC model of the prefrontal cortex, adding a phase neuron significantly improved the network's performance in complex cognitive tasks. The phase neuron improves performance by generating input-driven attractor dynamics that best match the input (Enel, Procyk, Quilodran, & Dominey, 2016). This advantageous phase-based bias effect is not limited to simulations of physiological RC-like neural circuitry. In a recent study, electronic implementation and numerical studies of a minimal RC system, a single nonlinear node with delayed feedback, have shown efficient information processing (Appeltant et al., 2011). Such a reservoir's transient dynamical response follows that of delay-dynamical systems, and only a limited set of parameters is required to set the rich dynamical properties of delay systems (Ikeda & Matsumoto, 1987). This system was able to process time-dependent signals effectively.
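Two of these properties, fading memory and sensitivity to recent input, can be demonstrated with a minimal echo state reservoir. The sketch below is illustrative (reservoir size, input weights, and perturbation magnitude are arbitrary assumptions; the recurrent matrix is scaled so its norm stays below 1, giving the echo state property): a perturbation in the remote past leaves no trace in the final state, while the same perturbation in the recent past does.

```python
import numpy as np

rng = np.random.default_rng(2)
n_res, steps = 100, 300
# Reservoir weights scaled so the map is contracting (echo state property).
W = rng.normal(scale=0.4 / np.sqrt(n_res), size=(n_res, n_res))
w_in = rng.normal(size=n_res)  # input weights

def final_state(u):
    """Drive the reservoir with the scalar sequence u; return its last state."""
    x = np.zeros(n_res)
    for u_t in u:
        x = np.tanh(W @ x + w_in * u_t)
    return x

u = rng.normal(size=steps)
u_far = u.copy(); u_far[0] += 5.0     # perturb the remote past
u_near = u.copy(); u_near[-1] += 5.0  # perturb the recent past

baseline = final_state(u)
far_effect = np.linalg.norm(final_state(u_far) - baseline)    # forgotten
near_effect = np.linalg.norm(final_state(u_near) - baseline)  # retained
```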
The phase neuron (Enel et al., 2016) and the delayed dynamical RC (Appeltant et al., 2011) both show properties that resemble the structure-function of the MD-like thalamus as discussed here. Specifically, the phasic recruitment of cortical neurons is invoked by the combination of cortical influence on nonrelay thalamic neurons through direct and indirect corticothalamic projections (see the reticular nucleus inhibitory influence on thalamic neurons, Figure 3). In a given cycle of computation (see Figure 5), cortical feedback leads to the release of GABA (via the reticular nucleus) in the MD (Kim et al., 2011). It has been shown that the increased GABA alters the opening of T-type Ca2+ channels (Crunelli & Leresche, 1991), which, in the case of the MD, results in enhanced MD-PFC interaction, yielding mutual drive of corticothalamic and thalamocortical activity (Kim et al., 2011). Through this calcium-based low-threshold spiking, gradual synchronization of the MD and PFC ensues (Jones, 2002; Kim et al., 2011). As a result, MD units and the PFC show strong phase-locked synchrony (Parnaudeau et al., 2013). This gradual phase-locking mechanism forms the basis of the temporal dynamics that can nonlinearly (through consecutive cycles of computation) change cortical activity in response to novel stimuli or unexpected contextual changes, as observed experimentally (Bolkan et al., 2017; Schmitt et al., 2017). In fact, abnormal activity of T-type Ca2+ channels in the MD leads to hypersynchrony in PFC neurons and frontal lobe-specific seizures (Kim et al., 2011). The interactions between the nonrelay thalamus and cortex are collectively neither feedforward nor locally recurrent; rather, they form a hybrid in which a nonrecurrent phase encoder keeps copies of past processing and modulates the relay of sensory input and its next processing step (see Figure 5).
These distinctive short-term dynamics are well matched with the divergent structure-function relationships of sensory and nonrelay MD-like thalamic nuclei. They further emphasize that perceptual and cognitive processing cannot be solely cortico-centric operations.
6 Biological Constraints and the Role of the Thalamus in Computational Optimization
Computation and optimization are two sides of the same coin. But how does the brain optimize the computations that would match its required objective: cognitive processing? A current trend in thinking is that the brain optimizes some arbitrary functions; the hope is that the future discovery of these unknown functions may guide us to establish a link between the brain's operations and deep learning (Marblestone et al., 2016). This approach to the optimization (and computation) performed by the brain has a few flaws. First, it avoids specifying what function the brain is supposed to optimize (and as a result, it remains vague). Second, it refrains from addressing certain limitations that the brain has to cope with due to biological constraints. The first of these limitations is the importance of using just enough resources to solve the current perceptual problem. The second is the need to come up with a solution just in (the needed) time. The importance of "just-enough" and "just-in-time" computation in cortical computation should not be overlooked (Douglas & Martin, 2007). If the first condition is not met, the organism cannot sustain continued activity, since the metabolic demand surpasses the dedicated energetic expenditure and the animal cannot survive. In fact, communications in neural networks are highly constrained by a number of factors, notably the energetic demands of network operations (Laughlin & Sejnowski, 2003). From estimates of the cost of cortical computation (Lennie, 2003), we know that the high cost of spiking forces the brain to rely on sparse communication, using only a small fraction of the available neurons (Baddeley et al., 1997; Shoham, O'Connor, & Segev, 2006). While, theoretically, the cortex could dedicate a large number of neurons (and a very high-dimensional dynamical space) to solve any cognitive task, the metabolic demand of such high-energetic neural activity renders such a mechanism highly inefficient.
As a result, the law of diminishing returns dictates that the increased energetic cost of pooling an excessive number of active neurons into an assembly would be penalized (Niven & Laughlin, 2008). The penalty for unnecessarily high-energetic neural activity should itself be driven by the nature of the computation rather than being formulated as a fixed arbitrary threshold imposed by an external observer. On the other hand, a system could resort to low-cost computation at any given time but dedicate a long enough time to solve the task at hand. Naturally, such a system would not be very relevant to biological systems, since time is of the essence. If an animal dedicates its computational capacity to a problem for too long, the environment will have changed before it reaches a solution, rendering the solution obsolete. A deer gains no advantage from having its brain fully analyze the visual scene instead of spotting the approaching wolf and shifting resources to the most needed task: escape. As a result, many of the optimization techniques and concepts that may be relevant to artificial neural networks are irrelevant to the embodied computational cognition of the brain. The optimization that the brain requires does not aim for the best possible performance; rather, it needs to reach a good mixture of economy and efficiency.
Not surprisingly, these constraints, efficiency and economy, are cornerstones of homeostasis and are observed across many scales in living systems (Szekely, Sheftel, Mayo, & Alon, 2013). Simple "integral feedback" acts as the mainstay of control feedback in such homeostatic systems (such as the E. coli heat shock response or DNA repair after exposure to gamma radiation; Dekel & Alon, 2005; El-Samad, Goff, & Khammash, 2002; El-Samad, Kurata, Doyle, Gross, & Khammash, 2005; Krishna, Maslov, & Sneppen, 2007). A change in the input leads to a change in the output and a proportional change in the controller, which aims to reset the output to the desired regime. When the integral feedback is disrupted, the system can no longer reach proper homeostasis, and either efficiency or economy (or both) will be suboptimal (El-Samad et al., 2002; El-Samad, Kurata, Doyle, Gross, & Khammash, 2005; Szekely et al., 2013).
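A minimal simulation makes the integral-feedback logic concrete. In this sketch (a generic first-order process with arbitrary gains, not a model of any specific biological pathway), a step disturbance transiently deflects the output, and the integral term accumulates the error until the output is restored to the setpoint with zero steady-state error:

```python
# Integral feedback on a first-order plant: dy/dt = -y + u + d,
# with the control u integrating the error (setpoint - y).
setpoint, k_i, dt = 1.0, 0.5, 0.01
y, u = 0.0, 0.0
trace = []
for step in range(20000):
    d = 0.5 if step > 10000 else 0.0   # step disturbance at t = 100
    error = setpoint - y
    u += k_i * error * dt              # integral action accumulates error
    y += dt * (-y + u + d)             # Euler step of the plant
    trace.append(y)
# The output settles at the setpoint both before and after the disturbance.
```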
Many different etiologies could be behind the disruption of integral feedback, but the outcome is loss of robust response in uncertain environments. The presence of feedforward and feedback loops provides the means for robust and fast operation in processing fluctuating incoming inputs. This feedback regulation and operational robustness have an energetic and computational cost for the system. Although for simple systems it is feasible to associate the exact cost of an operation with the overall computational cost of the system, scaling the metabolic cost of feedback regulation to large networks remains a challenge, since it involves multiple feedback loops, nonlinear dynamics, and numerous uncertain parameters (Csete & Doyle, 2002). Specifically, in the case of a single neuron, the branching architecture, nonuniform ion channel distributions, and conduction states of action potentials affect the rate of energy consumption (Ju, Hines, & Yu, 2016). However, this electrochemical energy of single-neuron operation does not scale linearly to the energy spent at the network level (Wang & Wang, 2014; Wang, Zhang, & Chen, 2009). The total energy function of neural populations depends not only on the energy function of single neurons and their coupling in a given neural population, but also on the flow of information between different populations (Wang & Wang, 2014; Wang & Zhu, 2016). When a large pool of neurons is recruited to form multiple assemblies to perform a certain computation, it is the interactions between the assemblies that define the collective behavior of the discrete components. Since coupled processes show additive entropy production (Demirel, 2011), the total energetic optimality of the desired function depends on the feedback loops between the assemblies and how these loops control the intrinsic energy expenditure of a given assembly.
These attributes are in line with the general principle of modular composition of biological systems (Dehghani, 2017, 2018; Hartwell, Hopfield, Leibler, & Murray, 1999). From the dynamical systems' perspective, to understand the operational principles (here, of large assembly of neurons), we do not need to strip down the assembly to its individual component level (here, individual neurons; Dehghani, 2018). As a result, the optimal control is at the functional scale of modules where the interaction between the system's modules takes place (Csete & Doyle, 2002; Dehghani, 2018).
The constraints that we have discussed translate directly to the computational operations of the thalamocortical system. Instead of trying to deal with just one fitness function at a time (where the minima of the landscape would be deemed "the" optima), the brain has to perform a multiobjective optimization, finding solutions to both metabolic cost (economy) and just-in-time (efficiency) computation. Thus, we can infer that a unique solution does not exist for such a problem. Rather, any optimization for computational efficiency will cost us economy, and any optimization for economy will cost us efficiency. In this case, a Pareto frontier for the multiobjective optimization is desirable. A Pareto frontier of information and cost is the set of solutions for which no other point in the space is better for one objective without being worse for the other (Godfrey, Shipley, & Gryz, 2006; Kung, Luccio, & Preparata, 1975; Szekely et al., 2013). As a result, the optimization mechanism should push the system to this frontier. The iterative dynamical interaction between the thalamus and cortex seems to provide an elegant solution for this problem. (We discuss this in more detail below.)
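The notion of a Pareto frontier is easy to make concrete in code. The sketch below uses arbitrary random (information, cost) operating points, not data from any experiment: a point is retained if no other point offers at least as much information at no greater cost, with strict improvement in at least one objective.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical operating points: column 0 = information gain (maximize),
# column 1 = computational cost (minimize).
points = rng.uniform(size=(200, 2))

def pareto_front(pts):
    """Return the non-dominated subset of (information, cost) points."""
    info, cost = pts[:, 0], pts[:, 1]
    front = []
    for i in range(len(pts)):
        dominated = np.any((info >= info[i]) & (cost <= cost[i]) &
                           ((info > info[i]) | (cost < cost[i])))
        if not dominated:
            front.append(pts[i])
    return np.array(front)

front = pareto_front(points)  # the economy-efficiency trade-off set
```

Pushing the system "to the frontier" then means moving from any dominated interior point to some member of this non-dominated set.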
In addition to these theoretical rationales, we also wish to point to some observations that support emergent optimization in the thalamocortical system. For example, metabolic studies have shown that following thalamic injuries, an imbalance in cortical metabolism ensues (Baron et al., 1986, 1992; Larson et al., 1998; Levasseur et al., 1992). Moreover, in healthy humans (and not patients with mood disorder), the metabolic rate of the thalamus directly relates to the power of cortical oscillations (Lindgren et al., 1999). The imbalance in cortical metabolism has been observed with damage to a variety of nuclei, but it is especially pronounced after mediodorsal, center median, or pulvinar injuries (Baron et al., 1986, 1992). In addition, low-frequency and high-frequency stimulation of the MD can induce long-term depression or potentiation in mPFC (medial PFC); however, the exact sign and magnitude of the differential modulation of thalamo-prefrontal functions under low- and high-input drive depend on the absence or presence of muscarinic and nicotinic modulation (Bueno-Junior, Lopes-Aguiar, Ruggiero, Romcy-Pereira, & Leite, 2012). Just as thalamic injury or modulation can change cortical activity and metabolism, cortical injuries (due to stroke, for example) can cause an attenuation of the excitatory feedback to the thalamus and lead to thalamocortical dysrhythmia (van Wijngaarden, Zucca, Finnigan, & Verschure, 2016). Regardless of where the initial injury has occurred, the disrupted thalamocortical interaction is conjoined with an imbalance in metabolism. The resultant out-of-balance activity leads to cognitive disorders, whether in the form of disrupted information processing due to cortical hypersynchrony as a result of excessive thalamic spiking (Kim et al., 2011) or of faulty modulation of sensory signals and loss of the normal correlation between glucose metabolism in the thalamus and PFC (Byne et al., 2001; Katz et al., 1996).
The exact cellular or subcellular mechanism that lies beneath the joint fluctuations of firing and metabolism in the cortex and thalamus is not well understood, and a number of mechanisms may act (not necessarily exclusively). For example, reduced cortical feedback may lead to thalamic hyperpolarization, and the resultant deinactivation of voltage-gated T-type Ca2+ channels may cause the neurons to switch from tonic spiking to pathological bursting (van Wijngaarden et al., 2016). Or it could be that the thalamic drive of inhibitory neurons in the cortex not only directly affects the cortical mode of firing (Fan, Duan, Wang, & Luan, 2017) but also changes glycogenolysis in astrocytes through vasoactive intestinal peptide (VIP) interneurons (Magistretti, 2006; Magistretti & Allaman, 2015). Interestingly, and in contrast to the noradrenergic afferent fibers that span horizontally across cortical domains, VIP neurons have a bipolar architecture, and therefore their effect is spatially limited (Magistretti & Allaman, 2015), likely correlated with the size of the functional assemblies that are recruited to perform a computational task. Whatever the exact mechanism at the cellular level, the collective activity of the modulatory thalamus and cortex drives an optimization that inherently cannot be controlled by the information available at the scale of single neurons or solely in the cortex.
To formalize the multiobjective optimization, consider a set of functions I (information) and C (cost) of r_T (the firing rate of thalamic cells) and r_C (the firing rate of cortical cells). Uncertainty (or its opposite, information) and computational cost (a mixture of time and metabolic expense) can both be mapped to this functional space of r_T and r_C (see Figures 6A1 and 6A2). We define computational cost as a product and information as the logarithm of a linear sum of cortical and thalamic activity (C = b·r_T·r_C and I = log(a_T·r_T + a_C·r_C), with a_T, a_C, and b as coefficients) to reflect the logarithmic nature of information (entropy) and the fact that biological cost is an accelerating function of the cost-inducing variables (Dekel & Alon, 2005). The hypothetical space of cost and information is depicted in Figure 6, where the top panels show indifference maps of information in panel A1 and cost in panel A2. Example simulations and parametric plots of the cost and information functions defined as above are shown in Figure 7. In each indifference map, along each iso-quant curve, the total functional attribute is the same. For example, anywhere on a given curve, the uncertainty (or information) in our computational engine is the same. However, different iso-quant curves represent different levels of the functional attribute. For example, moving outward increases information (reduces uncertainty) from one iso-quant to the next, and thus, if computational cost were not a constraint, the optimal solution would lie on the outermost information iso-quant or beyond (see Figure 6A1). In contrast, moving inward would lower the cost if the computational engine did not have the objective of reducing uncertainty (see Figure 6A2). Since information and cost are interdependent and both depend on the interaction between the thalamus and cortex, we suggest that information and cost optimization happens through an iterative interaction between the thalamus and cortex (note the blackboard analogy and contextual modulation discussed above).
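The iso-quant maps can be reproduced numerically from the product-and-logarithmic forms described above (cost as a product of the two rates, information as the log of a weighted sum). The coefficient values below (a_T = a_C = 1, b = 0.1) are arbitrary illustrative choices:

```python
import numpy as np

a_T, a_C, b = 1.0, 1.0, 0.1  # illustrative coefficients
r_T, r_C = np.meshgrid(np.linspace(0.1, 10, 100),
                       np.linspace(0.1, 10, 100))

I = np.log(a_T * r_T + a_C * r_C)  # information: log of a linear sum
C = b * r_T * r_C                  # cost: accelerating product

# Contours (iso-quants) of I and C over this grid give the indifference
# maps: information grows moving outward, cost shrinks moving inward.
```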
Since we defined both information and cost as sets of iso-quant curves in the functional space of r_T and r_C, they can be corepresented in the same space (see Figure 6B). Optimal solutions for the joint optimization of information and cost are simply the points where the tangents of the iso-quant curves are equal (see the tangents, black dashed lines, and points A, B, C, and D in Figure 6B). These points create a set of optimal solutions for the trade-off between information and cost (green curve). Mapping the optimal solutions to the computational efficiency space gives us the Pareto efficient curve (see the cyan curve in Figure 6C). Anywhere inside the curve is not Pareto efficient (i.e., information gain and computational cost can change in such a way that, collectively, the system can be in a better state on the Pareto curve). Points outside the Pareto efficient curve are not available to the current state of the system given the current coefficients of the information and cost functions. A change in these coefficients can potentially shape a different corepresentation of information and cost (see Figure 7, top row, for three instances based on different coefficient values), and thus a different Pareto efficient curve (see Figure 7, bottom row). These different possible Pareto frontiers can be set based on the prior state of the system and the complexity of the computational problem at hand. For example, the modulatory MD-like thalamic triggering of feedforward inhibition of layer I interneurons and layer II/III pyramidal cells (Cruikshank et al., 2012) may tune cortical activity to a sustained profile of arousal during wakefulness (Harris & Thiele, 2011). Or, in contrast to the relay thalamus, where maximum responsiveness to transient signals (such as sensory stimulus onset/offset) is needed (Bruno & Sakmann, 2006; Rose & Metherate, 2005), the MD-like modulatory drive may be invoked for tasks where working memory and contextual processing are needed (Delevich et al., 2015; Parnaudeau et al., 2013).
Nonetheless, the computational efficiency of the system cannot be pushed outward indefinitely because of the system's intrinsic biophysical constraints (neurons and their wiring). The shaded region in Figure 7, bottom row, shows this nonpermissible zone.
In the defined computational efficiency space, composed of the two variables information (I) and cost (C) as the objective functions (shown in the bottom panels of Figures 6 and 7), solving a computational problem is represented by a decrease in uncertainty. However, any change in uncertainty has an associated cost. The first derivative of the Pareto frontier gives the marginal rate of substitution, dI/dC. This ratio varies among different points on the Pareto efficient curve. If we take two points on the Pareto curve in the computational efficiency space, such as A and C, their computational efficiency E is equal: E(A) = E(C). The changes in the efficiency of point A with respect to information and cost are the partial derivatives ∂E/∂I and ∂E/∂C, respectively. As a result, dE = (∂E/∂I)dI + (∂E/∂C)dC = 0 along the frontier, meaning that there is constant efficiency along the Pareto curve even though the trade-off between information and cost is not constant. The optimization in this space is not based on some fixed, built-in algorithm or arbitrary thresholds set by an external observer. Rather, information and cost optimization is the result of back-and-forth interaction between the thalamus and cortex. Based on the computational perspective that we have portrayed, the thalamus seems to be poised to operate as an optimizer. The thalamus receives a copy of (sensory) input while relaying it and receives an efferent copy from the processor (cortex) while trying to efficiently bind the information from the past and present and send it back to the cortex. The outcome of such emergent optimization is a Pareto front in the economy-efficiency landscape (see Figures 6 and 7). If the cortex were the sole conductor of cognitive processing, the dynamics of the relay and cortical processing would meander in the parameter space and not yield any optimization that could provide a feasible solution to economic and just-in-time computation.
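The tangency condition behind the optimal trade-off set can be checked numerically. Using the same illustrative forms (information as the log of a weighted sum of the rates, cost as their product, with arbitrary coefficients), equality of the iso-quant tangents is equivalent to the gradients of I and C being parallel, that is, to their 2D cross product vanishing:

```python
import numpy as np

a_T, a_C, b = 1.0, 1.0, 0.1  # illustrative coefficients

def gradient_cross(r_t, r_c):
    """Cross product of grad I and grad C at (r_t, r_c); zero at tangency."""
    s = a_T * r_t + a_C * r_c
    grad_I = np.array([a_T / s, a_C / s])  # gradient of log(a_T r_T + a_C r_C)
    grad_C = np.array([b * r_c, b * r_t])  # gradient of b r_T r_C
    return grad_I[0] * grad_C[1] - grad_I[1] * grad_C[0]

# With a_T == a_C, the tangency locus is the diagonal r_T == r_C:
on_locus = gradient_cross(2.0, 2.0)   # zero: iso-quant tangents coincide
off_locus = gradient_cross(2.0, 4.0)  # nonzero: tangents differ
```

The locus of such tangency points, mapped into the (I, C) plane, traces the Pareto efficient curve of Figure 6C.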
Such a system is doomed to fail due to either metabolic costs or computational freeze over time; thus, it would be a more or less useless cognitive engine. In contrast, with the help of an optimizer that acts as a contextual modulator, the acceptable parameters will be confined to a manifold within the parameter space. Such a regime would be a sustainable and favorable domain for cognitive computing. This property shows another important facet of a thalamocortical computational cognitive system and the need to move past the cortico-centric view of cognition. An important consequence of this formalization is that it provides testable hypotheses for objectively evaluating information and cost optimization. By careful simultaneous measurements of thalamic and cortical collective activity, during different states and under different neurotransmitter modulatory effects, one should be able to examine the distinctive interaction of the cortex with MD-like versus relay thalamic nuclei. We wish to emphasize that while information processing is a fundamentally energy-consuming process (Bennett, 2003; Parrondo, Horowitz, & Sagawa, 2015) and one can derive theoretical estimates of the energetic cost of the activity of a population of neurons, the exact translation of bits to watts in adaptive information-processing systems (such as the thalamocortical system) can be verified only experimentally (Flack, 2017). Without proper and careful measurements, it is impossible to predict how much more reliable the collective computation could become at the expense of energy (Ay, Flack, & Krakauer, 2007). Also, the degree to which energy is traded for accuracy or speed (or their combination) will be a hard challenge for experimentalists measuring the collective activity (Lan, Sartori, Neumann, Sourjik, & Tu, 2012).
7 Conclusion: Reframing Thalamic Function above and Beyond Information Relay
New evidence about the possible role of the thalamus has started to challenge the cortico-centric view of perception and cognition. Anatomical studies and physiological measurements have begun to unravel the importance of the cortico-thalamo-cortical loops in cognitive processes (Basso et al., 2005; Parnaudeau et al., 2018). Under this emerging paradigm, the thalamus plays two distinctive roles: information relay and modulation of cortical function (Sherman & Guillery, 2013), where the neocortex does not work in isolation but is largely dependent on the thalamus. In contrast to cortical networks that operate as specialized memory devices via their local recurrent excitatory connections, the thalamus is devoid of local connections and is instead optimized for capturing state information that is distributed across multiple cortical nodes while animals are engaged in context-dependent task switching (Schmitt et al., 2017). This allows the thalamus to explicitly represent task context (corresponding to different combinations of cortical states), and through its unique projection patterns to the cortex, different thalamic inputs modify the effective connections between cortical neurons (Bolkan et al., 2017; Schmitt et al., 2017).
We started with a brief overview of the architecture of the thalamus and the back-and-forth communication between the thalamus and cortex; then we provided electrophysiological evidence of thalamic modulatory function and concluded with a computational frame that encapsulates the architectural and functional attributes of the thalamic role in cognition. In such a frame, the computational efficiency of the cognitive computing machinery is achieved through iterative interactions between the thalamus and cortex embedded in the hierarchical organization (see Figures 4 and 5). Under this emergent view, the thalamus serves not only as a relay but also as a read and write medium for cortical processing, playing a crucial role in the contextual modulation of cognition (see Figure 8). Such multiscale organization of computational processes is a necessary requirement for the design of intelligent systems (Dehghani, 2017; Simon, 1962, 1969). Distributed computing in biological systems in most cases operates without central control (Navlakha & Bar-Joseph, 2014). This is well reflected in the computational perspective that we have discussed. We suggest that through the continuous contextual modulation of cortical activity, the thalamus (along with the cortex) plays a significant role in the emergent optimization of computational efficiency and computational cost. This phenomenon has a deep relation to phase transitions in complex networks. Different states (phases) of the network are associated with the connectivity of the computing elements (see the thalamic weight and pointer and cortical function and weight modules in Figure 5). Interestingly, the intrinsic properties of complex networks do not define the phase transitions in the system. Rather, the interplay of the system with its external environment shapes the landscape where phase transitions occur (Seoane & Solé, 2015).
This parallel between well-studied physical systems and the neuronal networks of the thalamocortical system shows the importance of the interplay between the thalamus and cortex in cognitive computation and optimization. The proposed frame for contextual cognitive computation and the emergent optimization of information and cost in the thalamocortical system can guide us in designing novel AI architectures.
We thank Michael Halassa for helpful discussions.