We present a computational model that highlights the role of basal ganglia (BG) in generating simple reaching movements. The model is cast within the reinforcement learning (RL) framework with correspondence between RL components and neuroanatomy as follows: dopamine signal of substantia nigra pars compacta as the temporal difference error, striatum as the substrate for the critic, and the motor cortex as the actor. A key feature of this neurobiological interpretation is our hypothesis that the indirect pathway is the explorer. Chaotic activity, originating from the indirect pathway part of the model, drives the wandering, exploratory movements of the arm. Thus, the direct pathway subserves exploitation, while the indirect pathway subserves exploration. The motor cortex becomes more and more independent of the corrective influence of BG as training progresses. Reaching trajectories show diminishing variability with training. Reaching movements associated with Parkinson's disease (PD) are simulated by reducing dopamine and degrading the complexity of indirect pathway dynamics by switching it from chaotic to periodic behavior. Under the simulated PD conditions, the arm exhibits PD motor symptoms like tremor, bradykinesia and undershooting. The model echoes the notion that PD is a dynamical disease.
Reaching movements are to motor function what the simple pendulum is to classical mechanics. Although reaching movements are straightforward to understand, they are interesting to researchers of motor function since they are supported by a full array of motor areas in the brain. From a clinical point of view, reaching movements have diagnostic value since various motor disorders, like Parkinson's disease (PD), manifest characteristic changes in reaching. PD patients are known to exhibit slower reaction times and movement times in simple movements aimed at targets (Brown & Jahanshahi, 1996). The first agonist burst in PD reaching movements is also observed to be weaker than in normals, resulting in longer and multistaged reaching movements requiring multiple agonist bursts. It has been suggested that part of the reason behind the longer movement times (MTs) in PD movements is that patients adopt a closed-loop strategy to execute movements that normal subjects execute in fast, open-loop fashion (Flowers, 1976). Though PD patients are capable of making fast ballistic movements, such performance results in impaired accuracy (Sheridan & Flowers, 1990). Another aspect of PD movement is the greater variability in movement end point for larger movements. Thus, bradykinesia, which refers to the relative slowness of Parkinsonian movement and the closed-loop mode of operation, seems to be the strategy that PD patients adopt to compensate for their inability to make consistent, large-amplitude movements.
Understanding PD reaching movements requires understanding the causal relationship between PD-related dopamine deficiency in the basal ganglia (BG) and arm movements. Such understanding is best formulated in terms of a computational model. Bischoff (1998) presents a model of Parkinsonian arm control involving a simple reaching task and a reciprocal aiming task. Under PD conditions, the model exhibits bradykinesia and impaired ability to make sequential movements. Cutsuridis and Perantonis (2006) have modeled PD bradykinesia with an extensive model that includes, in addition to dopamine projections to BG and cortex, dopamine projections to the spinal cord. The model successfully reproduces aspects of PD bradykinesia in terms of electromyographic (EMG) and movement parameters. Weaknesses of this model include the absence of explicit representations of BG nuclei and the absence of a mechanism for learning reaching movements.
A perspective of BG function that has been gaining strength for over a decade is the idea that BG forms a neural substrate for reinforcement learning (RL), a branch of machine learning inspired by instrumental conditioning (Joel, Niv, & Ruppin, 2002). Although a good number of BG models are not RL based, most of them address only specific aspects of multitudinous functions of BG. Efforts are underway to explain the rich variety of BG functions solely within the RL framework (Chakravarthy, Joseph, & Bapi, in press).
A key feature of the proposed BG model is its interpretation of the role of the BG indirect pathway, which in the past has been given a varied and tentative interpretation, including withholding of action (Albin, Reiner, Anderson, Penney, & Young, 1990; Frank, 2005), focusing and sequencing (Hikosaka, Takikawa, & Kawagoe, 2000), action selection (Redgrave, Prescott, & Gurney, 1999), and switching (Isoda & Hikosaka, 2008). We have been developing a line of modeling that hypothesizes that the indirect pathway subserves exploratory behavior (Sridharan, Prashanth, & Chakravarthy, 2006). Thus, the direct pathway and indirect pathway play complementary roles, whereby the direct pathway subserves exploitation while the indirect pathway supports exploration. The presence of complex dynamics in the indirect pathway justifies its putative role in exploration, and degradation of such complex activity to more regular forms of activity like synchronized bursts is hypothesized to contribute to impaired movement. Experimental evidence that is consistent with such a hypothesis is reviewed in section 5. As a departure from the traditional description of BG functional anatomy according to which the direct pathway and indirect pathway support the go and no-go regimes, respectively, we propose a third regime: the explore regime, which comes between the go and no-go regimes. This explore regime is also supported by the indirect pathway.
In this letter, we describe a model of BG that essentially belongs to the RL class of BG models. In this model, the dopamine (DA) signal is related to incremental changes in error between the target position and the position of the end effector of the arm. The DA level switches the transmission between the direct pathway and indirect pathway in BG (Clark, Boutros, & Mendez, 2005). BG output is used as a corrective signal to the motor cortex (MC). This combined output of MC and BG is used to control the arm. As learning in MC progresses, the motor cortex gradually becomes independent of the modulatory influence of BG. Parkinsonian pathology is also captured naturally by the model through reduction of the dopamine level (the temporal difference (TD) error) and by degrading the complex dynamics of the indirect pathway.
The letter is organized as follows. Section 2 presents a brief background of BG structure and function. Section 3 describes the model architecture, including training dynamics and measures of performance evaluation. Numerical simulations with the model training and testing under normal and Parkinsonian conditions are described in section 4. A discussion of the work is given in the final section.
2.1. Basal Ganglia Circuitry.
The BG comprises a group of subcortical nuclei that form a highly interconnected network of modules. The caudate nucleus and the putamen, together referred to as the striatum (STR), are the major input nuclei. The striatum receives inputs from a number of cortical areas and the thalamus. Another input port of BG, which is not, however, typically considered so, is the subthalamic nucleus (STN), which, like the striatum, also receives inputs from the cortex. An internal module of BG, which is not directly connected to cortex or thalamus, is the globus pallidus externa (GPe), which is thought to play a central role in BG according to more recent perspectives on BG circuitry (Obeso, Rodriguez-Oroz, Blesa, & Guridi, 2006; Nambu, 2008). The BG has two output nuclei: the globus pallidus interna (GPi) and substantia nigra pars reticulata (SNr). The output nuclei mainly target three nuclei: the thalamus, pedunculopontine nucleus, and superior colliculus. The activity of basal ganglia is modulated through constant feeds of the neurotransmitter dopamine (DA) from substantia nigra pars compacta (SNc) via the nigrostriatal pathway. The degeneration of neurons in SNc, whose axons form this pathway, is known to cause idiopathic Parkinson's disease. Traditionally, signal propagation through the BG is thought to occur via two alternative pathways: the direct pathway, which includes STR → GPi/SNr, and the indirect pathway, which consists of STR → GPe → STN → GPi. Dopaminergic transmission from SNc has a differential effect on striatal neurons according to striatum dopamine level: at smaller dopamine levels, the indirect pathway is selected, and an increase in striatal dopamine shifts the balance toward the direct pathway, thereby increasing overall motor activity. Thus, the indirect pathway is the normally active pathway. The balance is switched just before movement onset, when dopamine release to striatum activates the direct pathway (Clark et al., 2005).
Although for a long time, BG were thought to support motor functions exclusively, it is now recognized that BG also have a role in cognitive, affective, and autonomous functions. BG circuitry is involved in a great range of functions, including (1) reward-based learning (Schultz, 1998), (2) exploratory and navigational behavior (Packard & Knowlton, 2002), (3) goal-oriented behavior (Cohen, Braver, & Brown, 2002), (4) motor preparation (Alexander, 1987), (5) working memory (Cohen et al., 2002), (6) timing (Buhusi & Meck, 2005), (7) action gating, (8) action selection (Redgrave et al., 1999), (9) fatigue (Chaudhuri & Behan, 2000) and (10) apathy (Levy & Dubois, 2005). In spite of the significant progress in our knowledge of BG at several levels, it is still not clear how such an overwhelming range of functions is supported by the same subcortical circuit.
2.2. Reinforcement Learning—Dopamine.
A key idea that opens doors to understanding BG function is the idea that the activity of dopaminergic cells in BG represents reward signaling (Schultz, 1998). More precisely, dopamine neurons are activated by rewarding events that are better than predicted, remain unaffected by events that are as good as predicted, and are depressed by events that are worse than predicted. Thus, the dopamine signal seems to represent the error between predicted future reward and actual reward (Montague, Dayan, & Sejnowski, 1996).
Interestingly, a quantity known as temporal difference error, analogous to the error between predicted and actual future rewards, plays a key role in reinforcement learning (RL), a branch of machine learning. This conceptual association enabled the application of RL concepts (Sutton & Barto, 1998) to BG research (Joel et al., 2002). RL studies how an agent learns to respond to stimuli optimally without an explicit teacher; the agent's learning process is driven by reward or punishment signals that come from the environment in response to the agent's actions. Responses that result in rewards are reinforced, and those that lead to punishment are avoided. Actor, critic, and explorer are key components in a typical RL framework. The critic is a module that estimates the reward-giving potential, or value, of the current state. The actor uses the gradient of the value to choose actions that increase it. In the model examined here, when the gradient is absent or too weak, the choice of actions becomes increasingly stochastic. This stochasticity in choice of actions is identified with the explorer.
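The three dopamine responses described above (burst, no change, dip) can be illustrated with the textbook temporal difference error of Sutton and Barto (1998). The following is a minimal sketch, not the equation used in this model, in which the DA signal is instead tied to incremental changes in reaching error; the function name `td_error` and the discount factor `gamma` are illustrative assumptions.

```python
def td_error(reward, value_now, value_next, gamma=0.9):
    """Textbook TD error: delta = r + gamma * V(s') - V(s).

    gamma is an assumed discount factor, not a parameter of the model.
    """
    return reward + gamma * value_next - value_now


# Outcome better than predicted: positive error (dopamine burst).
better = td_error(reward=1.0, value_now=0.5, value_next=0.0)

# Outcome exactly as predicted: zero error (firing unchanged).
as_predicted = td_error(reward=0.5, value_now=0.5, value_next=0.0)

# Outcome worse than predicted: negative error (dopamine dip).
worse = td_error(reward=0.0, value_now=0.5, value_next=0.0)

print(better, as_predicted, worse)
```

The sign of the error, rather than its exact magnitude, is what maps onto the three cases reported for dopamine neurons.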
3. The Complete Model: Arm, Basal Ganglia, and Motor Cortex
Figure 1 depicts the architecture of the arm control system including the BG circuit, the motor cortex (MC), and the two-link arm model (AM). The motor task on which the system is trained consists of commanding the end effector of the arm from the initial central position to one of the four surrounding targets. Information corresponding to the ith target is coded in the target selection vector such that its ith component is set to 1, while all the other components equal 0. The target selection vector is presented to both MC and BG (see Figure 1). Outputs of MC and BG are combined to produce g, which represents the activations given to the four muscles of the two-link arm. The output of the BG may be regarded as a correction to the output of MC in controlling the arm. The basis of this correction is the error information associated with the relative position of the arm with respect to the target; this error is coded as the dopamine signal available to the BG. Thus, in the model, the role of BG is twofold: (1) to provide real-time corrective information to MC based on error information conveyed by the nigrostriatal dopamine signal and (2) to use this corrective signal to train the cortex on the motor task at hand.
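The target coding and the combination of MC and BG outputs described above might be sketched as follows. The one-hot coding follows the text directly; the weighted sum in `combine`, and the names `w_mc` and `w_bg`, are assumptions made for illustration, since the exact combination rule is not given in this section.

```python
import numpy as np


def target_selection_vector(i, n_targets=4):
    """One-hot code for target i: component i is 1, all others are 0."""
    vec = np.zeros(n_targets)
    vec[i] = 1.0
    return vec


def combine(mc_out, bg_out, w_mc, w_bg):
    """Hypothetical weighted sum of MC and BG outputs.

    Returns the four muscle activations g; the weights w_mc and w_bg
    are illustrative names, not symbols taken from the model.
    """
    return w_mc * np.asarray(mc_out, dtype=float) + w_bg * np.asarray(bg_out, dtype=float)


# Target index 2 (zero-based) selected out of four targets.
xi = target_selection_vector(2)

# With w_mc = 1 and w_bg = 0 the arm is driven by MC alone.
g_mc_only = combine([0.2, 0.1, 0.4, 0.3], [0.9, 0.9, 0.9, 0.9], w_mc=1.0, w_bg=0.0)
print(xi, g_mc_only)
```

Under this reading, shifting weight from `w_bg` to `w_mc` over training is one way to express MC's growing independence from BG.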
3.1. Inputs and Outputs of BG Model
3.1.1. Motor Cortex.
3.1.2. Basal Ganglia Model.
The BG part of the model has four key components: the critic, which is implemented in the striatum; the direct pathway; the indirect pathway; and the TD error, which represents the dopamine signal arising out of SNc.
3.1.4. Dopamine Signal.
3.2. Direct and Indirect Pathways.
3.3. Training MC.
3.4. Arm Model.
3.5. Modeling Parkinsonian Dynamics.
The map parameter K controls the transitions between fixed point (K < 3), periodic (3 < K < 3.57, approximately), and chaotic (K > 3.57) behaviors of the logistic map (May, 1976). Note, however, that even in the chaotic regime (K > 3.57), there are the so-called islands of stability: small ranges of K where the map exhibits periodic behavior. These islands are not likely to be detected in our simulations since we have not scanned the space of K at sufficiently high resolution. In the simulations in the following section, we designate K = 4 to correspond to the normal condition and use smaller values of K down to 3 to simulate PD-related degeneration.
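The dependence of logistic map behavior on K can be checked numerically. The sketch below iterates the standard logistic map x ← K x (1 − x), discards a transient, and counts the distinct values the orbit visits; the helper names, the initial condition, and the rounding precision are illustrative choices, and the scaling factor B used in the model is not included.

```python
def logistic_orbit(K, x0=0.3, n_transient=1000, n_keep=64):
    """Iterate x <- K * x * (1 - x) and return the orbit after a transient."""
    x = x0
    for _ in range(n_transient):
        x = K * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = K * x * (1.0 - x)
        orbit.append(x)
    return orbit


def distinct_values(orbit, decimals=6):
    """Count distinct orbit values after rounding: 1 for a fixed point,
    p for a period-p cycle, and many for chaotic motion."""
    return len({round(x, decimals) for x in orbit})


for K in (2.8, 3.2, 3.5, 4.0):
    print(K, distinct_values(logistic_orbit(K)))
```

A small count indicates regular (fixed point or periodic) dynamics, whereas at K = 4 the orbit does not settle onto a short cycle.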
4. Simulations: Normal and PD Reaching Movements
4.1. Normal Reaching Movements.
The model described in the previous section is used to reach the four targets shown in Figure 1. Training simulations are run for 20 epochs, where each epoch consists of reaching (or making time-limited attempts to reach) each of the four targets once. The weights of MC are randomly initialized between −0.5 and 0.5. Each reaching movement lasts for at most 100 time steps, or until the arm serendipitously comes sufficiently close to the target. At the end of 20 epochs, MC is almost completely trained. Even if training is continued beyond 20 epochs, MC error does not reach 0 but fluctuates around a small, positive value. The labile influences coming from BG to train MC play a dual, and mutually conflicting, role. On the one hand, this variability is necessary to explore the output space and discover rewarding increments to muscle activations. On the other hand, the same variability prevents the MC from learning further once a low error value is reached. It is for this reason that, as training progresses, we increase the weight of MC's contribution to movement and decrease the weight of BG's contribution as a function of training error. Numerical values of various parameters used in the model are listed in Table 1.
| Parameter | Value | Description |
|---|---|---|
| A | 2 | Amplitude of the reward and value functions |
| R | 3 | Spread of the value function |
| Rtol | 0.3 | Radius of the tolerance circle |
| a | 0.1 | Scaling factor for the DA thresholds in the BG function |
| | 0.2 | Learning rate of MC |
| | 0.03 | Standard deviation of the gaussian used in reward calculation |
| B | 0.04 | Scaling factor for logistic map function |
The parameters of Table 1 are chosen by experimentation keeping in view the various trade-offs involved. For example, a controls the DA thresholds in equation 3.6. Larger values of a increase the time spent in exploration. Similarly, B controls the amplitude of exploration. A larger value of Rtol increases the probability of a successful reach but worsens reaching error. Variation of these parameters within a small range around the currently used values did not exhibit any sudden unexpected changes in system behavior. However, a systematic sensitivity analysis of the above parameters could form part of a separate study.
The evolution of the reaching performance over the epochs is characterized by three metrics: the MC performance error, reaching duration, and path variability. Figure 2 shows the trajectories of the arm in the first epoch. Since the MC is untrained, the arm makes long, wandering movements to reach the target. Arm trajectories are nearly straight in the last (twentieth) epoch (see Figure 3). Figure 4 shows the reaching movements made by the arm under the sole influence of MC, without the BG contribution. This is done by setting the weight of MC's contribution to 1 and that of BG's contribution to 0. Note that the perturbative influence of BG is absent in this case. Although the mean reaching duration appears to decrease with learning, the trend does not seem to be significant when the error bars are considered (see Figure 5). Note that the error bars in all figures denote standard deviation. As the MC learns to reach, the initial movement, which is driven by the MC, arrives closer and closer to the target; thus the time-consuming wandering search for the target is reduced as learning progresses. For the same reason, path variability is also reduced as training progresses (see Figure 6). Naturally, since the goal of training is to train the MC to reach, MC reaching error decreases with epochs (see Figures 7 and 8).
The classical description of the function of direct pathway and indirect pathway associates direct pathway with movement facilitation and indirect pathway with movement inhibition. In the model here, we propose that the dynamics of the go and no-go regime are opposite to each other: the respective changes in BG output in the two regimes have opposite signs. However, it may be argued that a simpler way to implement the no-go regime is to let the BG output remain unaltered. We implemented this variation of the regime and found that the results were qualitatively the same (see appendix B). Therefore, we continue with the formulation of regimes as depicted in equations 3.5 and consider their consequences in Parkinsonian conditions in the next section.
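The selection among the go, explore, and no-go regimes can be sketched as a comparison of the DA signal against two thresholds. The text states only that the parameter a controls the DA thresholds and that the explore regime widens with actor error; the symmetric thresholds `a * actor_error` and `-a * actor_error` used below are an assumption made for illustration and are not equation 3.6 itself.

```python
def select_regime(delta, actor_error, a=0.1):
    """Classify a DA signal delta into 'go', 'explore', or 'no-go'.

    Assumed thresholds (hypothetical, for illustration only):
    upper = a * actor_error, lower = -a * actor_error.
    """
    da_hi = a * actor_error
    da_lo = -a * actor_error
    if delta > da_hi:
        return "go"        # direct pathway: exploit the current direction
    if delta < da_lo:
        return "no-go"     # indirect pathway: oppose the last change
    return "explore"       # indirect pathway: chaotic search


# A large actor error widens the explore band around zero.
print(select_regime(0.05, actor_error=1.0))   # inside the band
print(select_regime(0.05, actor_error=0.2))   # above the narrower band
```

With thresholds of this form, a larger a or a larger actor error both increase the time spent in exploration, which matches the qualitative behavior described for the model.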
4.2. PD Reaching Movements.
Simulations of PD-related pathology are based on three types of models. In the type A PD model, both dopamine reduction (−0.5 ≤ DA ≤ 0.5) and reduced complexity of indirect pathway dynamics (3 ≤ K ≤ 4) are incorporated. In the type B PD model, only dopamine reduction is implemented (−0.5 ≤ DA ≤ 0.5, and K = 4). In the type C PD model, only the reduced complexity of indirect pathway dynamics (3 ≤ K ≤ 4) is incorporated, with no reduction in dopamine (DA = 0.5).
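The three PD model types differ only in which of the two quantities, the DA ceiling and K, is degraded. A minimal sketch of that configuration follows; the class and function names are hypothetical, and the linear interpolation between the stated end points (DA from 0.5 to −0.5, K from 4 to 3) is an assumption, since only the ranges are given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDCondition:
    da_ceiling: float  # upper bound imposed on the DA signal
    K: float           # logistic map parameter of the indirect pathway


def pd_condition(model_type, pda):
    """Return the DA ceiling and K for a PD model type.

    pda is the fraction of DA cell loss in [0, 1]. Linear interpolation
    between the end points is assumed for illustration.
    """
    if model_type not in ("A", "B", "C"):
        raise ValueError("model_type must be 'A', 'B', or 'C'")
    if not 0.0 <= pda <= 1.0:
        raise ValueError("pda must lie in [0, 1]")
    reduce_da = model_type in ("A", "B")   # types A and B lower dopamine
    reduce_k = model_type in ("A", "C")    # types A and C lower K
    da_ceiling = 0.5 - pda if reduce_da else 0.5
    K = 4.0 - pda if reduce_k else 4.0
    return PDCondition(da_ceiling, K)


for model_type in ("A", "B", "C"):
    print(model_type, pd_condition(model_type, 0.5))
```

At zero cell loss all three types coincide with the normal condition, so any difference in simulated symptoms arises only as the loss grows.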
We define a few metrics to characterize reaching performance in PD conditions:
The undershoot factor, which quantifies the extent by which the final position of the arm undershoots the target
The tremor factor, which quantifies the tremor seen in arm movements
Average velocity to quantify bradykinesia
Since the loss of dopaminergic cells in SNc is the etiology of idiopathic PD, it would be natural to describe the degree of PD by the percentage loss of DA cells. In this simulation, the degree of PD pathology is expressed by the quantity Dceil, which clamps the DA signal. We now define a quantity, PDA, which represents the percentage of DA cell loss, and relate it to Dceil. Note that the DA signal can take both positive and negative values, typically varying between −0.5 and 0.5 in the simulations. When PDA = 0, Dceil can take its highest value of 0.5, and when PDA = 1, Dceil takes its lowest value of −0.5. Thus, Dceil falls from 0.5 to −0.5 as PDA increases from 0 to 1. For a given trial, there are 20 epochs for each 5% of DA loss, and each epoch lasts a maximum of 100 iterations; if the arm freezes for more than 10 time steps, the reach is terminated. There are 10 trials for each DA level. Trials represent repeated simulation for the same DA level. Such repetition is necessary to examine the level of variability in reaching.
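The relation between cell loss and the DA ceiling can be written out as a short sketch. The end points (0.5 at no loss, −0.5 at complete loss) come from the text; the linear form between them, the treatment of Dceil as an upper bound only, and the function names are assumptions made for illustration.

```python
def dceil_from_pda(pda):
    """Assumed linear map from DA cell loss to the DA ceiling.

    pda = 0 gives 0.5 and pda = 1 gives -0.5, as stated in the text;
    intermediate values are interpolated linearly (an assumption).
    """
    if not 0.0 <= pda <= 1.0:
        raise ValueError("pda must lie in [0, 1]")
    return 0.5 - pda


def clamp_da(delta, pda):
    """Limit the DA signal from above by the ceiling for this loss level."""
    return min(delta, dceil_from_pda(pda))


# At 50% cell loss the ceiling is 0, so positive DA signals are cut off
# while negative ones pass through unchanged.
print(clamp_da(0.4, 0.5), clamp_da(-0.3, 0.5))
```

Under this reading, a falling ceiling removes the largest positive DA values first, which is why the go regime is the first to be lost as cell loss increases.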
In the three types of PD simulations, we start with the MC fully trained under normal conditions (as in section 4.1) and continue to train it under the pathological conditions by varying PDA and K. Variation of various metrics like undershooting presents three possible scenarios of PD disease progression.
4.2.1. PD Model, Type A.
In this PD model, both dopamine reduction (−0.5 ≤ DA ≤ 0.5) and reduced complexity of indirect pathway dynamics (3 ≤ K ≤ 4) are incorporated. Both DA and K are tied to PDA: as PDA increases, DA falls from 0.5 toward −0.5 and K falls from 4 toward 3.
PD patients are known to often undershoot targets in reaching performance (van Gemmert, Adler, & Stelmach, 2003). This is clearly seen in Figure 9, where undershooting worsens with the increasing loss of DA cells (PDA). At around 50% loss of DA cells, undershooting reaches nearly its minimum and does not change significantly henceforth. Figure 10 shows a snapshot of reaching trajectories with undershooting. Note that apart from undershooting the target, there is also a large error in reaching direction.
Tremor also increases with increasing PDA up to about PDA = 50%; thereafter, it quickly drops to 0 at PDA = 60% and remains at 0 for larger values of PDA (see Figure 11). This development may be accounted for as follows. As PDA is increased, DA is also reduced, and the DA signal spends more and more time in the explore and no-go regimes. Such exaggerated exploration, occurring in place of a straight target pursuit corresponding to the go regime, seems to manifest as tremor. As PDA is increased further, the DA signal is always confined to the no-go regime, regardless of the actual performance of the arm. Thus, the arm enters a relatively frozen state with no tremor. Ramifications of this change can be seen in average velocity also. Average velocity decreases with increasing PDA, reaching a small value at about PDA = 50% and remaining there for larger values of PDA (see Figure 12). Although the disease pathology is confined to BG, the performance error of MC gradually increases with increasing PDA before saturating (see Figure 13). Thus, undershooting and average velocity seem to show a common pattern: a nearly gradual worsening up to about PDA = 50% and a subsequent relatively frozen condition marked by a paucity of movement. Tremor, however, gradually increases to a peak value before falling rapidly beyond PDA = 50%. These patterns seem to be reflected in the variation of time spent in various regimes (see Figure 14). Variation of time spent in the explore regime seems to resemble the variation of tremor, while undershooting and average velocity seem to follow the variation of time spent in the go regime.
Normal reaching is accompanied by a large, initial agonist burst, followed by an antagonist burst, which sometimes is followed by a second, smaller agonist burst (see Figure 15, top). Note that all agonist burst plots correspond to the variation of the first component of the muscle activation vector (representing the shoulder) as the arm reaches target 1. We have not included other components, since they show similar behavior. Time zero in all agonist burst plots corresponds to the start of the reaching movement. Such biphasic response seen in the normal case does not appear in type A PD results, which show a nearly monotonic build-up of activity (see Figure 15, bottom).
4.2.2. PD Model, Type B.
In this PD model, only dopamine reduction (−0.5 ≤ DA ≤ 0.5; K = 4) is incorporated. DA is tied to PDA: as PDA increases, DA falls from 0.5 toward −0.5.
Although the general trends are similar to those seen in the case of the type A PD model, there is an important difference. For example, if we consider the variation of undershooting with PDA, in the case of a type A model, there is a gradual reduction followed by saturation. However, in the case of a type B model, undershooting remained nearly constant and fell drastically at a PDA value of about 50%, without much subsequent variation (see Figure 16). Figure 17 shows a snapshot of undershooting in this case. Tremor also remains nearly constant until 50%, falling abruptly to 0 thereafter (see Figure 18). However, tremor exhibits a sharp transient rise at 50% before it falls to 0. A similar step-like change is observed in average velocity also (see Figure 19). However, actor (MC) error shows insignificant variation with PDA (see Figure 20). In this case, too, variation of symptoms reflects variation of time spent in various regimes. For instance, as in the previous case, variation of average velocity and undershooting resembles variation of time spent in the go regime, while variation of tremor resembles the explore regime (see Figure 21). In this case, too, agonist burst shows a monotonic variation (see Figure 22, bottom), compared to a biphasic response of a normal case (see Figure 22, top).
Thus, unlike the type A model, symptoms in a type B model show a step-like variation, with the symptoms remaining constant up to a critical value of 50%, thereafter transitioning to a permanently worse state. This can perhaps be accounted for as follows: the loss of DA neurons might be compensated by the intact indirect pathway dynamics, and this balance is perhaps disturbed when PDA reaches a critical level (in this case, 50%).
4.2.3. PD Model, Type C.
In this PD model, only reduced complexity of indirect pathway dynamics (3 ≤ K ≤ 4) is incorporated, with no reduction in dopamine (DA = 0.5). DA is fixed at 0.5, and K is tied to PDA: as PDA increases, K falls from 4 toward 3.
In type A simulations, we have seen a gradual variation of symptoms, followed by saturation at PDA of about 50%. In type B, we have seen a nearly constant profile up to a PDA of about 50%, followed by a sudden shift to another plateau. In type C, we see a generally gradual variation of symptoms (undershooting factor, Figures 23 and 24; tremor factor, Figure 25; average velocity, Figure 26) with no sharp transition at PDA of about 50%. Since DA is fixed, the TD error is allowed a full, unconstrained variation. Thus, the sharp transitions among the three regimes do not occur here. Whatever impairment is observed is due to the degradation of the complexity of exploration (reduction in K). Note that the boundaries between regimes are not fixed but vary as a function of actor error (see equation 3.6). The span of the explore regime is wider for a higher actor error. Thus, actor error increases as PDA increases (see Figure 27). As a consequence, for larger values of PDA, the system spends more time in the explore regime than the go regime (see Figure 28). Since the TD error never or rarely drops too low, a no-go regime is rarely selected (see Figure 28). In this case too, an agonist burst shows a monotonic variation (see Figure 29, bottom) compared to a biphasic response of a normal case (see Figure 29, top).
We have presented a model of Parkinsonian reaching dynamics. The model consists of MC, BG, and a two-link arm. The BG model is cast essentially in the framework of reinforcement learning, though we depart radically from the interpretation of neural substrates of various RL components. In line with an actor-critic type of BG models, we interpret the temporal difference error as the DA signal. The value function, which is thought to be computed in the striatum, is not learned but predefined in terms of distance of the arm's end effector and the target. The DA signal switches transmission between the direct and indirect pathway. Thus, BG output is dominated by direct or indirect pathway activity, depending on the magnitude of the DA signal. BG output in combination with MC output controls the arm. Thus, the perturbative corrections from the BG and the DA signal together help MC learn to reach. MC's dependence on BG gradually diminishes as training progresses. Thus, in the model, BG discovers the correct output by reward-related dynamics and transfers the knowledge to MC. A similar scenario of sequential learning between BG and cortex was described in the experimental literature. Studies on different time courses of learning in basal ganglia and prefrontal areas exhibit a similar sequencing (first basal ganglia and then prefrontal) in saccade-related behavior in monkeys (Pasupathy & Miller, 2005).
The DA signal in this letter does not distinguish between the two forms of DA release from mesencephalic dopamine centers reported in the experimental literature: the phasic release which acts on a timescale of seconds, and tonic release, which acts over a few minutes (Dreher & Burnod, 2002). Phasic release is linked to the difference in expected future reward and actual reward, a quantity described in the RL literature as the temporal difference error. Both tonic and phasic dopamine releases are thought to have differential roles in efficient updating of working memory information in the prefrontal cortex. Tonic DA is thought to increase the stability of maintained information in the PFC by increasing the signal-to-noise ratio of the pattern with respect to background noise. By contrast, phasic DA is thought to control when an activity has to be maintained or when it must be updated (Cohen et al., 2002). The possibility of interaction between tonic and phasic dopamine has also been considered. It has been suggested that tonic dopamine can regulate the intensity of phasic dopamine by the effect of the former on extracellular dopamine levels (Grace, 1991). On the whole, an extensive literature exists on the question of specific roles of phasic and tonic DA; a comprehensive theory of the role of these two forms of release on various cortical and subcortical targets continues to be elusive. In this letter, we do not distinguish between these two forms of DA release. However, the single variable shown in this letter is similar to TD error and therefore closer to phasic DA than tonic DA.
In simulations of normal reaching in section 4.1, there is an early stage when the arm exhibits prolonged wandering movements before it reaches the target. When the arm approaches the target accidentally, a reward is delivered, which is used to train the MC. Reaching movements become more direct and briefer as training proceeds. Thus, variability in reaching trajectories decreases with learning (see Figure 6). Studies with primate reaching patterns reveal an exponential reduction of variability with learning (Georgopoulos, Kalaska, & Massey, 1981). The exploratory movements of the arm, driven by the chaotic activity of indirect pathway, are reminiscent of the notion of motor babbling proposed in the context of imitation learning in infants (Meltzoff & Moore, 1997). Infants are thought to make random movements and, by confirmatory feedback from an adult in the environment, learn to relate the movements initiated and the end states of the body. A similar learning of articulatory-auditory relation also seems to be driven by the more familiar vocal babbling (Kuhl & Meltzoff, 1996).
In the PD reaching simulations of section 4.2, we considered three types of models. Typically PD pathophysiology is modeled purely in terms of a reduction in dopamine. However, in the type A PD model, we incorporated two factors related to PD pathology: dopamine reduction and reduction in the complexity of indirect pathway dynamics. Measures of impaired reaching movement like (longer) reaching duration indicating bradykinesia, tremor, and undershooting are calculated. All three types of PD models (A, B, and C) showed longer reaching duration, more tremor, and greater undershooting compared to normal. In type A, as disease progressed (increased loss of DA cells (PDA) and decreased complexity of indirect pathway dynamics), these measures gradually approached an extreme value before they saturated. In type B, the measures showed a step-like variation. Type C exhibits a smooth variation of symptoms, except for tremor, which exhibits a nearly flat peak.
The pattern of variation of symptoms seems to be reflected in the variation of time spent in the various regimes. With increasing DA loss, the time spent in the go regime falls, and the DA signal spends more time within the explore range. Thus, time spent in the explore regime increases. This increase continues until the DA signal falls below a critical value, enters the no-go regime, and remains mostly confined there. Thus, we see that as DA loss increases, the time spent in the explore regime gradually approaches a peak and falls subsequently to near zero (in types A and B). Since there is no such restriction on the DA signal in type C, the time spent in the explore regime increases monotonically. In general, the variation of time spent in the explore regime seems to resemble the variation of tremor, while undershooting and average velocity variation seem to follow the variation of time spent in the go regime. Experimental studies have linked tremor to changes in GP (Hurtado, Gray, Tamas, & Sigvardt, 1999) and STN (Hamani, Saint-Cyr, Fraser, Kaplitt, & Lozano, 2004), in other words, to changes in the indirect pathway. In cases of akinetic rigidity, Albin et al. (1990) found a profound loss of striatal cells projecting to GPi, which constitutes the direct pathway. Based on the features of disease progression in Huntington's disease, another neurodegenerative disorder, it was suggested that degeneration of the direct pathway is responsible for rigidity and bradykinesia (Berardelli et al., 1999). In all three types in our model, the frozen, or rigid-like, state is associated with drastically reduced time spent in the go regime, whose substrate is the direct pathway.
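The notion of time spent in each regime can be illustrated with a minimal sketch. It assumes, purely for illustration, that the regimes are delimited by two thresholds on a sampled DA signal; the function name, the threshold names `lo` and `hi`, their values, and the toy signal are all hypothetical and are not taken from the model. DA loss is mimicked here by simply scaling the signal down, which is an assumption made only to keep the example short.

```python
def regime_time_fractions(da_signal, lo, hi):
    """Return the fraction of samples spent in each regime.

    Assumed (hypothetical) convention: signal > hi is "go",
    lo <= signal <= hi is "explore", signal < lo is "no-go".
    """
    if not da_signal:
        raise ValueError("da_signal must contain at least one sample")
    if lo >= hi:
        raise ValueError("lo must be smaller than hi")
    n = len(da_signal)
    go = sum(1 for d in da_signal if d > hi)
    nogo = sum(1 for d in da_signal if d < lo)
    explore = n - go - nogo  # everything between the two thresholds
    return {"go": go / n, "explore": explore / n, "nogo": nogo / n}


# Toy illustration only: DA loss is mimicked by scaling a made-up signal.
healthy = [0.9, 0.5, 0.5, 0.1]
for gain in (1.0, 0.5, 0.1):
    scaled = [gain * d for d in healthy]
    print(gain, regime_time_fractions(scaled, lo=0.2, hi=0.8))
```

In this toy setting the explore fraction first rises and then collapses as the gain is reduced, which mirrors the qualitative trend described for types A and B; it is not a reproduction of the simulations.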
Increased movement duration in PD patients is a well-known clinical fact. In a study in which patients were asked to look and point to visual targets on a screen, PD patients took 24% more time to execute the movement than control subjects (Desmurget, Grafton, Vindras, Gréa, & Turner, 2003). Bradykinesia is thought to occur due to the failure of BG output to reinforce the cortical mechanisms that prepare and execute the commands to move (Berardelli, Rothwell, Thompson, & Hallett, 2001). In our model too, bradykinesia is a result of impaired interaction between BG and MC, which is caused by DA cell loss.
Undershooting the target is another prominent feature of goal-oriented PD movements. In a study in which PD patients were asked to copy target lines of fixed size, patients, compared to controls, undershot the required size when the target size was greater than or equal to 2 cm (van Gemmert et al., 2003). It is noteworthy that in PD patients, saccadic movements also typically undershoot targets, particularly in the vertical direction (White, Saint-Cyr, Tomlinson, & Sharpe, 1983). In the simulations of the previous section, error in reaching includes both undershooting and error in direction. However, reaching error in PD patients is typically dominated by undershooting, with no significant error in direction. This discrepancy between the model and patient data, namely the relative contributions of directional error and undershooting, will have to be investigated further.
Tremor is another classic symptom of PD motor impairment. Our model too exhibits tremor, although the tremor in the model may differ from tremor as it is characterized in the experimental literature. Tremor in the movement disorder literature is marked by the presence of strong oscillatory components in the electromyogram (EMG), and PD tremor is sometimes found to correlate with abnormal neural activity in GP (Hurtado et al., 1999) and STN (Hamani et al., 2004). In our model, tremor is quantified as the root mean square (RMS) value of acceleration of the arm's end effector (see appendix A). Thus what we refer to as tremor, strictly speaking, denotes fluctuations in movement velocity, which are larger in the PD version of the model than in the normal condition. Furthermore, the tremor described in our model emerges during reaching and is therefore akin to action tremor. Although action tremor is found in PD patients, resting tremor is found more often than action tremor. One way of extending the current model to address the problem of resting tremor is to treat the resting state as another possible target location. Since a typical hand may be assumed to spend more time in the resting state than in any other state, this feature can be incorporated in the simulations. It would be interesting to note the differences between the action tremor and resting tremor that emerge from such a model.
In this work, we use a simple measure of tremor based on the RMS value of acceleration. Considering the suggested link with degradation of chaos in the indirect pathway, however, it would perhaps be more appropriate to perform chaotic time-series analysis on tremor and check whether chaoticity is reduced with disease progression. Such analyses would have a bearing on clinical data, since it was reported that PD tremor displays reduced chaoticity due to the effect of treatment (Yulmetyev et al., 2006). These alternative measures of tremor will be the subject of future work.
Dounskaia, Fradet, Lee, Leis, and Adler (2009) characterize movement irregularities in PD reaching using a measure called the normalized jerk score (NJS) and show that this score in PD patients is greater than in normal subjects. Since PD movements are known to have abnormal fluctuation in velocity and acceleration, the magnitude of jerk, which refers to the temporal derivative of acceleration, is understandably higher in PD patients than in normals.
The proposed model is a lumped model of BG, which aims to embody the essence of BG dynamics. It is meant to present a picture of BG in which the direct pathway subserves exploitation and the indirect pathway subserves exploration (see Figure 30) and in this respect departs radically from existing BG modeling literature. It attempts to present only the large-scale picture and is not meant to be a detailed, network-level, or biophysical model of BG function. It is a systems-level model that aims to link PD pathology at a circuit level with its behavioral manifestations in reaching. To achieve such a wide scope, model components have been simplified. The arm and the muscles involved are static models, and therefore arm dynamics are produced purely by temporal variation in muscle activation. A more realistic model would use a dynamic arm and also incorporate a forward model necessary to control the arm. The actor/MC is also a static model, a perceptron, which happens to be adequate to the problem at hand. The value function is precalculated and is not trained by the TD error, as it should be in a full RL framework. There is also no explicit representation of striatum or corticostriatal connections modifiable by DA signals. A novel feature of the model presented here is to represent part of the indirect pathway dynamics using a chaotic system and suppress its chaoticity to represent PD pathology. We envision two stages of future development of this model. In the first, each model component is replaced by networks of abstract neurons with appropriate dynamics. The second stage would consist of biophysical neuron models, with the model architecture closely complying with BG anatomy.
Another novel feature of the proposed model is that it attempts to capture disease progression in a neurodegenerative disorder (NDD) like PD, as opposed to contrasting normal function with disease state at a particular level. NDDs, whose incidence seems to be increasing dramatically, are marked by a progressive impairment in function. Understanding the nature of this progressive impairment is complicated by the fact that the impairment is usually associated with high short-term fluctuation in symptoms (Walker et al., 2000). Although neurological deficits in such cases are thought to be related to neuronal cell loss, more recent findings suggest that the situation could be more complicated (Terry et al., 1991). Behavioral impairment in NDDs is associated with the formation of abnormal protein assemblies (plaques, tangles, and inclusion bodies), neuronal cell loss, and network dysfunction (Palop, Chin, & Mucke, 2006). An integral understanding of disease progression entails progress in understanding at all of these levels. A further difficulty in NDDs is distinguishing between a co-pathologic change and a compensatory change. Only an integral understanding of NDD progression at the network level will help the development of effective therapies for NDD. The model presented here marks a step in that direction for the specific case of PD.
5.1. STN-GPe and Exploration.
A key idea embodied in our model, developed from earlier work (Sridharan et al., 2006; Gangadhar, Joseph, & Chakravarthy, 2008), is that the STN-GPe system, which constitutes the indirect pathway, plays the role of the explorer in BG dynamics. RL-based or actor-critic models are an important class of models describing BG function. Of the three key components of RL (actor, critic, and explorer), substrates for both actor and critic have been located within the BG nuclei; however, no subcortical substrate for the explorer has been discovered in experimental work or suggested in modeling studies. Functional imaging studies identify two cortical substrates of exploration, the anterior frontopolar cortex and the intraparietal sulcus, but no subcortical counterpart of exploration has been found (Daw, O'Doherty, Seymour, Dayan, & Dolan, 2006). On the other hand, the roles attributed to the STN-GPe have been variable and tentative, ranging from movement inhibition (Albin et al., 1990; Frank, 2005) to focusing and sequencing (Hikosaka et al., 2000), action selection (Gurney, Redgrave, & Prescott, 2001), and switching (Isoda & Hikosaka, 2008). Thus, we try to fit the peg of a missing subcortical substrate for exploration into the hole of a tentative understanding of STN-GPe function and propose that the STN-GPe system is the subcortical substrate for exploration.
The STN-GPe loop is often studied as a single unit, perhaps since oscillations produced by this loop have fascinated many researchers (Terman, Rubin, Yew, & Wilson, 2002). Based on their studies of BG organotypic tissue cultures, Plenz and Kitai (1999) have proposed that correlated activity can arise in both STN and GPe structures and is caused by the interaction of the two structures rather than being driven by an external source. Recent experimental studies have revealed prominent low-frequency periodicity (4–30 Hz) of firing and dramatically increased correlations among neurons in the GPe and the STN, though there were no significant changes in firing rates (Bergman, Wichmann, Karmon, & DeLong, 1994; Nini, Feingold, Slovin, & Bergman, 1995; Magnin, Morel, & Jeanmonod, 2000; Brown et al., 2001). Under dopamine-deficient conditions associated with PD, recordings from STN neurons of PD animals and patients revealed synchronized oscillations (Magnin et al., 2000; Nini et al., 1995).
Thus, we propose a functional role for the presence of complex oscillations in STN-GPe in normal conditions and explain the pathological consequences of the loss of complex dynamics in that structure. The idea of explaining PD symptoms in terms of reduction in complexity of dynamics of relevant neural structures has existed for some time (Edwards, Beuter, & Glass, 1999). It is from such considerations that PD has been dubbed a “dynamical disease” (Beuter & Vasilakos, 1995). Accordingly, fixed-point dynamics have been linked to akinetic rigidity of PD and limit cycles to PD tremor. Several instances have been discovered in physiology, particularly cardiac physiology, where the chaotic activity of a system is essential for its normal function (Goldberger, Rigney, & West, 1990). There may be a similar situation in the STN-GPe system: complex activity may correspond to normal function and loss of complexity to disease.
Appendix A: Definitions
Average velocity: The rate of change of displacement at the end of a reach.
Actor error: The magnitude of the vector connecting the target and the end point reached by the arm with the sole contribution of the actor (see Figure 31).
Path variability: The standard deviation of the lengths of the normals dropped from the trajectory onto the line connecting its extremities (see Figure 32).
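As a hedged illustration of the path variability definition above, the following sketch computes the standard deviation of perpendicular distances from sampled trajectory points to the chord joining the first and last samples. The function name, the restriction to 2D points, the use of unsigned distances, the inclusion of the end points, and the choice of the population standard deviation are all assumptions made for this example, not details given in the text.

```python
import math
import statistics


def path_variability(points):
    """Std. dev. of perpendicular distances from 2D path samples to the
    chord joining the first and last samples (assumed interpretation)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    if chord == 0:
        raise ValueError("start and end of the path coincide")
    # |cross product| / |chord| gives the length of the normal to the chord.
    dists = [abs(dx * (y - y0) - dy * (x - x0)) / chord for x, y in points]
    return statistics.pstdev(dists)


# A perfectly straight reach has zero variability; a detour does not.
print(path_variability([(0, 0), (1, 0), (2, 0)]))
print(path_variability([(0, 0), (1, 1), (2, 0)]))
```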
Parkinson's disease: The undershooting factor is the ratio of the magnitude of the projection of the actual displacement vector of the reach onto the vector connecting the origin to the target, to the magnitude of the vector connecting the origin to the target. The tremor factor is defined as the root mean square value of the acceleration of the arm during the reaching task. Average velocity is defined as in the normal case.
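The two PD measures defined above can be sketched as follows. This is an illustrative reading of the definitions rather than the code used for the simulations: the function names, the 2D representation, the uniform sampling interval `dt`, and the estimation of acceleration by second-order finite differences of end-effector position are assumptions introduced here.

```python
import math


def undershoot_factor(origin, end_point, target):
    """|projection of (end_point - origin) onto (target - origin)|
    divided by |target - origin|."""
    tx, ty = target[0] - origin[0], target[1] - origin[1]
    dx, dy = end_point[0] - origin[0], end_point[1] - origin[1]
    t_sq = tx * tx + ty * ty
    if t_sq == 0:
        raise ValueError("origin and target coincide")
    # projection length = |d . t| / |t|; dividing again by |t| gives the ratio
    return abs(dx * tx + dy * ty) / t_sq


def tremor_factor(points, dt):
    """RMS magnitude of acceleration, estimated (as an assumption) by
    second differences of positions sampled every dt seconds."""
    acc_sq = []
    for (xa, ya), (xb, yb), (xc, yc) in zip(points, points[1:], points[2:]):
        ax = (xa - 2 * xb + xc) / dt ** 2
        ay = (ya - 2 * yb + yc) / dt ** 2
        acc_sq.append(ax * ax + ay * ay)
    if not acc_sq:
        raise ValueError("need at least three samples")
    return math.sqrt(sum(acc_sq) / len(acc_sq))


# A reach that stops short of a target 10 units away, and a smooth path.
print(undershoot_factor((0, 0), (8, 3), (10, 0)))
print(tremor_factor([(0, 0), (1, 0), (2, 0), (3, 0)], dt=1.0))
```

Under this reading, an undershooting factor below 1 indicates that the reach covered less than the full origin-to-target distance along the target direction, and a constant-velocity path yields a tremor factor of zero.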