Abstract

Humans are able to robustly maintain desired motion and posture under dynamically changing circumstances, including novel conditions. To accomplish this, the brain needs to optimize the synergistic control between muscles against external dynamic factors. However, previous related studies have usually simplified the control of multiple muscles to two opposing muscles, the minimum set of actuators needed to simulate linear feedback control. As a result, they have been unable to analyze how muscle synergy contributes to motion control robustness in a biological system. To address this issue, we considered a new muscle synergy concept used to optimize the synergy between muscle units against external dynamic conditions, including novel conditions. We propose that two main muscle control policies synergistically control muscle units to maintain the desired motion against external dynamic conditions. Our assumption is based on biological evidence regarding the control of multiple muscles via the corticospinal tract. One of the policies is the group control policy (GCP), which is used to control muscle group units classified based on functional similarities in joint control. This policy is used to effectively resist external dynamic circumstances, such as disturbances. The individual control policy (ICP) assists the GCP in precisely controlling motion by controlling individual muscle units. To validate this hypothesis, we simulated the reinforcement of the synergistic actions of the two control policies during the reinforcement learning of feedback motion control. Using this learning paradigm, the two control policies were synergistically combined to yield robust feedback control under novel transient and sustained disturbances that did not involve learning. Further, by comparing our data to experimental data generated by human subjects under the same conditions as those of the simulation, we showed that the proposed synergy concept may be used to analyze muscle synergy–driven motion control robustness in humans.

1  Introduction

The human central nervous system (CNS) controls multiple muscles to robustly maintain posture and conscious motions in dynamically changing environments. The human body has about 400 skeletal muscles, each of which can contract by an activation value ranging from 0 to 100%. Therefore, there are a large number of activation patterns for the multiple muscles that are used to adopt a target posture or perform a target motion. There are thus many combinations of the activation levels at individual muscles at an articulated joint. To optimally regulate these combinations in a dynamic external environment, the CNS optimizes synergistic control between muscles. However, the manner in which this is accomplished is as yet unclear.

Many previous studies have been conducted to understand how multiple muscles are controlled to realize target motions or target postures. These studies have used three approaches based on their goals. The first approach involves a linear feedback gain control method, such as the proportional integral derivative (PID) control law, wherein the motion is corrected to maintain the desired outcome using feedback gain parameters estimated from the difference between the objective motion and the achieved motion. This method is commonly used when modeling control of mechanical devices, such as robots, under external forces. However, the PID model was designed to simulate coupling control (Petkos & Vijayakumar, 2007) of an actuator in a joint. Therefore, control of multiple muscles in a joint is beyond its design focus. When the PID is applied to muscle control, multiple muscles are simplified to antagonistic coupling control (ACC), which uses only two opposing muscles: the agonist and antagonist. Since the ACC uses only two opposing muscles, it cannot be used to investigate the contribution of synergy among multiple muscles to feedback motion control. To resolve this problem, some researchers (Nemirovsky & Rooij, 2010; Rooij, 2011) have manually predefined the synergy weights of multiple muscles. However, muscle synergy should be changed in real time to optimize the contribution of individual muscle activities to the objective motion control. As a result, these studies are not useful for simulating muscle synergy. Impedance control (IPC) is also a type of linear feedback gain control. Therefore, muscle controllers using IPC (Hogan, 1984) are also restricted to the use of an ACC. Further, in order to implement IPC, nonlinear muscle contraction mechanics, such as force-velocity and force-length variations during feedback control, which are processed to recover an equilibrium point from perturbed conditions, require linearization. However, as this linearization is allowable for only very small changes in contraction velocity, the control range of models using IPC is very restricted. IPC is therefore not useful for simulating motion control robustness under disturbed conditions.

In the second approach, finite synergistic patterns of muscle activities modulated by the spatially fixed activation of multiple muscles are inversely extracted (Tresch, Philippe, & Emilio, 1999; d'Avella, Philippe, & Emilio, 2003; Torres-Oviedo, Macpherson, & Ting, 2006; Ting & Macpherson, 2012; Safavynia & Ting, 2012; Barroso et al., 2014). This approach uses an optimization method that uses cost functions based on nonnegative matrix factorization of measured electromyographic (EMG) signals (Tresch et al., 1999; Lee & Seung, 1999). However, this method was designed to analyze muscle synergies as motion primitives based on EMG data. It is therefore unrelated to simulation analysis used to verify the contribution of muscle synergy to robust motion control.

The third approach is to use an optimal control algorithm with a cost function. This approach has been used by Yamaguchi and Zajac (1990), Ait-Haddou, Binding, and Herzog (2000), Haug, Tramecon, Allain, and Choi (2001), Thelen, Anderson, and Delp (2003), Hada, Yamada, and Tsuji (2007), and others. In these studies, various cost functions, such as energy minimization methods (Crowninshield, 1978), have been used to estimate optimal activation of multiple muscles in order to follow a predetermined path or reach a final goal under a given dynamic condition. However, due to the predefined dynamic condition, it is impossible to achieve robust motion control under unexpected dynamic conditions different from the given dynamic condition. This problem is attributed to the movement duration that is predefined for the cost function, which cannot be modified in real time in the presence of unexpected disturbances. Similarly, optimal feedback control (OFC) (Todorov & Jordan, 2002; Liu & Todorov, 2007) is also not free from the constraint of movement duration because it also optimizes motion control using the cost function. To resolve this problem, the Markov decision process (MDP) has been applied to OFC (Liu & Todorov, 2007), as MDP is not constrained by a predefined movement duration. However, this MDP-driven OFC model (Liu & Todorov, 2007) can only be used for discretized states and therefore suffers from the curse of dimensionality. Thus, it requires a lower-order model of joint dynamics. Furthermore, although OFC, unlike IPC, is not restricted to the use of ACC, ACC is usually applied to OFC due to its merits in the linearization of feedback control. As mentioned in the discussion of the first approach, models using ACC are not useful for the verification of the contribution of muscle synergy to motion control. As a result, OFC is also not useful for our aims.

To overcome the shortcomings of past studies in the simulation of the contribution of muscle synergy to robust motion control, we propose a new strategy for muscle synergy that optimizes the synergy between muscles to achieve robust feedback control. To embody this new concept, muscle units should be synergistically controlled to regulate control redundancy in external dynamic circumstances. Based on this design concept, we defined a group control policy (GCP) for controlling muscle group units classified by their functional similarity in joint control using a common intensity. This policy is most effective in resisting external dynamic circumstances, such as disturbances. The definition of GCP is based on studies of corticospinal motor (CM) neurons in the corticospinal tract (Shinoda, Zarzecki, & Asanuma, 1979; Shinoda, Yokota, & Futami, 1981; Andersen, Hagan, Phillips, & Powell, 1975). We defined an individual control policy (ICP) to control the identified functions of individual muscles to assist the GCP in precisely controlling motion. The biological definition of ICP is from the study of Bennett and Lemon (1996) on CM neurons in the corticospinal tract. (We introduce the biological design concept of these two control policies in section 4.1.) Based on these definitions, the GCP and the ICP are synergistically used to determine the synergistic control between multiple muscles in a feedback control loop. This synergy leads to the effective maintenance of the desired motion control in the face of environmental variations and external dynamics.

In this study, we define this synergistic combination of the two control policies as synergy strategy–based muscle control (SC). To validate this model, we propose a computational model that simulates the reinforcement of the SC for the target motion using the actor-critic model (Barto, 1995), which is presumed to reflect reinforcement learning (RL) in the basal ganglia (Houk, Adams, & Barto, 1995). RL is valuable for the simulation of motion control in the presence of disturbances not experienced before, as it is free from the constraints of predefined movement duration due to the use of MDP (Sutton & Barto, 1998), as described above. Further, to resolve the above-described problem associated with the discretized state in MDP-driven models (Liu & Todorov, 2007), the controller, which is the actor function, and its estimator, which is the critic function, were designed using a normalized gaussian network (NGSN), which manages continuous feedback states. (We provide a complete description of this tool in section 2.2.2.) The applied RL framework is thus well suited to simulating the contribution of the proposed SC to robust motion control. When performing this validation, we focused on two objectives. First, we confirmed that the GCP and ICP synergistically combine to control motion during the feedback process. Second, we verified that the synergy of these two control policies resulted in robust motion control in the face of new and different disturbances. To test these objectives, we simulated motion control of a muscular skeletal model under novel transient and sustained disturbances that did not involve RL. In this study, we defined motion control robustness in novel dynamic states as robust motion control. Furthermore, to evaluate the function of the proposed SC in reproducing human feedback control robustness, we compared our simulation to experimental data generated by human subjects.

2  Materials and Methods

2.1  Architecture of the Synergy Strategy–Based Feedback Control Model

As discussed in section 1, we proposed a new synergy concept of GCP and ICP, defined as two types of muscle control policies for regulating the redundancy in the control of muscles during robust feedback control in external dynamic circumstances. As shown in Figure 1, this synergy concept is based on the interrelationships between CM neurons in the primary motor cortex (M1) that govern multiple motor neuron pools (MNPs) in the spinal cord using a common GCP-driven control signal (Shinoda et al., 1979, 1981; Andersen et al., 1975) and ICP-driven individual corresponding control signals (Bennett & Lemon, 1996) via the corticospinal tract. (We discuss the biological basis for this proposed concept in section 4.1.) To neurophysiologically conceptualize the proposed SC, we used a model of the cortico-BG loop (Barto, 1995; Doya, 2000, 2007, 2008; Ito & Doya, 2011) that runs between the BG and the cerebral cortex. Here, the BG selectively disinhibits the activities of both the M1 and the brainstem to select the optimal tactic for motion control (Hikosaka, Takikawa, & Kawagoe, 2000). The extent of this disinhibition is controlled via dopamine release (Shinnamon, 1993) during RL. The BG dynamically switches from inhibition to disinhibition of the activity of the M1 during sequential motion control (Nambu, Tokuno, & Takada, 2002). Therefore, the BG is assumed to function as the feedback controller, which selects the optimal motion control tactic by inhibiting and disinhibiting the activities of CM neurons in M1. In the proposed model, the CM neurons in M1 correspond to intrinsic-like neurons used for muscle control (Kakei, Hoffman, & Strick, 1999).

Figure 1:

The architecture of the synergy strategy–based feedback control process. The basal ganglia acts as the feedback controller, which outputs motor neuron pool (MNP) control signals () to the spinal cord through the primary motor cortex (M1) in the corticospinal tract. To achieve this, the basal ganglia (BG) operate both the group control policy (GCP) and the individual control policy (ICP). The GCP controls multiple MNPs () with the signal and the ICP () controls () individually with corresponding individual signals () for the feedback state S transferred through the somatosensory cortex. The BG synergistically combines these two CPs to regulate the redundancy in the control of MNPs, which govern muscles. , , and (ICPs) and (GCP) are symbolized using the open circle, the star, the cross, and the black circle, respectively.


In this framework, shown in Figure 1, we suppose that the BG optimizes the motion control tactic using sequential synergistic combinations of inhibition and disinhibition of the two control policies, which activate corresponding CM neurons in M1. GCP and ICP are symbolized as P and P (), respectively. Thus, BG regulates the control redundancy of multiple muscles for achieving robust feedback control as follows.

The group control policy P disinhibits the activity of the black-circled neuron in M1. To assist P, the individual control policy (where ) disinhibits activity in other neurons, such as those represented by the star, cross, and white circle. This synergy between P and results in optimal regulation of the redundancy in the control of the activity of multiple motor neuron pools (MNPs) in the spinal cord. The synergistic mechanism used to control muscles results in a trade-off between the integrity of the signal from P and the individuality of the signal from (where ) during motion control. Based on this mechanism, we verify the contribution of the synergistic regulation of muscle control redundancy to motion control robustness in feedback control.

To verify the contribution of muscle synergy to motion control robustness in feedback control, an initial equilibrium state, such as posture, is disturbed by a disturbance that has not been experienced before. The proposed model simulates how muscle synergy recovers the initial state or finds a new equilibrium state through feedback control after this disturbance (Kandel, Schwartz, & Jessell, 2000). To evaluate our simulation, the experiment was performed using the setup parameters we describe in section 2.5. We assumed that the feedback state during the sensory feedback delay is estimated using the internal forward dynamic model (Wolpert, Ghahramani, & Jordan, 1995; Todorov & Jordan, 2002). The proposed model was designed to analyze how reinforcement of the synergy between GCP and ICP contributes to robust feedback control using the reinforcement learning and control model. (We introduce the precise structure of this model in section 2.2.)

2.2  Synergy Strategy–Based Reinforcement Learning Model and Control for Feedback Control

To simulate the synergy between GCP and ICP introduced in Figure 1, we use the synergy strategy–based RL and control model shown in Figure 2, where the synergy between GCP and ICP is reinforced by the actor-critic model (Barto, 1995). This model is thought to represent RL in the BG in the cortico-BG loop (Houk et al., 1995). In this model, the actor, which functions as the controller of all muscles, estimates the vector of muscles' control signal . The critic estimates V(s), which evaluates the actor's performance as the motion controller for a goal. The vector of state is an afferent copy that is fed back to the actor and the critic. The control tactic of the actor for the feedback state is evaluated based on the reward, which is calculated using equation 2.1. The estimated reward value is transformed into the temporal difference (TD), which upgrades the critic function V(s) in equation 2.4. The method used to model TD will be introduced in equation 2.5c. The TD upgrades the weight v of an NGSN for V(s), which will be introduced in equation 2.5a. To apply the proposed SC model to the actor function, the TD upgrades the gating weights and of the GCP and ICP, which then upgrade the NGSN weight w of the actor function for each muscle. The gating parameters and for the GCP and ICP are driven by V(s). The modeling of the synergy strategy of GCP and ICP is introduced in equations 2.7a and 2.7b. The synergy-driven weight W is applied to the muscle activity approximator, which estimates in equation 2.6a. This simulation model is explained in the following four subsections.

Figure 2:

Muscle synergy strategy–based reinforcement learning and control model. The computational learning model: In the corticospinal-basal ganglia loop in Figure 1, motion control is trained using the synergy strategy–based actor-critic reinforcement learning model, which optimally regulates the control redundancy of multiple muscles for the feedback state s with the synergy between the portion of GCP and the gating portion of ICP. The gating weights of these portions are and , respectively.


2.2.1  Reward

The agent evaluates the control tactic using the reward for the feedback state . The reward at time t is expressed as
formula
2.1

The function is used to calculate the reward by evaluating the Cartesian difference between the objective position of the end effector and its current position , as the controlled motion is generally recognized based on the position of the end effector in sight. The function is used to evaluate the control cost according to the total value of the muscle activities. The symbols , , , and are the constant weight of , the constant value, the th muscle activity, and the total number of muscles, respectively.
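To make the structure of equation 2.1 concrete, the sketch below gives one plausible reading of the description above. It is not the authors' exact formula: the way the positional term and the effort term are combined, and the roles of the constants c1 and c2 (loosely based on the values reported in section 3.1), are assumptions.

```python
import numpy as np

def reward(x_goal, x_hand, activations, c1=0.1, c2=100.0):
    """Illustrative reward in the spirit of equation 2.1 (hypothetical form)."""
    # Positional term: rewards a small Cartesian distance between the objective
    # position of the end effector and its current position.
    position_term = -c1 * np.linalg.norm(np.asarray(x_goal) - np.asarray(x_hand))
    # Cost term: penalizes the total activity of all muscles (energy minimization).
    effort_term = -np.sum(np.asarray(activations)) / c2
    return position_term + effort_term
```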

2.2.2  Network Structure

Because the proposed model needs to be simulated in continuous dynamic environments, actor and critic functions are organized based on the Hebbian rule-based (Hebb, 1949) network model of a self-organizing map (Kohonen, 1982) between finite nodes. This model represents the continuous environment using the gaussian network between the finite state nodes. We used the continuous RL method to simulate motion control for continuous feedback states based on the framework of this network (Doya, 2000). The RL method is based on an NGSN (Moody & Darken, 1989), which is the most useful framework for the design of a nonlinear controller for continuous feedback states. The gaussian base function b(s), which estimates the topological relationship between the input state and all of the network state nodes, was allocated to its corresponding node in the network. Based on this framework, to estimate the actor function and its critic function using the continuous model described in appendix A, an NGSN based on a network model comprising three layers is expressed as
formula
2.2
where and are the state vector and the network weighting vector for approximating the function, respectively. The function is designed using the NGSN and consists of the base function as follows:
formula
2.3

The NGSN is designed using the network nodes that are determined before the learning takes place. is the total number of base functions. This predetermined format is designed based on the grid distribution of the two-dimensional state. The symbol is the th element of the state vector . The symbol is the th element of the center of the th base function, and is its range.
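As an illustration of equations 2.2 and 2.3, the following minimal sketch evaluates a normalized gaussian network over a fixed grid of basis centers. The grid values, widths, and variable names are illustrative only.

```python
import numpy as np

def ngsn(s, centers, widths, w):
    """Normalized gaussian network: F(s) = sum_k w_k b_k(s) (cf. eqs. 2.2 and 2.3)."""
    s = np.asarray(s, dtype=float)
    d2 = np.sum(((s - centers) / widths) ** 2, axis=1)  # squared distance to each center
    g = np.exp(-0.5 * d2)                               # gaussian activation of each node
    b = g / np.sum(g)                                   # normalization across all nodes
    return float(np.dot(w, b)), b

# Toy usage with a small 3 x 3 grid of centers (values chosen only for illustration).
centers = np.array([[a, v] for a in (40.0, 85.0, 130.0) for v in (-1.0, 0.0, 1.0)])
widths = np.array([45.0, 1.0])
w = np.zeros(len(centers))
value, b = ngsn([70.0, 0.2], centers, widths, w)
```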

Network model for the critic function. The critic function is modeled by the value function
formula
2.4
which is updated by the TD that is determined based on the reward . The value function has the same framework as that used in equation 2.2. The weight vector of used to approximate is upgraded by
formula
2.5a
formula
2.5b
formula
2.5c

The weight of the th base function in equation 2.5a is upgraded by the continuous-time version of the TD error (Doya, 2000), which is represented by in equation 2.5c and is based on appendix A. The symbol is the learning rate, and equation 2.5b is the model for upgrading the eligibility parameter (Sutton & Barto, 1998; Doya, 2000). The symbols and are the time constant and the time constant for discounting future rewards, respectively.
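The three update rules can be condensed into the short sketch below, which follows the continuous TD formulation of Doya (2000) cited above. The parameter names and default values are placeholders; how the constants listed in section 3.1 map onto these roles is not stated in the text.

```python
def critic_step(v, e, b, r, V_now, V_prev, dt, eta=0.1, kappa=0.1, tau=1.0):
    """One critic update in the spirit of equations 2.5a to 2.5c.

    v: weights of the value-function NGSN; e: eligibility trace of the basis
    activations b; r: reward; tau: time constant for discounting future rewards.
    """
    delta = r - V_now / tau + (V_now - V_prev) / dt  # continuous TD error (cf. eq. 2.5c)
    e = e + dt * (b - e / kappa)                     # eligibility-trace dynamics (cf. eq. 2.5b)
    v = v + eta * delta * e * dt                     # value-weight update (cf. eq. 2.5a)
    return v, e, delta
```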

Network model for the standard actor function: Non-SC controller. The standard actor function, which is defined as a non-SC controller that does not involve SC, is modeled by
formula
2.6a
formula
2.6b
using the network structure based on equation 2.2. The function is the controller of the th muscle, and its activation level is estimated to range from 0.0 to based on the sigmoid function represented by . The weight vector of is upgraded by equation 2.6b, where is the white noise function for allocating the weight variation to of individual muscles.
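A hedged sketch of this standard (non-SC) actor is given below. The maximum activation, the noise magnitude, and the way the exploration noise enters the weight update are assumptions consistent with the description above, not the authors' exact equations.

```python
import numpy as np

def actor_activation(b, w_i, B=4.0, a_max=1.0, noise_std=0.1, rng=np.random):
    """Activation of muscle i (cf. eq. 2.6a): a sigmoid-squashed NGSN output,
    perturbed by white exploration noise. B is the baseline constant of section 2.2.3."""
    n = rng.normal(0.0, noise_std)                 # white-noise exploration term
    a = a_max / (1.0 + np.exp(-(np.dot(w_i, b) + n - B)))
    return a, n

def actor_weight_step(w_i, e_i, b, n, delta, dt, eta=0.1, kappa=0.1):
    """Weight update of muscle i (cf. eq. 2.6b): the TD error, correlated with the
    noise that perturbed this muscle, drives the change in its NGSN weights."""
    e_i = e_i + dt * (n * b - e_i / kappa)
    w_i = w_i + eta * delta * e_i * dt
    return w_i, e_i
```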

2.2.3  Refinement of Sigmoid Function for Minimization of Muscle Activity

The reward, which is defined in equation 2.1, includes an energy minimization term in consideration of the redundancy of the control of the multiple muscles in controlling a joint. However, since the energy minimization term is designed to minimize the activities of muscles over the learning trials, its learning cost is very high and its minimization is limited. Thus, baseline control of the sigmoid function in the proposed system is applied to robustly minimize the activities of the muscles. The baseline of the actor function, which is determined by the zero-input sigmoid value, is controlled by the constant B in equation 2.6a. As shown in Figure 3, the baseline of the standard function is determined as Y = 0.5 by setting B equal to 0. This baseline is shifted to Y ≈ 0 by setting B equal to 4.0. This baseline shift in the sigmoid function lowers muscle activity relative to the nonshift case. The muscle activity that is generated by the sigmoid function is effectively minimized using this model.
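As a quick numerical check of this effect, one possible parameterization (shifting the sigmoid's argument by B, which is an assumption about how B enters equation 2.6a) reproduces the baseline drop shown in Figure 3:

```python
import numpy as np

def shifted_sigmoid(x, B):
    # B shifts the curve along the input axis so that the zero-input output drops.
    return 1.0 / (1.0 + np.exp(-(x - B)))

print(shifted_sigmoid(0.0, B=0.0))  # 0.5    -> standard baseline (broken line in Figure 3)
print(shifted_sigmoid(0.0, B=4.0))  # ~0.018 -> lowered baseline (solid line in Figure 3)
```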

Figure 3:

Baseline control of the sigmoid function. X is the input parameter of the sigmoid function and Y is its output. The broken line is transferred to the solid line by lowering the baseline from 0.5 to 0.0 in the direction of the arrow.


2.2.4  The Learning Model for the Synergy Strategy

We devised the model described below to simulate the SC in the learning and control model introduced in Figure 2. As shown in Table 1, the muscles in the peripheral nervous system that control the elbow joint are classified into four groups, with each group having its own components. In order to design the learning model for the SC with the synergy strategy of the GCP and ICP, the ICP portion for the individual control signal and the GCP portion for the group control signal are dependent on the critic , as shown in the following equation:
formula
2.7a
The learning of the synergy between and is processed with the assumption that the sum of the two components is 1.0. At the initial learning stage, and start individually at 1.0 and 0.0, respectively, and the white noise components and are individually multiplied by these two components to allocate the weight variation to of the individual muscles introduced in equation 2.6b. As learning progresses, increases. Accordingly, the increasing values result in decreases in the group parts with and increases in the individual parts with . The relationship between and in the learning process is explained by simulating the control of CM neurons activated by and . In this process, the -controlled CM neurons and the -controlled CM neurons are synergistically activated under the assumption that the sum of and is 1.0, as defined in equation 2.7a. With successful learning, the critic will increase the values, which then decrease the TD and values. This is the process by which synergy in learning reaches its goal.
Table 1:
The Neuromuscular System Related to Control of the Elbow Joint.
Flexors
Group 1 (radial nerve): Brachioradialis
Group 2 (musculocutaneous nerve): Biceps brachii (long head), biceps brachii (short head), and brachialis
Group 3 (median nerve): Pronator teres

Extensors
Group 4 (radial nerve): Triceps brachii (lateral head), triceps brachii (long head), triceps brachii (medial head), and anconeus
The system uses this process to determine the optimal control tactic. Equation 2.7a can be redefined as
formula
2.7b
with the gating process occurring between and . Equations 2.7a and 2.7b, which are based on equation 2.6b, are devised to estimate of the base function of the NGSN, which is used to calculate , which was introduced in equation 2.6a. The progress of and in this RL process mimics motor learning in babies, which is discussed in section 4.3.
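The gating described by equations 2.7a and 2.7b can be sketched as follows. Only the sum-to-one constraint, the initial values (GCP portion 1.0, ICP portion 0.0), and the grouping of Table 1 are taken from the text; the mapping from the critic value V to the ICP portion, the noise magnitudes, and the variable names are assumptions.

```python
import numpy as np

# Group labels of the nine muscles in the order brachioradialis; biceps (long),
# biceps (short), brachialis; pronator teres; triceps (lateral), triceps (long),
# triceps (medial), anconeus (see Table 1).
GROUPS = [1, 2, 2, 2, 3, 4, 4, 4, 4]

def synergy_noise(V, noise_std=0.1, v_scale=1.0, rng=np.random):
    """Per-muscle exploration noise mixed from a group (GCP) component and an
    individual (ICP) component whose portions sum to 1.0 (cf. eqs. 2.7a and 2.7b)."""
    p_i = 1.0 - np.exp(-v_scale * max(V, 0.0))  # ICP portion grows as the critic improves
    p_g = 1.0 - p_i                             # GCP portion shrinks correspondingly
    group_noise = {g: rng.normal(0.0, noise_std) for g in set(GROUPS)}
    noise = np.array([p_g * group_noise[g] + p_i * rng.normal(0.0, noise_std)
                      for g in GROUPS])
    return noise, p_g, p_i
```

At the start of learning (V near zero), the shared group component dominates, matching the GCP-inclined initial setting; as V grows, the individual components take over, which mirrors the progression later shown in Figure 11.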

2.3  A Musculoskeletal FE Model of a Human Arm

In order to obtain accurate movement of the human body in the musculoskeletal simulation analysis, the precise modeling of the muscles that control the mechanical behavior of the skeletal joints is required. However, it is also important to consider a simplified model in order to minimize the calculation costs of the simulation, as the proposed learning model requires numerous trial-and-error analyses in the learning process. Considering the trade-offs between analytical precision and calculation costs, we designed an FE model using multiple bar elements of muscles that formulate muscle paths between the origin and the insertion points in LS-DYNA (Livermore Software Technology Corporation, Livermore, CA, USA), as shown in Figure 4. Table 2 lists the architectural parameters of the muscles (Winter & Woo, 1990). The force-length and force-velocity properties of muscle were simulated using the Hill-type muscle contraction dynamic model (Thelen, 2003). Figure 5 shows the force-length properties. To simplify the proposed FE model, no tissue elements were directly represented in this study. However, fixed via points and edge-to-surface wrapping contacts (Hada et al., 2007) were used to mimic actual boundary conditions around the muscle tissue and obtain reasonable muscle moment arms. The fixed via-points method represented the muscle passing through points of muscle elements with fixed nodes on local coordinates of the skeletal segments. The use of edge-to-surface wrapping contacts prevented the muscle paths from crossing over the surrounding tissues or the centers of joints. However, the use of wrapping contacts in the FE analysis results in undesired muscle vibration due to inertia effects and collisions between the nodes and the wrapping surfaces. To reduce vibration, the weights of the nodes were reduced, and a certain level of minimum activity was provided to prevent slackening of the muscle elements. Anatomical references (Neumann, 2002) were used to align the origin and insertion points, the via points, and the wrapping contacts in order to represent the appropriate muscle moment arms. The predicted muscle moment arms were well validated against data from several experimental studies (Amis, Dowson, & Wright, 1979; Murray, Delp, & Buchanan, 1995). The modeled brachialis muscle moment arm during elbow extension was reasonably matched with the experimental data (see Figure 4B). According to the methods noted, this model can be used to simulate biomechanical motion with the activation of muscle elements and external effects. The characteristic features of muscle force, which is based on muscular length and velocity, were modeled using a Hill-type model (Hill, 1938; Zajac, Topp, & Stevenson, 1985), which has shown comparatively good performance in terms of muscle force estimation and computational cost (Rosen, Fuchs, & Arcan, 1999; Cavallaro, Rosen, Perry, & Burns, 2006).
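The Hill-type contraction dynamics cited above can be illustrated with the following sketch. The curve shapes and all parameter values are generic textbook forms in the spirit of Thelen (2003), not the ones used in the authors' LS-DYNA model.

```python
import numpy as np

def hill_muscle_force(a, l_norm, v_norm, F_max, gamma=0.45, k_pe=5.0, eps0=0.6, v_max=10.0):
    """Hypothetical Hill-type fiber force.

    a: activation (0..1); l_norm: fiber length / optimal length (the stretch ratio
    of Figure 5); v_norm: shortening velocity in optimal lengths per second.
    """
    # Active force-length: bell-shaped curve peaking at the optimal length.
    f_l = np.exp(-((l_norm - 1.0) ** 2) / gamma)
    # Passive force-length: exponential rise beyond the optimal length.
    f_pe = (np.exp(k_pe * (l_norm - 1.0) / eps0) - 1.0) / (np.exp(k_pe) - 1.0) if l_norm > 1.0 else 0.0
    # Force-velocity: hyperbolic drop with shortening velocity (lengthening not modeled here).
    if v_norm <= 0.0:
        f_v = 1.0
    else:
        f_v = max(0.0, (1.0 - v_norm / v_max) / (1.0 + 4.0 * v_norm / v_max))
    return F_max * (a * f_l * f_v + f_pe)
```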

Figure 4:

A musculoskeletal finite element (FE) model of the human arm. Each muscle shown in panel A consists of multiple nodes used to precisely model its path along the wrapping. The wrapping is used to keep the path of the muscle within the precise moment arm, as shown in panel B. (A) Skeleton modeling: one degree-of-freedom musculoskeletal FE model with nine muscles. The symbol denotes the angle of the elbow joint. The activated muscles are the nine muscles introduced in Table 1. (B) Comparison between the simulated result and previously published data on the moment arm of the brachialis muscle.


Figure 5:

Effects of (A) active and (B) passive forces on the Hill-type muscle contraction dynamic model. The muscle stretch ratio is the muscle fiber length normalized by its optimal length.


Table 2:
Architectural Parameters of the Muscles.
Muscle | PCSA (mm²) | Optimal Length (mm) | Tendon Length (mm)
Brachialis | 450 | 181.4 | 0
Biceps brachii, long head | 333 | 224.3 | 108.2
Biceps brachii, short head | 322 | 293.5 | 0
Brachioradialis | 150 | 258.4 | 0
Triceps brachii, long head | 387 | 324.3 | 12.8
Triceps brachii, lateral head | 452 | 283.2 | 13.0
Triceps brachii, medial head | 323 | 246.2 | 0
Anconeus | 318 | 39.2 | 0
Pronator teres | 323 | 124.1 | 0

Note: The tendon lengths of some muscles were set to zero due to the absence of data regarding their lengths.

As shown in Figure 4A, the proposed FE model consists of two rigid body parts: one representing the upper arm and shoulder and another representing the lower arm and hand. The two body parts are linked using a joint constraint that represents the ulnar-humeral joint. The mass of the lower arm was 1.7 kg. The principal moments of inertia of the lower arm body were I kg m, I kg m, and I kg m.

2.4  Simulation System

As shown in Figure 6, the system is composed of the agent for learning and control, the musculoskeletal FE model, and the interface. The agent estimates the activities of the muscles used to control the musculoskeletal model. The musculoskeletal model was designed using LS-DYNA, which is an explicit FE code used for dynamic analyses. Because the agent was coded using C, the system had an interface for the integration of the agent and the FE model. The learning and control processes are shown in Figure 6. The first step is to record the state vectors, which are composed of the position of the end effector, the joint angle, and the joint angular velocity, into a file. This file is entered into the interface and is then transformed so that the agent can read it. The state vectors in the transformed file, which are entered into the agent, are the key data that the agent uses to determine the control tactic for the objective motion. Based on the SC, the agent calculates the muscle activity data, which are transformed to LS-DYNA–coded data at the interface. These data are entered into the muscles of the musculoskeletal model. The control tactic is then updated via these learning and control processes.
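The file-based exchange described above can be summarized with the sketch below. The file names, formats, and helper names are hypothetical; the actual system couples an LS-DYNA FE model with an agent written in C through such an interface.

```python
import numpy as np

def read_state(path="state.txt"):
    """Read the state vector (end-effector position, joint angle, joint angular
    velocity) written by the FE model; the file layout is an assumption."""
    return np.loadtxt(path)

def write_activations(activations, path="activations.txt"):
    """Write the muscle activities back for the next FE time step."""
    np.savetxt(path, np.asarray(activations))

def control_step(agent, state_path="state.txt", act_path="activations.txt"):
    s = read_state(state_path)        # 1. feedback state from the FE model
    a = agent.act(s)                  # 2. SC agent estimates the muscle activities
    write_activations(a, act_path)    # 3. activities returned to LS-DYNA via the interface
```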

Figure 6:

Structure of the simulation system used for the FE model. The system operates using two terms: the musculoskeletal FE model and the agent for motion learning and control. As these two terms were individually coded using LS-DYNA and C, an interface was devised for the transformation of input-output parameters between the two terms.


2.5  Experimental Setup

To validate the proposed simulation model by comparing the actions of human subjects under the same conditions as those used in the simulation, we measured the loading responses of three male subjects, as shown in Figure 7. These subjects were healthy and did not have any motor disorders. In the experiment, the elbow joint angle was measured while the subject held a 1 kg load in his hand. The subject was told to try to find a new equilibrium posture as soon as possible and to maintain his posture without attempting to recover the preloading posture, which was set to 70 degrees. All of the subjects were instructed to maintain their posture under this loading condition for 2.0 seconds. The shoulder and wrist joints were fixed during the measurement of the motion of the elbow joint. First, we measured the positions of the shoulder, elbow, and wrist using OPTOTRAK 3020 (Northern Digital, Waterloo, ONT, Canada), which is a three-dimensional position measurement device. We then used the measured positions of these three joints to calculate the angular movement of the elbow joint. To measure the responses to the loading condition using pure feedback control, the subjects were not informed regarding the timing of the loading. In addition, the distance between the initial falling point of the weight and the initial position of the hand was set close to zero, as shown in Figure 7. Furthermore, to approximate the novel condition as closely as possible, only data that were recorded during the first trial for each of the three subjects were used.

Figure 7:

Experimental condition. We traced the movement of the elbow joint used to maintain a posture against a 1 kg load. The goal was to find the new equilibrium posture for the sustained 1 kg load. In order to suppress the prediction of loading, the distance between the initial falling point of the weight and the initial location of the hand was kept close to zero. To allow angular movements at only the elbow joint, the shoulder joint was fixed using a band, and the wrist joint was fixed using a glove made for bowling.


All subjects provided written informed consent prior to their participation. The protocol was approved by the ethics committees of the Tokyo Metropolitan Institute of Medical Science (approval no. 14-40), and it was conducted in accordance with the ethical standards of the Declaration of Helsinki.

3  Results

3.1  Learning Condition and Performance

Through the learning process of the simulation system, the agent learned to control the forearm to reach a goal. During this learning process, the angle of the elbow joint was limited to 30 to 140 degrees. The goal was to move the hand to its goal position using an elbow joint angle of 70 degrees and to maintain this position. The degree of freedom of the joint was one. The nine muscles listed in Table 1 were used to control the elbow, as shown in Figure 4A. The time step was 0.01 second. If the total learning time in a trial was over 2.0 seconds or if the angle of the elbow joint was out of the learning control range, a new trial was started after randomly changing the initial position. This process was repeated 780 times. The NGSN was arranged using 12 × 12 gaussian basis functions located on a grid with even intervals in the individual dimensions of the input space (30 140, 300 300). We used c 0.1 and 100.0 in equation 2.1 and 0.11, 0.01, and 1.5 in equations 2.5 and 2.6. Using the learning process, the agent produced the optimal control tactic used to achieve the desired motion.
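The trial schedule just described can be sketched as follows; the env and agent objects and their methods are hypothetical stand-ins for the LS-DYNA model and the SC learner.

```python
import numpy as np

DT, T_MAX = 0.01, 2.0                 # time step and maximum trial duration (s)
ANGLE_MIN, ANGLE_MAX = 30.0, 140.0    # learning control range of the elbow angle (deg)

def run_trials(env, agent, n_trials=780):
    for _ in range(n_trials):
        s = env.reset(initial_angle=np.random.uniform(ANGLE_MIN, ANGLE_MAX))
        t = 0.0
        # A trial ends after 2.0 s or when the elbow angle leaves the control range.
        while t < T_MAX and ANGLE_MIN <= s.angle <= ANGLE_MAX:
            a = agent.act(s)          # muscle activities from the SC actor
            s, r = env.step(a, DT)    # FE model advances one time step and returns the reward
            agent.learn(s, r, DT)     # critic, actor, and gating updates
            t += DT
```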

The learning performance of the proposed model in the simulation condition is shown in Figure 8. To evaluate the learning performance of the SC model, we compared the SC and non-SC models under the same learning conditions. The non-SC model is controlled by the standard actor function introduced in equations 2.6a and 2.6b. Figures 8A and 8B show the average total differences between the goal location and the current location of the hand for each trial. Figure 8A shows the results obtained with SC, and Figure 8B shows the results obtained without SC. Figures 8C and 8D show the average total rewards for each trial. Figure 8C is the result obtained with SC, and Figure 8D is the result obtained without SC. As learning progressed, the values in Figure 8A reached 0.0, and the reward in Figure 8C approached 1.0. Although learning without SC eventually approached its target performance, as shown in Figures 8B and 8D, its performance remained poor even after 500 trials. Thus, learning without SC is less stable than learning with SC. These results suggest that SC does not disturb learning performance but instead improves the process by optimizing the synergy between GCP and ICP. The progress of the synergy between the two control policies via the learning process is shown in Figure 11.

Figure 8:

Comparison of the learning performance of the SC model and that of the non-SC model. (A, B) The average absolute values of the distance between the position of the hand and that of its target in each trial. Panel A is the result obtained with SC, and panel B is the result obtained without SC. (C, D) The transition of the average reward signal in each trial. Panel C is the result obtained with SC, and panel D is the result obtained without SC.


3.2  Simulation Used to Validate the Robustness of Control

To validate the robustness of control when using the SC, the simulation condition was set to control the forearm under two novel disturbances that do not involve learning. One of these disturbances was transient, and the other was sustained. These disturbances were chosen because they are the most common type of disturbances and involve unexpected conditions. The two simulations are described in sections 3.2.1 and 3.2.2, and the results were obtained from simulations using a common SC-learned control function. We next illustrate how the SC contributes to motion control robustness under novel, dynamic, and disturbed states that do not involve learning. Our results validated the robustness of the motion control of the proposed SC.

3.2.1  Controlling for Transient Disturbances

Using the acquired during the learning process in Figure 8, the agent was able to determine the optimal trajectories of both the portion of the ICP and the portion of the GCP for absorption of a transient 5-N impact, as shown in Figure 9A. First, in order to maintain the posture under the initial dynamic state resulting from gravity, was increased from 0.5 to over 1.0 and was decreased from 0.5 to below 0.0. After the agent overcame the dynamic change resulting from gravity, both proportions reached stable states, as decreased from over 1.0 to 0.463769 and increased from below 0.0 to 0.536231 until 0.32 seconds. The hand was then subjected to a 5-N impact at 0.33 seconds. To recover the preimpact posture, was drastically increased from 0.463769 to 3.25692, and was drastically decreased from 0.536231 to -2.25692 until 0.38 seconds. At this time, the transient impact was almost completely absorbed. Therefore, in the next step, was drastically decreased to 0.108942, and was increased to 0.891058 until 0.4 seconds. Both proportions then gradually recovered their preimpact values until 1.0 second. After 1.0 second, both of the SC proportions remained stable at their recovered values and steadily decreased their vibrations.

Figure 9:

Comparison of the models with and without SC in recovering preimpact motion after a transient 5-N impact. (A) Trajectories of the individual control policy (ICP) portion and the group control policy (GCP) portion evaluated with the value function of the state in SC. (B) Comparison of the joint angular trajectories from the SC and non-SC models, and the experiment; the black arrowhead signifies the time of impact. The experimental trajectory was obtained from the experimental data (see Figure 2A) of Lacquaniti and Soechting (1986).


Figure 9B shows the joint angular trajectory that occurs during control without SC, when the forearm falls below the lower limit of the elbow joint angle after the transient 5-N impact at 0.33 seconds. Unlike the condition without SC, the model with SC absorbed the shock from the disturbance after the impact by lowering the forearm by less than 10 degrees from the preimpact position, which led to robust recovery of the preimpact posture within 200 ms of impact. The simulated joint angular trajectory in Figure 9B can be compared to the experimental data from Lacquaniti and Soechting (1986), who illustrated the recovery procedure following a transient forearm disturbance. The experimental data from that study also indicate that the elbow joint is lowered by less than 10 degrees after a transient 5-N impact. Similarly, the preimpact posture is recovered within 200 ms of impact. A comparison with these experimental data indicates that the SC simulated the robustness of control during the recovery of forearm posture by optimally absorbing the transient 5-N impact. These results illustrate that the SC is able to control the dynamic changes that result from transient disturbances and to operate optimally in a changing environment subjected to novel transient disturbances that do not involve learning.

3.2.2  Controlling for Sustained Loading

According to the SC tactic acquired during the learning process illustrated in Figure 8, the agent was able to estimate of the ICP portion and of the GCP portion using in the dynamic condition resulting from sustained 1 kg loading. Figure 10A shows the optimal trajectories of and under this loading condition. The first and second steps resulting from gravity in this trajectory were the same as those shown in Figure 9A. After the second step, the 1 kg load was placed on the hand at 0.33 seconds. In order to maintain posture under this loading condition, was drastically increased from 0.463769 to 2.6066, and was drastically decreased from 0.536231 to -1.6066 until 0.39 seconds. Both and then gradually achieved a steady state with decreasing fluctuations under the guidance of the V-value of the via states until 0.5 seconds. After 0.5 seconds, and were about 0.58 and 0.41, respectively, while their fluctuations steadily decreased. However, in contrast to the transient impact described in section 3.2.1, these two SC proportions in the postloading were controlled differently than they were during the preloading, as they sustained the loading disturbance. Therefore, the agent kept at a higher level after the loading than it did before the loading to determine a new equilibrium posture in response to the sustained 1 kg load. In contrast to , remained lower in the postloading condition than in the preloading condition. Consequently, the relative positions of these two proportions were reversed after the loading.

Figure 10:

Comparison of SC and non-SC models in finding a new equilibrium posture during sustained 1 kg loading. (A) The trajectories of individual control policy (ICP) portion and group control policy (GCP) portion evaluated using the value function of state . (B) Comparison of the joint angular trajectories during the SC and non-SC simulations and the experiment. The black arrowhead signifies the initial time of the 1 kg loading.


As shown in Figure 10B, the model without SC could not determine a new equilibrium posture under the loading, and the forearm fell to the lower limit. In contrast, the model with SC controlled the forearm to absorb the initial shock of the 1 kg load, after which the forearm rose by about 0.5 degrees. The forearm was then maintained at the new posture in a stable manner, as this was the new equilibrium point. This point was about 10 degrees below the preloading point. As introduced in section 2.5, the responses of the three subjects to the 1 kg loading were measured as motion trajectories. These experimental results are also shown in Figure 10B, which indicates that the forearm was initially lowered while absorbing the initial shock of the 1 kg load and then rose by about 2 degrees. The forearm was then stably maintained at the new equilibrium position, which was within 10 degrees of the preloading point. The SC was able to simulate robust control of the forearm by optimally determining a new equilibrium posture under a novel 1 kg load condition.

Here we confirmed that the synergy strategy between and acted optimally in a dynamic environment involving a novel external loading condition. Using this SC, the forearm was robustly controlled to determine a new equilibrium posture in response to a novel sustained disturbance that did not involve learning.

4  Discussion

Humans learn how to control a motion through experience. However, these experiences are finite. Despite this limitation, we can robustly control motion in various new situations. To investigate this versatility in the robustness of control, we focused on the synergistic control of multiple muscles under dynamic external conditions. Based on this viewpoint, we used the concept of muscle synergy, which involves the regulation of the control of multiple muscles, to control motion. However, this general concept was not sufficient to analyze the robustness of motion control. To analyze the robustness of motion control based on the concept of muscle synergy, we proposed a new muscle synergy concept that incorporated muscle synergy–embodied feedback control for external dynamic circumstances. The proposed muscle synergy is defined as the SC that synergistically combines GCP and ICP to regulate the combinations of muscle units used to respond to external dynamic changes. To study the effectiveness of muscle control in resisting dynamic changes, we defined GCP as a policy that controls group units consisting of muscles that are governed using a common intensity. To assist the GCP in precisely controlling a motion, we used ICP, which was defined as the control policy for the contribution of an identified function of each muscle to motion control. These two CPs are synergistically combined to find their best synergy for optimal feedback control against dynamically changing conditions.

The SC is neurophysiologically based on the mechanism used to regulate the control of multiple muscles in the corticospinal tract, where CM neurons in the M1 synergistically regulate the control of MNPs in the spinal cord, as shown in Figure 1. As shown in Figure 8, this SC is reinforced by the proposed computational reinforcement learning and control model, which we introduced in section 2.2. Through this learning process, the synergistic combination of GCP and ICP can robustly function as the optimal strategy for feedback control during novel conditions, as shown in section 3.2. These results indicate that the proposed SC optimally regulates the synergy of multiple muscles under dynamic external conditions and contributes to the robustness of motion control.

4.1  Neurophysiological Bases of GCP and ICP as Control Policies for Muscles

To neurophysiologically formulate the GCP as a control policy used to regulate the control of multiple muscles, we focused on experiments reported by Shinoda et al. (1979, 1981), which indicate that each CM neuron governs multiple muscles. Based on this experimental result, we defined the GCP as the control policy for a CM governing multiple muscles when the activities of the multiple muscles in a group are controlled using a common intensity. However, Bennett and Lemon (1996) showed that under this control structure of CM neurons, some single CM neurons exert postspike facilitations on single target muscles during the fractionation of muscle activity in the control of finger joints. These experimental results suggest that CM neurons can adapt their contributions to muscle control, allowing a single CM neuron to govern a single target muscle. Therefore, the GCP is not sufficient by itself to model the functional variety of CM neurons in the corticospinal tract. To complement the GCP in the control strategy used for multiple muscles, we also proposed an ICP, which consists of the minimum activity of a CM neuron based on the results of the study by Bennett and Lemon (1996). This study showed that individual neurons control the activity of a muscle with a corresponding control signal. The ICP can be defined as the control policy of an individual CM for a muscle when the muscle is controlled by the strongest signal among all of the signals from the CM axons. The relationship between these two control policies consists of the combination of integral and individual signals, which results in the strongest synergy combination for controlling motion in response to different disturbances, as GCP and ICP complement each other. Accordingly, we hypothesized that these two control policies are the most basic control policies in the regulation of the redundant control of multiple muscles and that the GCP and ICP are optimally combined to result in the desired motion control during environmental variations and external dynamics.

4.2  Comparison to Previous Robust RL Models

Although the conventional actor-critic model is applicable as a feedback-learning controller (Kambara, Kim, Shin, Sato, & Koike, 2009), this learning model is not adaptable to simulating the learning of control robustness under novel dynamic conditions. To overcome this limitation, Morimoto and Doya (2005) proposed the use of the robust reinforcement learning (RRL) model, wherein the disturbance controller is designed to control motion during external disturbances. Although the use of RRL is suitable for the control of machines such as robots, the design of the disturbance controller is unsuitable for simulating motion control mechanisms of the CNS in response to disturbances. The RL model proposed here is comparatively free of this RRL limitation because the proposed model was based on the neurophysiological synergy mechanisms between CM neurons in the corticospinal tract. Based on this design, the proposed RL model simulated the robustness of control using a strategy to reinforce the CNS-based synergy of the two control policies.

4.3  Progress in Learning When Using the Synergy Strategy for GCP and ICP

The learning progress simulated using the proposed model was similar to the learning control procedure that babies use. Although a baby has enough power to control his or her arm, he or she is unable to control the arm well. This indicates that muscle activity is not effectively controlled during infancy. As demonstrated by equation 2.7a, the initial settings of the GCP and ICP in the learning process were designed to mimic motion learning procedures during infancy. To mimic this procedure, the GCP and ICP were initially set to 1.0 and 0.0, respectively. Using the control tactic of this GCP-inclined setting, we mimicked the undeveloped motion control strategy of a baby in such a way that it failed to precisely control the motion. As shown in Figure 11, the synergy between of GCP and of ICP was trained to control the arm to reach the goal via the learning progress described in Figure 8. The two control policies were then counterbalanced at about 0.5 after 400 trials. Consequently, we can hypothesize that a baby learns to control a motion well with this learning process and progresses toward counterbalanced use of the two policies when using the proposed synergy strategy.

Figure 11:

Progress of the GCP portion and the ICP portion in the reinforcement learning of the muscle synergy strategy.


4.4  Regulation of the Control Redundancy of Multiple Muscles in a Joint When Analyzing the Robustness of Synergy Strategy–Driven Motion Control

The redundancy in controlling muscles is reflected in the control of the multiple muscles in a joint. Although the motion of a joint can be measured extrinsically with one degree of freedom using measures such as torque and velocity, it is intrinsically driven by the control of multiple muscles. This leads to control redundancy, because a single output is produced from multiple input control signals. The redundancy in the control of multiple muscles in a joint therefore needs to be regulated to control a motion robustly, and its regulation is a basic objective in validating the robustness of SC-driven motion control.

Appendix A:  Continuous Temporal Difference

Conventional models such as supervised learning and feedback control require the distance between the goal and the current position as the teacher signal for learning the control function. In contrast, the control tactic of reinforcement learning is updated with the reward $r(t)$, which evaluates the result of controlling with the current policy in the state $x(t)$ at time $t$. As shown in equation A.1, the reward is accumulated with the discount rate $\gamma$ (where $0 \le \gamma < 1$) in the direction of the future. According to this process, the total value in state $x(t)$ can be determined with

$$V(x(t)) = \sum_{k=0}^{\infty} \gamma^{k} r(t+k), \tag{A.1}$$

where $V(x(t))$ is the value function for the control tactic in state $x(t)$.

To evaluate the control tactic in continuous dynamic states, the value function in state $x(t)$ is designed with the continuous model in

$$V(x(t)) = \int_{t}^{\infty} e^{-(s-t)/\tau}\, r(x(s))\, ds, \tag{A.2}$$

which is based on equation A.1. The exponential term plays the role of the discount rate $\gamma$ in equation A.1 (Doya, 2000), and the symbol $\tau$ is the time constant for discounting future rewards:

$$\dot{V}(x(t)) = \frac{1}{\tau} V(x(t)) - r(t), \tag{A.3a}$$

$$\delta(t) = r(t) - \frac{1}{\tau} V(x(t)) + \dot{V}(x(t)). \tag{A.3b}$$

Equation A.3a is obtained by differentiating equation A.2. When the condition of equation A.3a holds, the value function completely evaluates the control tactic; this condition is the ideal objective of learning. If the condition of equation A.3a is not satisfied in state $x(t)$, the temporal difference (TD) signal $\delta(t)$, which indicates the error in the evaluation of the control tactic, is generated as in equation A.3b. Learning therefore proceeds toward the goal by decreasing the TD signal, which thus functions as the key signal for updating both the value function and the control tactic.
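As a concrete illustration of equations A.2 to A.3b, the following sketch (an assumed discretization, not part of the original model) computes the continuous TD signal from sampled rewards and value estimates, approximating the time derivative of the value function with a finite difference; the step size dt and the time constant tau are assumed parameters.

    def continuous_td_error(r_t, v_t, v_next, tau, dt):
        """TD signal of equation A.3b with a finite-difference estimate of dV/dt.

        r_t    : reward r(t)
        v_t    : value estimate V(x(t))
        v_next : value estimate V(x(t + dt))
        tau    : time constant for discounting future rewards
        dt     : integration step
        """
        v_dot = (v_next - v_t) / dt          # finite-difference approximation of dV/dt
        return r_t - v_t / tau + v_dot       # delta(t) = r(t) - V/tau + dV/dt

    # Example: a state that satisfies equation A.3a yields a zero TD signal.
    tau, dt = 1.0, 0.01
    v_t = 2.0
    r_t = v_t / tau                          # reward consistent with A.3a when dV/dt = 0
    print(continuous_td_error(r_t, v_t, v_t, tau, dt))   # -> 0.0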

Acknowledgments

We thank Takahiko Sugiyama and Chikara Nagai for their assistance with the FE modeling while they were at Toyota Central R&D Labs. We also thank Yasuharu Koike for permitting us to use the measuring device in his laboratory and Duk Shin and Jongho Lee for their experimental assistance.

References

Ait-Haddou, R., Binding, P., & Herzog, W. (2000). Theoretical considerations on cocontraction of sets of agonistic and antagonistic muscles. Journal of Biomechanics, 33, 1105–1111.
Amis, A. A., Dowson, D., & Wright, V. (1979). Muscle strengths and musculoskeletal geometry of the upper limb. Engineering in Medicine, 8(1), 41–48.
Andersen, P., Hagan, P. J., Phillips, C. G., & Powell, T. P. (1975). Mapping by microstimulation of overlapping projections from area 4 to motor units of the baboon's hand. Proc. R. Soc. London, Ser. B, 188, 31–60.
Barroso, F. O., Torricelli, D., Moreno, J. C., Taylor, J., Gomez-Soriano, J., Bravo-Esteban, E., … Pons, J. L. (2014). Shared muscle synergies in human walking and cycling. Journal of Neurophysiology, 112, 1984–1998.
Barto, A. G. (1995). Adaptive critics and the basal ganglia. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 215–232). Cambridge, MA: MIT Press.
Bennett, K. M., & Lemon, R. N. (1996). Corticomotoneuronal contribution to the fractionation of muscle activity during precision grip in the monkey. Journal of Neurophysiology, 75(5), 1826–1842.
Cavallaro, E. E., Rosen, J., Perry, J. C., & Burns, S. (2006). Real-time myoprocessors for a neural controlled powered exoskeleton arm. IEEE Trans. Biomed. Eng., 53, 2387–2396.
Crowninshield, R. D. (1978). Use of optimization techniques to predict muscle forces. Transactions of the ASME, 100, 88–92.
d'Avella, A., Philippe, S., & Emilio, B. (2003). Combinations of muscle synergies in the construction of a natural motor behavior. Nature Neuroscience, 6(3), 300–308.
Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12, 219–245.
Doya, K. (2007). Reinforcement learning: Computational theory and biological mechanisms. HFSP Journal, 1, 30–40.
Doya, K. (2008). Modulators of decision making. Nature Neuroscience, 11, 410–416.
Hada, M., Yamada, D., & Tsuji, T. (2007). An analysis of equivalent impedance characteristics by modeling the human musculoskeletal structure as a multibody system. In Proceedings of the ECCOMAS Thematic Conference (BM-1, pp. 1–20). ECCOMAS.
Haug, E., Tramecon, A., Allain, J. C., & Choi, H. Y. (2001). Modelling of ergonomics and muscular comfort. KSME International Journal, 15(7), 982–994.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Hikosaka, O., Takikawa, Y., & Kawagoe, R. (2000). Role of the basal ganglia in the control of purposive saccadic eye movements. Physiol. Rev., 80, 954–978.
Hill, A. V. (1938). The heat of shortening and the dynamic constants of muscle. Proc. Roy. Soc. (London), 126B, 136–195.
Hogan, N. (1984). Adaptive control of mechanical impedance by coactivation of antagonist muscles. IEEE Transactions on Automatic Control, 8, 681–690.
Houk, J. C., Adams, J. L., & Barto, A. G. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 249–270). Cambridge, MA: MIT Press.
Ito, M., & Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21, 368–373.
Kakei, S., Hoffman, D. S., & Strick, P. L. (1999). Muscle and movement representations in the primary motor cortex. Science, 285, 2136–2139.
Kambara, H., Kim, K., Shin, D., Sato, M., & Koike, Y. (2009). Learning and generation of goal-directed arm reaching from scratch. Neural Networks, 22(4), 348–361.
Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science (4th ed.). New York: McGraw-Hill.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
Lacquaniti, T., & Soechting, J. F. (1986). EMG responses to load perturbations of the upper limb: Effect of dynamic coupling between shoulder and elbow motion. Experimental Brain Research, 61, 482–496.
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788–791.
Liu, D., & Todorov, E. (2007). Evidence for the flexible sensorimotor strategies predicted by optimal feedback control. Journal of Neuroscience, 27(35), 9354–9368.
Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281–294.
Morimoto, J., & Doya, K. (2005). Robust reinforcement learning. Neural Computation, 17, 335–359.
Murray, W. M., Delp, S. L., & Buchanan, T. S. (1995). Variation of muscle moment arms with elbow and forearm position. Journal of Biomechanics, 28(5), 513–525.
Nambu, A., Tokuno, H., & Takada, M. (2002). Functional significance of the cortico-subthalamo-pallidal "hyperdirect" pathway. Neurosci. Res., 43(2), 111–117.
Nemirovsky, N., & Rooij, L. V. (2010). A new methodology for biofidelic head-neck postural control. In Proceedings of the International Conference on the Biomechanics of Injury (pp. 71–84). Zurich: International Research Council on Biomechanics of Injury.
Neumann, D. A. (2002). Kinesiology of the musculoskeletal system: Foundations for physical rehabilitation. Maryland Heights, MO: Mosby.
Petkos, G., & Vijayakunar, S. (2007). Load estimation and control using learned dynamics models. In Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1527–1532). Piscataway, NJ: IEEE.
Rooij, L. V. (2011). Effect of various pre-crash braking strategies on simulated human kinematic response with varying levels of driver attention. In Proceedings of the Conference on the Enhanced Safety of Vehicles. Red Hook, NY: Curran.
Rosen, J., Fuchs, M. B., & Arcan, M. (1999). Performances of Hill-type and neural network muscle models—toward a myosignal-based exoskeleton. Comput. Biomed. Res., 32, 415–439.
Safavynia, S. A., & Ting, L. H. (2012). Task-level feedback can explain temporal recruitment of spatially fixed muscle synergies throughout postural perturbations. Journal of Neurophysiology, 107, 159–177.
Shinnamon, H. M. (1993). Preoptic and hypothalamic neurons and initiation of locomotion in the anesthetized rat. Prog. Neurobiol., 41, 323–344.
Shinoda, Y., Yokota, J., & Futami, T. (1981). Divergent projection of individual corticospinal axons to motoneurons of multiple muscles in the monkey. Neuroscience Letters, 23, 7–12.
Shinoda, Y., Zarzecki, P., & Asanuma, H. (1979). Spinal branching of pyramidal tract neurons in the monkey. Experimental Brain Research, 34, 59–72.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, MA: MIT Press.
Thelen, D. G. (2003). Adjustment of muscle mechanics model parameters to simulate dynamic contractions in older adults. Transactions of the ASME, 125, 70–77.
Thelen, D. G., Anderson, F. C., & Delp, S. L. (2003). Generating dynamic simulations of movement using computed muscle control. Journal of Biomechanics, 36, 321–328.
Ting, L. H., & Macpherson, J. M. (2012). A limited set of muscle synergies for force control during a postural task. Journal of Neurophysiology, 93, 609–613.
Todorov, E., & Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5, 1226–1235.
Torres-Oviedo, G., Macpherson, J. M., & Ting, L. H. (2006). Muscle synergy organization is robust across a variety of postural perturbations. Journal of Neurophysiology, 96, 1530–1546.
Tresch, M. C., Philippe, S., & Emilio, B. (1999). The construction of movement by the spinal cord. Nature Neuroscience, 2(2), 162–167.
Winter, J. M., & Woo, S. L.-Y. (1990). Multiple muscle systems: Biomechanics and movement organization. Berlin: Springer.
Wolpert, D. M., Ghahramani, Z., & Jordan, M. I. (1995). An internal model for sensorimotor integration. Science, 269, 1880–1882.
Yamaguchi, G. T., & Zajac, F. E. (1990). Restoring unassisted natural gait to paraplegics via functional neuromuscular stimulation: A computer simulation study. IEEE Trans. on Biomedical Engineering, 37(9), 886–902.
Zajac, F. E., Topp, F. L., & Stevenson, P. J. (1985). A dimensionless musculotendon model. In Proceedings of the IEEE/Eighth Annual Conference of the Engineering in Medicine and Biology Society (pp. 601–604). Piscataway, NJ: IEEE.