Anthropomimetic robotics differs from conventional approaches by capitalizing on the replication of the inner structures of the human body, such as muscles, tendons, bones, and joints. Here we present our results of more than three years of research in constructing, simulating, and, most importantly, controlling anthropomimetic robots. We manufactured four physical torsos, each more complex than its predecessor, and developed the tools required to simulate their behavior. Furthermore, six different control approaches, inspired by classical control theory, machine learning, and neuroscience, were developed and evaluated via these simulations or in small-scale setups. While the obtained results are encouraging, we are aware that we have barely exploited the potential of the anthropomimetic design so far. But, with the tools developed, we are confident that this novel approach will contribute to our understanding of morphological computation and human motor control in the future.
In the last 20 years it has become clear that a full understanding of what it is to be human must be rooted in the interplay between two products of coevolution: our human embodiment, and our human brain . One of the most important manifestations of this interplay is our ability to move our body: We do so with accuracy and effectiveness, both in navigating through the world, and in interacting with objects. However, in order to understand how the body is controlled by the brain, we would need to understand both the nature of the body as a physical and mechanical system, and the nature of the brain as a controller . We believe that the current state of knowledge of the brain as a controller is progressing rapidly, but does not yet provide an adequate foundation for such a quest (see, e.g.,  and ). In the work described here, we have therefore mainly focused on the nature of the human body, and how that nature influences and facilitates the ways in which its movements can, and perhaps must, be controlled . Out of necessity, we have employed the synthetic methodology “understanding by building” ; as well as providing insights into the essential character of the human body, this also opens a route in the long term to the construction of artifacts that will be capable of moving and acting in ways that are truly and distinctively human.
The study of artificial systems has already significantly contributed to our understanding of human motor control. For instance, industrial manipulators have been used to study the problems associated with the planning and execution of human limb trajectories . However, robotic manipulators are very different from humans. First, the use of high-precision torque actuators located within joints is an oversimplification of the human muscular system, where even single joint movements require the phased excitation and relaxation of multiple muscles. Second, robotic manipulators are traditionally composed of rigid links attached to single-degree-of-freedom revolute or prismatic joints , a characteristic that facilitates control and improves manipulation performance. However, the human body is neither stiff nor made exclusively of revolute joints. These missing properties pose severe limitations on the aspects of motor control that can be investigated by using industrial robots and on the conclusions that can be drawn from the experimental results. Therefore, more detailed replicas of the human motor apparatus are a requisite for the effective investigation of human motor control issues in artificial systems—in accordance with the concept of embodiment .
Some progress toward this goal has been made by the many humanoid robots that have been developed around the world, such as Asimo  or iCub , which can deliver impressive performance on the tasks that they are designed for (e.g., ). So far, however, they have failed to generate a significant stream of new findings in the area of human motor control. We believe that this is due to the fact that these anthropomorphic robots focus on replicating the appearance of humans, while they still rely on traditional methods of sensing, actuation, and control. It is therefore our strong conviction that the understanding of human motor control can only advance by a radically different approach, namely anthropomimetic design.
Anthropomimetic designs differ from conventional humanoid robots in not only imitating the morphological appearance of humans, but capitalizing on the replication of the inner structures of the human body such as equivalents of compliant muscles, tendons, bones, and joints. In principle, this approach is not entirely new, as demonstrated by the Japanese musculoskeletal robots Kenta, Kotaro, and Kojiro—all developed by the University of Tokyo [27, 39, 40]. These legged robots, which are ≈140 cm in size, are highly complex structures that provide many features that are relevant for the investigation of human motor control strategies and for aspects of morphological computation such as compliant, tendon-driven actuators and complex joint types. However, due to this complexity, significant results on motor control have yet to appear from research on these robots.
Therefore, the first problem we had to solve was “How could the complex musculoskeletal structure of the human body be replicated and manufactured?” Ideally, we would have aimed at a full-body walking robot, similar to Kojro, Kotaro, and Kenta. But we tempered ambition with realism, and confined ourselves to anthropomimetic torsos. In addition, we chose an iterative building approach to evaluate different design choices and to gradually increase the complexity of the musculoskeletal structure. The second question that immediately followed was “Is it possible to control such a nonstandard compliant multi-degree-of-freedom structure, and if so, by what means?” This question is a topic of ongoing discussion in biology and neuroscience, and two opposing theories have emerged: (i) the equilibrium point  and (ii) the internal model hypothesis . However, limiting ourselves to these theories would be too restrictive, and so multiple approaches, inspired by both classical control theory and neuroscience, were implemented and tested on the anthropomimetic robot platforms. Furthermore, to support the efficient evaluation of these approaches, a physics-based simulation toolbox was developed that is capable of simulating the complex dynamics of this class of tendon-driven robots.
This article summarizes the results of more than three years of robot and simulation model development as well as of research into the control issues of anthropomimetic robots. The problems associated with robot construction and manufacturing as well as the development of a human-inspired control architecture that can cope with the large number of actuators and sensors in these systems, in both hardware and software, are summarized in Section 2 and Section 3, respectively. Section 4 describes the physics-based simulation models that were used for the offline testing of the motor control approaches presented in Section 5. Finally, conclusions and prospects for future work are presented in Section 6.
2 The Robots
To date, we have developed and manufactured four anthropomimetic torsos of increasing complexity. Ecce-1, which is essentially a refurbished, sensorized Cronos (originally manufactured in 2006 ), features 44 tendon-driven actuators and two different types of shoulder designs (see Figure 1a). Its successor, Ecce-2, completed in 2010, introduced a more complex neck and head structure (see Figure 1b), whereas Ecce-3, finished in 2011, served as a test bed for a detailed replica of the spine (see Figure 1c). All improvements from these iterations are combined in the latest prototype, the Ecce-4, which is the robot described in the following sections. It is mounted on a mobile base and features the most detailed replicas of the human spine and shoulder girdle (see Figure 1d). In addition to these four torsos, two small-scale test rigs were designed and manufactured to support the investigation of specific control aspects in a less complex setup (cf., Figures 5a and 9c, discussed later in this article).
The adult human skeleton consists of ≈206 bones, and replicating this complex skeletal structure by conventional engineering techniques would be extremely challenging and perhaps impossible. Our solution was to employ a hand-crafted manufacturing process and to use a new type of engineering thermoplastic for the fabrication of the bones. This thermoplastic, which is known in the UK as Polymorph and in the U.S. as Friendly Plastic, is technically a caprolactone polymer. While it is like polythene in many ways, it fuses when heated to only 60°C (e.g., by plunging it into hot water) and can be freely hand-molded until it restiffens at ≈30°C. It has a distinctly bonelike appearance when cold, and since it is a true thermoplastic, it can be reheated and remolded as many times as necessary. In practical engineering terms, Polymorph is tough and springy. Its tensile strength is good—Polymorph has the highest tensile strength of all the caprolactones, at 580 kg/cm2. It can be further strengthened and stiffened, if necessary, by adding other materials, such as wires, metal rods, or bars.
One anatomical structure that we put particular emphasis on is the shoulder girdle, which extends the range of motions possible at the shoulder joint. In humans, the abduction of the humerus (the upper arm bone) away from the body is restricted to just 90° by contact with the arch of bone that provides the attachment for the deltoid, and all movements beyond this are achieved by a combination of the shoulder joint and the articulation of the scapula on the back of the thorax. The movement of the scapula itself is only restricted by the joint attaching the clavicle to the sternum, and so the scapula has great freedom of movement (e.g., upward as in shrugging the shoulders, or rolling as in raising the arm). The combined actions of the scapula and upper arm serve to keep the shallow cup of the shoulder joint beneath the ball of the humerus and thus stabilize the joint. More complex dynamic interactions also occur, as in the lifting of a heavy weight, where the powerful muscles that roll the scapula are employed to maintain the momentum of the weight once maximal force has been achieved in the shoulder muscles themselves. This complex shoulder girdle structure, with its six degrees of freedom, has been replicated for the Ecce-4.
Human skeleton muscles are endowed with mechanical properties, such as an intrinsic compliance, that contribute significantly to the grace, speed, efficiency, and robustness of human movement dynamics . Today, several technologies exist to mimic these properties for artificial muscles, most of them based on electromagnetic, pneumatic, or hydraulic actuators. However, even though each of these actuation technologies outperforms the human muscle in some aspects, no actuator exists that matches the overall performance of human muscles . For example, pneumatic actuators, such as the McKibben muscle, provide a high force-to-mass ratio and are inherently compliant, but require large compressors, unsuitable for mobile robots.
For our robots, we used electromagnetic, tendon-driven actuators, usually incorporating a series elastic element in the tendons. Each actuator consists of a DC motor equipped with a gearbox as the active element (corresponding to the muscle) and a length of kite line in series with an optional elastic element that functions as the tendon (see Figure 2). By coiling the kite line around the gearbox shaft, the artificial muscle can be either activated or relaxed, depending on the direction of rotation of the shaft. The motors and elastic elements were selected to match the required power and compliance properties of each individual muscle. Moreover, the attachment points of the actuators were chosen to reflect humanlike movement dynamics.
In humans, two types of mechanoreceptors provide feedback for proprioception—the sense of static positions and movements of the limbs of the body . The first type, the muscle mechanoreceptors, provide feedback for muscle properties, such as muscle length, speed, or stretch. These were imitated by equipping each tendon-driven actuator with: (i) a force sensor mounted to the distal attachment point of the muscle for stretch sensing, (ii) a current sensor within the electronic control units (ECUs) for motor control, (iii) an incremental encoder to provide a measure for the length of the kite line, and (iv) an absolute potentiometer that was used to avoid a calibration routine for the incremental encoders (see Figure 2).
The second type of receptors, the skeletal mechanoreceptors, provide joint angle feedback. Though these sensors are easily imitated for the revolute joints of industrial manipulators, they are difficult to replicate for the complex joints of anthropomimetic robot systems . Therefore, these sensors were omitted, but an approximation of the body pose can be backcomputed by fusing the sensory information from the various actuator sensors.
3 Control and Software Architecture
To cope with the large number of actuators and sensors in the torsos, a novel control and software architecture was developed. Here, inspiration was again drawn from biology, where the neuronal circuits of the spinal cord, brain stem, and forebrain form a hierarchical control architecture . This hierarchy was emulated using distributed ECUs for the low-level actuator feedback loops (see Section 5.1); these are controlled by a central computer via a controller area network (CAN see Figure 3 and ). Each ECU is capable of controlling two tendon-driven actuators and is equipped with a microcontroller, a CAN; interface, motor drivers for two brushed DC motors, several A/D converters for analog sensor connection, and two integrated Hall-effect-based measurement devices in the motor loop for motor current feedback. To achieve a high degree of robustness, each ECU was implemented as a fail-silent unit  with a firmware based on a finite state machine (FSM).
This control architecture was complemented by a comprehensive and modular software framework, termed the Eccerobot Research Platform (ERP; see Figure 4). The ERP was implemented in C++ and built on the robotic middleware package OpenRTM, which provides state-driven robot-technology components (RTCs) and uses the common object request broker architecture (CORBA) for inter-RTC communication . To form the core of the framework, multiple RTCs have been implemented to provide a plant abstraction layer for the robot-independent execution of actuator commands and sensor data acquisition. This component kernel is dynamically configurable through Extensible Markup Language (XML) files to support the various prototypes with different sensor and actuator configurations. Furthermore, high-level components based on Qt are built on this kernel to provide tools for real-time data analysis, robot debugging, or robot control.
In addition, a physics-based simulator was developed and integrated into the ERP for the simulation of the dynamics of tendon-driven robots. This simulator, called Caliper , is capable of importing and exporting simulation models using the open-standard XML schema Collaborative Design Activity (Collada) . It relies on the physics-based simulation engine Bullet Physics , originally developed for computer games, to simulate the dynamics of the robots (see Section 4).
4 Physics-Based Modeling
Simulating anthropomimetic robots using analytical modeling techniques is not feasible, due to the sheer complexity of the musculoskeletal structure that features spherical joints, muscles with pose-dependent lever arms, muscle-skeleton collisions, and even multi-articular muscles (muscles spanning more than one joint). Therefore, we used Bullet Physics  for the simulation of the skeleton (see Section 4.1) and implemented custom extensions to model the tendon-driven, compliant actuators (see Section 4.2). Bullet Physics, which is particularly optimized for performance, also offers the advantage of simulating the robot models in real time—a property that also enables entirely new prospective uses for the simulation, for example, as an internal model for prediction and control. Moreover, basic sensor models were implemented to provide the sensor modalities for the evaluation of the control approaches on a virtual robot, similar to the physical setups (see Section 4.3). Finally, an automated, steady-state pose calibration algorithm was developed, based on a (μ, λ) evolution strategy, to reduce the simulation-reality gap of the models (see Section 4.4).
4.1 Skeleton Modeling
Due to the hand-crafted manufacturing of the torsos, the impossibility of dismantling and reassembling individual components, and the high intrinsic complexity of the skeletons, the derivation of accurate skeleton models is an inherently difficult task. Hence, we tackled this problem by a three-step approach (see Figure 5a and ). In the first step we used true-to-scale photographs to define a preliminary 3D approximation of the skeleton structure. This step, however, suffered from two major drawbacks: (i) attachments, such as DC motors and potentiometers, were difficult to accurately position and orient in 3D, and (ii) there were size inaccuracies due to parallax error. In the second step, we employed laser scans of the robots to refine the 3D model of the first step, and a satisfactory approximation of the hand-crafted skeletons was eventually obtained by merging the rough model of the first step with the scan data.
In the last step, the initial static 3D model of the skeleton was converted into a dynamic model for the Caliper simulation framework (see Section 3). First, collision shapes were assigned to each bone. To ensure good simulation performance, complex meshed shapes were avoided, and shape primitives, such as cylinders, were used whenever possible. Second, masses were estimated based on the volume of the bone and the density of Polymorph, and the corresponding inertia tensors were computed from the assigned shapes and masses. Finally, holonomic constraints with custom viscous and Coulomb friction extensions were defined and integrated into the models to simulate the joints of the robot.
4.2 Actuator Modeling
For the tendon length LT, we evaluated two simulation models: (i) a virtual tendon, where LT is equal to the Euclidean distance between and , and (ii) a physical tendon simulated using soft-body dynamics, where LT is the sum of the Euclidean distances between all soft bodies. The virtual tendon implementation suffered from two main problems. First, no collisions between the skeleton and the tendon were considered, resulting in erroneous force vectors for some configurations. Second, stiff actuators—actuators that are not equipped with the optional elastic element—can only be approximated using high spring constants. The physical tendon implementation can cope with these problems, but introduces new ones due to the high computational load and occasional simulation instabilities resulting from the collisions between the tendon and the skeleton. We therefore used the virtual tendon implementation and extended this model by using spherical pulleys to compensate for the lack of tendon-skeleton collisions. These pulleys approximate the effects of the tendon when it collides with parts of the skeleton, by introducing spherical wrapping surfaces for the tendons.
4.3 Sensor Modeling
As listed in Section 2.3, each actuator is equipped with four types of sensors for proprioception. These sensors were modeled for the Caliper simulator to provide the required sensory feedback for the development and testing of controllers (see Figure 6a and ).
4.4 Model Calibration
Even though the derived models of the individual subcomponents of the robot, such as motors or springs, were identified and validated against their real counterparts, preliminary simulations of the full robot models exhibited a significant simulation-reality gap. We hypothesized that this is not due to the physics-based simulation approach per se, but rather due to: (i) the hand-crafted nature of the robot, which complicated the derivation of an accurate skeleton model, and (ii) the modeling simplifications that were necessary to simulate the complex dynamics of the tendon-driven muscles (as outlined in the previous sections). To evaluate this conjecture, we developed an automated, steady-state pose calibration algorithm based on a (μ, λ) evolution strategy (ES) . For the acquisition of the poses of the physical robot, a stereo-vision, infrared-marker-based motion capture system with real-time capabilities was developed, and a Gaussian-based, non-isotropic, self-adapting mutation operator was used by the ES to explore the search space .
For the calibration, we first recorded synchronized sensor data of various steady-state poses of the robot (kite line length of all muscles and corresponding joint angles) and evaluated the individuals of the ES by setting the recorded kite line length as the control reference for each muscle (see Section 5.1) and using the resulting joint angle deviations as the fitness measure. After the calibration, we assessed the quality of the calibration by analyzing the joint errors of the calibrated model for a second, disjoint set of validation poses created by linear interpolation of the kite line length reference values for two neighboring calibration poses.
The results of the calibration and validation of the anthropomimetic robot arm shown in Figure 5a with 11 muscles, a spherical shoulder, and a revolute elbow joint are presented in Figure 5b and Figure 5c, respectively. For these results, the following model parameters were selected as object parameters subject to the mutation operator of the ES: (i) the muscle attachment points and (both ∈ ), (ii) the initial kite line length LK0, (iii) the spring constant k of the linear spring-damper model, and (iv) the gearbox shaft spindle radius r, which is considered to be constant in the simulation but varies stochastically in reality due to the coiled kite line. These nine parameters were initialized for each actuator, based on laser scan data and manual measurements, resulting in a total of 99 object parameters. The parent and offspring population sizes were set to μ = 6 and λ = 36, respectively, to achieve a ratio of 1/6, which has been shown to provide a maximum rate of convergence . At generation 0, the average error between the physical and the simulated robot for all 15 calibration poses was equal to 32.9° for the elbow and 35.7° for the shoulder joint. However, the ES converged rapidly, and final average errors of 1.9° and 2.5° for the elbow and shoulder joints were achieved for the calibration poses after the full run of 250 generations. Similar deviations were observed for the best individual of the calibration phase and the disjoint set of validation poses. Here, final average errors of 1.9° and 2.0° were achieved for the elbow and shoulder joints, respectively.
5 Motor Control
In this section, we describe approaches to the control of anthropomimetic robots. First, a set of controllers for single tendon-driven actuators are presented in Section 5.1, which are used by the higher-level controllers. Two different approaches to joint space control are described in Sections 5.2 and 5.3. While the former relies exclusively on concepts from classical control theory, the latter introduces machine learning techniques to capture the effects of multi-articular and colliding muscles. The remaining sections deal with different approaches to operational space control, namely the positioning of the end-effector. Section 5.4 employs a developmental framework to learn reaching from sensorimotor interactions. Section 5.5 presents preliminary results on the application of reinforcement learning techniques to this problem in a full-fledged simulation of Ecce-2. Finally, Section 5.6 provides an integrated feedforward-feedback approach based on a black box model.
5.1 Actuator Control
For the control of a single tendon-driven actuator, four control modes were implemented on the ECU microcontrollers and emulated for the physics-based simulation: (i) a voltage control mode, (ii) a proportional (P) kite line length control mode that relies on feedback from the motor encoders (see Figure 6a), (iii) a proportional-integral-derivative (PID) current control mode that exploits the Hall-effect-based current sensors integrated in the ECUs, and (iv) a tendon force controller based on the force sensors integrated into the distal tendon attachment points (see Figure 6b). All these controllers can be parametrized and modulated at run time through higher control layers—reminiscent of the spinal cord and brain-stem circuits in humans . For the control cycles, a fixed frequency of 500 Hz was used. Even though this frequency is high compared to those in biological systems, it is low by engineering standards. Hence, particular emphasis had to be put on the control laws to achieve stable and responsive control, especially for the tendon force control mode. Here, a state space controller was implemented that proved to be superior to conventional proportional-derivative (PD) approaches . It can be designed directly in the discrete space domain, taking the control frequency into account at design time, utilizing the state space actuator model (see Section 4.2), by moving the closed loop poles to the desired values (Ackermann's formula ). Furthermore, to compensate for the steady state offset of the state space controller due to the missing integrator element, a prefilter was incorporated.
5.2 Puller-Follower Control
In this section we investigate the application of classical control methods to the control of hinge joints, each actuated by a pair of compliant agonist-antagonist actuators. The advantages of humanlike joint structures have been identified in a number of investigations [41, 57, 58], but definite answers to the control problem are yet to be found. Very few results have been presented on attempts at control of the antagonistically coupled compliant drives [28, 37]. In this section we focus on energy efficiency and suggest a new control approach inspired by patterns of electromyography signals measured in antagonistically coupled human muscles .
The suggested puller-follower approach is inspired by the asymmetric conditioning of human muscles; the agonist actuator, the puller, takes the responsibility for the joint position as the main control requirement, whereas the antagonistic actuator, the follower, keeps its tendon stretched (thus preventing slackening) by maintaining the tension at some appropriate low level. This prescribed value of the antagonistic tendon force is called the reference tension force. For this system to be energy efficient, the reference tension force has to be low, but sufficiently high to prevent the slacking of the tendons. Motions with variable speeds require the actuators to exchange roles (puller becomes follower and vice versa). This switching is followed by oscillations in tendon forces (switching shock), which can cause slackening. It has been proved that there exists an optimal (relatively high) level of the follower tension that minimizes this shock . However, this high reference tension force would reduce the energy efficiency of the system. Thus, we suggest an adaptive reference tension, which is low during regular motion and is increased during the periods of switching (see Figure 7a). To control the joint position as well as the antagonist tension force, we use input-output feedback linearization  through a two-step compensator. Satisfactory results in controlling a single joint were achieved.
We have also evaluated the puller-follower concept on a multi-joint system with seven DOF, each actuated by nonlinear compliant antagonistically coupled drives . The results revealed problems due to the dynamic coupling between joints, the variable effective joint inertia (depending on the entire mechanism position), and the influence of gravity. To overcome these difficulties we applied the H-infinity loop shaping method—a combination of loop shaping and robust stabilization —and online gravity compensation based on a static model . After these modifications, satisfactory results were obtained (see Figure 7b). However, one should be aware of serious limitations of the puller-follower approach, such as the lack of support of spherical joints or multi-articular muscles.
5.3 Computed Force Control
Classical control theory provides a unified and flexible framework for the control of industrial robots. Computed force control now aims at exploiting this framework by extending the concept of computed torque control for tendon-driven actuators.
However, the analytical derivation of the muscle Jacobian for our anthropomimetic robots is not possible, because of the absence of CAD data, and because of the presence of tendon-skeleton collisions. Therefore, we decided to approximate it by (i) utilizing machine learning techniques to learn the geometric mapping between tendon lengths l and joint angles q, and (ii) differentiating this geometric mapping using the difference quotient . As samples can be drawn from the entire joint space, the function approximator need only support the interpolation between samples, without the need for extrapolation. Well suited for this kind of problem are artificial neural networks (ANNs) that can be trained by a supervised learning approach using the well-known backpropagation algorithm . The main advantage of this machine learning approach is that the muscle Jacobian can be learned directly on the physical robot (e.g., by supervised movement of the limbs and backcomputation of the tendon length) and that complex musculoskeletal interactions, such as multi-articular actuators, are automatically captured. The only limitation is that the geometric mapping of the actuator lengths has to be unique and continuously differentiable with respect to the joint angles.
This computed force control approach was implemented and tested for the simulation model of our shoulder prototype (see Figure 5a). Samples for training the ANN were obtained by (i) moving the simulated joints to different joint angles and (ii) performing random movements through motor babbling. In summary, a control cycle proceeds as follows (see Figure 8a): First, reference accelerations are computed from the reference joint angles qref, using a standard PID controller. Subsequently, Equation 6 is used to compute the required joint torques τ to reach . In the next step, the machine-learned muscle Jacobian is utilized to transform the joint torques τ into the corresponding actuator forces f. However, as Equation 8 is underdetermined if solved for f, a quadratic optimization method was used to resolve this redundancy . As objective function, we chose to use the sum of squares of the Euclidean distances between all actuator forces, which minimizes the sum of internal forces. However, different objective functions like minimum energy or minimum muscle activation could be defined, which would result in different system dynamics. This is one of the advantages of the computed force control approach, as it relies on explicitly formulated system models, which can be exploited in different ways to obtain a behavior that possibly matches human movements. The optimization routine can also be used to prevent tendons from going slack. This can easily be achieved by defining a minimum actuator tension as a constraint for the optimization routine. Finally, the “optimized” forces f are set as new references for the force controllers of the 11 actuators (see Section 5.1). An example of joint trajectory tracking with this approach is shown in Figure 8b. The achieved tracking of the reference angles is impressive, both for single-joint movements (see markers 1, 2, and 3) and for multi-joint movements (markers 4 and 5).
We are currently preparing the testing of this computed force control approach on a physical robot. However, as the approach requires joint angle and velocity feedback, we first have to solve the problem of measuring these sensor modalities for the spherical shoulder joint. For this, approaches based on extrinsic sensors, such as the motion capture system used in Section 4.4, or on accurate system models to backcompute the joint angles from the proprioceptive sensors of the actuators, render promising alternatives.
5.4 Sensorimotor Control
The sensorimotor approach to control aims to include principles from developmental biology in the design of control architectures for artificial systems. In this section we describe the sensorimotor framework developed for learning to reach in a reduced motor space (for details, see ). Our framework includes two learning processes: an unsupervised learning process (ULP) and a supervised learning process (SLP). The former exploits the passive dynamics of our compliant robots to build a basic set of motor synergies based on Hebbian learning principles; the latter uses the motor synergies to learn reaching in a reduced motor space (see also [51, 54]).
The schematic diagram of the sensorimotor framework with the ULP and SLP is shown in Figure 9a. To describe the sequence of events in each process, we will use the numbering system in Figure 9a. The ULP works as follows. In U1, spontaneous motor twitches (SMTs) produce independent contractions in individual actuators. This type of actuator activity has been inspired by observations of human behavior during sleep, from early stages of development to adulthood [7, 49]. In U2, the SMTs produce forces, which are propagated through the musculoskeletal system (as well as through the environment where it is embedded). In U3, the changes produced in the musculoskeletal system are captured by the different body sensors, which (in U4) convert them into sensor activity. In U5, the correlation between sensor and motor activity is used to learn sensorimotor mappings, which form the basic motor synergies. A variant of this scheme has also been used to self-organize spinal reflexes .
Once the motor synergies are obtained, they are used by the SLP. The SLP consists of an iterative process that aims to identify the appropriate goal modulation parameters required to achieve a given goal in an iterative way; it works as follows. In S1, a goal is set in the system; in our case this is a 2D reaching goal. In S2, the parameters of the goal signals are modulated (i.e., scaled). These are the only parameters being learned during the SLP. In S3, the modulated goal parameters activate the motor synergies, which combine the activations of the individual actuators. In S4, the activation of the different motors cause the musculoskeleton to move, which (in S5) induces sensory stimulation. In S6, the novel sensor values are evaluated with respect to the goal. And in S7, the modulation gains are changed and the process is iterated.
For these experiments, we used a real tailored pendulum platform. This platform consists of a pendulum actuated by four tendon-driven actuators as shown in Figure 9c. Each actuator is equipped with a force sensor placed in series with the elastic element, and a potentiometer placed on the gearbox shaft (similar to Figure 2); the two resulting measurements are used to estimate the actual tendon length (i.e., the distance between the motor and the attachment point of the muscle on the pendulum rod). Additionally, a camera is placed on the base of each pendulum to track the position of the endpoint of the rod (projected on the image plane of the camera). For these experiments, only the information on the x and y axes from the camera is used for sensors. A schematic representation of the pendula is shown in Figure 9c.
The results of the ULP unsupervised process are shown in Figure 9b. The diagram represents the connectivity between each sensor element and each motor element in the matrix Q. An intuitive way of interpreting Q is considering the filled squares as the muscles that will be recruited when the goal entails a positive sign in the respective dimension, and the empty squares as the muscles that will be recruited when the goal entails a negative sign in the respective dimension. For example, if the goal entails a positive projection on the x axis, then muscles M1 and M2 will be recruited; if it entails a negative projection, then muscles M3 and M4 will be recruited. In a way this is related to the muscle Jacobian mentioned in Section 5.1.
During the SLP, the calculation of the individual motor signals from the motor synergies is given by M = Q(α ⊗ S∗) (see caption of Figure 9). The results of the SLP are shown in Figure 9d. For each trial, the plot shows the average and standard deviation achieved for 25 different reaching targets. The reaching targets are equally spaced on a grid covering the workspace of the pendulum. For each trial, we allow a total of 20 iterations. As can be seen, in general, the error decreases with increasing number of trials. The small increases observed in some of the iterations (e.g., iteration 17) are mainly due to the fact that the resting position of the pendulum is slightly different at every iteration step. To investigate the scalability of our algorithm, we are currently building a scaled-up version of this model in simulation, which contains more muscles and more degrees of freedom.
5.5 Control via Muscle Synergies Emergent Under Reinforcement Learning
In this section we investigate the application of reinforcement learning (RL) techniques to the learning of control strategies in anthropomimetic robots. Evidence for aspects of reinforcement or reward-based learning abounds in motor cortex research, yet classical RL approaches (TD learning, etc.) suffer heavily from the curse of dimensionality when used as a generic solution where every actuator can take any input level at every time step. Although generic RL algorithms optimized for high-dimensional systems are now emerging , we instead pursue the idea that the biomimetic nature of the Ecce's structure gives scope to significantly reducing dimensionality via a muscle synergy approach employing relatively simple control signals and leveraging the natural dynamics and redundancy of the structure.
In support we cite both anatomical evidence [24, 38] and the decomposition analysis of muscle EMG signals  suggesting that signals from the brain are projected not to individual muscles, but to groups of related muscles. Distinct time-invariant muscle weighting profiles (synergies) can be identified, which appear to be driven by shared time-varying activation signals . Individual muscle EMGs can be shown to match a linear combination of these synergies [9, 12–14]. Applying these principles, synergy-based low-dimensional controllers of modeled frog legs  and RL-based learning of synergy weightings for modeled musculoskeletal human arms  have been successfully demonstrated.
Here we report early results of applying this approach (manuscript in preparation) to the full torso, physics-based model of the Ecce-2 anthropomimetic robot (Figure 10a). It is a challenging control subject, and was considered of a similar order of difficulty to that of the robot itself.
For our approach, we will define a motor plan as comprising simply a parametrized control signal shared in weighted proportion between a set of muscles. New motor plans are generated for trialing by weighted combinations of existing plans, employing a basis function of the target location but also favoring those that have proven rewarding as contributors to past plans. In this way, we specifically seek to encourage emergence of particular muscle combinations (synergies) that have proven most effective as modular contributors to an action. The RL acts to optimize the control signal parameters and the muscle weighting combination within the plans. Of particular interest is the extent to which the controller can still perform on just a subset of plans acting as key muscle synergies, or primitives.
Initial work has focused on open loop motor control of a reaching task, namely, moving a hand to a target object located randomly per trial. This task was selected because it called for precision control of a complex, compliant structure, using multiple cooperating muscles.
Initial experiments considered a simple scenario with the robot torso model placed on a static base (Figure 10a). All trials commence with the model balanced at rest with arms at the sides. The input to the learning algorithm comprises solely the random [x, y, z] target position (Figure 10a, red zone). The target object is balanced on a minimal static base. Motor plans are allocated a reward for striking or touching the bottle and also for passing the hand into the larger reward zone (the green sphere) and holding it there, thus encouraging plans that slow or stop the hand near the target, supporting future extensions incorporating grasping.
Exploitation, exploration, reliability, and optimization are targeted through the use of different forms of added noise. The random placement of targets naturally generates weighting noise, as the contributions of plans are never the same. Signal-dependent noise proportional to amplitude is added to encourage the emergence of increasingly optimal velocity profiles as predicted by increasing reliability under noise .
We do not attempt as yet to bootstrap control emerging from a zero starting state; instead, the natural amenability to control offered by the structure's biomechanical constraints means that a starting set of rough motor plans can be relatively easily handcrafted. The RL acts to combine and optimize these plans to generate accuracy and reliability for any target in the zone. Future collaborative work may investigate the emergence of the initial set from reflexes and sensorimotor dynamics (see Section 5.4).
The initial results are generally encouraging. After a sufficient learning period, the robot model can at least strike the target on 76% of random trials (Figure 10b), although higher target locations are less reliably struck. Slowing the hand sufficiently for reliable grasping is less well developed and currently appears to be restricted to a subset of amenable locations. A video is also available to view (http://youtu.be/0D_pb8qjNKE).
Current analysis is investigating whether the motor plans that do slow the hand are doing so via antagonist stiffening, whether velocity and jerk profiles show any approach to optimality, and whether there is evidence that learning is exploiting compliance and natural dynamics. The emergence of key primitives is under investigation by looking at the effect of reducing the bank of stored plans. Future work will consider reaching from any starting posture, reaching in the presence of obstacles, and the applicability of this learning approach to one of our physical robot torsos. In conclusion, this appears to be a promising approach to controlling highly biomimetic structures, subject to the caveat that it is currently untested in higher-dimensional problem spaces.
5.6 Experience-Based Integrated Feedforward-Feedback Control
The control approaches presented in the previous sections were either exclusive feedforward or exclusive feedback control schemes. Voluntary movement control in humans, however, is proposed to rely on both of these concepts. For instance, whereas feedforward is considered to be important for adjusting the grip force during an arm movement , feedback is hypothesized to play a role in feedback-error learning of inverse models . Inspired by these ideas, we developed an integrated feedforward-feedback control approach that aims at solving the point-to-point control problem (end-effector position without orientation) in the operational space .
The feedforward stage of this approach consists of two phases. In the first phase, the experience acquisition, a series of input-to-output experiments is performed to form a black-box forward model of the robot, which we called the experience base. Here, different initial positions Xs of the end effector are specified, and linearly scaled input voltage patterns V(t)—derived from EMG recordings—are applied for each initial position. The resulting end-effector position Xf as well as the initial condition and the applied voltage pattern are recorded and stored in the experience base. In the second phase, which we called system exploitation, the acquired forward model of the first phase is employed as an inverse model to derive the required control input patterns Vd(t) for a desired end-effector position Xd, which was not part of the previous experience acquisition phase. Here, a linear interpolation scheme was used to compute the inputs from a set of closest neighbors of the target position.
This control approach was tested on a simulation model, comprising seven revolute joints and two tendon-driven actuators per joint. The results are shown in Figure 11. For the feedforward stage, the end-effector position offset was equal to 6 mm along the z axis and 2 mm along the x and y axes. This error was further decreased by both superimposed feedback controllers, and a final error of less than 1 mm was achieved for all position coordinates. However, the experience-based linear feedback gain approach has proven to be slightly superior to the fuzzy logic feedback, as can be seen in Figure 11.
As this control approach is based on input-to-output experience, it is normally expected that the methods and algorithms presented can be applied to a wider range of robots. However, the approach was tested in a relatively simple simulation model, and further tests are required to assess its performance in more complex setups providing additional challenges to the anthropomimetic design, such as spherical joints or multi-articular actuators.
In this article we have presented the results of our approach to anthropomimetic robotics. We have built four torso prototypes, each mechanically more complex than its predecessor. The latest prototype, the Ecce-4, is mounted to a mobile base and features 54 actuators as well as detailed replicas of the human spine and shoulder girdle. Furthermore, we developed a flexible, distributed control and software framework, tailored for this particular class of robots, that can cope with the large number of actuators and sensors of these systems. Moreover, we implemented all the required tools to simulate the dynamics of this class of robots.
From the scientific point of view, we have presented control approaches in the joint and operational space domain. We showed results of controlling antagonistically actuated joints using methods from traditional control theory, as well as two approaches based on unsupervised learning techniques, that suggest that machine learning is a promising technique to achieve operational space control. Finally, we have tested a neuro-inspired approach relying on a developmental framework that has the capacity to elucidate the development of human motor control.
Although we have made progress in the design, simulation, and control of anthropomimetic robots, we recognize that we have barely scratched the real potential of this novel approach, particularly with respect to morphological computation. In general, the advantages of tendon-driven robots are becoming more and more acknowledged in the morphological computation community. But a number of technical problems still prevent the field from being completely established. One such problem is the lack of required sensor modalities (e.g., joint angles) for this type of robot, which also prevented us from evaluating our control approaches on the full-scale robot. Another, very important issue is that the body and the controller have to coevolve due to their morphological computation interplay. Once these problems are fully understood and solved, we are confident that the anthropomimetic approach will shed new light on the role of morphological computation in human motor control.
The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013–Challenge 2–Cognitive Systems, Interaction, Robotics, under grant agreement no. 231864—ECCEROBOT.
The Robot Studio SARL, 122 rue de la Versoix, 01220 Divonne les bains, France. E-mail: firstname.lastname@example.org
Automatic Control Lab, ETH Zurich, Switzerland. E-mail: email@example.com