One key idea behind morphological computation is that many difficulties of a control problem can be absorbed by the morphology of a robot. The performance of the controlled system naturally depends on the control architecture and on the morphology of the robot. Because of this strong coupling, most of the impressive applications in morphological computation typically apply minimalistic control architectures. Ideally, adapting the morphology of the plant and optimizing the control law interact so that finally, optimal physical properties of the system and optimal control laws emerge. As a first step toward this vision, we apply optimal control methods for investigating the power of morphological computation. We use a probabilistic optimal control method to acquire control laws, given the current morphology. We show that by changing the morphology of our robot, control problems can be simplified, resulting in optimal controllers with reduced complexity and higher performance. This concept is evaluated on a compliant four-link model of a humanoid robot, which has to keep balance in the presence of external pushes.
The control of compliant robots is inherently difficult, due to their nonlinear stochastic dynamics. Morphological computation poses the vision that parts of the complexity of a control or learning task can be outsourced to the morphology of the robot [11, 12]. This has been illustrated in many impressive biologically inspired robotic applications. For example, Tedrake et al.  showed that the task complexity of learning to walk is drastically reduced by exploiting the dynamics of a passive biped walker. Iida and Pfeifer  demonstrated that the stability of a simple four-legged robot exhibits surprisingly robust behavior using a mixture of active and passive joints. For swimming  or flying  robots, investigations of the morphologies yield a rich behavioral diversity using only a single actuator exploiting the dynamics of the environment.
These approaches typically used minimalistic control architectures, like open loop actuation with sinusoidal drives [8, 20, 19] or a simple linear combination of state-dependent features . The morphology is assumed to absorb much of the computation needed for controlling the robot, and therefore less emphasis is placed on the controller. More sophisticated control architectures were omitted, as these are typically more difficult to fine-tune for a new morphology. Thus, finding a good control law constitutes an individual optimization problem for each morphology, which renders the joint optimization of the morphology and the controller challenging. However, an autonomous system has to encompass both aspects—the morphology and the control architecture—for efficient learning of complex motor skills.
One remarkable exception to the typically used minimalistic control architectures is the theoretical study of the morphological contribution for compliant bodies in Hauser et al. . Those authors have demonstrated in simulations for simple but nonlinear generic models of physical bodies, based on mass-spring systems, that by outsourcing the computation to the physical body, the difficult problem of learning to control a complex body could be reduced to a simple and perspicuous linear learning task. Thus, it cannot get stuck in local minima of an error function. However, learning the optimal morphology and the optimal control law remains an open problem for many motor control tasks using real robots.
As a first step toward this vision, we propose to directly use optimal control methods for eliminating the dependence between the morphology of the robot and the control architecture. Thus, for evaluating the computational power of the morphology, we will always use the optimal control law connected to this morphology. This allows for a fair comparison of different morphologies and simplifies the search for an optimal morphology.
For determining our optimal control law for a given morphology, we can use one of the many tools provided by stochastic optimal control (SOC) methods, such as those of Todorov and Li , Kappen, , and Toussaint . These methods have been shown to be powerful approaches for movement planning in high-dimensional robotic systems. We will use approximate inference control (AICO) , as it is a state-of-the-art planning method for stochastic optimal control tasks. AICO is based on probabilistic inference for motor control. The beauty of this approach is that there is no distinction between sensor and motor, perception and action. We can include a multitude of variables, some of which might represent some features of the state, some of which might represent goals, constraints, or motivations in the future, and some of which might represent future actions or motor signals.
Like all other stochastic optimal control methods, AICO minimizes a cost function, which is in our case given by the quadratic distance to a target state and the energy used in the movement. In order to apply the AICO method to torque-constraint dynamical models, we will briefly explain in Section 2 how to extend the algorithm to systems with control limits.
Quantifying how much computation is done by the morphology of the robot and how much is done by the controller is often a difficult problem. Despite the advances in theoretical models of morphological computation  where the computational power of the plant can be assessed in a principled manner, this remains an open problem for many motor control tasks using real robots. Yet, by the use of optimal control laws, the computation done by the morphology can be quantified by investigating the complexity of the controller. Thus, if the controller needs to do a lot of computation, less computation is provided by the morphology in order to fulfill a task.
AICO also provides us with a time-varying linear feedback controller as a policy to generate movement trajectories. We will use the variance of the control gains as a complexity measure of the controller. If the control gains are almost constant in time, the control law is close to linear and does not perform much computation. However, if the control gains are highly varying, the controller needs to do a lot of computation, and therefore, less computation is provided by the morphology.
As different complexity measures are possible, we additionally use the final costs and the total jerk of a movement trajectory for comparison. We illustrate the power of morphological computation combined with optimal control on a dynamical model of a humanoid robot (70 kg, 2 m). The robot is modeled by a four-link pendulum, which has to keep balance in the presence of external pushes. We will show in Section 3 that on changing the morphology of a robot (e.g., the joint friction, the link lengths, or the spring constants), the resulting optimal controllers have reduced complexity. As we will demonstrate, there are optimal values for the physical properties for a given control task, which can only be found by the use of optimal control laws. With naive control architectures, different, suboptimal morphologies would be chosen.
1.1 Related Work
We propose to use optimal control methods to eliminate the dependence between the control architecture and the morphology of the robot. These SOC methods, such as those of Todorov and Li , Kappen , and Toussaint , have been shown to be powerful methods for controlling high-dimensional robotic systems. For example, the incremental linear quadratic Gaussian (iLQG)  algorithm is one of the most commonly used SOC methods. It uses Taylor expansions of the system dynamics and cost function to convert the nonlinear control problem in a linear dynamics, quadratic costs, and Gaussian noise (LQG) system. The algorithm is iterative—the Taylor expansions are recalculated at the newly estimated optimal trajectory for the LQG system.
An SOC problem can be reformulated as an inference problem in a graphical model [7, 16], which has the nice property that there is no distinction between sensor and motor, perception and action. The unknown quantities (i.e., the states and actions) can be efficiently inferred by using, for example, the AICO algorithm . The graphical model is given by a simple dynamic Bayesian network with states xt, actions ut, and task variables zt (representing the costs) as nodes; see Figure 1. In this graphical model, observations are denoted by shaded nodes, in contrast to the unknown random variables, which are indicated by circles. If beliefs in the graphical model are approximated as Gaussian, the resulting algorithm is very similar to iLQG. Gaussian message passing iteratively reapproximates local costs and transitions as LQG around the current mode of the belief within a time slice. One contrast with iLQG is that AICO uses forward messages instead of a forward rollout to determine the point of a local LQG approximation. Thus, AICO allows us to iterate belief reapproximation within a time slice until convergence. This may lead to faster overall convergence .
The dynamic Bayesian network shown in Figure 1 is fully specified by the conditional distributions encoded by the cost function and by the state transition model. As with most SOC methods [6, 15, 16], we assume that the cost function and the state transition model are known. However, for model learning, many types of function approximators can be applied [9, 10, 18]. For learning the cost function, inverse reinforcement learning methods such as those of Abbeel and Ng  and Boularias et al.  could be used. Since we demonstrate in this article how morphological computation and optimal control can benefit from each other in general, an extension to the more complex learning problems remains for future research.
The original formulation of the AICO method  does not consider torque limits, which are important for many robotic experiments as well for the dynamic balancing experiments we consider in this article. Therefore we extended the algorithm, as is discussed in the next section.
2 Probabilistic Inference for Motor Planning
We use the probabilistic planning method AICO  as the optimal control method. Most applications of AICO are in the kinematic planning domain. Here, we want to apply AICO to fully dynamic, torque-controlled robot simulations. Therefore, we had to extend the AICO framework with control or torque limits, as is explained in the next sections.
2.1 Approximate Inference Control
We will use the simplest version of AICO, where an extended Kalman smoothing approach is used to estimate the posterior P(x1:T, u1:T|z1:T = 1). The extended Kalman smoothing approach uses Taylor expansions to linearize the system and subsequently uses Gaussian messages for belief propagation in a graphical model. Gaussian message passing iteratively reapproximates local costs and transitions as an LQG system around the current mode of the belief within a time slice. Concise derivations of the messages for AICO are given in Appendix 1, as they are used to further extend the algorithm to implement control constraints.
AICO is only a local optimization method, and we have to provide an initial solution that is used for the first linearization. We will simply use the first state as the initialization: x2:T = x1.
If we use AICO with a constant cost and dynamic model for each time step, the algorithm reduces to calculating a linear-quadratic regulator (LQR), which is often used in optimal control. An LQR is the optimal linear feedback controller for a given linear system. In contrast, AICO uses a time-varying linear feedback controller, which may be different for each time step. In our experiments we will compare the two approaches on a dynamic nonlinear balancing task. Thus we evaluate the benefit of using AICO (time-varying linear feedback control ) and an LQR (constant linear feedback control ).
2.2 Cost Function for Constrained Systems
We investigate morphological computation combined with optimal control on dynamic nonlinear balancing tasks , where a robot gets pushed with a specific force and has to move so that it maintains balance. The optimal strategy is a nonlinear control law that returns the robot to the upright position. For our evaluations, different initial postures and multiple forces are used, as sketched in Figure 2. We study the morphological computation by changing different physical properties of a four-link model of a humanoid robot. In particular, we investigate the computation done by different friction coefficients and link lengths. Inspired by the work of Hauser et al. , we also incorporate springs into our model for analyzing different spring constants.
We use a four-link robot as a simplistic model of a humanoid (70 kg, 2 m) . The eight-dimensional state xt is composed of the ankle, the knee, the hip, and the arm positions and their velocities. Table 1 in Appendix 2 shows the initial velocities (resulting from the force F, which always acts at the shoulder of the robot) and the valid joint angle range for the task. In addition to the joint limits, the controls are limited to the intervals [±70, ±500, ±500, ±250]Nm (ankle, knee, hip, and arm). For more detail we refer to Atkeson and Stephens .
The movement trajectories are simulated with a time step of 0.5 ms. For the AICO algorithm, we use a planning time step of Δt = 5 ms and a horizon of T = 500, which results in a movement duration of 2.5 s. We use a torque-dependent noise model—the controls are multiplied by Gaussian noise ϵ with mean 1 and standard deviation 0.25, that is, we use 25% torque-dependent noise. The noise affects the system dynamics while simulating a trajectory, where . The experiments are performed for multiple forces sampled from , as shown in Figure 2.
3.2 Complexity Measures
We evaluate the average values of the final costs L(τ) in Equation 1, the variance of the time-varying controller gains returned by the AICO approach in Equation 2, and the total jerk of the trajectory (proportional to the squared derivative of the torque). Different complexity measures would be possible. However, the chosen ones are plausible, since the complexity of controlling the robot is reflected by how well the costs L(τ) are optimized. As the jerk of a movement tracks the derivatives of the torques, this also seems to be a reliable complexity measure. Finally, the variance of the time-varying controller gains quantifies the complexity of the control law in comparison with linear controllers (no variance).
3.3 Friction Induces Morphological Computation
In this experiment, we evaluate the influence of the friction coefficient γ on the quality of the optimal controller for the four-link model. The friction coefficient directly modulates the acceleration of the joints, that is, .
Figure 3a shows the resulting final costs for different friction coefficients. In order to illustrate the need for sophisticated control architectures, we compare the optimal control method AICO with a LQR controller and a simple constant linear feedback controller. While AICO calculates the optimal control law for the nonlinear system, the LQR controller linearizes the system at the upright position and subsequently calculates the closed form solution of the optimal controller for the resulting LQG system. The constant linear feedback controller does not adapt to the morphology of the robot. As we can see from Figure 3a, we cannot determine the optimal morphology with the simple linear controller, while we can recognize a clear minimum with the AICO and the LQR controller.
AICO could find the simplest control law according to the jerk criterion in Figure 3b and was also able to find control laws with considerably decreased costs. In Figure 3c, we illustrate the variance of the time-varying controller gains. The complexity of the resulting controllers is reduced by changing the friction coefficient. The minimum lies in the same range as the minimum of the cost function. Thus, in this case, a simpler controller resulted in better performance because the morphology (the friction) of the robot has simplified the control task. Still, we needed a more complex optimal control algorithm to discover this simpler control law than that given by the LQR.
In Figure 4 we depict the joint and torque trajectories for the hip and the arm joint, where we applied the friction coefficients γ = 0, γ = 12, and γ = 30. With γ = 0, the trajectories overshoot as shown in Figure 4a, whereas in Figure 4c more energy is needed for stabilizing the robot with γ = 30. The trajectories for the optimal friction coefficient γ = 12 (according to the jerk complexity measure) are shown in Figure 4b.
3.4 The Influence of the Link Lengths
To show the effects of morphological computation on a more complex variation of morphologies, we consider in this experiment the variation of the shank (i.e., the calf), the thigh, the trunk, and the arm length, l = [l1, l2, l3, l4]. Initially the link lengths are l0 = [0.5, 0.5, 1, 1] m and the weights of these links are specified by m0 = [17.5, 17.5, 27.5, 7.5] kg.
We consider variations of l of up to 50% from their initial configuration. Different link lengths of the four-link robot result in different initial velocities , which influences the complexity of the motor control problem. For this reason, we consider constant inertias for a fair comparison. Thus, for each link length l the masses of the links are given by m = m0l02/l2. To match the physical properties of humans we assume equal lengths for the shank and the thigh link: l1 = l2.
In addition to the link lengths of the thigh, the trunk, and the arm, we still optimize the friction coefficient, resulting in four parameters (l2, l3, l4, and γ) of our morphology. A naive search for optimal parameters as in the previous experiment is intractable in the multidimensional case. Therefore, we apply the nonlinear stochastic optimizer of Hansen et al.  as a local search method.
The stochastic optimizer locally explores the parameter space (l2, l3, and l4) until convergence. For example, the resulting final cost values for different link lengths are illustrated in Figure 5 when using AICO as the optimal control law. For this experiment, we used a friction coefficient of γ = 12, and we averaged over multiple initial states and multiple forces. The optimizer converged to link lengths of l2 = 0.39, l3 = 1.18, and l4 = 0.5 for the thigh, the torso, and the arm.
For multiple friction coefficients, an evaluation of the presented complexity measures is shown in Figure 6. According to the final costs in Figure 6a, the best controller was found with a friction coefficient of γ = 9. For the jerk criteria in Figure 6b and the variance as complexity measure shown in Figure 6c, the minimum of the complexity measures lies in the same range.
The morphologies found by the optimizer for the AICO and the LQR controller are sketched in Figure 7 for the friction coefficients γ = 0, γ = 9, and γ = 30. The initial configuration of the robot is shown in Figure 7a and in Figure 7e. Interestingly, with increasing values of the friction coefficients, the link lengths change to compensate for the larger motor controls. This is shown in Figure 7b–d for the AICO controller and in Figure 7f–h for the LQR controller. AICO could again find significantly different morphologies from the LQR controller. As is illustrated in the labels of Figure 7, the morphologies found by AICO could produce lower costs than the morphologies found by LQR.
3.5 Robot Model with Linear Springs
Inspired by the theoretical analysis of , we also evaluate the computational power of linear springs in our model. The springs are mounted on the four-link model as illustrated in Figure 8. These springs support the control of the robot as they push it back into its upright position. Thus, the torques applied to the joints are given by the sum of the controls provided by the feedback controller and the spring forces ut∗ = ut + diag(k)Δls. The vector Δls contains the displacements of the four springs, and k is a vector containing the four spring constants. Note that due to the calculation of the displacements, the springs act nonlinearly on the joint torques.
The adjustable parameters of the morphology include the friction coefficient and the four spring constants k = [k1, k2, k3, k4] for the ankle, the knee, the hip, and the arm joint. Initially the spring constants are set to k0 = [1, 103, 103, 1]. For the optimization, a lower bound of 0.1k0 and an upper bound of 100k0 are used.
As in the previous experiment, for different friction coefficients γ, we optimize the spring constants using the stochastic search method of Hansen et al. . The results for the final costs, the total jerk, and the variance of the feedback controller as complexity measures are shown in Figure 9. The benefit of the supporting springs is best illustrated by the final costs (AICO: 109.5; LQR: 122.2) in Figure 9a, which are drastically reduced compared to the final costs using the four-link model without springs (AICO: 146.9; LQR: 246.9) shown in Figure 3a.
The optimal spring constants strongly depend on the friction coefficient used. This is illustrated in Figure 10, where we compare the learned spring constants using AICO and LQR as control methods. With increasing friction, the spring constants increase to compensate for larger control costs.
3.6 Concluding Summary
We have shown that by using optimal control methods an optimal morphology could be found for the four-link robot model. This was evaluated on three tasks with increasing complexity: optimizing only the friction coefficient, optimizing the friction coefficient and the link lengths, and finally, optimizing the friction coefficient and the four spring constants. We can now also ask how beneficial it was to optimize the link lengths compared to optimizing the spring constants. This is best reflected by the achieved final costs in Figure 3a, Figure 6a, and Figure 9a, where the most complex task also performs best.
In this article, we have shown that optimal control and morphological computation are two complementary approaches that can benefit from each other. The search for an optimal morphology is simplified if we can calculate an optimal controller for a given morphology. This calculation can be done by new approaches from probabilistic inference for motor control, such as the approximate inference control algorithm . By the use of optimal control methods, we have shown that for dynamic nonlinear balancing tasks, an appropriate setting of the friction, the link lengths, or the spring constants of the incorporated springs of the compliant robot can simplify the control problem.
We have demonstrated that there are optimal values for the physical properties for a given control task, which can only be found by the use of optimal control laws. With naive control architectures such as the evaluated constant linear feedback controller, which does not adapt to the morphology, different, suboptimal morphologies would be chosen.
In our studies, we assumed that we can always compute an optimal controller given the current morphology. If this is possible, then the morphological computation can be quantified using the proposed method. However, for computing optimal control laws (using, e.g., AICO), we need to know the costs at each time step, as well as the dynamics model. For more complex tasks, these costs and the model are usually unknown. However, as argued in the related work in Section 1.1, for model learning, many types of function approximators can be applied [9, 10, 18]. For learning the cost function, inverse reinforcement learning methods such as those of Abbeel and Ng , and Boularias et al.  could be used.
In the future, we plan to investigate more complex and more nonlinear tasks. In these cases the benefit of AICO in comparison with LQR controllers should be even more prominent. In the end we are planning to simultaneously evolve walking controllers based on AICO and the morphology of bipedal robots, such as a model of a planar walker.
This article was written with partial support by the European Union projects FP7-248311 (AMARSI) and FP7-506778 (PASCAL2). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
In this article the immediate cost function is composed of the intrinsic costs and the constraint costs: ct(xt, ut) + cp(xt, ut).
Appendix 1: Extension of the Approximate Inference Control Method
The original formulation of the AICO method  does not consider a linear term for the control costs. However, this is needed to encode torque limits, which are important for our dynamic balancing experiments, and hence we needed to extend the AICO algorithm.
The introduction of a linear term for the control costs yields not only a modified cost function, but also different update equations for the messages, and finally different equations for the optimal feedback controller. For completeness we will first recap the main steps in deriving the AICO method, and will then discuss the modifications to implement control constraints.
A1.1 Approximate Inference Control Without Torque Limits
A1.2 Approximate Inference Control with Torque Limits
In order to implement torque and joint limits, we introduce an additional cost function cp, which punishes the violation of the given constraints. The function cp is just added to the current immediate costs. We use separate cost terms for control constraints ctu and joint constraints ctx, that is, cp(xt, ut) = ctx (xt) + ctu(ut). Here, we will only discuss how to implement the function ctu(ut) for the torque constraints; joint constraints are implemented similarly.
Joint limits can be imposed similarly by using additional terms as costs for ctx(xt). However, for joint limits, the update equations stay the same, because P(zt = 1|xt) already has a nonzero mean, denoted by rt in Equation 9.
Similarly to Toussaint , we use an iterative message-passing approach where we approximate the nonlinear system by linear dynamics, quadratic costs, and a Gaussian noise (LQG) system at the new mode of the trajectory. In Toussaint , this is done by using a learning rate on the current modes of the belief. However, in contrast to Toussaint , we also need an estimate of the optimal action ut in order to impose the control constraints. Using a learning rate on the control action ut turned out to be very ineffective, because feedback is attenuated. For this reason we will instead use a learning rate on the feedback controller.
The complete message-passing algorithm, considering state and control constraints, is listed in Algorithm 1. This is a straightforward implementation of Gaussian message passing in linearized systems, similar to an extended Kalman smoother:
In Toussaint  and Rawlik et al. , more time-efficient methods are presented, where for each time step the belief is updated until convergence, in contrast to updating all messages and iterating until the intrinsic costs L(τ) converge. The computational benefits of such an approach still need to be evaluated for our messages.
Appendix 2: Four-Link Robot Model Specifications
For the four-link model, the eight-dimensional state is composed of the ankle, the knee, the hip, and the arm positions and velocities. In our experiments, the velocities are instantaneously affected by the applied force F . These initial velocities and the valid joint angle ranges are shown in Table 1.
|Ankle||+1.2 × 10−2F||−0.8||0.8|
|Knee||−7.4 × 10−2F||−0.05||2.5|
|Hip||+5.1 × 10−2F||−2.0||0.1|
|Arm||−4.2 × 10−2F||−0.6||3.0|
|Ankle||+1.2 × 10−2F||−0.8||0.8|
|Knee||−7.4 × 10−2F||−0.05||2.5|
|Hip||+5.1 × 10−2F||−2.0||0.1|
|Arm||−4.2 × 10−2F||−0.6||3.0|
In addition, we use multiple initial joint angles. The ankle angle is given by ϕ1 = −α, the knee angle by ϕ2 = −αf, the hip angle by ϕ3 = αf, and the arm joint angle by ϕ4 = 2αf, where α is sampled from and the linear factor is uniformly distributed with . Four example initial states are illustrated in Figure 2.
Institute for Theoretical Computer Science, Graz University of Technology, Inffeldgasse 16b/1, Graz, 8010, Austria. E-mail: firstname.lastname@example.org
Intelligent Autonomous Systems, Technische Universität Darmstadt, Hochschulstraße 10, 64289 Darmstadt, Germany. E-mail: email@example.com