Abstract

Embodiment has led to a revolution in robotics by not thinking of the robot body and its controller as two separate units, but taking into account the interaction of the body with its environment. By investigating the effect of the body on the overall control computation, it has been suggested that the body is effectively performing computations, leading to the term morphological computation. Recent work has linked this to the field of reservoir computing, allowing one to endow morphologies with a theory of universal computation. In this work, we study a family of highly dynamic body structures, called tensegrity structures, controlled by one of the simplest kinds of “brains.” These structures can be used to model biomechanical systems at different scales. By analyzing this extreme instantiation of compliant structures, we demonstrate the existence of a spectrum of choices of how to implement control in the body-brain composite. We show that tensegrity structures can maintain complex gaits with linear feedback control and that external feedback can intrinsically be integrated in the control loop. The various linear learning rules we consider differ in biological plausibility, and no specific assumptions are made on how to implement the feedback in a physical system.

1 Introduction

Embodiment has led to a revolution in robotics, and it encompasses much of the research on the nature of cognition [2]. The term embodiment has many definitions, but they share a common notion. By not thinking of the agent body and its controller as two separate units, but instead taking the interaction of the body with its environment into account, a more attuned sensory representation is generated. This in turn makes the task of complex control of locomotion easier. Indeed, the principle of embodiment implies, among other conclusions, that the direct physical interaction between the body and its environment is crucial for advanced cognitive processing by the agent. In this work, we will use a pragmatic viewpoint on embodiment and investigate the idea in particular by studying the computational powers of body dynamics.

Pfeifer and Bongard demonstrated the importance of embodiment in their book How the body shapes the way we think. In the first chapter of their book, they write [51, p. 19]:

First, embodiment is an enabler for cognition or thinking: in other words, it is a prerequisite for any kind of intelligence. So, the body is not something troublesome that is simply there to carry the brain around, but it is necessary for cognition.

Indeed, the body is not just the brain's interface to the world, but it is the combination of body and brain that defines an agent. Clark wrote in the introduction of his book Being there: Putting brain, body and world together again [14], which was largely influenced by Brooks' work on embodied robotics [8, 9]:

We ignored the fact that the biological brain is, first and foremost, an organ for controlling the biological body. Minds make motions, and they must make them fast—before the predator catches you, or before your prey gets away from you. Minds are not disembodied logical reasoning devices.

This work, published 10 years before Pfeifer's, in our opinion still tends to put the emphasis on a brain, but acknowledges that the body is an important factor in what defines an agent.

Although previous work on embodiment has demonstrated the importance of thinking about the agent as a unit consisting of body and brain, it is unclear what the body is contributing in a computational sense.

In this work, we will study an exemplar family of highly dynamic body structures, controlled by the simplest possible “brains.” While this could be seen as an example of an extreme form of morphological computation [49], we propose to interpret the results presented here more broadly, namely as a particular implementation of the general principle of computation by complex nonlinear dynamical systems with relatively simple adaptive controls, called physical reservoir computing (PRC).

By doing so, we clearly demonstrate that it is possible on the one end of the spectrum to have very static bodies and highly complex controllers (as in classical robotics), while on the other end we can have a highly dynamic body, being controlled by a very simple controller. Still, both can perform similar computations in the environment. By investigating these extremes, we clearly demonstrate the spectrum of choices of how control can be implemented in the body-brain composite. Figure 1 gives an overview of this spectrum. In this tradeoff between brain and body computation, there are many known intermediate results: Dead fish are propelled forward in a vortex [5, 37]; single-celled organisms (e.g., amoebae with pseudopodia) can be thought of as only computing using their body and chemical pathways, as they do not possess neural substrates [44, 76]; nematodes have rich motor patterns with only a few neurons (302 for the hermaphrodite C. elegans), and thus their locomotion largely depends on the shape of their body [18, 19]; the finger-tendon network in humans is responsible for a large part of the computational load required in fine control [66]; it was demonstrated that the locomotion pattern of decerebrated cats can autonomously tune itself to the body-environment interaction [4, 22, 70]; and so on.

Figure 1. 

The computational tradeoff. Compliance and underactuation provide additional freedom to the system, which in turn corresponds to computational power, which can be used to simplify the control problem.

Note that there are also important philosophical questions on the relationship between brain, body, and environment (see, e.g., [63, 71]). The viewpoint that these three are separate, albeit interacting, entities has been challenged by the notion of embodied cognition: Cognition is no longer seen as an exclusive property of the brain, and the functionalities traditionally attributed exclusively to the brain, such as memory or complex transformations of sensory inputs, seem to be performed by other parts of the body as well. Thus, the distinction between brain, body, and environment becomes blurry, and the classical modular view on the relation between these three slowly disappears.

Recently, Hauser et al. [25, 26] showed that spring-mass nets have universal computational power,1 providing a theoretical foundation for morphological computation, which in this setting is an instantiation of the field known as reservoir computing [34, 43, 68], the field that studies how generic dynamic systems can be used for universal computation.

Building on these findings, we show that morphological computation can be used to effectively control dynamic robot bodies. An overview of the setup is given in Figure 2. In this work we focus primarily on pattern generation through feedback to the body. In previous work [13], we showed that high-level environmental information, such as surface properties, can also be extracted directly and linearly from the state of the body, even while locomotion patterns are being generated in parallel. This goes significantly further than the now classic demonstrations by Paul, who was the first to conceive simple robots that performed computation through the body [49]. Indeed, Paul also considered tensegrity structures for morphological computation, but the controller in her setup was still external [50].

Figure 2. 

Overview of the approach.

The goal of this article is threefold. First, we show that the results of Hauser et al. are not merely theoretical, and that compliant robots indeed have real computational power, which can easily be exploited using simple learning algorithms. Second, as an example computation, we use the generation of cyclic motion patterns (similar to the patterns generated by central pattern generators, CPGs) to achieve locomotion. By using the morphology to generate CPG-like signals, the design of the controller can be drastically simplified. Indeed, integrating sensor data into CPGs is not an easy problem, and by integrating the body dynamics into the control structure, the robot intrinsically synchronizes with properties of its environment. Finally, by using tensegrity structures, we provide an implementation of the general principle of PRC that is very close to the pure mass-spring nets of Hauser et al., but is physically implementable, as the structures can be made freestanding.

The structure of this article is as follows. We first provide an overview of tensegrity structures, CPGs, and reservoir computing to make the article self-contained. We then introduce and compare three learning rules for learning CPG-like motor patterns with tensegrity structures. Next, we provide a set of example applications. We show that the gait can be modulated by changing the equilibrium length of a subset of springs, which can prove useful for training robots to adapt their gait to the terrain. We optimize gaits using an external controller and then learn the equivalent gait with morphological computation to show that the control can actually be outsourced to the body. In the same spirit, we show an example of the control of an end effector. Finally, we show empirically that the presented methods work over a large parameter space and in nonlinear regions when the structures are driven far from their equilibrium state.

2 Tensegrity Structures

Tensegrities are remarkable structures consisting of compressive elements connected through tensile elements only [21, 59].2 In this section we first introduce the dynamics of tensegrity structures. We then review some of the literature on tensegrities in different fields. In  Appendix 4 we explain how we defined the spring constants and equilibrium state of the structures used in this work.

2.1 Tensegrity Dynamics

Formally, we can define a tensegrity as a finite set of labeled points called nodes or endpoints [15]. Tensegrity structures, trusses, tensile structures, and the mass-spring nets studied by Hauser et al. [25, 26] are pin-jointed structures. Here we only consider bar-spring tensegrities, in which pairs of nodes can be connected by a member that is a bar or a spring. At the end of this section we will show that mathematically, mass-spring nets and tensegrity structures are similar, but with additional nonlinearities arising in tensegrities from inertia properties and the fixed bar lengths. Springs are members resisting only tensile forces if they are stretched beyond their equilibrium length. They do not resist compression (they go slack), and for simulation purposes we neglect their mass properties. Bars are members resisting both compressive and tensile force. They do not change length.

We assume the mass of each bar to be evenly distributed along its longitudinal axis. It is further assumed that the bar is infinitely thin and thus the inertia of a bar can be described by only taking the moment of inertia around an axis perpendicular to the longitudinal axis into account. If we place a reference frame at the center of mass of the bar, then the moment of inertia is j = ml2/12, where m is the mass of the bar, and l is the length of the bar.

In this work, we will only consider class 1 tensegrities [58], that is, pure tensegrities. This means that no two bars ever share a common node. Furthermore, each node needs to be attached to a bar. Hence, there are exactly n = 2b nodes with b the number of bars.

Three-bar tensegrity prisms are the simplest class 1 tensegrities, consisting of three bars and nine springs. These structures can be stacked to create snakelike structures by adding springs between the prisms. An example of such a structure is shown in Figure 3. The structure is prestressed and free-standing (it does not collapse under gravity).

Figure 3. 

A snake tensegrity robot in our simulator, made out of five stacked tensegrity prisms. In the electronic version, the red cylinders are bars (fixed length) resisting both tensile and compressive forces, the thick green lines are springs resisting tensile forces, and the thin purple lines are springs with varying equilibrium lengths (actuated).

Let us now define the dynamics of a class 1 tensegrity with stiff bars and springs. The primary purpose of this section is to show where the nonlinearities of the system arise from and which parameters need to be chosen when constructing tensegrity structures.

We will follow the description in [58].3 First note that because the springs only generate forces, but do not have mass, we only need to integrate the trajectories of the bars. One degree of freedom is lost for each bar, because the bar length is fixed. Hence the total number of degrees of freedom is 5b.

A description of the dynamics with a minimum number of coordinates is given in [72].4 Here, we use six generalized coordinates q = [r^T b^T]^T per bar (Figure 4), as this simplifies the equations. The coordinate vector r is fixed to the center of mass of the bar, and b is a unit vector along the longitudinal axis of the bar, the direction of which can be chosen arbitrarily.

Figure 4. 

Non-minimal set of generalized coordinates for the description of a bar. The vector r points to the center of mass of the bar, and b lies along the longitudinal axis of the bar.

Let the Cartesian coordinates of all nodes be given by5
\[ N = \begin{bmatrix} n_1 & \cdots & n_n \end{bmatrix} \in \mathbb{R}^{3 \times n}. \]
The transformation from generalized coordinates to the Cartesian coordinates is now given by
\[ N = Q \Psi, \]
where Q = [r_1 … r_b b_1 … b_b] and Ψ is a square, invertible matrix. If we order the nodes so that node i is connected to node i + b through a bar, then Ψ has a convenient structure:
\[ \Psi = \begin{bmatrix} I & I \\ -\tfrac{1}{2}L & \tfrac{1}{2}L \end{bmatrix}, \]
where the diagonal matrix L contains the lengths of the bars.
As [58] shows, the generalized forces can now be obtained from the forces acting on the nodes through a linear transformation:
\[ F_q = F_n \Psi^T, \]
\[ F_n = W - N C^T \operatorname{diag}(\lambda)\, C, \]
where F_n collects the forces acting on the nodes and W contains the external forces acting on the nodes. The matrix C ∈ {0, 1, −1}^{s×n} is called the connectivity matrix and contains only ones, zeros, and minus ones. Here s is the number of springs, which is at least 3n/2 (each node is connected to at least three springs).
In this work, we only consider linear springs, and therefore the force density λ_e of a spring e connecting nodes i and j can be written as
\[ \lambda_e = k_e \max\!\left(0,\ 1 - \frac{l_{0,e}}{\lVert n_i - n_j \rVert}\right), \]
where k_e is the spring constant of the spring, l_{0,e} the equilibrium length of the spring, and n_i the ith column of N. Normally one should prevent the springs from going slack, as this risks collapsing the structure.
The external forces are due to ground collisions (modeled as explained in  Appendix 2) and damping:
formula
formula
The damping we consider acts only along the springs, with a uniform damping coefficient ζ.
We then obtain the following matrix differential equation:
\[ \ddot{Q} M + Q \Xi = F_q, \]
where the mass matrix M is given by M = diag([m_1 … m_b j_1 … j_b]) (j_i are the moments of inertia of the bars), and Ξ = diag([0 … 0 ξ_1 … ξ_b]) is a diagonal matrix of Lagrange multipliers that enter the rotational equations of motion and enforce the constraints ∥b_i∥ = 1. These Lagrange multipliers are given by
\[ \xi_i = j_i \lVert \dot{b}_i \rVert^2 + b_i^T F_{q,(b+i)}, \]
where F_{q,(b+i)} is the (b + i)th column of F_q.

The dynamics of a mass-spring net are obtained by setting Ψ = I, letting M be a diagonal matrix containing the masses of the point masses, and setting Ξ = 0. This shows that tensegrities behave similarly to spring-mass nets, but with additional nonlinearities arising from the rotational equations of motion and the bar constraints. We note that even with linear springs, a three-dimensional mass-spring net will have nonlinear dynamics for nonzero equilibrium lengths. This is due to the nonlinearity of the Euclidean distances in Equation 6.
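
To make the force computation concrete, the following minimal numpy sketch evaluates the spring force densities and the resulting node forces for a small, made-up spring net. The node positions, spring parameters, and variable names are illustrative assumptions; only the structure of the computation (spring vectors from the connectivity matrix, force densities clipped to zero for slack springs) follows the description above.

```python
import numpy as np

# Illustrative 4-node, 4-spring net: C[e, i] = 1 and C[e, j] = -1 when
# spring e connects node i to node j (all values here are made up).
C = np.array([[1, -1,  0,  0],
              [0,  1, -1,  0],
              [0,  0,  1, -1],
              [1,  0,  0, -1]], dtype=float)   # s x n connectivity matrix
N = np.array([[0.0, 1.0, 1.0, 0.0],            # 3 x n node coordinates
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 0.0]])
k = np.full(4, 100.0)                          # spring constants
l0 = np.full(4, 0.8)                           # equilibrium lengths

S = N @ C.T                                    # 3 x s matrix of spring vectors
lengths = np.linalg.norm(S, axis=0)
lam = k * np.maximum(0.0, 1.0 - l0 / lengths)  # force densities; slack springs give 0
F_springs = -N @ C.T @ np.diag(lam) @ C        # 3 x n forces exerted on the nodes
print(F_springs.round(2))
```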

2.2 Related Work on Tensegrity Structures

Tensegrity structures have an architectural and artistic background. Most of the early research on these structures focused on finding stable configurations and describing their static properties [12, 15, 21, 23]. The result of this research is that a vast literature on typical configurations and their properties is now available [16, 47].

Much less literature is available on the dynamic properties of tensegrities. Motro [47] lists a few examples of actuated structures, and Sultan et al. [61] investigated the linearized dynamics. Skelton and de Oliveira [58] provide Lyapunov-function-based control techniques, but the practical use of their method may be limited for underactuated robotic platforms.

Paul et al. [50] were (to the best of our knowledge) the first to link tensegrity structures to the morphological computation domain. They evolved gaits for tensegrity prisms and discussed the robustness of these robotic systems to actuator failures. Rieffel et al. [53] went a step further by introducing morphological communication. In their work, independent controllers for parts of a tensegrity structure interact only through the dynamics of the structure, that is, the structure itself is used as a communication tool. More recently, Bliss [7] has shown an interesting example of taking the (linearized) dynamics into account while developing a CPG-based controller (see Section 3) for a tensegrity structure.

Our choice for tensegrity structures also has a biological inspiration. Ingber has done remarkable work on cellular mechanics based on tensegrity structures [31, 32, 69]. On this level, there is no neural control and the information exchange is chemical. Because the techniques we present in this work make no assumptions on the type of actuation or sensor feedback, they might be used as a tool to explain or study the fundamental mechanisms of cell movement and mechanotransduction [33].

3 Central Pattern Generators

CPGs are neural circuits typically found in the spine of vertebrates that generate rhythmic activation patterns without sensory feedback or higher-level control input [30]. Our prime goal is to show that a lot of computational power can be exploited in compliant structures. The more computations can be outsourced to the body, the less effort one needs to put into the construction of CPGs (for robotics applications) and the less external computational power is needed.

Most robotic systems are not as compliant as the ones we study here, and their morphological computational power may be insufficient to produce the desired behavior with static linear feedback alone. We argue, however, that one should keep the body's dynamics in the loop as much (and as early) as possible, in order to exploit the available morphological computational power. Indeed, in the compliant tensegrity structures we can go as far as leaving out the external CPG completely.

3.1 Matsuoka Oscillators

The type of nonlinear oscillator we consider in this work as a model CPG is called the Matsuoka oscillator [45, 46]. It is one of the most fundamental oscillator structures, based on a simple integrating neuron with fatigue. Its dynamics are given by (dropping the time indices)
formula
formula
formula
Here A is the matrix describing how the neurons are connected. It is typically sparse (Matsuoka mostly analyzed small regular connection patterns). The positive semidefinite connection matrix A was constructed similarly to the stress matrix of the tensegrity structure with the diagonal (self-feedback) removed. More precisely,
formula
formula
where C is the connectivity matrix as defined in Section 2.1. Hence, the neurons are connected in the same way as the springs connect the nodes of the tensegrity structures. The choice of this connection pattern was in a sense arbitrary. However, random connection patterns tend to generate chaotic trajectories, which are unwanted in this work.

The integrating neuron and the fatigue have time constants τ1 and τ2, respectively. The steady state firing rate of the neuron is determined by ι, and γ is called the impulse rate of the tonic or slowly varying input [45]. In this work, we keep these parameters constant, that is, the oscillator itself is not modulated. The parameters are ι = 1, τ1 = 0.5, τ2 = 5, and γ = 1. Figure 5 shows an example of CPG signals generated by the above procedure. There were a total of 12 dimensions (five shown), and the connection pattern was taken from the tensegrity icosahedron shown in Figure 6.

Figure 5. 

Sample Matsuoka oscillator signals. A linear combination of such signals is used as the CPG signal for robot locomotion. There were a total of 12 dimensions (of which five are shown) in the CPG of this example, and the connection pattern is taken from the tensegrity icosahedron (Figure 6).

Figure 6. 

Overview of physical reservoir computing with compliant tensegrity structures. The thin green lines are passive springs connected to a sensor measuring the force and its derivative on the spring. The thick red lines are bars (noncompliant compressive members). The dotted lines are actuated springs, implemented as a passive spring connected to a motor modifying the equilibrium length of the spring. The (new) equilibrium length of the actuated springs is computed as a linear combination of the sensor values. There are no joints in the system, and the only constraints are the fixed bar lengths.

In Equation 12, yosc contains the firing rate of the neurons, and vosc models the fatigue. The firing rates yosc are the outputs of the oscillator, and these signals are used to construct the target motor signals. We resampled the output signals yosc so that the signals had the correct frequency for the experiment (normally 1 Hz). The desired output signals are random linear combinations of this N-dimensional signal yosc.

Based on the signals yosc, we construct target motor signals as a linear combination:
\[ y_{\mathrm{target}}(t) = W_{\mathrm{target}}\, y_{\mathrm{osc}}(t). \]
In practice, we also add a constant bias variable to yosc. The matrix Wtarget is random (normally distributed values) in most of this work (i.e., we assume the desired CPG signal to be known), except in Section 5.2, in which we optimize the CPG signal. Both Wtarget and yosc are random, but the fundamental difference is that yosc is a time-varying signal generated by a random oscillator, whereas Wtarget is a fixed random matrix. The oscillator signals take values between 0 and 1, while the motor signals need a correct offset and amplitude; this is taken care of by Wtarget, which combines the signals from yosc into meaningful motor commands.
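
As an illustration, the sketch below integrates one standard form of the Matsuoka dynamics (a rectified integrating neuron coupled to a fatigue state) and forms target motor signals as a random linear combination with a bias. The symmetric 0/1 coupling matrix is an illustrative stand-in for the connectivity-derived pattern described above, the exact placement of ι and γ in the equations is an assumption (both are 1 here, so the numerics are unaffected), and whether a given coupling pattern actually oscillates depends on its structure and strength.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coupling pattern: neurons connected like the springs of the structure.
# A small random symmetric 0/1 matrix is used here as an illustrative
# stand-in for the connectivity-derived pattern described in the text.
n = 12
A = np.triu(rng.integers(0, 2, size=(n, n)), 1).astype(float)
A = A + A.T                               # symmetric, zero self-feedback

iota, tau1, tau2, gamma = 1.0, 0.5, 5.0, 1.0   # constants from the text
dt = 0.01
u = 0.1 * rng.standard_normal(n)          # integrating neuron states
v = np.zeros(n)                           # fatigue states

ys = []
for _ in range(6000):
    y = np.maximum(u, 0.0)                # rectified firing rates
    u += dt * (-u - A @ y - gamma * v + iota) / tau1
    v += dt * (-v + y) / tau2
    ys.append(y)
ys = np.asarray(ys)                       # (time, n) oscillator outputs

# Target motor signals: random linear combination of y_osc plus a bias.
W_target = rng.standard_normal((6, n + 1))
y_target = ys @ W_target[:, :n].T + W_target[:, n]
```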

We chose the Matsuoka oscillator because of its simple structure, which can be chosen similarly to the connection pattern of the tensegrity structure itself. While we have not yet explored this path, we hope the morphological communication idea in Rieffel et al. [53] can be integrated in this way.

4 Physical Reservoir Computing

Classical techniques for training recurrent neural networks, such as backpropagation through time [55], approximate a desired output signal by modifying the internal weights of the neural network (as well as the readout weights, if any). This is often cumbersome and difficult to implement correctly, as one needs gradient information to apply the chain rule. Furthermore, backpropagation through time is prone to local minima.

Reservoir computing (RC) is a conceptually much simpler technique to train such recurrent neural networks [68]. Instead of modifying the internal weights, the original network is left as is, and only the readout weights are modified. The original network essentially becomes a computational black box. The outcome of this training procedure typically depends on a few parameters that define the regime of the neural network.

RC is known under different names, depending on the type of recurrent network that is trained. Most importantly, we distinguish liquid state machines [43] and echo state networks [34, 35]. The core idea of RC, originally applied only to neural networks, has since been extended to other nonlinear dynamical systems, leading to what we call PRC. There have been demonstrations of the RC approach applied to different domains such as photonics [67] and, more abstractly, electronics [3]. All these implementations share the common idea that a system with complex dynamics is perturbed externally but left untouched otherwise, and a simple readout mechanism is trained to perform the desired computational task. While the idea of PRC originated in the context of neural networks, recent theoretical results have extended the applicability of this computational framework immensely, showing that any dynamical system of a given size, obeying easily satisfied constraints, has the same computational power [17].

Let us start with the most common and straightforward implementation of RC, namely, echo state networks in combination with a linear readout layer.6 The discrete-time network dynamics are given by
\[ x[k+1] = \tanh\!\left(W_{\mathrm{res}}\, x[k] + W_{\mathrm{in}}\, u[k]\right), \]
\[ y[k] = W_{\mathrm{out}}\, x[k]. \]

There are two main applications of such a system. First, one can use it to approximate nonlinear filters by training the readout Wout. In this case, one normally scales Wres so that the system has the fading memory property (based on the spectral radius). Simply stated, this means that when the input is removed, the system dynamics will die out.

Secondly, RC can also be used to implement functions that do not necessarily have the fading memory property [42]. This can be achieved by feeding the output back into the system:
\[ x[k+1] = \tanh\!\left(W_{\mathrm{res}}\, x[k] + W_{\mathrm{in}}\, u[k] + W_{\mathrm{fb}}\, y[k]\right), \]
\[ y[k] = W_{\mathrm{out}}\, x[k]. \]
The feedback weights Wfb are typically chosen at random, and again, only Wout is trained. This system can be used to generate desired signals autonomously.

The first kind of task is clearly easier to train, as it is an open loop system. For signal generation tasks, small changes to the feedback weights can have a large influence. To imitate CPG signals with morphological computation, we need to consider the second approach.
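
Before moving to the physical implementation, the following minimal sketch may help fix ideas. A random network is driven by the teacher signal through the feedback weights while states are collected, the linear readout is then fitted by ridge regression, and finally the trained output replaces the teacher so that the network generates the signal on its own. The network size, scaling constants, regularizer, and sine-wave target are illustrative choices, not values from this article.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 200, 3000

# Echo state network with output feedback; scalings are illustrative.
W_res = rng.standard_normal((n, n)) / np.sqrt(n)
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # spectral radius 0.9
W_fb = rng.uniform(-1, 1, n)                              # random feedback weights

target = np.sin(2 * np.pi * np.arange(steps) / 50)        # signal to generate

# Teacher forcing: feed the *desired* output back while collecting states.
X = np.zeros((steps, n))
x = np.zeros(n)
for k in range(steps):
    x = np.tanh(W_res @ x + W_fb * target[k - 1 if k else 0])
    X[k] = x

# Train only the linear readout (ridge regression), discarding a warm-up.
warm = 200
W_out = np.linalg.solve(X[warm:].T @ X[warm:] + 1e-6 * np.eye(n),
                        X[warm:].T @ target[warm:])

# Free run: the trained output is fed back instead of the teacher.
x, y = X[-1].copy(), target[-1]
for k in range(200):
    x = np.tanh(W_res @ x + W_fb * y)
    y = W_out @ x        # should continue the oscillation autonomously
```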

Figure 6 shows how we used tensegrity structures for PRC. The force and its derivative on each spring are sampled and used as input x. The equilibrium length of a subset of the springs is chosen as feedback to the system. Differently from [25], we use only linear springs (Equation 6). In our experiments, the nonlinearities arising from the changing geometrical configuration and inertia are sufficient for good performance. We now define the system state (cf. Equation 19) for our setup:
\[ x(t) = \begin{bmatrix} f(t)^T & \dot{f}(t)^T & f(t-\Delta)^T & \dot{f}(t-\Delta)^T & \cdots & f(t-k\Delta)^T & \dot{f}(t-k\Delta)^T \end{bmatrix}^T, \]
where f(t) are the spring forces measured at time t, Δ = 20 ms is the controller time step, and k is the number of delay steps. For the tensegrity icosahedron simulations, we used k = 9 (maximum delay of 200 ms) and k = 3 for the snake robots. The main rationale for this is that it allows the feedback to filter out noise due to ground collisions to some degree (by averaging over the delayed inputs).7 One can see from Figure 22 in Appendix 1 that the time-delayed sensor information is indeed highly correlated. Using Hooke's law and Equation 6, each element of f(t) can be written as
\[ f_e(t) = k_e \max\!\left(0,\ \lVert n_i(t) - n_j(t) \rVert - l_{0,e}(t)\right). \]
We explicitly use the time index for the equilibrium lengths l_{0,e}(t) because the tensegrity structures we consider contain springs with varying equilibrium lengths. We shall call the subset of the springs of which the equilibrium length l_{0,e} can be modified either actuators, actuated springs, or motors. We call the subset of passive, fixed-equilibrium-length springs pas, and the subset of actuated springs act. The vector l_{0,pas} is a constant vector defined by the equilibrium state of the structure. The vector l_{0,act}(t) is time-varying and is given by
\[ l_{0,\mathrm{act}}(t) = \bar{l}_{0,\mathrm{act}} + l_{\max}\, g(y(t)). \]
Here l_max is the maximum change in equilibrium length of the springs (w.r.t. the initial lengths in l̄_{0,act}) allowed by the actuators. For this to work, we must have y(t) ∈ ℝ^a, with a being the number of actuated springs. Now y(t) will in general be a linear combination of x(t) and a constant bias input:
\[ y(t) = W \begin{bmatrix} x(t) \\ 1 \end{bmatrix}. \]
The goal of most of the algorithms we will study is to optimize the matrix W.

In the experiments presented in this article, we used g(y(t)) = tanh(y(t)). It is important to justify the use of a nonlinear function here, as such a nonlinearity can itself provide computational power (as in the RC approach). Therefore, we also tested the setup with a hard limit g(y(t)) = min(max(y(t), −1), 1) and with the identity function (no limit). Both cases provided quantitatively similar results to those presented in the experimental Section 5. The identity function was discarded because it does not guarantee boundedness of the feedback, and spurious sensor data can make the structures collapse. In practice, we noticed that with the identity function, the structure would operate correctly for (say) 30 s after training and then collapse because of an extreme sensor value during a ground collision. As explained in Appendix 2, ideal motors were assumed. However, a physical implementation will always be limited by the maximum offset of the motor, which validates the use of g(y(t)).

To conclude this section, we note that for our setup, x[k] from Equation 19 is replaced by sensor measurements from the tensegrity structure, and the output y(t) is a linear combination of these values. In contrast to the classic RC or echo state network implementation, the feedback enters the system through a physical modification of the system by modifying the equilibrium lengths of a set of actuated springs. The system itself is continuous-time, but the spring lengths are only updated at discrete time steps.
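
In code, one controller time step of this setup might look as follows. All dimensions, the weight values, and the helper names are hypothetical; the flow — concatenate the current and delayed spring forces and their derivatives plus a bias, apply the linear readout, squash with g, and offset the actuated equilibrium lengths — follows the equations above.

```python
import numpy as np
from collections import deque

s_pas, a, k_delay, DT = 24, 6, 3, 0.02       # illustrative sizes; 20-ms time step

rng = np.random.default_rng(2)
W = 0.1 * rng.standard_normal((a, 2 * s_pas * (k_delay + 1) + 1))  # stand-in for trained weights
l0_rest = np.full(a, 0.5)                    # actuated equilibrium lengths at rest
l_max = 0.1                                  # maximum actuator excursion

history = deque(maxlen=k_delay + 1)          # buffer of delayed sensor snapshots

def control_step(f, f_dot):
    """f, f_dot: measured spring forces and their derivatives (length s_pas)."""
    history.appendleft(np.concatenate([f, f_dot]))
    while len(history) < k_delay + 1:        # pad the buffer at startup
        history.append(history[-1])
    x = np.concatenate(list(history) + [np.ones(1)])   # state vector with bias
    y = W @ x                                # linear readout
    return l0_rest + l_max * np.tanh(y)      # new actuated equilibrium lengths

new_l0 = control_step(rng.random(s_pas), rng.standard_normal(s_pas))
```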

5 Experiments

The experimental section of this article consists of three parts. First, we introduce a set of algorithms to train tensegrity structures to produce rhythmic patterns. Next, we discuss possible applications for locomotion. We end with a comparison of different parameter combinations to study the importance of nonlinearities in the system.

5.1 Outsourcing Motor Pattern Generation

5.1.1 Recursive Least-Squares Approach

The first training algorithm we will consider is based on the recursive least squares (RLS) algorithm [36]. When the same samples are presented to the RLS algorithm, it will compute the same weights as batch linear regression (which we used in previous work [13] and is also used by [25]). The advantage of RLS is that it allows us to gradually transition from a completely teacher-forced structure (the desired signals are fed into the system) to a system generating its own control signals, and to restart training if needed.

There are two disadvantages in our opinion. First, one needs to update the matrix containing the covariances of all the input variables, which does not scale well.

The second and more fundamental disadvantage is the dependence on explicit knowledge of the target function, because one needs to know the difference (error) between the optimal motor signal and the current signal generated by the RLS algorithm. In a practical setting we do not always know the target signal and often only have some global performance measure at hand.

We now describe the RLS algorithm in detail. During training, the output signal is a mixture of the target output signal and the feedback output signal that is being trained. The influence of the target signal on the output signal is gradually reduced until the output signal is given by the trained feedback only:
\[ y(t) = \alpha_{\mathrm{rls}}(t)\, y_{\mathrm{target}}(t) + \left(1 - \alpha_{\mathrm{rls}}(t)\right) W_{\mathrm{rls}}(t - \Delta)\, x(t), \]
\[ \alpha_{\mathrm{rls}}(t) = \exp\!\left(-t / \tau_{\mathrm{rls}}\right). \]
At each time step the weights Wrls are updated using the RLS equations:
\[ e_{\mathrm{rls}}(t) = y_{\mathrm{target}}(t) - W_{\mathrm{rls}}(t - \Delta)\, x(t), \]
\[ g_{\mathrm{rls}}(t) = \frac{P_{\mathrm{rls}}(t - \Delta)\, x(t)}{1 + x(t)^T P_{\mathrm{rls}}(t - \Delta)\, x(t)}, \]
\[ P_{\mathrm{rls}}(t) = P_{\mathrm{rls}}(t - \Delta) - g_{\mathrm{rls}}(t)\, x(t)^T P_{\mathrm{rls}}(t - \Delta), \]
\[ W_{\mathrm{rls}}(t) = W_{\mathrm{rls}}(t - \Delta) + e_{\mathrm{rls}}(t)\, g_{\mathrm{rls}}(t)^T. \]

There is only a single parameter, namely the teacher-forcing decay time constant τrls. The covariance matrix Prls was initialized using the identity matrix. We note the difference from FORCE learning [62], in which initially chaotic systems are used. The main reason for this difference is that tensegrity structures are inherently damped, and to create chaos one would need a feedback loop to drive the system. From a practical point of view this might be inefficient, as one would need additional actuators that were only used to keep the system active. In this sense, the RLS approach used here is closer to the teacher-forcing approach [74]. In this approach, the desired output is fed into the system during training, and the state of the system, x(t), is stored. Then, regression is used to approximate the desired output from the system state. Finally, during testing, the approximate output based on the system state is fed back into the system, and the system will generate the desired patterns autonomously. The testing phase is also called the free run, as the system is no longer forced by the external input.

The gradual change from teacher forcing to free run, used in this work, allows the structure to take over the control in a smooth way and to restart learning in a straightforward way. We noticed that the RLS algorithm becomes unstable if learning continues with αrls too low (viz., αrls ≲ 0.03). So we simply switch to free run when αrls drops below the threshold. The most likely explanation for the instability is that it is caused by the phase drift between the output and the teacher signal when the system is unforced.
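
A compact sketch of this training scheme is given below. The exponential decay of the teacher-forcing coefficient and the free-run threshold of 0.03 follow the text; the RLS update itself is the textbook form with unit forgetting factor and identity initialization of P, which is our reading rather than a verbatim transcription of the article's equations, and τ_rls is illustrative.

```python
import numpy as np

def rls_step(W, P, x, y_target, t, tau_rls=60.0, alpha_min=0.03):
    """One control step with RLS learning and decaying teacher forcing."""
    alpha = np.exp(-t / tau_rls)             # teacher-forcing coefficient
    y_fb = W @ x                             # learned feedback output
    if alpha < alpha_min:                    # free run: stop adapting
        return W, P, y_fb
    e = y_target - y_fb                      # per-output error signal
    Px = P @ x
    g = Px / (1.0 + x @ Px)                  # RLS gain vector
    P = P - np.outer(g, Px)                  # covariance estimate update
    W = W + np.outer(e, g)                   # weight update
    return W, P, alpha * y_target + (1.0 - alpha) * y_fb

# Usage: with d the state size and a the number of actuators, initialize
# W = np.zeros((a, d)) and P = np.eye(d), call rls_step every 20 ms, and
# send the returned output to the actuated springs.
```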

A demonstration of the RLS approach is shown in Figure 7 and Figure 8. In this case, six actuators were used (i.e., six output dimensions). One can observe that the output signal gets out of phase with respect to the target signal, due to collisions with the ground. There is noise in the system, due to the control time step (20 ms) and the ground collisions.

Figure 7. 

Demonstration of the RLS algorithm. Six outputs were trained for 250 s, followed by 150 s of testing. Shown is the output at the end of the testing phase. The dashed line is the target signal, which is generated as in Equation 20. The solid line is the output signal, which is sent to the actuators. Note that the phase of the target signal is not matched, but that the relative phase of the outputs is fixed. This effect is due to the tensegrity structure synchronizing with its ground collisions.

Figure 8. 

Demonstration of the RLS algorithm as in Figure 7. Shown are two output dimensions out of six in total during 20 s of testing. In the electronic version, the light line is the target signal, the dark line the output signal. Clearly the system has learned the attractor robustly. The small perturbations are mostly due to ground collisions.

Figure 8 shows a phase portrait of two output signals from Figure 7. The output signals stay in phase with each other, which is important for locomotion. The RLS rule can capture the complex details of the target signals through the nonlinearities provided by the structure.

One might ask whether the system is simply vibrating along one of its shape modes. Such a result would not be useful, as for locomotion tasks we want the system to undergo large shape deformations. The shape modes of the tensegrity icosahedron that was used in this experiment (without the actuators) can be found in [23]. Figure 9 shows that the answer is no. In this example we simulated a tensegrity in free fall to prevent collisions and again trained a random motor pattern with nine actuators. The figure shows the complex trajectories of the endpoints of each bar.

Figure 9. 

Complex motor motion patterns learned by the tensegrity structure, based on random CPG signals. In the electronic version, shown in blue are the trajectories of the endpoints of each bar. In the electronic version, the bars are red, the springs are green, and the actuated springs are dashed lines (nine actuators). Compare with the shape modes from (e.g.) [23].

5.1.2 Gradient Descent Approach

To overcome the first disadvantage of the RLS algorithm, namely its computational complexity (which also limits its biological plausibility), we use stochastic gradient descent (GD) on the error signal. The following update is easily obtained by differentiating the quadratic error at a single time step; we replace the update of Wrls with
\[ W_{\mathrm{gd}}(t) = W_{\mathrm{gd}}(t - \Delta) + \alpha_{\mathrm{gd}}\, e_{\mathrm{rls}}(t)\, x(t)^T. \]
Because the learning rate αgd has to be chosen small enough to prevent instability, the GD rule converges more slowly in practice than the RLS rule.
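
A sketch of the corresponding update, usable as a drop-in replacement for the RLS step above (the learning rate is illustrative):

```python
import numpy as np

def gd_step(W, x, y_target, alpha_gd=1e-3):
    """Stochastic gradient descent on the instantaneous quadratic error."""
    e = y_target - W @ x                     # same error signal as in RLS
    return W + alpha_gd * np.outer(e, x)     # delta-rule weight update
```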

5.1.3 Eliminating the Teacher: Reward-Modulated Hebbian Approach

For various reasons one might prefer to use only a single reward signal instead of an error function per output. We might not be able to construct a suitable error function, for example when an error measure is only available at a non-actuated spring. It is also biologically more plausible to have only a limited number of reward signals. We will only consider instantaneous rewards, but extensions are possible by keeping track of eligibility traces [20].

Note that the essence of the gradient rule is that the weight changes should follow the correlation between the input variables and the error signal. A reward can be interpreted as the inverse of an (absolute) error. The reward is large (in amplitude) when the error is small, and vice versa. So instead of doing gradient descent on the error signal, we can equivalently do gradient ascent on the reward signal.

The meaning of a large reward is ambiguous, because to replace erls(t) with some measure of the reward, we need the scalars in this vector to take positive values when reinforcing the weights to this output would increase the reward and vice versa, while a reward can have an arbitrary offset.8 So the trick is that we need to subtract the baseline performance from the reward, or, stated differently, we need to know how surprising a reward is [57].

We often do not know the reward signal explicitly. Hence, we cannot find an analytic form of the derivative of the reward. The trick to overcome this is to use finite differences to estimate the derivative of the reward. For this we add random noise to the output and observe changes in the reward signal. The reward signal is usually one-dimensional, so we need to find out which weight should be reinforced.

Legenstein et al. recently used a learning rule based on these observations for training large (compared to our tensegrity structures) neural networks [38]. In addition, they also assumed the noise signal to be unknown and estimated the noise from the system output. We assume the noise signal to be known and use a reward-modulated Hebbian learning rule similar to the one from [20, 40, 41]. Among these, [41] also provides non-Hebbian variations on this rule.

The reward-modulated Hebbian rule (RMH) we used is given by
\[ W_{\mathrm{rmh}}(t) = W_{\mathrm{rmh}}(t - \Delta) + \alpha(t)\left(R(t) - \bar{R}(t)\right) \nu(t)\, x(t)^T, \]
where R(t) is the (instantaneous) reward and R̄(t) is the short-term average (baseline) of the reward. Here R̄(t) was computed by taking the average of the rewards during the last 100 ms. The reward R(t) should be a monotonic function of the error, that is, the reward should decrease if the error increases. For the injected noise ν(t), Gaussian white noise (GWN) was used with a decreasing standard deviation as a function of time. The noise ν(t) is not only used to update the weights, but also fed into the structure (y(t) = Wrmh(t − Δ)x(t) + ν(t)). Indeed, if this were not the case, one would need some critic that provides rewards based on hypothetical motor outputs. This rule reinforces weights according to how the reward and the inputs covary, which is why this rule is called Hebbian-like. It is similar to the classic Hebb rule, but for reward and neural activity [27].
Legenstein et al.'s EH rule is given by
\[ W_{\mathrm{eh}}(t) = W_{\mathrm{eh}}(t - \Delta) + \alpha(t)\left(R(t) - \bar{R}(t)\right)\left(y(t) - \bar{y}(t)\right) x(t)^T, \]
where y(t) − ȳ(t) approximates ν(t) under the assumption that y(t) varies smoothly, with ȳ(t) a short-term average of the output.
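
Both rules can be stated compactly in code. As sketched here under the forms given above (the learning rate and averaging windows are illustrative), they scale a Hebbian-like outer product by how much the reward exceeds its short-term baseline, and differ only in whether the exploration noise is known or estimated from the output.

```python
import numpy as np

def rmh_step(W, x, nu, reward, reward_avg, alpha=1e-4):
    """RMH update: nu is the known exploration noise that was added to
    the output, y = W @ x + nu; reward_avg is the short-term baseline."""
    return W + alpha * (reward - reward_avg) * np.outer(nu, x)

def eh_step(W, x, y, y_avg, reward, reward_avg, alpha=1e-4):
    """EH variant: the noise is not known and is estimated as the
    deviation of the output y from its short-term average y_avg."""
    return W + alpha * (reward - reward_avg) * np.outer(y - y_avg, x)
```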

The learning rule from Equation 32 or Equation 35 can be used in two ways. First, we can simply use it to replace the RLS or GD rule, when outsourcing the computation to the structure. In this case we still need some teacher to drive the system during learning, which limits its practical use, and it is more or less a replacement for the GD rule. Secondly, we can use it to train feedback without knowledge of the target signal at the neural level.

To apply RMH or similar techniques in a recurrent neural network, one typically starts from a chaotic network [28, 62], and the trained feedback drives the network toward a cyclic attractor. However, it is reasonable to assume that for robotics applications, chaotic movements might be undesired. Therefore, we took a slightly different training approach. We first trained the system (using RLS) to maintain an oscillatory pattern while noise was injected through additional actuators. Hence, we obtained robust but not chaotic patterns. There are, however, variations in the oscillations caused by the injected noise. Then, learning through RMH starts on the additional actuators.

One might argue that the use of RLS at this point negates the advantage of RMH. However, RLS is only used to keep the system active during RMH learning, and the target signals of RLS and RMH are independent (except for the fundamental frequency). A simple oscillator (e.g., a sine wave or coupled neurons) could also be used instead of a trained feedback controller. In a typical RC setup (with hyperbolic tangent neurons), it is possible to scale up the connection weights to start the learning process in a chaotic regime. In the case of tensegrity structures, we tried using a random feedback loop, which we then scaled to find a chaotic regime. Unfortunately, while doing this the structures often collapsed or did not stay active, and we thus concluded that this method would be cumbersome on a real platform.

The presented approach can be useful in robotic applications in which there is already some oscillatory behavior in the system. This can for example be generated by a very simple CPG signal. The RMH algorithm can then directly be applied, for example, to refine the motion. Hence, it is one possible application of the combination of a simplified CPG with our approach. The basic movements can also be provided by a controller based on linearized dynamics, where again RMH can be used to optimize the match between the actual plant and the linearized model.

In Figure 10, three major phases of training using the reward-based technique are shown. Here two feedback loops were trained using the RMH rule (out of a total of eight). RLS was used to train a random motion pattern (with the same frequency) on the first six outputs during 200 s (left graph). Then RMH learning starts, and initially the target signals are not at all matched. During training (middle graph), the outputs start to match the desired signal more closely, yet there is still some visible error. During testing (right graph), the noise source is disabled and the output almost exactly matches the desired signal. In this example, the tensegrity was in free fall to remove the disturbances from ground collisions so as to show that the desired signal can be closely matched.

Figure 10. 

The RMH algorithm during training and testing. Training of the two RMH feedback loops starts after 200 s of training with RLS to maintain activity in the system. The tensegrity was in free fall to clearly show the difference between the three phases without influence from ground collisions. A random six-bar tensegrity was used. The exploration noise decreased linearly as a function of time.

Figure 11 shows a phase portrait of the two trained outputs during 40 s of testing, compared with the desired output. The target signal is almost perfectly matched. Figure 12 shows how the RMH rule performs gradient ascent on the reward signal. The signals were smoothed over 2 s to show the evolution of the reward. The middle graph shows the reward signal with the (estimated) baseline removed. For convergence, the (short-term) mean of this signal should approach zero, as otherwise the magnitude of the weights will continue to rise or oscillate. For Figure 11 an informative reward was used, namely, the negative sum of absolute errors of both signals:
\[ R(t) = -\left( |e_1(t)| + |e_2(t)| \right). \]
Figure 11. 

Plot of the two trained outputs with the RMH algorithm. The system was trained for 2000 s. In the electronic version, yellow shows the target signal, and black shows the output signal during the last 40 s of testing.

Figure 12. 

The reward modulated Hebbian algorithm performing gradient ascent on the reward. The signals were smoothed by averaging over 2 s. From left to right: reward signal, reward signal minus its short term average, reward signal without exploration noise. The reward signal minus the short term average should approach 0 to ensure convergence of the weights.

Such an informative reward signal as in Equation 34 need not be available for the RMH or EH rule to work. In Figure 13, we applied a delta rule version of the EH rule (the noise is estimated) with a less informative reward signal:
formula
formula
The result is clearly less precise than with the RMH rule (Figure 11) and learning is slower. However, no knowledge of the noise is used in the learning rule, and the information contained in the reward signal is limited. The delta version can further reduce the computational/communication power needed, as only a single binary signal needs to be exchanged. In  Appendix 1, we provide a method to reduce the communication load from the sensors to the motors.
Figure 13. 

Plot of the two trained outputs with the EH rule and a less informative reward signal (Equation 36). Hence no knowledge of the target signal, the noise, or the precise reward is used. The system was trained for 3000 s. In the electronic version, yellow shows the target signal, and black shows the output signal during the last 40 s of testing.

5.2 Applications

In this section, we present a set of practical applications of morphological computation in tensegrity structures. We first show that the structure can modulate its gait patterns when we change the equilibrium lengths of a few springs. Next we look at gait optimization. We optimize the gait pattern with an external controller and then outsource the resulting gait to our static, linear controller. Finally, we discuss a basic end-effector control application.

5.2.1 Modulating Motor Patterns

An important question is whether the trained tensegrity structures can react by adapting their gait to different configurations of the structure or, for example, to the slope of a hill. To test this, we added a single input signal to the system. This signal was fed into the tensegrity structure by modifying the equilibrium length of two actuated springs. The target motor patterns had to be modulated by the structure to linearly interpolate between two CPG patterns with the same frequency.

We again used the tensegrity icosahedron to show that such modulation is possible even in relatively small systems. Figure 14 shows a result from a run of the algorithm. We trained the system for only 400 s. At each time step, the system switches to another random input with probability 0.005. So the time between gait changes is variable. This also shows the robustness of the system, because accidental fast switches between input states disturb the system.

Figure 14. 

Modulating gait patterns through morphological computation. A single input signal was applied to the system by modifying the equilibrium length of two springs. The structure had to interpolate linearly between two CPG signals (three-dimensional) with the same fundamental frequency. The system was trained for 400 s with random inputs. The phase is not perfectly matched, because fast input changes disturb the system. Note that both the signal offset and shape are changed. A random tensegrity with six bars was used.

5.2.2 Gait Optimization

Gait optimization in robots is a complex problem, because small changes to, for example, the relative phase of two limbs or the duration of support phases can result in different locomotion patterns or failure in legged robots (see, e.g., [1] for reviews of animal gait patterns). For example, an animal typically positions its legs during locomotion to reduce the magnitude of joint moments and thus the required muscle forces [6].

Optimizing all aspects of gait properties is beyond the scope of this article. We assume the robot's configuration to be known, as well as the CPG frequency. Figure 15 gives an overview of the training procedure we will follow. Our goal will be to optimize the weights of the matrix Wtarget for a given basic CPG. We will then outsource the optimal gait to the structure with the RLS algorithm. As we will see, the gaits obtained using only morphological computation match those obtained during training. Thus, the structure can approximate the required motor patterns well enough for locomotion.

Figure 15. 

Overview of the training principle for gait optimization. We use CMA-ES to optimize the CPG pattern and then apply RLS to train a feedback to approximate this target pattern using morphological computation. If the robot has rich enough dynamics, the same gait will be obtained after outsourcing all the computations to the body.

To optimize Wtarget we use the well-known CMA-ES algorithm [24]. The reason for this is that it is almost parameter free and has very good performance. The fitness function we use is simply the distance traveled by the center of mass of the tensegrity. Because of the compliance of the tensegrity structures, we do not need to include penalties, for example, for falling.

Figure 16 shows the trajectory of the center of mass of three different tensegrity structures. On the left, we see the tensegrity icosahedron with a number of additional springs. Remarkably, the gait was obtained after only 10 iterations of the CMA-ES algorithm. The population size was 50, and there were four actuators. The gait was evaluated over a period of 30 s. This means that only 4 h of exploration time would be necessary to obtain this locomotion pattern on a real robot.
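
A sketch of this optimization loop, using the pycma implementation of CMA-ES, is given below. The function simulate_robot is a hypothetical placeholder for the physics simulation (returning the distance traveled by the center of mass over 30 s), and the dimensions are illustrative.

```python
import numpy as np
import cma                                   # the pycma package

N_ACTUATORS, N_CPG = 4, 12                   # illustrative dimensions (+1 bias below)

def simulate_robot(W_target, duration=30.0):
    """Hypothetical placeholder: run the robot with CPG mixing weights
    W_target for `duration` seconds and return the distance traveled
    by its center of mass."""
    return float(np.linalg.norm(W_target))   # dummy value; plug in the simulator here

def gait_fitness(w_flat):
    W_target = w_flat.reshape(N_ACTUATORS, N_CPG + 1)
    return -simulate_robot(W_target)         # CMA-ES minimizes, so negate the distance

es = cma.CMAEvolutionStrategy(np.zeros(N_ACTUATORS * (N_CPG + 1)), 0.5,
                              {'popsize': 50, 'maxiter': 10})
while not es.stop():
    candidates = es.ask()                    # sample a population of weight matrices
    es.tell(candidates, [gait_fitness(w) for w in candidates])
W_best = es.result.xbest.reshape(N_ACTUATORS, N_CPG + 1)
```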

Figure 16. 

Robot trajectories (center of mass) for three runs of the algorithm on different structures (see text). In the electronic version, in red (dotted) the trajectory during training; in blue the trajectory during testing. Morphological computation is powerful enough to maintain the same gait that was found by optimizing the external CPG.

The two other plots are from snakelike tensegrity structures that were constructed by stacking tensegrity prisms. Figure 17 shows the center structure in action, while Figure 3 is the tensegrity from the right structure in our simulator.

Figure 17. 

A complex snakelike structure controlled using morphological computation. Shown is the structure during locomotion with one of the found gait patterns. There are large shape deformations from the equilibrium state and a total of 20 actuators.

To show that the same gait is indeed maintained, we compared (Figure 18) the (vertical) ground reaction forces on the endpoints during training and testing of the large snakelike tensegrity from Figure 17. This system has 20 actuators in total. Due to the complexity of the structure, there is some variation in the ground reaction forces, but there is a clear pattern. The relative phase between the ground contacts is identical during training and testing. The training sample is taken from the beginning of the training (almost completely teacher forced), while the testing sample is from the end of testing (free run).

Figure 18. 

Vertical ground reaction forces during training (left) and testing (right) for the structure from Figure 17. Note the variation in the signals. The same gait is maintained during testing.

5.2.3 End-Effector Control

To end this applications section, we show that the same technique can also be used to control an end effector. The objective is now to control the position of the endpoint of a bar with respect to two other endpoints. For this, we measure the lengths of two springs connecting the endpoint of the bar to the endpoints of the other bar. Imagine controlling the position of the wrist with respect to the shoulder.

We assume no model of the system is known and use CMA-ES to optimize Wtarget. The CPG has the same frequency as the target movement. Because the CPG provides only a limited number of basis signals and the structure is underactuated, the target trajectory cannot be expected to be matched perfectly. In this example, we used a 30-dimensional CPG, based on a connection pattern from a stacked tensegrity prism.

To compute the fitness, we simulated the system for 100 s and computed the mean squared error over the last 80 s. The system was in free fall, and the springs along which we measured the position were not actuated. A tensegrity icosahedron with a total of 13 actuators (Figure 19) was used for this example (24 degrees of freedom, because the rigid body movements are ignored).
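
A minimal sketch of this fitness computation follows; simulate_lengths and target_trajectory are hypothetical interfaces returning the measured and desired spring lengths on a fixed time grid.

```python
import numpy as np

def end_effector_fitness(w_flat, dt=0.02):
    """Mean squared error between measured and target spring lengths
    (l1, l2) over the last 80 s of a 100 s simulation."""
    lengths = simulate_lengths(w_flat, duration=100.0, dt=dt)  # (T, 2), assumed
    target = target_trajectory(duration=100.0, dt=dt)          # (T, 2), assumed
    skip = int(20.0 / dt)  # discard the first 20 s (transient)
    return np.mean((lengths[skip:] - target[skip:]) ** 2)
```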

Figure 19. 

End-effector control in a tensegrity robot. In this example, we seek to control the lengths l1 and l2 of two springs, that is, the relative position of the endpoint of a bar (large dot) w.r.t. two other nodes (n1, n2) of the structure.

The result is shown in Figure 20. While the target trajectory cannot be perfectly matched (due to underactuation and the limitations of the CPG), the result is very encouraging. The end effector is part of the computational system itself, and the springs along which the position is measured influence the system as well. We only used 75 s of training, using RLS to transfer the control from the external CPG to morphological computation.
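
The transfer step can be sketched as an online RLS loop. The interfaces sensors, cpg_target, and apply_motors are hypothetical, and the linear fade-out of the teacher signal is an assumption, not a detail specified above:

```python
import numpy as np

def rls_transfer(sensors, cpg_target, apply_motors, n_steps, dim_x, dim_y,
                 lam=1.0, delta=1e2):
    """Transfer control from the external CPG to a linear sensor feedback.

    `sensors`, `cpg_target`, and `apply_motors` are hypothetical simulator
    interfaces; the linear teacher fade-out is an assumed schedule.
    """
    W = np.zeros((dim_y, dim_x))
    P = np.eye(dim_x) * delta  # estimate of the inverse input correlation
    for t in range(n_steps):
        x = sensors(t)                 # sensor values (spring forces, ...)
        y_teacher = cpg_target(t)      # target motor signal from the CPG
        alpha = min(1.0, t / n_steps)  # teacher forcing -> free run
        apply_motors((1 - alpha) * y_teacher + alpha * (W @ x))
        # Standard RLS update of the readout weights (see, e.g., [36]).
        k = P @ x / (lam + x @ P @ x)
        P = (P - np.outer(k, x @ P)) / lam
        W += np.outer(y_teacher - W @ x, k)
    return W
```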

Figure 20. 

Trajectory of the end effector during testing after 75 s of training.

5.3 The Importance of Complex Dynamics

To complete this experimental section, we want to show that the nonlinearities can indeed improve the computational power of the system. Such a statement is of course task-dependent; for example, to generate sine waves, it is obviously not advantageous to have complex nonlinear dynamics in the system. We will again consider the generation of CPG-like signals based on the Matsuoka-type nonlinear oscillator in combination with the tensegrity icosahedron.

As Sultan et al. [61] indicate, the linearized dynamics of tensegrity deviate more from the nonlinear dynamics of the system at higher (generalized) velocities and lower pretension (i.e., when the system is more flexible or compliant). While typically one would restrict the velocities and deformations of the system so that the linearized dynamics are a good model of the system, our technique benefits from the opposite.

Many parameters of the structure can be tuned, and optimizing the configuration of the structure itself is a daunting task. In this section we only consider the importance of two parameters: the oscillator frequency and the maximal change of the actuator equilibrium length. One can easily define physically plausible regions of operation for both parameters (see  Appendix 2), and we would like to know whether computational performance changes significantly within these regions.

The task we consider is again the simulation of 12-dimensional random Matsuoka-type nonlinear oscillators. The tensegrity icosahedron with a random number of actuators is used, varying from four to eight motors. We swept the frequency in steps of 0.1 Hz from 0.1 to 3 Hz. The maximum spring equilibrium offset (lmax) was varied in steps of 3.5 cm from 5.5 to 37 cm. For each tuple (frequency, distance), we performed 50 trials, for a total of 15,000 trials. We computed the normalized mean squared error, defined as
$$\mathrm{NMSE} = \frac{\sum_{i=1}^{N} (x_i - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2},$$
where N is the number of samples, x is the vectorized output, y is the vectorized target signal, and ȳ is the mean of the target. For each set of 50 trials, only the best 30 are kept, to prevent failures (e.g., collapsing) from influencing the results. The results are shown in Figure 21.
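
Assuming the NMSE form given above, the per-trial error and the per-tuple aggregation (averaging the retained trials is our assumption) can be written as:

```python
import numpy as np

def nmse(x, y):
    """Normalized mean squared error between output x and target y."""
    return np.sum((x - y) ** 2) / np.sum((y - np.mean(y)) ** 2)

def tuple_score(errors, keep=30):
    """Aggregate one (frequency, offset) tuple: keep the best `keep` of
    the 50 trials so that failed (e.g., collapsed) runs do not dominate."""
    return float(np.mean(np.sort(np.asarray(errors))[:keep]))
```
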
Figure 21. 

Exploiting nonlinearity. Plots of the normalized mean squared error of the first 10 s of testing after training with RLS, as a function of the oscillator frequency and the maximum spring equilibrium offset. Top: contour plot showing the different regions. Bottom: result for each combination (frequency, maximum offset). The frequency is swept from 0.1 to 3 Hz in steps of 0.1 Hz, and the distance from 5.5 to 37 cm in steps of 3.5 cm. All tests are performed on the tensegrity icosahedron with randomly actuated spring connections (from four to eight motors). For each (frequency, distance) tuple, 50 trials were performed (15,000 total), of which the best 30 were retained, to reduce the influence of marginal cases. The target was a linear combination of random Matsuoka-type oscillators (12-dimensional). For the task at hand, the system clearly benefits from increasing the frequency of the oscillator and the maximum offset. Very good (computational) results are obtained for a region (bottom right) with only a small offset. This region might however not be suited for locomotion applications (because of limited shape changes).

So what can we learn from this? First, we see that for the task at hand it is advantageous to work in a nonlinear regime, by increasing the frequency of the oscillator or the maximum spring equilibrium offset. It is important to note that although the frequency is a determining factor, the technique is not constrained to the natural frequency of the system: There is a broad region of frequencies with similar performance. One might also consider the bottom right region of operation, with only very small amplitudes, but its practical use is limited, as the resulting movement of the robot is very small.

On the other hand, going beyond the 30 cm range often causes instability (collapsing) and will in practice cause bars to collide. Performance will furthermore be restricted by a diagonal line running from near the top left down to the bottom right, because of practical limitations such as motor output power. Within the resulting region, better performance can be obtained by increasing the frequency or the maximum spring equilibrium offset.

Interestingly, for the lower frequency range (which might be interesting for energy efficiency reasons) it is advantageous to increase the maximum offset. Larger deformations of the structure cause the error to decrease.

6 Discussion

Compliant robots have been of interest to the robotics community for over a decade, and we have seen many exciting examples of very simple control laws leading to complex behavior. Compliance offers multiple advantages over classic, stiff robotics: safer human-robot interaction, increased energy efficiency, robustness against external perturbations, and simpler control.

Notable examples of compliant robots with very simple control laws are Puppy [29], the Reservoir Dog [73], Wanda [78], and, more recently, the walking and hopping robot of [52]. The control of the Reservoir Dog on irregular and unknown terrain was simply based on a sine wave with a different phase and offset for each leg, whereas a comparable stiff robot would need complex sensory equipment and an elaborate controller [11].

In this work, we used tensegrity structures to model compliant systems. However, two important remarks need to be made.

First, the exact dynamics of the system need not be explicitly known to the learning algorithm. This is the underlying idea of reservoir computing: A dynamic system can be used as a computational black box, encoding a nonlinearly expanded history of environment interactions in the instantaneous state of the system. Such an abstraction has many advantages, as we can change substrates or construct hybrid systems while still using the same readout learning algorithms. It also does not define how the readout mechanism is actually implemented, and would allow, for example, a neural substrate, electrical wiring, or mechanical connections.

Second, we can exploit the fact that historically tensegrity structures have been used to model a plethora of complex systems from the micro to the macro scale. Even though tensegrity structures were initially used only in art and architecture, they have since been successfully applied as a model for cellular cytoskeleton structures [31, 32, 69]. At the micro scale, the equations of motion are different and their exact form is often unknown, but we still find compressive elements (e.g., microtubules) and tensile elements (e.g., microfilaments). Inside a single-celled organism, there is no central nervous system; instead, chemical and mechanical interactions determine the cell's behavior, and flagella or cilia allow locomotion [39, 54, 64, 65]. Small organisms such as nematodes are often capable of rich movement patterns and interaction with the environment while possessing only very simple nervous systems [18, 19]. Based on this, we hypothesize that our results could provide insight into the fundamental mechanisms by which simple organisms perform the computations and locomotion required for their survival.

When taking a higher-level viewpoint on the nature of certain aspects of cognition and computation, our results can offer additional, empirically validated arguments in the quest for understanding cognition in biological organisms. Indeed, we have given several examples of systems in which the computational aspects of locomotion are for the most part physical in nature, making the structures discussed here prime examples of the idea of embodied cognition. Moreover, our analyses allow us to quantify the nature of the computation occurring in the substructures, most notably the controller and the physical system. This viewpoint is in our opinion applicable to many of the interactions between the body, sensory inputs, and early cognitive layers, but will probably not suffice to fully explain the complete array of cognitive capabilities of human-level intelligence.

When considering cognition as performing computation in the broadest sense, it is clear from our results that this computation is very much divided across the explicit linear control and the implicit nonlinear transformations of interactions with the environment, mediated by the physical properties of the structures. Indeed, the idea underlying the principle of PRC is precisely that the range of possible dynamical systems that can be used for computation is extremely broad, as are their properties regarding nonlinearity or memory. This is not merely a philosophical conjecture: A mathematical framework supporting this claim was recently introduced and proved in [17], showing that any dynamical system of a given size performs the same amount of computation, simply realizing different functions of its external perturbations. We would therefore propose that the question of the true seat of cognitive computation (mental or physical) is rather ill posed, and that the truth probably lies somewhere in between. Instead of viewing sensing and cognition as separate but linked entities, we propose that across organisms, or even within a single organism, the distinction between mental cognition and strictly embodied cognition cannot be drawn sharply and instead lies on a continuum.

7 Conclusions

In this work, we have introduced an extreme form of embodiment allowing for so-called physical reservoir computing in a very explicit sense. We demonstrated this by using highly dynamic, actuated tensegrity structures that effectively compute functions of the history of their environmental interactions. This allows simple linear learning rules, with varying degrees of reward information, to learn complex locomotion patterns or desired end-effector trajectories.

This provides a number of advantages from a robotics standpoint: The control complexity can be greatly reduced, very uninformative reward signals suffice to train complex pattern generators, and the learned control law is robust to perturbations and can easily synchronize with environmental interactions.

But from a conceptual point of view, the conclusions are more profound. By demonstrating that dynamic “bodies” only require extremely simple “brains” to implement computations, we effectively opened up a whole spectrum of potential tradeoffs between brain-based computation and body-based computation. The powerful computational results from the field of reservoir computing [17, 26, 35, 43] can then be used to actually quantify and reason about the computations implemented by the physical body.

Acknowledgments

This research was funded by a Ph.D. fellowship of the Research Foundation—Flanders (FWO) and the European Community's Seventh Framework Programme FP7/2007-2013 Challenge 2 Cognitive Systems, Interaction, Robotics under grant agreement No. 248311—AMARSi. The authors would like to thank Robert Legenstein for providing the original reward-modulated Hebbian learning code from [38].

Notes

1. With reasonable assumptions, they can be used to approximate any nonlinear filter with fading memory.

2. There is some question whether Snelson [59] or Fuller [21] invented the tensegrity concept (see [47, p. 221] or [60] for a discussion).

3. We have simplified the notation by only considering bars with uniform mass, described by their center of mass.

4. We will also consider this description in  Appendix 3 for the feedback linearizability of a single bar attached to springs.

5. We use the notation x to denote a scalar, x for a vector, and X for a matrix.

6. Often a bias input is added, which makes the equations nonsymmetric. The nonlinearity tanh is most often used, but variants are possible.

7. We also ran a number of simulations with k = 0 (see also [13]) to verify that the system does not depend fundamentally on this delayed sensor information. This was indeed confirmed, but the ground collisions tend to render figures such as Figure 8 less intelligible.

8. In fact this is not entirely true, because one can restrict the learning rules to reinforcements only, as in [28]; but we allow for both positive and negative weights here.

References

1. Alexander, R. M. (2003). Principles of animal locomotion. Princeton, NJ: Princeton University Press.
2. Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial Intelligence, 149(1), 91–130.
3. Appeltant, L., Soriano, M. C., Van Der Sande, G., Danckaert, J., Massar, S., Dambre, J., Schrauwen, B., Mirasso, C. R., & Fischer, I. (2011). Information processing using a single dynamical node as complex system. Nature Communications, 2, 468.
4. Barbeau, H., & Rossignol, S. (1987). Recovery of locomotion after chronic spinalization in the adult cat. Brain Research, 412(1), 84–95.
5. Beal, D. N., Hover, F. S., Triantafyllou, M. S., Liao, J. C., & Lauder, G. V. (2006). Passive propulsion in vortex wakes. Journal of Fluid Mechanics, 549(1), 385–402.
6. Biewener, A. A. (2003). Animal locomotion. Oxford, UK: Oxford University Press.
7. Bliss, T. K. (2011). Central pattern generator control of a tensegrity based swimmer. Ph.D. thesis, University of Virginia.
8. Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6(1–2), 3–15.
9. Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47(1–3), 139–159.
10. Brown, P. N., Byrne, G. D., & Hindmarsh, A. C. (1989). VODE: A variable-coefficient ODE solver. SIAM Journal on Scientific and Statistical Computing, 10(5), 1038.
11. Byl, K., Shkolnik, A., Prentice, S., Roy, N., & Tedrake, R. (2009). Reliable dynamic motions for a stiff quadruped. In O. Khatib, V. Kumar, & G. Pappas (Eds.), Experimental robotics (pp. 319–328). Berlin: Springer.
12. Calladine, C. (1978). Buckminster Fuller's tensegrity structures and Clerk Maxwell's rules for the construction of stiff frames. International Journal of Solids and Structures, 14(2), 161–172.
13. Caluwaerts, K., & Schrauwen, B. (2011). The body as a reservoir: Locomotion and sensing with linear feedback. In R. Pfeifer, H. Sumioka, R. M. Füchslin, H. Hauser, K. Nakajima, & S. Miyashita (Eds.), 2nd International Conference on Morphological Computation (pp. 45–47).
14. Clark, A. (1997). Being there: Putting brain, body and world together again. Cambridge, MA: MIT Press.
15. Connelly, R. (1999). Tensegrity structures: Why are they stable? In M. F. Thorpe & P. M. Duxbury (Eds.), Rigidity theory and applications (pp. 47–54). New York: Plenum Press.
16. Connelly, R., & Back, A. (1998). Mathematics and tensegrity. American Scientist, 86(2), 142–151.
17. Dambre, J., Verstraeten, D., Schrauwen, B., & Massar, S. (2012). Information processing capacity of dynamical systems. Scientific Reports, 2, 514.
18. De Bono, M., & Maricq, A. V. (2005). Neuronal substrates of complex behaviors in C. elegans. Annual Review of Neuroscience, 28, 451–501.
19. Ferrée, T. C., & Lockery, S. R. (1999). Computational rules for chemotaxis in the nematode C. elegans. Journal of Computational Neuroscience, 6(3), 263–277.
20. Fiete, I. R., & Seung, H. S. (2006). Gradient learning in spiking neural networks by dynamic perturbation of conductances. Physical Review Letters, 97(4).
21. Fuller, R. B. (1975). Synergetics: Explorations in the geometry of thinking. New York: Scribner.
22. Grillner, S., & Zangger, P. (1979). On the central generation of locomotion in the low spinal cat. Experimental Brain Research, 34(2), 241–261.
23. Guest, S. D. (2010). The stiffness of tensegrity structures. IMA Journal of Applied Mathematics, 76(1), 57–66.
24. Hansen, N. (2006). The CMA evolution strategy: A comparing review. In J. A. Lozano, P. Larrañaga, I. Inza, & E. Bengoetxea (Eds.), Towards a new evolutionary computation (Chap. 4, pp. 75–102). Berlin: Springer.
25. Hauser, H., Ijspeert, A. J., Füchslin, R. M., Pfeifer, R., & Maass, W. (2012). The role of feedback in morphological computation with compliant bodies. Biological Cybernetics, 106, 595–613.
26. Hauser, H., Ijspeert, A. J., Füchslin, R. M., Pfeifer, R., & Maass, W. (2012). Towards a theoretical foundation for morphological computation with compliant bodies. Biological Cybernetics, 105, 355–370.
27. Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
28. Hoerzer, G. M., Legenstein, R., & Maass, W. (2012). Emergence of complex computational structures from chaotic neural networks through reward-modulated Hebbian learning. Submitted to Cerebral Cortex, DOI 10.1093/cercor/bhs348.
29. Iida, F., & Pfeifer, R. (2006). Sensing through body dynamics. Robotics and Autonomous Systems, 54(8), 631–640.
30. Ijspeert, A. J. (2008). Central pattern generators for locomotion control in animals and robots: A review. Neural Networks, 21(4), 642–653.
31. Ingber, D. E. (1997). Tensegrity: The architectural basis of cellular mechanotransduction. Annual Review of Physiology, 59(1), 575–599.
32. Ingber, D. E. (2003). Tensegrity I. Cell structure and hierarchical systems biology. Journal of Cell Science, 116(7), 1157–1173.
33. Ingber, D. E. (2003). Tensegrity II. How structural networks influence cellular information processing networks. Journal of Cell Science, 116(8), 1397–1408.
34. Jaeger, H. (2001). The echo state approach to analysing and training recurrent neural networks (Technical Report GMD 148). German National Research Center for Information Technology.
35. Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78–80.
36. Kailath, T., Sayed, A. H., & Hassibi, B. (2000). Linear estimation. Englewood Cliffs, NJ: Prentice Hall.
37. Kanso, E., & Newton, P. K. (2009). Passive locomotion via normal-mode coupling in a submerged spring-mass system. Journal of Fluid Mechanics, 641, 205.
38. Legenstein, R., Chase, S. M., Schwartz, A. B., & Maass, W. (2010). A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task. Journal of Neuroscience, 30(25), 8400–8410.
39. Lenaghan, S. C., Davis, C. A., Henson, W. R., Zhang, Z., & Zhang, M. (2011). High-speed microscopic imaging of flagella motility and swimming in Giardia lamblia trophozoites. Proceedings of the National Academy of Sciences of the United States of America, 108(34), E550–E558.
40. Loewenstein, Y. (2008). Robustness of learning that is based on covariance-driven synaptic plasticity. PLoS Computational Biology, 4(3), 10.
41. Loewenstein, Y., & Seung, H. S. (2006). Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proceedings of the National Academy of Sciences of the United States of America, 103(41), 15224–15229.
42. Maass, W., Joshi, P., & Sontag, E. D. (2007). Computational aspects of feedback in neural circuits. PLoS Computational Biology, 3(1), 20.
43. Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560.
44. Mast, S. O. (1923). Mechanics of locomotion in amoeba. Proceedings of the National Academy of Sciences of the United States of America, 9(7), 258–261.
45. Matsuoka, K. (1985). Sustained oscillations generated by mutually inhibiting neurons with adaptation. Biological Cybernetics, 52(6), 367–376.
46. Matsuoka, K. (1987). Mechanisms of frequency and pattern control in the neural rhythm generators. Biological Cybernetics, 56(5–6), 345–353.
47. Motro, R. (2003). Tensegrity: Structural systems for the future. Burlington, MA: Butterworth-Heinemann.
48. Oja, E. (1982). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15(3), 267–273.
49. Paul, C. (2006). Morphological computation: A basis for the analysis of morphology and control requirements. Robotics and Autonomous Systems, 54(8), 619–630.
50. Paul, C., Roberts, J. W., Lipson, H., & Valero Cuevas, F. J. (2005). Gait production in a tensegrity based robot. In ICAR '05: Proceedings of the 12th International Conference on Advanced Robotics (pp. 216–222).
51. Pfeifer, R., & Bongard, J. (2007). How the body shapes the way we think: A new view of intelligence. Cambridge, MA: MIT Press.
52. Reis, M., Maheshwari, N., & Iida, F. (2011). Self-organization of robot walking and hopping based on free vibration. In R. Pfeifer, H. Sumioka, R. M. Füchslin, H. Hauser, K. Nakajima, & S. Miyashita (Eds.), 2nd International Conference on Morphological Computation (pp. 90–92).
53. Rieffel, J. A., Valero-Cuevas, F. J., & Lipson, H. (2010). Morphological communication: Exploiting coupled dynamics in a complex mechanical structure to achieve locomotion. Journal of the Royal Society Interface, 7(45), 613–621.
54. Ringo, D. L. (1967). Flagellar motion and fine structure of the flagellar apparatus in Chlamydomonas. The Journal of Cell Biology, 33(3), 543–571.
55. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing, Vol. 1 (Chap. 8, pp. 318–362). Cambridge, MA: MIT Press.
56. Sanger, T. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2(6), 459–473.
57. Schultz, W. (2000). Multiple reward signals in the brain. Nature Reviews Neuroscience, 1(3), 199–207.
58. Skelton, R. E., & de Oliveira, M. C. (2009). Tensegrity systems. Berlin: Springer.
59. Snelson, K. (1965). Continuous tension, discontinuous compression structures. U.S. Patent 3,169,611.
60. Sultan, C. (2009). Tensegrity: 60 years of art, science, and engineering. In E. van der Giessen & H. Aref (Eds.), Advances in applied mechanics (Chap. 2, pp. 69–145). Burlington, MA: Elsevier.
61. Sultan, C., Corless, M., & Skelton, R. E. (2002). Linear dynamics of tensegrity structures. Engineering Structures, 24(6), 671–685.
62. Sussillo, D., & Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4), 544–557.
63. Tani, J. (1998). An interpretation of the "self" from the dynamical systems perspective: A constructivist approach. Journal of Consciousness Studies, 5(5–6), 516–542.
64. Terashima, H., Kojima, S., & Homma, M. (2008). Flagellar motility in bacteria: Structure and function of flagellar motor. International Review of Cell and Molecular Biology, 270, 39–85.
65. Thormann, K. M., & Paulick, A. (2010). Tuning the flagellar motor. Microbiology, 156(Pt 5), 1275–1283.
66. Valero-Cuevas, F. J., Yi, J.-W., Brown, D., McNamara, R. V., Paul, C., & Lipson, H. (2007). The tendon network of the fingers performs anatomical computation at a macroscopic scale. IEEE Transactions on Biomedical Engineering, 54(6, Pt 2), 1161–1166.
67. Vandoorne, K., Dambre, J., Verstraeten, D., Schrauwen, B., & Bienstman, P. (2011). Parallel reservoir computing using optical amplifiers. IEEE Transactions on Neural Networks, 22(9), 1469–1481.
68. Verstraeten, D., Schrauwen, B., D'Haene, M., & Stroobandt, D. (2007). An experimental unification of reservoir computing methods. Neural Networks, 20(3), 391–403.
69. Wang, N., Tytell, J. D., & Ingber, D. E. (2009). Mechanotransduction at a distance: Mechanically coupling the extracellular matrix with the nucleus. Nature Reviews Molecular Cell Biology, 10(1), 75–82.
70. Whelan, P. J. (1996). Control of locomotion in the decerebrate cat. Progress in Neurobiology, 49(5), 481–515.
71. Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin and Review, 9(4), 625–636.
72. Wroldsen, A. S. (2007). Modelling and control of tensegrity structures. Ph.D. thesis, Norwegian University of Science and Technology.
73. Wyffels, F., D'Haene, M., Waegeman, T., Caluwaerts, K., Nunes, C., & Schrauwen, B. (2010). Realization of a passive compliant robot dog. In M. Mitsuishi & M. G. Fujie (Eds.), Proceedings of the 3rd IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (pp. 882–886).
74. Wyffels, F., Schrauwen, B., & Stroobandt, D. (2008). Stable output feedback in reservoir computing using ridge regression. In International Conference on Artificial Neural Networks (ICANN) (pp. 808–817).
75. Yamane, K., & Nakamura, Y. (2006). Stable penalty-based model of frictional contacts. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 1904–1909).
76. Yanai, M., Kenyon, C. M., Butler, J. P., Macklem, P. T., & Kelly, S. M. (1996). Intracellular pressure is a motive force for cell motion in Amoeba proteus. Cell Motility and the Cytoskeleton, 33(1), 22–29.
77. Zhang, J., & Ohsaki, M. (2006). Adaptive force density method for form-finding problem of tensegrity structures. International Journal of Solids and Structures, 43(18–19), 5658–5673.
78. Ziegler, M., Iida, F., & Pfeifer, R. (2006). "Cheap" underwater locomotion: Roles of morphological properties and behavioural diversity. In International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines.

Appendix 1: Reducing Communication Load

The GD rule is nothing more than a simplification of the RLS approach that avoids updating a covariance matrix, at the cost of slower convergence and an additional parameter (the learning rate). This can easily be seen by replacing $P_{\mathrm{rls}}$ with the identity matrix. We then have
$$w(t) = w(t-1) + \frac{e(t)}{x(t)^{\mathsf T}\, x(t)}\, x(t),$$
which is exactly the GD rule with a varying learning rate of $\|x(t)\|^{-2}$.

Thus, the GD rule is almost equivalent to the RLS rule for uncorrelated input variables. This also indicates how RLS can be implemented in a biologically plausible way. By adding a (linear) layer to project the inputs on their principal components, one obtains uncorrelated input variables. The generalized Hebbian algorithm (GHA) can be used to implement this [56]. Therefore, as long as an error signal is available for each motor signal, both RLS and GD are viable options that are straightforward to implement.
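
This correspondence is easy to verify in code; the following is a minimal sketch of both update rules for a single output weight vector:

```python
import numpy as np

def rls_update(w, P, x, e, lam=1.0):
    """One RLS step for a single output: w is the weight vector, P the
    inverse correlation estimate, x the input, e the error signal."""
    k = P @ x / (lam + x @ P @ x)
    P_new = (P - np.outer(k, x @ P)) / lam
    return w + e * k, P_new

def gd_update(w, x, e):
    """The same step with P replaced by the identity (and the lam term
    absorbed): gradient descent with learning rate 1 / ||x||^2."""
    return w + e * x / (x @ x)
```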

The GHA is an extension of Oja's rule [48] to networks with multiple outputs. In fact, it can be seen as a stacked version of Oja's rule: from each additional output, the contributions of all previously extracted principal components are removed. Oja's rule itself is simply a stabilized version of the classic Hebbian learning rule, obtained from a first-order Taylor approximation of the normalized Hebbian rule that keeps the norms of the weights equal to one.

In matrix form, the GHA can be written as
$$y(t) = W(t)\, x(t),$$
$$\Delta W(t) = \eta \left( y(t)\, x(t)^{\mathsf T} - \mathrm{LT}\!\left[ y(t)\, y(t)^{\mathsf T} \right] W(t) \right),$$
where LT(X) sets all elements above the diagonal to zero; hence the second term removes the (estimated) principal components already extracted by the previous outputs (and, through the diagonal, normalizes each output as in Oja's rule).
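
A minimal sketch of this update (with an arbitrary learning rate):

```python
import numpy as np

def gha_update(W, x, eta=1e-3):
    """One step of the generalized Hebbian algorithm (Sanger [56]).

    W holds one principal-component estimate per row. np.tril implements
    LT(.): each output unlearns the components extracted by the previous
    outputs, while the diagonal supplies the Oja-style normalization.
    """
    y = W @ x
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```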

One application of the additional decorrelating layer is that it can be used to reduce the number of signals to be communicated. Indeed, the first principal components extract the most fundamental properties of the signals in the system. Hence, the layer provides a natural way of forcing the RMH-EH algorithm to focus on the first principal components, for two reasons: First, the first principal components typically contain simpler signals. Second, the first principal components are more stable; hence they will tend to be reinforced more than fluctuating inputs.

Figure 22 shows information on the GHA layer when used in combination with RMH (in the loop). Shown are the correlation matrices of the raw sensor data (spring forces and their derivatives), the input to the GHA layer (10-time-steps-delayed raw sensor data), and the correlation of the GHA output trained with 100 outputs. Although the GHA output is not completely decorrelated (this can be improved by increasing the training time), the correlation between variables is much less pronounced than before.

Figure 22. 

Absolute values of the correlation of the input data without and with the decorrelating GHA layer. Left: correlation of the inputs after the GHA layer. One hundred outputs were trained. Center: correlation of the sensor data (spring forces and derivatives of spring forces). Right: correlation of the original input data with 10 delay lines.

Figure 23 shows the weights distribution after training with GHA combined with RMH. The same attractor from Figure 11 was trained, with a similar result. The weights from the first principal components have a larger magnitude.

Figure 23. 

A possible application of the GHA layer: reducing the communication needs. Shown is the weight distribution of the two outputs as in Figure 11, but this time combined with GHA. Because the first principal components are more stable and extract the lowest-frequency signals, the RMH algorithm will tend to assign larger weights to them.

Appendix 2: Simulation Details

We simulated the tensegrity systems by vectorizing the differential algebraic equation description (see Section 2). The VODE solver from [10] was used with a variable time step. Ground collisions were modeled as external forces acting on the endpoints of the rods. The reaction forces were computed as in [75]. Internal (bar-bar) collisions were not taken into account.

We assumed ideal motors (instant change of spring equilibrium length), but limited their range and velocity. More precisely, each motor could change the spring length by at most 0.3 m at 0.3 m/s. For the tensegrity icosahedron, we normally actuated nonstructural springs (i.e., the structure remains stable without these springs); hence the motors need not be actuated at rest. The actuator springs (for the icosahedron) had an equilibrium length equal to the distance between their endpoints with the actuator springs removed. The magnitude of the force on the springs was usually <10 N. We assumed that the rods weighed 0.4 kg/m. The rods had lengths between 0.2 and 1 m, depending on the configuration (around 0.5 m for the icosahedron, which had a mass of about 1.4 kg).

That these are reasonable conditions can be seen by computing the required motor power. Assuming a constant speed of 0.3 m/s with an applied load of 10 N, a motor power of 3 W is needed. Lightweight (<50 g) DC motors are available in this range. Small lithium polymer batteries can deliver enough power and energy to drive such motors over longer periods of time.

A controller frequency of 50 Hz was used to avoid stringent communication requirements. We can estimate the total bandwidth required, for example, for the tensegrity icosahedron. Spring forces can be measured using strain gauges, and ADCs in this frequency range are commonly available at low cost with a precision of up to 24 bits (3 bytes). The icosahedron has 24 springs in its minimal configuration, and 60 when fully connected. Hence, the minimal communication bandwidth per spring is (2 values × 3 bytes + overhead) × 50 Hz (spring force and derivative). We assume broadcasting is used, as the algorithms can be implemented locally on each rod. The overhead can contain the target signal if required. Let us assume an overhead of 5 bytes per motor; then each motor requires 5 bytes × 50 Hz. Hence, for the icosahedron in its minimal configuration with 10 actuators (the forces on the actuator springs are not used), we obtain a total bandwidth of only 9.7 KB/s.
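
The estimate can be reproduced directly; only the 5-byte motor overhead is an assumption, as stated above:

```python
RATE_HZ = 50           # controller frequency
ADC_BYTES = 3          # 24-bit force samples
VALUES_PER_SPRING = 2  # spring force and its derivative
N_SPRINGS = 24         # sensed springs (minimal icosahedron)
N_MOTORS = 10          # actuator springs; their forces are not used
MOTOR_OVERHEAD = 5     # assumed overhead per motor, in bytes

sensor_bw = N_SPRINGS * VALUES_PER_SPRING * ADC_BYTES * RATE_HZ  # 7200 B/s
motor_bw = N_MOTORS * MOTOR_OVERHEAD * RATE_HZ                   # 2500 B/s
print((sensor_bw + motor_bw) / 1000)  # 9.7 KB/s
```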

Appendix 3: Feedback Linearizability of a Single Bar

For the sake of completeness, we investigate the feedback linearizability of a bar attached to springs. Consider a single bar attached to three springs at each outer end. This is a minimal assumption on freestanding tensegrity structures (each endpoint needs at least three attached springs). Each of the springs is fixed to the rod at one end and to a fixed location at the other.

The force on an endpoint due to a spring is given by
$$f = u\,(n_p - n),$$
where n is the position of the endpoint, n_p is the fixed attachment point of the spring, and we have assumed that the springs are always in tension. Here u is the transformed control input. Assuming ideal motors, such an input always exists when the springs are strictly in tension. Note that in practice this can easily be implemented, as the spring length can be measured by a force sensor on the spring.
The force on an endpoint is simply the sum of the forces due to the three attached springs:
$$f_n = \sum_{i=0}^{2} u_i\,(n_{p_i} - n).$$
Assuming the u_i to be unrestricted, the column space of the matrix $[\,n_{p_0} - n \;\; n_{p_1} - n \;\; n_{p_2} - n\,]$ is $\mathbb{R}^3$. This is trivially fulfilled as long as n does not lie on the plane defined by the fixed endpoints of the springs.
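
Numerically, this condition amounts to a rank check on the spring direction vectors; a small sketch with hypothetical coordinates:

```python
import numpy as np

n = np.array([0.0, 0.0, 0.5])         # bar endpoint (hypothetical values)
anchors = np.array([[1.0, 0.0, 0.0],  # fixed spring endpoints n_p0..n_p2
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, -1.0]])

directions = anchors - n  # rows span the reachable force directions
print(np.linalg.matrix_rank(directions) == 3)
# False exactly when n lies in the plane through the three anchor points.
```
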
We will again assume the bar to be infinitely thin and the mass to be evenly distributed along the bar, and follow the description from [72]. Now define a minimal set of generalized coordinates:
$$q = \begin{bmatrix} r^{\mathsf T} & \theta & \phi \end{bmatrix}^{\mathsf T},$$
where r is the center of mass of the bar, and θ and ϕ give the orientation of the bar (around two orthogonal axes).
Now we define the matrix K:
formula
This matrix is singular for ϕ ∈ {0, kπ}.
Next, we define the following vector:
formula
The inertia matrix J is given by
formula
We define the mass matrix as
formula
which is trivially positive definite, and the following matrix:
formula
Finally, the equations of motion are given by
formula
or, if H(q) is nonsingular,
formula
If we can show that the generalized forces fq(q) can take any value in $\mathbb{R}^5$, then the system is feedback linearizable under the condition ϕ ∈]0, π[, because with a suitable change of variable in the generalized forces we get
formula
Wroldsen shows that the generalized forces can be written as
$$f_q(q) = \Phi(q)^{\mathsf T} f_n,$$
where Φ(q) is a matrix containing the partial derivatives of the nodal coordinate vector $[n_0^{\mathsf T}\; n_1^{\mathsf T}]^{\mathsf T}$ with respect to the generalized coordinates q. This Jacobian matrix is given by
formula
Under the ϕ ∈]0, π[ constraint, $\Phi(q)^{\mathsf T}$ has full rank, and thus we can always find suitable nodal forces f_n.

Appendix 4: Form-Finding

Form-finding of tensegrity structures is not an easy problem, and depending on the requirements (e.g., symmetry), different solutions are available. Let us first define the precise problem of interest. In this work, we assume the connectivity of the structure to be known. This means that the connectivity matrix C (Equation 5) is known as well as the bar connectivity. The bar connectivity is stored in a matrix B constructed like the spring connectivity matrix C. For the class 1 tensegrities studied in this work, B can trivially be rewritten (by reordering the nodes) as
$$B = \begin{bmatrix} I & -I & 0 \end{bmatrix}.$$
The problem we face is to find an equilibrium state for a structure with given matrices B and C, that is, we want to know the positions N of the nodes and the force density diagonal matrix Λ such that
formula
formula
for some diagonal matrix Γ. This simply means that without external forces, the forces due to the bars and springs are balanced. As we assume the bars to have fixed lengths and to have infinite tensile strength, this results in a net acceleration 0 of the nodes in this equilibrium configuration. If the bars do not have infinite tensile strength, then Γ depends on the Young's modulus of the bars, which is the common approach in the literature.
An important question is when this equilibrium is stable and nondegenerate. To prevent degeneracy, we must ensure that the structure does not collapse onto a plane or a line. An equilibrium configuration alone does not guarantee that the potential energy has a local minimum. Stability can be investigated (up to second order) using the tangent stiffness matrix
$$K = \frac{\partial f}{\partial n},$$
where f contains the forces acting on the nodes. If the tangent stiffness matrix K is positive semidefinite, then the structure will be stable.
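
A numerical sketch of this stability test, assuming only a function returning the net internal forces on the (vectorized) nodes; the exact assembly of K in the form-finding code may differ:

```python
import numpy as np

def is_stable(internal_forces, n0, eps=1e-6, tol=1e-9):
    """Check positive semidefiniteness of the tangent stiffness matrix.

    `internal_forces(n)` is assumed to return the net member forces on the
    vectorized nodes; K is then the negative Jacobian of these forces,
    built here by central finite differences around the equilibrium n0.
    """
    d = n0.size
    K = np.zeros((d, d))
    for i in range(d):
        dn = np.zeros(d)
        dn[i] = eps
        K[:, i] = -(internal_forces(n0 + dn) - internal_forces(n0 - dn)) / (2 * eps)
    K = 0.5 * (K + K.T)  # symmetrize against finite-difference noise
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))
```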

The problem is now to find the force densities Λ and Γ such that K is positive semidefinite. We applied the technique from Zhang and Ohsaki [77], which starts from a given connection matrix, without knowledge of the force densities, and iteratively updates an initial estimate of the force densities. This algorithm finds structures with 12 free variables (in three dimensions) for a given set of force densities. We then used CMA-ES to optimize these variables so as to maximize the minimum bar-to-bar distance.

Note that in [77], struts are assumed instead of bars. Hence, after form-finding, we replace the struts with bars, which does not change the equilibrium state.

Author notes

Contact author.

∗∗ Reservoir Lab, Electronics and Information Systems Department, Ghent University, Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium. E-mail: ken.caluwaerts@ugent.be (K.C.); michiel.dhaene@ugent.be (M.D.); david.verstraeten@ugent.be (D.V.); benjamin.schrauwen@ugent.be (B.S.)