## Abstract

Embodiment has led to a revolution in robotics by not thinking of the robot body and its controller as two separate units, but taking into account the interaction of the body with its environment. By investigating the effect of the body on the overall control computation, it has been suggested that the body is effectively performing computations, leading to the term *morphological computation*. Recent work has linked this to the field of *reservoir computing*, allowing one to endow morphologies with a theory of universal computation. In this work, we study a family of highly dynamic body structures, called *tensegrity structures*, controlled by one of the simplest kinds of “brains.” These structures can be used to model biomechanical systems at different scales. By analyzing this extreme instantiation of compliant structures, we demonstrate the existence of a spectrum of choices of how to implement control in the body-brain composite. We show that tensegrity structures can maintain complex gaits with linear feedback control and that external feedback can intrinsically be integrated in the control loop. The various linear learning rules we consider differ in biological plausibility, and no specific assumptions are made on how to implement the feedback in a physical system.

## 1 Introduction

Embodiment has led to a revolution in robotics, and it encompasses much of the research on the nature of cognition [2]. The term *embodiment* has many definitions, but they share a common notion. By not thinking of the agent body and its controller as two separate units, but instead taking the interaction of the body with its environment into account, a more attuned sensory representation is generated. This in turn makes the task of complex control of locomotion easier. Indeed, the principle of embodiment implies, among other conclusions, that the direct physical interaction between the body and its environment is crucial for advanced cognitive processing by the agent. In this work, we will use a pragmatic viewpoint on embodiment and investigate the idea in particular by studying the computational powers of body dynamics.

Pfeifer and Bongard demonstrated the importance of embodiment in their book *How the body shapes the way we think*. In the first chapter of their book, they write [51, p. 19]:

First, embodiment is an enabler for cognition or thinking: in other words, it is a prerequisite for any kind of intelligence. So, the body is not something troublesome that is simply there to carry the brain around, but it is necessary for cognition.

A similar message appears in Clark's book *Being there: Putting brain, body and world together again* [14], which was largely influenced by Brooks' work on embodied robotics [8, 9]:

We ignored the fact that the biological brain is, first and foremost, an organ for controlling the biological body. Minds make motions, and they must make them fast—before the predator catches you, or before your prey gets away from you. Minds are not disembodied logical reasoning devices.

Although previous work on embodiment has demonstrated the importance of thinking about the agent as a unit consisting of body and brain, it is unclear what the body is contributing in a *computational* sense.

In this work, we will study an exemplar family of highly dynamic body structures, controlled by the simplest possible “brains.” While this could be seen as an example of an extreme form of *morphological computation* [49], we propose to interpret the results presented here more broadly, namely as a particular implementation of the general principle of computation by complex nonlinear dynamical systems with relatively simple adaptive controls, called *physical reservoir computing* (PRC).

By doing so, we clearly demonstrate that it is possible on the one end of the spectrum to have very static bodies and highly complex controllers (as in classical robotics), while on the other end we can have a highly dynamic body, being controlled by a very simple controller. Still, both can perform similar computations in the environment. By investigating these extremes, we clearly demonstrate the spectrum of choices of how control can be implemented in the body-brain composite. Figure 1 gives an overview of this spectrum. In this tradeoff between brain and body computation, there are many known intermediate results: Dead fish are propelled forward in a vortex [5, 37]; single-celled organisms (e.g., amoebae with pseudopodia) can be thought of as only computing using their body and chemical pathways, as they do not possess neural substrates [44, 76]; nematodes have rich motor patterns with only a few neurons (302 for the hermaphrodite *C. elegans*), and thus their locomotion largely depends on the shape of their body [18, 19]; the finger-tendon network in humans is responsible for a large part of the computational load required in fine control [66]; it was demonstrated that the locomotion pattern of decerebrated cats can autonomously tune itself to the body-environment interaction [4, 22, 70]; and so on.

Note that there are also important philosophical questions on the relationship between brain, body, and environment (see, e.g., [63, 71]). The viewpoint that these three are separate, albeit interacting, entities has been challenged by the notion of embodied cognition: Cognition is no longer seen as an exclusive property of the brain, and the functionalities traditionally attributed exclusively to the brain, such as memory or complex transformations of sensory inputs, seem to be performed by other parts of the body as well. Thus, the distinction between brain, body, and environment becomes blurry, and the classical modular view on the relation between these three slowly disappears.

Recently, Hauser et al. [25, 26] showed that spring-mass nets have universal computational power,^{1} providing a theoretical foundation for morphological computation, which in this setting is an instantiation of the field known as *reservoir computing* [34, 43, 68], the field that studies how generic dynamic systems can be used for universal computation.

Building on these findings, we show that morphological computation can be used to effectively control dynamic robot bodies. An overview of the setup can be seen in Figure 2. Note that in this work we focus on pattern generation through feedback to the body. In previous work [13], we also showed that it is possible to extract high-level environmental information, such as surface properties, directly and linearly from the state of the body, even while generating locomotion patterns in parallel. This goes significantly further than the now classic demonstrations by Paul, who was the first to conceive simple robots that performed computation through the body [49]. Indeed, Paul also considered tensegrity structures for morphological computation, but the controller in her setup was still external [50].

The main goal of this article is threefold. First, we show that the results of Hauser et al. are not merely theoretical, and that compliant robots indeed have real computational power, which can be easily exploited using simple learning algorithms. Secondly, we use as an example computation the generation of cyclic motion patterns (similar to the patterns generated by central pattern generators, CPGs), and use this to achieve locomotion. By using the morphology to generate CPG-like signals, the design of the controller can be drastically simplified. Indeed, integrating sensor data into CPGs is not an easy problem, and by integrating the body dynamics into the control structure, the robot intrinsically synchronizes with properties in its environment. Finally, by using tensegrity structures, we provide an implementation of the general principle of PRC, which is very close to the pure mass-spring nets of Hauser et al., but is physically implementable, as the nets can be made freestanding.

The structure of this article is as follows. We first provide an overview of tensegrity structures, CPGs, and reservoir computing to make the article self-contained. We then introduce and compare three learning rules for learning CPG-like motor patterns with tensegrity structures. Next, we provide a set of example applications. We show that the gait can be modulated by changing the equilibrium length of a subset of springs, which can prove useful for training robots to adapt their gait to the terrain. We optimize gaits using an external controller and then learn the equivalent gait with morphological computation to show that the control can actually be outsourced to the body. In the same spirit, we show an example of the control of an end effector. Finally, we show empirically that the presented methods work over a large parameter space and in nonlinear regions when the structures are driven far from their equilibrium state.

## 2 Tensegrity Structures

Tensegrities are remarkable structures consisting of compressive elements connected through tensile elements only [21, 59].^{2} In this section we first introduce the dynamics of tensegrity structures. We then review some of the literature on tensegrities in different fields. In Appendix 4 we explain how we defined the spring constants and equilibrium state of the structures used in this work.

### 2.1 Tensegrity Dynamics

Formally, we can define a tensegrity as a finite set of labeled points called nodes or endpoints [15]. Tensegrity structures, trusses, tensile structures, and the mass-spring nets studied by Hauser et al. [25, 26] are pin-jointed structures. Here we only consider bar-spring tensegrities, in which pairs of nodes can be connected by a member that is a bar or a spring. At the end of this section we will show that mathematically, mass-spring nets and tensegrity structures are similar, but with additional nonlinearities arising in tensegrities from inertia properties and the fixed bar lengths. Springs are members resisting only tensile forces if they are stretched beyond their equilibrium length. They do not resist compression (they go slack), and for simulation purposes we neglect their mass properties. Bars are members resisting both compressive and tensile force. They do not change length.

We assume the mass of each bar to be evenly distributed along its longitudinal axis. It is further assumed that the bar is infinitely thin and thus the inertia of a bar can be described by only taking the moment of inertia around an axis perpendicular to the longitudinal axis into account. If we place a reference frame at the center of mass of the bar, then the moment of inertia is *j* = *ml*^{2}/12, where *m* is the mass of the bar, and *l* is the length of the bar.

In this work, we will only consider class 1 tensegrities [58], that is, *pure* tensegrities. This means that no two bars ever share a common node. Furthermore, each node needs to be attached to a bar. Hence, there are exactly *n* = 2*b* nodes with *b* the number of bars.

Three-bar tensegrity prisms are the simplest class 1 tensegrities, consisting of three bars and nine springs. These structures can be stacked to create snakelike structures by adding springs between the prisms. An example of such a structure is shown in Figure 3. The structure is prestressed and free-standing (it does not collapse under gravity).

Let us now define the dynamics of a class 1 tensegrity with stiff bars and springs. The primary purpose of this section is to show where the nonlinearities of the system arise from and which parameters need to be chosen when constructing tensegrity structures.

We will follow the description in [58].^{3} First note that because the springs only generate forces, but do not have mass, we only need to integrate the trajectories of the bars. One degree of freedom is lost for each bar, because the bar length is fixed. Hence the total number of degrees of freedom is 5*b*.

A description of the dynamics with a minimum number of coordinates is given in [72].^{4} Here, we use six generalized coordinates **q** = [**r**^{T} **b**^{T}]^{T} per bar (Figure 4), as this simplifies the equations. The coordinate vector **r** is fixed to the center of mass of the bar, and **b** is a unit vector along the longitudinal axis of the bar, the direction of which can be chosen arbitrarily.^{5} The transformation from the generalized coordinates to the Cartesian node coordinates **N** is now given by

$$\mathbf{N} = \mathbf{Q}\,\boldsymbol{\Psi},$$

where **Q** = [**r**_{1} … **r**_{b} **b**_{1} … **b**_{b}] and **Ψ** is a square, invertible matrix. If we order the nodes so that node *i* is connected to node *i* + *b* through a bar, then **Ψ** has a convenient structure:

$$\boldsymbol{\Psi} = \begin{bmatrix} \mathbf{I} & \mathbf{I} \\ -\tfrac{1}{2}\mathbf{L} & \tfrac{1}{2}\mathbf{L} \end{bmatrix},$$

where the diagonal matrix **L** contains the lengths of the bars. The matrix **W** contains the external forces acting on the nodes. The matrix **C** ∈ {0, 1, −1}^{s×n} is called the connectivity matrix and contains only ones, zeros, and minus ones. Here *s* is the number of springs, which is at least 3*n*/2 (each node is connected to at least three springs).
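To make the notation concrete, the connectivity matrix can be assembled mechanically from a list of spring-node pairs. The following sketch (Python/NumPy; the spring list is a made-up toy graph, not a valid tensegrity) illustrates the structure of **C**:

```python
import numpy as np

def connectivity_matrix(springs, n):
    """Build C in {0, 1, -1}^(s x n): row e has +1 and -1 at the
    two nodes joined by spring e (the sign choice is arbitrary)."""
    C = np.zeros((len(springs), n))
    for e, (i, j) in enumerate(springs):
        C[e, i] = 1.0
        C[e, j] = -1.0
    return C

# Toy example: 4 nodes, 5 springs (for illustration only).
springs = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
C = connectivity_matrix(springs, n=4)

# Each row sums to zero; for node positions N (3 x n), the product
# C @ N.T yields the spring vectors n_i - n_j, one per row.
assert np.allclose(C.sum(axis=1), 0.0)
```

Because each row encodes exactly one spring, quantities such as spring lengths or force densities can then be computed for all springs at once with a single matrix product.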

The vector **λ** contains the force densities of the springs. For a spring *e* connecting nodes *i* and *j*, the force density can be written as

$$\lambda_e = \begin{cases} k_e\left(1 - \dfrac{l_{0,e}}{\lVert \mathbf{n}_i - \mathbf{n}_j \rVert}\right) & \text{if } \lVert \mathbf{n}_i - \mathbf{n}_j \rVert > l_{0,e},\\[4pt] 0 & \text{otherwise,} \end{cases}$$

where *k*_{e} is the spring constant of the spring, *l*_{0,e} the equilibrium length of the spring, and **n**_{i} the *i*th column of **N**. Normally one should prevent the springs from going slack, as this risks collapsing the structure. Velocity-dependent damping forces are collected in a matrix **R**, with a uniform damping coefficient ζ.

The mass matrix **M** is given by **M** = diag([*m*_{1} … *m*_{b} *j*_{1} … *j*_{b}]) (the *j*_{i} are the moments of inertia of the bars), and **Ξ** = diag([0 … 0 ξ_{1} … ξ_{b}]) is a diagonal matrix of Lagrange multipliers enforcing the rotational equations of motion and the constraints ∥**b**_{i}∥ = 1. In these generalized coordinates, the equations of motion take the compact form

$$\ddot{\mathbf{Q}}\,\mathbf{M} + \mathbf{Q}\,\boldsymbol{\Xi} = \mathbf{F}_{q},$$

where **F**_{q} contains the generalized forces (the spring, damping, and external forces transformed to the generalized coordinates). These Lagrange multipliers are given by

$$\xi_i = \mathbf{b}_i^{T}\,\mathbf{F}_{q,(b+i)} + j_i\,\lVert \dot{\mathbf{b}}_i \rVert^{2},$$

where **F**_{q,(b+i)} is the (*b* + *i*)th column of **F**_{q}.
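The slack behavior of the springs is easy to get wrong in simulation. The following minimal sketch (Python; names are ours) computes the tension of a single spring with the tension-only convention used above:

```python
import numpy as np

def spring_force(ni, nj, k, l0):
    """Tension magnitude of a spring between nodes ni and nj.
    Returns 0 when the spring is shorter than its equilibrium
    length l0: it goes slack instead of pushing."""
    length = np.linalg.norm(np.asarray(ni, float) - np.asarray(nj, float))
    return k * max(0.0, length - l0)

# Stretched spring: length 2.0, equilibrium 1.5 -> tension k * 0.5
print(spring_force([0, 0, 0], [2, 0, 0], k=10.0, l0=1.5))  # 5.0
# Compressed below equilibrium: the spring goes slack, no force
print(spring_force([0, 0, 0], [1, 0, 0], k=10.0, l0=1.5))  # 0.0
```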

The dynamics of a mass-spring net are obtained by setting **Ψ** = **I**, letting **M** be a diagonal matrix containing the point masses, and setting **Ξ** = **0**. This shows that tensegrities behave similarly to spring-mass nets, but with additional nonlinearities arising from the rotational equations of motion and the bar constraints. We note that even with linear springs, a three-dimensional mass-spring net has nonlinear dynamics for nonzero equilibrium lengths, due to the nonlinearity of the Euclidean distances in Equation 6.
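This loss of linearity is easy to verify numerically. The sketch below (our own toy setup) shows that the force a linear spring with nonzero equilibrium length exerts on a point mass violates superposition, whereas it is exactly linear when the equilibrium length is zero:

```python
import numpy as np

def force_on_mass(p, anchor, k, l0):
    """Force vector on a point mass at p from a linear spring
    (constant k, equilibrium length l0) anchored at `anchor`."""
    d = np.asarray(p, float) - np.asarray(anchor, float)
    length = np.linalg.norm(d)
    return -k * (length - l0) * d / length

k, l0, anchor = 1.0, 1.0, np.zeros(2)
f_a = force_on_mass([2.0, 0.0], anchor, k, l0)
f_b = force_on_mass([2.0, 2.0], anchor, k, l0)
# If the force were linear in position, f([2,0]) + f([0,2]) would
# equal f([2,2]); with l0 != 0 superposition fails:
f_sum = f_a + force_on_mass([0.0, 2.0], anchor, k, l0)
print(np.allclose(f_sum, f_b))  # False
```

With `l0 = 0` the force reduces to `-k * d`, which is linear in the position, so superposition holds exactly; the nonlinearity comes purely from the Euclidean distance term.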

### 2.2 Related Work on Tensegrity Structures

Tensegrity structures have an architectural and artistic background. Most of the early research on these structures focused on finding stable configurations and describing their static properties [12, 15, 21, 23]. The result of this research is that a vast literature on typical configurations and their properties is now available [16, 47].

Much less literature is available on the dynamic properties of tensegrities. Motro [47] lists a few examples of actuated structures, and Sultan et al. [61] investigated the linearized dynamics. Skelton and de Oliveira [58] provide Lyapunov-function-based control techniques, but the practical use of their method may be limited to underactuated robotic platforms.

Paul et al. [50] were (to the best of our knowledge) the first to link tensegrity structures to the morphological computation domain. They evolved gaits for tensegrity prisms and discussed the robustness of these robotics systems to actuator failures. Rieffel et al. [53] went a step further by introducing *morphological communication*. In their work, independent controllers for parts of a tensegrity structure interact only through the dynamics of the structure, that is, the structure itself is used as a communication tool. More recently, Bliss [7] has shown an interesting example of taking the (linearized) dynamics into account while developing a CPG-based controller (see Section 3) for a tensegrity structure.

Our choice for tensegrity structures also has a biological inspiration. Ingber has done remarkable work on cellular mechanics based on tensegrity structures [31, 32, 69]. On this level, there is no neural control and the information exchange is chemical. Because the techniques we present in this work make no assumptions on the type of actuation or sensor feedback, they might be used as a tool to explain or study the fundamental mechanisms of cell movement and mechanotransduction [33].

## 3 Central Pattern Generators

CPGs are neural circuits typically found in the spine of vertebrates that generate rhythmic activation patterns without sensory feedback or higher-level control input [30]. Our prime goal is to show that a lot of computational power can be exploited in compliant structures. The more computations can be outsourced to the body, the less effort one needs to put into the construction of CPGs (for robotics applications) and the less external computational power is needed.

Most robotic systems are not as compliant as the ones we study here, and the available morphological computational power may then be insufficient to achieve the desired behavior with static linear feedback. We argue, however, that one should keep the body's dynamics in the loop as much (and as early) as possible, to be able to exploit the morphological computational power. Indeed, in the compliant tensegrity structures we can go as far as leaving out the external CPG completely.

### 3.1 Matsuoka Oscillators

**A** is the matrix describing how the neurons are connected. It is typically sparse (Matsuoka mostly analyzed small regular connection patterns). The positive semidefinite connection matrix **A** was constructed similarly to the stress matrix of the tensegrity structure with the diagonal (self-feedback) removed. More precisely, **A** was derived from the connectivity matrix **C** as defined in Section 2.1. Hence, the neurons are connected in the same way as the springs connect the nodes of the tensegrity structures. The choice of this connection pattern was in a sense arbitrary. However, random connection patterns tend to generate chaotic trajectories, which are unwanted in this work.

The integrating neuron and the fatigue have time constants τ_{1} and τ_{2}, respectively. The steady state firing rate of the neuron is determined by ι, and γ is called the impulse rate of the tonic or slowly varying input [45]. In this work, we keep these parameters constant, that is, the oscillator itself is not modulated. The parameters are ι = 1, τ_{1} = 0.5, τ_{2} = 5, and γ = 1. Figure 5 shows an example of CPG signals generated by the above procedure. There were a total of 12 dimensions (five shown), and the connection pattern was taken from the tensegrity icosahedron shown in Figure 6.

In Equation 12, *y*^{osc} contains the firing rate of the neurons, and *v*^{osc} models the fatigue. The firing rates *y*^{osc} are the outputs of the oscillator, and these signals are used to construct the target motor signals. We resampled the output signals *y*^{osc} so that the signals had the correct frequency for the experiment (normally 1 Hz). The desired output signals are random linear combinations of this *N*-dimensional signal *y*^{osc}.

Given *y*^{osc}, we construct target motor signals as a linear combination:

$$\mathbf{y}^{\text{target}}(t) = \mathbf{W}^{\text{target}}\,\mathbf{y}^{\text{osc}}(t).$$

In practice, we also add a constant bias variable to *y*^{osc}. The matrix *W*^{target} is random (normally distributed values) in most of this work (i.e., we assume the desired CPG signal to be known), except in Section 5.2, in which we optimize the CPG signal. Both *W*^{target} and *y*^{osc} are random. The fundamental difference is that *y*^{osc} is *generated* by a random oscillator. These signals will take values between 0 and 1, while the motor signals need a correct offset and amplitude. This problem is solved by *W*^{target}, which combines the signals from *y*^{osc} into meaningful motor commands.
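For readers who want to reproduce signals of this kind, a textbook two-neuron Matsuoka oscillator with a random linear readout can be sketched as follows (Python; the parameterization below is a common one and may not map one-to-one onto the ι and γ parameters used in the article):

```python
import numpy as np

def matsuoka_cpg(T=3000, dt=0.01, tau1=0.5, tau2=5.0, beta=2.5,
                 w=2.0, u=1.0):
    """Textbook two-neuron Matsuoka oscillator: mutual inhibition
    plus a slow fatigue (adaptation) state per neuron."""
    x = np.array([0.1, 0.0])   # membrane states (asymmetric init)
    v = np.zeros(2)            # fatigue states
    W = np.array([[0.0, w], [w, 0.0]])  # mutual inhibition weights
    ys = np.empty((T, 2))
    for t in range(T):
        y = np.maximum(x, 0.0)                    # rectified firing rates
        x += dt * (-x - W @ y - beta * v + u) / tau1
        v += dt * (-v + y) / tau2
        ys[t] = y
    return ys

y_osc = matsuoka_cpg()

# Random linear readout mapping oscillator outputs (plus a bias)
# to target motor signals, as in the text:
rng = np.random.default_rng(0)
W_target = rng.normal(size=(6, 3))             # 6 motors, 2 osc + bias
Y = np.hstack([y_osc, np.ones((len(y_osc), 1))])
motor_targets = Y @ W_target.T                 # (T, 6) target signals
```

The chosen parameters satisfy Matsuoka's oscillation condition (mutual inhibition strong enough to destabilize the symmetric fixed point, but weaker than 1 plus the fatigue gain), so the network settles into a limit cycle rather than a steady state.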

We chose the Matsuoka oscillator because of its simple structure, which can be chosen similarly to the connection pattern of the tensegrity structure itself. While we have not yet explored this path, we hope the morphological communication idea in Rieffel et al. [53] can be integrated in this way.

## 4 Physical Reservoir Computing

Classical techniques for training recurrent neural networks, such as backpropagation through time [55], approximate a desired output signal by modifying the internal weights of the neural network (as well as the readout weights, if any). This is often cumbersome and difficult to implement correctly, as one needs gradient information to apply the chain rule. Furthermore, backpropagation through time is prone to local minima.

Reservoir computing (RC) is a conceptually much simpler technique to train such recurrent neural networks [68]. Instead of modifying the internal weights, the original network is left as is, and only the readout weights are modified. The original network essentially becomes a computational black box. The outcome of this training procedure typically depends on a few parameters that define the regime of the neural network.

RC is known under different names, depending on the type of recurrent network that is trained. Most importantly, we distinguish liquid state machines [43] and echo state networks [34, 35]. The core idea of RC, originally applied only to neural networks, has since been extended to other nonlinear dynamical systems, leading to what we call PRC. There have been demonstrations of the RC approach applied to different domains such as photonics [67] and, more abstractly, electronics [3]. All these implementations share the common idea that a system with complex dynamics is perturbed externally but left untouched otherwise, and a simple readout mechanism is trained to perform the desired computational task. While the idea of PRC originated in the context of neural networks, recent theoretical results have extended the applicability of this computational framework immensely, showing that any dynamical system of a given size, obeying easily satisfied constraints, has the same computational power [17].

There are two main applications of such a system. First, one can use it to approximate nonlinear filters by training the readout *W*_{out}. In this case, one normally scales *W*_{res} so that the system has the fading memory property (based on the spectral radius). Simply stated, this means that when the input is removed, the system dynamics will die out.
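The open-loop case can be sketched in a few lines: rescale a random reservoir matrix and verify that its autonomous dynamics die out. Here we rescale by the largest singular value, a stricter condition than the spectral-radius heuristic mentioned above, because it provably makes the tanh map a contraction:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
W_res = rng.normal(size=(n, n))

# The common heuristic rescales W_res to a spectral radius below 1.
# Scaling by the largest singular value is stricter: it guarantees
# ||tanh(W_res @ x)|| <= 0.9 * ||x||, i.e., fading memory.
W_res *= 0.9 / np.linalg.norm(W_res, 2)

# Without input, the autonomous dynamics x <- tanh(W_res @ x) die out:
x = rng.normal(size=n)
for _ in range(500):
    x = np.tanh(W_res @ x)
print(np.linalg.norm(x) < 1e-6)  # True: the activity has faded
```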

Secondly, one can close the loop by feeding the output back into the system through feedback weights *W*_{fb}. These weights are typically chosen at random, and again, only *W*_{out} is trained. Such a system can be used to generate desired signals autonomously.

The first kind of task is clearly easier to train, as it is an open-loop system. For signal generation tasks, small changes to the feedback weights can have a large influence. To imitate CPG signals with morphological computation, we need to consider the second approach.

In our setup, the reservoir is the tensegrity structure itself, and the readout operates on the system state **x**. The equilibrium length of a subset of the springs is chosen as feedback to the system. Differently from [25], we use only linear springs (Equation 6). In our experiments, the nonlinearities arising from the changing geometrical configuration and inertia are sufficient for good performance. We now define the system state (cf. Equation 19) for our setup:

$$\mathbf{x}(t) = \bigl[\mathbf{f}(t)^{T}\ \ \mathbf{f}(t-\Delta)^{T}\ \cdots\ \mathbf{f}(t-k\Delta)^{T}\bigr]^{T},$$

where **f**(*t*) are the spring forces measured at time *t*, Δ = 20 ms is the controller time step, and *k* is the number of delay steps. For the tensegrity icosahedron simulations, we used *k* = 9 (maximum delay of 200 ms), and *k* = 3 for the snake robots. The main rationale for this is that it allows the feedback to filter out noise due to ground collisions to some degree (by averaging over the delayed inputs).^{7} One can see from Figure 22 in Appendix 1 that the time-delayed sensor information is indeed highly correlated.

Using Hooke's law and Equation 6, each element of **f**(*t*) can be written as

$$f_e(t) = k_e \max\bigl(0,\ \lVert \mathbf{n}_i(t) - \mathbf{n}_j(t) \rVert - l_{0,e}(t)\bigr),$$

with spring *e* connecting nodes *i* and *j*. We explicitly use the time index for the equilibrium lengths *l*_{0,e}(*t*) because the tensegrity structures we consider contain springs with varying equilibrium lengths. We shall call the subset of the springs of which the equilibrium length *l*_{0,e} can be modified either actuators, actuated springs, or motors. We call the subset of passive, fixed-equilibrium-length springs *pas*, and the subset of actuated springs *act*. The vector $\mathbf{l}_{0}^{pas}$ is a constant vector defined by the equilibrium state of the structure. The vector $\mathbf{l}_{0}^{act}(t)$ is time-varying and is given by

$$\mathbf{l}_{0}^{act}(t) = \mathbf{l}_{0}^{act}(0) + l_{\max}\,g(\mathbf{y}(t)).$$

Here *l*_{max} is the maximum change in equilibrium length of the springs (w.r.t. the initial lengths in $\mathbf{l}_{0}^{act}(0)$) allowed by the actuators. For this to work, we must have *g*(**y**(*t*)) ∈ [−1, 1]^{a}, with *a* being the number of actuated springs. Now **y**(*t*) will in general be a linear combination of **x**(*t*) and a constant bias input:

$$\mathbf{y}(t) = \mathbf{W}\begin{bmatrix} \mathbf{x}(t) \\ 1 \end{bmatrix}.$$

The goal of most of the algorithms we will study is to optimize the matrix **W**.
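One controller time step of this loop can be sketched as follows (Python; all class and parameter names are ours, and the spring forces would come from a physics simulator that is not shown):

```python
import numpy as np
from collections import deque

class BodyFeedbackController:
    """Linear readout on delayed spring forces, squashed by tanh,
    driving the equilibrium lengths of the actuated springs.
    A sketch of the loop in the text, not the article's code."""

    def __init__(self, n_springs, n_act, k_delays, l0_act_init, l_max,
                 rng=np.random.default_rng(0)):
        self.hist = deque(maxlen=k_delays + 1)   # f(t) ... f(t - k*dt)
        dim_x = n_springs * (k_delays + 1) + 1   # + constant bias
        self.W = rng.normal(scale=0.01, size=(n_act, dim_x))
        self.l0_act_init = np.asarray(l0_act_init, float)
        self.l_max = l_max

    def step(self, f_t):
        self.hist.appendleft(np.asarray(f_t, float))
        while len(self.hist) < self.hist.maxlen:   # pad at startup
            self.hist.append(self.hist[-1])
        x = np.concatenate([*self.hist, [1.0]])    # delayed state + bias
        y = self.W @ x
        # tanh keeps the commanded offsets bounded, so the equilibrium
        # lengths always stay within the actuator range l0 +/- l_max.
        return self.l0_act_init + self.l_max * np.tanh(y)

ctrl = BodyFeedbackController(n_springs=24, n_act=6, k_delays=9,
                              l0_act_init=np.full(6, 0.3), l_max=0.05)
l0_act = ctrl.step(f_t=np.zeros(24))   # forces would come from the simulator
```

Whatever the sensor values, the commanded equilibrium lengths remain in the interval `l0_act_init ± l_max`, which mirrors the boundedness argument for *g* in the text.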

In the experiments presented in this article, we used *g*(**y**(*t*)) = tanh(**y**(*t*)), applied elementwise. It is important to justify the use of a nonlinear function, as it can provide computational power (as in the RC approach). Therefore, we also tested the setup with a hard limit *g*(**y**(*t*)) = min(max(**y**(*t*), −1), 1) and with the identity function (no limit). Both cases provided quantitatively similar results to those presented in the experimental Section 5. The identity function was discarded, because it does not guarantee boundedness of the feedback, and spurious sensor data can make the structures collapse. In practice, we noticed that with the identity function, the structure would operate correctly for (say) 30 s after training and then collapse because of an extreme sensor value during a ground collision. As explained in Appendix 2, ideal motors were assumed. However, a physical implementation will always be limited by the maximum offset of the motor, which justifies the use of *g*(**y**(*t*)).

To conclude this section, we note that for our setup, **x**[*k*] from Equation 19 is replaced by sensor measurements from the tensegrity structure, and the output **y**(*t*) is a linear combination of these values. In contrast to the classic RC or echo state network implementation, the feedback enters the system through a physical modification of the structure: the equilibrium lengths of a set of actuated springs are changed. The system itself is continuous-time, but the spring lengths are only updated at discrete time steps.

## 5 Experiments

The experimental section of this article consists of three parts. First, we introduce a set of algorithms to train tensegrity structures to produce rhythmic patterns. Next, we discuss possible applications for locomotion. We end with a comparison of different parameter combinations to study the importance of nonlinearities in the system.

### 5.1 Outsourcing Motor Pattern Generation

#### 5.1.1 Recursive Least-Squares Approach

The first training algorithm we will consider is based on the recursive least squares (RLS) algorithm [36]. When the same samples are presented to the RLS algorithm, it will compute the same weights as batch linear regression (which we used in previous work [13] and is also used by [25]). The advantage of RLS is that it allows us to gradually transition from a completely teacher-forced structure (the desired signals are fed into the system) to a system generating its own control signals, and to restart training if needed.

There are two disadvantages in our opinion. First, one needs to update the matrix containing the covariances of all the input variables, which does not scale well.

The second and more fundamental disadvantage is the dependence on explicit knowledge of the target function, because one needs to know the difference (error) between the optimal motor signal and the current signal generated by the RLS algorithm. In a practical setting we do not always know the target signal and often only have some global performance measure at hand.

During training, the weights *W*^{rls} are updated using the RLS equations:

$$\mathbf{e}(t) = \mathbf{y}^{\text{target}}(t) - \mathbf{W}^{\text{rls}}(t-\Delta)\,\mathbf{x}(t),$$

$$\mathbf{P}^{\text{rls}}(t) = \mathbf{P}^{\text{rls}}(t-\Delta) - \frac{\mathbf{P}^{\text{rls}}(t-\Delta)\,\mathbf{x}(t)\,\mathbf{x}(t)^{T}\,\mathbf{P}^{\text{rls}}(t-\Delta)}{1 + \mathbf{x}(t)^{T}\,\mathbf{P}^{\text{rls}}(t-\Delta)\,\mathbf{x}(t)},$$

$$\mathbf{W}^{\text{rls}}(t) = \mathbf{W}^{\text{rls}}(t-\Delta) + \mathbf{e}(t)\,\mathbf{x}(t)^{T}\,\mathbf{P}^{\text{rls}}(t).$$
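A minimal implementation of this recursion, including the decaying teacher-forcing coefficient, might look as follows (our own sketch; parameter values are illustrative):

```python
import numpy as np

class TeacherForcedRLS:
    """Recursive least squares readout with a teacher-forcing
    coefficient alpha that decays exponentially, blending the
    teacher signal into the feedback during training."""

    def __init__(self, dim_x, dim_y, tau_rls=10.0, dt=0.02):
        self.W = np.zeros((dim_y, dim_x))
        self.P = np.eye(dim_x)           # inverse input covariance
        self.alpha = 1.0                 # 1 = fully teacher forced
        self.decay = np.exp(-dt / tau_rls)

    def update(self, x, y_target):
        e = y_target - self.W @ x        # error before the update
        Px = self.P @ x
        g = Px / (1.0 + x @ Px)          # RLS gain vector
        self.P -= np.outer(g, Px)        # rank-1 covariance update
        self.W += np.outer(e, g)
        self.alpha *= self.decay
        # Feedback actually applied: blend of teacher and own output.
        return self.alpha * y_target + (1 - self.alpha) * (self.W @ x)
```

Presented with samples of a fixed linear map, the weights converge to the batch regression solution, consistent with the equivalence mentioned above.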

There is only a single parameter, namely the teacher-forcing decay time constant τ_{rls}. The covariance matrix *P*^{rls} was initialized using the identity matrix. We note the difference from FORCE learning [62], in which initially chaotic systems are used. The main reason for this is that tensegrity structures are inherently damped, and to create chaos one would need a feedback loop to drive the system. From a practical point of view this might be inefficient, as one would need additional actuators that were only used to keep the system active. In this sense, the RLS approach used here is closer to the teacher-forcing approach [74]. In this approach, the desired output is fed into the system during training, and the state of the system, **x**(*t*), is stored. Then, regression is used to approximate the desired output from the system state. Finally, during testing, the approximate output based on the system state is fed back into the system, and the system generates the desired patterns autonomously. The testing phase is also called the free run, as the system is no longer forced by the external input.

The gradual change from teacher forcing to free run, used in this work, allows the structure to take over the control in a smooth way and to restart learning in a straightforward way. We noticed that the RLS algorithm becomes unstable if learning continues with α_{rls} too low (viz., α_{rls} ≲ 0.03). So we simply switch to free run when α_{rls} drops below the threshold. The most likely explanation for the instability is that it is caused by the phase drift between the output and the teacher signal when the system is unforced.

A demonstration of the RLS approach is shown in Figure 7 and Figure 8. In this case, six actuators were used (i.e., six output dimensions). One can observe that the output signal gets out of phase with respect to the target signal, due to collisions with the ground. There is noise in the system, due to the control time step (20 ms) and the ground collision.

Figure 8 shows a phase portrait of two output signals from Figure 7. The output signals stay in phase with each other, which is important for locomotion. The RLS rule can capture the complex details of the target signals through the nonlinearities provided by the structure.

One might ask if the system is not simply vibrating along one of its shape modes. Such a result would not be useful, as for locomotion tasks we want the system to undergo large shape deformations. The shape modes of the tensegrity icosahedron that was used in this experiment (without the actuators) can be found in [23]. We show in Figure 9 that the answer is no. In this example we simulated a tensegrity in free fall to prevent collisions and again trained a random motor pattern with nine actuators. The complex trajectories of the endpoints of each bar are shown.

#### 5.1.2 Gradient Descent Approach

The gradient descent (GD) rule replaces the RLS update of *W*^{rls} with a simple delta rule:

$$\mathbf{W}^{\text{gd}}(t) = \mathbf{W}^{\text{gd}}(t-\Delta) + \alpha^{\text{gd}}\,\mathbf{e}(t)\,\mathbf{x}(t)^{T}.$$

Because the learning rate α^{gd} has to be chosen small enough to prevent instability, the GD rule converges more slowly in practice than the RLS rule.
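The GD update is a plain delta rule; a sketch (our own formulation, with an illustrative learning rate):

```python
import numpy as np

def gd_update(W, x, y_target, alpha_gd=1e-3):
    """One gradient-descent (delta rule) step on the squared error
    of the linear readout; alpha_gd must be small for stability."""
    e = y_target - W @ x
    return W + alpha_gd * np.outer(e, x)
```

Repeated applications on the same sample shrink the error geometrically by a factor of roughly (1 − α^{gd}∥**x**∥²) per step, which is why a too-large learning rate diverges while a safe one converges slowly.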

#### 5.1.3 Eliminating the Teacher: Reward-Modulated Hebbian Approach

For various reasons one might prefer to use only a single reward signal instead of having an error function per output. We might not be able to conceive a suitable error function, when for example an error measure is only available at a non-actuated spring. It is also biologically more plausible to have only a limited number of reward signals. We will only consider instantaneous reward, but extensions are possible by keeping track of eligibility traces [20].

Note that the essence of the gradient rule is that the weight changes should follow the correlation between the input variables and the error signal. A reward can be interpreted as the inverse of an (absolute) error. The reward is large (in amplitude) when the error is small, and vice versa. So instead of doing gradient descent on the error signal, we can equivalently do gradient ascent on the reward signal.

The meaning of a *large* reward is ambiguous, because to replace *e*^{rls}(*t*) with some measure of the reward, we need the scalars in this vector to take positive values when reinforcing the weights to this output would increase the reward and vice versa, while a reward can have an arbitrary offset.^{8} So the trick is that we need to subtract the baseline performance from the reward, or, stated differently, we need to know how surprising a reward is [57].

We often do not know the reward signal explicitly. Hence, we cannot find an analytic form of the derivative of the reward. The trick to overcome this is to use finite differences to estimate the derivative of the reward. For this we add random noise to the output and observe changes in the reward signal. The reward signal is usually one-dimensional, so we need to find out which weight should be reinforced.

Legenstein et al. recently used a learning rule based on these observations for training large (compared to our tensegrity structures) neural networks [38]. In addition, they also assumed the noise signal to be unknown and estimated the noise from the system output. We assume the noise signal to be known and use a reward-modulated Hebbian learning rule similar to the one from [20, 40, 41]. Among these, [41] also provides non-Hebbian variations on this rule.

*R*(*t*) is the (instantaneous) reward and R̄(*t*) is the short-term average (baseline) of the reward. Here R̄(*t*) was computed by taking the average of the rewards during the last 100 ms. The reward *R*(*t*) should be a monotonic function of the error; that is, the reward should decrease if the error increases. For the injected noise **ν**(*t*), Gaussian white noise (GWN) was used with a decreasing standard deviation as a function of time. The noise **ν**(*t*) is not only used to update the weights, but also fed into the structure (**y**(*t*) = *W*^{rmh}(*t* − Δ)**x**(*t*) + **ν**(*t*)). Indeed, if this were not the case, one would need some critic that provides rewards based on hypothetical motor outputs. This rule reinforces weights according to how the reward and the inputs covary, which is why it is called Hebbian-like. It is similar to the classic Hebb rule, but correlates reward with neural activity [27].
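In outline, the update correlates the known exploratory noise with the surprise of the reward, ΔW ∝ (*R*(*t*) − R̄(*t*)) **ν**(*t*)**x**(*t*)^{T}. A toy sketch under simplifying assumptions (scalar output, quadratic reward, exponential baseline instead of a 100 ms window; all names illustrative):

```python
import random

# Reward-modulated Hebbian (RMH) sketch: noise nu(t) is injected into the
# output, and weights move in proportion to (R - R_baseline) * nu * x, so
# perturbations that beat the recent average reward are reinforced.

random.seed(0)
w = [0.0, 0.0]           # readout weights (toy 2-input, 1-output readout)
w_star = [1.5, -0.5]     # hidden optimum, visible only through the reward
baseline, eta, decay = 0.0, 0.05, 0.9

for t in range(20000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    nu = random.gauss(0.0, 0.1)                    # known exploratory noise
    y = sum(wi * xi for wi, xi in zip(w, x)) + nu
    y_star = sum(wi * xi for wi, xi in zip(w_star, x))
    R = -(y - y_star) ** 2                         # reward decreases with error
    for i in range(2):
        w[i] += eta * (R - baseline) * nu * x[i]
    baseline = decay * baseline + (1 - decay) * R  # running reward average

print(w)  # drifts toward the hidden optimum [1.5, -0.5]
```

In expectation this update follows the gradient of the reward, since only the noise component correlated with the reward surprise survives the averaging.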

The learning rule from Equation 32 or Equation 35 can be used in two ways. First, we can simply use it to replace the RLS or GD rule, when outsourcing the computation to the structure. In this case we still need some teacher to drive the system during learning, which limits its practical use, and it is more or less a replacement for the GD rule. Secondly, we can use it to train feedback without knowledge of the target signal at the neural level.

To apply RMH or similar techniques in a recurrent neural network, one typically starts from a chaotic network [28, 62], and the trained feedback drives the network toward a cyclic attractor. However, it is reasonable to assume that for robotics applications, chaotic movements might be undesired. Therefore, we took a slightly different training approach. We first trained the system (using RLS) to maintain an oscillatory pattern while noise was injected through additional actuators. Hence, we obtained robust but not chaotic patterns. There are, however, variations in the oscillations caused by the injected noise. Then, learning through RMH starts on the additional actuators.

One might argue that the use of RLS at this point negates the advantage of RMH. However, RLS is only used to keep the system active during RMH learning, and the target signals of RLS and RMH are independent (except for the fundamental frequency). A simple oscillator (e.g., a sine wave or coupled neurons) could also be used instead of a trained feedback controller. In a typical RC setup (with hyperbolic tangent neurons), it is possible to scale up the connection weights to start the learning process in a chaotic regime. In the case of tensegrity structures, we tried using a random feedback loop, which we then scaled to find a chaotic regime. Unfortunately, while doing this the structures often collapsed or did not stay active, and we thus concluded that this method would be cumbersome on a real platform.

The presented approach can be useful in robotic applications in which there is already some oscillatory behavior in the system. This can for example be generated by a very simple CPG signal. The RMH algorithm can then directly be applied, for example, to refine the motion. Hence, it is one possible application of the combination of a simplified CPG with our approach. The basic movements can also be provided by a controller based on linearized dynamics, where again RMH can be used to optimize the match between the actual plant and the linearized model.

In Figure 10, three major phases of training using the reward-based technique are shown. Here two feedback loops were trained using the RMH rule (out of a total of eight). RLS was used to train a random motion pattern (with the same frequency) on the first six outputs during 200 s (left graph). Then RMH learning starts, and initially the target signals are not at all matched. During training (middle graph), the outputs start to match the desired signal more closely, yet there is still some visible error. During testing (right graph), the noise source is disabled and the output almost exactly matches the desired signal. In this example, the tensegrity was in free fall to remove the disturbances from ground collisions so as to show that the desired signal can be closely matched.

### 5.2 Applications

In this section, we present a set of practical applications of morphological computation in tensegrity structures. We first show that the structure can modulate its gait patterns when we change the equilibrium lengths of a few springs. Next we look at gait optimization. We optimize the gait pattern with an external controller and then outsource the resulting gait to our static, linear controller. Finally, we discuss a basic end-effector control application.

#### 5.2.1 Modulating Motor Patterns

An important question is whether the trained tensegrity structures can react by adapting their gait to different configurations of the structure or, for example, to the slope of a hill. To test this, we added a single input signal to the system. This signal was fed into the tensegrity structure by modifying the equilibrium length of two actuated springs. The target motor patterns had to be modulated by the structure to linearly interpolate between two CPG patterns with the same frequency.

We again used the tensegrity icosahedron to show that such modulation is possible even in relatively small systems. Figure 14 shows a result from a run of the algorithm. We trained the system for only 400 s. At each time step, the system switches to another random input with probability 0.005. So the time between gait changes is variable. This also shows the robustness of the system, because accidental fast switches between input states disturb the system.
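The switching statistics implied by a per-step probability of 0.005 are easy to check: the dwell time between gait changes is geometrically distributed with mean 1/0.005 = 200 steps, i.e., about 4 s at the 50 Hz controller rate of Appendix 2:

```python
import random

# Time between input switches when the input is resampled with probability
# p = 0.005 at each controller step: geometric distribution, mean 1/p.

random.seed(1)
p, dwells, steps = 0.005, [], 0
for _ in range(1_000_000):
    steps += 1
    if random.random() < p:
        dwells.append(steps)
        steps = 0

mean_steps = sum(dwells) / len(dwells)
print(mean_steps)         # close to 1/p = 200 steps
print(mean_steps / 50.0)  # about 4 s between gait changes at 50 Hz
```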

#### 5.2.2 Gait Optimization

Gait optimization in robots is a complex problem, because small changes to, for example, the relative phase of two limbs or the duration of support phases can result in different locomotion patterns or failure in legged robots (see, e.g., [1] for reviews of animal gait patterns). For example, an animal typically positions its legs during locomotion to reduce the magnitude of joint moments and thus the required muscle forces [6].

Optimizing all aspects of gait properties is beyond the scope of this article. We assume the robot's configuration to be known, as well as the CPG frequency. Figure 15 gives an overview of the training procedure we will follow. Our goal will be to optimize the weights of the matrix *W*_{target} for a given basic CPG. We will then outsource the optimal gait to the structure with the RLS algorithm. As we will see, the gaits obtained using only morphological computation match those obtained during training. Thus, the structure can approximate the required motor patterns well enough for locomotion.

To optimize *W*_{target} we use the well-known CMA-ES algorithm [24]. The reason for this is that it is almost parameter free and has very good performance. The fitness function we use is simply the distance traveled by the center of mass of the tensegrity. Because of the compliance of the tensegrity structures, we do not need to include penalties, for example, for falling.

Figure 16 shows the trajectory of the center of mass of three different tensegrity structures. On the left, we see the tensegrity icosahedron with a number of additional springs. Remarkably, the gait was obtained after only 10 iterations of the CMA-ES algorithm. The population size was 50, and there were four actuators. The gait was evaluated over a period of 30 s. This means that only 4 h of exploration time would be necessary to obtain this locomotion pattern on a real robot.
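The 4 h figure follows directly from the evaluation budget; a quick check:

```python
# Wall-clock exploration time for the CMA-ES gait search: every candidate is
# evaluated with one 30 s rollout, for 10 iterations of a population of 50.

iterations, population, rollout_s = 10, 50, 30
total_s = iterations * population * rollout_s
print(total_s)            # 15000 s of rollouts
print(total_s / 3600.0)   # about 4.2 h, i.e., roughly the quoted 4 h
```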

The two other plots are from snakelike tensegrity structures that were constructed by stacking tensegrity prisms. Figure 17 shows the center structure in action, while Figure 3 shows the structure on the right in our simulator.

To show that the same gait is indeed maintained, we compared (Figure 18) the (vertical) ground reaction forces on the endpoints during training and testing of the large snakelike tensegrity from Figure 17. This system has 20 actuators in total. Due to the complexity of the structure, there is some variation in the ground reaction forces, but there is a clear pattern. The relative phase between the ground contacts is identical during training and testing. The training sample is taken from the beginning of the training (almost completely teacher forced), while the testing sample is from the end of testing (free run).

#### 5.2.3 End-Effector Control

To end this applications section, we show that the same technique can also be used to control an end effector. The objective is now to control the position of the endpoint of a bar with respect to two other endpoints. For this we measure the length along two springs connecting the endpoint of the bar with the endpoints of the other bar. Imagine controlling the position of the wrist with respect to the shoulder.

We assume no model of the system is known and use CMA-ES to optimize *W*_{target}. The CPG has the same frequency as the target movement. Because the CPG has only a limited number of basis signals and the structure is underactuated, it is to be expected that the target trajectory cannot be perfectly matched. In this example, we used a 30-dimensional CPG, based on a connection pattern from a stacked tensegrity prism.

To compute the fitness, we simulated the system for 100 s and computed the mean squared error over the last 80 s. The system was in free fall, and the springs along which we measured the position were not actuated. A tensegrity icosahedron with a total of 13 actuators (Figure 19) was used for this example (24 degrees of freedom, because the rigid body movements are ignored).

The result is shown in Figure 20. While the target trajectory cannot be perfectly matched (due to underactuation and the limitations of the CPG), the result is very encouraging. The end effector is part of the computational system itself, and the springs along which the position is measured influence the system as well. We only used 75 s of training, using RLS to transfer the control from the external CPG to morphological computation.

### 5.3 The Importance of Complex Dynamics

To complete this experimental section, we want to show that the nonlinearities can indeed improve the computational power of the system. Such a statement is of course task-dependent; for example, to generate sine waves, it is obviously not advantageous to have complex nonlinear dynamics in the system. We will again consider the generation of CPG-like signals based on the Matsuoka-type nonlinear oscillator in combination with the tensegrity icosahedron.

As Sultan et al. [61] indicate, the linearized dynamics of tensegrity deviate more from the nonlinear dynamics of the system at higher (generalized) velocities and lower pretension (i.e., when the system is more flexible or compliant). While typically one would restrict the velocities and deformations of the system so that the linearized dynamics are a good model of the system, our technique benefits from the opposite.

Many parameters of the structure can be tuned, and optimizing the configuration of the structure itself is a daunting task. In this section we only consider the importance of two parameters, the oscillator frequency and the maximal change of the actuator equilibrium length. One can easily define physically plausible regions of operations for both parameters (see Appendix 2), and we would like to know if within these regions of operation, there are significant changes of computational performance.

The maximal change of the actuator equilibrium length (*l*_{max}) was varied in steps of 3.5 cm from 5.5 to 37 cm. For each tuple (frequency, distance), we performed 50 trials, for a total of 15,000 trials. We computed the normalized mean squared error, defined as NMSE = ∑_{*t*=1}^{*N*} ∥**x**(*t*) − **y**(*t*)∥²/(*N*σ²(**x**)), where *N* is the number of samples, **y** is the vectorized output, **x** is the vectorized target signal, and σ²(**x**) is the variance of the target. For each set of 50 trials, only the best 30 are kept, to prevent failures (e.g., collapsing) from influencing the results. The results are shown in Figure 21.
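Assuming the common definition of the NMSE (mean squared error divided by the variance of the target; the article's exact normalization may differ in detail), the measure can be computed as:

```python
# NMSE: mean squared error between output and target, normalized by the
# target variance. 0 means a perfect match; 1 is the score of the trivial
# predictor that always outputs the target mean.

def nmse(y, x):
    n = len(x)
    mean_x = sum(x) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    mse = sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / n
    return mse / var_x

target = [0.0, 1.0, 2.0, 3.0]
print(nmse(target, target))        # 0.0: perfect output
print(nmse([1.5] * 4, target))     # 1.0: constant mean predictor
```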

So what can we learn from this? First, we see that for the task at hand, it is advantageous to work in a nonlinear region by increasing the frequency of the oscillator or the maximum spring equilibrium offset. It is important to note that although the frequency is a determining factor, the technique is not constrained to the natural frequency of the system: There is a broad region of frequencies with similar performance. One might also consider the bottom right region of operation, with only very small amplitudes, but its practical use is limited, as the resulting movement of the robot would be very small.

On the other hand, going beyond the 30-cm range often causes instability (collapsing) and will in practice cause bars to collide. Moreover, the performance will be restricted by a diagonal line going down from near the top left to the bottom right, because of practical limitations such as motor output power. So within the resulting region, better performance can be obtained by increasing the frequency or the maximum spring equilibrium offset.

Interestingly, for the lower frequency range (which might be interesting for energy efficiency reasons) it is advantageous to increase the maximum offset. Larger deformations of the structure cause the error to decrease.

## 6 Discussion

Compliant robots have been of interest to the robotics community for over a decade. We have seen many exciting examples of very simple control laws leading to complex behavior and of robustness against external perturbation. Compliance offers multiple advantages over classic, stiff robotics: It can allow for safer robot-human interactions, increased energy efficiency, robustness against external perturbations, and simplification of the control.

Notable examples of compliant robots that have very simple control laws are Puppy [29], Reservoir Dog [73], Wanda [78], and more recently, the one found in [52]. The control of the Reservoir Dog in irregular and unknown terrain was simply based on a sine wave with a different phase and offset for each leg, while a similar stiff robot would need complex sensory equipment and an elaborate controller [11].

In this work, we used tensegrity structures to model compliant systems. However, two important remarks need to be made.

First, the exact dynamics of the system need not be explicitly known to the learning algorithm. This is the underlying idea of reservoir computing: A dynamic system can be used as a computational black box, encoding a nonlinearly expanded history of environment interactions in the instantaneous state of the system. Such an abstraction has many advantages, as we can change substrates or construct hybrid systems while still using the same readout learning algorithms. It also does not define how the readout mechanism is actually implemented, and would allow, for example, a neural substrate, electrical wiring, or mechanical connections.

Secondly, we can exploit the fact that historically tensegrity structures have been used to model a plethora of complex systems from the micro to the macro scale. Even though tensegrity structures were initially used only in art and architecture, they have now also been successfully applied as a model for cellular cytoskeleton structures [31, 32, 69]. At the micro scale, the equations of motion are different and their exact form is often unknown, but we still find compressive elements (e.g., microtubules) and tensile elements (e.g., microfilaments). Inside a single-celled organism, there is no central nervous system, but chemical and mechanical interactions determine the cell's behavior, and flagella or cilia allow locomotion [39, 54, 64, 65]. Microorganisms such as nematodes are often capable of rich movement patterns and interaction with the environment while possessing only very simple nervous systems [18, 19]. Based on this, we can hypothesize that the results of our work could provide insight into the fundamental mechanism underlying how simple organisms can perform computations and locomotion required for their survival.

When taking a higher-level viewpoint on the nature of certain aspects of cognition and computation, our results can offer additional, empirically validated arguments in the quest for understanding cognition in biological organisms. Indeed, we have given several examples of systems in which the computational aspects of locomotion are for the most part physical in nature, making the structures discussed here prime examples of the idea of embodied cognition. Moreover, our analyses allow us to quantify the nature of the computation occurring in the substructures, most notably the controller and the physical system. This viewpoint is in our opinion applicable to many of the interactions between the body, sensory inputs, and early cognitive layers, but will probably not suffice to fully explain the complete array of cognitive capabilities of human-level intelligence.

When considering cognition as performing computation in the broadest sense, it is clear from our results that this computation is very much divided across the explicit linear control and the implicit nonlinear transformations of interactions with the environment, mediated by the physical properties of the structures. Indeed, the idea underlying the principle of PRC is precisely that the range of possible dynamical systems that can be used for computation is extremely broad, as are their properties regarding nonlinearity or memory. This is not merely a philosophical conjecture: A mathematical framework supporting this claim was recently introduced and proved in [17], showing that any dynamical system of a given size performs the same amount of computation, simply realizing different functions of its external perturbations. We would therefore propose that the question of the true seat of cognitive computation—mental or physical—is rather ill posed, and that the truth probably lies somewhere in between. Instead of viewing sensing and cognition as separate but linked entities, we propose that across organisms or even within a single organism, the distinction between mental cognition and strictly embodied cognition cannot be drawn and likely lies on a continuum.

## 7 Conclusions

In this work, we have introduced an extreme form of embodiment allowing for so-called physical reservoir computing in a very explicit sense. It was demonstrated by using highly dynamic and actuated tensegrity structures, effectively computing functions on the history of environmental interactions. This allows simple linear learning rules, with a varying degree of reward information, to be able to learn complex locomotion patterns or desired end-effector trajectories.

This provides a number of advantages from a robotic standpoint: The control complexity can be highly reduced, very uninformative reward signals can be used to train complex pattern generators, and the learned control law is robust to perturbations and can easily synchronize with environmental interactions.

But from a conceptual point of view, the conclusions are more profound. By demonstrating that dynamic “bodies” only require extremely simple “brains” to implement computations, we effectively opened up a whole spectrum of potential tradeoffs between brain-based computation and body-based computation. The powerful computational results from the field of reservoir computing [17, 26, 35, 43] can then be used to actually quantify and reason about the computations implemented by the physical body.

## Acknowledgments

This research was funded by a Ph.D. fellowship of the Research Foundation—Flanders (FWO) and the European Community's Seventh Framework Programme FP7/2007-2013 Challenge 2 Cognitive Systems, Interaction, Robotics under grant agreement No. 248311—AMARSi. The authors would like to thank Robert Legenstein for providing the original reward-modulated Hebbian learning code from [38].

## Notes

With reasonable assumptions, they can be used to approximate any nonlinear filter with fading memory.

We have simplified the notation by only considering bars with uniform mass, described by their center of mass.

We will also consider this description in Appendix 3 for the feedback linearizability of a single bar attached to springs.

We use the notation *x* to denote a scalar, **x** for a vector, and **X** for a matrix.

Often a bias input is added, which makes the equations nonsymmetric. The nonlinearity tanh is most often used, but variants are possible.

In fact this is not entirely true, because one can restrict the learning rules to reinforcements only, as in [28]; but we allow for both positive and negative weights here.

## References

### Appendix 1: Reducing Communication Load

Consider replacing the matrix *P*^{rls} in the RLS rule with the identity matrix. We then have

*W*(*t*) = *W*(*t* − δ*t*) + **e**(*t*)**x**(*t*)^{T}/∥**x**(*t*)∥²,

which is exactly the GD rule with a varying learning rate of ∥**x**(*t*)∥^{−2}.

Thus, the GD rule is almost equivalent to the RLS rule for uncorrelated input variables. This also indicates how RLS can be implemented in a biologically plausible way. By adding a (linear) layer to project the inputs on their principal components, one obtains uncorrelated input variables. The generalized Hebbian algorithm (GHA) can be used to implement this [56]. Therefore, as long as an error signal is available for each motor signal, both RLS and GD are viable options that are straightforward to implement.

The GHA is an extension of Oja's rule [48] to multiple output networks. In fact, it can be seen as a stacked version of Oja's rule. From each additional output, all previous principal components are extracted. Oja's rule itself is simply a stabilized version of the classic Hebbian learning rule, and it is obtained after doing a first-order Taylor approximation of the Hebbian rule with normalization to keep the norms of the weights equal to one.
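A minimal sketch of the GHA on illustrative two-dimensional data; the deflation term makes each output extract the next principal component (an online variant, so later rows see already-updated earlier rows):

```python
import random

# Generalized Hebbian algorithm (Sanger's rule):
#   Delta W_ij = eta * y_i * (x_j - sum_{k<=i} y_k * W_kj)
# Row i extracts principal component i; the sum deflates the components
# already captured by rows 0..i (including i itself, which is Oja's rule).

random.seed(2)

def gha_step(W, x, eta):
    y = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
    for i in range(len(W)):
        for j in range(len(x)):
            recon = sum(y[k] * W[k][j] for k in range(i + 1))
            W[i][j] += eta * y[i] * (x[j] - recon)
    return W

# Correlated data: strong component along (1, 1), weak along (1, -1).
W = [[0.3, 0.1], [0.1, -0.3]]
for _ in range(20000):
    s = random.gauss(0.0, 1.0)
    t = random.gauss(0.0, 0.1)
    W = gha_step(W, [s + t, s - t], eta=0.01)

print(W[0])  # first row approaches a unit vector along (1, 1)
```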

One application of the additional decorrelating layer is that it can be used to reduce the number of signals to be communicated. Indeed, the first principal components will extract the most fundamental properties of the signals in the system. Hence, it provides a natural way of forcing the RMH-EH algorithm to focus on the first principal components. This is due to two reasons. First, the first principal components will typically contain *simpler* signals. Secondly, the first principal components are more stable; hence they will tend to be reinforced more than fluctuating inputs.

Figure 22 shows information on the GHA layer when used in combination with RMH (in the loop). Shown are the correlation matrices of the raw sensor data (spring forces and their derivatives), of the input to the GHA layer (10-time-steps-delayed raw sensor data), and of the GHA output trained with 100 outputs. Although the GHA output is not completely decorrelated (this can be improved by increasing the training time), the correlation between variables is much less pronounced than before.

### Appendix 2: Simulation Details

We simulated the tensegrity systems by vectorizing the differential algebraic equation description (see Section 2). The VODE solver from [10] was used with a variable time step. Ground collisions were modeled as external forces acting on the endpoints of the rods. The reaction forces were computed as in [75]. Internal (bar-bar) collisions were not taken into account.

We assumed ideal motors (instant change of spring equilibrium length), but limited their range and velocity. More precisely, each motor could change the spring length by at most 0.3 m at 0.3 m/s. For the tensegrity icosahedron, we normally actuated nonstructural springs (i.e., the structure remains stable without these springs); hence the motors need not be actuated at rest. The actuator springs (for the icosahedron) had an equilibrium length equal to the distance between their endpoints with the actuator springs removed. The magnitude of the force on the springs was usually <10 N. We assumed that the rods weighed 0.4 kg/m. The rods had lengths between 0.2 and 1 m, depending on the configuration (around 0.5 m for the icosahedron, which had a mass of about 1.4 kg).

That these are reasonable conditions can be seen by computing the required motor power. Assuming a constant speed of 0.3 m/s with an applied load of 10 N, a motor power of 3 W is needed. Lightweight (<50 g) DC motors are available in this range. Small lithium polymer batteries can deliver enough power and energy to drive such motors over longer periods of time.

A controller frequency of 50 Hz was used to prevent stringent communication requirements. We can estimate the total bandwidth required, for example, for the tensegrity icosahedron. Spring forces can be measured using strain gages. ADCs in this frequency range are commonly available at low cost with a precision up to 24 bits (3 bytes). The icosahedron has 24 springs in its minimal configuration, and 60 when fully connected. Hence, the minimal communication bandwidth is (2 springs × 3 bytes/spring + overhead) × 50 Hz (spring force and derivative). We assume broadcasting is used, as the algorithms can be implemented locally on each rod. The overhead can contain the target signal if required. Let us assume an overhead of 5 bytes per motor; then we have (2 springs × 3 bytes/spring + 1 motor × 5 bytes/motor) × 50 Hz. Hence, for the icosahedron in its minimal configuration with 10 actuators (the forces on the actuator springs are not used), we obtain a bandwidth of only 9.7 KB/s.
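Both budget estimates can be reproduced in a few lines:

```python
# Back-of-the-envelope actuation and communication budgets for the
# icosahedron in its minimal configuration.

# Motor power: P = F * v for a 10 N load moved at 0.3 m/s.
power_w = 10.0 * 0.3                      # 3 W per motor

# Bandwidth: 24 springs broadcast force and derivative (3 bytes each),
# plus 5 bytes of overhead per actuated motor, all at 50 Hz.
springs, motors, rate_hz = 24, 10, 50
payload = springs * 2 * 3 + motors * 5    # bytes per control cycle
bandwidth = payload * rate_hz             # bytes per second

print(power_w)    # 3.0 W
print(bandwidth)  # 9700 B/s, i.e., about 9.7 KB/s
```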

### Appendix 3: Feedback Linearizability of a Single Bar

For the sake of completeness, we investigate the feedback linearizability of a bar attached to springs. Consider a single bar attached to three springs at each outer end. This is a minimal assumption on freestanding tensegrity structures (each endpoint needs at least three attached springs). Each of the springs is fixed to the rod at one end and to a fixed location at the other.

Here *u* is the transformed control input. Assuming ideal motors, such an input always exists when the springs are strictly in tension. Note that in practice this can easily be implemented, as the spring length can be measured by a force sensor on the spring.

For the inputs *u*_{i} to be unrestricted, the column space of the matrix [**n** − **p**_{0}, **n** − **p**_{1}, **n** − **p**_{2}] must be ℝ³. This is trivial: As long as **n** does not lie on the plane defined by the endpoints of the springs, it is fulfilled.

Here **r** is the center of mass of the bar, and the remaining generalized coordinates describe the orientation of the bar (around two orthogonal axes). **Φ**(**q**) is a matrix containing the partial derivatives of the nodal coordinate vector [**n**_{0}^{T} **n**_{1}^{T}]^{T} with respect to the generalized coordinates **q**. Under the constraint ϕ ∈ ]0, π[, this Jacobian matrix **Φ**(**q**)^{T} has full rank, and thus we can always find values *f*_{n}.
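The full-rank condition can be checked with a determinant; a small sketch with illustrative coordinates:

```python
# The spring directions n - p0, n - p1, n - p2 span R^3 exactly when the
# determinant of the matrix with these columns is nonzero, i.e., when the
# endpoint n does not lie in the plane through the anchor points.

def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))

def spans_r3(n, p0, p1, p2, tol=1e-12):
    cols = [tuple(n[i] - p[i] for i in range(3)) for p in (p0, p1, p2)]
    return abs(det3(*cols)) > tol

p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)
print(spans_r3((0.3, 0.3, 0.5), p0, p1, p2))  # True: n is off the anchor plane
print(spans_r3((0.3, 0.3, 0.0), p0, p1, p2))  # False: n lies in the z = 0 plane
```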

### Appendix 4: Form-Finding

We assume the spring connectivity matrix **B** (Equation 5) is known, as well as the bar connectivity. The bar connectivity is stored in a matrix **C**, constructed like the spring connectivity matrix **B**. For the class 1 tensegrities studied in this work, **C** can trivially be rewritten in a simple block form by reordering the nodes. The problem we face is to find an equilibrium state for a structure with given matrices **B** and **C**; that is, we want to know the positions **N** of the nodes and the force density diagonal matrix **Λ** such that

(**B**^{T}**ΛB** − **C**^{T}**ΓC**)**N** = **0**

for some diagonal matrix **Γ**. This simply means that without external forces, the forces due to the bars and springs are balanced. As we assume the bars to have fixed lengths and infinite tensile strength, this results in a net acceleration 0 of the nodes in this equilibrium configuration. If the bars do not have infinite tensile strength, then **Γ** depends on the Young's modulus of the bars, which is the common approach in the literature.

The vector **f** contains the forces acting on the nodes. If the tangent stiffness matrix **K** is positive semidefinite, then the structure will be stable.

The problem is now to find the force densities **Λ** and **Γ** to ensure that **K** is positive semidefinite. We applied the technique from Zhang and Ohsaki [77], which starts from a given connection matrix but without knowledge of the force densities and then iteratively updates an initial estimate of the force densities. This algorithm finds structures with 12 free variables (in three dimensions) for a given set of force densities. We then used CMA-ES to optimize these variables to maximize the minimum bar-to-bar distance.

Note that in [77], struts are assumed instead of bars. Hence, after form-finding, we replace the struts with bars, which does not change the equilibrium state.

## Author notes

Contact author.

Reservoir Lab, Electronics and Information Systems Department, Ghent University, Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium. E-mail: ken.caluwaerts@ugent.be (K.C.); michiel.dhaene@ugent.be (M.D.); david.verstraeten@ugent.be (D.V.); benjamin.schrauwen@ugent.be (B.S.)