## Abstract

We present a general and fully dynamic neural system, which exploits intrinsic chaotic dynamics, for the real-time goal-directed exploration and learning of the possible locomotion patterns of an articulated robot of an arbitrary morphology in an unknown environment. The controller is modeled as a network of neural oscillators that are initially coupled only through physical embodiment, and goal-directed exploration of coordinated motor patterns is achieved by chaotic search using adaptive bifurcation. The phase space of the indirectly coupled neural-body-environment system contains multiple transient or permanent self-organized dynamics, each of which is a candidate for a locomotion behavior. The adaptive bifurcation enables the system orbit to wander through various phase-coordinated states, using its intrinsic chaotic dynamics as a driving force, and stabilizes onto one of the states matching the given goal criteria. In order to improve the sustainability of useful transient patterns, sensory homeostasis has been introduced, which results in an increased diversity of motor outputs, thus achieving multiscale exploration. A rhythmic pattern discovered by this process is memorized and sustained by changing the wiring between initially disconnected oscillators using an adaptive synchronization method. Our results show that the novel neurorobotic system is able to create and learn multiple locomotion behaviors for a wide range of body configurations and physical environments and can readapt in real time after sustaining damage.

## 1. Introduction

The possibility of exploiting intrinsic chaotic dynamics has recently attracted the attention of both neurobiologists interested in how animals learn to coordinate their limbs, for instance, in locomotion behaviors (Mpitsos, Burton, Creech, & Soinila, 1988; Kelso, 1995; Korn & Faure, 2003), and roboticists striving to develop better, more efficient locomotion systems for articulated autonomous robots (Kuniyoshi & Suzuki, 2004; Steingrube, Timme, Wörgötter, & Manoonpong, 2010). Chaotic dynamics emerging spontaneously from interactions of neural circuitry, bodies, and environments can be used to power a kind of search process as an embodied system explores its own possible motor behaviors. However, to date, it has not been clear how to harness chaos in a general goal-directed way such that desired adaptive sensorimotor behaviors can be explored, captured, and learned. We address this deficiency by presenting a general and fully dynamic embodied neural system, which exploits chaotic search through adaptive bifurcation, for the real-time goal-directed exploration and learning of the possible locomotion patterns of an articulated robot of an arbitrary morphology in an unknown environment.

Properly coordinated rhythmic movements for locomotion are ubiquitous in animals. Biological locomotor systems (usually involving coordinated limb movements) evolved to be highly adaptable, dextrous, and energy efficient. Consequently, they are a major source of inspiration when designing robot locomotion systems. Most biological locomotor systems involve neural networks acting as central pattern generators (CPGs), which are responsible for producing the basic rhythmic patterns for the oscillatory movement of limbs (Cohen, Rossignol, & Grillner, 1988; Stein, Grillner, Selverston, & Stuart, 1997). Understanding the subtleties of operation of such networks and how to design artificial versions for robotic applications are ongoing challenges (Ekeberg, 1993; Kimura, Akiyama, & Sakurama, 1999; Ijspeert, 2001; Ijspeert, Crespi, Ryczko, & Cabelguen, 2007).

While off-line search methods such as evolutionary algorithms or other global optimization processes have been extensively used to determine neural parameters for CPG-based robot locomotor systems (Gallagher, Beer, Espenschied, & Quinn, 1996; Ijspeert, 2001; Kamimura et al., 2003; Itoh, Taki, Kato, & Itoh, 2004; Floreano, Husbands, & Nolfi, 2008), the size and complexity of the search spaces often grow exponentially with regard to the number of variables, making the methods computationally expensive and time-consuming. Coupled with this, it is often very difficult to devise evaluation methods and metrics that can adequately cover the enormous number of unexpected situations that a robot can encounter during its lifetime, such as environmental change or body defects. This naturally led to efforts to develop adaptive methods that can be used online on the robot. Among these, reinforcement learning (RL) (Matsubara, Morimoto, Nakanishi, Sato, & Doya, 2006; Nakamura, Mori, Sato, & Ishii, 2007) and fast heuristic optimization algorithms (Sproewitz, Moeckel, Maye, & Ijspeert, 2008) have been successfully used. More systematic approaches such as continuous self-modeling, employing a number of stochastically optimized internal models (Bongard, Zykov, & Lipson, 2006), have also been developed. Although these are useful methods that allow more efficient online adaptation, they are not always free of the inherent difficulties of stochastic search (balancing exploration and exploitation, computational efficiency) and therefore often need to incorporate a priori knowledge or make use of a biased learning strategy in order to simplify and speed up the learning process.

Partly because of these issues, the exploitation of intrinsic chaotic dynamics has recently emerged as an attractive alternative approach to the real-time online exploration of the space of embodied motor behaviors of a system. A number of bio-inspired robotics experiments have demonstrated its power in this context (Kuniyoshi & Suzuki, 2004; Pitti, Lungarella, & Kuniyoshi, 2005; Pitti, Niiyama, & Kuniyoshi, 2010). The research presented here significantly extends this direction by showing how to achieve an integrated system for the goal-directed exploration, capture, and learning of motor behaviors.

### 1.1. Chaotic Neural Dynamics and Behavior.

A key influence on the current work is the growing body of observations of intrinsic chaotic dynamics in nervous systems (Guevara, Glass, Mackey, & Shrier, 1983; Rapp, Zimmerman, Albano, Deguzman, & Greenbaun, 1985; Freeman & Viana Di Prisco, 1986; Wright & Liley, 1996; Terman & Rubin, 2007). Some studies indicate intrinsic chaotic dynamics in animal motor behaviors at both the neural level (Rapp et al., 1985; Terman & Rubin, 2007) and the level of body and limb movement (Riley & Turvey, 2002). These seem particularly prevalent during developmental and learning phases (e.g., when learning to coordinate limbs) (Ohgi, Morita, Loo, & Mizuike, 2008). The existence of such dynamics in both normal and pathological brain states, at both global and microscopic scales (Wright & Liley, 1996), and in a variety of animals, supports the idea that chaos plays a fundamental role in neural mechanisms (Skarda & Freeman, 1987; Kuniyoshi & Sangawa, 2006).

Although the functional roles of chaotic dynamics in the nervous system are far from understood, a number of intriguing proposals have been put forward. Freeman and colleagues have hypothesized that chaotic background states in the rabbit olfactory system provide the system with “continued open-endedness and readiness to respond to completely novel as well as familiar input, without the requirements for an exhaustive memory search” (Skarda & Freeman, 1987). Kuniyoshi and Sangawa (2006) made the important suggestion that chaotic dynamics underpin crucial periods in animal development when brain-body-environment dynamics are explored in a spontaneous way as part of the process of acquiring motor skills.

Recent robotics studies have demonstrated that chaotic neural networks can indeed power the self-exploration of brain-body-environment dynamics in an embodied system, discovering stable patterns that can be incorporated into motor behaviors (Kuniyoshi & Suzuki, 2004; Kinjo, Nabeshima, Sangawa, & Kuniyoshi, 2008; Pitti et al., 2010).

### 1.2. Embodiment and Locomotion.

Studying neural circuitry underlying the generation of rhythmic motor behavior in isolation ignores the considerable advantage that can be obtained from incorporating the physical body and its environment—that is, exploiting the embodied nature of such behavior (Wheeler, 2005; Pfeifer & Bongard, 2007). In robotics, this has led to efforts to exploit ready-made functionality provided by the physical properties of an embodied system. One such line of inquiry involves using a frequency adaptive oscillator that can be tuned to the resonant frequency of the mechanical system (Buchli, Righetti, & Ijspeert, 2006; Raftery, Cusumano, & Sternad, 2008). Although this kind of adaptation accounts for some of the requirements for efficient locomotion, we believe that in general, the appropriate phase relationships between limbs should take priority when dealing with the creation of new motor patterns. One of the seminal works from this perspective is the exploration and acquisition of motor primitives, for a simple robot, using a mechanism that is embodied as a coupled chaotic field (Kuniyoshi & Suzuki, 2004; Pitti et al., 2005). That work modeled an extreme version of embodied coupling that had no electrical connection between neural units, with all neural coupling acting indirectly through body-environment dynamics. Neural oscillators were implemented using a simple logistic map with chaotic behavior, and the system dynamics rapidly developed to a stable, coherent rhythmic motion by using mutual entrainment between the neural circuit and the body-environment system. The process was completely deterministic, not making use of any random search method. More tractable systems (Pitti et al., 2010) have shown that a simple 2D simulated biped controlled by indirectly coupled chaotic maps can generate stable locomotion when the coupling strength between controller and body was set in the specific regime of phase synchronization. 
Phase synchronization between chaotic controller and physical system allows the flexible self-assembly of motor patterns and adaptive frequency matching to the resonant frequency of the body. However, the motor patterns that emerge through phase synchronization do not necessarily produce sustained locomotion behaviors unless the coupling strengths are properly set for a given neuromechanical system. Also, a more biologically plausible system was developed by Kuniyoshi and Sangawa (2006) in which a realistic musculo-skeletal model was employed with neural control circuits consisting of model CPGs. This was embedded within a larger system involving cortical maps. The biomechanical system was modeled as a series of redundant muscles acting on a joint, and information on the muscle combinations for any discovered coherent motor patterns was engraved on the model cortices as a sensorimotor representation. Later work (Kinjo et al., 2008) demonstrated the learning and replay of a motor pattern by adding a simple perceptron with a backpropagation learning on top of the previously learned sensorimotor maps. They showed that the representative power of the self-organized sensorimotor maps can greatly simplify the nontrivial sensorimotor learning problem into a simple mapping between the sensor and motor maps, but the learning pattern was manually fed to the system during learning; hence, it cannot be regarded as an example of an autonomous and goal-directed exploration-learning scheme.

Until now, concrete general methodologies for applying such techniques to the automatic generation of desired motor patterns for autonomous robots have remained elusive. In this letter, we build on the essential concepts of prior work, extending and generalizing it as we attempt to develop a generally applicable methodology based around self-organization through chaotic dynamics for neural-body-environment coupled systems. We present a study of goal-directed online exploration of rhythmic motor patterns in an oscillator system coupled through physical embodiment, specifically generating forward locomotion behaviors without prior knowledge of the body morphology or its physical environment. This is explored in the context of simulated limbed robots. In an important departure from the previous work outlined above, our recent study (Shim & Husbands, 2010) introduced an approach to explore and drive system dynamics toward a desired state by employing the concept of chaotic mode transition with external feedback (Davis, 1990), which exploits the intrinsic chaoticity of a system orbit as a perturbation force to explore multiple synchronized states of the system, and stabilizes the orbit by decreasing its chaoticity according to a feedback signal that evaluates the behavior. This enabled the system to perform a deterministic search guided by a global feedback signal from the physical system, which facilitates an active exploration toward a desired behavior. This preliminary work showed how to guide the system orbit to selectively settle in one of the stable patterns, but the system was restricted in that it was unable to capture and learn high-performing transient (unstable) patterns. The research described in this letter enhances our previous study by addressing those deficiencies and provides a coherent integration of these procedures into a dynamical systems framework, building a complete self-driven exploration-capture-learning system.

## 2. Chaoticity as a Perturbation Strength

Conventional optimization strategies generally use (external) stochastic perturbations on system parameters for search space exploration. However, a few studies have addressed the effectiveness of a chaotic system replacing a stochastic source (Parker & Chua, 1989; Ott, Sauer, & Yorke, 1994), and have found that a deterministic chaotic generator outperforms a stochastic random explorer (Zhang & Shao, 2001; Morihiro, Isokawa, Matsui, & Nishimura, 2005). In these cases, the chaotic dynamics acts as an external module generating perturbations that cause system parameters to wander in parameter space. However, the adaptive chaotic search method presented here, using bifurcation to chaos, can directly drive the phase orbit of a bodily coupled system (where the neural elements are coupled indirectly through physical embodiment) for exploration because of the endogenous existence of chaotic dynamics in the system itself. The intrinsic dynamics of the system naturally power the search process without the need for external sources of noise.

The general idea of applying a chaotic search method that uses adaptive parametric feedback control had been previously presented in the field of optical sciences (Davis, 1990; Aida & Davis, 1994) and for memory search, where memory is stored as cyclical pattern sequences in a neural network (Nara & Davis, 1992). It has been argued that this method should be generally applicable when the target device is capable of supporting a variety of stable modes, between which there exist chaotic transitions, and which interacts with its environment such that there exists a feedback signal evaluating whether the mode is suitable or not. Chaotic transitions allow the system to try each of the modes sequentially, and the mode evaluated as suitable is selected and stabilized by changing a device chaoticity parameter to take it into a multistable regime. This can be thought of as a controlled version of the concept of chaotic itinerancy (Kaneko, 2003), where the system wanders from one quasi-attractor to another, getting entrained in each of them for a while. An indirectly coupled neural-body-environmental system, such as the one used in this letter, has the required characteristics of such a device, including multiple coordinated oscillation modes. It is known that a properly designed oscillator network can have multiple synchronized states that exhibit stable oscillations for both discrete (Feudel, Grebogi, & Yorke, 1996) and continuous (Vadivasova, Sosnovtseva, Balanov, & Astakhov, 1999) systems, and the structure of emergent behavior in these systems often reflects the spatial distribution of coupling strengths (Kaneko, 1994). Accordingly, a network of oscillators coupled through physical embodiment forms multiple synchronized states that reflect the body schema and its interactions with the environment, and each of them represents a potential candidate for meaningful locomotion behavior.

A conceptual description of the chaotic search process is illustrated in Figure 1. The goal of the system can be regarded as finding and becoming entrained in the basin of a particular attractor that has high performance (denoted by C) while escaping from the low-performing attractors (A and B) regardless of the initial point in the state-space. The idea is to open a new pathway that connects those isolated basins through the use of an additional dimension afforded by changing the system dynamics through tuning the chaoticity according to the evaluation signal. The orbit will visit and evaluate each of the attractors (A, B, C) systematically, yet chaotically, by adaptively varying the bifurcation parameter of the system according to a feedback signal until it reaches the basin of the desired attractor. The process can be interpreted as a continuous and deterministic version of trial-and-error search that exploits the intrinsic chaotic behavior of the system.

## 3. The Integrated Exploration-Learning System

The architecture of the neural part of the system developed in this letter is based on the model of Kuniyoshi and Sangawa (2006), which is inspired by the organization of spinobulbar units in the vertebrate spinal cord and the medulla oblongata (the lower part of the brainstem, which mainly deals with autonomic, rhythmic, involuntary functions). However, we use a more compact and modular configuration for each joint of the limbed robot and significantly extend the model to allow goal-directed exploration and learning. It is intended to be applicable to a wide range of robotic systems. The architecture consists of a number of identical control modules connected to each of the body parts. Each neuromuscular system for a joint that receives afferent sensory input and gives motor output to an antagonistic muscle pair can be encapsulated as a single motor unit, and the whole system consists of *N* identical motor units where *N* is the number of degrees of freedom of the robot (see Figure 2). Therefore, the system consists of uncoupled identical weakly forced limit cycle oscillators and a series of first-order leaky integrator equations. Prior work has demonstrated that uncoupled weakly forced oscillator systems can operate in stable modes (Kuniyoshi & Sangawa, 2006), and since our extensions to this work mainly involve elements based on stable first-order dynamics, it was possible to develop a system that can be stably operated with an arbitrary body-environment configuration.

### 3.1. The CPG Model.

Each motor unit has a pair of CPG neurons modeled by Bonhoeffer-van der Pol (BVP) equations (Asai, Nomura, Abe, & Sato, 2003), which drive the corresponding joint. When interacting with the body and environment, the motor unit can adjust its chaoticity by varying the difference between control parameters of the oscillators in the CPG pair. These differences change identically in all motor units as a function of the evaluation signal, acting as the global bifurcation parameter for the chaotic exploration with adaptive feedback. The BVP model allows the phase relationship between CPG activity and body motion to be flexibly locked according to a loop delay (Ohgane, Ei, & Mahara, 2009), which is a beneficial feature for covering a range of sensorimotor delays originating from different body-environment configurations. All CPGs in the system are fully interconnected in the electrical sense, but they are functionally disconnected during exploration (by having zero connection weights). When the system dynamics are stabilized by discovering a useful pattern, the connection weights become nonzero, according to a learning procedure described later, and the fully interconnected network is activated.

The dynamics of motor unit $m$ are given by the coupled BVP equations 3.1 and 3.2, which contain a time constant and the fixed oscillator parameters $a = 0.7$, $b = 0.675$, and $c = 1.75$ (Asai, Nomura, Abe et al., 2003). Each consecutive pair in the set of $2N$ oscillators is sequentially allocated to each motor unit as $l = 2m - 1$ and $r = 2m$ (we use expressions such as ${}^{m}x_{l}$ and ${}^{m}x_{r}$ to refer to the $m$th motor unit where it avoids confusion; see Figure 5). The coupling strengths for the afferent input $H(s)$ are 0.013 and 0.022; $H(s)$ is a function of the raw sensor output $s$, processed by the sensor adaptation module (SAM) described in the next section. $F_i^j$ is a coupling term between oscillators and is subject to the learning process.

$z_1$ and $z_2$ are the control parameters for adjusting the chaoticity of the motor unit. Their difference ($z_2 - z_1$) changes identically in all motor units and acts as the global bifurcation parameter. In the stable regime where the two control parameters are symmetric, it had been found (Asai, Nomura, Sato et al., 2003) that the two coupled BVP equations exhibit bistable phase locking of their oscillations in a parameter range of $0.6 < z_1 = z_2 < 0.88$. From the observation of a number of experiments on the oscillator dynamics, we chose to fix $z_2 = 0.73$ and to vary $z_1$ in order to ensure a higher probability of multistability of the system in its stable regime. Note that we need to preserve the topology of indirect couplings between oscillators close to that of Asai's basic form (couplings from excitatory nodes to all nodes; Asai, Nomura, Abe et al., 2003), but slight variations in the sensor input term need to be made for some sensor designs (refer to section A.2.1 in the appendix for examples).
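As a rough illustration of the oscillator underlying each CPG neuron, the sketch below integrates a single, uncoupled BVP unit using the parameter values quoted above. The equation form in `bvp_rhs`, the unit time constant, the RK4 integrator, and all function names are assumptions made for illustration; the sensory-input, coupling, and learning terms are omitted, so this is not the full motor-unit model.

```python
import math

A, B, C = 0.7, 0.675, 1.75  # fixed oscillator parameters quoted in the text


def bvp_rhs(x, y, z):
    """Assumed single BVP unit (no coupling or sensory input, time constant 1)."""
    dx = C * (x - x ** 3 / 3.0 - y + z)
    dy = (x - B * y + A) / C
    return dx, dy


def rk4_step(x, y, z, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1x, k1y = bvp_rhs(x, y, z)
    k2x, k2y = bvp_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, z)
    k3x, k3y = bvp_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, z)
    k4x, k4y = bvp_rhs(x + dt * k3x, y + dt * k3y, z)
    x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x, y


def peak_to_peak(z, t_end=200.0, dt=0.01):
    """Peak-to-peak amplitude of x over the second half of the run."""
    x, y = 0.0, 0.0
    steps = int(t_end / dt)
    lo, hi = float("inf"), float("-inf")
    for k in range(steps):
        x, y = rk4_step(x, y, z, dt)
        if k >= steps // 2:
            lo, hi = min(lo, x), max(hi, x)
    return hi - lo


def hopf_threshold():
    """Value of z at which the Jacobian trace of the assumed unit vanishes."""
    x_star = -math.sqrt(1.0 - B / C ** 2)       # trace: c(1 - x^2) - b/c = 0
    y_star = (x_star + A) / B                   # from dy/dt = 0
    return y_star - x_star + x_star ** 3 / 3.0  # from dx/dt = 0


if __name__ == "__main__":
    print("oscillation onset near z =", round(hopf_threshold(), 5))
    print("amplitude at z = 0.73:", peak_to_peak(0.73))
    print("amplitude at z = 0.00:", peak_to_peak(0.0))
```

Under these assumptions the unit oscillates at the chosen value $z_2 = 0.73$ and settles to rest when $z$ is well below the onset value, which is the mechanism by which lowering $z_1$ changes the behavior of a motor unit.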

### 3.2. Homeostatic Sensory Regulation.

The sensor adaptation module (SAM) performs homeostatic adaptation (Turrigiano & Nelson, 2004; Turrigiano, 2008) for sensor input by calibrating the raw sensor signal using a linear transformation, which continuously adjusts the amplitude and offset of the periodic sensor signal in order to closely match its waveform to that of an antagonistic oscillator output. The sensory signal (in most cases, mechanosensory information from haptic sensors or muscle afferents) may vary according to the choice of sensors and the different body-environment interaction conditions. If the incoming signal is too large, the chaoticity of the system will be lost; if too small, the neural signals will be uncorrelated. The regulation of sensory activation ensures that the oscillator pair in a motor unit maintains a certain level of information exchange close to that of a weakly coupled oscillator pair so that the network dynamics are regulated within an appropriate range to generate flexible yet correlated activities. This also ensures the chaoticity of a motor unit is controlled in a systematic and collective way by the feedback signal regardless of the physical properties of the robotic system and the type of sensors.

The function $H(s, t)$ is the implementation of a SAM. Given the raw sensor signal $s$ and the antagonistic oscillator output $n$, the adaptation function linearly transforms the raw signal with a multiplicative factor $e^{A(t)}$ and an additive factor $B(t)$, that is, $H(s, t) = e^{A(t)} s + B(t)$. Its update rules use continuous running averages of the signals (this meaning of the running average is used throughout the letter). The multiplicative function $A(t)$ is updated by comparing the root mean square (the square root of the temporal average of the squares) of the antagonistic neural output $n$ with that of the transformed incoming signal $H(s, t)$, a quantity analogous to the signal energy that reflects strength or amplitude. $B(t)$ is used as part of the scheme to remove the offset bias: the average offset of each signal is subtracted before the energy difference is calculated, and $B(t)$ is updated by the offset difference between the two signals. The timescale of adaptation should be set longer than that of the oscillator; we used the same value as the timescale of performance evaluation throughout this work, as described in the next section.
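The adaptation described above can be sketched as follows. This is a minimal illustration only: the first-order update rules, the shared time constant `tau`, the Euler integration, and the synthetic sinusoidal signals are all assumptions, chosen so that the gain tracks the RMS difference and the offset tracks the mean difference as the text describes.

```python
import math


def simulate_sam(t_end=200.0, dt=0.005, tau=5.0):
    """Adapt log-gain A and offset B so that H = exp(A) * s + B tracks n."""
    A, B = 0.0, 0.0          # log-gain and offset of the linear transformation
    n_avg = h_avg = 0.0      # running averages (offsets) of n and H
    n_pow = h_pow = 0.0      # running averages of the squared, offset-free signals
    h_lo, h_hi = float("inf"), float("-inf")
    steps = int(t_end / dt)
    for k in range(steps):
        t = k * dt
        n = math.sin(2.0 * math.pi * t)              # reference oscillator output
        s = 5.0 * math.sin(2.0 * math.pi * t) + 3.0  # raw sensor: too large, biased
        h = math.exp(A) * s + B
        # Leaky-integrator running averages (assumed form).
        n_avg += dt / tau * (n - n_avg)
        h_avg += dt / tau * (h - h_avg)
        n_pow += dt / tau * ((n - n_avg) ** 2 - n_pow)
        h_pow += dt / tau * ((h - h_avg) ** 2 - h_pow)
        # Gain follows the RMS difference, offset follows the mean difference.
        A += dt / tau * (math.sqrt(n_pow) - math.sqrt(h_pow))
        B += dt / tau * (n_avg - h_avg)
        if t >= t_end - 2.0:                         # record the last two periods
            h_lo, h_hi = min(h_lo, h), max(h_hi, h)
    return A, B, h_lo, h_hi


if __name__ == "__main__":
    A, B, h_lo, h_hi = simulate_sam()
    print("gain:", math.exp(A), "offset:", B, "range of H:", (h_lo, h_hi))
```

In this toy setting the sensor signal is five times too large and carries an offset of 3, so the adapted transformation should approach a gain of about 0.2 and an offset of about -0.6, leaving $H$ with roughly the same waveform as the reference signal.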

### 3.3. Evaluation and Feedback Bifurcation.

During exploration, the bifurcation parameter continuously drives the system between stable and chaotic regimes as a function of the evaluation signal. The evaluation signal is determined by a ratio of the actual performance (e.g., forward speed) to the desired performance. If the performance reaches the desired performance, the bifurcation parameter decreases to zero, and the system stabilizes. Since the robotic system is arbitrary, we do not have prior knowledge of what level of performance it can achieve. Drawing on concepts from goal-setting strategies (Barlas & Yasarcan, 2006) and the Rescorla-Wagner model of conditioning (Rescorla & Wagner, 1972), the dynamics of the desired performance are modeled as a temporal average of the actual performance, such that the expectation of a desired goal is influenced by the history of the actual performance experienced.
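The goal-setting idea in this paragraph can be illustrated with a small sketch in which the desired performance is a leaky average of the actual performance. The first-order update, the time constant `tau_d`, the step size, and the synthetic performance trace are assumptions for illustration.

```python
def evaluation_ratio(performance, dt=0.1, tau_d=50.0):
    """Return E / E_d over time, with E_d a leaky average of the actual E."""
    e_d = performance[0]  # assumption: the expectation starts at the first sample
    ratios = []
    for e in performance:
        ratios.append(e / e_d if e_d > 0.0 else 1.0)
        e_d += dt / tau_d * (e - e_d)  # expectation relaxes toward experience
    return ratios


# Performance holds at 1.0, then degrades to 0.5 (hypothetical trace).
trace = [1.0] * 1000 + [0.5] * 5000
ratios = evaluation_ratio(trace)

if __name__ == "__main__":
    print("ratio just after the drop:", ratios[1000])
    print("ratio at the end:", ratios[-1])
```

Immediately after the drop the ratio falls to 0.5, which would push the system back toward exploration; as the expectation adapts to the lower performance, the ratio returns toward 1.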

The actual performance $E$ is measured by the forward speed of the robot. Since the system has no prior knowledge of the body morphology of the robot, it does not have direct access to the direction of movement or information on body orientation. In order to facilitate steady movement in one direction without gyrating in a small radius, the center-of-mass velocity of the robot was continuously averaged over a certain time window, and its magnitude was used as the performance of the system. The performance signal $E$ at any time instance is calculated by applying a leaky integrator equation to the velocity vector and taking the magnitude of the result. The timescale of integration was set relative to $T$, the period of an oscillator.

The time course of the bifurcation parameter ($z_2 - z_1$) is governed by a time constant that determines the timescale of its change and is normally set faster than the oscillation period ($T$) of the controller. If its value is too high, stabilization of the system dynamics is significantly delayed, which results in a partition mismatch (Aida & Davis, 1994). If it is too low, the bifurcation parameter fluctuates too much according to the undulation of the robot movement, which acts as a disturbance for stabilization, or the system can become locked in a ring of undesirable patterns in a regime of intermediate chaoticity.

$G(x)$ implements a decreasing sigmoid function that maps monotonically from (0, 1) to (1, 0). The term $16x - 8$ shapes the sigmoid function so that the boundary value at $x = 1$ and its derivative become almost 0, so as to make the function smoothly vanish to zero. We automatically set $G(x) = 0$ when $x \geq 1$, since the bifurcation parameter should be zero in order to make the system completely stable. The desired locomotion performance, $E_d$, slowly decays toward the current performance, with a time constant set sufficiently large so that $E_d$ does not follow $E$ too fast. Since $E_d$ continuously decays toward $E$, the changing speed of the control parameter depends on both of these time constants. Since $G(x)$ decreases to zero asymptotically, the bifurcation parameter was set to zero when it fell below a small threshold (0.0001), which also allows some margin for the system to stay in the stable regime despite the small oscillation of $E/E_d$ near unity.

The bifurcation parameter varies between zero and an upper bound that represents the maximum level of chaoticity of the system. From the analysis of a single BVP oscillator, it is well known that it exhibits a Hopf bifurcation as the parameter $z$ increases (Nomura, Sato, Doi, Segundo, & Stiber, 1993). An analytically estimated critical value of $z_1$ for equations 3.1 and 3.2, without their coupling and input terms, is $z_1 = z_c = 0.38247$, which indicates that the maximum possible value of the bifurcation parameter is $z_2 - z_c = 0.73 - 0.38247 = 0.34753$. However, because the situation is different from the dynamics of a single oscillator, experiments on the robotic systems presented here revealed that the actual behavioral criticality varies slightly among different body and environmental settings (e.g., between the swimmer and the quadruped). One way to determine the system-specific criticality of the control parameter is to simply observe the dynamics of the system with a fixed bifurcation parameter. If the system is beyond its critical state, one of the oscillators in the motor unit will generate near-zero amplitude by crossing a Hopf bifurcation point. Normally we chose the upper bound to be slightly less than its maximum observed value, taking into consideration the saturating region of the sigmoidal function $G(x)$, so that the system does not stay near the critical value for an unnecessarily long time when the oscillation amplitude becomes small.
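The feedback loop described in this section can be sketched in a few lines. The logistic form of `G`, the first-order relaxation of the bifurcation parameter, the time constants, and the upper bound `mu_c = 0.3` are assumptions made for illustration; only the shaping term `16x - 8`, the clamp at `x >= 1`, the 0.0001 threshold, and the values of $z_2$ and $z_c$ come from the text.

```python
import math

Z2 = 0.73              # fixed control parameter from the text
Z_CRIT = 0.38247       # single-oscillator critical value from the text
MU_MAX = Z2 - Z_CRIT   # upper bound on the bifurcation parameter z2 - z1


def G(x):
    """Decreasing sigmoid on (0, 1), clamped to 0 once the goal is met."""
    if x >= 1.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(16.0 * x - 8.0))


def step_performance(v_avg, v, dt, tau_e):
    """Leaky integration of the 2D velocity; performance is the averaged speed."""
    vx = v_avg[0] + dt / tau_e * (v[0] - v_avg[0])
    vy = v_avg[1] + dt / tau_e * (v[1] - v_avg[1])
    return (vx, vy), math.hypot(vx, vy)


def averaged_speed(velocity_at, t_end=50.0, dt=0.01, tau_e=5.0):
    """Performance reached after following a velocity profile for t_end."""
    v_avg, e = (0.0, 0.0), 0.0
    for k in range(int(t_end / dt)):
        v_avg, e = step_performance(v_avg, velocity_at(k * dt), dt, tau_e)
    return e


def step_bifurcation(mu, ratio, dt, tau_mu, mu_c, floor=1e-4):
    """Relax mu toward mu_c * G(E / E_d); snap to 0 once it falls below floor."""
    new_mu = mu + dt / tau_mu * (mu_c * G(ratio) - mu)
    if new_mu < mu and new_mu < floor:
        return 0.0
    return new_mu


def settle_mu(ratio, mu0, steps=5000, dt=0.01, tau_mu=1.0, mu_c=0.3):
    """Hold the performance ratio fixed and let the bifurcation parameter settle."""
    mu = mu0
    for _ in range(steps):
        mu = step_bifurcation(mu, ratio, dt, tau_mu, mu_c)
    return mu


if __name__ == "__main__":
    straight = averaged_speed(lambda t: (1.0, 0.0))
    circling = averaged_speed(
        lambda t: (math.cos(2.0 * math.pi * t), math.sin(2.0 * math.pi * t)))
    print("performance, straight vs circling:", straight, circling)
    print("mu when far from goal:", settle_mu(0.2, 0.0))
    print("mu when goal is met:", settle_mu(1.0, 0.3))
```

With these assumed settings, unit-speed motion in a straight line yields a performance close to 1, whereas circling at the same speed yields a performance close to 0, which is why averaging the velocity vector penalizes gyration. A low performance ratio drives the bifurcation parameter toward its upper bound, and a ratio of 1 lets it decay and lock at exactly zero.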

Although this evaluation strategy does not explicitly impose a bias for continuously striving for higher-performing behaviors (because of the dynamics of $E_d$), an implicit bias toward better-performing behaviors is partially imposed on the system by the way in which the bifurcation parameter behaves as a function of $E/E_d$ (see equation 3.9). Once the system has been stabilized to some behavior, the speed of system destabilization, for a given amount of behavior degeneration, depends on the performance level of the initially stabilized behavior. In the quasi-periodic regime that occupies a large portion of the entire system dynamics (where the bifurcation parameter lies in the lower saturation part near zero and the middle part of the sigmoid function $G(x)$), the phase relationships of ongoing patterns shift slowly, while fast and catastrophic change occurs in the chaotic regime, where the bifurcation parameter is located around the upper saturation part of $G(x)$ (near its maximum). When the actual performance $E$ of a stabilized behavior decreases by a given amount, a low-performing behavior is destroyed more quickly because $E_d$ will be relatively small, while a high-performing behavior is smoothly degenerated, giving it much more of a chance of being sustained or reentrained to itself. In this way, in practice, the system fully stabilizes onto behaviors that exhibit stable, relatively high performance.

### 3.4. Learning of Emergent Patterns.

As the exploration process stabilizes the system by discovering a high-performing locomotor behavior, the synaptic connections between oscillators are dynamically wired using an adaptive synchronization learning scheme. We adapted a learning model developed by Doya and Yoshizawa (1992), which decomposes the problem of weight learning between oscillators into a collection of cellular-wise processes by adjusting the input connection weights (also called the phase-lock matrix) of individual neurons to maintain a given phase relationship between the cellular activity and incoming signals. This is available only when the phase relationship between the neuronal activity and input signals is presented in advance, which provides a suitable interface for our exploration system. The coupling strengths are continually adjusted to follow the emergent patterns in parallel with the exploration process until the system is stabilized by discovering a desired pattern. When a switching parameter (in equation 3.14, which is determined by the global bifurcation parameter) is triggered around the onset of system stabilization, the decrease of the learning rate of the phase-lock matrix and the activation of oscillator couplings simultaneously take effect. The learning rules are set up such that during the exploration phase, the couplings effectively remain functionally inactive. As dictated by equations 3.13 and 3.14, the coupling gain *g* is turned on when the bifurcation parameter goes to zero, which means learning is activated when the system is stabilized to some discovered pattern. Otherwise, the system is in an exploration phase and *g* is set to zero, which turns off the learning. Since the coupling is not strong and is activated gradually, highly unstable patterns that show short-lived high performance are naturally filtered by the instability of the pattern itself during the activation period (the system destabilizes and returns to the exploration phase).
Thus, exploration and learning are merged as a continuous dynamical process such that the desired locomotion pattern is spontaneously explored, discovered, and memorized in a coherent way.

We denote the state variables $x$ and $y$ (in equations 3.1 and 3.2) of oscillator $i$ as $x^{1}_{i}$ and $x^{2}_{i}$. Considering $M$ ($= 2N$, where $N$ is the number of degrees of freedom of the robot) fully connected oscillators, the coupling term $F$ for state $j$ (= 1, 2) of oscillator $i$ ($x^{j}_{i}$) is written in terms of a small feedback gain $g$ and the products $g p^{jl}_{ik}$, which represent the adaptive connection strength coupling from $x^{l}_{k}$ to $x^{j}_{i}$. The connection weights follow a covariance-like learning rule that uses the continuous running average of $x$, calculated with a fixed time constant. The full derivation of the learning rule can be found in section A.1.

During the exploration process, the feedback gain $g$ and the weight learning rate are adaptively adjusted according to the global control parameter so that the couplings between oscillators are gradually activated around the onset of system stabilization. Both are controlled according to equations 3.13 and 3.14, which involve constants and $D(x)$, the Heaviside function with a very small parameter. As the incoming weights are learned in order to match the sum of afferent signals to the oscillator's own signal, it is sufficient to use a gain set by the input weights in equations 3.1 and 3.2, which has a similar intensity to the sensory input. The time constant was set to have the same timescale as the evaluator. The switching parameter is the smooth activation signal that controls both the learning rate of the connection weights and the feedback gain according to the value of the bifurcation parameter. This signal gradually activates the functionally connected network rather than suddenly switching it on, thus preventing the destruction of stable patterns while allowing unstable ones to be filtered out.
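The gradual activation described in this section can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce equations 3.13 and 3.14: it assumes that the activation signal is a first-order low-pass filter of a Heaviside step on the bifurcation parameter, that the feedback gain grows in proportion to that signal, and that the weight learning rate shrinks with it. The names `mu`, `s`, `tau_s`, `eps`, `g0`, and `eta0`, and all numerical values, are hypothetical.

```python
def heaviside_below(mu, eps=1e-3):
    """Return 1 when the bifurcation parameter is (almost) zero, else 0 (assumed form)."""
    return 1.0 if mu < eps else 0.0


def step_gate(s, mu, dt, tau_s=5.0, eps=1e-3):
    """One Euler step of the assumed low-pass filter tau_s * ds/dt = -s + D(mu)."""
    return s + dt * (heaviside_below(mu, eps) - s) / tau_s


def gains(s, g0=0.1, eta0=0.5):
    """Assumed mapping: coupling gain rises with s, learning rate falls with s."""
    return g0 * s, eta0 * (1.0 - s)


def simulate(mu_trace, dt=0.01):
    """Run the gate over a recorded trace of the bifurcation parameter."""
    s = 0.0
    history = []
    for mu in mu_trace:
        s = step_gate(s, mu, dt)
        g, eta = gains(s)
        history.append((s, g, eta))
    return history


if __name__ == "__main__":
    # Exploration (mu > 0) for 10 time units, then stabilization (mu = 0) for 30.
    trace = [0.5] * 1000 + [0.0] * 3000
    s, g, eta = simulate(trace)[-1]
    print(f"s={s:.4f}  g={g:.4f}  eta={eta:.4f}")
```

Under these assumptions the couplings stay off while the system explores and ramp up smoothly once the bifurcation parameter settles at zero, so a pattern that destabilizes during the ramp simply resets the gate.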

## 4. Experiments with Simulated Robots

Detailed experiments with the framework described above used the two simulated robots shown in Figure 3: a four-armed aquatic swimmer and a quadruped.^{1} Initial experiments used the swimmer, which has four fins, each at the end of a separate arm, placed in a simulated hydrodynamic planar (2D) environment. Since the information transfer between CPGs is mediated by sensory information, the information structure provided by physical embodiment is considerably influenced by the design and choice of sensory systems. While it is possible to use composite sensory information from multiple sensors (e.g., a combination of the input from fin sensors and muscle receptors), for simplicity we use only a single fin angle sensor for a motor unit. This requires a slightly modified sensor input term in the CPG equations in order to make the pair of CPGs in a motor unit deal with a single sensor (see equations A.11 and A.12). The functional structure of coupling between motor units through embodiment is formed by the transmission of hydraulic reaction forces from one arm to the others as the body articulates. The robot's radially symmetric shape in a 2D underwater environment is interesting because it makes generating continuous asymmetric propulsion forces challenging: forward locomotion is nontrivial. The robot will not be able to move in a single direction unless the movements of all four arms are successfully coordinated with appropriate phase differences.

### 4.1. Exploration of Stable Patterns Without Oscillator Learning.

First, we fixed the bifurcation control parameter to the stable regime (no chaotic search) and ran the 4-fin swimmer simulation to see what kinds of behaviors emerged from various initial states. More than 1000 simulation runs were performed in order to observe and categorize the behaviors. Basic movement behaviors of the swimmer were categorized into motion in four directions (along the body axes dir1, dir2, dir3, and dir4, as shown in Figure 3), which met expectations given the symmetric shape of the swimmer.

Taking the directional symmetry into account, we observed six different behaviors and classified them according to the locomotion performance, as shown in Table 1; their phase relationships are shown in Figure 4. The forward locomotion involves straight movements (ST), moving in circles (STC), and peg-leg (PL) motions. ST locomotion is a frog-like swimming action that has the highest performance (see Figure 8), and STC motion moves in a circle due to a slight asymmetry between contralateral arms caused by passive fin dynamics and can be either clockwise or counterclockwise. PL motions involve one of the arms moving with a small amplitude while the other three arms all use the same large amplitude. The phase relationship of the PL pattern is essentially similar to that of bound antiphase, except that the amplitude of one arm is smaller than the others and its phase continuously shifts (with a small irregularity) compared to the others, which achieves a slow forward locomotion by asymmetric propulsion forces.

| Pattern | Number of Variations | Average *E* |
|---|---|---|
| 1. Straight (ST) | 4 (each dir) | 0.7 |
| 2. Circular (STC) | 8 (4×(CW,CCW)) | 0.6 |
| 3. Rotate (R) | 2 (CW,CCW) | 0.06 |
| 4. Peg-leg (PL) | 4 (each arm) | 0.04 |
| 5. Vibration (VB) | 2 (dir 1-3 and 2-4) | 0.03 |
| 6. Bound antiphase (BA) | 1 | 0.0 |

Nonlocomotion movements were also observed, such as bound antiphase (BA), vibration (VB), and rotation (R). BA motion results in no net movement of the robot torso due to antiphase locking between adjacent pairs of arms. VB arm movements are contralaterally antiphase and ipsilaterally in-phase based on the vibrating axis. The movements of the arms in the rotation motion are out of phase with each other and fluctuate irregularly. The fluctuation and shifting of phase relationships suggest that an emergent behavior does not necessarily exhibit concrete phase locking between subsystems in the neuro-body-environment setting.

If a robot behavior was observed to be permanently sustained, it was identified as an individual behavior. The number of completely stable behaviors in the absence of oscillator learning was determined to be six, without counting their variations. The shape of the 4-fin swimmer robot is radially symmetric, so different synchronized pairs of joints (variations) can exist for a single behavior. For example, the straight swimming behavior has four different combinations of synchronized joint pairs, all of which show the same frog-like swimming behavior. As shown in Table 1, there are 21 different arm coordinations when all variations are included. Careful viewing reveals that the circling movement (STC) can show slightly different circling radii resulting from small differences in passive fin tilting, but these are too small to be considered separate distinguishable behaviors. To keep the analysis clear, these kinds of variations are not counted as different behaviors.
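The total of 21 arm coordinations follows directly from the variation counts listed in Table 1. The short check below only restates that arithmetic; the dictionary name `variations` is introduced here for illustration.

```python
# Number of variations per behavior, copied from Table 1.
variations = {
    "ST": 4,   # straight, one per direction
    "STC": 8,  # circular, 4 directions x (CW, CCW)
    "R": 2,    # rotate, CW and CCW
    "PL": 4,   # peg-leg, one per arm
    "VB": 2,   # vibration, along dir 1-3 and dir 2-4
    "BA": 1,   # bound antiphase
}

total = sum(variations.values())
print(len(variations), total)  # prints "6 21"
```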

Note that the PL patterns appear as a stable pattern only when sensory homeostasis is present. Sensor adaptation makes the lame arm synchronize with the corresponding motor unit with a small amplitude, resulting in the partial loss of the phase correlation with the other arms as it transfers the inertial or hydrodynamic forces less strongly to them through physical embodiment. Again, the motion of the other three arms is coordinated in such a way that the net forces are transferred at a reduced rate to the lame arm. Therefore, the homeostatic regulation of sensory signal results in an opposing effect, which leads to the diversification of limb motion, that is, the multiple combinations of the amplitudes and offsets of limb motions can be explored and stabilized by sending the standardized sensory input signals to the neural controller (see Figure 9). In turn, different limb-wise oscillations may cause different interlimb coordination as well.

The stable dynamics of the system begin to fluctuate as the bifurcation parameter increases, exhibiting a series of transient dynamics from quasiperiodicity to chaos (see Figure 5). In the higher chaotic regime, complex transitory dynamics similar to chaotic itinerancy occur, which drive the system to briskly explore the phase space. To see the effect of chaotic search, the distribution of visits to each of the behaviors identified in Table 1 was investigated in the presence and absence of chaotic search. One hundred simulations were performed for each case, and the visiting counts of the six major behaviors were recorded by observation. Figure 6 shows a clear difference between the visiting ratios in the two cases, suggesting the effectiveness of chaotic search, which tended to settle on high-performing, dynamically stable locomotion. During the search process, all variables and control parameters vary continuously as parts of the neuro-body-environment system, and the time evolution plots of phase differences, performances, and bifurcation parameter (see Figures 7A and 7B) show that the stabilization and destabilization of the system occur repeatedly in a trial-and-error manner until it settles on an effective form of locomotion. The sensor parameters (see Figures 7C and 7D) also change continuously and settle to different values through adaptation.

Due to the symmetric shape of the 4-fin swimmer, the BA motion has inherent dynamic stability with large basins in the phase space, so the system was often entrained in the BA pattern and sometimes took a relatively long time to reach one of the desired states. This deficiency, the so-called *deep-path* (Shim & Husbands, 2010), occurs when an orbit that tries to escape from BA by system destabilization is reinjected to BA, so the actual performance $E$ stays low and the desired performance $E_d$ decays close to $E$. This makes the time spent in the chaotic regime shorter, resulting in reduced exploration and increased time to escape. The escape orbit is often stabilized to PL patterns, which indicates that these patterns are located in the vicinity of BA in the phase space. However, the use of an adaptive $E_d$ (see equation 3.10), sensory adaptation, and oscillator learning have all helped to significantly alleviate this issue. Figure 10 shows an example of the exploration time taken for stabilization of the systems with and without adaptation. The fixed sensor gain of the nonadaptive system was chosen to produce a similar behavior category to the adaptive case. While the adaptive system was generally stabilized within 1000 cycles, a number of runs of the nonadaptive system took up to 10 times as long to stabilize. The nonadaptive system also exhibited bad-lock (Shim & Husbands, 2010) onto nonlocomotion patterns (rotation and vibration), where the bifurcation parameter does not reach zero but oscillates near zero, phase-locked with other system variables.

### 4.2. Stabilizing Transient Patterns by Oscillator Learning.

Often there are high-performing locomotion patterns that are not completely stable and appear for only a while during the exploration process. These transient target behaviors can be captured and memorized by the oscillator learning process. We tested this using a “damaged” version of the swimmer robot by reducing the length of one of its fins (damaged fin) or removing one of its arms (three-armed), such that there are few or no stable patterns in the stable regime but there exist a series of useful transient patterns.

Figure 11 shows the exploration and learning of the robot with a damaged fin, where the length of the fin on arm 4 was reduced by 90%. It had only one stable pattern, whose phase relationship is the same as that of the BA pattern in the undamaged robot and which has almost zero performance. With learning, it captured one of the high-performing transient patterns after a few trials. The approximate direction of locomotion is toward dir-3. Figures 11C and 11D show that the sensor gain (*A*(*t*)) of the damaged fin (fin 4) was increased to amplify its signal, and the fact that fin 1 has the smallest gain tells us that arm 1 is the main source of propulsion. The salient deviation of the offset (*B*(*t*)) of the fin 1 sensor (opposite side of fins 2 and 3) indicates that the discovered transient pattern involves the oscillation of fin 1 in a tilted position, granted by its mechanical compliance; consequently it compensated for the asymmetric hydrodynamic forces and achieved forward locomotion. The homeostatic sensory regulation participates in the exploration process: the slow variables diversify the course of transient patterns during search and slow them down at the onset of discovery, which is beneficial to real-time pattern capture by oscillator learning. While the swimmer robot has shown a relatively limited variety of patterns due to its strong embodied coupling resulting from the densely structured physical environment it inhabits (the robot is always surrounded by liquid and hence is continually subjected to significant hydrodynamic forces), we will see later that the effect of sensory regulation on terrestrial movements becomes more prominent.

Figure 12 shows a particular case of an alternative three-armed robot (formed by removing arm 4) where two different locomotion patterns are periodically exchanged without losing the overall stability of the whole behavior. The robot alternates its moving direction between dir-3 and dir-4 by exchanging two unstable undulating motions. The periodicity of this conjoined behavior also exhibits a small degree of irregular fluctuation, as in the case of the loosely coordinated behaviors previously shown in Figure 4. However, being captured by oscillator coupling, it is sustained by global coordination between subsystems that include adaptive sensor dynamics.

Since the oscillator learning process is automatically regulated by a control parameter, it is possible to operate the exploration-learning system continually without reset. Figure 13 shows a typical successful example of the real-time recovery of locomotion behavior after body damage of an unknown variety, that is, with no a priori knowledge. During an initially learned stable behavior (similar to STC-dir3-CCW), the same damage as in Figure 11 was sustained. The performance of the robot immediately dropped below $E_d$, and the system entered the search phase. After a few hundred cycles, the system found a new locomotion behavior for the changed body (an undulating movement similar to that in Figure 11). The superposed graphs of the two behaviors (see Figures 13E and 13F) show a slight frequency increase in arm movements after recovery due to the change in the mechanical impedance of the robot.

### 4.3. Quadruped Locomotion.

We demonstrate the generality of the approach by also applying it to a quadruped robot in a 3D terrestrial environment. The stretching force (see equation A.13) experienced by a torsional muscle was used as the sensory signal and fed to the CPG in the relevant motor unit. Under conditions where static stability against gravitational force is guaranteed in both the 2D swimmer and 3D quadruped, the walking machine has fewer behavioral constraints for producing forward locomotion since the resistance force is not always present in the 3D terrestrial environment (e.g., there is no friction on a leg as it moves through the air during a swing phase). The neural-body-environmental phase space of the quadruped can be envisaged as an undulating landscape of rolling hills, while the 2D swimmer case has a few deep basins of attraction. While this increased the number of candidate patterns for forward locomotion in the quadruped, there existed latent instabilities such as slipping due to dynamic friction or the spontaneous occurrence of sharp-amplitude, high-frequency perturbation stemming from the ground contact, all of which caused a slow degeneration of the ongoing locomotor pattern. In practice, the movement patterns of the quadruped observed in the stable regime of the oscillator system exhibited no ultimately permanently sustained behavior (also true in tests on other walking robots). Interestingly, locomotor patterns similar to the quadruped walking gait frequently emerged during exploration (see Figure 14). Other kinds of as-it-could-be gait patterns and their variations that exploit given active and passive dynamics were also observed, which are difficult to categorize qualitatively.

The degeneracy of locomotor behavior could be greatly improved by using homeostatic sensory adaptation and then completely stabilized by oscillator learning. Figures 15A and 15B show the system behaviors for quadrupeds with and without sensor adaptation. All experiments were started from the same initial condition. In Figure 15A, the sensor adaptation was turned off when the system was stabilized to the first discovered pattern. The performance of the emergent pattern in the adaptive system (see Figure 15B) degenerates much more slowly than in the nonadaptive case. Sensor adaptation prevented abrupt changes in phase relationships by buffering sudden changes of incoming sensor signals, so the initial movement pattern slowly changed, giving it a greater probability of being maintained. The patterns could be completely stabilized by introducing oscillator learning (see Figure 15C). However, if oscillator learning was presented without sensory adaptation (see Figure 15D), the pattern could not be sustained completely because the oscillator coupling was not strong enough to maintain the coordinated pattern against the degeneracy. As a result, the role of homeostatic sensory adaptation becomes more prominent in the case of terrestrial behaviors. The experiments with the 2D swimmer have shown little variance of sensor parameters after convergence, and pattern degeneracy was hardly observed, which indicates that the transient patterns of the swimmer are strongly attracted to a small number of stable patterns. The adaptation of the sensor parameters of the quadruped yielded more diverse values, where the offset parameter (*B*(*t*)) of lower leg muscles (leg 5–8) typically showed notable deviation under the effect of constant body weight. 
In a few cases, the speed of degeneracy under the control of the oscillators after adaptation is so slow that the locomotor behavior, which appears stable, is eventually destroyed after a very long period of simulation, which triggers a period of readaptation. This can also appear in the form of a long-term behavioral periodicity (see Figure 16).

## 5. Summary and Discussion

We have presented an integrated system that can explore and learn the emergent behaviors of a neuro-body-environment system coupled through physical embodiment by applying a novel chaotic search method. The whole system is treated as a single high-dimensional dynamical system using intrinsic chaotic dynamics as a driving force for the exploration of its own emergent patterns. The search process is completely deterministic and is able to selectively entrain the system orbit to one of the patterns by imposing goal directedness toward a desired behavior. Adaptive calibration of incoming sensor signals was established by using homeostatic sensory regulation. By adjusting the waveforms of input signals to be close to those of the neural activities, the synchronicity between the neural and physical system was enhanced, and the neural system was able to cope with an arbitrary robotic system. The regulation in the input system resulted in the diversification of output behaviors in which the same neurosensory coordination could be achieved by different limb movements, accomplishing multiscale exploration. The discovered rhythmic pattern is memorized and sustained by wiring initially disconnected oscillators using an adaptive synchronization method. The oscillator learning process was naturally merged with the exploration system by using the emergent pattern as a supervising signal and could capture both stable and transient locomotor patterns in real time.

The overall process from the perspective of creating a new behavior can be briefly sketched as follows. The mutual entrainment between the neural and physical systems initially creates a phase space that contains several stable and transient patterns. If the current entrained state is not satisfactory, the system bifurcates to a chaotic state in order to escape from that state and restabilizes when a desired pattern appears. However, the phase space of the restabilized system differs from the previous one because some of the system parameters (sensor parameters) have also been changed by the chaotic drive. If we define the onset of stabilization (the time at which the bifurcation parameter becomes 0) as the time of returning, then whenever the state orbit returns to the target space, it never experiences exactly the same phase space as before. This process is what we call multiscale exploration, and its eventual behavior after the onset of stabilization varies over different physical embodiments. The final dynamics of sensor adaptation after returning involves each parameter being locked around a particular value (potentially different for each parameter) with small oscillations. This diversity of parameter convergence can be regarded as the neutral stability of the system since different motor movements can cause the same sensory input. For the case of the 2D swimmer, which has a small number of strong basins of attraction, the sensor parameters tend to converge to one of the previous distributions, although their precise values may differ. The neutrality in the convergence of sensor parameters has a wider range in the case of the quadruped; hence more diverse stabilized behaviors are exhibited. Even in the case where the sensor parameters eventually converge to the same set of distributions, the intermediate trajectories before convergence can take various routes, which can be captured by oscillator learning, resulting in the creation of a new behavior. Therefore, this process differs from a simple action selection mechanism where predetermined stable patterns are selected by a chaotic jump. Rather, it creates various streams of transient patterns by driving both the state orbit and the system parameters using chaotic dynamics.

Although our system has demonstrated a good degree of generality and an ability to automatically adapt to unknown bodies and environments, further analysis is necessary in order to determine the optimum values of the fixed parameters used in the search process. For example, the timescales of slow dynamics such as evaluation, goal seeking, sensor adaptation, and feedback bifurcation affect the search dynamics. Preliminary results of investigating the effect of different timescales revealed that the ratio between the timescales for evaluation, goal seeking, and feedback bifurcation determines the balance between the memorizing and forgetting of patterns during the search process (Aida & Davis, 1994), implying there might be an optimal ratio that allows the system to stay in the chaotic regime for an optimal duration (just enough to be uncorrelated with the previously visited pattern), enabling fast search with a very small probability of being trapped in a bad state for a long time. The timescale of the sensor adaptation can influence the landscape of phase space as well as the neutrality of convergence. A test using the 2D swimmer showed that when the sensor adaptation timescale was halved, a new stable pattern appeared in which two arms moved with large amplitudes, whereas the movements of the other two were small and irregular. However, too fast a timescale caused large fluctuations in parameters, which disturbed stabilization or diminished the diversity of behaviors synchronized with the fast state dynamics. Another factor that influences the system is the bandwidth of the information flow between neural elements mediated by physical embodiment, which is determined by the design of body-environment interactions. In the case of the 4-fin swimmer presented here, the functional coupling strength between motor units varies with body mass. Increased body mass will result in an increased moment of inertia, which causes less transmission of the hydraulic force from one arm to the others, and vice versa. A similar effect will be caused by decreasing the density of the surrounding fluid or by increasing fin stiffness.

As Kuniyoshi and Sangawa (2006) have stated, completely decoupled CPGs are an extreme model, which might deviate from biological reality. However, some biological studies point out the evidence for functional decoupling of the neural system during certain phases or behaviors. It has been hypothesized that decoupling of locomotor CPGs (as in our system) serves as a potential mechanism for the evolution of novel behaviors (Dubbeldam, 2001). Motion analysis of *Siren lacertina*, an eel-like amphibian (Azizi & Horton, 2004) has found strong evidence that the axial and appendicular CPGs are decoupled during aquatic walking (a pattern somewhere between aquatic and terrestrial locomotion), which supports the hypothesis that the decoupling of CPGs has led to the evolution of this novel behavior. In a broader perspective, Rosslenbroich (2009) pointed out that the locomotor neural processes of more evolved vertebrates are uncoupled from one another so that these parts can act in more differentiated and partly independent ways, which may contribute to the increase in organismic autonomy necessary for evolutionary innovation.

These emergent patterns may be refined and selected at the supraspinal level by reward-based reinforcement, which is thought to be one of the primary functions of the basal ganglia (BG) (Redgrave, Prescott, & Gurney, 1999; Schultz, 2006; Chakravarthy, Joseph, & Bapi, 2010). Recent modeling studies on BG (Sridharan, Prashanth, & Chakravarthy, 2006; Magdoom et al., 2011) hypothesize that the indirect striato-pallidal pathway through the subthalamic nucleus subserves exploratory behavior for goal-directed learning, gated by the dopamine signal from the substantia nigra which serves as the global learning signal for reward prediction. We hypothesize that goal-directed chaotic exploration may possibly take a role in such mechanisms in connection with self-organized behaviors. In this context, it might be possible to use our system to draw some implications about optimal parameters in relation to metalearning and neuromodulation centered around the BG (Doya, 2002).

Recent work has demonstrated the efficacy of morphological change within the context of locomotion behaviors created through an evolutionary search process (Bongard, 2011). An interesting area of future research will be to investigate whether the advantages of growth and development demonstrated in that work carry over to our method.

The system has also been successfully tested on other kinds of robots using identical neural controllers with the quadruped, further demonstrating its generality (see the movies presented in the URL in footnote 1). Although the final movement patterns produced by our work are never poor, they are not always perfectly optimized. Future work will explore the use of slightly more complex evaluation signals in this context. Also, we intend to incorporate adaptation to external perturbations, such as dealing with nonstationary environments. This might be achieved by using another adaptive system on top of the learned locomotion controller, or it may well be possible to develop such behavior within a slightly extended version of the current system. More intelligent and complex locomotion behavior could be achieved by using conventional learning methods or fuzzy control in conjunction with the concepts encapsulated in our system. The novel neuro-robotic system presented in the letter has been shown to be general and effective. The seamless interaction between the exploration and learning processes results in a system that can be thought of as continually self-monitoring in order to maintain an appropriate level of motor function. As well as being an effective means of developing robotic controllers, the method has more general implications for truly autonomous artificial systems, which must maintain their integrity on several levels, including behavioral. The work demonstrates the possibility of the spontaneous emergence of meaningful behaviors in a continuous dynamical system framework, an approach that deviates from conventional learning algorithms making use of repeated trials.

## Appendix

### A.1 Oscillator Learning.

Consider $M$ oscillators that are fully connected to each other. We denote the state $j$ of oscillator $i$ as $x^{j}_{i}$ and write a compact expression for equations 3.1 and 3.2 with the coupling term $F$ (equation A.1), in which the two states of each oscillator are collected into a state vector. The sensory input term was regarded as part of the oscillator dynamics to promote sensory influence in the global coordination of the learned oscillator network. Assuming that the oscillators produce sinusoidal waveforms, the phase-locked solution of the state vector of oscillator $i$ and those of the other oscillators can be expressed as a linear relationship (equation A.2) through a 2×2 phase-lock matrix for each pair of oscillators $i$ and $k$. Suppose we already have a certain phase relationship between oscillator $i$ and the other oscillators during the exploration process; then we can drive the state of oscillator $i$ to satisfy the equality in equation A.2 by applying a simple error feedback to the oscillator, using the gradient of an objective function $E_{i}$. Here $p^{jl}_{ik}$ represents the ($j$, $l$)th element of the phase-lock matrix, and $g$ is a feedback gain, which should be set small enough so that the ongoing oscillation is not distorted. Thus, we can rewrite equation A.1 by neglecting the small decay term $g x^{j}_{i}$ in equation A.4. We can see that the feedback term represents the coupling term from other oscillators in that $g p^{jl}_{ik}$ is the coupling connection strength from $x^{l}_{k}$ to $x^{j}_{i}$. The coupling matrix can be obtained using the same gradient descent learning with regard to $p^{jl}_{ik}$. In order to eliminate any bias effect, the deviation of the signal from its temporal average was used for learning, with an adaptive learning rate.
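The role of the phase-lock matrix can be illustrated with a small numerical sketch. It is not the authors' code and does not reproduce equations A.1 to A.5; it assumes two ideal unit-amplitude sinusoidal oscillators with a fixed phase offset `phi`, a quadratic objective (half the squared mismatch between the state of oscillator $i$ and the weighted state of oscillator $k$), a constant learning rate `eta`, and simple Euler updates with step `dt`. The running-average subtraction is omitted because these idealized signals already have zero mean. All function and parameter names are hypothetical.

```python
import math


def rotation(phi):
    """2x2 rotation matrix that advances a sinusoidal state by phase phi."""
    return [[math.cos(phi), -math.sin(phi)],
            [math.sin(phi), math.cos(phi)]]


def learn_phase_lock(phi, eta=0.5, dt=0.01, steps=20000):
    """Gradient descent on 0.5 * |x_i - P x_k|^2 with respect to the entries of P."""
    p = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(steps):
        t = n * dt
        xk = [math.cos(t), math.sin(t)]              # state of source oscillator k
        xi = [math.cos(t + phi), math.sin(t + phi)]  # state of target oscillator i
        for j in range(2):
            # Mismatch between state j of oscillator i and its weighted input.
            err = xi[j] - (p[j][0] * xk[0] + p[j][1] * xk[1])
            for l in range(2):
                p[j][l] += dt * eta * err * xk[l]
    return p


if __name__ == "__main__":
    learned = learn_phase_lock(math.pi / 2)
    for row in learned:
        print([round(v, 4) for v in row])
```

Under these assumptions the learned matrix approaches the rotation matrix for the imposed phase offset, which is one way to see why a 2×2 matrix per oscillator pair is enough to encode a phase relationship between sinusoidal states.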

### A.2 Robot Simulation.

The robot simulations were implemented using open dynamics engine (Smith, 1998). The CPG and other differential equations were integrated using the Runge-Kutta (4th order) method with a step size of 0.0025 sec (the ODE simulation used the same step size). All code was written in .
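For readers unfamiliar with the integrator, the fourth-order Runge-Kutta scheme mentioned above can be sketched as follows. This is a generic textbook implementation, not the authors' code; the test system (a harmonic oscillator standing in for the CPG equations) and the function names are assumptions made for illustration. Only the step size of 0.0025 s comes from the text.

```python
import math


def rk4_step(f, t, y, dt):
    """Advance the state y by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def harmonic(t, y):
    """Stand-in dynamics: dx/dt = v, dv/dt = -x."""
    return [y[1], -y[0]]


def integrate(f, y0, t_end, dt=0.0025):
    """Integrate from t = 0 to t_end with a fixed step size."""
    y = list(y0)
    steps = round(t_end / dt)
    for n in range(steps):
        y = rk4_step(f, n * dt, y, dt)
    return y


if __name__ == "__main__":
    x, v = integrate(harmonic, [1.0, 0.0], 1.0)
    print(x - math.cos(1.0), v + math.sin(1.0))  # both differences are tiny
```

Using the same fixed step for the neural equations and the physics engine, as the text describes, keeps the two parts of the simulation aligned in time without interpolation.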

| 4-Fin Swimmer | | Quadruped | |
|---|---|---|---|
| Torso dimension (m) | 0.2×0.2×0.2 | Torso dimension (m) | R: 0.05, L: 0.9 |
| Arm dimension (m) | 0.075×0.075×0.15 | Leg dimension (m) | R: 0.05, L: 0.3 |
| Torso weight (kg) | 1.6 | Torso weight (kg) | 7.6 |
| Arm weight (kg) | 0.34 (×4) | Leg weight (kg) | 1.44 (×8) |
| Joint range (rad) | 0.25 | Joint range (rad) | Upper: 0.15 |
| Fin dimension (m) | 0.2×0.2 | | Lower: 0.1 |
| Fin weight (kg) | 0.375 | Friction coefficient | 1.0 |
| Fin stiffness (Nm) | 0.1 | Muscle parameters | |
| Fin damping (Nms) | 0.045 | (Nm) | 7.935 |
| Fluid density (kg/m³) | 1000.0 | (Nm) | 1.684 |
| Muscle parameters | | (Nm) | 20.0 |
| (Nm) | 1.076 | (Nms) | 1.156 |
| (Nm) | 0.108 | | |
| (Nm) | 20.0 | | |
| (Nms) | 0.152 | | |

$x$) and a simplified muscle stretch reflex ($s_m$) according to canonical formulas based on the literature (Prochazka, 1999; Yakovenko, Gritsenko, & Prochazka, 2004). In these formulas, $k_m = 0.1$ is a constant, and the denominator normalizes the angle and the angular velocity of the torsional muscle by the unit of its resting angle; it is set to the maximum available joint angle by assuming that the torsional muscle is stretched to twice its resting angle when the joint is at its neutral position. Although several types of proprioceptive feedback mechanisms, including groups Ia, Ib, II, and cutaneous afferents, operate on the spinal reflex system, and their collective interaction accounts for the regulation of ongoing locomotor activities (Grillner & Wallen, 1985; Hiebert & Pearson, 1999; Pearson, Ekeberg, & Büschges, 2006; Rossignol, Dubuc, & Gossard, 2006), a minimal model of a basic reflex loop is sufficient to support the mechanical stability of the muscles, since the group Ia pathway is the most sensitive of all. From the viewpoint of the global system, even the muscular-motoneuronal reflex loop can be broadly considered part of the intact anatomical properties that may vary across different robotic designs and that should be covered by the exploration process.

#### A.2.1 4-Fin Swimmer

$x$-$y$ plane, so that it effectively undergoes 2D dynamics. Each fin of the 2D swimmer was modeled as a nonlinear damped torsional spring, which is subject to simulated hydrodynamics (Shim & Kim, 2006), and its bending angle was fed to the corresponding motor unit. The fin angle implements the stretch receptor at each side of the fin, so the afferent inputs $s$ in equations 3.1 to 3.4 were defined in terms of this angle. By assuming that a fin sensor reflects the output difference of the oscillator pair in the corresponding motor unit (i.e., $s_{l,r} = f(x_{r,l} - x_{l,r})$), we use slightly reformulated CPG equations for the 4-fin swimmer. Thus, the reference neural signal for sensory adaptation in equation 3.6 should also be changed to $n_l = -n_r = x_r - x_l$. The time constant and the maximum bifurcation parameter used were and . All other parameters are as defined in equations 3.1 to 3.4.

#### A.2.2 Quadruped

### A.3 Simulation Parameters.

The detailed parameters for the robots and the physical simulation are listed in Table 2.

## Acknowledgments

This work was funded by a departmental scholarship (Graduate Teaching Assistantship) and an ORSAS (Overseas Research Students Awards Scheme) award. Thanks to two anonymous reviewers for helpful comments on an earlier version of this letter.

## Note

^{1} Flash streaming (FLV) files as well as downloadable AVI files of the movies in this work are available online at http://www.informatics.sussex.ac.uk/research/groups/ccnr/movies/yssmovie.html. Videos 1–10 show the behaviors of the 4-fin swimmer and the quadruped, and videos 11–15 show other kinds of robots that use controllers identical to the quadruped's except for the number of motor units (video 14).

## References

*Siren lacertina*