Abstract

Gain tuning is a crucial part of controller design and depends not only on an accurate understanding of the system in question, but also on the designer's ability to predict what disturbances and other perturbations the system will encounter throughout its operation. This letter presents ANUBIS (artificial neuromodulation using a Bayesian inference system), a novel biologically inspired technique for automatically tuning controller parameters in real time. ANUBIS is based on the Bayesian brain concept and modifies it by incorporating a model of the neuromodulatory system comprising four artificial neuromodulators. It has been applied to the controller of EchinoBot, a prototype walking rover for Martian exploration. ANUBIS has been implemented at three levels of the controller: gait generation, foot trajectory planning using Bézier curves, and foot trajectory tracking using a terminal sliding mode controller. We compare the results to a similar system that has been tuned using a multilayer perceptron (MLP). The use of Bayesian inference means that the system retains mathematical interpretability, unlike other intelligent tuning techniques, which use neural networks, fuzzy logic, or evolutionary algorithms. The simulation results show that ANUBIS provides significant improvements in efficiency and adaptability of the three controller components; it allows the robot to react to obstacles and uncertainties faster than the system tuned with the MLP, while maintaining stability and accuracy. As well as advancing rover autonomy, ANUBIS could also be applied to other situations where operating conditions are likely to change or cannot be accurately modeled in advance, such as process control. In addition, it demonstrates one way in which neuromodulation could fit into the Bayesian brain framework.

1.  Introduction

As greater reliance is placed on automated control in tasks such as navigation, scheduling, and system monitoring, it becomes more crucial that computer systems make decisions based on uncertain or novel data and that the decisions be made quickly and accurately enough to maintain nominal operation of the system. However, the design of conventional controllers relies on the engineer having an accurate model of the characteristics of both the system and the environment it is likely to be operating in; uncertainties or novelty are typically taken into account by adding variables to the model, which can have the unwanted effect of reducing accuracy, or by updating the model online, which increases computational load. In contrast to these controllers, biological systems are extremely successful at reacting to novel situations, especially when danger is present. The capacity of organisms to act on uncertain or novel inputs and learn from past experience has inspired the category of intelligent controllers, which includes neural networks, fuzzy logic, and evolutionary algorithms. While extremely successful at working with uncertainty or novelty, they are often impossible to analyze mathematically, making them unpopular in industry. More recently research has combined these biologically inspired techniques with more conventional control. This approach retains the interpretability of conventional control while using computational intelligence techniques to adaptively tune parameters.

One extreme example of a case where prior knowledge about the environment that the system will be working in is limited, and no physical human interaction with the system will be possible once the mission starts, is that of a walking robot for planetary exploration. In this case, it is desirable that the requirement for human interaction with the rover be kept to a minimum (for example, being restricted to high-level planning tasks such as identifying objects of interest for study), and the controller itself should handle responses to both external perturbations such as obstacles and internal perturbations such as parametric uncertainties. This letter introduces ANUBIS (artificial neuromodulation using a Bayesian inference system), a novel tuning mechanism for controller parameters, intended for applications where robustness and adaptability are paramount. ANUBIS has been applied to the three modules of the walking controller for EchinoBot, a hexapod rover for Martian exploration: gait generation, plotting a trajectory for each foot, and tracking that trajectory. The system has been simulated in three different operating environments, with results displaying both improved adaptability and reduced power and torque requirements. Unlike parameter optimization techniques such as genetic algorithms or reinforcement learning algorithms, convergence is guaranteed without requiring stochasticity due to the use of the variational Bayes (VB) algorithm (Beal, 2003), making the algorithm more predictable and reliable. Additionally, since the variables of interest are modeled as probability distributions, any inherent stochasticity (e.g., sensor noise) can be modeled and taken into account.
The performance of ANUBIS is compared to a four-layer multilayer perceptron (MLP), which tunes the controller parameters (Passino, 2004); this tuning technique was selected as a comparison since a number of researchers have used neural networks for parameter tuning in robots (Ertugrul & Kaynak, 2000; Pardo, Angulo, Moral, & Catal, 2009; Parsa, Daniali, & Ghaderi, 2010) and since both neural networks and the artificial neuromodulatory system attempt to model the function of the brain. Other AI techniques have been used to tune SMCs, in particular, evolutionary algorithms and fuzzy logic; however, evolutionary algorithms tend to be too slow to be useful for real-time tuning, and fuzzy logic is not as well suited to optimization or learning as neural networks (Yu & Kaynak, 2009).

ANUBIS also attempts to answer two questions posed in Friston (2009): “What is the computational role of neuromodulation?” and “Does avoiding surprise suppress salient information?” In ANUBIS, neuromodulation changes what the system's perception of the true state is, encoding such values as saliency of stimuli and valence. In turn, the process of inference updates the uncertainty in sensed data, which can have an effect on neuromodulator levels.

This letter has six sections. Section 2 discusses the background of the project, covering the Bayesian brain and current understanding of neuromodulation. Section 3 introduces ANUBIS—the update rules for the artificial neuromodulators and the modified VB (variational Bayes) algorithm. Section 4 gives an overview of the controller and how it is integrated with ANUBIS. Section 5 presents the simulation environments and the results of the simulations, and discusses these results. Finally, section 6 concludes.

2.  Background

2.1.  Computational Models of Brain Function.

The brain's capacity for decision making is limited by its ability to infer an appropriate model of its environment and situation from its sensor data. The more accurate the model is, the more likely an action will have the desired consequence; however, forming an accurate model requires time and attentional effort. While neural networks can provide an accurate analog to the structure of the brain and produce behavior similar to that of biological systems, less is known about the actual processes by which the brain assesses and reacts to uncertainty and novelty and, hence, how it learns. Two of the main areas that must be addressed are representation (how the brain encodes variables such as uncertainty, novelty, and danger) and processing (how these variables are used).

Although the most commonly used model of biological learning is reinforcement learning, another technique, based on Bayesian probability, is growing in popularity. Building on the premise of the Bayesian brain, Friston, Daunizeau, and Kiebel (2009) propose that the brain uses the VB approach to inference rather than reinforcement learning. The VB algorithm is a generalization of the EM (expectation maximization) algorithm (Attias, 2000); it has the advantage that it can be used to estimate the parameters of a probability distribution rather than a point estimate over the variables; however, it retains the EM algorithm's property of guaranteed convergence if the distribution being estimated is of the exponential family and is unimodal (Wu, 1983). The following example illustrates the VB algorithm:

2.1.1.  Conventional VB Approach.

Consider a piece of sensor data; these data can be modeled as having been drawn from a gaussian distribution whose mean is the reported value from the sensor and whose standard deviation is a measure of the uncertainty in the sensor due to noise. In the conventional VB algorithm, each of these two parameters is estimated as a distribution from the exponential family (commonly a gaussian distribution for the mean and a gamma distribution for the standard deviation), described by its own set of parameters (known as hyperparameters or metaparameters). Thus:
formula
2.1
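From Bayes' theorem, the values of the mean and standard deviation that maximize the joint probability of the parameters and the data will also maximize the posterior probability of the parameters given the data. These are found by maximizing the free energy F with respect to the estimates of the mean and the standard deviation alternately. This minimizes the Kullback-Leibler distance between the estimate and the true distribution: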
formula
2.2
Using variational calculus, it can be shown that the maximum free energy is found when
formula
2.3
These two values are updated alternately until the algorithm converges, similar to the iterative steps of the EM algorithm. The final result will be invariant to the starting estimates, being based purely on the true value of the sensor data.
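The alternating updates can be illustrated with a minimal sketch. It is not the formulation of equations 2.1 to 2.3; it assumes the common textbook mean-field treatment of a gaussian with unknown mean and precision (inverse variance), with a gaussian factor for the mean and a gamma factor for the precision. The function name `vb_gaussian`, the prior hyperparameters `mu0`, `lam0`, `a0`, and `b0`, and the sample data are all hypothetical.

```python
def vb_gaussian(data, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3,
                e_tau_init=1.0, iters=50):
    """Mean-field VB for a gaussian with unknown mean and precision (tau).

    Assumed factorization: q(mu) is gaussian with mean mu_n and precision
    lam_n; q(tau) is gamma with shape a_n and rate b_n.
    """
    n = len(data)
    xbar = sum(data) / n
    # These two hyperparameters do not depend on the other factor.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = e_tau_init  # starting estimate of E[tau]
    for _ in range(iters):
        # Update q(mu) given the current expectation of tau.
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) given the current q(mu); 1/lam_n is the variance of mu.
        sq = sum((x - mu_n) ** 2 + 1.0 / lam_n for x in data)
        b_n = b0 + 0.5 * (sq + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n


readings = [4.8, 5.1, 5.0, 4.9, 5.2]  # hypothetical sensor readings
low_start = vb_gaussian(readings, e_tau_init=0.01)
high_start = vb_gaussian(readings, e_tau_init=100.0)
```

Under these assumptions, `low_start` and `high_start` agree to numerical precision, which illustrates the invariance to the starting estimate noted above.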

Friston et al. (2009) relate free energy to “surprise,” where surprise is a measure of how well an item of sensor data agrees with an organism's current internal model. Organisms attempt to minimize surprise through either updating their model or affecting their environment to make it agree better with the internal model. An organism uses an algorithm similar to VB to estimate the uncertainties and true values of sensor data, giving it an accurate probabilistic model of its current environment. Actions are defined as hidden states whose values are found through inference; thus, the optimal action is one that minimizes surprise. This approach has been successfully applied to problems such as the mountain car problem (Friston et al., 2009) and categorization of bird song (Friston & Kiebel, 2009). In both cases, the correct result to which the system should converge is well defined. However, this is not always the case for an organism (or robot), since the response to a stimulus can be different depending on a wide range of internal and external factors. For example, animals may be more willing to take risks such as approaching predators if food or water is in short supply. Organisms must therefore select between long-term and short-term goals and establish the saliency of competing stimuli. It is thought that one mechanism for doing this is neuromodulation: the release of chemicals in the central nervous system that regulate the responses of neurons. The best known of these are DA (dopamine), 5-HT (serotonin), NA (noradrenaline), and ACh (acetylcholine), summarized in Table 1 (Doya, 2002; Krichmar, 2008; Yu & Dayan, 2005).

Table 1:
Properties of the Best-Known Neuromodulators.
| Name | Origin | Stimulated by |
| --- | --- | --- |
| DA | Substantia nigra, ventral tegmental area | Unexpected reward, the expectation of a reward |
| 5-HT | Raphe nuclei | Injury, attack, satiety |
| NA | Locus coeruleus | Novelty, unexpected uncertainties |
| ACh | Basal forebrain | Variance between model and sensor readings |

Unlike neurotransmitters, neuromodulators do not transmit signals; rather, they determine how the signal is processed, for example, how much attention should be applied to one stimulus over another. They therefore play an important role in decision making, in particular in the trade-off between exploring a variety of different options and exploiting a single option. This trade-off is at the heart of adaptability; knowing when to switch to a different behavior is as important as knowing which behavior to switch to. Fellous (1998) reviews a number of computational models of neuromodulation but concludes that models up to that point consider neuromodulation as external to the system it acts on rather than as an integral part, and they treat each neuromodulator as having independent actions rather than interacting. Later research (Yu & Dayan, 2005; Tan & Bullock, 2008) suggests that these assumptions are inaccurate. More recent models, such as that suggested in Krichmar (2008), take this into account. Krichmar postulates that although neuromodulators are stimulated by different events and are produced in different areas of the brain, they all have the effect of selecting between distracted and decisive behavior. The author then proposes a framework for using neuromodulation for the control of an agent where DA is associated with “wanting,” 5-HT is associated with threat assessment, ACh is associated with attentional effort, and NA is associated with novelty and saliency.

A second theoretical example of a computational model of neuromodulation is found in Doya (2000), where it is proposed that neuromodulators can be seen as setting the values of the metaparameters in a reinforcement learning algorithm. DA represents the TD (temporal difference) error, which is a measure of how well the current state achieves the predicted level of reward; 5-HT encodes the discount factor, which assigns how much weight is given to short-term rewards versus long-term rewards; NA controls the level of stochasticity when selecting a behavior; and ACh sets the rate at which the system learns.
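The mapping can be made concrete with a small sketch. It uses a tabular Q-learning update and softmax action selection as stand-ins; the source describes the metaparameters only in general terms, so the particular learning rule, the function names, and the numerical values are assumptions made for illustration.

```python
import math
import random


def softmax_choice(q_values, beta, rng):
    """Select an action; beta (inverse temperature) plays the role
    attributed to NA: low beta gives more stochastic choices."""
    top = max(q_values)
    weights = [math.exp(beta * (q - top)) for q in q_values]
    threshold = rng.random() * sum(weights)
    running = 0.0
    for action, weight in enumerate(weights):
        running += weight
        if threshold < running:
            return action
    return len(weights) - 1


def td_update(q, state, action, reward, next_state, alpha, gamma):
    """One temporal-difference update of a tabular value function.

    delta (the TD error) plays the role attributed to DA, gamma (the
    discount factor) that of 5-HT, and alpha (the learning rate) that
    of ACh.
    """
    delta = reward + gamma * max(q[next_state]) - q[state][action]
    q[state][action] += alpha * delta
    return delta


q_table = [[0.0, 0.0], [1.0, 2.0]]  # hypothetical 2-state, 2-action table
delta = td_update(q_table, state=0, action=1, reward=1.0, next_state=1,
                  alpha=0.5, gamma=0.9)
greedy = softmax_choice([0.0, 1.0], beta=1000.0, rng=random.Random(0))
```

Raising `gamma` makes the update weight future reward more heavily, and lowering `beta` makes `softmax_choice` explore more, which is the sense in which the metaparameters shape behavior.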

A number of projects have used neuromodulator-inspired techniques in robotics controllers, in particular, in tuning central pattern generators (CPGs; Ishiguro, Fujii, & Hotz, 2003; Gurney, Prescott, Gonza, Humphries, & Redgrave, 2006); however, in each case only one neuromodulator is modeled. One example of this is found in Sporns and Alexander (2002): a robot was placed in an arena containing various objects, some of which it needed to avoid and some of which it needed to manipulate with its gripper. A neural network was trained to simulate the changes in DA levels in response to encounters with different objects and to select behaviors based on these levels. In testing, the robot learned to associate red with rewards, so that if it detected a red object, it would anticipate a reward, activating the release of DA and moving toward the object; however, if the object was removed before the robot could reach it, the association between the color and the reward was weakened. Thus, the robot was not only able to determine whether a reward would be received; it could also encode how likely this reward was.

3.  A Neuromimetic Model for Parameter Optimization

This section presents ANUBIS, a novel tuning mechanism for controller parameters. Unlike previous neuromodulation-inspired tuning algorithms, ANUBIS uses four ANm (artificial neuromodulators), which are analogs to DA, 5-HT, NA, and ACh. Following the lead of the models proposed by Krichmar (2008) and Doya (2002), the actions and update rules for each ANm have been abstracted from those of their natural counterparts. This is necessary since the same neuromodulator can have different effects at different sites in the brain, and some simplification is required to be able to run the model fast enough to tune the controller in real time. Additionally, the robot does not have the same needs as a biological organism. It does not need to eat or mate, for example; instead, it reacts to a target position as an appetitive stimulus.

These ANm do not encode actual variables; rather, they change depending on the state of the system. Levels of the ACh analog are set by the VB algorithm itself and thus are a measure of the estimated uncertainty, while levels of the NA analog are increased by environmental changes at a rate proportional to the level of the ACh analog. This is based on the work of Yu and Dayan on the role of neuromodulation in learning (Yu & Dayan, 2002, 2005), where it is proposed that ACh represents levels of expected or modeled uncertainty, while NA represents novelty or unexpected uncertainty.

ACh levels in the brain are high when the internal model is inaccurate or not well known and reduce as the model is learned. Similarly, the levels of the ACh analog are high when the estimate given by the VB algorithm has a high standard deviation (i.e., there is more uncertainty about its most likely value). Conversely, changes in NA occur when the data observed by the organism suggest that a change in the model being used needs to be made. This is simulated with the NA analog by having its levels increase with stimuli that are not consistent with the current model, for example, errors greater than estimated or a change in the number of detected obstacles. The rate of change is proportional to the level of the ACh analog; this means that the more uncertain the system is, the more attention is paid to novel stimuli. This has parallels with the role of ACh in maintaining attentional effort (Sarter, Gehring, & Kozak, 2006), and also means that both ACh and NA are implicated in the selection of salient stimuli, as suggested in Yu and Dayan (2005) and corroborated in Avery, Nitz, Chiba, and Krichmar (2012).

Levels of the DA analog are increased by success, such as being closer to the target state, and are decreased by getting farther from the target state. This is based on the fact that certain DA neurons are stimulated by receiving rewards and inhibited by aversive stimuli (Matsumoto & Hikosaka, 2009). The rate of change is proportional to the level of the NA analog; this represents how DA is most strongly stimulated by unexpected reward and most strongly inhibited by the withdrawal of an expected reward (Schultz, 1997). Daw, Kakade, and Dayan (2002) proposed that DA and 5-HT act as opponents. This is incorporated into the system by making the adjustment of the 5-HT analog partially based on the inverse of that of the DA analog; it is decreased by being closer to the target state and increased by getting farther from the target state. Once again, the rate of these changes is proportional to the level of the NA analog, so that the more novel a stimulus is, the greater is its effect. However, in nature, there is asymmetry between appetitive and aversive stimuli; therefore, the 5-HT analog is also increased by more immediate dangers such as collisions. This is based on the fact that 5-HT has an important role in aversion (Dayan & Huys, 2009). These four ANm interact to adjust the true distribution that the VB algorithm is converging to; this can be illustrated in the modified VB approach.
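The qualitative rules above can be summarized in a short sketch. It is one possible reading only: the update equations actually used are not reproduced here, so the linear form, the clamping to [0, 1], the gains `gain` and `collision_gain`, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ANmLevels:
    da: float   # DA analog
    sht: float  # 5-HT analog
    na: float   # NA analog
    ach: float  # ACh analog


def clamp(value, low=0.0, high=1.0):
    return max(low, min(high, value))


def update_anm(levels, posterior_std, novelty, progress, collision,
               gain=0.1, collision_gain=0.2):
    """Apply one step of the qualitative rules (illustrative form only).

    posterior_std: standard deviation of the current VB estimate.
    novelty: size of the stimulus that disagrees with the model (>= 0).
    progress: positive when closer to the target, negative when farther.
    collision: True when an immediate danger such as a collision occurs.
    """
    # ACh analog: set directly from the uncertainty of the VB estimate.
    ach = clamp(posterior_std)
    # NA analog: raised by novel stimuli at a rate proportional to ACh.
    na = clamp(levels.na + gain * ach * novelty)
    # DA analog: follows progress at a rate proportional to NA.
    da = clamp(levels.da + gain * na * progress)
    # 5-HT analog: opposes DA and also rises with immediate danger.
    sht = levels.sht - gain * na * progress
    if collision:
        sht += collision_gain
    return ANmLevels(da=da, sht=clamp(sht), na=na, ach=ach)


start = ANmLevels(da=0.5, sht=0.5, na=0.5, ach=0.5)
approaching = update_anm(start, posterior_std=0.4, novelty=1.0,
                         progress=1.0, collision=False)
blocked = update_anm(start, posterior_std=0.4, novelty=1.0,
                     progress=-1.0, collision=True)
```

In this sketch, `approaching` shows the DA analog rising and the 5-HT analog falling, while `blocked` shows the reverse, with the collision adding a further increase to the 5-HT analog.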

The main difference between this modified approach and the conventional VB approach is that instead of finding the true value of the sensed data, the aim is to find a solution that will improve the state of the system, and therefore both the sensor data and the current state of the system should be taken into account. However, since the result of the conventional VB algorithm is invariant to the initial estimate, it is not enough just to set the initial estimate to the current value of the variable of interest. Instead of the two variables of interest being the mean and standard deviation of the sensor data, they are related to each other through a Markov process and are also related to the sensor data. This means that two estimates are being made: the parameters describing the relationship between the variables and the sensor data and the parameters describing the relationship between the two variables of interest. For example, consider a variable x that is related to the current value y, through an unknown function f(y), as well as to the sensor data. Based on McClure, Gilzenrat, and Cohen (2006) and Kalueff, Jensen, and Murphy (2007), the levels of the DA and 5-HT analogs are used to weight the current model data and the sensor data, and the resulting weighted expression is rearranged for use in the inference. While the levels of the DA and 5-HT analogs set the value that the algorithm will converge to, the levels of the NA and ACh analogs set the standard deviations. The NA analog encodes the level of unexpected or unmodeled disturbances due to outside influences and is therefore used as the standard deviation for the sensor data, which in most real cases cannot be known exactly; the ACh analog encodes the level of modeled disturbances and is used as the standard deviation for f(y) (i.e., the internal model). In this case, gaussian distributions are used for both, since the main source of error is likely to be gaussian noise in the sensors:
formula
3.1

The same iterative process as the conventional VB algorithm can then be used to find the value of x. Due to the ANm updates described above, the final value will incorporate both sensor data and model data to varying extents depending on the levels of the DA and 5-HT analogs, which in turn are affected by changes in the NA and ACh analogs; thus, all the ANm play an important part in the final result.

4.  Controller Design for the EchinoBot Robot

ANUBIS is one component of a wider project investigating the use of computational intelligence techniques in the design of a Mars exploration rover, known as EchinoBot. Intelligent techniques are unpopular in industry since they often produce results that cannot be analyzed mathematically; this is particularly true of the space industry due to the high risks involved. However, planetary exploration is an area where giving systems a degree of intelligence could be especially advantageous, allowing them to deal with uncertainty and adapt to new situations. The project therefore combines conventional engineering with AI techniques. First, the design of the robot's chassis was produced using evolutionary algorithms (Smith, Saaj, & Allouis, 2010), and second, the controller is tuned in real time using ANUBIS.

The controller used to demonstrate ANUBIS is made up of three components: a combined gait generation and path planning algorithm, which aims to move EchinoBot to a defined target while avoiding collisions with obstacles; a foot trajectory plotting algorithm that designs a trajectory for each foot to move along; and a foot trajectory tracking algorithm that ensures that each foot follows the desired trajectory. An architecture for the closed-loop control system is shown in Figure 1. The variables used in this diagram are defined in Table 2 and discussed in more detail later in this section.

Figure 1:

Block diagram of the EchinoBot control system.

Table 2:
Variables Used in Control Diagram.
| Variable Name | Explanation |
| --- | --- |
| Pt | Position of target relative to leg |
| P0 | Initial position of leg |
| v | Movement vector of leg |
|  | Desired end foot position |
| q | Vector of desired joint positions |
| q̇ | Vector of desired joint velocities |
| U | Control (torque) signal |
| e | Position error |
| ė | Velocity error |
| qc | Joint angles after a collision |
| q̇c | Joint velocities after a collision |
| q̈c | Joint accelerations after a collision |
| P6 | Final leg position |

A computer-aided design (CAD) rendition of the EchinoBot rover is shown in Figure 2; the prototype is currently under construction. A radially symmetric configuration was selected because it has no defined front and rear and can therefore move equally well in any direction, and because it allows a modular design to be used. The rover is able to sense obstacles using a stereo camera carried on the body and collision sensors on each leg segment.

Figure 2:

CAD drawing of EchinoBot, the ANUBIS test bed vehicle.

4.1.  Gait Generation and Path Planning.

One disadvantage of using a radially symmetric design is that it becomes harder to apply animal gaits to the robot, since animals that have radial symmetry tend to be more primitive creatures that do not walk in directly analogous ways to legged machines. The algorithm discussed here is inspired by the coordination between starfish legs, an example from nature of a distributed, radially symmetric system. Starfish have no central nervous system; rather, each ray communicates with the others via a circumoral nerve ring to activate or inhibit its neighbors (Dale, 1999). For example, if the starfish smells food, the ray that can detect the odor most strongly becomes the leader, and the rest of the rays follow it. This is similar to a control technique known as implicit leadership used for distributed swarm systems (Yu, Werfel, & Nagpal, 2010), where each agent calculates a confidence based on factors such as distance to the target. Agents follow the movements of other agents that have a higher confidence than their own. This has the advantage that they need to communicate only with the agents closest to them; as they move closer to the target, their confidence will increase. Thus, the movement toward the target will be propagated through the swarm. In the algorithm discussed here, each leg calculates a confidence value based on a combination of its distance to the target (target confidence) and its position relative to any obstacles (obstacle confidence). A movement vector is then calculated that takes into account this confidence and the confidence of the neighboring legs. The leg with the highest confidence becomes the leader, and the other legs move in the same direction.
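The way a leader emerges from purely local communication can be sketched as follows. This is a simplified illustration rather than the algorithm used on EchinoBot: it assumes that each leg adopts both the direction and the confidence of whichever of itself and its two ring neighbors is most confident, and the function name and values are hypothetical.

```python
def propagate_leader(confidence, direction, rounds):
    """Spread the most confident leg's direction around a ring of legs.

    Each leg only ever reads its two immediate neighbors, mimicking the
    limited communication of a radially symmetric, distributed system.
    """
    n = len(confidence)
    conf = list(confidence)
    dirs = list(direction)
    for _ in range(rounds):
        new_conf, new_dirs = [], []
        for i in range(n):
            neighbors = [(i - 1) % n, i, (i + 1) % n]
            best = max(neighbors, key=lambda k: conf[k])
            new_conf.append(conf[best])
            new_dirs.append(dirs[best])
        conf, dirs = new_conf, new_dirs
    return dirs


# Hypothetical six-legged example: leg 1 detects the target most strongly.
leg_confidence = [0.2, 0.9, 0.1, 0.3, 0.4, 0.5]
leg_direction = [0, 60, 120, 180, 240, 300]  # preferred headings in degrees
shared = propagate_leader(leg_confidence, leg_direction, rounds=3)
```

With six legs, three rounds are enough for the heading of leg 1 to reach the leg on the opposite side of the ring.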

The gait generation and path planning algorithm proceeds as depicted in Figure 3. First, the target confidence (i.e., detection level) for each foot is calculated. The target confidence at time k, gk, is related to the measured distance to the target and to the previous value of target confidence. Next, the obstacle confidence for each foot is calculated, based on the distance to obstacles, the relative angle between the obstacle and the target, and the previous value of obstacle confidence. The total confidence of the leg combines the target confidence and the obstacle confidence.

Figure 3:

Flow diagram of the gait generation and path planning process.

Once the confidence of the legs is known, the activity levels for each leg are found as
formula
4.1
where i = 1 to n; m(i) is the activity of foot i; C(i−1), C(i), and C(i+1) are the confidence of the previous foot, the current foot, and the next foot, respectively; and a and b are positive constants. The desired position based on the target and obstacle positions is then calculated as
formula
4.2
where v(i) is the desired movement vector, p is a constant, is the maximum step length of the robot, cd=C(i)/C(i−1)+C(i)/C(i+1), and is the desired movement angle for a leg.

One important aspect of path planning is escaping from local minima. These are points at which the relative levels of obstacle and target confidence are such that the robot cannot move any farther; for example, if it takes a step backward, its target confidence will suppress the obstacle confidence, causing it to step forward, which then causes its obstacle confidence to suppress its target confidence, so that it steps backward. The result is that the robot becomes trapped between these two points. By intelligently selecting values for target and obstacle confidence, the robot's capability for escaping local minima can be greatly enhanced.

Target confidence and obstacle confidence were estimated using both ANUBIS and an MLP as a comparison. The perceptron attempted to find a value for target confidence that minimized the distance and the angle to the target and a value for the obstacle confidence that maximized the distance and angle to any obstacles. The equations defining the ANUBIS updates are described in section 4.3.1.

As well as reaching the target, the robot should maintain a stable posture; the robot will become unstable if the line connecting two adjacent feet intersects the body. It should be noted that strictly speaking, instability occurs when the center of mass falls on or outside the support polygon; however, since this algorithm attempts to generate stable walking with a distributed system where communication between feet is limited, a function for stability was determined using only data from adjacent legs. The robot is also likely to become unstable if one or more of the foot positions is underneath the body. This became particularly important in the case of obstacle avoidance; the movement away from the obstacle can cause a foot to move too close to the edge of the body, making the robot unstable. The algorithm for maintaining a stable posture is similar to the artificial potential field (APF) technique commonly used for path planning (Khatib, 1986). However, here it is used to select between two desired foot positions: the desired foot position based on the target and obstacle confidence, and the most stable foot position. The most stable foot position is a point halfway between the farthest distance the foot can reach and the minimum distance to the body to achieve stability. The final foot position is calculated by the weighted sum of the two foot positions. A higher weighting for targeting position means that the position of the foot moves closer to the desired position based on the target, and vice versa.
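The selection between the two candidate foot positions can be sketched as below. The sketch assumes that the weights are normalized so that they sum to one, which the text does not state, and that positions are given as coordinate tuples; the function names and numbers are hypothetical.

```python
def most_stable_radius(max_reach, min_stable_distance):
    """Radial distance halfway between the farthest reach of the foot
    and the minimum distance from the body that is still stable."""
    return 0.5 * (max_reach + min_stable_distance)


def blend_foot_position(target_position, stable_position, target_weight):
    """Weighted sum of the target-driven and most stable foot positions.

    target_weight is clipped to [0, 1]; the remainder is given to the
    stable position (an assumed normalization).
    """
    w = max(0.0, min(1.0, target_weight))
    return tuple(w * t + (1.0 - w) * s
                 for t, s in zip(target_position, stable_position))


radius = most_stable_radius(max_reach=0.30, min_stable_distance=0.10)
foot = blend_foot_position((0.30, 0.00), (0.20, 0.10), target_weight=0.5)
```

A `target_weight` of 1 reproduces the target-driven position exactly, and a weight of 0 reproduces the most stable position.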

4.2.  Foot Trajectory Generation and Tracking.

Once the start and desired foot positions have been determined, it is necessary to plot a trajectory between them, taking into account constraints such as the desired time for the step to take place, the desired shape of the path that the foot should take, and velocities and accelerations at different points. Although the path planning algorithm attempts to move around large obstacles, it is likely to be impossible to avoid collision with every single obstacle in natural environments. Therefore it is desirable that the robot should have some behavior to react to an unexpected obstacle, such as lifting the leg over it. This requires real-time adaptation of the leg trajectory. A flow diagram of the trajectory generation and tracking process is shown in Figure 4.

Figure 4:

Flow diagram of the trajectory generation and tracking process.

Bézier curves are commonly used in many fields of computing and engineering when a smooth, scalable curve is required (Thompson & Patel, 1987). The equation of a Bézier curve of degree j is a polynomial of order j, where the coefficients of the polynomial are multiples of the control points of the curve. This is generalized in
$$B(t) = \sum_{i=0}^{j} \binom{j}{i} (1 - t)^{j-i}\, t^{i}\, P_i \tag{4.3}$$
where $\binom{j}{i}$ is the binomial coefficient, t is time, and Pi is the coordinate of the ith control point. The Bézier curve is contained by the convex hull defined by the control points and passes through the control points P0 and Pj. By considering the Bézier curve as a Hermite polynomial of the same degree, it is possible to express the control points as functions of the start and end positions and their derivatives. If q0 is the start point and qj is the end point, then the control points nearest each end are fixed by the position, velocity, and acceleration at that end. Thus, in order to specify initial and final velocities and accelerations, j must be at least 5, but if a degree-5 Bézier curve is used, all the points must be fixed in order to achieve the desired initial and final conditions. It is possible to add further control points between P2 and Pj−2 that can be freely moved to change the shape of the curve. However, increasing the order of the polynomial increases the computational expense of the calculation, and adding more control points increases the number of inverse kinematic calculations that must be done. For this reason, it was decided to use a degree-6 Hermite polynomial. Since it is desirable that the initial and final joint velocities and accelerations should be 0 rad s−1 and 0 rad s−2, respectively, the control points of the degree-6 curve should be defined as P2=P1=P0 and P4=P5=P6, where P0 is a vector of joint angles for the initial position of the foot and P6 is a vector of joint angles of the final position. However, when the leg has collided with an obstacle, the starting velocity would be nonzero. Although it would be possible to reset the velocity and acceleration to zero at this point, this would require a significant and almost instantaneous change in torque. This could cause damage to actuators and is therefore undesirable. Instead, P1 and P2 are set from the joint velocities and accelerations after the collision. If there is no collision, these will be zero; if there is a collision, they will take on the values at the time of collision.
The center control point, P3, can be freely defined as a point in Cartesian space and then converted to joint angles using inverse kinematics. This is preferable to defining the point in joint space, as it means that the x and y coordinates can be set to the midpoint between the start and end foot positions and the z coordinate can be changed to set the height of the step. The actual z coordinate of the highest point on the path of the foot is a function of the control points rather than the exact coordinates of P3.
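
The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses the standard endpoint identities of a degree-j Bézier curve in normalized time, B'(0) = j(P1 − P0) and B''(0) = j(j − 1)(P2 − 2P1 + P0), to place P1 and P2 from a start velocity and acceleration. The function names, the use of normalized time, and the numerical values are assumptions made for the example, and the points are treated as generic coordinate vectors rather than joint angles obtained by inverse kinematics.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at normalized time t in [0, 1]."""
    j = len(control_points) - 1  # degree of the curve
    dims = len(control_points[0])
    point = [0.0] * dims
    for i, p in enumerate(control_points):
        # Bernstein basis polynomial of degree j
        weight = comb(j, i) * (1.0 - t) ** (j - i) * t ** i
        for d in range(dims):
            point[d] += weight * p[d]
    return point


def step_control_points(q_start, q_end, p_mid, v_start=None, a_start=None):
    """Seven control points of a degree-6 curve (hypothetical helper).

    With zero start velocity and acceleration this gives P0 = P1 = P2.
    The curve always ends at rest, so P4 = P5 = P6. P3 is the free point.
    """
    j = 6
    dims = len(q_start)
    v = v_start if v_start is not None else [0.0] * dims
    a = a_start if a_start is not None else [0.0] * dims
    p0 = list(q_start)
    # Standard identities solved for P1 and P2 (normalized time assumed).
    p1 = [p0[d] + v[d] / j for d in range(dims)]
    p2 = [p0[d] + 2.0 * v[d] / j + a[d] / (j * (j - 1)) for d in range(dims)]
    p6 = list(q_end)
    return [p0, p1, p2, list(p_mid), list(p6), list(p6), p6]


# Example with made-up coordinates: a step that starts and ends at rest.
points = step_control_points([0.1, 0.5, 0.0], [0.5, 0.1, 0.0], [0.3, 0.3, 0.2])
path = [bezier_point(points, k / 20) for k in range(21)]
```

Passing a nonzero `v_start` (for example, the velocity at the moment of a collision) shifts P1 and P2 so that the new curve leaves the start point with that velocity instead of requiring an abrupt reset to zero.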

As with the path planning algorithm, P3 was estimated using ANUBIS and an MLP. The equations defining the ANUBIS updates are described in section 4.3.2, while the MLP attempts to keep the step height as close as possible to a defined minimum of 0.015 m.

Once the trajectory has been designed, the inverse kinematics of the leg are used to determine the angle to which each leg joint should be set in order to follow the desired path, and a third control module is implemented to ensure that the leg tracks this trajectory. Here ANUBIS is combined with terminal sliding mode control (TSMC), a type of sliding mode control (SMC). SMC is a form of variable structure control that is robust to perturbations whose bounds are known (Yu & Kaynak, 2009). This is achieved by including a discontinuous term in the control law, which constantly forces the system onto a hyperplane (known as the sliding surface) where the desired dynamics are displayed. The disadvantage of this approach is that it often leads to high-frequency oscillations in the control signal, known as chattering, which can cause damage to actuators. In order to mitigate this, TSMC uses a nonlinear sliding surface (Venkataraman & Gulati, 1993), which has the added advantage of finite convergence time. TSMC was used in this project for two reasons: the convergence time can be specified, so the controller can reach the desired state within each time step, and, being a form of SMC, it is robust to modeled disturbances once the sliding mode is achieved. The dynamic model used here is derived from a system of rigid bodies: the chain of links that makes up a leg.

ANUBIS was applied to a TSMC based on the controller discussed in Yu, Yu, and Stonier (2003), and compared to an identical controller tuned using an MLP. For this controller, the sliding surface is defined as , the equivalent control as , and the discontinuous control as . Thus, the complete control law is defined as U = Ueq + Ud. In these expressions, Me, Ce, and Ge are the estimated inertia, Coriolis, and gravity matrices; q, , are angle, velocity, and acceleration vectors; e and are position and velocity errors; and , B, , and r are constants. Since both and r must fall within a specific range, which would require more constraints to be added to the estimation algorithms, the two quantities that were estimated were the gains B and . The MLP attempts to keep the sliding surface s at zero, hence minimizing the position and velocity errors. The equations defining the ANUBIS updates are described in section 4.3.2.
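
The finite convergence time that motivates TSMC can be illustrated with a generic scalar example. The sketch below is not the sliding surface of Yu, Yu, and Stonier (2003); it assumes a commonly used terminal attractor, de/dt = −β|e|^r sign(e) with 0 < r < 1, for which the error reaches zero after the finite time |e0|^(1−r) / (β(1 − r)). The symbols β and e0, the function names, and the numerical values are assumptions chosen for illustration.

```python
def tsm_convergence_time(e0, beta, r):
    """Analytic time for de/dt = -beta * |e|**r * sign(e) to reach e = 0."""
    return abs(e0) ** (1.0 - r) / (beta * (1.0 - r))


def simulate_terminal_attractor(e0, beta, r, dt=1e-4, t_max=10.0, tol=1e-6):
    """Integrate the same dynamics with explicit Euler steps.

    Returns the time at which |e| first drops to tol or below.
    """
    e, t = e0, 0.0
    while abs(e) > tol and t < t_max:
        step = beta * abs(e) ** r * dt
        if step >= abs(e):
            e = 0.0  # clamp so the discrete step cannot overshoot zero
        elif e > 0:
            e -= step
        else:
            e += step
        t += dt
    return t


# Hypothetical values: initial error 0.5, gain 2.0, exponent 0.5.
predicted = tsm_convergence_time(0.5, 2.0, 0.5)
simulated = simulate_terminal_attractor(0.5, 2.0, 0.5)
```

In this toy setting a larger gain shortens the convergence time in inverse proportion, which is the kind of relationship that makes a gain a natural quantity to tune online.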

4.3.  Integrating ANUBIS with the Walking Controller.

4.3.1.  Integration with Gait Generation and Path Planning.

Two quantities need to be estimated for the top-level controller: target confidence (the positive confidence due to proximity to the target) and obstacle confidence (the negative confidence due to proximity to obstacles). Target confidence is related to distance to target and the previous value of target confidence through the equation . By rearranging this expression, the pdfs are defined as
formula
4.4
where gkm, gkv are metaparameters describing gk. The initial values of gkm, gkv, and were set as the current target confidence and random values, respectively. Obstacle confidence is calculated using the equation ; therefore, the pdfs are defined as:
formula
4.5

The initial values , , were set at the current value of obstacle confidence and random values, respectively. Since the estimate of obstacle confidence is based on two pieces of sensor data, the obstacle distance and the obstacle angle, a multivariate normal distribution is used; however, since both data sets are taken from the same sensor (i.e., the stereo camera) and are measured simultaneously, it is assumed that they are both subject to the same level of gaussian noise and therefore have the same standard deviation. While is updated, the rest of the ANm are updated in the main controller program in response to stimuli as outlined in Table 3. In addition to the updates discussed in section 3, is increased when there is a small change in confidence and decreased when there is a large change, since if there is no change in confidence, this suggests that the robot is stuck, and thus the current strategy needs to be changed.

Table 3: ANm Updates for Gait Generation and Path Planning.

| Name | Stimulated by | Depressed by | Rate | Initial Value |
| --- | --- | --- | --- | --- |
| | Increase in overall confidence | Decrease in overall confidence | | |
| | Decrease in overall confidence, number of obstacles in sensor range | Increase in overall confidence | (change in confidence), constant (number of obstacles) | |
| | Change in number of obstacles in sensor range, small change in confidence | Large change in confidence | (change in confidence), constant (number of obstacles) | 0.5 |
| | VB algorithm | VB algorithm | NA | 0.5 |

4.3.2.  Integration with Foot Trajectory Generation and Tracking.

Only one quantity needed to be estimated for the foot trajectory plotting algorithm: relative step height. If the leg collides with an obstacle, it should lift its foot higher to attempt to pass over it. Conversely, if there are no collisions, the step height should be kept as low as possible so as to minimize the torque requirements. The step height was adjusted by updating the z component of P3. The function used was , so the pdfs were defined as
formula
4.6
where z0, z6, and z3 are the z coordinates of points P0, P6, and P3, respectively. The initial values of z0b, z3m, and z0c were set as the current value of z0, the current value of z3, and 0.5, respectively. ANm updates are outlined in Table 4.

Table 4: ANm Updates for Trajectory Generation.

| Name | Stimulated by | Depressed by | Rate | Initial Value |
| --- | --- | --- | --- | --- |
| | No collision | Collision | | |
| | Collision | No collision | | |
| | Collision | No collision | Constant rates for collision (fast) and no collision (slow) | 0.5 |
| | VB algorithm | VB algorithm | N/A | 0.1 |

For the TSMC parameters, the following conditions must be satisfied to avoid singularity: and 0<r<1 (Yu et al., 2003). Since the legs of the robot have 3 degrees of freedom, , where is a 1 × 3 vector of positive numbers, and where is a 1 × 3 vector. The initial values of the quantities to be estimated, B and , were set as
formula
4.7
where T is the desired convergence time and i= 1 to 3 is the joint number. These were selected since B can be used to determine how fast the controller will converge, and will control how much torque will be required. Similar to the estimation for obstacle confidence, two pieces of sensor data were used: the position error (e) and velocity error (). The function
formula
4.8
was used to relate the quantities being estimated to the sensor data. In order to simplify the update calculations, the actual data used were the velocity error, , and the quantity, . In spite of this simplification, equation 4.8 cannot be rearranged to linear functions of or . Therefore, first-order Taylor series expansions were used to express the sensor data in terms of and :
formula
4.9
The pdfs were then defined as
formula
4.10
where is a metaparameter describing , and and are metaparameters describing . The initial values of , , , were set as , and a random value, respectively. ANm updates are outlined in Table 5.

Table 5: ANm Updates for Trajectory Tracking.

| Name | Stimulated by | Depressed by | Rate | Initial Value |
| --- | --- | --- | --- | --- |
| | Decrease in \|s\| | Increase in \|s\| | | |
| | Increase in \|s\| | Decrease in \|s\| | | |
| | | | Constant | 0.5 |
| | VB algorithm | VB algorithm | NA | 0.1 |

5.  Simulations and Results

5.1.  Testing Gait Generation and Path Planning.

Three simulation environments were designed for testing the gait generation and path planning algorithm; these environments were inspired by the collection of benchmarks compiled by the Algorithms and Applications group at Texas A&M University (Parasol Lab, 2010) and were intended to produce local minima. The positions of obstacles in Figures 5 and 6 are shown by filled circles, and the position of the target is shown by the white circle. The controller should be able to guide the robot to the target without colliding with any obstacles, getting stuck in local minima, or falling over. The number of iterations should also be minimized to conserve power. Figure 5 compares the tuning performance of the MLP and ANUBIS for the gait generation and path planning algorithm over the three different environments. The number of iterations is shown at the top of each panel. The robot continues moving until the center of its body is within one unit of the target center.

Figure 5:

Robot paths through environments 1, 2, and 3 with the controller tuned using the MLP and ANUBIS.

Figure 6:

Body paths and changes in leadership and ANm values as the robot moves through environment 2.

There are a number of similarities in performance between the two systems: both are able to escape local minima and reach the target, and they follow similar paths to the target. Due to the echinoderm-inspired mechanism used by the underlying algorithm, leadership is transferred between the legs as the robot moves, so the robot does not have a clearly defined front or rear and instead follows the leg with the highest confidence. The frequency of this change of leadership is determined by the tuning algorithm used, since it depends on the relative confidence of the legs. The most significant difference between the two sets of results is that the system tuned with ANUBIS moves much faster than the MLP-tuned system, and the feet are positioned more evenly around the body, resulting in a more stable posture. Figure 6 compares the changes in leadership between the two tuning methods as the robot moves through environment 2. Also shown is the position of the center of the robot's body; for clarity, the legs and feet are not shown. This figure shows one reason that the system tuned with the MLP is so much slower than the other system: leadership frequently switches between different legs, causing the path of the robot to meander rather than follow a straight line. The high number of changes in leadership can be explained by examining the changes in confidence over time: tuning with the MLP results in periods of high-frequency changes in confidence, and during these periods, leadership switches between adjacent legs, which have similar confidence. Although the leg confidences produced by ANUBIS are much closer to one another, the rate at which confidence changes is much slower, so leadership is not transferred as often. This parallels the finding in Migita (2006) that starfish move most efficiently when a single lead leg is dominant rather than when leadership is shared between two legs.
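
The link between confidence fluctuations and leadership changes can be made concrete with a short sketch. It assumes only what the text states, namely that the leader is the leg with the highest confidence; the function names and the confidence values are made up for illustration and are not taken from the simulations.

```python
def leader(confidences):
    """Index of the leg with the highest confidence."""
    return max(range(len(confidences)), key=lambda leg: confidences[leg])


def count_leadership_changes(confidence_history):
    """Count how often the leading leg changes between consecutive iterations."""
    leaders = [leader(c) for c in confidence_history]
    return sum(1 for a, b in zip(leaders, leaders[1:]) if a != b)


# Made-up histories for three legs over four iterations.
# Rapidly fluctuating confidences: adjacent legs keep swapping the lead.
fluctuating = [[0.50, 0.49, 0.10], [0.49, 0.50, 0.10],
               [0.50, 0.49, 0.10], [0.49, 0.50, 0.10]]
# Slowly varying confidences: values are close, but the order is stable.
slow = [[0.50, 0.49, 0.48], [0.51, 0.49, 0.48],
        [0.51, 0.50, 0.48], [0.52, 0.50, 0.49]]

changes_fluctuating = count_leadership_changes(fluctuating)
changes_slow = count_leadership_changes(slow)
```

The second history shows that confidences can lie close together without causing frequent handovers, provided they change slowly.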

One possible reason for the two techniques producing these differing effects is that the MLP relies purely on sensor data, which change almost constantly, resulting in a frequent change in confidence. For ANUBIS, the importance of sensor data is adjusted by the levels of artificial neuromodulators. If the robot is compared to a living system, the interplay of and could be understood as follows. As the robot approaches an obstacle, it becomes more cautious (since high increases obstacle confidence and decreases target confidence); if new obstacles are spotted, caution increases at a higher rate (i.e., as increases, the rate of change of and increases). If the number of obstacles remains constant—for example, if the robot is stationary in a local minimum—then the rate of increase of caution will decrease, until the changes in and are dominated by the change in improvement. At this point, the danger represented by being stuck becomes a priority over the danger of the obstacles, so any change in improvement will motivate the robot, regardless of danger presented by obstacles. The result of this in terms of the confidence values is that the higher is, the more effect sensor data will have on the estimated value, and the higher is, the more effect the previously estimated confidence value will have. This capability to ignore sensor data in some situations also results in improved ability to escape local minima compared to the MLP. In Figure 6, a high concentration of plotted points can be seen around [1,0], showing that the robot made a number of small movements to attempt to escape the local minimum. The proximity between plotted points between the obstacles shows that the robot moved slowly along a meandering path as it reacts to whichever obstacles are closest. 
The behavior of the robot using ANUBIS is significantly different; there is a small concentration of plotted points just as the robot reaches the minimum, corresponding to the robot's remaining in approximately the same position; however, over time, the danger of being stuck dominates the danger of the obstacles as described above, and the robot moves quickly toward the target.
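
The trade-off between sensor data and the previous estimate can be pictured as a weighted average. The sketch below is a simplified stand-in, not the variational Bayes update used by ANUBIS: it assumes two nonnegative weights, one favoring the sensed value and one favoring the previous confidence, and combines them in the way two Gaussian estimates are combined by their precisions. All names and values are hypothetical.

```python
def fuse_confidence(previous, sensed, w_previous, w_sensor):
    """Weighted combination of the previous estimate and new sensor data.

    A larger w_sensor pulls the result toward the sensed value; a larger
    w_previous keeps it near the previous estimate.
    """
    total = w_previous + w_sensor
    if total <= 0.0:
        raise ValueError("at least one weight must be positive")
    return (w_previous * previous + w_sensor * sensed) / total


# Equal weights: the estimate lands halfway between the two inputs.
balanced = fuse_confidence(previous=0.2, sensed=0.8, w_previous=1.0, w_sensor=1.0)
# Heavy weight on the previous estimate: sensor data are largely ignored.
sticky = fuse_confidence(previous=0.2, sensed=0.8, w_previous=9.0, w_sensor=1.0)
```

When the weight on the previous estimate dominates, rapidly changing sensor readings barely move the confidence, which is one way to picture why the ANUBIS-tuned confidences change more slowly than those produced by the MLP.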

These actions have parallels in nature. Here and have an antagonistic relationship, similar to that outlined in Daw et al. (2002); increases target confidence and decreases obstacle confidence, and does the opposite; however, also selects between short-term goals, such as collision avoidance, and long-term goals, such as moving to the target, similar to the action suggested in Schweighofer et al. (2008). Additionally, the relative levels of and determine whether sensed data or the previous state have a greater influence on the legs’ confidence, in essence selecting between top-down and bottom-up information. The rate of change of these values is determined by and , similar to the proposed actions of ACh and NA in Yu and Dayan (2005); higher levels of and result in a fast increase in near obstacles, suppressing top-down information. An example of these changes in ANm levels is illustrated in Figure 6e. The levels of and mirror each other, with increasing as the robot approaches the target and decreasing. increases over time since the robot is constantly moving, while remains around a value of 0.024, suggesting that few changes to the internal model are required. From Figure 6f, it can be seen how the rates of change of the ANm depend on environment and each other; while the robot is moving between the obstacles (approximately iterations 50–300), the rates of change of and generally increase and decrease with the rate of change of ; however, at a minimum, there is a clear decrease in the rates of change of and while the rate of change of levels off and the rate of change of becomes positive. This suggests that becoming stuck in a local minimum requires changes to the internal model and, hence, a switch to top-down attention. Although the robot is not moving, and thus its environment is not changing, the increased rate of change of affects the levels of and hence and , allowing the robot to escape the minimum. 
The reverse effect can be seen toward the end of the route (approximately iteration 350 onward), where the robot has a clear path to the target. Here the rates of change of both and decrease, while there is a significant increase in the rates of change of and , suggesting that once there are few external stimuli and the internal model does not require updating, the rates of change are more dependent on bottom-up information.

5.2.  Testing Foot Trajectory Generation and Tracking.

The leg should move from the initial point to the final point smoothly. If there is a collision, the leg should lift higher to avoid the obstacle. Initial and final velocities and accelerations should also be zero so that after each step, the robot is in a stable position.

Five consecutive steps were simulated with the same start and end positions so as to make comparison fairer. From Figure 7, it can be seen that when ANUBIS is used, there is a significant difference in the step heights over time. If there is no collision, then increases and decreases, resulting in an overall decrease in step height. The rate of this change is proportional to the level of , which increases quickly if there is a collision and decreases slowly if there is no collision. Thus, a collision results in a large change in and , and no collision results in a smaller change. If the MLP is used, it quickly converges after two steps to a step height of 0.01505 m, extremely close to the desired value of 0.015 m. There is little variation in the step heights over time.

Figure 7:

Foot trajectories and leg dynamics over the course of five steps.

Figure 8 shows the case where there is a collision during step 3. When ANUBIS is used, step 2 is lower than step 1 due to the higher ratio of to , and step 3 starts off even lower. However, when the collision occurs, the level of increases significantly, and the level of decreases significantly due to the large change in . This means that the leg lifts higher so as to try to avoid the obstacle. Step 4 is higher, since the level of is still high, but since there is no collision in step 4, increases and decreases, so step 5 is lower than step 4. This is based on the fact that obstacles are not usually found individually, but rather make up part of an area of difficult terrain (Golombek et al., 2003). This means that if the leg collides with a rock, the robot is likely to be in an area where further collisions are possible. By making the next few steps after the collision higher than usual, it is possible to reduce the number of collisions. If the MLP is used, a similar result occurs, with the foot lifting higher after the collision; however, the convergence back to the desired step height occurs much faster, within the next step. The main difference between ANUBIS and the MLP in this scenario is that the MLP is attempting to minimize the error between the desired step height and the actual step height, while ANUBIS selects a step height based on the ANm levels but has no information about a desired value. This means that the step heights change more gradually.
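
The qualitative pattern described above (a fast rise after a collision followed by a slow decay) can be reproduced with a toy model. This is a sketch under stated assumptions rather than the ANUBIS update equations: a single hypothetical level in [0, 1] rises quickly after a collision and decays slowly otherwise, and the planned step height is an assumed linear function of that level. Unlike the simulation in the text, the sketch adjusts only the next step and does not replan during the step in which the collision occurs. The rates, base height, and gain are illustrative values.

```python
def update_level(level, collided, fast_rate=0.8, slow_rate=0.1):
    """Raise the level quickly after a collision, lower it slowly otherwise."""
    if collided:
        return level + fast_rate * (1.0 - level)
    return level - slow_rate * level


def plan_step_heights(collisions, level=0.5, base_height=0.01, gain=0.04):
    """Planned height of each step, given whether each step ends in a collision."""
    heights = []
    for collided in collisions:
        heights.append(base_height + gain * level)  # height planned before the step
        level = update_level(level, collided)       # outcome affects the next step
    return heights


# Five steps with a collision during step 3, as in the scenario above.
heights = plan_step_heights([False, False, True, False, False])
```

With these values the planned height falls over the first three steps, jumps for step 4, and then relaxes only partially for step 5, so the steps following a collision stay higher than they were before it.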

Figure 8:

Foot trajectories and leg dynamics over the course of five steps. A collision with an obstacle occurs in step 3.

The controller should converge to each desired point along the joint position and velocity trajectories using minimum torque in the desired time, and high-frequency chattering should be minimized. It is also desirable that convergence should be fast so as to reduce reaction time and power requirements. The controller should be able to react to both modeled parametric uncertainties and random unmodeled disturbances, maintaining good levels of accuracy while minimizing chattering and convergence time. Computational load was measured using the Matlab stopwatch function: the lower the computational load of the algorithm, the less time would be required for it to run. Three situations were tested: nominal conditions (no disturbances or uncertainties), parametric uncertainties (Me=0.5M, Ce=0.5C, Ge=0.5G, where M, C, and G are the true inertia, Coriolis, and gravity matrices, and Me, Ce, Ge are defined as previously), and a step disturbance. The trajectory being tracked was between the points [0.1, 0.5, 0.0] and [0.5, 0.1, 0.0]. Its time period was 20 s, with each interval between points 0.5 s. This would represent the foot moving at 1.43 cm/s, quite fast in realistic terms, leading to discontinuities in the control signal, since the difference between consecutive desired values along the trajectory is quite large. This can be mitigated by increasing the time taken to make the movement and increasing the number of time steps the trajectory is divided into, thus reducing the required change in torque between steps.

Table 6 compares the average torques, sliding surface (s), computational loads (i.e., run times tr), and iterations until convergence (tc) for both tuning techniques, while Figures 9 to 11 show plots of torque, joint angles and velocities, and sliding surfaces. Results for the coxa, femur, and tibia joints are labeled J1, J2, and J3, respectively. The average torque requirement is approximately the same for both tuning techniques; however, from Figure 9, it can be seen that the peak torques are greater for the controller tuned with the MLP. Both Table 6 and Figure 9 show that the sliding surface magnitude is lower using the MLP, signifying greater tracking accuracy; however, this comes at the expense of a significant increase in both tr and tc compared to the controller tuned with ANUBIS. This is because with ANUBIS, the rate of convergence to the sliding surface increases with error; therefore, if the error is high, the controller will converge quickly, slowing as it approaches the surface. Another effect of this is that the controller is less likely to overshoot the surface. The result is that the controller does not display any high-frequency oscillations (chattering), only the lower-frequency discontinuities due to the difference between consecutive desired values along the trajectory. Conversely, Figure 9 shows that chattering occurs in the torque signal when the MLP is used. Figure 10 shows the effect of parametric uncertainties; there is no change in the MLP result because it does not use a model to estimate the gain values. When ANUBIS is used, there is a slight increase in the average value of s in joint 3; however, the torque requirement is decreased since the estimated values for the controller parameters are lower. Once again, the computational load and convergence time were lower when the controller was tuned using ANUBIS.
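
The size of the difference in tr and tc can be read directly from Table 6. The short script below computes the MLP-to-ANUBIS ratios from the tabulated run times and iterations until convergence; the dictionary layout and function name are illustrative, while the numbers are those reported in the table.

```python
# (run time tr in seconds, iterations until convergence tc), from Table 6.
TABLE_6 = {
    "nominal": {"mlp": (45.4, 135.5), "anubis": (5.50, 45.2)},
    "parametric uncertainty": {"mlp": (45.4, 135.5), "anubis": (2.98, 34.4)},
    "step disturbance": {"mlp": (42.6, 535.8), "anubis": (5.80, 93.2)},
}


def mlp_to_anubis_ratios(table):
    """Ratio of MLP to ANUBIS run time and convergence count per condition."""
    ratios = {}
    for condition, rows in table.items():
        mlp_tr, mlp_tc = rows["mlp"]
        anubis_tr, anubis_tc = rows["anubis"]
        ratios[condition] = {
            "run_time": mlp_tr / anubis_tr,
            "convergence": mlp_tc / anubis_tc,
        }
    return ratios


ratios = mlp_to_anubis_ratios(TABLE_6)
for condition, values in ratios.items():
    print(f"{condition}: run time x{values['run_time']:.1f}, "
          f"convergence x{values['convergence']:.1f}")
```

Across the three conditions the MLP-tuned controller takes roughly 7 to 15 times longer to run and roughly 3 to 6 times as many iterations to converge.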

Table 6: Comparison Between TSMC with the MLP and ANUBIS.

| Tuning | Joint | Nominal: Torque (Nm) | Nominal: s × 10^−4 | Nominal: tr (s) | Nominal: tc | Parametric Uncertainty: Torque (Nm) | Parametric Uncertainty: s × 10^−4 | Parametric Uncertainty: tr (s) | Parametric Uncertainty: tc | Step Disturbance: Torque (Nm) | Step Disturbance: s × 10^−4 | Step Disturbance: tr (s) | Step Disturbance: tc |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MLP | J1 | 0.0005 | −0.13 | | | 0.0005 | −0.13 | | | 0.0001 | −0.08 | | |
| | J2 | 1.1279 | 0.030 | 45.4 | 135.5 | 1.1279 | 0.030 | 45.4 | 135.5 | 1.1284 | −2.32 | 42.6 | 535.8 |
| | J3 | 0.2888 | −0.04 | | | 0.28883 | −0.04 | | | 0.2893 | −0.09 | | |
| ANUBIS | J1 | 0.0001 | −0.9 | | | 0.0007 | 0.6 | | | −0.0008 | 0.0 | | |
| | J2 | 1.1280 | 1.1 | 5.50 | 45.2 | 0.5638 | 1.0 | 2.98 | 34.4 | 1.1256 | 0.014 | 5.80 | 93.2 |
| | J3 | 0.2885 | −1.1 | | | 0.1442 | −1.5 | | | 0.2888 | −0.001 | | |

Figure 9:

TSMC simulated with no perturbations.

Figure 10:

TSMC simulated with modeling inaccuracies.

When a step disturbance is introduced, the difference between the two tuning techniques becomes slightly more pronounced. Figure 11 shows that when the disturbance occurs, the system tuned with ANUBIS experiences a large divergence between the desired and actual velocities and a small divergence in position; however, after 11 steps, it converges back to the desired velocity value, and after 5 steps, it converges back to the desired position value. This results in a spike in the sliding surface and a corresponding spike in the torque signal. On the other hand, the system tuned with the MLP has very little divergence from the desired velocity values but never truly converges back to tracking the desired position values, resulting in higher values for s in Table 6. Chattering can also be observed when the MLP is used, while it is not present when ANUBIS is used, and although the average torques are similar, once again the peak torque values are higher with the MLP.

Figure 11:

TSMC simulated with a step disturbance.

The two biggest advantages of using ANUBIS are a significant reduction in computational load and run time and the elimination of chattering. Rather than just rejecting disturbances as the MLP does, ANUBIS incorporates them into its model, which results in its converging more reliably. It is also able to minimize peak torques because the values of uncertainties ( and ) change dynamically over time based on the measured errors, thus adjusting the bounds over time and allowing the appropriate values of parameters to be recalculated as necessary. Additionally, the and updates mean that the region to which the controller attempts to converge also changes over time. This reduces control activity when close to the desired value, reducing chattering and torque requirements, and it means that convergence time is significantly reduced.

ANUBIS demonstrates one way in which neuromodulation can be integrated with the Bayesian brain framework and provides one possible mechanism for dealing with saliency when surprise is suppressed. Attention switches between top down (i.e., a greater weighting is given to the current internal model parameters) and bottom up (i.e., a greater weighting is given to sensor data) based on relative levels of and . However, the update rates of these quantities are based on the level of , meaning that novel stimuli will have a greater effect than previously detected stimuli. This prevents a single stimulus from dominating the robot's decision-making process, since as its novelty decreases, its influence on and will also be reduced. Attention can therefore also switch between multiple stimuli, and two similar stimuli can have different effects depending on the current state of the system. The uncertainty of the current model is taken into account by basing the update rate of on the level of ; thus, a high model uncertainty will lead to faster changes in the level of , which leads to more dramatic changes in the levels of and , in essence increasing the robot's distractibility as its attention switches between top down and bottom up and between multiple stimuli, and resulting in more exploratory behavior. If model uncertainty is low, the levels of , and hence and , change more slowly, resulting in more exploitative behavior. Thus the interactions between individual ANm result in complex emergent behaviors that take into account the model uncertainty and the novelty of sensed data.

The update schemes for the ANm are based on recent research into neuromodulation, and their interactions produce behavior similar to nature, such as caution or exploration. They also make the VB algorithm more adaptable without the loss of guaranteed convergence. Lesioning of parts of the artificial neuromodulatory system by disabling their updates also results in behaviors that parallel nature; some examples of this are shown in Figure 12. If is lesioned, the robot keeps farther away from the obstacles, to the extent that it cannot reach the target and instead becomes stuck in the local minimum. If is lesioned, the robot is able to reach the target faster, taking 398 iterations instead of 420; however, the cost of this is that it passes closer to the obstacles and, in the case of obstacle 6, collides with it. If is lesioned, the robot displays similar behavior to when is lesioned; it reaches the target quickly, but there are multiple collisions with obstacles. It can also be seen that the path the robot follows is more meandering, similar to that followed by the controller tuned by the MLP in Figure 6. This suggests that without the capability to distinguish novel stimuli, the robot just reacts to whichever obstacle is closest. Lesioning meant that the robot became stuck in the local minimum at [1,0] for 198 iterations, but it was then able to escape.
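
Lesioning by disabling updates can be expressed very simply in code. The sketch below is a hypothetical illustration of that mechanism only: the class, the modulator names, the clamping of levels to [0, 1], and the numerical values are all assumptions, and the actual ANm update rules are not reproduced.

```python
class Neuromodulators:
    """Toy container for artificial neuromodulator levels (hypothetical)."""

    def __init__(self, initial_levels, lesioned=()):
        self.levels = dict(initial_levels)
        self.lesioned = set(lesioned)  # names whose updates are disabled

    def update(self, name, delta):
        """Apply a change to one level unless that modulator is lesioned."""
        if name in self.lesioned:
            return self.levels[name]  # lesioned: stays at its current value
        # Clamping to [0, 1] is an assumption made for this sketch.
        self.levels[name] = min(1.0, max(0.0, self.levels[name] + delta))
        return self.levels[name]


# "m1" and "m2" are placeholder names, not the symbols used in the paper.
intact = Neuromodulators({"m1": 0.5, "m2": 0.5})
lesion = Neuromodulators({"m1": 0.5, "m2": 0.5}, lesioned=["m1"])
for system in (intact, lesion):
    system.update("m1", 0.25)
    system.update("m2", -0.25)
```

After the same stimuli, the lesioned system keeps "m1" at its initial value while "m2" changes as usual, so any behavior that depends on the balance between the two is shifted.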

Figure 12:

Effects of lesioning parts of the artificial neuromodulatory system.

In nature, lesioning the DA neurons in the nucleus accumbens results in more risk-averse behavior, even if this is disadvantageous in the long term (Sugam, Day, Wightman, & Carelli, 2011). Conversely, reduced levels of 5-HT are associated with greater impulsivity and risk taking (Homberg et al., 2007); Harrison, Everitt, and Robbins (1997) showed that rats with 5-HT lesions collected rewards faster than nonlesioned rats did. The controller tuned with ANUBIS demonstrates corresponding behavior: when was lesioned, it was unable even to get close to the obstacles, and when was lesioned, it was more concerned with getting to the target than with avoiding obstacles. Lesions of the NA system result in impaired accuracy when the task environment changes, while lesions of the ACh system result in a reduction in baseline accuracy (Robbins et al., 1998). Similarly, the robot is less able to respond accurately to stimuli such as obstacles as it moves and its environment changes.

6.  Conclusions

This letter introduced ANUBIS, a new biologically inspired parameter tuning technique, and discussed its application to three different algorithms: gait generation and path planning, foot trajectory generation, and foot trajectory tracking. ANUBIS is based on the variational Bayes approach, combined with metaparameters that are updated in a way inspired by neuromodulation. By including these metaparameters, it is possible to adjust the final results of the tuning algorithm, unlike the usual result of the VB method, which always converges to the distribution expressed by the sensor data. In this way, it is possible to take into account both explicit data (e.g., the actual sensor readings) and implicit data (e.g., whether the robot is improving its situation over time) when tuning the parameters. This gives it an advantage over tuning techniques that rely solely on sensor data, such as the MLP. The result of tuning with ANUBIS is that the three algorithms are made more adaptable and are better able to respond to risks such as obstacles. ANUBIS is also able to adapt to the level of risk in the environment, for example, by lowering step height in low-risk environments so as to reduce power and torque requirements and by increasing step height in higher-risk environments. Although the MLP is able to keep step height to a minimum, it does not take previous obstacles into account. In addition, ANUBIS reduces the time required for convergence and the torque required for trajectory tracking compared to the controller tuned with the MLP. Since the parameter updates are based on sensor data or previous values, weighted according to the current levels of accuracy and uncertainty, sudden changes in angle or velocity, which would require high torques, are avoided; high-frequency oscillations are also eliminated from the control signal.

Work is ongoing to construct a prototype of the EchinoBot rover so that the controller discussed in this letter can be implemented and tested in hardware. While this robot is intended as a prototype rover for planetary exploration, ANUBIS could also have applications in other areas where a trade-off must be made between exploration and exploitation or between distractibility and decisiveness. These include areas as diverse as dynamic networking and treatment selection in clinical trials, as well as computational intelligence and robotics. The integration of neuromodulation with the Bayesian brain could also be developed further by using a more complex and realistic model of the various neuromodulatory systems or by using a hierarchical inference model.

References

Attias, H. (2000). A variational Bayesian framework for graphical models. In S. A. Solla, T. K. Leen, & K. Muller (Eds.), Advances in neural information processing systems, 12 (pp. 209–215). Cambridge, MA: MIT Press.
Avery, M. C., Nitz, D. A., Chiba, A. A., & Krichmar, J. L. (2012). Simulation of cholinergic and noradrenergic modulation of behavior in uncertain environments. Frontiers in Computational Neuroscience, 6.
Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. Unpublished doctoral dissertation, University College London.
Dale, J. (1999). Coordination of chemosensory orientation in the starfish Asterias forbesi. Marine and Freshwater Behaviour and Physiology, 32(1), 57–71.
Daw, N. D., Kakade, S., & Dayan, P. (2002). Opponent interactions between serotonin and dopamine. Neural Networks, 15, 603–616.
Dayan, P., & Huys, Q.J.M. (2009). Serotonin in affective control. Annual Review of Neuroscience, 32, 95–126.
Doya, K. (2000). Metalearning, neuromodulation, and emotion. In G. Hatano, N. Okada, & H. Tanabe (Eds.), Affective minds (pp. 101–104). Amsterdam: Elsevier.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4–6), 495–506.
Ertugrul, M., & Kaynak, O. (2000). Neuro sliding mode control of robotic manipulators. Mechatronics, 10, 239–263.
Fellous, J.-M. (1998). Computational models of neuromodulation. Neural Computation, 10, 771–805.
Friston, K. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13, 293–301.
Friston, K., Daunizeau, J., & Kiebel, S. J. (2009). Reinforcement learning or active inference? PLoS One, 4(7).
Friston, K., & Kiebel, S. (2009). Cortical circuits for perceptual inference. Neural Networks, 22(8), 1093–1104.
Golombek, M. P., Haldemann, A. F. C., Forsberg-Taylor, N. K., DiMaggio, E. N., Schroeder, R. D., Jakosky, B. M., et al. (2003). Rock size–frequency distributions on Mars and implications for Mars exploration rover landing safety and operations. Journal of Geophysical Research, 108(12), ROV27.1–ROV27.23.
Gurney, K., Prescott, T. J., González, F. M. M., Humphries, M. D., & Redgrave, P. (2006). A robot model of the basal ganglia: Behavior and intrinsic processing. Neural Networks, 19, 31–61.
Harrison, A. A., Everitt, B. J., & Robbins, T. W. (1997). Doubly dissociable effects of median- and dorsal-raphé lesions on the performance of the five-choice serial reaction time test of attention in rats. Behavioural Brain Research, 89(1–2), 135–149.
Homberg, J. R., Pattij, T., Janssen, M. C. W., Ronken, E., Boer, S. F. D., Schoffelmeer, A.N.M., et al. (2007). Serotonin transporter deficiency in rats improves inhibitory control but not behavioural flexibility. Neuroscience, 26, 2066–2073.
Ishiguro, A., Fujii, A., & Hotz, P. E. (2003). Neuromodulated control of bipedal locomotion using a polymorphic CPG circuit. Adaptive Behavior, 11(1), 7–17.
Kalueff, A. V., Jensen, C. L., & Murphy, D. L. (2007). Locomotory patterns, spatiotemporal organization of exploration and spatial memory in serotonin transporter knockout mice. Brain Research, 1169, 87–97.
Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. International Journal of Robotics Research, 5(1), 90–99.
Krichmar, J. L. (2008). The neuromodulatory system: A framework for survival and adaptive behavior in a challenging world. Adaptive Behavior, 16(6), 385–399.
Matsumoto, M., & Hikosaka, O. (2009). Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature, 459(7248), 837–841.
McClure, S., Gilzenrat, M., & Cohen, J. (2006). An exploration-exploitation model based on norepinepherine and dopamine activity. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems, 18 (pp. 867–874). Cambridge, MA: MIT Press.
Migita, M. (2006). Starfish behavior as an anticipatory system: Its flexibility in obstacle avoidance. In Proc. 7th International Conference on Computing Anticipatory Systems (pp. 534–540). Liège, Belgium: CHAOS.
Parasol Lab, Texas A&M University. (2010). Motion planning benchmarks: Algorithms and applications group. http://parasol-www.cs.tamu.edu/groups/amatogroup/benchmarks/mp/
Pardo, D., Angulo, C., Moral, S., & Català, A. (2009). Emerging motor behaviors: Learning joint coordination in articulated mobile robots. Neurocomputing, 72, 3624–3630.
Parsa, S. S., Daniali, H. M., & Ghaderi, R. (2010). Optimization of parallel manipulator trajectory for obstacle and singularity avoidances based on neural network. International Journal of Advanced Manufacturing Technology, 811–816.
Passino, K. M. (2004). Biomimicry for optimization, control and automation. New York: Springer.
Robbins, T. W., Granon, S., Muir, J. L., Durantou, F., Harrison, A., & Everitt, B. J. (1998). Neural systems underlying arousal and attention: Implications for drug abuse. Annals of the New York Academy of Sciences, 846(1), 222–237.
Sarter, M., Gehring, W. J., & Kozak, R. (2006). More attention must be paid: The neurobiology of attentional effort. Brain Research Reviews, 51(2), 145–160.
Schultz, W. (1997). Dopamine neurons and their role in reward mechanisms. Current Opinion in Neurobiology, 7(2), 191–197.
Schweighofer, N., Bertin, M., Shishida, K., Okamoto, Y., Tanaka, S. C., Yamawaki, S., et al. (2008). Low-serotonin levels increase delayed reward discounting in humans. Journal of Neuroscience, 28(17), 4528–4532.
Smith, B., Saaj, C., & Allouis, E. (2010). Evolving legged robots using biologically inspired optimization strategies. In Proc. 2010 IEEE International Conference on Robotics and Biomimetics (pp. 1335–1340). Piscataway, NJ: IEEE.
Sporns, O., & Alexander, W. H. (2002). Neuromodulation and plasticity in an autonomous robot. Neural Networks, 15, 761–774.
Sugam, J. A., Day, J. J., Wightman, R. M., & Carelli, R. M. (2011, November). Phasic nucleus accumbens dopamine encodes risk-based decision-making behavior. Biological Psychiatry, 71, 199–205.
Tan, C. O., & Bullock, D. (2008). A dopamine acetylcholine cascade: Simulating learned and lesion-induced behavior of striatal cholinergic interneurons. Journal of Neurophysiology, 100, 2409–2421.
Thompson, S. E., & Patel, R. V. (1987). Formulation of joint trajectories for industrial robots using B-splines. IEEE Transactions on Industrial Electronics, 34(2), 192–199.
Venkataraman, S. T., & Gulati, S. (1993). Control of nonlinear systems using terminal sliding modes. Journal of Dynamic Systems, Measurement, and Control, 115(3), 554–560.
Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. Annals of Statistics, 11(1), 95–103.
Yu, A. J., & Dayan, P. (2002). Acetylcholine in cortical inference. Neural Networks, 15, 719–730.
Yu, A. J., & Dayan, P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46, 681–692.
Yu, C. H., Werfel, J., & Nagpal, R. (2010). Collective decision-making in multi-agent systems by implicit leadership. In Proc. of the 9th International Conference on Autonomous Agents and Multiagent Systems 3 (pp. 1189–1196). N.p.: International Foundation for Autonomous Agents and Multiagent Systems.
Yu, S. H., Yu, X. H., & Stonier, R. (2003). Continuous finite time control for robotic manipulators with terminal sliding modes. In Proc. 6th IEEE International Conference of Information Fusion (pp. 1433–1440). Piscataway, NJ: IEEE.
Yu, X., & Kaynak, O. (2009). Sliding-mode control with soft computing: A survey. IEEE Transactions on Industrial Electronics, 56(9), 3275–3285.