Abstract
Taking inspiration from the navigation ability of humans, this study investigated a method of providing robotic controllers with a basic sense of position. It incorporated robotic simulators into robotic controllers, giving them a mechanism to approximate the effects of their own actions on the robot. Controllers with and without internal simulators were tested and compared. The proposed controller architecture was shown to outperform the regular controller architecture. However, the longer an internal simulator was executed, the more inaccurate it became. Thus, the performance of controllers with internal simulators degraded over time unless their internal simulators were periodically corrected.
1 Introduction
Evolutionary robotics (ER) is a machine learning paradigm that aims to automatically create robot topologies (hardware) and robot controllers (software) through the use of optimization algorithms inspired by biological evolution. ER topologies and controllers are created with minimal human intervention: humans typically only have to specify a fitness function, which measures how well the evolved robots solve the goal task. Since the demonstration of the viability of ER in the late 1990s [15, 22, 28], ER has been shown to be able to produce unexpected and ingenious robot designs and behaviors for complex tasks that human engineers may have found challenging [13, 31, 47]. ER has also been shown to discover solutions to problems that mimic behavior in the biological world [12, 14, 22].
Robot designs are often hampered by the availability of environmental information and the quality of the sensors that provide information about the environment [17]. One of the strengths of ER is that it can be used to produce intelligent (and often surprising) controllers that overcome the physical limitations of the hardware and environment [5, 26]. This article investigates how ER can be employed to create controllers that make use of something similar to the unconscious perception of self-location that humans possess. Additional controller inputs will be generated by a simulator, which will provide the robot with information about its location without actually obtaining real location information from the environment. Just as a blindfolded human is still partially able to navigate a familiar space through their sense of position, the goal of this research is to show that simulated location data enables the development of more effective controllers as well as behaviors that would otherwise not be possible. A notable feature of this approach is that the only changes it requires to the ER process are software-related. It is proposed that controllers can be provided with an approximation of the robot's state without the need for additional tracking hardware or sensors. The approach is tested through a series of experiments in the double T maze that compare the performance of regular and augmented controllers. These experiments demonstrated the superior navigation abilities and behaviors of the augmented controllers.
The rest of this article is structured as follows: Section 2 discusses knowledge relevant to this study, Section 3 introduces the approach explored in it, Section 4 discusses how the study was carried out and the equipment used, Sections 5–9 provide the details of all the experiments performed, and finally Section 10 provides a conclusion on the results of the study.
2 Related Work
In the following subsections, work relevant to this study is discussed. Section 2.1 explains the topic of ER, and Section 2.2 continues this discussion with specific focus on navigation. A brief summary of a concept related to this work, proprioception, is provided in Section 2.3. Finally, Section 2.4 discusses works related to ER and proprioception.
2.1 Evolutionary Robotics
Evolutionary robotics (ER) [28] is a field concerned with developing robotic controllers and architectures through the use of evolutionary techniques. A typical example of ER would be using an evolutionary algorithm (EA) [49] to develop a robotic controller to perform a task. This would involve evaluating a population of randomly created robotic controllers, selecting a subset of the controllers based on their performance, and using these to create a new population. This process continues until a satisfactory controller is created or until another stopping criterion is met.
Due to the nature of the ER process, its duration is highly dependent on the duration of evaluations. If evaluations require supervision and/or human intervention, this will further lengthen the duration. Additionally, while controllers are being evaluated, it is possible that they will attempt actions that are damaging to the robot [36]. For these and other reasons, it can be beneficial to perform evaluations in simulation. That is, as opposed to issuing commands to an actual robot and recording the result, a simulator can be used to predict this result. This allows evaluations to be conducted faster. Furthermore, issues surrounding the robot—for example, mechanical wear or power sources—can be ignored. However, simulation presents new issues. To execute a robot in simulation, one will require a simulator. Furthermore, no simulator will predict the outcome of a robot's actions with complete accuracy. This can mislead the ER process and lead to the situation where controllers perform well in simulation but fail in the real world [29]. This problem is referred to as the reality gap.
Research has been conducted [36, 37, 38, 48] that has attempted to simplify and improve the process of creating simulators. As opposed to using a mathematics- and physics-based approach, these studies used neural networks (NNs) to create simulators. Physical robots were given various commands, and the outcomes of these commands were recorded. This data was used to train NNs, which then acted as simulators. These NNs were termed simulator neural networks (SNNs). Such simulators have been used and demonstrated to be effective [39, 40, 47]. This process of simulator creation requires less knowledge of a robot's dynamics than a mathematics-based approach. Furthermore, these SNNs are considered lightweight NNs. That is, they do not consist of many neurons and can be executed quickly with basic computer hardware.
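As a purely illustrative sketch of the kind of data such an approach relies on (the exact procedures and data formats in [36, 37, 38, 48] may differ), an SNN can be thought of as a regression model trained on pairs of recorded motor commands and the pose changes they produced; all names and values below are assumptions:

```python
import numpy as np

# Hypothetical logging of SNN training data: each sample pairs a motor
# command with the pose change it produced on the physical robot, as
# measured by some external tracking method during data collection.
records = []

def log_sample(command, pose_change):
    """Store one (command, observed pose change) training pair."""
    records.append((np.asarray(command, dtype=float),
                    np.asarray(pose_change, dtype=float)))

# Example: after driving both wheels forward for one time step, the robot
# moved forward slightly and turned a little (illustrative numbers only).
log_sample(command=[0.4, 0.4], pose_change=[31.0, 0.5, 0.01])

# The collected pairs then form a supervised data set for a small
# feed-forward network that acts as the simulator:
X = np.stack([c for c, _ in records])   # inputs: motor commands
Y = np.stack([d for _, d in records])   # targets: resulting pose changes
```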
2.2 Navigation in Evolutionary Robotics
Navigational tasks are typical of evolutionary robotics [11, 21, 27], as the ability to navigate within an environment, based on cues, is a common requirement of many robots. A common navigation benchmark involves an environment known as the double T maze (DTM). The typical method of using this maze as a benchmark is to place the robot at the starting point and then instruct it to navigate to a specific corner. The DTM, and variants of it, have been used in a similar manner as benchmarks in behavioral psychology [7, 43] and ER [1, 4, 6, 19, 34] in the past. In ER, one of the complexities of this task is the requirement that a single controller must be able to navigate from the start to any of the four corners on command. Additionally, the robot's actions are nondeterministic, and the sensors it uses to detect walls are noisy. This task and variants of it have been solved previously in ER with varying levels of success.
Previous studies have shown how controllers can solve the task by exploiting basic behaviors. Carvalho and Nolfi [4] showed how ER controllers may exploit their environment to reduce the cognitive requirements of the task. They demonstrated that, when possible, controllers tended to place themselves in positions and orientations that informed their later decisions, as opposed to remembering the necessary information. This behavior was referred to as cognitive offloading. When the environment in which the task was conducted was adapted so that these controllers could not exploit the environment as they did previously, their performance was significantly reduced. Introducing noise into the robot's position and angle also prevented the development of successful solutions. However, when the added noise was only present in a portion of trials, it was shown that controller performance improved. These controllers still developed cognitive offloading strategies, but also displayed a cognitive ability that allowed them to solve the task even in the presence of noise.
Duarte et al. [6] solved a variant of the DTM task through hierarchical evolution, a process in which a task is broken down into subtasks and a controller is developed for each. While there are advantages to this approach, it strays from ER's ideal of automatic controller generation, requires human intervention, and does not allow the development of completely emergent behaviors. Blynel and Floreano [1] used incremental evolution to solve a variant of the DTM task by seeding a population with controllers that had solved a similar but simpler task. This process was used because direct evolution failed to produce successful controllers. Controllers were not told which corner they were intended to reach. Instead, if a controller navigated to a corner, the evaluation was stopped and the controller was informed whether the corner it reached was the correct one. The controller was then executed repeatedly and was expected to recall, from its first evaluation, which corner was correct. The one exception was that only the top left or top right corner could be designated as correct. When the correct corner was switched during controller trials, successful controllers were not produced.
Paine and Tani [33] used two fully connected continuous-time recurrent neural networks as robotic controllers. The bottom network was intended to encode primitive motor commands, such as turning left or right or moving in a straight line. The top network was responsible for controlling the activities of the bottom network through a set of control neurons. The top network's behavior was controlled by two task neurons. The internal values of these neurons were initialized with different values to have the network generate, via the lower-level network, different patterns of primitive motor commands. The bottom network was first trained in a single T maze. Then, using the trained bottom network, the top network, as well as the initial task neuron values, was trained in a maze similar to the DTM. This maze could be described as a DTM where each of the goal destinations was split into two corridors for a total of eight goal destinations. Controllers that could reliably reach seven of the eight goals were produced.
Using a standard ER process to train consistently successful controllers to perform the typical DTM task still remains a challenging problem. Furthermore, while Duarte et al. [6] have solved a task that involves a controller returning to the beginning of the DTM, no previous study has investigated a less idealized and more realistic version of this task. This study investigates such a scenario, the typical DTM task and a more complex variant of it.
2.3 Proprioception and Path Integration
Proprioception is the sense of body position and movement [45]. Proprioceptive sensations are the sensations of stimuli that arise from within an organism [24]. This information allows organisms to, among other things, better coordinate their physical actions. Proprioception is also part of the reason why one knows where the parts of one's body are, relative to one another. For example, humans know where their feet are, even while not directly observing them. This, in part, is what allows humans to climb stairs without focusing on the task.
While not the focus of this subsection, it is worth defining the term exteroceptive to differentiate it from proprioceptive. While proprioceptive senses relate to stimuli internal to an organism, exteroceptive senses relate to those that are external [23]. Examples of exteroceptive senses would be the traditional five: sight, hearing, touch, smell, and taste. Proprioception does rely on exteroceptive senses, such as ocular and tactile feedback. However, as its name suggests, proprioception also relies on proprioceptive senses. Populations of sensory neurons embedded in muscles, tendons, and joints provide information about the body's movement and position [45]. These proprioceptors, as they are called, appear in mammals as well as insects. They encode, among other information, muscle load, muscle length, and rate of change of muscle length [45]. The benefits this information provides are largely subconscious and easily taken for granted.
Proprioception helps humans to produce a representation of the physical state of their bodies. While proprioceptors do provide information on the body's physical state, this information is insufficient to determine the absolute location of the parts of the body [41]. Body schemas and body images are internal representations of the body that humans develop and refine over time, using proprioceptive and other sensory information. These are what provide humans with a sense of limb position. How much of these representations is available to consciousness, and how they are constructed, is still not entirely clear [3, 41]. However, these representations assist humans in the unconscious movement and control of their bodies [41].
Path integration (PI) refers to the use of self-generated cues to maintain a representation of position and orientation in relation to a starting location [2]. One of these cues is the proprioceptive feedback received from locomotion. Typically, it is assumed that navigation involves the use of landmarks to calculate position and orientation relative to a goal. However, humans (and other animals) are able to begin a journey at a starting point and return to that same point without the use of external references [9]. This allows humans to continue navigation even in the absence of external cues. However, the use of external cues improves navigation, in that without these cues, the use of PI alone results in an accumulation of errors in both distance and direction [10].
2.4 Evolutionary Robotics and Proprioception
Previous research in ER includes investigations where proprioceptive information has been used. A selection of these works is discussed in Sections 2.4.1–2.4.4, and their relation to this study is discussed in Section 2.4.5.
2.4.1 Object Categorization
Nolfi and Marocco tasked a robotic arm to categorize objects based on coarse touch sensor data [30]. The arm consisted of three segments, which together provided six degrees of freedom. A touch sensor was attached to the end of the arm. By moving the arm, controllers were required to categorize an object placed beneath the arm.
NNs were used as robotic controllers in this experiment, and the touch sensors formed part of the inputs of the NNs. However, due to how coarse the sensory information was, it alone would likely not be enough to categorize objects. Thus, controllers also received proprioceptive information regarding the physical pose of the robotic arm. This information was generated by sensors on the arm itself.
The NNs produced successive motor commands as well as an output that represented the category of the object being classified. An NN's commands were executed until a stopping condition was met, at which point the object category provided by the NN was considered its final decision. The evolved solutions used the proprioceptive information, in conjunction with the coarse sensory information, to correctly classify objects.
2.4.2 Reach-and-Grasp Development
Savastano and Nolfi performed an experiment in which it was attempted to develop reaching and grasping behaviors, similar to those of an infant, in a robot [42]. An iCub humanoid robot [25] was used in the experiment. It consisted of 53 motors, which could provide proprioceptive information regarding their physical state, as well as several external sensors.
The robot was placed in an upright position with a ball suspended in front of its head. The ball's position was varied, but was required to be within a certain angular range of the robot's head. The ultimate task assigned to the robot was for it to extend its right arm and grasp the ball. However, the training process was divided into phases. The task associated with each phase was progressively more difficult than the one before it. The task of the final phase was the ultimate task. These phases were intended to more closely resemble how an infant would go about learning this reaching and grasping behavior. This process was shown to produce better results than when using a single training phase.
The robotic controllers used were NNs in which both proprioceptive and exteroceptive information made up the inputs. The proprioceptive information consisted of the angular position of the motors within the right arm, torso, and head. The exteroceptive information consisted of inputs from the camera and the tactile sensors of the right hand of the robot.
The results of the experiment showed that it was possible to train this complex behavior in a robot and that this behavior closely resembled that of an infant. The experiment was performed in simulation; however, planned extensions to the work included testing trained controllers on physical hardware.
2.4.3 Bodily Awareness in Soft Robots
The complex nature of the body of soft robots makes the task of describing the state of their body more difficult than in the case of hard robots [44]. Research has been attempted by Soter et al. to provide a soft robot, in particular a fabricated octopus-inspired arm, with a form of bodily awareness [44].
The arm could be described as a sandwich of flexible silicone with four bend sensors in the middle. The arm was connected, at its base, to a carriage that could move along a single axis. The motion of the carriage was used to excite the arm, and it moved according to a function that ensured that there was no repetition of motion during trials. Finally, the arm was suspended in clear water in a transparent container, which faced a camera.
The camera was used to capture the physical state of the arm at a specific point in time, and corresponding camera and sensor information was captured for several trials. Attempts to reproduce images directly from the sensor values were unsuccessful, owing to the large number of pixels in the images compared to the small number of bend sensor values. Thus, a convolutional neural network (CNN) encoder-and-decoder pair was trained: the encoder reduced the dimensionality of the images, and the decoder reproduced the original image from the low-dimensionality representation. A recurrent neural network (RNN) was then trained to map the bend sensor values to the low-dimensionality representation, and the output of the RNN was fed into the decoder to produce an image. This approach proved successful.
The accuracy of the results of the experiment suggested that this approach was effective for providing a soft robot with a form of body awareness. Furthermore, this suggested that soft robots, despite their complexity, do have a low-dimensionality representation. However, this method of encoding a robot's state was not used to perform any task or enhance any existing robotic controller.
2.4.4 The Double T Maze and Simulated Location Data
Preliminary research conducted by the authors is closely related to the current study [35]. The study attempted to provide a wheeled robot with an approximation of its own position and determine the benefit this provided. This information was provided to the robot by a SNN that produced simulated location data during the robot's evaluations.
The benchmark chosen to use for comparison was the double T maze. At the beginning of a trial, the robot was instructed to navigate to one of the four corners. If the robot took too long to complete its task, or navigated to the wrong corner, the trial was terminated. Robotic controllers that did and did not incorporate the additional information were evolved. These controllers were evaluated in both simulation and the real world. The augmented controllers outperformed their counterparts in both simulation and the real world. However, the performance difference observed between controller types in the real world was larger than the one observed in simulation. This suggested that the simulated location information may have been useful in reducing the effect of the reality gap.
The study did, however, use a basic NN architecture for its controllers and only performed a single experiment. It is possible that these factors could have biased performance in favor of the augmented controllers.
2.4.5 Limitations of Existing Work
In the experiments described in Sections 2.4.1–2.4.3, proprioceptive information was provided directly from sensors on the robot. This study instead proposes that information of this kind could be generated by a robotic simulator, this approach thus being more akin to path integration in humans. Furthermore, none of these studies attempted to evaluate the usefulness of proprioceptive information in robotic controllers.
In the study described in Section 2.4.4, a SNN was incorporated into robotic controllers and used to approximate the robot's physical state. The information provided by the simulator was completely based on the robot's previous actions. No additional sensors were necessary to produce this information. This approach is what the current study will further investigate.
3 Proposed Approach
During the ER process, evolution is typically performed in simulation. These simulators take in a robot's state together with a command and produce the state that will result after the execution of that command. The premise of this research is that the simulator can also be used in the real world to approximate a non-simulated robot's state, thereby providing additional useful input to a robotic controller. For example, consider a robot that is placed in a known state and asked to execute a command. While this command is being executed, a simulator can approximate the resulting state of the robot. Unless the robot had sensors in place to sense its location, it would not otherwise have this information. This approximation can be provided as input into controllers and can allow more informed decisions to be made regarding the robot's subsequent commands. Furthermore, this information can also facilitate the evolution of more advanced controllers. To avoid any confusion with existing terms, the term location perception (LP) is used in this article to refer to this process of a robot approximating its own state by means of a robotic simulator. Furthermore, controllers that do not use this approach are called non-location-perception (NLP) controllers.
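As a minimal sketch of how this could work in practice (the class and function names are illustrative assumptions, not the authors' implementation), an LP estimate can be maintained by repeatedly feeding the current estimate and the issued command through the simulator:

```python
import numpy as np

class LocationPerception:
    """Maintains a simulated estimate of the robot's state (x, y, theta).

    The estimate starts from a known state and is updated purely from the
    commands the controller issues; no external position sensing is used.
    """

    def __init__(self, simulate_step, start_state):
        # simulate_step is the simulator, e.g. an SNN: (state, command) -> next state.
        self.simulate_step = simulate_step
        self.state = np.asarray(start_state, dtype=float)

    def update(self, command):
        """Advance the internal estimate by one motor command."""
        self.state = self.simulate_step(self.state, command)
        return self.state

    def reset(self, known_state):
        """Correct accumulated drift whenever the true state happens to be known."""
        self.state = np.asarray(known_state, dtype=float)

# Illustrative control loop:
#   lp = LocationPerception(snn, start_state=(x0, y0, theta0))
#   while not done:
#       inputs = np.concatenate([sensor_values, encode_state(lp.state)])
#       command = controller(inputs)
#       send_to_robot(command)
#       lp.update(command)
```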
The suggested approach will require that robots be equipped with processors that will execute the simulator in order to calculate the approximate state. Furthermore, these processors will need to be powerful enough to execute the simulator fast enough such that a state can be approximated in less time than it takes to execute the same command in the real world. The goal is to provide the robot with LP information without forcing it to operate at a pace slower than usual. This suggests that in this case a SNN would be a good choice of simulator, as it has been shown in previous research that SNNs tend to be less computationally expensive than more typical mathematically based simulators [39]. If more typical types of simulators were used, it might not be possible to execute the simulator in real time on robotic hardware.
A potential shortfall of this approach concerns the accuracy of the LP information. A robot will execute a command, and the SNN will approximate the resulting state. However, the robot will not have the required sensors to confirm whether the approximated state is entirely correct or not. Despite this, the approximated state will need to be used in the calculation of the following approximation. This could result in compounding errors as more and more commands are executed by the robot. In reality, robotic commands are typically nondeterministic, and their results cannot be predicted exactly. Thus, even the most accurate simulator will suffer from this problem of compounding errors. However, the more accurate the simulator, the less severe this problem will be. This again suggests that SNNs would be a good choice of simulator, as they have been shown to be highly accurate [39]. Despite this accuracy, it is expected that over time the LP information will diverge from the real-world position to a point where the information will no longer be useful. An idea, investigated in this study, to address this problem involves periodically resetting the LP information when possible.
A motivation for the approach suggested in this study is that its implementation requires very little additional effort. The task that the robot needs to accomplish needn't be adapted, nor do any additional sensors need to be installed or used. The SNN will not need to be specifically sourced or created, as it would have already been required in order to carry out the ER process.
This article presents a comparison between robotic controllers that receive LP information and those that do not. The additional information provided to LP controllers will be simulated physical location information, namely, the robot's position and angle within an environment. It is hypothesized that the controllers that receive the LP information will develop more advanced, and ultimately superior, behaviors in comparison to their counterparts. These behaviors could include, but are not limited to, solving a task with a more efficient strategy, navigating more effectively, or solving a task that would be impossible without LP information.
Simultaneous localization and mapping (SLAM) involves placing a robot in an unknown environment, and having it build a map of this environment and simultaneously determine its position within it [8]. While the ability of a robot to accomplish this is no doubt useful in certain scenarios, such scenarios are not the focus of this research. This research does suggest a method of providing what could be considered a form of localization. However, it is not concerned with the mapping of an environment. Instead, its contribution lies in the benefit it can offer navigation solutions as well as the simplicity of its implementation.
4 Methodology
The goals of this research involve determining the usefulness of the simulated location information as well as determining if this information remains useful as task complexity is increased. Furthermore, it was mentioned in Section 3 that, due to the approach taken in this study, the LP information is expected to drift over time. The effect of this drift, and the potential methods to rectify it, will also be investigated.
This study will take an iterative approach to experimentation. The results of each experiment will influence the development of the following experiment. A total of five investigations were conducted, and their details are presented in Sections 5–9. Section 4.1 discusses the equipment used. Sections 4.2 and 4.3 specify details of the implementations of the EA and NNs used in this study. Section 4.4 states what data were collected during experimentation.
4.1 Experimental Setup
The Khepera III (Figure 1), a differentially wheeled robot, was the robot used to carry out the experiments [20]. The robot is equipped with nine infrared sensors that surround its body and measure the distance between the robot and its surroundings. The values these sensors produced were approximately in the range [0, 4000]. These sensor values were normalized to a range of [0, 1] before being used by controllers.
All experiments were conducted within a double T maze (DTM) (see Figure 2). To improve the consistency of the infrared sensors, the inner walls of the maze were lined with reflective tape. Furthermore, the robot was operated on a non-slip surface. These conditions were also simulated, to allow the ER process to be conducted in simulation. The simulator used had been developed and successfully used in previous research [38]. Noise was added to the outputs of the simulator to encourage controllers to learn more robust behaviors in anticipation of real world trials. The amount of noise applied was tweaked until satisfactory transfer of controllers from simulation to the real world was observed.
To collect the necessary data from experiments, the robot's actions during evaluations needed to be recorded. In simulation, the robot's exact state was known at all times. In the real world, this data was captured by a camera that overlooked the DTM. Image processing software was used to detect markers that were placed on top of the robot to determine its position [32]. However, this information was used only to record robot evaluations; actual location information was not provided to controllers.
4.2 Evolutionary Process
A standard EA was used to drive the evolutionary process, with a population size of 100. The chosen genetic operators were tournament selection (tournament size 10), uniform crossover, and Gaussian mutation (standard deviation 0.2, mutation probability 20%). Elitism was incorporated, preserving the best two chromosomes in each population. Chromosomes represented robotic controllers, which were long short-term memory (LSTM) networks. The number of generations used in the EA varied per experiment, and these values are provided with each experiment's details.
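A compact sketch of one generation of such an EA, using the parameters listed above (population 100, tournament size 10, uniform crossover, Gaussian mutation with standard deviation 0.2 applied to roughly 20% of genes, and elitism of two). The real-valued chromosome encoding and the assumption that higher fitness is better are illustrative choices, not necessarily the authors' exact setup:

```python
import numpy as np

rng = np.random.default_rng()

POP_SIZE, TOURN_SIZE, ELITES = 100, 10, 2
MUT_SIGMA, MUT_PROB = 0.2, 0.2

def tournament(pop, fitnesses):
    """Return the best of TOURN_SIZE randomly chosen individuals."""
    idx = rng.choice(len(pop), TOURN_SIZE, replace=False)
    return pop[max(idx, key=lambda i: fitnesses[i])]

def uniform_crossover(a, b):
    """Take each gene from either parent with equal probability."""
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def mutate(genes):
    """Add Gaussian noise to roughly 20% of the genes."""
    mask = rng.random(genes.shape) < MUT_PROB
    return genes + mask * rng.normal(0.0, MUT_SIGMA, genes.shape)

def next_generation(pop, fitnesses):
    """Elitism plus tournament selection, crossover, and mutation.

    pop is a list of real-valued chromosomes (numpy arrays) encoding the
    LSTM weights; fitnesses holds one score per chromosome (higher = better).
    """
    order = np.argsort(fitnesses)[::-1]
    new_pop = [pop[i].copy() for i in order[:ELITES]]   # keep the best two
    while len(new_pop) < POP_SIZE:
        child = uniform_crossover(tournament(pop, fitnesses),
                                  tournament(pop, fitnesses))
        new_pop.append(mutate(child))
    return new_pop
```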
4.3 Neural Network Architectures
All the controllers used in this study were neural-network-based controllers, specifically LSTM networks. LSTM networks were originally introduced by Hochreiter and Schmidhuber [18]. These networks were specifically designed to store information over many iterations. The LSTM architecture used in this study is similar to the one described by Gers and Schmidhuber [16].
Two controller architectures were compared, one that received LP information and one that did not (see Figure 3). Both architectures were LSTM networks with a single hidden layer. While not pictured in Figure 3, the hidden units were recurrent, as is typical in LSTM networks. Additionally, in both architectures, the motor commands the network produced were fed back as inputs the next time the network was fired. The only difference between the two architectures was that one received four additional inputs representing the LP information: the robot's x position, y position, and angle, with the angle divided into two inputs, namely its sine and cosine. This encoding was used to help the NNs interpret the angle. The size of the hidden layer was varied for each experiment to account for task complexity. In certain experiments, additional inputs were given to the networks; these are described with the corresponding experiment's details (Sections 5–9). All NN weights were initialized using a uniform distribution with a standard deviation of 0.28 and a mean of 0.
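To make the difference concrete, a small sketch of how the two input vectors could be assembled (all names are assumptions): the LP architecture simply appends x, y, sin(theta), and cos(theta) to the inputs the NLP architecture already receives.

```python
import numpy as np

def nlp_inputs(ir_sensors, prev_motors, extra_task_inputs):
    """Inputs shared by both architectures: normalized infrared readings,
    the previous motor commands, and any experiment-specific task inputs."""
    return np.concatenate([ir_sensors, prev_motors, extra_task_inputs])

def lp_inputs(ir_sensors, prev_motors, extra_task_inputs, pose):
    """LP architecture: the same inputs plus the simulated pose.

    The angle is split into sine and cosine so the network does not have
    to cope with the wrap-around at 0 / 2*pi.
    """
    x, y, theta = pose
    lp_part = np.array([x, y, np.sin(theta), np.cos(theta)])
    return np.concatenate([nlp_inputs(ir_sensors, prev_motors, extra_task_inputs),
                           lp_part])
```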
4.4 Analysis
Controllers were evaluated in both simulation and the real world. The number of successful evaluations and the duration of the evaluations were recorded. This data allowed a quantitative comparison between the performance of the LP and NLP controllers. Furthermore, the paths taken by the robot in the real world were recorded. This allowed controller behaviors to be examined and compared qualitatively.
5 Investigation 1
This investigation offers an initial comparison between LP and NLP controllers. In the experiment of this section, robotic controllers were required to navigate the DTM in the typical manner. More specifically, controllers started at the beginning of the maze and were instructed, for the duration of their trials, to navigate toward a specific corner.
5.1 Task
The robot was placed at the starting point of the maze (see Figure 2) and was instructed to reach one of the four corners. Controllers were informed which corner they were expected to navigate toward, via additional inputs into their NN. Controllers received four additional inputs, each corresponding to a corner. To instruct a controller to navigate to a chosen corner, the corner's corresponding input was set to a value of 1, while the rest were set to 0. These values did not change for the duration of the controller's evaluation. Once any of the four corners was reached, the evaluation was stopped, and a fitness score was calculated. Each corner had a 12-cm-radius zone associated with it, and if the robot entered this zone, the corner was considered reached.
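A brief sketch of how the corner instruction and the goal test described above could be implemented (the corner coordinates are left as parameters, and the names are assumptions):

```python
import numpy as np

CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")
GOAL_RADIUS_CM = 12.0

def corner_instruction(target):
    """One-hot task input, held constant for the whole evaluation."""
    inputs = np.zeros(len(CORNERS))
    inputs[CORNERS.index(target)] = 1.0
    return inputs

def corner_reached(robot_xy, corner_xy):
    """True once the robot enters the 12-cm zone around a corner."""
    return np.linalg.norm(np.asarray(robot_xy) - np.asarray(corner_xy)) <= GOAL_RADIUS_CM
```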
5.2 Fitness Function
The robotic controllers were trained, in simulation, with an EA. The walls of the maze were simulated from a sensory point of view, but not physically: the robot's infrared sensors were simulated as if the walls of the maze were present, but the motion of the robot did not take the walls into account. This significantly simplified the simulation, although it meant that, in simulation, the robot could move through walls. The simplification was reasonable in that a controller that solved the task efficiently would have to avoid the walls anyway; thus, collisions did not need to be simulated.
Checkpoints were set at the intersections in the maze. A controller would need to pass through these checkpoints before reaching the instructed destination for the execution to be considered a success. In Figure 2 there are three intersections, namely the left, middle, and right intersections.
Thus, in order to optimize this fitness function, controllers would need to navigate to the intended goal, stay within the bounds of the maze, minimize their number of motor commands, and keep their distance from the walls.
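The exact fitness function is not reproduced here; purely to illustrate the kind of scoring described above, the following sketch rewards checkpoint progress and goal completion while penalizing long command sequences, leaving the maze, and driving close to the walls. Every term and weight is an assumption, not the authors' formula:

```python
def illustrative_fitness(checkpoints_passed, total_checkpoints, reached_goal,
                         left_maze, motor_commands, mean_wall_distance,
                         w_progress=100.0, w_goal=500.0, w_steps=1.0, w_wall=10.0):
    """Hypothetical checkpoint-based score (higher is better)."""
    score = w_progress * (checkpoints_passed / total_checkpoints)
    if reached_goal:
        score += w_goal                      # reward reaching the instructed corner
    if left_maze:
        score -= w_goal                      # penalize leaving the bounds of the maze
    score -= w_steps * motor_commands        # prefer shorter command sequences
    score += w_wall * mean_wall_distance     # prefer staying clear of the walls
    return score
```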
5.3 Evaluation Criteria
A total of five controllers of each standard architecture (see Section 4.3) were evolved. The hidden layer of these controllers consisted of 20 LSTM cells. These controllers were evaluated in both simulation and the real world. Controller results were analyzed in terms of their number of successful trials. Furthermore, on successful trials, the number of motor commands necessary to complete the task was also analyzed. An evaluation was considered successful if a controller navigated to the indicated corner within a threshold number of motor commands. The motor command limit for this experiment was 150, and controllers were evolved for 3,000 generations.
5.4 Results
In simulation, each of the five controllers was evaluated 250 times per corner, for a total of 1,000 evaluations. In the real world each was evaluated twice per corner, for a total of eight evaluations. The results of these evaluations are given in Table 1. Figures 4–8 present five composite images of controller evaluations in the real world. The first four figures display successful evaluations of LP controllers. Figure 8 displays an unsuccessful evaluation of a NLP controller.
Table 1. Investigation 1: successful evaluations and average number of motor commands. Columns 1–5 correspond to the five evolved controllers of each type.

| Measure | Environment | Type | 1 | 2 | 3 | 4 | 5 | Total | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Successful evaluations | Simulated | NLP | 917 | 837 | 960 | 692 | 725 | 4131/5000 | 83% |
| | | LP | 949 | 888 | 918 | 935 | 919 | 4609/5000 | 92% |
| | Real world | NLP | 5 | 8 | 6 | 5 | 6 | 30/40 | 75% |
| | | LP | 8 | 8 | 8 | 8 | 8 | 40/40 | 100% |
| Average number of motor commands | Simulated | NLP | 77 | 77 | 76 | 78 | 76 | | 77 |
| | | LP | 74 | 75 | 76 | 77 | 75 | | 76 |
| | Real world | NLP | 72 | 76 | 77 | 81 | 74 | | 76 |
| | | LP | 75 | 73 | 76 | 77 | 78 | | 75 |
From Table 1, overall, the LP controllers were more successful than their counterparts in both simulation and the real world. The performance difference in the real world was larger than the one in simulation. The LP controllers achieved a 100% success rate in the real world, while the NLP controllers only achieved 75%. This strongly supports the hypothesis that LP information enables the development of superior controllers. Furthermore, the success rates between NLP controllers fluctuated more than those of the LP controllers. For example, the difference in successful simulated evaluations between the best and worst NLP controllers was 268, or 26.8% of a controller's evaluations. The same statistic for LP controllers was 61, or 6.1%. This suggests that LP information allows for more consistent production of controllers.
The lower half of Table 1 shows the average number of motor commands performed by the controllers during an evaluation, if only successful evaluations are considered. The results within this portion of the table do not appear to show any significant difference between controller types.
6 Investigation 2
The previous investigation demonstrated that the proposed approach of providing simulated location information to a real-world robot does result in a more effective and robust controller. However, the argument can be made that the controller itself should be able to learn to produce the information provided by the simulator (especially since the simulator in this case is also a NN), thus making the suggested technique unnecessary. This section investigates the possibility that NLP controllers could learn their own form of location perception. The NLP architecture was extended to contain additional recurrent connections and also explicitly encouraged to simulate its own location information.
6.1 Extended Architectures
In order to allow NLP controllers to learn their own form of location perception, a new NN architecture was required (Figure 9). This architecture extended the one in Figure 3(a) by adding recurrent connections between the output and input layers. These connections were initialized with the robot's initial position. Furthermore, each of these outputs was accumulated before being passed to the input layer (with the exception of controller type E2, introduced below). That is, the value of one of these input neurons, on the controller's nth execution, would be equal to the initial value plus the sum of the previous n − 1 outputs of its corresponding output neuron. Thus, controllers were required to simulate the change in LP parameters rather than the actual values.
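A minimal sketch of the accumulation described above (names assumed): on the controller's nth execution, each simulated-location input equals the robot's initial value plus the sum of the corresponding outputs produced on the first n − 1 executions.

```python
import numpy as np

class AccumulatedRecurrence:
    """Feeds the running sum of a network's simulated-location outputs
    back in as its simulated-location inputs on the next execution."""

    def __init__(self, initial_pose_inputs):
        # Initialized with the robot's known starting pose.
        self.accumulated = np.asarray(initial_pose_inputs, dtype=float)

    def next_inputs(self):
        """Values presented to the input layer on the current execution."""
        return self.accumulated.copy()

    def add_outputs(self, simulated_deltas):
        """Accumulate the network's predicted change in the LP parameters."""
        self.accumulated += np.asarray(simulated_deltas, dtype=float)
```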
Four types of controllers used this architecture; they are referred to as E1, E2, E3, and E4. As was the case in investigation 1, the hidden layer of all controllers consisted of 20 LSTM cells.
The first experiment served to prove that the controller architecture was in fact capable of simulating the robot's movements. Controller type E1 was created for this purpose. The outputs of this controller type were treated in the same way as those of the SNN used in this study. That is, this network had three additional outputs, corresponding to the x position, y position, and angle of the robot. However, the sine and cosine functions were applied to this angle before it was passed back as an input to the controller. Thus, this network still had four additional inputs. Since this experiment was only concerned with whether the architecture could simulate robotic movements, the controller did not have any motor outputs. Instead, an LP controller from investigation 1 was executed repeatedly in the DTM. All the inputs that this controller experienced were passed to the E1 controller. The E1 controller was then required to output the changes in x position, y position, and angle that the robot would experience.
Controller type E2 was required to produce location information and motor commands and was the most basic and naive implementation tested. This controller did not have its outputs accumulated on its behalf, nor was its fitness function adjusted in any way. This network simply had four recurrent inputs and outputs, which were initialized with the robot's initial position. Thus, the network was not forced to simulate accurate location information in order to optimize the fitness function. This experiment served to determine if controllers would naturally develop their own form of location simulation. The same fitness function used in investigation 1 was used for E2.
6.2 Results
All controllers were tested in simulation. In the upper half of Table 2, the success rates of 1,000 simulated evaluations are shown for five controllers of each type. The results of the NLP controllers of investigation 1 have been included in the table for comparison. As E1 controllers acted only as simulators, no success rates appear for this controller type. The lower half of Table 2 presents the average simulation fitness, Q. A lower number represents better accuracy. Since NLP and E2 controllers did not incorporate simulation fitness, no results appear for these controller types.
Table 2. Investigation 2: successful simulated evaluations and average simulation fitness. Columns 1–5 correspond to the five evolved controllers of each type; a lower fitness indicates better simulation accuracy.

| Measure | Environment | Type | 1 | 2 | 3 | 4 | 5 | Total | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Successful evaluations | Simulated | NLP | 917 | 837 | 960 | 692 | 725 | 4131/5000 | 83% |
| | | E2 | 610 | 649 | 746 | 712 | 752 | 3469/5000 | 69% |
| | | E3 | 738 | 881 | 708 | 645 | 0 | 2972/5000 | 59% |
| | | E4 | 825 | 837 | 600 | 0 | 927 | 3189/5000 | 64% |
| Average fitness | Simulated | E1 | 337 | 324 | 302 | 264 | 357 | | 316 |
| | | E3 | 396 | 373 | 416 | 401 | 137 | | 345 |
| | | E4 | 362 | 402 | 399 | 198 | 344 | | 341 |
Figures 10 and 11 present two paths of E1 controllers. The black line represents the actual path of the controller, the red line represents the SNN output, and the blue line represents the path simulated by the controller. These figures indicate that the controller architecture was capable of simulating the robot's movements to a degree; thus the NN architecture is not fundamentally incapable of performing this task.
The imperfect location simulation is not surprising when considering that the SNN used in this work was trained using a large set of training data gathered from random robot movements [38]. During the evolution of E1 controllers, only a small set of movements are available to train the simulator (the movements resulting from the robot navigating a single path), thus providing very limited training data, which can easily lead to overfitting.
Controller type E2, which had no fitness function adjustments applied to it, did not perform as well as the regular NLP controllers of investigation 1. Visualizing the values of the additional recurrent outputs shows that these controllers did not use the outputs for the purpose of simulation. Figures 12 and 13 present two paths taken by E2 controllers. Thus, simply providing controllers with the necessary recurrent outputs did not result in them learning to simulate their own location information. In fact, it reduced the performance of controllers, possibly due to the more complex search space created as a result of the recurrent connections.
The first notable point about the E3 and E4 results was that one controller of each type achieved a success rate considerably lower than those of the E2 and NLP controllers. Upon examining the behavior of these controllers, it became clear that the evolutionary algorithm (EA), which drove the training process, had become stuck in a suboptimal area of the search space. During the initial stages of the evolution process, the location simulation component of the fitness function is most easily minimized by the controller issuing motor commands that result in minimal movement. As long as the controller's simulated outputs are small, the controller receives a near-optimal location simulation score. However, this prevents the development of controllers that can actually solve the task. The two controllers that achieved a success rate of 0 both displayed this suboptimal behavior.
The second notable point about the E3 and E4 results was that the simulation scores of these controllers appeared to be similar to those of the E1 controllers. This suggested that these controllers managed to (at least partially) simulate their actions; however, their success rates were lower than those of the NLP controllers. Viewing the paths of these controllers offered an explanation for these results. Figures 14 and 15 show paths of E3 and E4 controllers that achieved success rates greater than 0. These controllers learnt to simulate a generic path that would always receive a moderate simulation score, but that did not represent the actual path of the robot. Their success rates were worse than those of the NLP and E2 controllers, which suggested that even though controllers could learn to solve the task without simulation, forcing them to also simulate their outputs ultimately reduced their ability to perform the actual task.
The results of the E1 controllers showed that the architecture was capable of simulating robotic movement. However, the results of controller types E3 and E4 showed that controller and simulator behaviors did not train simultaneously. This was a result of the EA exploiting simplistic, but suboptimal, solutions.
Preliminary experiments explored different weightings of the controller and simulator components of the fitness function. However, when one component was favored too heavily, only one behavior was optimized, and in turn it prevented the other behavior from evolving. If the simulation behavior was favored, this promoted the development of controllers that produced minimal movement. However, if controller behavior was favored, the simulation fitness component was overshadowed by the controller component. This led to controllers producing inaccurate simulation results. In this case, since the simulation outputs of the network did not benefit the controller, they ultimately acted as a distraction to the EA. However, the weighting of each fitness component was not the primary reason why these behaviors could not be evolved simultaneously; rather, the interdependence of the two components prevented simultaneous independent evolution.
A solution could be to train these two components separately: first the simulation component and then the controller component. This could be achieved by assigning a portion of the NN's weights to simulation and freezing these once trained, or creating a NN for each component. However, such an approach would be similar to the one explored in investigation 1.
7 Investigation 3
While the first investigation did indicate that location perception was beneficial to controllers, it was not obvious to what extent the controllers were using the LP data. This experiment involved a more complex task in which positional information would be more beneficial to controllers.
7.1 Task
Controllers would be started at a random location in the left or right vertical corridor of the DTM. From this point the controllers would be required to navigate to the starting point of the DTM. If controllers were to solve this task as efficiently as possible, they would be required to understand their starting position and follow the shortest path to the starting location.
Due to the nature of this task, using the standard NLP architecture (Section 4.3) would not have allowed a fair comparison, the reason being that for LP controllers to function, they must have an initial point upon which to base their LP calculations. If LP controllers receive this information, then to allow fair comparison, NLP controllers should as well. For this reason, the NLP architecture used was the one described in Section 6.1. Considering the results of investigation 2, where controller type E1 performed worse than the standard NLP architecture, this may be seen as a poor choice of controller type. However, a preliminary experiment was conducted that compared the performance of regular NLP controllers with that of the type E1 controllers. It was confirmed that, for this experiment, controllers of type E1 performed better than controllers of the standard NLP architecture.
7.2 Fitness Function
The fitness function used in this experiment was similar to the one used in investigation 1 (Section 5). With reference to the previously used fitness function, the only terms that changed were D and P. The checkpoints that were used to calculate the values of these terms were adjusted to follow a path appropriate to this task.
7.3 Evaluation Criteria
Five controllers of each architecture were evolved in simulation. The hidden layer of these controllers consisted of 16 LSTM cells. Controller performance was analyzed in the same terms as previously, specifically, the number of successful trials and the number of motor commands on successful trials. The motor command limit of this experiment was 200, and controllers were evolved for 4,000 generations.
7.4 Results
In simulation, 500 evaluations were performed when starting in the left corridor and a further 500 evaluations when starting in the right corridor, totaling 1,000 evaluations per controller. In the real world, this was reduced to four evaluations per corridor, totaling eight evaluations per controller. The results of these evaluations appear in Table 3.
Table 3. Investigation 3: successful evaluations and average number of motor commands. Columns 1–5 correspond to the five evolved controllers of each type.

| Measure | Environment | Type | 1 | 2 | 3 | 4 | 5 | Total | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Successful evaluations | Simulated | NLP | 233 | 532 | 302 | 561 | 510 | 2138/5000 | 43% |
| | | LP | 997 | 957 | 909 | 971 | 917 | 4751/5000 | 95% |
| | Real world | NLP | 0 | 0 | 5 | 4 | 4 | 13/40 | 33% |
| | | LP | 4 | 2 | 7 | 6 | 6 | 25/40 | 63% |
| Average number of motor commands | Simulated | NLP | 137 | 121 | 133 | 123 | 118 | | 126 |
| | | LP | 74 | 66 | 66 | 68 | 69 | | 69 |
| | Real world | NLP | – | – | 138 | 117 | 120 | | 125 |
| | | LP | 103 | 62 | 70 | 71 | 79 | | 77 |
According to Table 3, in both simulation and the real world, LP controllers significantly outperformed their counterparts on both performance measures. Figures 16 and 17 present a selection of paths taken by controllers in the real world. Figure 16 shows paths of NLP controllers, and Figure 17 shows paths of LP controllers.
These figures show that the LP controllers managed to generalize their behavior so that they could navigate directly toward the beginning of the maze, regardless of the side of the maze they started from. However, as the top left and bottom right paths of Figure 16 show, certain NLP controllers learnt a simple wall-following behavior. This behavior led to a successful result when starting on the left side of the maze. When starting on the right side, however, the evaluation was unsuccessful, because following a wall may, depending on the starting point, lead the robot to the opposite corridor rather than the start of the maze. In these cases, controllers reached the maximum number of motor commands and the evaluation was considered a failure. In the other paths pictured in Figure 16, certain NLP controllers displayed more efficient behaviors in which they did not simply follow a wall on one side. However, these paths were jagged and indecisive compared to the paths in Figure 17. Furthermore, Table 3 shows that NLP controllers always used, on average, more motor commands than LP controllers. This is evidence that the LP information enabled LP controllers to develop superior behaviors.
8 Investigation 4
At this point, LP controllers had been able to complete all investigated tasks to an acceptable degree. This experiment looked to test the limits of the controllers. Not only was the task of this experiment more complicated, it was also longer in duration than the previous tasks. As was previously highlighted, it was expected that LP information would diverge over time. This experiment served to test the usefulness of location perception over a longer duration.
8.1 Task
Controllers were required to visit all four corners of the DTM, in a specified order, in one evaluation. However, to further complicate the task, each controller was required to navigate four distinct orders of corners. The four corner patterns were as follows:
• Bottom left, top right, top left, bottom right
• Bottom right, top left, top right, bottom left
• Top left, bottom right, bottom left, top right
• Top right, bottom left, bottom right, top left
Once a controller had navigated to the correct corner in a pattern, it would then be allowed to navigate to the next one. If a controller navigated to an incorrect corner, the evaluation would stop.
8.2 Fitness Function
The fitness function used a checkpoint system similar to the one described in Section 7.2. Checkpoints were set at the intersections and corners in the order that controllers were expected to follow. For example, consider the first pattern of corners in Section 8.1. The order of checkpoints would be: middle intersection, left intersection, bottom left corner, left intersection, middle intersection, right intersection, and so on. Besides this, the factors of the fitness function remained unchanged.
8.3 Setup
Controllers were informed of which pattern they were expected to follow via four additional inputs into their NNs. Each input corresponded to a different pattern. The values of these inputs did not change for the duration of the execution. Since the initial position and orientation of the robot in this experiment were not randomized as they were in investigation 3, none of the extended architectures discussed in Section 6.1 were used. Thus, the only two network types that were trained and tested were those discussed in Section 4.3.
8.4 Evaluation Criteria
As was the case previously, controller performance was examined in terms of successful evaluations and number of motor commands on successful evaluations. However, due to the complexity of this task, it was not expected that controllers would succeed at the task often. To address this, the number of correct corners reached during evaluations was also recorded. The motor command limit of this experiment was 500, and controllers were evolved for 5,000 generations. The hidden layer of these controllers consisted of 36 LSTM cells.
8.5 Results
In simulation, each controller was required to perform each pattern 250 times, totaling 1,000 evaluations per controller. In the real world, this was reduced to twice per pattern, totaling eight evaluations per controller.
From Table 4, the successful evaluation and motor command metrics do not describe controller performance as accurately as previously. This is because most controllers only learnt to solve the task partially. But since the first two metrics of Table 4 do not award partial success, it appears that every controller was a failure. The complexity of the task required of these controllers should be stressed. Controllers were required to learn four patterns as well as navigate these patterns on demand, using a robot that behaved nondeterministically while avoiding walls, which were detected by noisy sensors. Thus, the poor performance of controllers is not surprising.
Table 4.

| Metric | Environment | Controller type | 1 | 2 | 3 | 4 | 5 | Total | Average |
|---|---|---|---|---|---|---|---|---|---|
| Successful evaluations | Simulated | NLP | 0 | 0 | 0 | 0 | 0 | 0/5000 | 0% |
| | | LP | 23 | 40 | 8 | 8 | 43 | 122/5000 | 2% |
| | Real world | NLP | 0 | 0 | 0 | 0 | 0 | 0/40 | 0% |
| | | LP | 0 | 0 | 0 | 0 | 0 | 0/40 | 0% |
| Average number of motor commands | Simulated | NLP | – | – | – | – | – | | – |
| | | LP | 488 | 462 | 453 | 462 | 459 | | 465 |
| | Real world | NLP | – | – | – | – | – | | – |
| | | LP | – | – | – | – | – | | – |
| Corners reached | Simulated | NLP | 1423 | 341 | 1212 | 901 | 1233 | 5110/20000 | 26% |
| | | LP | 1803 | 2127 | 1789 | 1757 | 1935 | 9411/20000 | 47% |
| | Real world | NLP | 11 | 4 | 8 | 0 | 11 | 34/160 | 21% |
| | | LP | 17 | 11 | 6 | 12 | 15 | 61/160 | 38% |
Neither controller type demonstrated a consistent ability to solve the task in full. However, while NLP controllers had no successful evaluations at all, LP controllers did occasionally succeed in simulation.
By considering the number of corners reached, it is possible to better compare the two controller types. This metric indicates that controllers did partially solve the task. In this case, the NLP controllers were, again, outperformed by their LP counterparts.
As mentioned previously, LP information becomes less reliable as an evaluation progresses, because each approximation is built on the previous one. This experiment therefore also tested the usefulness of LP data over a longer duration. In Figure 18, the actual path of the robot is shown by the black line and the LP path by the red line. The figure shows that, in these cases, the LP data diverged from the real-world position to the point where it was likely no longer useful to the controller. The LP information started out accurately but, as expected, became worse as more motor commands were executed. Consider the top right path of Figure 18: at the beginning of this path, the LP data was relatively accurate, but after the second turn, the LP path was at times outside the bounds of the maze, not near any location the robot could possibly have occupied, and not in an accurate orientation. In this case, the LP information was likely more misleading than beneficial to the controller. As time passes and more motor commands are executed, the LP data diverges and the advantage that LP controllers have over their counterparts diminishes. This is an inherent limitation of the approach. Investigation 5 introduces a method of reducing this divergence and examines the performance of LP controllers under these conditions.
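The compounding effect can be illustrated with a toy one-dimensional example (this is not the study's SNN; the noise model and step size are arbitrary assumptions): because each position estimate is built on the previous one, the per-step noise is never corrected and the estimate tends to drift further from the true position as more commands are executed.

```python
import random

true_pos, estimated_pos = 0.0, 0.0
for _ in range(500):                              # roughly one evaluation's worth of motor commands
    commanded = 1.0                               # intended displacement per motor command
    actual = commanded + random.gauss(0.0, 0.05)  # nondeterministic response of the real robot
    true_pos += actual
    estimated_pos += commanded                    # the estimate assumes the command executed exactly
drift = abs(true_pos - estimated_pos)             # tends to grow as more commands are executed
```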
9 Investigation 5
In investigation 4, no controller was able to consistently solve the entire task. While the LP controllers outperformed their counterparts, they still did not solve the task in full. Inspection of the LP data showed that, as evaluations progressed, the data diverged to the point where it became arguably useless or even misleading. This divergence is an unfortunate result of the nondeterministic nature of the robot's actions: motor commands do not always have the same effect on the robot, and infrared sensors do not always produce the same values under the same conditions. Since the SNN bases its calculations on its own past approximations, any errors it makes compound on one another.
The approach investigated here to reduce this divergence involves periodically correcting the LP information. As an analogy, imagine navigating around a room while blindfolded: as time passes and one moves around, one's sense of position relative to the room drifts, but briefly removing the blindfold and seeing the surrounding room recalibrates one's internal sense of position. An example more relevant to robotics is a robot navigating around a factory. At points in the factory where the robot's exact position is crucial, sensors could be installed to detect the robot's position, and this information could override the robot's LP position. Between these points, where precise positioning is not necessary, the robot could rely on its LP position alone.
9.1 Task
This investigation carried out the same experiment as investigation 4. However, only LP controllers were examined, and these controllers had their LP data periodically reset. This was accomplished by marking each corner of the maze as a reset zone (the gray areas in Figure 19). When controllers were in these reset zones, their LP position was overridden by their actual position, which was obtained from a camera that overlooked the environment in which the controllers were evaluated. The remaining details of this experiment did not differ from investigation 4.
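A minimal sketch of this reset rule follows; the pose representation, zone rectangles, and function names are illustrative assumptions rather than the study's actual implementation.

```python
def in_reset_zone(pose, reset_zones):
    """True if the (x, y) position lies inside any reset-zone rectangle."""
    x, y = pose[0], pose[1]
    return any(x0 <= x <= x1 and y0 <= y <= y1 for (x0, y0, x1, y1) in reset_zones)

def next_lp_estimate(snn_prediction, camera_pose, reset_zones):
    """Override the LP estimate with the camera-observed pose inside a reset zone,
    otherwise keep the simulator neural network's prediction."""
    if camera_pose is not None and in_reset_zone(camera_pose, reset_zones):
        return camera_pose
    return snn_prediction
```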
9.2 Results
Compared to the results of investigation 4, Table 5 shows an improvement in all regards. The number of successful evaluations shows that, while not all controllers managed to solve the task, some now did, whereas previously no controller had managed to do so. Furthermore, even the controllers that did not solve the task consistently, or at all, reached more corners than the controllers in investigation 4. In Figure 20, two successful real-world evaluations are pictured; the LP corrections are depicted by the blue arrows.
Table 5.

| Metric | Environment | Controller type | 1 | 2 | 3 | 4 | 5 | Total | Average |
|---|---|---|---|---|---|---|---|---|---|
| Successful evaluations | Simulated | LP | 407 | 371 | 127 | 0 | 175 | 1080/5000 | 22% |
| | Real world | LP | 2 | 3 | 1 | 0 | 3 | 9/40 | 23% |
| Avg. motor commands | Simulated | LP | 432 | 453 | 436 | – | 454 | | 444 |
| | Real world | LP | 442 | 461 | 420 | – | 440 | | 441 |
| Corners reached | Simulated | LP | 3134 | 2830 | 2098 | 2001 | 2666 | 12729/20000 | 64% |
| | Real world | LP | 21 | 17 | 18 | 18 | 26 | 100/160 | 63% |
Considering Figure 20, the corrections made to the LP data were substantial in several cases, for example the top left correction of the second path. Nevertheless, these LP paths resemble the actual path of the robot far more closely than those shown in Figure 18. The corrections made the LP information more reliable, which allowed the evolution of controllers that relied more heavily on this information and were thereby able to solve the task.
10 Conclusion
This study demonstrated, in a number of navigation scenarios, that evolved controllers can make effective use of simulated location information to control a real-world robot. Providing a sense of location resulted in the more consistent production of superior controllers in a simple double T maze scenario. As long as a lightweight simulator is used, more advanced controllers can be created with few additional processing requirements and minimal human intervention. Various experiments also demonstrated that controllers were unable to learn the navigation task and simulate their own location information simultaneously.
The benefits of the proposed approach were further demonstrated by increasing the complexity of the navigation task: the robot was started at random locations in the maze and was required to return to the beginning. The location perception controllers consistently outperformed their counterparts. Most interestingly, location perception controllers developed intelligent behaviors not seen in regular controllers and were also able to complete the task in less time. It was again encouraging that the benefits of the proposed approach were evident in the real-world experiments, not just in simulation.
The most obvious potential problem with using simulated inputs is that the real and the simulated environment will never be exactly the same. Differences will also compound over time, thus rendering simulated inputs less and less useful. The limits of the proposed approach were tested by further increasing the complexity of the navigation task. The robot now had to visit all corners of the double T maze in an order specified at run time. This is a challenging task, which has not been attempted in evolutionary robotics before. This task highlighted the divergence of the location perception data over time. In certain cases, the location perception data diverged to the point that it no longer remotely represented the robot's position or orientation. This result indicated that the proposed approach would only be useful for short-term tasks. However, it was shown that by periodically resetting simulated information at checkpoints in the maze, controllers could be evolved that could successfully complete the task. Regular controllers appeared unable to effectively solve this task, which is possibly why it does not appear in previous literature.
Attempts at improving the evolutionary robotics process often involve altering the evolutionary procedure or controller architectures. These changes can be both time-consuming and incompatible with one another. Provided one has access to a simulator (preferably a simulator neural network), adding location perception information to controllers does not necessarily involve much effort. Furthermore, this implementation does not significantly change the typical evolutionary robotics process, so the approach should be compatible with any previously investigated improvements to that process. Finally, while this study made use of an evolutionary robotics approach to create controllers, this approach to generating location perception information need not be limited to the evolutionary robotics field. For example, the field of simultaneous localization and mapping, while not directly related to evolutionary robotics, could also make use of simulator neural networks to approximate the effects of a controller's commands on a robot's position.
Acknowledgments
This work is based on research supported wholly by the National Research Foundation (NRF) of South Africa (grant number: 117128).