## Abstract

Taking inspiration from the navigation ability of humans, this study investigated a method of providing robotic controllers with a basic sense of position. It incorporated robotic simulators into robotic controllers to provide them with a mechanism to approximate the effects their actions had on the robot. Controllers with and without internal simulators were tested and compared. The proposed controller architecture was shown to outperform the regular controller architecture. However, the longer an internal simulator was executed, the more inaccurate it became. Thus, the performance of controllers with internal simulators degraded over time unless their internal simulators were periodically corrected.

## 1 Introduction

Evolutionary robotics (ER) is a machine learning paradigm that aims to automatically create robot topologies (hardware) and robot controllers (software) through the use of optimization algorithms inspired by biological evolution. ER topologies and controllers are created with minimal human intervention: humans typically only have to specify a fitness function, which measures how well the evolved robots solve the goal task. Since the demonstration of the viability of ER in the late 1990s [15, 22, 28], ER has been shown to be able to produce unexpected and ingenious robot designs and behaviors for complex tasks that human engineers may have found challenging [13, 31, 47]. ER has also been shown to discover solutions to problems that mimic behavior in the biological world [12, 14, 22].

Robot designs are often hampered by the availability of environmental information and the quality of sensors that provide information about the environment [17]. One of the strengths of ER is that it can be used to produce intelligent (and often surprising) controllers to overcome the physical limitations of the hardware and environment [5, 26]. This article investigates how ER can be employed to create controllers that make use of something similar to the unconscious perception of self-location that humans possess. Additional controller inputs will be generated by a simulator, which will provide the robot with information about its location without actually obtaining real location information from the environment. Just as a blindfolded human is still partially able to navigate a familiar space through their sense of position, the goal of this research is to show that simulated location data will enable the development of more effective controllers as well as behaviors that would normally not be possible. A notable feature of this approach is that the only changes to the ER process it requires are software-related. It is proposed that controllers can be provided with an approximation of the robot's state without the need for additional tracking hardware or sensors. This approach is tested by performing various tests in the double T maze to compare the performances of regular and augmented controllers. These experiments demonstrated the superior navigation abilities and behaviors of the augmented controllers.

The rest of this article is structured as follows: Section 2 discusses knowledge relevant to this study, Section 3 introduces the approach explored in it, Section 4 discusses how the study was carried out and the equipment used, Sections 5–9 provide the details of all the experiments performed, and finally Section 10 provides a conclusion on the results of the study.

## 2 Related Work

In the following subsections, work relevant to this study is discussed. Section 2.1 explains the topic of ER, and Section 2.2 continues this discussion with specific focus on navigation. A brief summary of a concept related to this work, proprioception, is provided in Section 2.3. Finally, Section 2.4 discusses works related to ER and proprioception.

### 2.1 Evolutionary Robotics

Evolutionary robotics (ER) [28] is a field concerned with developing robotic controllers and architectures through the use of evolutionary techniques. A typical example of ER would be using an evolutionary algorithm (EA) [49] to develop a robotic controller to perform a task. This would involve evaluating a population of randomly created robotic controllers, selecting a subset of the controllers based on their performance, and using these to create a new population. This process continues until a satisfactory controller is created or until another stopping criterion is met.
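For illustration, the generational loop described above can be sketched as follows. This is a toy example, not the implementation used in any of the cited studies: the genome, fitness function, and truncation selection are all hypothetical stand-ins for a robotic controller and its evaluation.

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, generations=50, seed=0):
    """Minimal generational EA: evaluate a population, select the fitter
    individuals, and recombine/mutate them to form the next population."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection (toy choice)
        pop = [scored[0]]                      # elitism: keep the current best
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            # uniform crossover: each gene comes from either parent
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation with a small per-gene probability
            child = [x + rng.gauss(0, 0.1) if rng.random() < 0.2 else x
                     for x in child]
            pop.append(child)
    return max(pop, key=fitness)

# Toy fitness: maximize the sum of the genes.
best = evolve(lambda g: sum(g))
```

The loop terminates after a fixed number of generations here; in practice, as the text notes, any stopping criterion (such as reaching a satisfactory fitness) can be used.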

Due to the nature of the ER process, its duration is highly dependent on the duration of evaluations. If evaluations require supervision and/or human intervention, this will further lengthen the duration. Additionally, while controllers are being evaluated, it is possible that they will attempt actions that are damaging to the robot [36]. For these and other reasons, it can be beneficial to perform evaluations in simulation. That is, as opposed to issuing commands to an actual robot and recording the result, a simulator can be used to predict this result. This allows evaluations to be conducted faster. Furthermore, issues surrounding the robot—for example, mechanical wear or power sources—can be ignored. However, simulation presents new issues. To execute a robot in simulation, one will require a simulator. Furthermore, no simulator will predict the outcome of a robot's actions with complete accuracy. This can mislead the ER process and lead to the situation where controllers perform well in simulation but fail in the real world [29]. This problem is referred to as the reality gap.

Research has been conducted [36, 37, 38, 48] that has attempted to simplify and improve the process of creating simulators. As opposed to using a mathematics- and physics-based approach, these studies used neural networks (NNs) to create simulators. Physical robots were given various commands, and the outcomes of these commands were recorded. This data was used to train NNs, which then acted as simulators. These NNs were termed simulator neural networks (SNNs). Such simulators have been used and demonstrated to be effective [39, 40, 47]. This process of simulator creation requires less knowledge of a robot's dynamics than a mathematics-based approach. Furthermore, these SNNs are considered lightweight NNs. That is, they do not consist of many neurons and can be executed quickly with basic computer hardware.
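The idea of an SNN can be sketched as follows. The motion model standing in for the "physical robot", the network size, and the training procedure are all hypothetical; the cited works use different robots and training setups. The shape of the process, however, is the same: record (state, command, outcome) triples from the robot, then fit a small network that maps state and command to the resulting state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the physical robot: a simple differential-drive-style step.
# The SNN never sees this function, only data sampled from it.
def robot_step(state, cmd):
    x, y, th = state
    v, w = cmd
    return np.array([x + v * np.cos(th), y + v * np.sin(th), th + w])

# 1. Collect training data by issuing random commands from random states.
states = rng.uniform(-1, 1, size=(1000, 3))
cmds = rng.uniform(-0.2, 0.2, size=(1000, 2))
targets = np.array([robot_step(s, c) for s, c in zip(states, cmds)])
X = np.hstack([states, cmds])

# 2. Train a small one-hidden-layer network (the SNN) by gradient descent.
W1 = rng.normal(0, 0.3, (5, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.3, (16, 3)); b2 = np.zeros(3)
lr = 0.1
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)
    err = (h @ W2 + b2) - targets
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    gh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

def snn(state, cmd):
    """Predict the next state from the current state and command."""
    h = np.tanh(np.concatenate([state, cmd]) @ W1 + b1)
    return h @ W2 + b2
```

Note that the network is indeed lightweight, in the sense described above: sixteen hidden units, executable in microseconds on modest hardware.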

### 2.2 Navigation in Evolutionary Robotics

Navigational tasks are typical of evolutionary robotics [11, 21, 27], as the ability to navigate within an environment, based on cues, is a common requirement of many robots. A common navigation benchmark involves an environment known as the double T maze (DTM). The typical method of using this maze as a benchmark is to place the robot at the starting point and then instruct it to navigate to a specific corner. The DTM, and variants of it, have been used in a similar manner as benchmarks in behavioral psychology [7, 43] and ER [1, 4, 6, 19, 34] in the past. In ER, one of the complexities of this task is the requirement that a single controller must be able to navigate from the start to any of the four corners on command. Additionally, the robot's actions are nondeterministic, and the sensors it uses to detect walls are noisy. This task and variants of it have been solved previously in ER with varying levels of success.

Previous studies have shown how controllers can solve the task by exploiting basic behaviors. Carvalho and Nolfi [4] showed how ER controllers may exploit their environment to reduce the cognitive requirements of the task. They demonstrated that, when possible, controllers tended to place themselves in positions and orientations that informed their later decisions, as opposed to remembering the necessary information. This behavior was referred to as cognitive offloading. When the environment in which the task was conducted was adapted so that these controllers could not exploit the environment as they did previously, their performance was significantly reduced. Introducing noise into the robot's position and angle also prevented the development of successful solutions. However, when the added noise was only present in a portion of trials, it was shown that controller performance improved. These controllers still developed cognitive offloading strategies, but also displayed a cognitive ability that allowed them to solve the task even in the presence of noise.

Duarte et al. [6] solved a variant of the DTM task through hierarchical evolution, a process in which a task is broken down into subtasks and a controller is developed for each. While there are advantages to this approach, it strays from ER's idea of automatic controller generation, requires human intervention, and does not allow the development of completely emergent behaviors. Blynel and Floreano [1] used incremental evolution to solve a variant of the DTM task by seeding a population with controllers that had solved a similar but simpler one. This process was used because direct evolution failed to produce successful controllers. Controllers were not told which corner they were intended to reach. Instead, if a controller navigated to a corner, the evaluation was stopped and the controller was informed whether the corner it reached was the correct one or not. The controller was then repeatedly executed and was expected to recall, from its first evaluation, which corner was the correct corner. There was, however, the exception that only the top-left or top-right corner could be selected as correct. When the correct corner was switched during controller trials, successful controllers were not produced.

Paine and Tani [33] used two fully connected continuous-time recurrent neural networks as robotic controllers. The bottom network was intended to encode primitive motor commands, such as turning left or right or moving in a straight line. The top network was responsible for controlling the activities of the bottom network through a set of control neurons. The top network's behavior was controlled by two task neurons. The internal values of these neurons were initialized with different values to have the network generate, via the lower-level network, different patterns of primitive motor commands. The bottom network was first trained in a single T maze. Then, using the trained bottom network, the top network, as well as the initial task neuron values, was trained in a maze similar to the DTM. This maze could be described as a DTM where each of the goal destinations was split into two corridors for a total of eight goal destinations. Controllers that could reliably reach seven of the eight goals were produced.

Using a standard ER process to train consistently successful controllers to perform the typical DTM task remains a challenging problem. Furthermore, while Duarte et al. [6] have solved a task that involves a controller returning to the beginning of the DTM, no previous study has investigated a less idealized and more realistic version of this task. This study investigates such a scenario: the typical DTM task and a more complex variant of it.

### 2.3 Proprioception and Path Integration

Proprioception is the sense of body position and movement [45]. Proprioceptive sensations are the sensations of stimuli that arise from within an organism [24]. This information allows organisms to, among other things, better coordinate their physical actions. Proprioception is also part of the reason why one knows where the parts of one's body are, relative to one another. For example, humans know where their feet are, even while not directly observing them. This, in part, is what allows humans to climb stairs without focusing on the task.

While not the focus of this subsection, it is worth defining the term exteroceptive to differentiate it from proprioceptive. While proprioceptive senses relate to stimuli internal to an organism, exteroceptive senses relate to those that are external [23]. Examples of exteroceptive senses would be the traditional five: sight, hearing, touch, smell, and taste. Proprioception does rely on exteroceptive senses, such as ocular and tactile feedback. However, as its name suggests, proprioception also relies on proprioceptive senses. Populations of sensory neurons embedded in muscles, tendons, and joints provide information about the body's movement and position [45]. These proprioceptors, as they are called, appear in mammals as well as insects. They encode, among other information, muscle load, muscle length, and rate of change of muscle length [45]. The benefits this information provides are largely subconscious and easily taken for granted.

Proprioception helps humans to produce a representation of the physical state of their bodies. While proprioceptors do provide information on the body's physical state, this information is insufficient to determine the absolute location of the parts of the body [41]. Body schemas, and images, are internal representations of the body that humans develop and refine over time, using proprioceptive and other sensory information. These are what provide humans with a sense of limb position. How much of these representations are available to consciousness and how they are constructed are still not entirely clear [3, 41]. However, these representations assist humans in the unconscious movement and control of their bodies [41].

Path integration (PI) refers to the use of self-generated cues to maintain a representation of position and orientation in relation to a starting location [2]. One of these cues is the proprioceptive feedback received from locomotion. Typically, it is assumed that navigation involves the use of landmarks to calculate position and orientation relative to a goal. However, humans (and other animals) are able to begin a journey at a starting point and return to that same point without the use of external references [9]. This allows humans to continue navigation even in the absence of external cues. However, the use of external cues improves navigation, in that without these cues, the use of PI alone results in an accumulation of errors in both distance and direction [10].
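The accumulation of PI error can be illustrated with a toy dead-reckoning model (all parameters here are arbitrary): an agent integrates steps whose internal heading estimate carries small Gaussian noise, and the gap between its estimated and true positions grows with the number of steps taken.

```python
import math
import random

def path_integrate(steps, step_len=1.0, heading_noise=0.05, seed=0):
    """Dead reckoning: integrate self-motion estimates whose heading
    carries small Gaussian noise; return the final position error."""
    rng = random.Random(seed)
    true_x = true_y = est_x = est_y = 0.0
    heading = 0.0
    for _ in range(steps):
        heading += rng.uniform(-0.3, 0.3)               # actual turn taken
        sensed = heading + rng.gauss(0, heading_noise)  # noisy internal estimate
        true_x += step_len * math.cos(heading)
        true_y += step_len * math.sin(heading)
        est_x += step_len * math.cos(sensed)
        est_y += step_len * math.sin(sensed)
    return math.hypot(est_x - true_x, est_y - true_y)

def mean_error(steps, trials=100):
    """Average final error over several independent journeys."""
    return sum(path_integrate(steps, seed=s) for s in range(trials)) / trials
```

Averaged over many journeys, the final error after hundreds of steps is substantially larger than after a handful of steps, mirroring the accumulation of distance and direction errors reported in [10].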

### 2.4 Evolutionary Robotics and Proprioception

Previous research in ER includes investigations in which proprioceptive information has been used. A selection of these works is discussed in Sections 2.4.1–2.4.4, and their relation to this study is discussed in Section 2.4.5.

#### 2.4.1 Object Categorization

Nolfi and Marocco tasked a robotic arm to categorize objects based on coarse touch sensor data [30]. The arm consisted of three segments, which together provided six degrees of freedom. A touch sensor was attached to the end of the arm. By moving the arm, controllers were required to categorize an object placed beneath the arm.

NNs were used as robotic controllers in this experiment, and the touch sensors formed part of the inputs of the NNs. However, due to how coarse the sensory information was, it alone would likely not be enough to categorize objects. Thus, controllers also received proprioceptive information regarding the physical pose of the robotic arm. This information was generated by sensors on the arm itself.

The NNs produced subsequent motor commands as well as an output that represented the category of the object being classified. An NN's commands were executed until a stopping condition was met, at which point the object category provided by the NN was considered its final decision. The produced solutions displayed behaviors in which the proprioceptive motor information was used, in conjunction with the coarse sensory information, to correctly classify objects.

#### 2.4.2 Reach-and-Grasp Development

Savastano and Nolfi performed an experiment that attempted to develop reaching and grasping behaviors, similar to those of an infant, in a robot [42]. An iCub humanoid robot [25] was used in the experiment. It consisted of 53 motors, which could provide proprioceptive information regarding their physical state, as well as several external sensors.

The robot was placed in an upright position with a ball suspended in front of its head. The ball's position was varied, but was required to be within a certain angular range of the robot's head. The ultimate task assigned to the robot was for it to extend its right arm and grasp the ball. However, the training process was divided into phases. The task associated with each phase was progressively more difficult than the one before it. The task of the final phase was the ultimate task. These phases were intended to more closely resemble how an infant would go about learning this reaching and grasping behavior. This process was shown to produce better results than when using a single training phase.

The robotic controllers used were NNs in which both proprioceptive and exteroceptive information made up the inputs. The proprioceptive information consisted of the angular position of the motors within the right arm, torso, and head. The exteroceptive information consisted of inputs from the camera and the tactile sensors of the right hand of the robot.

The results of the experiment showed that it was possible to train this complex behavior in a robot and that this behavior closely resembled that of an infant. The experiment was performed in simulation; however, planned extensions to the work included testing trained controllers on physical hardware.

#### 2.4.3 Bodily Awareness in Soft Robots

The complex nature of the body of soft robots makes the task of describing the state of their body more difficult than in the case of hard robots [44]. Soter et al. attempted to provide a soft robot, in particular a fabricated octopus-inspired arm, with a form of bodily awareness [44].

The arm could be described as a sandwich of flexible silicone with four bend sensors in the middle. The arm was connected, at its base, to a carriage that could move along a single axis. The motion of the carriage was used to excite the arm, and it moved according to a function that ensured that there was no repetition of motion during trials. Finally, the arm was suspended in clear water in a transparent container, which faced a camera.

The camera was used to capture the physical state of the arm at a specific point in time. Corresponding camera and sensor information was captured for several trials. Attempts to reproduce images directly from sensor values were unsuccessful. This was due to the large number of pixels that made up the images, compared to the number of bend sensor values. Thus, a convolutional neural network (CNN) encoder-and-decoder pair was trained. The encoder was used to reduce the dimensionality of the images. The decoder reproduced the original image from the low-dimensionality representation. A recurrent neural network (RNN) was then trained to map the bend sensor values to the low-dimensionality representation. The output of the RNN was then fed into the decoder to produce an image. This approach proved successful.

The accuracy of the results of the experiment suggested that this approach was effective for providing a soft robot with a form of body awareness. Furthermore, this suggested that soft robots, despite their complexity, do have a low-dimensionality representation. However, this method of encoding a robot's state was not used to perform any task or enhance any existing robotic controller.

#### 2.4.4 The Double T Maze and Simulated Location Data

Preliminary research conducted by the authors is closely related to the current study [35]. The study attempted to provide a wheeled robot with an approximation of its own position and determine the benefit this provided. This information was provided to the robot by an SNN that produced simulated location data during the robot's evaluations.

The benchmark chosen for comparison was the double T maze. At the beginning of a trial, the robot was instructed to navigate to one of the four corners. If the robot took too long to complete its task, or navigated to the wrong corner, the trial was terminated. Robotic controllers that did and did not incorporate the additional information were evolved. These controllers were evaluated in both simulation and the real world. The augmented controllers outperformed their counterparts in both simulation and the real world. However, the performance difference observed between controller types in the real world was larger than the one observed in simulation. This suggested that the simulated location information may have been useful in reducing the effect of the reality gap.

The study did, however, use a basic NN architecture for its controllers and only performed a single experiment. It is possible that these factors could have biased performance in favor of the augmented controllers.

#### 2.4.5 Limitations of Existing Work

In the experiments described in Sections 2.4.1–2.4.3, proprioceptive information was provided directly from sensors on the robot. This study instead proposes that information of this kind could be generated by a robotic simulator, an approach more akin to path integration in humans. Furthermore, none of these studies attempted to evaluate the usefulness of proprioceptive information in robotic controllers.

In the study described in Section 2.4.4, an SNN was incorporated into robotic controllers and used to approximate the robot's physical state. The information provided by the simulator was based entirely on the robot's previous actions. No additional sensors were necessary to produce this information. This approach is what the current study will further investigate.

## 3 Proposed Approach

During the ER process, evolution is typically performed in simulation. These simulators take in a robot's state together with a command and produce the state that will result after the execution of that command. The premise of this research is that the simulator can also be used in the real world to approximate a non-simulated robot's state, thereby providing additional useful input to a robotic controller. For example, consider a robot that is placed in a known state and asked to execute a command. While this command is being executed, a simulator can approximate the resulting state of the robot. Unless the robot had sensors in place to sense its location, it would not otherwise have this information. This approximation can be provided as input into controllers and can allow more informed decisions to be made regarding the robot's subsequent commands. Furthermore, this information can also facilitate the evolution of more advanced controllers. To avoid any confusion with existing terms, the term location perception (LP) is used in this article to refer to this process of a robot approximating its own state by means of a robotic simulator. Furthermore, controllers that do not use this approach are called non-location-perception (NLP) controllers.

The suggested approach will require that robots be equipped with processors that execute the simulator in order to calculate the approximate state. Furthermore, these processors will need to be powerful enough that a state can be approximated in less time than it takes to execute the same command in the real world. The goal is to provide the robot with LP information without forcing it to operate at a pace slower than usual. This suggests that in this case an SNN would be a good choice of simulator, as it has been shown in previous research that SNNs tend to be less computationally expensive than more typical mathematically based simulators [39]. If more typical types of simulators were used, it might not be possible to execute the simulator in real time on robotic hardware.

A potential shortfall of this approach concerns the accuracy of the LP information. A robot will execute a command, and the SNN will approximate the resulting state. However, the robot will not have the required sensors to confirm whether the approximated state is entirely correct or not. Despite this, the approximated state will need to be used in the calculation of the following approximation. This could result in compounding errors as more and more commands are executed by the robot. In reality, robotic commands are typically nondeterministic, and their results cannot be predicted exactly. Thus, even the most accurate simulator will suffer from this problem of compounding errors. However, the more accurate the simulator, the less severe this problem will be. This again suggests that SNNs would be a good choice of simulator, as they have been shown to be highly accurate [39]. Despite this accuracy, it is expected that over time the LP information will diverge from the real-world position to a point where the information will no longer be useful. An idea, investigated in this study, to address this problem involves periodically resetting the LP information when possible.
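The compounding of prediction error, and the effect of periodically resetting the LP information, can be sketched in one dimension. The bias and noise figures below are arbitrary; the point is only that per-step simulator error accumulates when predictions are fed back as inputs, and that a periodic correction bounds the drift.

```python
import random

def lp_drift(steps, reset_every=None, sim_bias=0.01, seed=2):
    """Feed an imperfect simulator its own predictions: each step the
    predicted state carries a small bias plus noise, so the LP estimate
    drifts from the true position unless periodically reset."""
    rng = random.Random(seed)
    true_x = est_x = 0.0
    for t in range(1, steps + 1):
        v = rng.uniform(0.8, 1.2)                      # actual (nondeterministic) motion
        true_x += v
        est_x += v + sim_bias + rng.gauss(0, 0.02)     # simulator's imperfect prediction
        if reset_every and t % reset_every == 0:
            est_x = true_x                             # correction at a known reference
    return abs(est_x - true_x)
```

Without resets, the drift grows roughly linearly with the number of steps; with a reset every few hundred steps, only the error accumulated since the last correction remains.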

A motivation for the approach suggested in this study is that its implementation requires very little additional effort. The task that the robot needs to accomplish needn't be adapted, nor do any additional sensors need to be installed or used. The SNN will not need to be specifically sourced or created, as it would have already been required in order to carry out the ER process.

This article presents a comparison drawn between robotic controllers that receive LP information and those that do not. The additional information provided to LP controllers will be simulated physical location information, namely, the robot's position and angle within an environment. It is hypothesized that the controllers that receive the LP information will develop more advanced, and ultimately superior, behaviors in comparison to their counterparts. These behaviors could include, but are not limited to, solving a task with a more efficient strategy, navigating more effectively, or solving a task that would be impossible without LP information.

Simultaneous localization and mapping (SLAM) involves placing a robot in an unknown environment, and having it build a map of this environment and simultaneously determine its position within it [8]. While the ability of a robot to accomplish this is no doubt useful in certain scenarios, such scenarios are not the focus of this research. This research does suggest a method of providing what could be considered a form of localization. However, it is not concerned with the mapping of an environment. Instead, its contribution lies in the benefit it can offer navigation solutions as well as the simplicity of its implementation.

## 4 Methodology

The goals of this research involve determining the usefulness of the simulated location information as well as determining if this information remains useful as task complexity is increased. Furthermore, it was mentioned in Section 3 that, due to the approach taken in this study, the LP information is expected to drift over time. The effect of this drift, and the potential methods to rectify it, will also be investigated.

This study will take an iterative approach to experimentation. The results of each experiment will influence the development of the following experiment. A total of five investigations were conducted, and their details are presented in Sections 5–9. Section 4.1 discusses the equipment used. Sections 4.2 and 4.3 specify details of the implementations of the EA and NNs used in this study. Section 4.4 states what data were collected during experimentation.

### 4.1 Experimental Setup

The Khepera III (Figure 1), a differentially wheeled robot, was the robot used to carry out the experiments [20]. The robot is equipped with nine infrared sensors that surround its body and measure the distance between the robot and its surroundings. The values these sensors produced were approximately in the range [0, 4000]. These sensor values were normalized to a range of [0, 1] before being used by controllers.
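The normalization described above amounts to the following. Clamping out-of-range readings is our assumption, since the text states that the raw values are only approximately bounded by 4000:

```python
def normalize_ir(raw, max_raw=4000.0):
    """Clamp a raw infrared reading to [0, max_raw] and scale it to [0, 1]."""
    return min(max(raw, 0.0), max_raw) / max_raw
```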

Figure 1.

The Khepera III robot.


All experiments were conducted within a double T maze (DTM) (see Figure 2). To improve the consistency of the infrared sensors, the inner walls of the maze were lined with reflective tape. Furthermore, the robot was operated on a non-slip surface. These conditions were also simulated, to allow the ER process to be conducted in simulation. The simulator used had been developed and successfully used in previous research [38]. Noise was added to the outputs of the simulator to encourage controllers to learn more robust behaviors in anticipation of real-world trials. The amount of noise applied was tweaked until satisfactory transfer of controllers from simulation to the real world was observed.

Figure 2.

The double T maze.


To collect the necessary data from experiments, the robot's actions during evaluations needed to be recorded. In simulation, the robot's exact state was known at all times. In the real world, this data was captured by a camera that overlooked the DTM. Image processing software was used to detect markers that were placed on top of the robot to determine its position [32]. However, this information was used only to record robot evaluations; actual location information was not provided to controllers.

### 4.2 Evolutionary Process

A standard EA was used to drive the evolutionary process, with a population size of 100. The chosen genetic operators were tournament selection (tournament size 10), uniform crossover, and Gaussian mutation (standard deviation 0.2, mutation probability 20%). Elitism was incorporated, preserving the best two chromosomes in each population. Chromosomes represented robotic controllers, which were long short-term memory (LSTM) networks. The number of generations used in the EA varied in each experiment, and these values are provided with each experiment's details.
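The stated operators can be sketched as follows. The real-valued genome representation is hypothetical; only the parameters (tournament size 10, mutation standard deviation 0.2, mutation probability 20%) come from the text.

```python
import random

rng = random.Random(0)

def tournament_select(pop, fits, k=10):
    """Pick the fittest of k randomly chosen individuals."""
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: fits[i])]

def uniform_crossover(a, b):
    """Each gene comes from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def gaussian_mutate(genes, sigma=0.2, p=0.2):
    """Perturb each gene with probability p by N(0, sigma) noise."""
    return [g + rng.gauss(0, sigma) if rng.random() < p else g for g in genes]
```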

### 4.3 Neural Network Architectures

All the controllers used in this study were neural-network-based controllers, specifically LSTM networks. LSTM networks were originally introduced by Hochreiter and Schmidhuber [18]. These networks were specifically designed to store information over many iterations. The LSTM architecture used in this study is similar to the one described by Gers and Schmidhuber [16].

Two controller architectures were compared, one that received LP information and one that did not (see Figure 3). Both architectures were LSTM networks with a single hidden layer. While not pictured in Figure 3, the hidden units were recurrent, as is typical in LSTM networks. Additionally, both architectures shared the characteristic that the motor commands they produced were received as inputs the next time the network was fired. The only difference between the two architectures was that one received four additional inputs representing the LP information. These inputs corresponded to the robot's x position, y position, and angle, with the angular information divided into two inputs, namely the sine and cosine of the angle. This encoding was used to help the NNs better interpret the angle. The size of the hidden layer was varied for each experiment to allow for task complexity. In certain experiments, additional inputs were given to the networks; the details of these are provided with each experiment's own details (Sections 5–9). All NN weights were initialized using a uniform distribution with a standard deviation of 0.28 and a mean of 0.
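The four LP inputs can be sketched as follows. Splitting the angle into its sine and cosine gives the network a representation that is continuous across the 0/2π wrap-around, rather than one with a sudden jump.

```python
import math

def lp_inputs(x, y, angle):
    """Build the four LP inputs: x and y position, plus the angle
    encoded as its sine and cosine."""
    return [x, y, math.sin(angle), math.cos(angle)]
```

For example, headings of 0.01 and 2π − 0.01 radians are physically almost identical, and this encoding maps them to nearly identical input vectors.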

Figure 3.

The standard neural network architectures.


### 4.4 Analysis

Controllers were evaluated in both simulation and the real world. The number of successful evaluations and the duration of the evaluations were recorded. This data allowed a quantitative comparison between the performance of the LP and NLP controllers. Furthermore, the paths taken by the robot in the real world were recorded. This allowed controller behaviors to be examined and compared qualitatively.

## 5 Investigation 1

This investigation offers an initial comparison between LP and NLP controllers. In the experiment of this section, robotic controllers were required to navigate the DTM in the typical manner. More specifically, controllers started at the beginning of the maze and were instructed, for the duration of their trials, to navigate toward a specific corner.

The robot was placed at the starting point of the maze (see Figure 2) and was instructed to reach one of the four corners. Controllers were informed which corner they were expected to navigate toward, via additional inputs into their NN. Controllers received four additional inputs, each corresponding to a corner. To instruct a controller to navigate to a chosen corner, the corner's corresponding input was set to a value of 1, while the rest were set to 0. These values did not change for the duration of the controller's evaluation. Once any of the four corners was reached, the evaluation was stopped, and a fitness score was calculated. Each corner had a 12-cm-radius zone associated with it, and if the robot entered this zone, the corner was considered reached.
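The one-hot corner instruction and the 12-cm reached test can be sketched directly (corner names and coordinate layout are illustrative):

```python
import math

CORNERS = ["top_left", "top_right", "bottom_left", "bottom_right"]

def corner_instruction(target):
    """One-hot input vector telling the controller which corner to
    reach; the chosen corner's input is 1 and the rest are 0."""
    return [1.0 if c == target else 0.0 for c in CORNERS]

def corner_reached(robot_xy, corner_xy, radius_cm=12.0):
    """A corner counts as reached once the robot enters the
    12-cm-radius zone associated with it."""
    dx = robot_xy[0] - corner_xy[0]
    dy = robot_xy[1] - corner_xy[1]
    return math.hypot(dx, dy) <= radius_cm
```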

### 5.2 Fitness Function

The robotic controllers were trained, in simulation, with an EA. The walls of the maze were simulated from a sensory point of view, but not physically: the infrared sensors were simulated as if the walls were present, but the motion of the robot did not take the walls into account. This significantly simplified the simulation, at the cost of allowing the simulated robot to move through walls. The simplification was reasonable because a controller that solves the task efficiently must avoid the walls anyway; collisions therefore did not need to be simulated.

Checkpoints were set at the intersections in the maze. A controller would need to pass through these checkpoints before reaching the instructed destination for the execution to be considered a success. In Figure 2 there are three intersections, namely the left, middle, and right intersections.

Consider the example where a controller was instructed to reach the top left corner of the maze. The controller would first need to pass through the middle intersection, followed by the left intersection, before finally navigating to the destination. This was worked into the fitness function through the terms D and P. Let D be the smallest distance (in centimeters) achieved between the robot and the current unreached intersection; once all intersections had been reached, D was the smallest distance achieved between the robot and the final destination. Let P be the sum of the distances (in centimeters) along the remaining checkpoints. For example, suppose a controller was instructed to reach the top right corner but did not move the robot at all, leaving it at its starting point. Then D would be the distance between the robot and the middle intersection, and P would be the distance between the middle intersection and the right intersection plus the distance between the right intersection and the top right corner.

Let R be the number of motor commands executed while outside the bounds of the maze. Recall that in simulation the robot could move through the walls of the maze; R was included in the fitness function to discourage this behavior. Let C be the total number of motor commands executed during a trial; this term discouraged controllers from moving slowly. Trained controllers typically required approximately 70 to 80 motor commands to navigate to an assigned corner. Trials were stopped if the robot reached any corner (even an incorrect one) or if the number of motor commands executed exceeded a limit.

To further discourage controllers from colliding with the walls of the maze, the term K was included in the fitness function. After the execution of each motor command, the nine sensor values of the robot were read. For the tth motor command, these values were S1(t), S2(t), …, S9(t). Sensor values lay approximately in the range [0, 4000], where a larger value indicated that the sensor was closer to an object. The following function was used to normalize these values:
$\nu\left(S_i(t)\right) = \begin{cases} 0, & S_i(t) < 3000, \\ \dfrac{S_i(t) - 3000}{1000}, & \text{otherwise.} \end{cases}$
Assuming nc motor commands were executed,
$K = \sum_{t=1}^{n_c} \nu\left(\max_{1 \le i \le 9} S_i(t)\right).$
The final fitness function, to be minimized, was as follows:
$F = D + P + 2R + C + 2K.$

Thus, in order to optimize this fitness function, controllers would need to navigate to the intended goal, stay within the bounds of the maze, minimize their number of motor commands, and keep their distance from the walls.
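The fitness terms above can be sketched in Python. The formulas for ν, K, and F come from the text; the per-step record layout (a dict with the nine sensor readings and an `outside` flag) is an assumption for illustration:

```python
def nu(s):
    """Sensor penalty from the text: zero below 3000, then linear."""
    return 0.0 if s < 3000 else (s - 3000) / 1000.0

def fitness(D, P, commands):
    """F = D + P + 2R + C + 2K for one trial, given the distance terms
    D and P and a list of per-step records (illustrative layout)."""
    C = len(commands)                                 # total motor commands
    R = sum(1 for c in commands if c["outside"])      # steps outside the maze
    K = sum(nu(max(c["sensors"])) for c in commands)  # wall-proximity penalty
    return D + P + 2 * R + C + 2 * K
```

Minimizing F therefore simultaneously rewards reaching the goal (D, P), staying inside the maze (R), moving quickly (C), and keeping clear of walls (K).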

### 5.3 Evaluation Criteria

A total of five controllers of each standard architecture (see Section 4.3) were evolved. The hidden layer of these controllers consisted of 20 LSTM cells. These controllers were evaluated in both simulation and the real world. Controller results were analyzed in terms of their number of successful trials. Furthermore, on successful trials, the number of motor commands necessary to complete the task was also analyzed. An evaluation was considered successful if a controller navigated to the indicated corner within a threshold number of motor commands. The motor command limit for this experiment was 150, and controllers were evolved for 3,000 generations.

### 5.4 Results

In simulation, each of the five controllers was evaluated 250 times per corner, for a total of 1,000 evaluations. In the real world each was evaluated twice per corner, for a total of eight evaluations. The results of these evaluations are given in Table 1. Figures 4–8 present five composite images of controller evaluations in the real world. The first four figures display successful evaluations of LP controllers, while Figure 8 displays an unsuccessful evaluation of an NLP controller.

Table 1.
Result summary of controller evaluations for investigation 1.

| Metric | Setting | Controller | 1 | 2 | 3 | 4 | 5 | Total | Average |
|---|---|---|---|---|---|---|---|---|---|
| Successful evaluations | Simulated | NLP | 917 | 837 | 960 | 692 | 725 | 4131/5000 | 83% |
| | | LP | 949 | 888 | 918 | 935 | 919 | 4609/5000 | 92% |
| | Real world | NLP | – | – | – | – | – | 30/40 | 75% |
| | | LP | – | – | – | – | – | 40/40 | 100% |
| Average number of motor commands | Simulated | NLP | 77 | 77 | 76 | 78 | 76 | – | 77 |
| | | LP | 74 | 75 | 76 | 77 | 75 | – | 76 |
| | Real world | NLP | 72 | 76 | 77 | 81 | 74 | – | 76 |
| | | LP | 75 | 73 | 76 | 77 | 78 | – | 75 |
Figure 4.

LP controller path I.

Figure 5.

LP controller path II.

Figure 6.

LP controller path III.

Figure 7.

LP controller path IV.

Figure 8.

NLP controller path.

From Table 1, overall, the LP controllers were more successful than their counterparts in both simulation and the real world. The performance difference in the real world was larger than the one in simulation. The LP controllers achieved a 100% success rate in the real world, while the NLP controllers only achieved 75%. This strongly supports the hypothesis that LP information enables the development of superior controllers. Furthermore, the success rates between NLP controllers fluctuated more than those of the LP controllers. For example, the difference in successful simulated evaluations between the best and worst NLP controllers was 268, or 26.8% of a controller's evaluations. The same statistic for LP controllers was 61, or 6.1%. This suggests that LP information allows for more consistent production of controllers.

The lower half of Table 1 shows the average number of motor commands performed by the controllers during an evaluation, if only successful evaluations are considered. The results within this portion of the table do not appear to show any significant difference between controller types.

## 6 Investigation 2

The previous investigation demonstrated that the proposed approach of providing simulated location information to a real-world robot does result in a more effective and robust controller. However, the argument can be made that the controller itself should be able to learn to produce the information provided by the simulator (especially since the simulator in this case is also a NN), thus making the suggested technique unnecessary. This section investigates the possibility that NLP controllers could learn their own form of location perception. The NLP architecture was extended to contain additional recurrent connections and also explicitly encouraged to simulate its own location information.

### 6.1 Extended Architectures

In order to allow NLP controllers to learn their own form of location perception, a new NN architecture was required (Figure 9). This architecture extended the one in Figure 3(a) by adding additional recurrent connections between the output and input layer. These connections were initialized with the robot's initial position. Furthermore, each of these outputs was accumulated before being passed to the input layer (with the exception of controller type E2, introduced below). That is, the value of one of these input neurons, on the controller's nth execution, would be equal to the sum of the previous n − 1 outputs of its corresponding output neuron. Thus, controllers were required to simulate the change in LP parameters and not the actual values.
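A minimal sketch of this accumulation scheme, assuming the extra outputs are per-step deltas in x, y, and angle (names are illustrative):

```python
class AccumulatedRecurrence:
    """Feed back the running sum of a network's extra outputs.

    The network predicts *changes* in the LP parameters; the state is
    initialized with the robot's initial position, so on step n the
    recurrent input equals that position plus the sum of the previous
    n - 1 predicted deltas.
    """
    def __init__(self, initial_position):
        self.state = list(initial_position)

    def next_inputs(self, predicted_deltas):
        # Accumulate the latest predicted changes into the state that
        # will be fed back to the input layer on the next execution.
        for i, d in enumerate(predicted_deltas):
            self.state[i] += d
        return list(self.state)
```

Controller type E2, described below, is the exception: its outputs are fed back without this accumulation.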

Figure 9.

The extended neural network architecture.


Four types of controllers used this architecture; they are referred to as E1, E2, E3, and E4. As was the case in investigation 1, the hidden layer of all controllers consisted of 20 LSTM cells.

The first experiment served to prove that the controller architecture was in fact capable of simulating the controller outputs. Controller type E1 was created for this purpose. The outputs of this controller type were treated in the same way as the SNN used in this study. That is, this network had three additional outputs. These outputs corresponded to the x position, y position, and angle of the robot. However, the sine and cosine functions were applied to this angle before passing it as an input to the controller. Thus, this network still had four additional inputs. Since this experiment was only concerned with the question of whether the architecture could simulate robotic movements or not, the controller did not have any motor outputs. Instead, a LP controller from investigation 1 was executed repeatedly in the DTM. All the inputs that this controller experienced were passed to the E1 controller. The E1 controller was then required to output the changes in x position, y position, and angle that the robot would experience.

A custom fitness function was created for E1 controllers. Let the outputs of the SNN after the tth motor commands be s1(t), s2(t), s3(t). Furthermore, let the values of the controller's three additional recurrent outputs be c1(t), c2(t), c3(t). Finally, assuming nc motor commands were executed, Q was calculated as follows:
$Q = \sum_{t=1}^{n_c} \sum_{i=1}^{3} \left| s_i(t) - c_i(t) \right|.$
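A sketch of Q, assuming the summand is the absolute difference between the SNN's location outputs and the controller's recurrent outputs (consistent with lower values indicating better simulation accuracy):

```python
def simulation_fitness(snn_outputs, controller_outputs):
    """Q: summed absolute difference between the SNN's three location
    outputs and the controller's three recurrent outputs, over all
    executed motor commands."""
    return sum(
        abs(s - c)
        for s_t, c_t in zip(snn_outputs, controller_outputs)
        for s, c in zip(s_t, c_t)
    )
```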

Controller type E2 was required to produce location information and motor commands and was the most basic and naive implementation tested. This controller did not have its outputs accumulated on its behalf, nor was its fitness function adjusted in any way. This network simply had four recurrent inputs and outputs, which were initialized with the robot's initial position. Thus, the network was not forced to simulate accurate location information in order to optimize the fitness function. This experiment served to determine if controllers would naturally develop their own form of location simulation. The same fitness function used in investigation 1 was used for E2.

Controller type E3 was created to investigate the performance of a controller that was specifically rewarded by the fitness function for accurately producing location information and had the benefit of having changes in location accumulated. E3 had its location outputs and fitness treated in the same way as type E1. However, this controller was also required to output motor commands, which would be trained in the same fashion as in investigation 1. The fitness function for E3 was the weighted sum of the fitness used in investigation 1, F, and the location fitness used in E1, Q, to give a combined fitness of
$F_{E3} = F + Q/4.$
Preliminary investigations showed that this weighting produced much better results than weighting both components equally.

Finally, controller type E4 was created, which aimed to prevent premature convergence to a suboptimal navigation strategy before an accurate location simulation ability had evolved. This controller type was the same in every way as type E3, except that initially the recurrent inputs of these controllers were overridden by the outputs of a separate (non-noisy) SNN. As evolution progressed, the effect of this override was reduced. Let g be the current evolutionary generation, and G be the total number of generations. The values of the additional network inputs were equal to
$s_i(t) \times \frac{G - g}{G} + c_i(t) \times \frac{g}{G}.$
Thus, these controllers would still have to simulate the results of their motor commands in order to minimize the fitness function. However, in the early stages of evolution these controllers would not need to rely on their own undeveloped form of location perception.
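The E4 input schedule can be written directly (a sketch; argument names are illustrative):

```python
def blended_lp_input(s, c, g, G):
    """Blend the SNN output `s` with the controller's own recurrent
    output `c` according to the generation schedule above.

    Early in evolution (g small) the input is dominated by the SNN;
    by the final generation (g = G) the controller relies entirely on
    its own simulated value.
    """
    return s * (G - g) / G + c * g / G
```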

### 6.2 Results

All controllers were tested in simulation. In the upper half of Table 2, the success rates of 1,000 simulated evaluations are shown for five controllers of each type. The results of the NLP controllers of investigation 1 have been included in the table for comparison. As E1 controllers acted only as simulators, no success rates appear for this controller type. The lower half of Table 2 presents the average simulation fitness, Q. A lower number represents better accuracy. Since NLP and E2 controllers did not incorporate simulation fitness, no results appear for these controller types.

Table 2.
Result summary of simulated NLP controller evaluations for investigation 2.

| Metric | Controller | 1 | 2 | 3 | 4 | 5 | Total | Average |
|---|---|---|---|---|---|---|---|---|
| Successful evaluations (simulated) | NLP | 917 | 837 | 960 | 692 | 725 | 4131/5000 | 83% |
| | E2 | 610 | 649 | 746 | 712 | 752 | 3469/5000 | 69% |
| | E3 | 738 | 881 | 708 | 645 | 0 | 2972/5000 | 59% |
| | E4 | 825 | 837 | 600 | 0 | 927 | 3189/5000 | 64% |
| Average simulation fitness Q | E1 | 337 | 324 | 302 | 264 | 357 | – | 316 |
| | E3 | 396 | 373 | 416 | 401 | 137 | – | 345 |
| | E4 | 362 | 402 | 399 | 198 | 344 | – | 341 |

Figures 10 and 11 present two paths of E1 controllers. The black line represents the actual path of the controller, the red line represents the SNN output, and the blue line represents the path simulated by the controller. These figures indicate that the controller architecture was capable of simulating the robot's movements to a degree; thus the NN architecture is not fundamentally incapable of performing this task.

Figure 10.

E1 controller path 1.

Figure 11.

E1 controller path 2.

The imperfect location simulation is not surprising when considering that the SNN used in this work was trained using a large set of training data gathered from random robot movements [38]. During the evolution of E1 controllers, only a small set of movements are available to train the simulator (the movements resulting from the robot navigating a single path), thus providing very limited training data, which can easily lead to overfitting.

Controller type E2, which had no fitness function adjustments applied to it, did not perform as well as the regular NLP controllers of investigation 1. Visualizing the values of the additional recurrent outputs shows that these controllers did not use the outputs for the purpose of simulation. Figures 12 and 13 present two paths taken by E2 controllers. Thus, simply providing controllers with the necessary recurrent outputs did not result in them learning to simulate their own location information. In fact, it reduced the performance of controllers, possibly due to the more complex search space created as a result of the recurrent connections.

Figure 12.

E2 controller path 1.

Figure 13.

E2 controller path 2.

The first notable point about the E3 and E4 results was that one controller of each type achieved a success rate considerably lower than those of the E2 and NLP controllers. Upon examining the behavior of these controllers, it became clear that the evolutionary algorithm (EA) driving the training process had become stuck in a suboptimal area of the search space. During the initial stages of the evolution process, the location simulation component of the fitness function is most easily minimized by providing the robot with motor commands that result in minimal movement: as long as the controller's simulated outputs are small, it receives a near-optimal location simulation score. However, this prevents the development of controllers that can actually solve the task. The two controllers that achieved a success rate of 0 both displayed this suboptimal behavior.

The second notable point was that the simulation scores of the E3 and E4 controllers were similar to those of the E1 controllers, suggesting that these controllers managed to (at least partially) simulate their actions. Nevertheless, their success rates were lower than those of the NLP controllers. Viewing the paths of these controllers offered an explanation. Figures 14 and 15 show paths of E3 and E4 controllers that achieved success rates greater than 0. These controllers learnt to simulate a generic path that always received a moderate simulation score but did not represent the actual path of the robot. Since the success rates of these controllers were worse than those of the NLP and E2 controllers, it appears that even though controllers learnt to solve the task without simulation, forcing them to simulate their outputs ultimately reduced their ability to perform the actual task.

Figure 14.

E3 controller path.

Figure 15.

E4 controller path.

The results of the E1 controllers showed that the architecture was capable of simulating robotic movement. However, the results of controller types E3 and E4 showed that controller and simulator behaviors could not be trained simultaneously, because the EA exploited simplistic but suboptimal solutions.

Preliminary experiments explored different weightings of the controller and simulator components of the fitness function. However, when one component was favored too heavily, only one behavior was optimized, and in turn it prevented the other behavior from evolving. If the simulation behavior was favored, this promoted the development of controllers that produced minimal movement. However, if controller behavior was favored, the simulation fitness component was overshadowed by the controller component. This led to controllers producing inaccurate simulation results. In this case, since the simulation outputs of the network did not benefit the controller, they ultimately acted as a distraction to the EA. However, the weighting of each fitness component was not the primary reason why these behaviors could not be evolved simultaneously; rather, the interdependence of the two components prevented simultaneous independent evolution.

A solution could be to train these two components separately: first the simulation component and then the controller component. This could be achieved by assigning a portion of the NN's weights to simulation and freezing these once trained, or creating a NN for each component. However, such an approach would be similar to the one explored in investigation 1.

## 7 Investigation 3

While the first investigation did indicate that location perception was beneficial to controllers, it was not obvious to what extent the controllers were using the LP data. This experiment involved a more complex task in which positional information would be more beneficial to controllers.

Controllers would be started at a random location in the left or right vertical corridor of the DTM. From this point the controllers would be required to navigate to the starting point of the DTM. If controllers were to solve this task as efficiently as possible, they would be required to understand their starting position and follow the shortest path to the starting location.

Due to the nature of this task, using the standard NLP architecture (Section 4.3) would not have allowed a fair comparison: for LP controllers to function, they must be given an initial point upon which to base their LP calculations, and if LP controllers receive this information, then for a fair comparison NLP controllers should receive it as well. For this reason, the NLP architecture used was the one described in Section 6.1. Given the results of investigation 2, where the extended architectures performed worse than the standard NLP architecture, this may seem a poor choice of controller type. However, a preliminary experiment compared the performance of regular NLP controllers with that of type E1 controllers, and it showed that, for this experiment, controllers of type E1 performed better than controllers of the standard NLP architecture.

### 7.2 Fitness Function

The fitness function used in this experiment was similar to the one used in investigation 1 (Section 5). With reference to the previously used fitness function, the only terms that changed were D and P. The checkpoints that were used to calculate the values of these terms were adjusted to follow a path appropriate to this task.

### 7.3 Evaluation Criteria

Five controllers of each architecture were evolved in simulation. The hidden layer of these controllers consisted of 16 LSTM cells. Controller performance was analyzed in the same terms as previously, specifically, the number of successful trials and the number of motor commands on successful trials. The motor command limit of this experiment was 200, and controllers were evolved for 4,000 generations.

### 7.4 Results

In simulation, 500 evaluations were performed when starting in the left corridor and a further 500 evaluations when starting in the right corridor, totaling 1,000 evaluations per controller. In the real world, this was reduced to four evaluations per corridor, totaling eight evaluations per controller. The results of these evaluations appear in Table 3.

Table 3.
Result summary of controller evaluations for investigation 3.

| Metric | Setting | Controller | 1 | 2 | 3 | 4 | 5 | Total | Average |
|---|---|---|---|---|---|---|---|---|---|
| Successful evaluations | Simulated | NLP | 233 | 532 | 302 | 561 | 510 | 2138/5000 | 43% |
| | | LP | 997 | 957 | 909 | 971 | 917 | 4751/5000 | 95% |
| | Real world | NLP | – | – | – | – | – | 13/40 | 33% |
| | | LP | – | – | – | – | – | 25/40 | 63% |
| Average number of motor commands | Simulated | NLP | 137 | 121 | 133 | 123 | 118 | – | 126 |
| | | LP | 74 | 66 | 66 | 68 | 69 | – | 69 |
| | Real world | NLP | – | – | 138 | 117 | 120 | – | 125 |
| | | LP | 103 | 62 | 70 | 71 | 79 | – | 77 |

According to Table 3, in both simulation and the real world, LP controllers significantly outperformed their counterparts on both performance measures. Figures 16 and 17 present a selection of paths taken by controllers in the real world. Figure 16 shows paths of NLP controllers, and Figure 17 shows paths of LP controllers.

Figure 16.

NLP controller paths. S ≡ starting position; G ≡ intended goal.

Figure 17.

LP controller paths. S ≡ starting position; G ≡ intended goal.

These figures show that the LP controllers generalized their behavior so that they could navigate directly toward the beginning of the maze, regardless of the side of the maze on which they started. However, as the top left and bottom right paths of Figure 16 show, certain NLP controllers learnt a simple wall-following behavior. This behavior led to a successful result when starting on the left side of the maze, but failed when starting on the right side: following a wall may, depending on the starting point, lead the robot to the opposite corridor rather than the start of the maze. In these cases, controllers reached the maximum number of motor commands and the evaluation was considered a failure. In the other paths pictured in Figure 16, certain NLP controllers displayed more efficient behaviors that did not simply follow a wall on one side; however, these paths were jagged and indecisive compared to the paths in Figure 17. Furthermore, Table 3 shows that NLP controllers always used, on average, more motor commands than LP controllers. This is evidence that the LP information enabled LP controllers to develop superior behaviors.

## 8 Investigation 4

At this point, LP controllers had been able to complete all investigated tasks to an acceptable degree. This experiment looked to test the limits of the controllers. Not only was the task of this experiment more complicated, it was also longer in duration than the previous tasks. As was previously highlighted, it was expected that LP information would diverge over time. This experiment served to test the usefulness of location perception over a longer duration.

### 8.1 Task
Controllers were required to visit all four corners of the DTM, in a specified order, in one evaluation. However, to further complicate the task, each controller was required to navigate four distinct orders of corners. The four corner patterns were as follows:

- Bottom left, top right, top left, bottom right
- Bottom right, top left, top right, bottom left
- Top left, bottom right, bottom left, top right
- Top right, bottom left, bottom right, top left

Once a controller had navigated to the correct corner in a pattern, it would then be allowed to navigate to the next one. If a controller navigated to an incorrect corner, the evaluation would stop.

### 8.2 Fitness Function

The fitness function used a checkpoint system similar to the one described in Section 7.2. Checkpoints were set at the intersections or corners in an order that controllers were expected to follow. For example, consider the first pattern of corners in Section 8.1. The order of checkpoints would be: middle intersection, left intersection, bottom left corner, left intersection, middle intersection, right intersection, and so on. Besides this, the factors of the fitness function remain unchanged.
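Under the stated checkpoint ordering, a sequence builder might look like this. The maze topology encoded in `NEAREST` (which intersection serves each corner, with the start below the middle intersection) is inferred from the description of Figure 2; names and structure are illustrative:

```python
# The intersection serving each corner of the double T maze.
NEAREST = {
    "bottom left": "left", "top left": "left",
    "bottom right": "right", "top right": "right",
}

def checkpoint_sequence(pattern):
    """Expand a corner pattern into the ordered checkpoint list.

    The robot starts at the maze entrance below the middle
    intersection; to move between corners on opposite arms it must
    pass back through the middle intersection.
    """
    checkpoints, at = [], None
    for corner in pattern:
        target = NEAREST[corner]
        if at is None:                       # from the start, via the middle
            checkpoints += ["middle", target]
        elif at != target:                   # cross arms via the middle
            checkpoints += [at, "middle", target]
        else:                                # next corner on the same arm
            checkpoints += [at]
        checkpoints.append(corner)
        at = target
    return checkpoints
```

For the first pattern this reproduces the ordering given above: middle, left, bottom left, left, middle, right, and so on.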

### 8.3 Setup

Controllers were informed of which pattern they were expected to follow via four additional inputs into their NNs. Each input corresponded to a different pattern. The values of these inputs did not change for the duration of the execution. Since the initial position and orientation of the robot in this experiment were not randomized as they were in investigation 3, none of the extended architectures discussed in Section 6.1 were used. Thus, the only two network types that were trained and tested were those discussed in Section 4.3.

### 8.4 Evaluation Criteria

As was the case previously, controller performance was examined in terms of successful evaluations and number of motor commands on successful evaluations. However, due to the complexity of this task, it was not expected that controllers would succeed at the task often. To address this, the number of correct corners reached during evaluations was also recorded. The motor command limit of this experiment was 500, and controllers were evolved for 5,000 generations. The hidden layer of these controllers consisted of 36 LSTM cells.

### 8.5 Results

In simulation, each controller was required to perform each pattern 250 times, totaling 1,000 evaluations per controller. In the real world, this was reduced to twice per pattern, totaling eight evaluations per controller.

The successful evaluation and motor command metrics in Table 4 do not describe controller performance as informatively as before. Most controllers learnt to solve the task only partially, and since the first two metrics award nothing for partial success, by these metrics alone every controller appears to be a failure. The complexity of the task should be stressed: controllers were required to learn four patterns, navigate them on demand using a robot that behaved nondeterministically, and avoid walls detected by noisy sensors. The controllers' poor performance is therefore not surprising.

Table 4.
Result summary of controller evaluations for investigation 4.

| Metric | Setting | Controller | 1 | 2 | 3 | 4 | 5 | Total | Average |
|---|---|---|---|---|---|---|---|---|---|
| Successful evaluations | Simulated | NLP | 0 | 0 | 0 | 0 | 0 | 0/5000 | 0% |
| | | LP | 23 | 40 | 43 | | | 122/5000 | 2% |
| | Real world | NLP | 0 | 0 | 0 | 0 | 0 | 0/40 | 0% |
| | | LP | 0 | 0 | 0 | 0 | 0 | 0/40 | 0% |
| Average number of motor commands | Simulated | NLP | – | – | – | – | – | – | – |
| | | LP | 488 | 462 | 453 | 462 | 459 | – | 465 |
| | Real world | NLP | – | – | – | – | – | – | – |
| | | LP | – | – | – | – | – | – | – |
| Corners reached | Simulated | NLP | 1423 | 341 | 1212 | 901 | 1233 | 5110/20000 | 26% |
| | | LP | 1803 | 2127 | 1789 | 1757 | 1935 | 9411/20000 | 47% |
| | Real world | NLP | 11 | 11 | | | | 34/160 | 21% |
| | | LP | 17 | 11 | 12 | 15 | | 61/160 | 38% |

Neither controller type demonstrated any ability to solve the task entirely. However, while NLP controllers had no successful evaluations at all, LP controllers did have rare successful evaluations in simulation.

By considering the number of corners reached, it is possible to better compare the two controller types. This metric indicates that controllers did partially solve the task. In this case, the NLP controllers were, again, outperformed by their LP counterparts.

As mentioned previously, LP information becomes less reliable as evaluations progress, because each approximation is based on the previous one. This experiment therefore also served to test the usefulness of LP data over a longer duration. In Figure 18, the actual path of the robot is shown by the black line and the LP path by the red line. The figure shows that, in these cases, the LP data diverged from the real-world position to the point where it was likely no longer useful to the controller. The LP information started out accurately but worsened as more motor commands were executed, as expected. Consider the top right path of Figure 18: at the beginning of this path, the LP data was relatively accurate, but after the second turn the LP path was, at times, outside the bounds of the maze, far from any location the robot could actually have occupied, and in an inaccurate orientation. In such cases, the LP information was likely more misleading than beneficial to the controller. As time passes and more motor commands are executed, the LP data diverges and the advantage that LP controllers have over their counterparts diminishes. This is an inherent limitation of the approach. Investigation 5 introduces a method of reducing this divergence and examines the performance of LP controllers under these conditions.

Figure 18.

LP controller paths with LP data.


## 9 Investigation 5

In investigation 4, no controller was able to consistently solve the entire task. While the LP controllers outperformed their counterparts, they still did not solve the task entirely. Inspection of the LP data showed that, as evaluations progressed, the data diverged to the point where it became arguably useless or even misleading. This divergence is an unfortunate result of the nondeterministic nature of the robot's actions: motor commands do not always have the same effect on the robot, and infrared sensors do not always produce the same values under the same conditions. Since the SNN bases its calculations on its past approximations, any errors it makes compound on one another.

The approach to reducing this divergence investigated in this study involves periodically correcting the LP information. As an analogy, imagine navigating around a room while blindfolded. As time passes and one moves around, one's sense of position relative to the room drifts. Briefly removing the blindfold and seeing the surrounding room, however, would recalibrate one's internal sense of position. A more robot-relevant example is a robot navigating around a factory. At points in the factory where the robot's exact position is crucial, sensors could be in place to detect the robot's position, and this information could be used to override the robot's LP position. Between these points, where precise positioning is not necessary, the robot could rely on its LP position alone.

This investigation carried out the same experiment as that of investigation 4. However, only LP controllers were examined, and furthermore, these controllers had their LP data periodically reset. This was accomplished by marking each corner of the maze as a reset zone (the gray areas in Figure 19). When controllers were in these reset zones, their LP position would be overridden by their actual position. The real-world position of the robot was obtained from a camera that overlooked the environment in which the controllers were evaluated. The remaining details of this experiment did not differ from investigation 4.
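The reset mechanism can be sketched as a simple pose-override rule. The zone coordinates and the camera interface below are hypothetical placeholders; the paper marks the maze corners as reset zones and reads the true pose from an overhead camera:

```python
# Each reset zone is an axis-aligned rectangle: (x_min, y_min, x_max, y_max).
# These corner coordinates are illustrative, not the paper's maze dimensions.
RESET_ZONES = [
    (0.0, 0.0, 0.3, 0.3),
    (1.7, 0.0, 2.0, 0.3),
    (0.0, 1.7, 0.3, 2.0),
    (1.7, 1.7, 2.0, 2.0),
]

def in_reset_zone(x, y):
    """True if the point lies inside any reset zone."""
    return any(x0 <= x <= x1 and y0 <= y <= y1
               for x0, y0, x1, y1 in RESET_ZONES)

def corrected_pose(lp_pose, camera_pose):
    """Override the LP estimate with the camera pose inside a reset zone.

    Outside the zones the (possibly diverged) LP estimate is kept, so no
    extra sensing is needed where precise positioning is unnecessary.
    """
    x, y, _heading = camera_pose
    return camera_pose if in_reset_zone(x, y) else lp_pose
```

When the robot's actual position falls inside a gray zone of Figure 19, the LP pose snaps to the camera-reported pose; everywhere else the controller continues to rely on its own estimate.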

Figure 19.

The double T maze with indicated reset zones.


### 9.2 Results

Compared to the results of investigation 4, Table 5 shows an improvement in all regards. The number of successful evaluations shows that not all controllers managed to solve the task entirely; previously, however, no controller had managed to do so. Furthermore, even the controllers that did not solve the task consistently, or at all, reached more corners than the controllers in investigation 4 did. In Figure 20, two successful real-world evaluations are pictured; the LP corrections are depicted by the blue arrows.

Table 5.
Result summary of controller evaluations for investigation 5.

| | Controller | 1 | 2 | 3 | 4 | 5 | Total (average) |
|---|---|---|---|---|---|---|---|
| Successful evaluations | Simulated LP | 407 | 371 | 127 | 0 | 175 | 1080/5000 (22%) |
| | Real world LP | | | | | | 9/40 (23%) |
| Avg. motor commands | Simulated LP | 432 | 453 | 436 | – | 454 | 444 |
| | Real world LP | 442 | 461 | 420 | – | 440 | 441 |
| Corners reached | Simulated LP | 3134 | 2830 | 2098 | 2001 | 2666 | 12729/20000 (64%) |
| | Real world LP | 21 | 17 | 18 | 18 | 26 | 100/160 (63%) |
Figure 20.

LP controller paths with corrected LP data.


In Figure 20, several of the corrections made to the LP data were considerable, for example, the top left correction of the second path. Nevertheless, these LP paths resemble the actual path of the robot more closely than those shown in Figure 18. These corrections made the LP information more reliable, which allowed the evolution of controllers that relied more heavily on this information, enabling them to solve the task.

## 10 Conclusion

This study demonstrated, in a number of navigation scenarios, that evolved controllers can make effective use of simulated location information to control a real-world robot. Providing a sense of location resulted in the more consistent production of superior controllers in a simple double T maze scenario. As long as a lightweight simulator is used, more advanced controllers can be created with few additional processing requirements and minimal human intervention. Various experiments also demonstrated that controllers were unable to learn the navigation task and simulate their own location information simultaneously.

The benefits of the proposed approach were further demonstrated by increasing the complexity of the navigation task: the robot was started at random locations in the maze and was required to return to the beginning. The location perception controllers consistently outperformed their counterparts. Notably, location perception controllers developed intelligent behaviors not seen in regular controllers and were also able to complete the task in less time. It was again encouraging that the benefits of the proposed approach were evident in the real-world experiments, not just in simulation.

The most obvious potential problem with using simulated inputs is that the real and the simulated environment will never be exactly the same. Differences will also compound over time, thus rendering simulated inputs less and less useful. The limits of the proposed approach were tested by further increasing the complexity of the navigation task. The robot now had to visit all corners of the double T maze in an order specified at run time. This is a challenging task, which has not been attempted in evolutionary robotics before. This task highlighted the divergence of the location perception data over time. In certain cases, the location perception data diverged to the point that it no longer remotely represented the robot's position or orientation. This result indicated that the proposed approach would only be useful for short-term tasks. However, it was shown that by periodically resetting simulated information at checkpoints in the maze, controllers could be evolved that could successfully complete the task. Regular controllers appeared unable to effectively solve this task, which is possibly why it does not appear in previous literature.

Attempts at improving the evolutionary robotics process often involve altering the evolutionary procedure or controller architectures. These changes can be both time-consuming and incompatible with one another. Provided one has access to a simulator (preferably a simulator neural network), the process of adding location perception information to controllers does not necessarily involve much effort. Furthermore, this implementation does not result in any significant change to the typical evolutionary robotics process. Thus, this should allow this approach to be compatible with any previously investigated improvement(s) to the evolutionary robotics process. Finally, while this study did make use of an evolutionary robotics approach to create controllers, this approach to generating location perception information need not be limited to the evolutionary robotics field. For example, the field of simultaneous localization and mapping, while not directly related to evolutionary robotics, could also make use of simulator neural networks to approximate the effects of a controller's commands on a robot's position.
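The closing suggestion, feeding a learned forward model's pose estimate back into a controller as extra inputs, can be sketched generically. The kinematic forward model and the controller and sensor interfaces below are placeholders standing in for the paper's simulator neural network:

```python
import math

def forward_model(pose, command):
    """Placeholder forward model: predict the next pose from the current
    estimate and a (distance, turn) motor command.  A trained simulator
    neural network would play this role in the proposed approach."""
    x, y, heading = pose
    dist, turn = command
    heading += turn
    return (x + dist * math.cos(heading),
            y + dist * math.sin(heading),
            heading)

def run_controller(controller, read_sensors, steps=100):
    """Drive a controller whose inputs include a simulated pose.

    The pose is never sensed directly: it is predicted from the previous
    prediction and the command just issued, exactly the feedback loop that
    makes the estimate drift without periodic correction."""
    pose = (0.0, 0.0, 0.0)
    for _ in range(steps):
        command = controller(read_sensors() + list(pose))
        pose = forward_model(pose, command)
    return pose

# A trivial controller that always drives straight ahead:
final_pose = run_controller(lambda inputs: (1.0, 0.0), lambda: [], steps=10)
```

The same loop applies outside evolutionary robotics; a localization system could use any learned forward model in place of `forward_model` to propagate a pose estimate between sensor updates.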

## Acknowledgments

This work is based on research supported wholly by the National Research Foundation (NRF) of South Africa (grant number: 117128).

## References

1
Blynel
,
J.
, &
Floreano
,
D.
(
2003
).
Exploring the T-Maze: Evolving learning-like robot behaviors using CTRNNS
. In
Workshops on Applications of Evolutionary Computation
(pp.
593
604
).
Berlin, Heidelberg
:
Springer
.
2
Byrne
,
J. H.
(
2017
).
Learning and memory: A comprehensive reference
.
New York
:
.
3
Carruthers
,
G.
(
2008
).
Types of body representation and the sense of embodiment
.
Consciousness and Cognition
,
17
(
4
),
1302
1316
.
4
Carvalho
,
J. T.
, &
Nolfi
,
S.
(
2016
).
.
PLOS ONE
,
11
(
8
),
e0160679
.
5
Cully
,
A.
,
Clune
,
J.
,
Tarapore
,
D.
, &
Mouret
,
J.-B.
(
2015
).
Robots that can adapt like animals
.
Nature
,
521
(
7553
),
503
.
6
Duarte
,
M.
,
Oliveira
,
S.
, &
Christensen
,
A. L.
(
2012
).
Hierarchical evolution of robotic controllers for complex tasks
. In
2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL)
(pp.
1
6
).
New York
:
IEEE
.
7
Dudchenko
,
P. A.
(
2001
).
How do animals actually solve the T maze?
Behavioral Neuroscience
,
115
(
4
),
850
.
8
Durrant-Whyte
,
H.
, &
Bailey
,
T.
(
2006
).
Simultaneous localization and mapping: Part I
.
IEEE Robotics & Automation Magazine
,
13
(
2
),
99
110
.
9
Etienne
,
A. S.
, &
Jeffery
,
K. J.
(
2004
).
Path integration in mammals
.
Hippocampus
,
14
(
2
),
180
192
.
10
Etienne
,
A. S.
,
Maurer
,
R.
, &
Séguinot
,
V.
(
1996
).
Path integration in mammals and its interaction with visual landmarks
.
Journal of Experimental Biology
,
199
(
1
),
201
209
.
11
Fernandez-Leon
,
J. A.
,
Acosta
,
G. G.
, &
Mayosky
,
M. A.
(
2009
).
Behavioral control through evolutionary neurocontrollers for autonomous mobile robot navigation
.
Robotics and Autonomous Systems
,
57
(
4
),
411
419
.
12
Floreano
,
D.
, &
,
F.
(
1996
).
Evolution of homing navigation in a real mobile robot
.
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
,
26
(
3
),
396
407
.
13
Floreano
,
D.
, &
,
F
. (
1996
).
Evolution of plastic neurocontrollers for situated agents
. In
P.
Maes
,
M.
Mataric
,
J. A.
Meyer
,
J.
Pollack
, &
S. W.
Wilson
(Eds.),
From Animals to Animats, Proceedings of The Fourth International Conference on Simulation of Adaptive Behavior (SAB)
.
Cambridge, MA
:
MIT Press
.
14
Floreano
,
D.
, &
,
F.
(
1998
).
Evolutionary neurocontrollers for autonomous mobile robots
.
Neural Networks
,
11
(
7–8
),
1461
1478
.
15
Floreano
,
D.
, &
Nolfi
,
S.
(
1997
).
God save the Red Queen! Competition in co-evolutionary robotics
. In
J. R.
Koza
,
K.
Deb
,
M.
Dorigo
,
D. B.
Fogel
,
M.
Garzon
,
H.
Iba
, &
R. L.
Rido
(Eds.),
Genetic Programming 1997: Proceedings of the Second Annual Conference
(pp.
398
406
).
San Francisco
:
Morgan Kaufmann
.
16
Gers
,
F. A.
, &
Schmidhuber
,
J.
(
2000
).
Recurrent nets that time and count
. In
Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural computing: New challenges and perspectives for the new millennium
,
vol. 3
(pp.
189
194
).
New York
:
IEEE
.
17
Harvey
,
I.
,
Husbands
,
P.
,
Cliff
,
D.
, et al
(
1992
).
Issues in evolutionary robotics
.
School of Cognitive and Computing Sciences, University of Sussex
.
18
Hochreiter
,
S.
, &
Schmidhuber
,
J.
(
1997
).
Long short-term memory
.
Neural Computation
,
9
(
8
),
1735
1780
.
19
Jakobi
,
N.
(
1997
).
Half-baked, ad-hoc and noisy: Minimal simulations for evolutionary robotics
. In
Fourth European Conference on Artificial Life
,
vol. 4
(p.
348
).
Cambridge, MA
:
MIT Press
.
20
K-Team Corporation
. (
n.d.
).
Khepera III
. .
21
León
,
J. A. F.
,
Tosini
,
M.
, &
Acosta
,
G. G.
(
2004
).
Evolutionary reactive behavior for mobile robots navigation
. In
IEEE Conference on Cybernetics and Intelligent Systems, 2004
,
vol. 1
(pp.
532
537
).
New York
:
IEEE
.
22
Maes
,
P.
,
Mataric
,
M. J.
,
Meyer
,
J.-A.
,
Pollack
,
J.
, &
Wilson
,
S. W.
(
1996
).
Co-evolution of pursuit and evasion II: Simulation methods and results
. In
From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior
(pp.
506
515
).
Cambridge, MA
:
MIT Press
.
23
Merriam-Webster
. (
n.d.
).
Definition of exteroceptive
. .
24
Merriam-Webster
. (
n.d.
).
Definition of proprioceptive
. .
25
Metta
,
G.
,
Sandini
,
G.
,
Vernon
,
D.
,
Natale
,
L.
, &
Nori
,
F.
(
2008
).
The iCub humanoid robot: An open platform for research in embodied cognition
. In
Proceedings of the 8th Workshop on Performance Metrics for Intelligent Systems
(pp.
50
56
).
New York
:
ACM
.
26
Mitri
,
S.
,
Floreano
,
D.
, &
Keller
,
L.
(
2009
).
The evolution of information suppression in communicating robots with conflicting interests
.
Proceedings of the National Academy of Sciences of the USA
,
106
(
37
),
15786
15790
.
27
Nelson
,
A. L.
,
Grant
,
E.
,
Galeotti
,
J. M.
, &
Rhody
,
S.
(
2004
).
Maze exploration behaviors using an integrated evolutionary robotics environment
.
Robotics and Autonomous Systems
,
46
(
3
),
159
173
.
28
Nolfi
,
S.
(
1997
).
Evolving non-trivial behaviors on real robots: A garbage collecting robot
.
Robotics and Autonomous Systems
,
22
(
3–4
),
187
198
.
29
Nolfi
,
S.
, &
Floreano
,
D.
(
2000
).
Evolutionary robotics: The biology, intelligence, and technology of self-organizing machines
.
Cambridge, MA
:
MIT Press
.
30
Nolfi
,
S.
, &
Marocco
,
D.
(
2002
).
Active perception: A sensorimotor account of object categorization
. In
From Animals to Animats 7: Proceedings of the VII International Conference on Simulation of Adaptive Behavior
(pp.
266
271
).
Cambridge, MA
:
MIT Press
.
31
Nolfi
,
S.
, &
Parisi
,
D.
(
1996
).
Learning to adapt to changing environments in evolving neural networks
.
,
5
(
1
),
75
98
.
32
OpenCV
. (
n.d.
).
Open source computer vision library
.
Retrieved October 16, 2019, from https://opencv.org/about
.
33
Paine
,
R. W.
, &
Tani
,
J.
(
2004
).
Motor primitive and sequence self-organization in a hierarchical recurrent neural network
.
Neural Networks
,
17
(
8–9
),
1291
1309
.
34
Paine
,
R. W.
, &
Tani
,
J.
(
2005
).
How hierarchical control self-organizes in artificial adaptive systems
.
,
13
(
3
),
211
225
.
35
Phillips
,
A. P.
, &
du Plessis
,
M. C.
(
2019
).
Towards the incorporation of proprioception in evolutionary robotics controllers
. In
International Conference on Robotic Computing
.
New York
:
IEEE
.
36
Pretorius
,
C. J.
,
du Plessis
,
M. C.
, &
Cilliers
,
C.
(
2013
).
Simulating robots without conventional physics: A neural network approach
.
Journal of Intelligent & Robotic Systems
,
71
(
3–4
),
319
348
.
37
Pretorius
,
C. J.
,
du Plessis
,
M. C.
, &
Cilliers
,
C. B.
(
2009
).
Towards an artificial neural network-based simulator for behavioural evolution in evolutionary robotics
. In
Proceedings of the 2009 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists
(pp.
170
178
).
New York
:
ACM
.
38
Pretorius
,
C. J.
,
du Plessis
,
M. C.
, &
Cilliers
,
C. B.
(
2010
).
A neural network-based kinematic and light-perception simulator for simple robotic evolution
. In
2010 IEEE Congress on Evolutionary Computation (CEC)
(pp.
1
8
).
New York
:
IEEE
.
39
Pretorius
,
C. J.
,
du Plessis
,
M. C.
, &
Gonsalves
,
J. W.
(
2014
).
A comparison of neural networks and physics models as motion simulators for simple robotic evolution
. In
2014 IEEE Congress on Evolutionary Computation (CEC)
(pp.
2793
2800
).
New York
:
IEEE
.
40
Pretorius
,
C. J.
,
du Plessis
,
M. C.
, &
Gonsalves
,
J. W.
(
2017
).
Neuroevolution of inverted pendulum control: A comparative study of simulation techniques
.
Journal of Intelligent & Robotic Systems
,
86
(
3–4
),
419
445
.
41
Proske
,
U.
, &
Gandevia
,
S. C.
(
2012
).
The proprioceptive senses: Their roles in signaling body shape, body position and movement, and muscle force
.
Physiological Reviews
,
92
(
4
),
1651
1697
.
42
Savastano
,
P.
, &
Nolfi
,
S.
(
2012
).
Incremental learning in a 14 DOF simulated iCub robot: Modeling infant reach/grasp development
. In
Conference on Biomimetic and Biohybrid Systems
(pp.
250
261
).
Berlin, Heidelberg
:
Springer
.
43
Skinner
,
D. M.
,
Etchegary
,
C. M.
,
Ekert-Maret
,
E. C.
,
Baker
,
C. J.
,
Harley
,
C. W.
,
Evans
,
J. H.
, &
Martin
,
G. M.
(
2003
).
An analysis of response, direction and place learning in an open field and T maze
.
Journal of Experimental Psychology: Animal Behavior Processes
,
29
(
1
),
3
.
44
Soter
,
G.
,
Conn
,
A.
,
Hauser
,
H.
, &
Rossiter
,
J.
(
2018
).
Bodily aware soft robots: Integration of proprioceptive and exteroceptive sensors
. In
2018 IEEE International Conference on Robotics and Automation (ICRA)
(pp.
2448
2453
).
New York
:
IEEE
.
45
Tuthill
,
J. C.
, &
Azim
,
E.
(
2018
).
Proprioception
.
Current Biology
,
28
(
5
),
R194
R203
.
46
Urzelai
,
J.
,
Floreano
,
D.
,
Dorigo
,
M.
, &
Colombetti
,
M.
(
1998
).
Incremental robot shaping
.
Connection Science
,
10
(
3–4
),
341
360
.
47
Woodford
,
G. W.
, &
du Plessis
,
M. C.
(
2018
).
Robotic snake simulation using ensembles of artificial neural networks in evolutionary robotics
. In
Proceedings of the Genetic and Evolutionary Computation Conference
(pp.
173
180
).
New York
:
ACM
.
48
Woodford
,
G. W.
,
Pretorius
,
C. J.
, &
du Plessis
,
M. C.
(
2016
).
Concurrent controller and simulator neural network development for a differentially-steered robot in evolutionary robotics
.
Robotics and Autonomous Systems
,
76
,
80
92
.
49
Yu
,
X.
, &
Gen
,
M.
(
2010
).
Introduction to evolutionary algorithms
.
New York
: