## Abstract

The possibility of using competitive evolutionary algorithms to generate long-term progress is normally prevented by the convergence on limit cycle dynamics in which the evolving agents keep progressing against their current competitors by periodically rediscovering solutions adopted previously. This leads to local but not to global progress (i.e., progress against all possible competitors). We propose a new competitive algorithm that produces long-term global progress by identifying and filtering out opportunistic variations, that is, variations leading to progress against current competitors and retrogression against other competitors. The efficacy of the method is validated on the coevolution of predator and prey robots, a classic problem that has been used in related research. The accumulation of global progress over many generations leads to effective solutions that involve the production of articulated behaviors. The complexity of the behavior displayed by the evolving robots increases across generations, although progress in performance is not always accompanied by behavior complexification.

## 1 Introduction

Competitive coevolution—that is, the evolution of populations with coupled fitness—presents important potential advantages.

First, the coevolution of competing species such as predator and prey might favor the synthesis of evolutionary innovations. Indeed, an adaptation in one lineage (e.g., predators) may change the selection pressure on another lineage (e.g., prey), giving rise to a counter-adaptation. If this occurs reciprocally, “an unstable runaway escalation of ‘arms races’ may result” [1, p. 54]. In other words, adaptations on one side call for counter-adaptations on the other side, and the counter-adaptations call for more counter-adaptations and so on, thus producing an escalation process.

Second, competitive coevolution can potentially produce a self-regulating incremental process in which the complexity of the adaptive problem is tuned to the ability of the evolving agents and increases across generations. In fact, in competitive scenarios the complexity of the problem depends primarily on the efficacy of the competitors, which become better and better as the skills of the evolving agents increase. Exposing agents to progressively harder conditions and to conditions that match their skill level can facilitate the discovery of progressively better strategies [40].

Third, competitive coevolution methods constitute a natural choice for problems, such as game-playing, in which identifying an absolute quality measure is difficult or not possible [9].

Unfortunately, the occurrence of arms races leading to global progress for a prolonged period of time is only one of the possible outcomes of a competitive coevolutionary dynamics. “One side may drive the other to extinction; one side might reach a definable optimum, thereby preventing the other side from reaching its optimum; both sides might reach a mutual local optimum; or the race may persist in a theoretically endless limit cycle” [15, p. 70]. Moreover, prolonged progress against competitors does not necessarily imply global progress [32], that is, the development of strategies that are more and more effective not only against the current competitors (local progress) but also against ancient competitors (historical progress) and against new competitors (global progress).

Indeed, the competitive coevolutionary experiments carried out to date have not shown evidence of long-term global progress [31, 37]. The most common outcome is a limit cycle dynamic in which qualitatively similar strategies are abandoned and rediscovered over and over again [13, 38, 53]. To understand the nature of this dynamics, suppose that at a certain evolutionary stage population 1 adopts a strategy A that is effective against the strategy B currently adopted by the competing population 2. Imagine now that there is a strategy C (genetically similar to B) that is more effective against the strategy A. Population 2 will abandon the strategy B in favor of C. Imagine now that there is a strategy D (genetically similar to A) that is effective against C. Population 1 will abandon A in favor of D. Finally, imagine that the previous strategy B is effective against the strategy D. Population 2 will abandon C and will return to B. At this point, population 1 will also return to A (because, as explained above, A is effective against B). This implies that the two populations return to their initial strategies A and B and will then keep re-adopting C and D and A and B again and again. Cycling dynamics of this type have been found in natural evolution, for example, in populations of side-blotched lizards (Uta stansburiana) ([47]; for a discussion of less regular forms of cycling see [8]).
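The four-step cycle described above can be reproduced in a few lines of code. The sketch below is purely illustrative (the strategy names and the alternating update order are taken from the example in the text, not from any experiment): each population in turn switches to the best response against the opponent's current strategy, under the assumed counter-relations that C counters A, D counters C, B counters D, and A counters B.

```python
# Hypothetical counter-relations from the example in the text:
# C counters A, D counters C, B counters D, A counters B.
def counter_pop2(s1):          # population 2's best response to population 1
    return "C" if s1 == "A" else "B"

def counter_pop1(s2):          # population 1's best response to population 2
    return "D" if s2 == "C" else "A"

states = [("A", "B")]
for step in range(8):
    s1, s2 = states[-1]
    if step % 2 == 0:
        s2 = counter_pop2(s1)  # population 2 adapts
    else:
        s1 = counter_pop1(s2)  # population 1 adapts
    states.append((s1, s2))
# states cycles: (A,B) -> (A,C) -> (D,C) -> (D,B) -> (A,B) -> ...
```

The trajectory returns to the initial pair of strategies every four updates, which is exactly the limit cycle dynamic that prevents long-term progress.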

In this article we demonstrate for the first time how coevolutionary experiments involving predator and prey robots can lead to long-term global progress. This is achieved by using an anti-opportunistic algorithm that periodically divides the populations into a training and a validation set and uses the validation set to identify and to filter out the variations leading to local progress only, that is, to progress against the training opponents accompanied by retrogression against the validation opponents. Moreover, we demonstrate how the rate of global progress can be increased by exposing evolving robots to well-differentiated competitors and by preserving individuals displaying good performance against hard-to-handle competitors. Finally, we show how the behaviors of the robots evolved with our algorithm become progressively more complex across generations, although progress in performance is not always accompanied by behavior complexification.

In Section 2 we describe the method proposed and its relation to the state of the art. In Section 3 we report our results. In Section 4 we analyze the complexity of the behavior displayed by evolving agents across generations. In Section 5 we report additional experiments performed in a more complex environment. Finally, in Section 6, we draw our conclusions.

## 2 Method and Relation to the State of the Art

Competitive coevolution involves two evolving populations with coupled fitness, such as predators and prey or hosts and parasites. Alternatively, it can involve a single population of competing individuals, such as fighting agents or game players in symmetrical games. The fitness of individuals is computed during multiple evaluation episodes in which the individuals interact with different opponents.

In this article we focus on the maximization of the expected utility, that is, the performance against all possible opponents. Other authors have investigated the use of competitive coevolution for the synthesis of Nash equilibrium solutions [19, 54] and Pareto optimal solutions [17]. The maximization of the expected utility has a high practical relevance, since it is not constrained by the problems and limits of alternative approaches—more specifically, by the fact that Nash equilibrium solutions do not necessarily correspond to high-performing solutions, and by the fact that Pareto optimal solutions can be applied to deterministic problems only and often require the optimization of a number of objectives that is too large for a practical search algorithm [1].

Previous attempts to maximize the expected utility focused on the utilization of archives of ancient opponents or of randomly generated opponents. The idea of preserving individuals from previous generations and of evaluating agents against current and ancient opponents was introduced by Rosin and Belew [40]. Their method preserves the best individuals of each generation in a hall-of-fame archive and evaluates evolving agents against all the opponents contained in the archive.

This technique guarantees historical progress, but not global progress, that is, the development of solutions that are better and better against all possible competitors. Indeed, as reported in [38], predator and prey robots evolved against all the members of the hall-of-fame archive display increasingly better performance against ancient robots included in the archive, but produce solutions that are less effective than those produced with a Vanilla algorithm that does not use the archive. This can be explained by considering that evaluating agents against an increasing number of ancient competitors facilitates premature convergence toward local optima [38]. More generally, it leads to the generation of a limited number of qualitatively different opponents, to solutions that are overfitted to the opponents contained in the archive, and to solutions that generalize poorly to other opponents [32].

A possibly better algorithm is Maxsolve, introduced by De Jong [18], which maintains an archive containing a limited number of agents. In each iteration, the algorithm receives a new set of agents and a new set of opponents that might or might not be included in the archive. The selection is performed by discarding solutions identical to those already included in the archive and by replacing the worst-performing agents in the archive with the best-performing new agents, provided that the latter outperform the former.
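The archive update just described can be sketched as follows. This is a minimal interpretation of the selection step, not De Jong's implementation; the function name and the scalar `score` function are assumptions made for illustration.

```python
def update_archive(archive, candidates, score, capacity):
    """Sketch of a Maxsolve-style archive update (assumed semantics):
    skip candidates identical to archive members, fill free slots, then
    replace the worst-scoring member with any better-scoring candidate."""
    for c in candidates:
        if c in archive:                 # discard duplicate solutions
            continue
        if len(archive) < capacity:
            archive.append(c)
            continue
        worst = min(range(len(archive)), key=lambda i: score(archive[i]))
        if score(c) > score(archive[worst]):
            archive[worst] = c
    return archive
```

Note that the "provided that the latter outperform the former" clause is what prevents the archive from churning on candidates no better than its current members.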

Promising results were collected by using a variation of the Maxsolve method by Samothrakis, Lucas, and Runarsson [41] in the context of the coevolution of Othello game players. In their model the archive has a limited size, the evolving agents are evaluated against all the opponents contained in the archive, and the archive is updated only when a new agent outperforms all the opponents included in the archive (in that case, the new agent is used to replace the oldest member of the archive). As remarked by the authors, however, this technique is only applicable to problems in which the outcome of the evaluation episodes is deterministic, as in perfect games [41].

An alternative approach consists in using randomly generated opponents [9, 10]. The potential advantage of this technique is the direct promotion of global progress due to the fact that the agents are always evaluated against new opponents. The disadvantage is that the efficacy of the opponents does not increase across generations. Consequently, the method does not permit one to develop agents capable of defeating strong opponents. Indeed, the results obtained through this method by Samothrakis, Lucas, and Runarsson [41] were much poorer than those obtained by using a variation of the Maxsolve algorithm mentioned above.

In the context of reinforcement learning, the utilization of competitive multi-agent scenarios recently produced remarkable results for the training of game players [26, 44] and simulated robots [3].

### 2.1 The Generalist Algorithm

We propose a new algorithm that promotes the evolution of global progress by filtering out opportunistic solutions. This is realized by periodically (i) evolving a subset of a population (agents) against a subset of the competing population (opponents), (ii) evaluating the evolved agents against all opponents, and (iii) filtering out opportunistic agents that perform well against the selected opponents but poorly against the other opponents.

More specifically, the evolutionary process (see also the pseudocode in Table 1) is organized in a series of phases in which a size-n subset of the first population is evolved against a size-n subset of the second population for a certain number of generations (where N is the size of each of the two populations and n < N) and vice versa. The former and the latter subsets are referred to as the agents and the opponents, respectively. The remaining opponents are referred to as the validation opponents.

Table 1.
Pseudocode for the generalist algorithm (standard condition).
```
Start
N = 80, n = 10, nphases = 1500, ngenerations = 100, current_generation = 0
Initialize the genomes of the pop1 and pop2 populations
Use predators as evolving agents and prey as opponents
for phase in [0, nphases]:
    select n opponents through the clustering method (see text)
    select n agents (i.e., the agents with the highest fitness against each selected opponent)
    for generation in [0, ngenerations]:
        create n offspring (i.e., n mutated copies of the n selected evolving individuals)
        evaluate the n offspring for n episodes against each opponent
        replace each parent with its offspring if the fitness of the latter is ≥ the fitness of the former
        current_generation += 1
        if (current_generation % 500) == 0: invert agents and opponents
    rank the N + n agents on the basis of their weighted performance against the N opponents (see text)
    replace the evolving population with the top N agents selected among the N + n agents
End
```

At the beginning of each phase, the two subsets are chosen (see the explanation below). Then for a certain number of generations, the agents are evolved against the opponents. Each agent is evaluated for n episodes against each opponent. At the end of each phase, the population of N agents is replaced with the best N individuals selected among the N original agents and the n evolved agents. The best agents are identified by ranking the N + n individuals on the basis of the average performance obtained against all opponents (i.e., against the opponents and the validation opponents).
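The end-of-phase replacement described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `evaluate` is a hypothetical episode-level fitness function, assumed deterministic here for simplicity (in the actual experiments each agent is evaluated for n episodes against each opponent).

```python
def end_of_phase_replacement(pop, evolved, opponents, evaluate, N):
    """Replace the population of N agents with the best N individuals
    chosen among the N original agents and the n evolved agents, ranked
    by average performance against ALL opponents (training + validation)."""
    candidates = pop + evolved                        # N + n individuals

    def avg_fitness(agent):
        return sum(evaluate(agent, o) for o in opponents) / len(opponents)

    ranked = sorted(range(len(candidates)),
                    key=lambda i: avg_fitness(candidates[i]), reverse=True)
    return [candidates[i] for i in ranked[:N]]
```

Because the ranking uses all N opponents rather than only the n training opponents, an evolved agent that improved against the training set but regressed against the validation set falls in the ranking and is discarded.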

The filtering of opportunistic individuals occurs in this last phase, since agents with enhanced performance against the opponents and reduced performance against the validation opponents are discarded. Similar techniques are used in the field of machine learning to avoid overfitting, that is, to avoid the retention of variations that produce improvements with respect to the training set and retrogressions with respect to the validation set [42, 49]. The method proposed can also be considered an efficient heuristic for sampling the “maximally informative test set” discussed by Bucci and Pollack [7].

Using n < N is necessary to leave N − n individuals for the validation set. In our experiments we set N to 80 and n to 10. Preliminary results suggest that qualitatively similar results can be obtained by setting N to 20. We did not collect results by varying n. We use a full pairwise competition in which each agent is evaluated against each opponent. The best-versus-best method [46] or tournament selection [1] allows one to reduce the number of evaluations but increases the risk of overspecialization.

In the case of the experiments reported in this article, agents are evolved through a (1 + 1) evolutionary strategy [39]. Each agent is allowed to generate an offspring, that is, a copy with mutations. Mutations are realized by replacing 2% of the genes with floating-point values selected randomly with a uniform distribution in the range [−5.0, 5.0]. Each parent is replaced by its corresponding offspring if the fitness of the latter is greater than or equal to the fitness of the former. However, the generalist method is a meta-heuristic that can be used in combination with any black box algorithm (e.g., CMA-ES [27]).
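A minimal sketch of this mutation operator and of one (1+1) generation step follows, assuming a flat genotype of floating-point genes represented as a Python list; the function names are illustrative.

```python
import random

def mutate(genotype, rate=0.02, lo=-5.0, hi=5.0):
    """Mutation described in the text: each gene is replaced, with
    probability `rate`, by a value drawn uniformly in [lo, hi]."""
    return [random.uniform(lo, hi) if random.random() < rate else g
            for g in genotype]

def plus_one_step(parent, fitness, rate=0.02):
    """One (1+1)-ES generation: the mutated offspring replaces the parent
    if its fitness is greater than or equal to the parent's."""
    child = mutate(parent, rate)
    return child if fitness(child) >= fitness(parent) else parent
```

Accepting offspring with equal fitness allows neutral drift, which can help the search escape plateaus.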

The n opponents are selected by clustering all opponents into n groups on the basis of the similarity of the fitnesses they achieved against all agents, and by choosing the fittest opponent of each group. Similarity is calculated on the basis of the Euclidean distance between the N vectors of N values that encode the performance of each opponent against each agent. The clustering is realized in three steps: first, each opponent is clustered with the most similar opponent so as to form N/2 groups; then, each group is clustered with the closest other group so as to form N/4 and finally N/8 groups. This clustering method permits one to select high-performing opponents from well-differentiated groups containing the same number of opponents. Notice that opponents are differentiated with respect to the strategies that are effective against them and not with respect to the behavior that they produce.
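The repeated pairing procedure can be sketched as follows. This is one plausible reading of the description above (greedy pairing on group-mean performance vectors); the authors' exact clustering code may differ in how ties and merge order are handled.

```python
import math

def _mean_vector(group, vectors):
    dim = len(vectors[0])
    return [sum(vectors[i][d] for i in group) / len(group) for d in range(dim)]

def pair_groups(groups, vectors):
    """Greedily merge groups in pairs: each group joins the nearest
    remaining group (Euclidean distance between group-mean vectors)."""
    means = [_mean_vector(g, vectors) for g in groups]
    unpaired = list(range(len(groups)))
    merged = []
    while len(unpaired) > 1:
        a = unpaired.pop(0)
        dists = [math.dist(means[a], means[b]) for b in unpaired]
        b = unpaired.pop(dists.index(min(dists)))
        merged.append(groups[a] + groups[b])
    if unpaired:                     # odd count: last group passes through
        merged.append(groups[unpaired[0]])
    return merged

def select_opponents(perf, n):
    """perf[i][j]: fitness of opponent i against agent j. The N opponents
    are clustered into n groups by repeated pairing (N -> N/2 -> ... -> n),
    and the fittest opponent of each group is selected."""
    groups = [[i] for i in range(len(perf))]
    while len(groups) > n:
        groups = pair_groups(groups, perf)
    return [max(g, key=lambda i: sum(perf[i]) / len(perf[i])) for g in groups]
```

With N = 80 and n = 10, the pairing halves the group count three times (80 → 40 → 20 → 10), matching the three steps in the text.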

The n agents are selected by choosing the agents that achieved the highest fitness against each of the n selected opponents. This allows one to select agents displaying high performance against each opponent, although not necessarily against all opponents.

The ranking of N + n agents at the end of each phase is performed by multiplying the fitness achieved against each of the N opponents by the average fitness obtained by the opponent. The utilization of this weighted ranking technique [40] permits us to preserve agents that perform well against high-performing opponents.
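The weighted ranking can be sketched as follows; the function name is illustrative, and fitnesses are assumed to be stored in plain nested lists.

```python
def weighted_rank(fit, opp_avg):
    """fit[i][j]: fitness of candidate agent i against opponent j.
    opp_avg[j]: average fitness obtained by opponent j, used as a weight
    so that performing well against strong opponents counts more.
    Returns candidate indices ordered from best to worst."""
    scores = [sum(f * w for f, w in zip(row, opp_avg)) for row in fit]
    return sorted(range(len(fit)), key=lambda i: scores[i], reverse=True)
```

An agent that only defeats weak opponents thus accumulates little weighted score, while an agent that handles the hardest opponents is preserved even if its unweighted average is lower.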

The utilization of alternated evolutionary phases in which a population evolves against a fixed set of opponents and vice versa allows us to avoid the need to reevaluate the performance of the original agents against the opponents. Moreover, it allows us to reduce the speed at which the social environment of the agents varies. As shown by Milano et al. [33, 34], agents evolved in environments that vary at a moderate rate (viz., every N generations) achieve better performance than agents evolved in environments that vary every generation.

To verify the importance of selecting well-differentiated opponents and the importance of preserving agents performing well against high-performing opponents, we also carried out experiments with a variation of this algorithm, called the Simplified algorithm, in which (i) the opponents are selected by choosing the n individuals with the highest average fitness against all agents, and (ii) N + n agents are ranked directly on the basis of their average fitness, rather than by using the weighted ranking described above.

Finally, to verify the importance of filtering out opportunistic individuals, we also carried out experiments with a second control algorithm, in which all agents are evaluated against all opponents (i.e., in which n = N). This method corresponds to a Vanilla coevolutionary algorithm that does not include a method for filtering out opportunistic agents.

### 2.2 The Predator-and-Prey Problem

We decided to test our algorithm on the coevolution of predator and prey robots, since it is a challenging problem both for natural organisms and for autonomous robots [35], and since it has been widely used as a test bed for competitive coevolutionary algorithms [5, 6, 20, 21, 22, 25, 28, 35, 38, 50]. Indeed, the need to face highly dynamic, largely unpredictable, and hostile environments requires the development of reactive, robust, and reliable solutions. Moreover, mastering predator and prey competition requires the ability to display several integrated behavioral and cognitive capabilities such as avoiding fixed and moving obstacles, optimizing the motion trajectory with respect to multiple constraints, integrating sensory information over time, anticipating the behavior of the opponent, managing appropriately available energy resources, disorienting the opponent and coping with protean defense behaviors [27], and adapting on the fly to the behavior of the current opponent.

The robots are simulated MarxBots [4] provided with neural network controllers. The connection strengths of the robots' neural networks, which determine their behavior, are encoded in artificial genotypes and evolved. Predators are evolved for the ability to capture the prey (i.e., to reach and physically touch it) as fast as possible, and prey are evolved for the ability to avoid being captured for as long as possible. The fitness of the prey corresponds to the fraction of the episode time required by the predator to capture it, and the fitness of the predator to 1.0 minus that fraction. A detailed description of the characteristics of the robots and of the robots' neural controllers is provided in the Appendix.
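The episode-level fitness computation can be sketched as follows. The function name is hypothetical, and we assume the predator's fitness is the complement of the prey's (the two sum to 1.0, consistent with the convention used later in the master-tournament plots).

```python
def episode_fitness(capture_step, episode_steps):
    """Fitness of predator and prey in one episode: the prey's fitness is
    the fraction of the episode it survives; the predator's fitness is
    the complement of that fraction. If the prey is never captured,
    capture_step is assumed to equal episode_steps."""
    prey = capture_step / episode_steps
    predator = 1.0 - prey
    return predator, prey
```

For example, a capture halfway through the episode yields 0.5 for both robots, whereas a prey that survives the whole episode scores 1.0 while the predator scores 0.0.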

### 2.3 Measuring Progress

In experiments involving a single population of agents evolving in a nonvarying environment, progress corresponds to an increase in fitness across generations. In competitive coevolutionary experiments, however, in which the social environment varies, fitness cannot be used to directly infer the efficacy of individuals. This is because the fitness of an individual depends not only on its own characteristics, but also on the characteristics of the competitors, which vary throughout the generations.

This implies that, as pointed out by Van Valen [51], evolving populations should adapt at the same rate as their competitors to preserve the same fitness level. The term “Red Queen,” introduced by Van Valen [51] to characterize the dynamics of multiple evolving populations, is derived from the statement that the Red Queen makes to Alice in Lewis Carroll's novel Through the Looking-Glass: “Now, here, you see, it takes all the running you can do, to keep in the same place.” A phase during which the fitnesses of the two populations do not vary can be either a stagnation phase, in which the two populations do not improve, or a progress phase, in which both populations improve and in which the advantage gained by the retention of adaptive variations in one population is compensated by the retention of adaptive variations in the opponent population. Moreover, phases during which the fitness of one population increases can correspond either to an improvement of that population or to a regression of the opponent population.

Consequently, measuring progress in a competitive coevolutionary scenario requires the utilization of more elaborate measures, such as CIAO and master tournaments. CIAO plots [11, 13] are obtained by post-evaluating individuals against competitors of previous generations and are consequently useful to measure historical progress. Master tournament analyses [38] are obtained by post-evaluating individuals against competitors of previous and future generations and are consequently useful to measure both historical and global progress. The relative efficacy of alternative experimental conditions with respect to global progress can be evaluated by testing the individuals evolved in a first experimental condition with the competitors evolved in a second experimental condition [32, 38]. Finally, global progress can also be measured by post-evaluating the individuals of successive generations of an evolutionary experiment against the opponents evolved in other replications of the experiment [32]. In other words, the opponents obtained in another replication can be used as a validation set.
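A master tournament amounts to a cross-evaluation table. The sketch below is a generic reconstruction (not the authors' code); `evaluate` is a hypothetical function returning the mean performance of an agent against an opponent, and the populations are assumed to have been sampled every k generations.

```python
def master_tournament(agents_by_gen, opponents_by_gen, evaluate):
    """Cross-evaluate agents of every sampled generation against opponents
    of every sampled generation, past and future. table[i][j] is the mean
    performance of generation-i agents against generation-j opponents;
    values growing with i down each column indicate global progress."""
    table = []
    for agents in agents_by_gen:
        row = []
        for opponents in opponents_by_gen:
            pairs = [(a, o) for a in agents for o in opponents]
            row.append(sum(evaluate(a, o) for a, o in pairs) / len(pairs))
        table.append(row)
    return table
```

Reading a row left to right shows historical progress (performance against ever more recent opponents), while reading a column top to bottom shows whether later agents handle a fixed opponent generation better than earlier agents did.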

The duration of progress is the number of generations during which the evolving agents keep increasing their performance against competitors of previous generations (historical progress) and/or previous and future generations (global progress). The rate of progress is the ratio between the amount of performance increase, measured by post-evaluating the individuals against competitors of previous generations, and the number of generations during which that increase is produced.

## 3 Results

Here we report the results obtained by evolving the predator and prey robots for 150,000 generations with the generalist algorithm (Standard) and with the two control algorithms (Simplified and Vanilla). Five replication experiments for each condition were run. The predator and prey population size (N) was set to 80. The number of agents and opponents (n) was set to 10. Evolving phases last 100 generations.

As expected, the fitnesses of the predator and of the prey increase and decrease, respectively, during the phases in which predators evolve, and decrease and increase, respectively, during the phases in which prey evolve. For clarity, Figure 1 reports data for the first and last 5,000 generations (upper and lower panels). Sudden variations occurring every 100 generations are caused by the selection of new agents and opponents (see Section 2.1). Notice how the evolving robots improve their performance against their current competitors through the entire evolutionary process. This indicates that the coevolutionary dynamics never converges on a stable state. The evolving individuals always maintain an ability to progress against their current competitors. This result is in line with the data collected in many previous coevolutionary experiments [5, 12, 28, 38, 50] in which coevolving individuals keep changing until the end of the evolutionary process, never converging on a stable state.

Figure 1.

Average fitness of the evolving robots (n agents) throughout the generations. The red and green curves correspond to predator and prey robots, respectively. Data plotted every generation. For the sake of clarity we display only the first and the last 5,000 generations of the first replication of the experiment (top and bottom panels, respectively). The other replications produced qualitatively similar results.


Figure 2 displays the average fitness of the two populations of 80 individuals evaluated against each other throughout the generations. As can be seen, the relative fitness of prey increased over generations, whereas predators' fitness decreased. This implies that, overall, the problem of capturing prey is more difficult than the problem of escaping predators in our experimental condition. Prey, however, never managed to fully defeat predators. This implies that both species have the possibility of improving during the entire coevolutionary process. Notice also that for the reasons discussed in Section 2.3, the fact that the relative fitness of the two populations was fairly stable throughout the generations does not imply that the behavior of the agents remained stable. As we will illustrate below, predator and prey became more and more effective throughout the generations. The fitness remains relatively stable because progress of one species is generally compensated by progress of the other species.

Figure 2.

Average fitness of predator and prey populations (N individuals) evaluated against each other throughout the generations. The red and green curves correspond to predators and prey, respectively. Data plotted every 100 generations. Data for the first replication of the Standard experimental condition. The other replications produced qualitatively similar results.


To verify whether coevolution leads to historical and global progress, we post-evaluated the two populations every 10,000 generations against competitors of previous and future generations, selected every 10,000 generations. The results indicate the occurrence of both historical and global progress (see Figure 3). Indeed, robots at any particular generation generally performed better against ancient competitors than against current competitors. For example, predators of generation 150,000 achieved a fitness of about 0.25 against contemporary competitors and a fitness of about 0.6 against competitors of generation 10,000. Prey of generation 150,000 achieved a fitness of about 0.75 against contemporary competitors and a fitness of about 0.9 against competitors of generation 10,000. To appreciate the implications of these differences, it is important to point out that relatively small improvements in fitness corresponded to large improvements in behavioral capabilities (see below and the additional online materials). Preliminary results of these experiments were reported in [45].

Figure 3.

Performance of predators and prey of every 10,000 generations post-evaluated against competitors of previous and following generations (master tournament). For each comparison, we indicate the performance of the predator. The performance of the prey corresponds to 1.0 minus the performance of the predator. The left and right plots display the results of the first replication and the average result of five replications, respectively.


Moreover, robots of any particular generation generally performed better against opponents of future generations than robots of previous generations did. This implies that the strategies acquired by predators and prey of succeeding generations are not only more and more effective against previous opponents but also more and more effective in general, that is, increasingly capable of defeating opponents, including opponents that they never encountered.

Regarding historical progress, Table 2 shows, for each tested generation of agents, the most recent generation of ancient opponents against which the agents achieved significantly better performance than against current opponents (one-tailed t-tests, N = 5 replications). As shown in the table, in the case of the Standard algorithm, historical progress continued until generation 130,000 for predators and until generation 140,000 for prey. Indeed, predators of generation 130,000 produced significantly better performance against prey of generation 70,000 than against prey of generation 130,000. Moreover, prey of generation 140,000 achieved significantly better performance against predators of generation 70,000 than against contemporary predators. In the case of the Simplified algorithm, historical progress continued up to generation 140,000 for both predators and prey. In contrast, the Vanilla algorithm did not show any evidence of historical progress. In fact, performance against ancient competitors was not significantly better than performance against recent competitors, independently of how ancient the former competitors were.

Table 2.
Historical progress measured as the most recent generation of ancient opponents against which evolving agents obtained significantly better performance than against current opponents.
| Agent generation | Standard: Predators | Standard: Prey | Simplified: Predators | Simplified: Prey | Vanilla: Predators | Vanilla: Prey |
| --- | --- | --- | --- | --- | --- | --- |
| 10 |  |  |  |  |  |  |
| 11 |  |  |  |  |  |  |
| 12 |  |  |  |  |  |  |
| 13 |  |  |  |  |  |  |
| 14 |  |  |  |  |  |  |
| 15 |  |  |  |  |  |  |

Notes. Results for the three algorithms are reported separately in the left (Standard), middle (Simplified), and right (Vanilla) columns. Generations are expressed in tens of thousands, that is, 1 = 10,000, 2 = 20,000, and so on. Generation 0 refers to the very first generation of randomly generated individuals. Current opponents are the opponents of the same generation. Comparisons between performances were conducted by using a one-tailed t-test, with N = 5 replications.

Regarding global progress, Table 3 shows the most recent generation of evolving agents, up to generation 140,000, that achieved significantly better performance against opponents of the last generation than did agents of previous generations (one-tailed t-tests, N = 5 replications). As shown in the table, in the case of the Standard algorithm, coevolution produced global progress up to generations 80,000 and 120,000 for predators and prey, respectively. Indeed, predators of generation 80,000 produced significantly better performance against prey of generation 150,000 than did predators of previous generations down to generation 30,000. Moreover, prey of generation 120,000 produced significantly better performance against predators of generation 150,000 than did prey of previous generations down to generation 70,000. In the case of the Simplified algorithm, global progress continued up to generations 90,000 and 110,000 for predators and prey, respectively. In contrast, the Vanilla algorithm did not show any evidence of global progress: the performance of recent agents was not significantly better than that of their ancestors against competitors of generation 150,000.

Table 3.
Global progress measured as the most recent generation of evolving agents that achieved significantly better performance against opponents of generation 150,000 than did agents of more ancient generations. Rows correspond to agent generations 10–14 (in tens of thousands); for each of the Standard, Simplified, and Vanilla algorithms, separate columns report the values for predators and prey.

Notes. Results for the three algorithms are reported separately in the left (Standard), middle (Simplified), and right (Vanilla) columns. Generations are expressed in tens of thousands, that is, 1 = 10,000, 2 = 20,000, and so on. Generation 0 refers to the very first generation of randomly generated individuals. Comparisons between performances were conducted by means of a one-tailed t-test, with N = 5 replications.

Overall, these data indicate that the Standard and the Simplified algorithms produced both historical and global progress for prolonged evolutionary periods. The Vanilla algorithm, in contrast, produced neither historical nor global progress.

To verify the relative efficacy of the three methods we post-evaluated predators and prey of generation 150,000 evolved with the Standard algorithm against the opponents of the same generation evolved with the Simplified and Vanilla algorithms (Figure 4). In each comparison, we evaluated the robots evolved in the five replications of the Standard condition against the robots evolved in the five replications of the Simplified and the Vanilla conditions. The predators evolved in the Standard condition achieved significantly higher performance when they were evaluated against the prey evolved in the Simplified (p < .05) and Vanilla (p < .001) conditions than when they were evaluated against the prey evolved in the Standard condition (Figure 4, red histograms). Similarly, the prey evolved in the Standard condition achieved significantly higher performance when they were evaluated against the predators evolved in the Simplified (p < .05) and the Vanilla (p < .001) conditions than when they were evaluated against the predators evolved in the Standard condition (Figure 4, blue histograms). Therefore, predators and prey evolved in the Standard condition outperformed the agents evolved in the other conditions.

Figure 4.

Cross-experiment test. The histograms display the performance of predators (red) and prey (blue) evolved in the Standard condition evaluated against the opponents evolved in the Standard, Simplified, and Vanilla conditions, respectively.


Overall, these results indicate that the possibility of filtering out opportunistic individuals on the basis of the performance obtained against the validation opponents, which is missing in the Vanilla condition, permits developing more effective solutions. Moreover, these results indicate that exposing evolving agents to well-differentiated opponents and preserving agents capable of defeating strong opponents, which cannot happen in the Simplified experimental condition, permits developing better solutions.

The evolved agents displayed quite sophisticated behaviors, as can be appreciated from Figure 5, discussed below, and from the videos available at http://laral.istc.cnr.it/res/predprey2019/. The agents evolved in the Standard experimental condition, in particular, display all the capabilities mentioned above: They avoid fixed and moving obstacles; optimize their motion trajectories with respect to multiple constraints; integrate sensory information over time and regulate their behavior accordingly; anticipate the behavior of the competitor; disorient the competitor by producing variable and irregular behaviors, and display strategies able to cope with such protean behaviors; and adapt on the fly to the behavior of the current competitor. Remarkably, prey display the ability to avoid obstacles and to escape from the predator by moving both forward and backward, alternating forward and backward phases appropriately, depending on the circumstances.

Figure 5.

Representative predator and prey behaviors of agents of generation 150,000 ordered by the behavioral complexity of predators (panel (a)) or prey (panel (b)). Each box displays the trajectories of the predator and the prey in red and green. The light and dark circles indicate the starting and the final position of the robots in the corresponding evaluation episode. The curves at the bottom of the trajectories indicate the translational and rotational movements of the predator and prey robots over time (red and green lines, respectively). The red and green numbers at the top indicate the behavioral complexity of the predators and prey, respectively.


## 4 Behavior Complexification

Visual inspection of the robots' behavior indicates that the capabilities of the agents and the articulation of their behavior tend to increase across generations. To verify whether and to what extent the behavior of the robots becomes more complex across generations, we need a method to measure the complexity of behavior.

Although humans have an intuitive notion of what complexity is, identifying a formal way to measure it is far from trivial. One widely used measure of complexity is Shannon's entropy [43], which measures the uncertainty of a random variable. This technique can be easily applied to systems composed of multiple discrete components, such as cellular automata [55]. Its application to systems that are continuous and characterized by multilevel and multi-scale organization, however, is more challenging. This is the case for the behavior of our robots, since it results from continuous actions and displays a multi-scale organization in which one can identify simple short-term behaviors lasting a few hundred milliseconds and long-term behaviors lasting several seconds. An example of a short-term behavior is a phase lasting a few hundred milliseconds during which a prey robot turns on the spot to avoid a wall located in its frontal direction. An example of a long-term behavior is a phase during which a predator robot moves along a circular-like trajectory, attempting to reach the prey, which moves at full speed along an external circular-like trajectory, and attempting to push the prey against the walls surrounding the arena.
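For a system described by a discrete random variable, the entropy computation mentioned above is straightforward; a minimal illustration (the symbol sequences are invented, not robot data):

```python
import math
from collections import Counter

def shannon_entropy(sequence):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform = shannon_entropy("ABCD" * 25)   # four equiprobable symbols: 2 bits
constant = shannon_entropy("A" * 100)    # a single repeated symbol: 0 bits
```

The difficulty discussed in the text is that the continuous, multi-scale behavior of the robots offers no obvious discrete symbols over which to compute such a distribution.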

A simple and effective way to measure the complexity of the behavior exhibited by our robots consists in measuring the average absolute variation of the translational and rotational speeds of the robots' wheels, calculated on the basis of the following equation:
$c = \frac{\sum_{t=1}^{s}\left(\left|\overline{tv}_{t}-\overline{tv}_{t-1}\right|+\left|\overline{rv}_{t}-\overline{rv}_{t-1}\right|\right)}{s \times 2}$
where $\overline{tv}$ and $\overline{rv}$ are the desired translational and rotational speeds, and $s$ is the number of steps.
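In code, the measure amounts to averaging the absolute step-to-step changes of the two desired speeds (a direct transcription of the equation, assuming the speeds are sampled once per control step):

```python
def behavior_complexity(tv, rv):
    """Average absolute variation of the desired translational (tv) and
    rotational (rv) speeds; tv and rv hold s + 1 samples (steps 0..s)."""
    s = len(tv) - 1  # number of steps
    total = sum(abs(tv[t] - tv[t - 1]) + abs(rv[t] - rv[t - 1])
                for t in range(1, s + 1))
    return total / (s * 2)

# A robot moving at constant speed has zero complexity; one that keeps
# reversing its translational speed scores much higher.
smooth = behavior_complexity([0.5] * 101, [0.0] * 101)
jerky = behavior_complexity([0.5, -0.5] * 50 + [0.5], [0.0] * 101)
```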

Figure 5 shows the complexity level of typical evolved behaviors calculated on the basis of this measure. The figure was obtained by (i) measuring, with the equation above, the complexity of the behavior exhibited by 80 evolved predators evaluated against 80 evolved prey, (ii) ranking the resulting evaluation episodes by the complexity of the behavior exhibited by predators and prey, and (iii) displaying the behavior of the predators and prey with the simplest, simple/average, average, average/complex, and most complex behavior (from the left to the right box, respectively). As can be seen, the measure correctly discriminated between simple behaviors, in which the predators move at nearly constant speed along straight or circular-like trajectories (see the left boxes of Figure 5, both panels), and more complex behaviors, in which predators suddenly reversed their direction of motion to block repeated attempts of the prey to find alternative escape directions (see the far right box of Figure 5, both panels).
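The selection of the five representative boxes can be sketched as follows (a hypothetical helper, not the authors' implementation):

```python
def representative_episodes(episodes, score):
    """Rank episodes by a complexity score and return the simplest episode,
    the three quartile points, and the most complex episode."""
    ranked = sorted(episodes, key=score)
    n = len(ranked)
    return [ranked[i] for i in (0, n // 4, n // 2, (3 * n) // 4, n - 1)]

# Illustrative per-episode complexity values standing in for real episodes
scores = [0.31, 0.05, 0.78, 0.12, 0.55, 0.91, 0.44, 0.23, 0.67]
chosen = representative_episodes(scores, score=lambda e: e)
```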

This method for measuring complexity is not general. In particular, it can be inadequate for agents provided with articulated limbs, in which variations of the proximal joints have much larger effects than variations of the distal joints, or for agents evaluated over significantly longer periods of time, in which the duration of functionally relevant behaviors can vary widely. However, it is appropriate for the experimental setting considered in this article, in which the behavior of the agents is regulated by a few independent actuators and the overall duration of the agents' behavior is limited.

Figure 6 shows the average complexity level of predator and prey behaviors across generations for the experiments carried out with the Standard, Simplified, and Vanilla algorithms. The complexity measure is averaged over the individuals of the population obtained in the five replications of the experiment. The behavior of the agents evolved for 150,000 generations in the Standard condition is more complex than the behavior of the agents evolved in the other two conditions (versus Simplified, p < .01; versus Vanilla, p < .01). Moreover, the behavior of the agents evolved in the Simplified condition is more complex than the behavior of the agents evolved in the Vanilla condition (p < .01).

Figure 6.

Behavioral complexity of agents throughout the generations in the case of the experiments carried out with the Standard, Simplified, and Vanilla experimental conditions. The left and right pictures display the complexity of the behavior of predator and prey robots, respectively. Each point of each curve represents the average complexity of the behavior of all agents of a given generation post-evaluated against all opponents of the same generation, every 10,000 generations. Data averaged over five replications. The error bars depict standard errors.


To verify the duration of the complexification process across generations, we compared the complexity of the behaviors of the robots of each generation (every 10,000 generations) with the complexity of the behaviors of the robots of previous generations. In the case of the Standard condition, the complexity of both predators and prey increased significantly up to generation 90,000, as demonstrated by the fact that the complexity of the behaviors of the robots at this stage is significantly higher than the complexity of the behaviors of the robots in previous generations for both predators and prey. In the case of the experiment performed in the Simplified condition, the complexity of behavior increases up to generation 110,000 for both predators and prey. In contrast, in the case of the experiment performed in the Vanilla condition, the complexity of behavior does not vary significantly across generations.

The fact that the agents evolved in the Standard condition that possess the best-performing strategies (Figure 4) display the most complex behavior indicates the presence of a correlation between performance and behavior complexity. This correlation is also supported by the fact that the performance and the complexity of behavior increase across generations until generations 90,000 and 110,000 in the cases of the Standard and Simplified experimental conditions.

The fact that the complexity of behavior increases for a greater number of generations in the Simplified condition than in the Standard condition can be explained by considering that, as shown above, the rate of progress in the Simplified condition is lower than the rate of progress in the Standard condition.

Figure 7 shows the correlation between the behavioral complexity and the performance of the agents. Performance is measured by post-evaluating the agents against the opponents of the last generation. Performance and behavioral complexity are strongly positively correlated in the cases of the Standard and Simplified conditions, but are not correlated in the case of the Vanilla condition (Figure 7). Indeed, there is a strong positive correlation in the case of predators (r = .90, p < .01) and prey (r = .95, p < .01) in the Standard condition (Figure 7, left). A similar pattern can be observed in the case of the Simplified experimental condition (Figure 7, center) for predators (r = .95, p < .01) and prey (r = .96, p < .01). In contrast, in the case of the Vanilla condition (Figure 7, right), the two measures are not correlated: predators (r = −.09, p = .75) and prey (r = −.06, p = .83).
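The correlation reported here is a standard Pearson coefficient between the per-generation complexity and performance series; a self-contained sketch with invented numbers:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative per-generation series (not the paper's data): behavioral
# complexity and performance against last-generation opponents
complexity = [0.10, 0.15, 0.22, 0.28, 0.33, 0.41, 0.45, 0.52]
performance = [0.20, 0.26, 0.31, 0.40, 0.44, 0.55, 0.58, 0.66]
r = pearson_r(complexity, performance)
```

Two series that grow together in this way yield an r close to 1, as in the Standard and Simplified conditions.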

Figure 7.

Correlations between the performance and the behavioral complexity of evolving agents. Data for the Standard (left panel), Simplified (central panel), and Vanilla (right panel) conditions. Performance refers to the fitness obtained by post-evaluating the agents against the opponents of the last generation.


Overall, these analyses show that global progress produces a complexification of the agents' behavior. There is not, however, a one-to-one correspondence between progress in performance and behavior complexification. Progress in performance arises as a result of variations that increase, maintain, or reduce the complexity of the agents' behavior. The complexification is caused by the fact that the first type of variation is more common than the third.

## 5 Increasing the Complexity of the Environment

Previous studies reported evidence indicating that the complexity of the environmental conditions can favor progress [8, 47]. To verify whether this occurs for the problem studied in this article, and whether the relative efficacy of the three algorithms is influenced by the specific problem considered, we ran a new set of experiments in a more complex environment. This was realized by adding a cylindrical obstacle with a diameter of 0.1 m at the center of the environment. The obstacle is sufficiently high to visually occlude the opponent. This additional obstacle represents both a new opportunity and a new constraint for each species. For predators, it constitutes an additional opportunity to push prey into deadlock situations and an additional constraint on their movements. For prey, it enables them to hide from predators and shelter from attacks, but also constrains their escape paths.

Figure 8 shows the master tournament analysis for the three experimental conditions. Overall, the performance of the predators is higher in these experiments than in the basic experiments reported above. However, in this case as well, neither species manages to fully defeat the other. Consequently, both species retain room for improvement during the entire coevolutionary process.

Figure 8.

Performance of predators and prey, sampled every 10,000 generations, post-evaluated against competitors of previous and following generations (master tournament). For each comparison, we indicate the performance of the predator. The performance of the prey is 1.0 minus the performance of the predators. Results obtained in the Standard (left panel), Simplified (central panel), and Vanilla (right panel) conditions. Each panel shows the average results of five replications.


The dynamics of the coevolutionary process are qualitatively similar to those observed in the basic experiments: they are characterized by historical and global progress in the Standard and Simplified conditions and by the lack of global progress in the Vanilla condition.

Figure 9 shows the comparison of the historical and global progress in the basic experiments reported in previous sections and in the new experiments with the additional obstacle. As the measure of historical progress, we report the performance of the agents of the last generation against the opponents of previous generations (Figure 9, panel (a)). As the measure of global progress, we report the performance of agents of all generations against the opponents of the last generation (Figure 9, panel (b)). Performance has been normalized in the range [0.0, 1.0].

Figure 9.

Panel (a) displays the performance of the agents in the last generation against opponents in previous generations. Panel (b) displays the performance of the agents in all generations against agents in the last generation. The left, middle, and right plots show the data obtained in the Standard, Simplified, and Vanilla experimental conditions. Performance data have been normalized in the range [0.0, 1.0]. Each plot shows the average result of five replications.


As can be seen, the Standard and the Simplified conditions produce consistent historical and global progress during the entire evolutionary process, both in the original experiments and in the new experiments with the additional obstacle. The Vanilla condition does not produce historical and global progress after the first 10,000 generations, either in the original experiment or in the new experiment with the additional obstacle. The rates of progress produced by the Standard and Simplified conditions in the experiments without and with the additional obstacle are similar.

Consequently, the use of a more complex environment does not affect the achievement of historical or global progress.

## 6 Discussion

We have demonstrated how a specially designed competitive coevolutionary method that identifies and filters out opportunistic individuals produces long-term historical and global progress. This is achieved by periodically dividing the opponents into two subgroups of evolving and non-evolving individuals and by using the latter subgroup to validate the generality of the evolving agents. The term historical progress indicates that evolving agents improve their performance against ancient competitors. The term global progress indicates that evolving agents improve their performance against all types of competitors, including competitors that they did not encounter (e.g., future competitors). The term long-term refers to the fact that evolution continues to produce and to accumulate progress for long evolutionary periods.
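The filtering idea can be illustrated with a short sketch. This is an interpretation of the description above, not the authors' implementation: the evaluation function, the median validation threshold, and all names are assumptions made for illustration.

```python
def select_generalists(candidates, evolving_opponents, validation_opponents,
                       evaluate, n_survivors):
    """Score candidates against the opponents they coevolve with, but let
    them survive only if they also hold up against a held-out, non-evolving
    validation subgroup; opportunistic variants that beat current opponents
    while regressing against the validation set are filtered out."""
    def mean_perf(agent, opponents):
        return sum(evaluate(agent, o) for o in opponents) / len(opponents)

    scored = [(mean_perf(a, evolving_opponents),
               mean_perf(a, validation_opponents), a) for a in candidates]
    # Require at least median performance on the validation subgroup,
    # then rank the remainder by performance against current opponents.
    threshold = sorted(v for _, v, _ in scored)[len(scored) // 2]
    kept = sorted(((e, a) for e, v, a in scored if v >= threshold),
                  reverse=True, key=lambda pair: pair[0])
    return [a for _, a in kept[:n_survivors]]

# Toy agents defined by their performance against the two opponent groups
agents = [("opportunist", 0.9, 0.2), ("generalist", 0.8, 0.7),
          ("weak", 0.3, 0.6), ("solid", 0.7, 0.8)]
evaluate = lambda agent, opp: agent[1] if opp == "evolving" else agent[2]
survivors = select_generalists(agents, ["evolving"], ["validation"],
                               evaluate, n_survivors=2)
```

Note how the opportunist, despite the best score against the evolving opponents, is discarded because its validation performance collapses.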

As far as we know, this is the first time that competitive coevolutionary experiments involving embodied and situated agents have led to long-term global progress (i.e., global progress over more than 100,000 generations). Indeed, previous related experiments [5, 11, 12, 28, 38, 50] did not show evidence of global progress and were carried out for only a few hundred generations.

From a theoretical point of view, selecting agents that generalize with respect to a limited number of validation opponents does not guarantee global progress. On the other hand, as pointed out by Wagner [52], the number of solutions that remain effective across variable environmental conditions tends to decrease exponentially as the number of environmental conditions increases. This implies that solutions that are effective against a sufficiently large number of opponents can be general—that is, can generalize to a wide range of opponents.

Finally, we showed how exposing agents to well-differentiated opponents and preserving agents capable of defeating strong opponents increases the rate of progress and leads to better solutions.

The proposed method includes elements introduced in other algorithms described in the literature, but combines them in new ways. In particular, the use of a subset of the opponents for validation presents similarities with techniques used in neural networks to reduce or eliminate overfitting [42, 49]. The selection of well-differentiated opponents and the preservation of agents capable of defeating strong opponents have similarities with multi-objective optimization methods [14, 16] and with techniques used to preserve population diversity [29, 36].

The accumulation of global progress over many generations leads to highly effective solutions that involve the production of rather articulated behaviors. More specifically, although there is not a one-to-one correspondence between progress and behavior complexification, the behavior of the evolving agents tends to become more complex across generations.

## References

1. Angeline, P. J., & Pollack, J. B. (1993). Competitive environments evolve better solutions for complex tasks. In Proceedings of the 5th International Conference on Genetic Algorithms (pp. 264–270).
2. Baldassarre, G., Trianni, V., Bonani, M., et al. (2007). Self-organized coordinated motion in groups of physically connected robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B, Cybernetics, 37, 224–239.
3. Bansal, T., Pachocki, J., Sidor, S., et al. (2018). Emergent complexity via multi-agent competition. In 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings.
4. Bonani, M., Longchamp, V., Magnenat, S., et al. (2010). The marXbot, a miniature mobile robot opening new perspectives for the collective-robotic research. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010—Conference Proceedings (pp. 4187–4193). New York: IEEE.
5. Buason, G., Bergfeldt, N., & Ziemke, T. (2005). Brains, bodies, and beyond: Competitive co-evolution of robot controllers, morphologies and environments. Genetic Programming and Evolvable Machines, 6, 25–51.
6. Buason, G., & Ziemke, T. (2003). Co-evolving task-dependent visual morphologies in predator-prey experiments. In E. Cantú-Paz, J. A. Foster, K. Deb, et al. (Eds.), Genetic and Evolutionary Computation—GECCO 2003 (pp. 458–469). Berlin, Heidelberg: Springer.
7. Bucci, A., & Pollack, J. B. (2003). A mathematical framework for the study of coevolution. In Foundations of Genetic Algorithms (FOGA VII), Vol. 7 (pp. 221–235).
8. Cartlidge, J., & Bullock, S. (2004). Combating coevolutionary disengagement by reducing parasite virulence. Evolutionary Computation, 12, 193–222.
9. Chong, S. Y., Tiňo, P., Ku, D. C., & Yao, X. (2012). Improving generalization performance in co-evolutionary learning. IEEE Transactions on Evolutionary Computation, 16, 70–85.
10. Chong, S. Y., Tiňo, P., & Yao, X. (2009). Relationship between generalization and diversity in coevolutionary learning. IEEE Transactions on Computational Intelligence and AI in Games, 1, 214–232.
11. Cliff, D., & Miller, G. F. (1995). Tracking the red queen: Measurements of adaptive progress in co-evolutionary simulations. In G. Weiss & S. Sen (Eds.), Adaptation and Learning in Multiagent Systems (pp. 200–218). Berlin, Heidelberg: Springer Verlag.
12. Cliff, D., & Miller, G. F. (1996). Co-evolution of pursuit and evasion II. In P. Maes, M. Mataric, J. A. Meyer, et al. (Eds.), From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (pp. 506–515). Cambridge, MA: MIT Press.
13. Cliff, D., & Miller, G. F. (2006). Visualizing coevolution with CIAO plots. Artificial Life, 12, 199–202.
14. Coello, C., Lamont, G., & Van Veldhuizen, D. (2007). Evolutionary algorithms for solving multi-objective problems. Boston: Springer.
15. Dawkins, R., & Krebs, J. R. (1979). Arms races between and within species. Proceedings of the Royal Society B: Biological Sciences, 205, 489–511.
16. Deb, K. (2001). Multi-objective optimization using evolutionary algorithms. Chichester, UK: Wiley.
17. De Jong, E. D. (2004). The incremental Pareto-coevolution archive. In K. Deb (Ed.), Genetic and Evolutionary Computation—GECCO 2004 (pp. 525–536). Berlin, Heidelberg: Springer.
18. De Jong, E. (2005). The MaxSolve algorithm for coevolution. In H.-G. Beyer (Ed.), GECCO '05: Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation (pp. 483–489). New York: ACM.
19. Ficici, S. G., & Pollack, J. B. (2003). A game-theoretic memory mechanism for coevolution. In E. Cantú-Paz, J. A. Foster, K. Deb, L. D. Davis, et al. (Eds.), Genetic and Evolutionary Computation—GECCO 2003 (pp. 286–297). Berlin, Heidelberg: Springer Verlag.
20. Floreano, D., & Nolfi, S. (1997). God save the Red Queen! Competition in co-evolutionary robotics. In J. R. Koza, K. Deb, M. Dorigo, D. B. Fogel, M. Garzon, H. Iba, & R. L. Riolo (Eds.), Genetic Programming 1997: Proceedings of the Second Annual Conference (pp. 398–406). San Francisco: Morgan Kaufmann.
21. Floreano, D., & Nolfi, S. (1997). Adaptive behavior in competing co-evolving species. In P. Husbands & I. Harvey (Eds.), Proceedings of the Fourth European Conference on Artificial Life (pp. 378–387). Cambridge, MA: MIT Press.
22. Floreano, D., Nolfi, S., & Mondada, F. (1998). Competitive co-evolutionary robotics: From theory to practice. In R. Pfeifer, B. Blumberg, J.-A. Meyer, & S. W. Wilson (Eds.), From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (pp. 515–524). Cambridge, MA: MIT Press.
23. Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9, 159–195.
24. Harrington, K. I., Freeman, J., & Pollack, J. (2014). Coevolution in hide and seek: Camouflage and vision. In Proceedings of the 14th International Conference on the Synthesis and Simulation of Living Systems, ALIFE 2014 (pp. 25–32). Cambridge, MA: MIT Press.
25. Haynes, T., & Sen, S. (1996). Evolving behavioral strategies in predators and prey. In G. Weiss & S. Sen (Eds.), Adaptation and Learning in Multiagent Systems (pp. 113–126). Berlin, Heidelberg: Springer Verlag.
26. Heinrich, J., & Silver, D. (2016). Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint.
27. Humphries, D. A., & Driver, P. M. (1970). Protean defence by prey animals. Oecologia, 5, 285–302.
28. Janssen, R., Nolfi, S., Haselager, P., & Sprinkhuizen-Kuyper, I. (2016). Cyclic incrementality in competitive coevolution: Evolvability through pseudo-Baldwinian switching-genes. Artificial Life, 22, 319–352.
29. Laumanns, M., Thiele, L., Deb, K., & Zitzler, E. (2002). Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10, 263–282.
30. Massera, G., Ferrauto, T., Gigliotta, O., & Nolfi, S. (2014). Designing adaptive humanoid robots through the FARSA open-source framework. Adaptive Behavior, 22, 255–265.
31. Miconi, T. (2008). Evolution and complexity: The double-edged sword. Artificial Life, 14, 325–344.
32. Miconi, T. (2009). Why coevolution doesn't “work”: Superiority and progress in coevolution. In Proceedings of the 12th European Conference on Genetic Programming (pp. 49–60). Berlin: Springer Verlag.
33. Milano, N., Carvalho, J. T., & Nolfi, S. (2017). Moderate environmental variation promotes adaptation in artificial evolution. arXiv preprint, 1–18.
34. Milano, N., Carvalho, J. T., & Nolfi, S. (2017). Environmental variations promotes adaptation in artificial evolution. In 2017 IEEE Symposium Series on Computational Intelligence (pp. 1–7). New York: IEEE.
35. Miller, G., & Cliff, D. (1994). Protean behavior in dynamic games: Arguments for the co-evolution of pursuit-evasion tactics. In D. Cliff, P. Husbands, J. R. Meyer, & S. W. Wilson (Eds.), From Animals to Animats III: Proceedings of the Third International Conference on Simulation of Adaptive Behavior. Cambridge, MA: MIT Press.
36. Mouret, J. B., & Doncieux, S. (2009). Overcoming the bootstrap problem in evolutionary robotics using behavioral diversity. In 2009 IEEE Congress on Evolutionary Computation, CEC 2009 (pp. 1161–1168). New York: IEEE.
37. Nolfi, S. (2012). Co-evolving predator and prey robots. Adaptive Behavior, 20, 10–15.
38. Nolfi, S., & Floreano, D. (1998). Co-evolving predator and prey robots: Do ‘arms races’ arise in artificial evolution? Artificial Life, 4, 311–335.
39. Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog Verlag.
40. Rosin, C. D., & Belew, R. K. (1997). New methods for competitive coevolution. Evolutionary Computation, 5, 1–29.
41. Samothrakis, S., Lucas, S., Runarsson, T. P., & Robles, D. (2013). Coevolving game-playing agents: Measuring performance and intransitivities. IEEE Transactions on Evolutionary Computation, 17, 213–226.
42. Sarle, W. S. (1995). Stopped training and other remedies for overfitting. In Proceedings of the 27th Symposium on the Interface of Computing Science and Statistics (pp. 352–360).
43. Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
44. Silver, D., Huang, A., Maddison, C. J., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
45. Simione, L., & Nolfi, S. (2018). Achieving long-term progress in competitive co-evolution. In 2017 IEEE Symposium Series on Computational Intelligence, SSCI 2017—Proceedings (pp. 1–8). New York: IEEE.
46. Sims, K. (1994). Evolving virtual creatures. In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1994 (pp. 15–22). New York: Association for Computing Machinery.
47. Sinervo, B., & Lively, C. M. (1996). The rock-paper-scissors game and the evolution of alternative male strategies. Nature, 380, 240–243.
243
.
48
Sperati
,
V.
,
Trianni
,
V.
, &
Nolfi
,
S.
(
2008
).
Evolving coordinated group behaviours through maximisation of mean mutual information
.
Swarm Intelligence
,
2
,
73
95
.
49
Srivastava
,
N.
,
Hinton
,
G.
,
Krizhevsky
,
A.
, et al
(
2014
).
Dropout: A simple way to prevent neural networks from overfitting
.
Journal of Machine Learning Research
,
15
,
1929
1958
.
50
Stanley
,
K. O.
, &
Miikkulainen
,
R.
(
2002
).
The dominance tournament method of monitoring progress in coevolution
. In
GECCO 2002: Proceedings of Bird of a Feather Workshops: Genetic and Evolutionary Computation Conference
(pp.
242
248
).
51
Van Valen
,
L.
(
1973
).
A new evolutionary theory
.
Evolutionary Theory
1
,
1
30
.
52
Wagner
,
A.
(
2011
).
The origins of evolutionary innovations: A theory of transformative change in living systems
.
Oxford
:
Oxford University Press
.
53
Watson
,
R. A.
, &
Pollack
,
J. B.
(
2001
).
Coevolutionary dynamics in a minimal substrate
. In
L. A.
Spector
,
E. D.
Goodman
,
A.
Wu
,
W. B.
Langdon
, &
H. M.
Voigt
(Eds.),
GECCO '01: Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation
(pp.
702
709
).
San Francisco
:
Morgan Kaufmann
.
54
Wiegand
,
R. P.
,
Liles
,
W. C.
, &
De Jong
,
K. A.
(
2002
).
Analyzing cooperative coevolution with evolutionary game theory
. In
D. B.
Fogel
,
M. A.
El-Sharkawi
,
X.
Yao
, et al
(Eds.),
Proceedings of the 2002 Congress on Evolutionary Computation, CEC 2002
(pp.
1600
1605
).
New York
:
IEEE Press
.
55
Wolfram
,
S.
(
1984
).
Cellular automata as models of complexity
.
Nature
,
311
.
419
424
.

### Appendix

#### A.1 The Robots and the Environment

The robots were simulated MarxBots [4], that is, circular robots with a diameter of 17 cm equipped with a differential drive motion system, a ring of 24 color LEDs, 24 infrared sensors, 4 ground sensors, an omnidirectional camera, and a traction sensor. The predator and prey robots had their LEDs turned on in red and green, respectively. The robots were situated in a 3 × 3 m square arena surrounded by black walls. The ground was colored in grayscale, with a level of darkness that varied linearly from full white at the center to full black at the periphery of the arena (see Figure 10, left panel).

Figure 10.

Left panel: the robots and the environment in simulation. The red and green robots correspond to the predator and prey robot, respectively. Right panel: the neural network controller. Black arrows indicate all-to-all feedforward connections between two layers. All-to-all recurrent connections are present between the hidden neurons.

The maximum speed (ms) that the wheels of the differential drive motion system could assume was 10 and 8.5 rad/s for the prey and the predators, respectively, in the case of the experiments reported in Section 3, and 4 and 10 rad/s for the prey and the predators, in the case of the experiment reported in Section 5. The relative speed of the two robots was tuned to balance approximately the overall complexity of the problems faced by predators and prey, that is, to prevent the average fitness of either species from reaching the maximum or the minimum value.

The experiments were run in simulation by using the FARSA open-source tool, which includes an accurate simulator of the robots and of the environment [30]. FARSA has been used to successfully transfer results obtained in simulation to hardware in similar experimental settings [2, 48]. The state of the environment, of the robots, of the robots' sensors and motors, and of the robots' neural networks was updated at 10 Hz.

The experiments reported in this paper can be replicated by downloading and installing FARSA, which is available from https://sourceforge.net/projects/farsa/, and by downloading and installing the experimental plugin available from http://laral.istc.cnr.it/res/predprey2019/Simione_Nolfi_2019_plugin.zip. Videos of the evolved behaviors are available from http://laral.istc.cnr.it/res/predprey2019/.

#### A.2 The Neural Network Controller

Each robot is provided with a neural network controller that includes 25 sensory neurons, 10 internal neurons with recurrent connections, and 2 motor neurons (Figure 10, right panel).

The sensory layer includes 8 sensory neurons that encode the average activation state of eight groups of three adjacent infrared sensors, 8 neurons that encode the fraction of green or red light perceived in the eight 45° sectors of the visual field of the camera, 1 neuron that encodes the average amount of green or red light detected in the entire visual field of the camera, 4 neurons that encode the state of the four ground sensors, 1 neuron that encodes the average activation of the four ground sensors, 1 neuron that encodes whether the robot collides with an obstacle (i.e., whether the traction force detected by the traction sensor exceeds a threshold), 1 clock neuron that encodes the time elapsed since the beginning of the trial, and 1 simulated fatigue neuron that encodes the tiredness of the robot (i.e., the amount of energy recently spent by the robot; see Equation 1 below). The state of the sensory neurons is normalized in the range [0.0, 1.0].

The motor layer includes two motor neurons (tm and rm) that encode the desired translational and rotational motion of the robot in the range [0.0, 1.0].
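The architecture described above (25 sensory neurons, 10 recurrent hidden neurons, 2 motor neurons) can be sketched as follows. This is a minimal illustration, not the FARSA implementation: the logistic activation function is an assumption (the text specifies only the layer sizes and the all-to-all feedforward and recurrent connectivity), and the weights are drawn here from the [−5.0, 5.0] initialization range used for generation-0 genotypes.

```python
import numpy as np

N_SENSORS, N_HIDDEN, N_MOTORS = 25, 10, 2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Controller:
    """Sketch of the controller in Figure 10 (right panel):
    25 sensory -> 10 recurrent hidden -> 2 motor neurons."""

    def __init__(self, rng):
        # Weights and biases; in the actual experiments these are the
        # evolved genotype values, not random draws.
        self.w_in = rng.uniform(-5.0, 5.0, (N_HIDDEN, N_SENSORS))
        self.w_rec = rng.uniform(-5.0, 5.0, (N_HIDDEN, N_HIDDEN))
        self.w_out = rng.uniform(-5.0, 5.0, (N_MOTORS, N_HIDDEN))
        self.b_h = rng.uniform(-5.0, 5.0, N_HIDDEN)
        self.b_m = rng.uniform(-5.0, 5.0, N_MOTORS)
        self.h = np.zeros(N_HIDDEN)  # recurrent hidden state

    def step(self, sensors):
        """One 10 Hz update: sensors in [0, 1] -> (tm, rm) in [0, 1]."""
        self.h = sigmoid(self.w_in @ sensors + self.w_rec @ self.h + self.b_h)
        tm, rm = sigmoid(self.w_out @ self.h + self.b_m)
        return tm, rm
```

Because the hidden layer keeps its state between calls, the controller can integrate sensory information over time, e.g., to estimate the opponent's motion direction.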

The maximum speed at each time step ($ms_t$) is modulated on the basis of the tiredness of the robot at time t ($tir_t$), that is, the energy spent by the robot to move during the last 20 s (200 time steps). They are calculated on the basis of the following equations:
$tir_t = \left(\frac{1}{200}\sum_{i=t-200}^{t} rs_i\right)^2$
(1)
$ms_t = ms\left(1 - tir_t\right)$
(2)
where ms is the maximum speed of the wheels described above and $rs_i$ is the absolute speed of the two wheels at step i, normalized in the range [0.0, 1.0].
The desired rotational speeds of the left and right wheels ($rsl_t$ and $rsr_t$) at time t are calculated on the basis of the following equations:
$rsl_t = \begin{cases} ms_t \times rm_t \times f(tm_t) & \text{if } tm_t < 0.5 \\ ms_t \times rm_t & \text{otherwise} \end{cases}$
(3)
$rsr_t = \begin{cases} ms_t \times rm_t \times f(tm_t) & \text{if } tm_t > 0.5 \\ ms_t \times rm_t & \text{otherwise} \end{cases}$
(4)
$f(x) = -2^3\left(x - 0.5\right)^2 + 1$
(5)

Making the maximum speed at which predators and prey can move dependent on tiredness (i.e., on the amount of energy recently spent by the robot) encourages the evolving robots to avoid wasting energy.
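Equations 1–5 can be transcribed directly into code. The sketch below follows one plausible reading of the formulas (the published layout is easy to misread): in particular, the squared average in Equation 1 and the $-2^3$ coefficient in Equation 5 are interpretations, and the exact roles of tm and rm should be checked against the original paper.

```python
from collections import deque

STEPS_20S = 200  # 20 s at the 10 Hz update frequency

class WheelModel:
    """Sketch of Equations 1-5: fatigue-modulated differential drive."""

    def __init__(self, ms=10.0):
        self.ms = ms  # maximum wheel speed in rad/s (e.g., 10 for prey)
        # Normalized wheel speeds over the last 200 steps, initially at rest.
        self.recent = deque([0.0] * STEPS_20S, maxlen=STEPS_20S)

    def f(self, x):
        # Equation 5: 1 at x = 0.5 (no modulation), -1 at x = 0 or 1
        # (inner wheel fully reversed, so the robot can spin in place).
        return -2**3 * (x - 0.5)**2 + 1.0

    def step(self, tm, rm, rs):
        """tm, rm: motor neuron outputs in [0, 1]; rs: normalized absolute
        wheel speed of the previous step, used to update tiredness."""
        self.recent.append(rs)
        tir = (sum(self.recent) / STEPS_20S) ** 2      # Equation 1
        ms_t = self.ms * (1.0 - tir)                   # Equation 2
        base = ms_t * rm
        rsl = base * self.f(tm) if tm < 0.5 else base  # Equation 3
        rsr = base * self.f(tm) if tm > 0.5 else base  # Equation 4
        return rsl, rsr
```

Note the feedback loop: sustained high wheel speeds raise tiredness, which shrinks the attainable maximum speed on subsequent steps until the robot slows down and "recovers".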

#### A.3 Evolving Parameters and Fitness Calculation

The connection weights and the biases of the neural network controller of each robot are encoded in a vector of 442 floating-point values (the genotype). The genotypes of the two populations at generation 0 are initialized with values drawn from a uniform distribution in the range [−5.0, 5.0].
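The generation-0 initialization can be sketched in a few lines. The genotype length and the initialization range come from the text; the population size of 20 used below is purely illustrative.

```python
import numpy as np

GENOTYPE_SIZE = 442  # connection weights + biases of the controller

def random_genotype(rng):
    """Generation-0 genotype: uniform values in [-5.0, 5.0]."""
    return rng.uniform(-5.0, 5.0, GENOTYPE_SIZE)

rng = np.random.default_rng(0)
population = [random_genotype(rng) for _ in range(20)]  # size is illustrative
```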

Predators were evolved for the ability to capture the prey (i.e., to reach and physically touch it) as fast as possible, and prey were evolved for the ability to avoid being captured as long as possible. Each robot was evaluated against n competitors, one competitor at a time, during n corresponding trials. At the beginning of each trial the predator and prey robots were placed on the middle left and the middle right side of the environment, facing each other, and were allowed to move for 1,000 steps, corresponding to 100 s (Figure 10, left). The fitness of the prey in a trial is the fraction of the trial time that elapsed before capture (1.0 if the prey was never captured); the fitness of the predator is the complement of that fraction, so that faster captures yield higher fitness. The total fitness is the average of the fitness values obtained over the n trials against different opponents.
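The per-trial scoring can be sketched as follows. The complementary predator/prey scoring used here is an interpretation consistent with the stated objectives (capture fast, survive long); the paper's verbal description of the exact formula is ambiguous, so treat this as an assumption.

```python
TRIAL_STEPS = 1000  # 100 s at 10 Hz

def trial_fitness(capture_step):
    """Fitness of one trial. capture_step is the step at which the predator
    touched the prey, or TRIAL_STEPS if the prey was never captured.
    Returns (predator_fitness, prey_fitness), both in [0, 1]."""
    elapsed = capture_step / TRIAL_STEPS
    return 1.0 - elapsed, elapsed

def total_fitness(capture_steps):
    """Average fitness over the n trials against different opponents."""
    scores = [trial_fitness(s) for s in capture_steps]
    pred = sum(p for p, _ in scores) / len(scores)
    prey = sum(q for _, q in scores) / len(scores)
    return pred, prey
```

For example, a predator that captures one opponent halfway through a trial and fails against another scores (0.5 + 0.0) / 2 = 0.25, while the corresponding prey population scores 0.75 on those same trials.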