## Abstract

We examine the effect of cooperative and competitive interactions on the evolution of complex strategies in a prediction game. We extend previous work to the domain of noisy games, defining a new organism and mutation model, and an accompanying novel complexity metric. We find that a mix of cooperation and competition is the most effective in driving complexity growth, confirming prior results. We also compare our complexity metric with simpler metrics such as raw strategy size, and demonstrate the effectiveness of our metric in distinguishing true complexity from mere genetic bloat.

## 1 Introduction

An important question in the field of artificial life is to identify the conditions under which an evolutionary system is able to drive continuous growth in the complexity of evolved agents [12]. A variety of possible answers have been conjectured in both the artificial life [10, 11] and evolutionary biology literatures [2, 3].

In particular, we are interested here in exploring hypotheses for the evolution of complexity via coevolutionary interaction. Dawkins identifies the phenomenon of an *evolutionary arms race* in which competing species drive one another to ever higher levels of complexity. This model of competitive coevolution is popular in the artificial life literature, either in the form of explicit competitive interaction, or implicit competition for reproductive success via fitness-proportional selection. However, it has been observed that purely competitive interactions are vulnerable to a variety of pathological dynamics that stunt the continued development of complexity. For example, [13] displays a process of evolutionary disengagement, in which one competitor falls too far behind the other, so that neither receives a meaningful selection signal, and evolutionary progress gives way to genetic drift.

On the other hand, cooperative coevolution, in which two species share a common fitness objective, has also been studied [7, 8]. Here, members of different species must coordinate to jointly perform a task, and an individual's fitness is determined not only by its own strategy, but also by how well it coordinates with members of other species. Thus, as new strategies emerge in one species, members of other species must adapt to keep pace. As in the case of purely competitive coevolution, this model is also vulnerable to pathological dynamics. For example, species may settle into a mediocre equilibrium in which low-complexity strategies collude to exploit trivial or degenerate solutions.

In biological evolution, species interact in various ways, both competitive and cooperative, with many other species. Thus, neither of the above models is sufficient to describe the richness of the fitness landscape that drives the observed growth in complexity over the course of evolutionary history. Although we cannot hope to simulate all aspects of a real biological ecosystem, we can examine model ecosystems in which species experience a small but varied set of interactions, allowing us to take a small step in the direction of understanding how the mix of competitive and cooperative dynamics may interact in nature.

In contrast to the above hypotheses, Gould [3] conjectures that the apparent rise in complexity over the course of natural evolution is not due to any sort of evolutionary pressure towards increased complexity, but is merely the result of a random walk through the space of phenotypes, which is naturally bounded below at zero (because there cannot be a negatively complex organism). A random walk bounded from below will, over time, reach ever higher peaks. This view will serve as our control hypothesis. In order to conclude that a particular dynamic truly increases complexity, we should require that it do so at a rate higher than that which occurs due to genetic drift alone.

In this work, we extend the work presented in [5], which focuses on understanding the role of cooperative and competitive interactions in the development of complexity. In particular, we adapt the methodology of that prior work to the domain of noisy games, and analyze whether the previous findings are sustained in this new domain. This allows us to identify which aspects of those results may be particular to the original model, and which are more robust.

Our new model is inspired by the work of Lindgren [4], in which agents play the iterated prisoner's dilemma, but have a small chance of random noise changing their desired action. Agents must therefore learn to cope with this additional noise, potentially making coordination more difficult. The addition of noise also aids in computing the average score agents will receive over an infinite number of rounds.

Following Lindgren, we represent our agents as a list of actions to be taken for each possible history of a certain length. Agents can evolve to use longer or shorter histories, and can mutate their actions for each particular history independently. By interpreting these strategies as decision trees, we develop an information-theoretic metric of agent complexity.

## 2 Previous Work

We summarize here our model originally presented in [5]. At a high level, our goal was to produce a minimal simulation in which we could compare the growth of complexity that resulted from a variety of competitive and cooperative evolutionary systems. To this end, we defined a linguistic prediction game that could be either cooperative or competitive in nature, a set of ecosystems that defined the interactions between sets of coevolving species, an organism model that defined the representation of an individual's strategy and the mutations that may be performed, and a complexity metric, which allowed us to quantitatively measure the complexity of an individual organism.

### 2.1 Linguistic Prediction Game

The linguistic prediction game was a two-player game played over a sequence of rounds. At each round, each player produced an output symbol, either 0 or 1. Players could have the objective either of matching the output of the other player, or of mismatching it. If the players shared an objective, the game was cooperative, and if they had opposing objectives, the game was competitive.

### 2.2 Organism Model

In [5], agents were modeled as finite state automata that deterministically produced output symbols according to the history of symbols produced by their opponent. Mutations included the creation or deletion of nodes in the automata, redirection of transition links, or changing the output symbol of a node. Complexity was calculated by performing finite state automata minimization and counting the nodes in the minimized automata.

### 2.3 Ecosystems

Our previous work considered the set of six ecosystems shown in Figure 1. Each species within an ecosystem represented a population of individual agents. Each agent accumulated fitness by interacting with each member of the other species, according to the interactions defined by the ecosystem. We will consider the same set of ecosystems in this work, with the exception of the 3-Comp ecosystem, which was unsuccessful in the previous work.

### 2.4 Previous Results

In summary, [5] showed that ecosystems with a mix of competitive and cooperative interactions were much more effective at driving complexity growth, and were able to sustain a positive trajectory of complexity throughout the experiment. On the other hand, the purely competitive and purely cooperative systems produced initial growth, but quickly slowed and converged to lower complexity values than the mixed systems reached, and showed no further growth. All systems were able to induce more complexity growth than the control ecosystem, in which the population was subject only to genetic drift and no fitness selection.

## 3 Noisy Prediction Game

The noisy prediction game is the same as the standard prediction game, with the addition of a stochastic element. Whenever an agent attempts to make a move (i.e., play 0 or 1), there is a small but nonzero chance, *n*, that noise will interfere and the agent will play the opposite. This has a few implications for successful strategies. For example, consider a pair of agents that share the cooperative goal of matching their outputs. In the standard game, the strategy “start with 0, and then repeat whatever the opponent does” will receive the maximum score when playing against itself. However, in the noisy game, eventually chance will cause one agent to accidentally play 1 for a single round before returning to 0. This will cause the other agent to reply with a 1, and so forth, locking the two into a series of failures to match. This will continue until chance again intervenes, either causing both to play 0 on the same round, or both to play 1. Such a system will only earn 1/2 point on average.^{1} By contrast, if one of the two agents were to simply play “always 0,” the system would be significantly more robust to this noise, and would earn nearly a full point on average.
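A small Monte Carlo simulation reproduces this effect. The sketch below is illustrative only (the function and strategy names are our own, and these agents react just to the opponent's last move rather than a full history window):

```python
import random

def play(strategy_a, strategy_b, noise=0.01, rounds=100_000, seed=0):
    """Simulate the noisy matching game; return A's average matching score.

    Each strategy maps the opponent's previous move to an intended move;
    noise independently flips each intended move with probability `noise`.
    """
    rng = random.Random(seed)
    a_prev, b_prev = 0, 0              # both conventionally start at 0
    matches = 0
    for _ in range(rounds):
        a = strategy_a(b_prev)
        b = strategy_b(a_prev)
        if rng.random() < noise:       # noise may flip either move
            a ^= 1
        if rng.random() < noise:
            b ^= 1
        matches += (a == b)
        a_prev, b_prev = a, b
    return matches / rounds

copy_opponent = lambda opp: opp        # "repeat whatever the opponent does"
always_zero = lambda opp: 0            # "always 0"
```

With noise 0.01, two copiers playing each other average about 1/2 point per round, while pairing `always_zero` with a copier recovers nearly a full point.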

## 4 Representing Strategies

As in Lindgren [4], each organism consists of a history length *L*_{h} that defines the size of the history window on which it bases its actions, as well as an array of 2^{*L*_{h}} real numbers 0 ≤ *x*_{i} ≤ 1 that define the probability of playing 1 (0 is played with probability 1 − *x*_{i}) in response to each possible history. A history sequence consists of the moves made by both the player and the opponent. A history sequence of length 1 is simply the previous move made by the opponent, a history sequence of length 2 is the previous move made by the player and the previous move made by the opponent, and so forth. For example, a strategy with *L*_{h} = 1 would have two values, *x*_{0} and *x*_{1}, where *x*_{0} indicates the probability of playing 1 after the opponent plays 0, and *x*_{1} is the probability of playing 1 after the opponent plays 1. The strategy “copy the opponent's last move” would be represented as *L*_{h} = 1, *x*_{0} = 0, *x*_{1} = 1. A more detailed example is shown in Figure 2. In contrast to Lindgren's model, we allow continuous strategies, in which *x*_{i} can be any value in the range [0, 1].
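As a concrete sketch of the table lookup (the bit-packing convention below is one consistent choice, not necessarily the exact ordering of our implementation):

```python
import random

def respond(Lh, x, history, rng=None):
    """Choose a move given the Lh most recent history bits.

    `history` lists the relevant moves, most recent first; the bits are
    packed into an index i, and x[i] is the probability of playing 1.
    """
    rng = rng or random.Random(0)
    i = 0
    for h, bit in enumerate(history[:Lh]):
        i |= bit << h                  # history bit h occupies index bit h
    return 1 if rng.random() < x[i] else 0

# "copy the opponent's last move": Lh = 1, x0 = 0, x1 = 1
Lh, x = 1, [0.0, 1.0]
```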

## 5 Mutation Model

Our mutation model consists of three possible mutations: a duplicate operation, which increases the value of *L*_{h} by one; a split operation, which decreases the value of *L*_{h} by one; and a point mutation, which simply replaces the value of one particular *x*_{i} with a new random value. The duplicate mutation doubles the length of the genome, and fills both halves with a copy of the original. For example, the genome (0.25, 0.75) will become (0.25, 0.75, 0.25, 0.75). The structure of the genome means that this mutation is neutral—a new bit of history information has been added, but both possible values of that bit result in the same original strategy. However, the mutation does open up new mutational avenues, by allowing future point mutations to differentiate the two. The split does the opposite: It randomly selects one of the two halves of the genome to keep, and drops the other. The effect is that one fewer bit of history information is used. The point mutation is straightforward.
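The three operators can be sketched directly on a tuple genome (helper names are ours):

```python
import random

def duplicate(genome):
    """Neutral mutation: L_h grows by one; both values of the new history
    bit initially map to the same original strategy."""
    return genome + genome             # (0.25, 0.75) -> (0.25, 0.75, 0.25, 0.75)

def split(genome, rng):
    """Keep one randomly chosen half; one fewer history bit is used."""
    half = len(genome) // 2
    return genome[:half] if rng.random() < 0.5 else genome[half:]

def point(genome, rng):
    """Replace one entry with a fresh uniform random probability."""
    g = list(genome)
    g[rng.randrange(len(g))] = rng.random()
    return tuple(g)
```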

As we will see in the next section, scoring this game is computationally expensive, especially for larger strategies. To combat this, Lindgren used a low mutation rate (on the order of 10^{−5}), which means that new forms are rarely introduced into the population, and the scores obtained by two interacting phenotypes can be cached and reused over many generations. With a high mutation rate, we would need to constantly recompute scores from scratch. We will similarly apply a low mutation rate, which will also allow a comparison against the higher rate used in our previous work.

## 6 Scoring the Infinite Game

The system of two players playing a noisy game using finite-length histories forms a Markov process, in which each Markov state corresponds to one of the possible histories. The transition probabilities for this process can be calculated using the action probabilities from each player's strategy along with the noise parameter *n*. In order to calculate the expected score of each player, we calculate the stationary distribution of this Markov process. Given Markov matrix *M*, we want to solve for the stationary vector *x* in the equation *M* × *x* = *x*, subject to the constraints that 0 ≤ *x*_{i} ≤ 1 and Σ_{i}*x*_{i} = 1. This produces a system of sparse linear equations that is readily solvable with standard methods. Once the stationary vector has been calculated, we use the game matrix to compute the scores received for each possible history, and compute a weighted average according to the probabilities of each history.
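As a concrete sketch for the smallest case (history length 1, so the chain has only the four (previous a, previous b) states; the helper names are ours, and longer histories would call for the sparse solvers mentioned above):

```python
import numpy as np
from itertools import product

def expected_score(xa, xb, noise):
    """Exact expected per-round matching score for two Lh = 1 strategies.

    Each player's intended P(play 1) comes from its table indexed by the
    opponent's previous move; noise flips the realized move w.p. `noise`.
    """
    def eff(p):                        # effective P(play 1) after noise
        return p * (1 - noise) + (1 - p) * noise

    states = list(product([0, 1], repeat=2))
    T = np.zeros((4, 4))               # row-stochastic transition matrix
    for i, (a, b) in enumerate(states):
        pa, pb = eff(xa[b]), eff(xb[a])
        for j, (a2, b2) in enumerate(states):
            T[i, j] = (pa if a2 else 1 - pa) * (pb if b2 else 1 - pb)

    # Stationary vector: solve (T^T - I) x = 0 subject to sum(x) = 1.
    A = np.vstack([T.T - np.eye(4), np.ones(4)])
    x = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
    return float(sum(x[i] for i, (a, b) in enumerate(states) if a == b))
```

For two copy-the-opponent players this returns exactly 1/2, matching the informal argument in Section 3.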

The specific mathematical techniques necessary to score this sort of game are not well documented in the artificial life literature. Lindgren [4] devotes only a brief paragraph to the method of computing scores, and there are few other works that employ his techniques. We present a full description of the relevant methods in the appendix.

## 7 Measuring Complexity

Previous work [4] has evaluated the complexity of strategies by measuring their history length. However, this method fails to capture the differences between a strategy that defines specific actions for every possible unique history of a given length, and a strategy that depends only on a small portion of its history length. For example, consider a strategy *X* with *L*_{h} = 10 that has 1024 entries. Suppose *x*_{i} = 0 when *i* is even, and *x*_{i} = 1 when *i* is odd. This strategy, despite appearing to depend on 10 bits of history, actually depends only on the most recent move of the opponent (as this move will occupy the low-order bit of *i*). A possible defense against this sort of degenerate case would be to find the minimal equivalent strategy—the strategy with the shortest history length that performs identical actions. While this method would accurately identify the above strategy as very simple, it would fail if we were to augment the strategy as follows. Instead of *x*_{i} = 0, let *x*_{i} = *ϵ*_{i} when *i* is even, and instead of *x*_{i} = 1, let *x*_{i} = 1 − *ϵ*_{i} when *i* is odd, where the *ϵ*_{i} are very small random values. This new strategy is very nearly identical to our original, but is no longer reducible, because each *x*_{i} value is slightly different from the next. We would like to have a metric that identifies this modified strategy as almost as simple as the original.
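For exact duplicates, the reduction is easy to sketch (helper names are ours), which also makes clear why the *ϵ*-perturbed variant defeats it: a single perturbed entry makes every history bit appear load-bearing.

```python
def depends_on(x, b):
    """True if strategy table x actually uses history bit b."""
    return any(x[i] != x[i ^ (1 << b)] for i in range(len(x)))

def minimal_length(x):
    """History length of the minimal equivalent strategy, found by
    greedily dropping unused bits from highest to lowest."""
    b = len(x).bit_length() - 2        # highest history-bit position
    while b >= 0:
        if not depends_on(x, b):
            # bit b is unused: keep only the entries where it is 0
            x = [x[i] for i in range(len(x)) if not (i >> b) & 1]
        b -= 1
    return len(x).bit_length() - 1
```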

Instead of viewing our strategies as bit strings, we can frame them as decision trees, in which each decision node checks the value of a particular history bit, and leaf nodes contain the action probability for the history sequence that they represent. For example, consider the strategy (0.2, 0.7, 0.5, 0.1). This strategy can be represented as a decision tree with three nodes as shown in Figure 3. Note that this is not the only possible representation—we can change the order of our decision nodes to create alternate but equivalent decision trees.

At each decision node, we measure the information gained about the strategy's output by learning the value of the tested history bit *h*:

*IG*(*N*) = *H*(*N*) − Σ_{v∈V} (|*N*_{v}| / |*N*|) *H*(*N*_{v}), where *N*_{v} = {*n* ∈ *N* : *n*_{h} = *v*},

where *N* is the set of possible histories at a node, *V* is the set of all possible history values (0 or 1 in our case), *n* is a particular history string, *n*_{h} is the history value of *n* at position *h*, and *H*(*N*) is the entropy of our output over *N*. We will initially calculate entropy under the assumption that all possible histories are equally likely.

For example, in the tree shown in Figure 3, at the root node (call it *N*_{0}), we know no history bits, and so we imagine a chance of playing 1 of 0.375 (the average over the four possible histories), which gives an initial entropy of *H*(*N*_{0}) ≈ 0.954 bits. Once we learn the value of *h*[0], we may either have a chance of 0.45 ((0.2 + 0.7)/2, in *N*_{1}) or 0.3 ((0.5 + 0.1)/2, in *N*_{2}) of playing 1, giving a total entropy of *H* = (*H*(*N*_{1}) + *H*(*N*_{2}))/2 ≈ 0.937 bits. Thus, learning the value of *h*[0] has only gained us 0.954 − 0.937 ≈ 0.017 bits. In other words, this strategy makes little use of the information contained in *h*[0].

What about the value of *h*[1]? We can similarly calculate the information gain for *N*_{1} and *N*_{2}, yielding *IG*(*N*_{1}) ≈ 0.191 and *IG*(*N*_{2}) ≈ 0.147 bits. We therefore see that this strategy makes more use of the information in *h*[1] than of the information in *h*[0].
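These figures can be verified in a few lines (equal-weight entropies as stated above; the table is split into halves to mirror the Figure 3 tree, and the helper names are ours):

```python
from math import log2

def H(p):
    """Binary entropy of playing 1 with probability p."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def node_entropy(leaves):
    """Entropy at a node whose equally likely histories have these x values."""
    return H(sum(leaves) / len(leaves))

def info_gain(leaves):
    """Gain from learning the bit that splits the node into its two halves."""
    half = len(leaves) // 2
    return node_entropy(leaves) - 0.5 * (node_entropy(leaves[:half])
                                         + node_entropy(leaves[half:]))
```

Here `node_entropy([0.2, 0.7, 0.5, 0.1])` ≈ 0.954 and `info_gain([0.2, 0.7, 0.5, 0.1])` ≈ 0.017, matching the values above.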

We now look to combine these information gain values into an overall complexity score for this strategy. Intuitively, a strategy with low information gain is not complex, because it acts in the same way regardless of the history it observes, and a strategy with high information gain is more complex, because it adapts its behavior depending on the history. A tempting metric would be to simply calculate the total information gain over the entire decision tree. In the above example, this would yield a total gain of 0.186 bits. Consider however the two decision trees in Figure 4. Both trees have a total information gain of 1 bit, but the left tree makes full use of both *h*[0] and *h*[1], while the right tree only makes use of *h*[0]. In fact, none of our strategies can possibly have an information gain higher than the right tree, because the entropy of the root can never be more than 1 bit. This is clearly unsatisfactory.

To address this, we turn to the concept of *certificate complexity* from the study of Boolean functions [1]. The certificate complexity of a Boolean decision tree is the maximum number of bits one might need to know in order to determine the output of that decision tree. Formally, we consider a Boolean decision tree as a function, *f*, and an input to that function, *x*. A certificate for *x*, denoted *c*_{x}, is a subset of the bits in *x* that is sufficient to determine the value of *f*(*x*). The certificate complexity of *f* is then *C*(*f*) = max_{x} min_{c_x} *length*(*c*_{x}). Certificate complexity does a good job of separating the example trees in Figure 4. The left tree requires two history bits to fully determine its output, while the right tree requires only one.

Certificate complexity is concerned only with the maximum certificate length across all possible histories, so we can construct a tree for which only a few possible histories require long certificates while the rest require only short certificates, and such a tree will have the same certificate complexity as one for which all histories require long certificates. For our analysis, this is an undesirable property—we would like to capture the total complexity of a strategy, not just its single most complex component. Further, certificate complexity is applicable to Boolean functions with deterministic outputs of 0 or 1, while our strategies allow leaves to be labeled with arbitrary probabilities.

To address these issues, we propose two extensions. First, instead of considering the length of a particular certificate history, we sum the total information gain across that history. In this way, we can accommodate probabilistic outputs. Histories only contribute complexity to the extent that they inform the action chosen by the strategy. A large decision tree in which the leaves contain probabilities between 0.49 and 0.51 might have long certificates (in that all history bits are necessary to determine the exact action probability), but those certificates barely contribute any additional information—we're nearly as uncertain about the outcome as we were before learning any history bits. We will correctly assign such a tree low complexity despite long certificates.

Second, instead of considering only the longest certificate, we will sum the individual information gain values of every node in the tree. It is important to note that this is not the same as calculating the total information gain of the tree itself, because we are not discounting the information gained at deeper nodes. For example, the left tree in Figure 4 has a total information gain of 1 bit, but the sum of its individual nodes' information gains will be 2 bits. *N*_{0} gains 0 bits, and *N*_{1} and *N*_{2} gain 1 bit each when we consider them in isolation. This metric will result in higher scores for trees in which information gain occurs near the leaves.

We also need to address the question of the equivalent of finding a minimal certificate. If we naively construct a decision tree, it may turn out that we put the history bits that form a certificate near the leaves, while putting meaningless history bits near the root. In this case, we would inflate the overall complexity score by placing the information-gaining history bits at a level with more nodes. In fact, we could add ever more meaningless bits of history to the root to make the complexity score as large as we want. To avoid this, we will define the canonical decision tree for a particular strategy as the one with the lowest sum of individual information gains. Such a decision tree may include any ordering of history bits, and may even include different orderings across different subtrees.
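A brute-force sketch of this canonical-tree score (our own helper; the cost grows rapidly with *L*_{h}, so it is practical only for short histories): at each node, try every remaining history bit and keep the choice that minimizes the summed per-node information gains, allowing different orderings in different subtrees.

```python
from math import log2

def H(p):
    """Binary entropy of playing 1 with probability p."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def ig_sum(x):
    """Canonical-tree complexity: the minimum, over decision-tree bit
    orderings, of the sum of every node's individual information gain."""
    if len(x) <= 1:
        return 0.0
    k = len(x).bit_length() - 1
    best = float("inf")
    for b in range(k):                 # try testing each history bit here
        lo = [x[i] for i in range(len(x)) if not (i >> b) & 1]
        hi = [x[i] for i in range(len(x)) if (i >> b) & 1]
        gain = H(sum(x) / len(x)) - 0.5 * (H(sum(lo) / len(lo))
                                           + H(sum(hi) / len(hi)))
        best = min(best, gain + ig_sum(lo) + ig_sum(hi))
    return best
```

On the Figure 4 trees this yields 2 bits for the left tree ([0, 1, 1, 0]) and 1 bit for the right tree ([0, 0, 1, 1]), as desired.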

## 8 Experiments

As in our previous work, we will consider a variety of ecosystem models, which contain both cooperative and competitive interactions between species. Our aim will be to compare and contrast the results in this new model with the results obtained previously.

We will examine the same core set of ecosystem models described in Figure 1, and use the all-versus-all interaction graph as shown in [5]. For each experiment, we will report both the median history length of each population's genomes and its information-gain complexity score as described above. Both values will be averaged over 20 runs. We will use a population size of 100 for each species, and will simulate for 10,000 generations.

Instead of replacing entire populations at once as in the previous experiments, we will instead employ a Moran^{2} model [6]. Such a model proceeds via discrete time steps, in which one individual is selected for replacement uniformly at random, and another is selected to reproduce using fitness-proportional selection. In other words, one's fitness has no bearing on the likelihood of death, only on the likelihood of being a parent to the new offspring. After each replacement event, fitnesses are recalculated to reflect the new composition of the population. In simulations with multiple populations, all populations are updated simultaneously.

We select this model both to offer contrast to the model used previously and for computational reasons. With a low mutation rate, it will be likely that most new organisms will be identical to their parent, which will allow us to cache the game scores obtained by the parent's strategy and avoid repeating a potentially costly computation. This process will also potentially allow a particularly fit ancestral line to produce many successive offspring over the course of the simulation, allowing it to build upon more mutation opportunities than might be afforded in a generation-based simulation. To simulate 10,000 generations, we will use the convention that a generation is equivalent to a number of time steps equal to the population size. Thus, we will simulate one million death/birth events for each species.
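One death/birth event of this Moran process can be sketched as follows (names are ours; mutation and score caching are omitted):

```python
import random

def moran_step(population, fitness, rng):
    """Replace a uniformly chosen individual with the offspring of a
    fitness-proportionally chosen parent (death is fitness-blind)."""
    dead = rng.randrange(len(population))
    weights = [fitness(g) for g in population]
    parent = rng.choices(population, weights=weights, k=1)[0]
    population[dead] = parent          # offspring is a copy of its parent
    return population
```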

We initialize our populations to be an even mix of the four basic strategies: [0, 0], [0, 1], [1, 0], and [1, 1]. We choose this initialization over randomly generating 100 strategies for several reasons. First, it ensures an accurate comparison between runs: no run can begin with a lucky initialization. Second, in the mixed-strategy case, generating 100 disparate strategies would incur significant computational cost, as all 10,000 pairwise interactions would need to be evaluated.

### 8.1 Control

The control ecosystem consists of a single population with no interactions, so that all members of the population have exactly the same fitness. Therefore, any change in the population over time is merely the result of genetic drift. As before, this serves as a baseline against which other ecosystems can be measured. The level of complexity acquired by the control population is the amount we should expect any other ecosystem to reach if it were the case that interactions had no bearing on the development of complexity.

Figure 5 (left) shows the average history length over time for the control ecosystem using mixed strategies. An individual run will behave roughly as a random walk that is bounded below, meaning that on average, it will slowly reach higher and higher values as time goes on. This is exactly what we observe, as average sizes increase at a moderate pace early on, but slow to near constancy by the end of the simulation.

Figure 5 (right) shows the average values of our information complexity metric for the control ecosystem. Notably, this shows an initial decline in complexity, as the random walk moves away from the strategies [0, 1] and [1, 0], which have an information complexity of 1, towards random mixed strategies of history length 1, the majority of which will have an information complexity well below 1.

### 8.2 Two-Species Competitive

In the two-species competitive ecosystem, we have two species A and B, such that A receives fitness for matching B's output, and B receives fitness for mismatching A's output.

Figure 6 (left) shows the growth in median history length in this ecosystem. Perhaps surprisingly, this ecosystem produces significantly less growth in history size than the control. Here, the competitive dynamics do not induce an evolutionary arms race towards complexity, but instead drive the populations into a dynamic stable state of cycle chasing in which rapid evolutionary turnover occurs but no objective progress is made.

Figure 6 (right) shows the average information complexity scores for this ecosystem. At the start of the simulation, there is a very rapid increase, which occurs as the strategies [0, 1] and [1, 0] quickly eliminate [0, 0] and [1, 1] from the population. To see why this occurs, consider a member of population A called *a*_{1}, which wants to match the outputs of its opponents in population B. If *a*_{1} plays [0, 0] (always 0), it will only be successful against opponents in B that also play always 0 or that play [0, 1] (copy the opponent), and will lose against [1, 0] (do the opposite of what the opponent does) and [1, 1] (always 1). On the other hand, if *a*_{1} plays [0, 1] (copy the opponent), it will be successful against [0, 0], [1, 1], and [0, 1], and will tie against [1, 0]. The reverse holds for population B, where [1, 0] will be dominant. Thus, after a few iterations, the populations converge to [0, 1] for A and [1, 0] for B.

Eventually, one species or the other will evolve a longer history-length strategy that is able to exploit the opponent. For instance, the strategy [0, 1, 0, 0] allows a member of population A to win against the strategy [1, 0]. The reason this results in a sharp decrease in complexity is that the most evolutionarily convenient response from B is to mutate to [0.5, 0.5]. This strategy requires only two point mutations to reach from [1, 0]; and both intermediate forms [0.5, 0] and [0, 0.5] perform better than [1, 0]. The alternative path, the evolution of an even more complex response, instead requires at least one duplication mutation and several point mutations. Further, the [0.5, 0.5] strategy is unexploitable in the sense that it will tie against any possible opponent.

In some cases, a partial arms race does occur—in some simulations we observe sharp increases in the complexity of both species. However, this increase is often short-lived. Whenever one population falls behind, the pressure is to revert to [0.5, 0.5] or other similarly random and therefore low-information-complexity strategies. The availability of this evolutionary escape hatch causes any growth in complexity to be lost, as the populations revert to simplistic behavior.

Furthermore, short-history-length strategies offer greater adaptability than longer-history-length ones. An organism with strategy [0, 0] can switch to playing [1, 1] in as few as two reproduction events, while an organism with strategy [0, 0, 0, 0, 0, 0, 0, 0], despite making identical moves, will require many more mutations to switch to playing always 1. In cases where the populations descend into cycle chasing, the Red Queen effect will favor the shorter, more adaptable strategy, as populations race to keep up with one another.

The contrast between this result and the performance of the two-species competitive ecosystem in the previous work illustrates an important fact about complexity in competitive coevolution. Many evolutionary games contain mediocre equilibria, such as is induced by the existence of the [0.5, 0.5] strategy. In the presence of such equilibria, it is difficult for an arms race to sustain itself, as disengagement will result not merely in temporary genetic drift, but in reversion to simplicity.

Additionally, the choice of strategy representation and mutation model in our previous work allowed organisms to maintain a reservoir of unused genetic material, which would allow evolved substructures to persist through evolutionary backslides. In our present work's model, simplification irrecoverably deletes the lost genes, which makes backslides particularly devastating to the development of complexity.

### 8.3 Two-Species Cooperative

The two-species cooperative ecosystem is identical to the two-species competitive ecosystem, except each species receives fitness for matching the other's output, and thus their goals are aligned.

Figure 7 (left) shows the average sizes for this ecosystem. We observe a trajectory very similar to the control ecosystem, with both species ending up with only slightly larger average history lengths by the end of the simulation (4.32 for the control, compared to 4.70 and 4.64 for the cooperative ecosystem). However, the cooperative ecosystem does not seem to show the slowing of growth observed in the later stages of the control simulation, with both species maintaining a more or less constant-slope trajectory.

While the history-length metric may suggest the conclusion that the cooperative system is not significantly different from the control, our information complexity metric tells a different story, as seen in Figure 7 (right). Here we see that the evolved strategies are significantly more complex than those in the control simulation, despite their similar average history lengths. Whereas the control simulation shows an average information complexity of about 1.5 bits by the 10,000th generation, here we observe final information complexity values of about 2.4 and 2.3 for the two cooperators, an increase of more than 50%.

Unlike the competitive model, there does not appear to be significant pressure towards a simple strategy. Although [0, 0] and [1, 1] are strong strategies, they rely heavily on the presence of the same strategy in the other population, and are not robust to mutations in the other population, which allows room for more complex strategies to arise, just as seen in our previous work.

Further, the cooperative system produces much less pressure for constant change than the competitive system. Whereas success in the competitive simulation required organisms to be maximally adaptable, and therefore encouraged small genomes, the cooperative simulation is more amenable to slow change. Both populations are incentivized towards sticking with the current convention, allowing ample time for precise adaptations in larger strategies. In this setting, a mutation in a section of the genome that is referenced rarely can still produce sufficient fitness advantage to become fixed in the population. In the competitive setting, the advantage conferred by such a mutation would likely disappear long before it could establish itself.

### 8.4 Three-Species Mixed

In the three-species mixed ecosystem, we have both a cooperative and a competitive interaction. Species A competes with species B, but also cooperates with species C. In our previous work, we found that this ecosystem was able to sustain significant complexity growth. However, in this new model, competitive interactions have been found to significantly reduce complexity growth, due to the presence of very simple equilibria.

Figure 8 (left) shows the growth in average history length for this ecosystem. Growth is significantly higher than in the two-species competitive case for all three species, including species B, which is in a purely competitive interaction. This demonstrates that even in the presence of a trivial equilibrium in the competitive game, the addition of a cooperative relationship is able to boost complexity growth.

On the other hand, the total growth in history length is roughly equal to that found in the control experiment, a weaker result than was observed in our previous work's experiments. This suggests that a cooperative interaction by itself is not a panacea: if the competitive evolution strongly favors simplicity, we may not find open-endedness even with the addition of cooperation.

Figure 8 (right) shows the information complexity scores for this ecosystem. Here we find that species A and C show complexity growth significantly above the control ecosystem (final values of about 2.1 as compared to about 1.5 for the control), despite the similar overall history length. Furthermore, these species show a clear upward trend at the end of the experiment, which was not the case for the control simulation. Again, our information complexity metric has revealed trends that are not apparent from looking only at history length. Species B, by comparison, does not exceed the information complexity score observed in the control experiment, and its growth slows by the end of the 10,000 generations.

As in the purely competitive ecosystem, we observe a very sharp trajectory for all populations in the initial generations. This occurs because species A and B very quickly adopt the [0, 1] and [1, 0] strategies seen in the competitive setting. Species C is then able to adopt [0, 0] or [1, 1], very low-complexity strategies, which will be matched well by the strategy of species A. Again following the trajectory of the purely competitive setting, species A and B then abandon these pure strategies in favor of more random, lower-complexity strategies. Unlike the purely competitive setting, however, these random strategies do not constitute an equilibrium, thanks to the presence of the cooperative interaction, allowing both species to escape to higher-complexity strategies.

### 8.5 Four-Species Mixed

In the four-species mixed ecosystem, we have two competing species, A and B, and each has a cooperative partner, C or D respectively.

Figure 9 (left) shows the average history lengths for this ecosystem. We see that this ecosystem produces history lengths larger than any other we have examined so far, with a maximum value of 5.5 (as compared to 4.3 for the control simulation). Interestingly, species C shows a significantly lower average history length than the other three, and in particular lower than species D, whose situation appears identical. In fact, there is one slight difference between the two. Species A tries to match species B in their competitive relationship, and also tries to match species C in their cooperative relationship. Species B, on the other hand, tries to mismatch species A but tries to match its cooperative partner, species D. The fact that species A always has the same objective (to match the other player) appears to allow its cooperative partner to succeed with a simpler strategy than is needed by species B's cooperative partner.

Figure 9 (right) shows the average information complexity for this ecosystem. As is the case for the history lengths, this ecosystem produces significantly more information complexity than any of our other experiments, with the exception of species C. As before, species C lags well below the others. In the remaining three cases, a clear positive trajectory is maintained through the end of the experiment, indicating continued growth.

## 9 Discussion

In general, our experiments confirm our results from our previous work in this new setting. The ecosystems that mix cooperation and competition are significantly more effective at inducing complexity growth than the purely cooperative or competitive systems, as well as more effective than the control experiment. This is despite the fact that we have changed the genotype model, the mutation rate, the generation model, and the metric of complexity. This suggests that our previous results are transferable and not merely a product of the specific settings of the original experiments.

There are, of course, some differences, most notably the lack of complexity evolved in the purely competitive ecosystem in this new experiment. As previously discussed, this is a result of the low-complexity Nash equilibrium present in this particular domain. Despite this, the inclusion of a cooperative interaction is sufficient to restore above-control complexity growth, demonstrating the power of our mixed ecosystem model.

Our novel complexity metric has proven effective at elucidating complexity growth that is not apparent from looking only at the length of the genomes of organisms, as had been done in [4]. This highlights the value of constructing domain-specific metrics beyond genome length.

We have focused on analyzing the *complexity* of our artificial organisms rather than their *fitness*. This is because the fitness trajectories that result from these experiments tend to hide the true dynamics of the system.

For example, cooperative ecosystems show nearly flat fitness values over the course of the experiment, even as complexity increases. In other words, species are not evolving to be significantly better at coordination, but rather to be better at coordination with the specific strategies extant in the other population. Even very simple strategies (such as “always output 1”) are capable of very successful cooperation given the right partners—such a simple strategy only becomes unfit when attempting to cooperate with a more diverse set of partners.

Similarly, competitive ecosystems tend to show only brief spikes and dips in an otherwise flat fitness trajectory. These occur when one species evolves a more successful strategy; equilibrium is quickly restored, either because the opponent develops an effective counter, or because the opponent regresses to highly random strategies. In general, the interaction of very simple strategies and the interaction of very complex strategies produce indistinguishable fitness levels.

## 10 Future Directions

There are several avenues for further exploration in these experiments. As previously discussed, calculating fitness values for the infinite game quickly becomes very computationally expensive as the strategy history length grows, which has limited both the maximum genome size and the total experiment length we have been able to simulate. Given additional computational resources, these experiments could be enhanced by increasing the cap on history length and by allowing for much longer simulations. Although average history lengths stayed below the limit of 10 for much of most runs, many individual organisms reached the limit, and at times entire populations converged to history-length-10 strategies, especially in the more successful ecosystems.

We have also conducted initial exploratory experiments aimed at an alternative approach to addressing the computational expense of larger genome sizes. By biasing the mutation rate towards simplifying mutations (in this case, the split operator), we can add additional pressure to the simulation away from longer strategies. The magnitude of this pressure can be modulated by adjusting the size of the bias towards simplifying mutations. By slowly increasing this bias until simulations no longer produce complexity growth, we can estimate the overall pressure towards complexity induced by each ecosystem. Those ecosystems that are able to sustain complexity growth in the face of harsher biases against it are the ones that will produce the most growth in the neutral setting. Early results demonstrate the effectiveness of this technique in analyzing the performance of ecosystems with growth close to that of the control, but a full exploration is left for future work.

Lindgren [4] notes that certain history strings occur vanishingly rarely in these sorts of simulations, which makes the strategy values associated with these strings subject to potentially extremely low selective pressure.^{3} In that situation, these rarely accessed genome sites will tend to drift, which may artificially inflate the apparent complexity of the strategy. Our reliance on a control experiment that is entirely subject to genetic drift helps to identify the magnitude of this potential inflation. However, there are other possible solutions. In particular, if we were to weight the complexity contribution of each genome site by the rate at which it is accessed, we could focus our measurements on those sites that are subject to selective pressure, rather than those that are merely subject to drift. As noted in the footnote, this is still an imperfect measure of a site's true importance, and more work may be needed to identify a more accurate method of weighting.

## Notes

1. A similar dynamic is observed in [4] for the tit-for-tat strategy.

2. No relation to this author.

3. Even if a history string occurs very rarely, it may still experience high selective pressure if the action taken at that point significantly alters the trajectory of the interaction, but this will often not be the case.

4. Note that if we wish to handle two strategies with different history lengths, we can simply expand the shorter of the two strategies so that they match. Because our convention is to index into our histories using the low-order bits to indicate the most recent moves, we can simply duplicate a strategy string to expand its effective history length by 1, without changing its actions. For example, the strategy {0, 1} is the same as the strategy {0, 1, 0, 1}.

5. In other words, if *h*_{1} is the sequence ABCDEFG, then *h*_{2} must be of the form CDEFG?? in order for *h*_{1} → *h*_{2} to be a valid transition.

6. For two-action games. In general, there will be |*A*|^{2} nonzero entries.

7. This may also be called the limiting distribution; in our case they are the same thing.

8. Recall that the columns of *T* − *I* are linearly dependent, because each row of *T* − *I* must sum to 0 (each row of *T* sums to 1). If the columns of a matrix are linearly dependent, then so are its rows.

## References

### Appendix: Solving Finite-History, Noisy, Infinitely Repeated Games

#### A.1 Introduction

In this technical report, we will present an overview of techniques for efficiently calculating fitness payoffs in infinitely repeated matrix games, in which each participant's strategy depends only on a finite history of previous moves, and in which, with small probability, moves can be randomly replaced by other moves in a form of noise. We will focus our discussion on practical techniques applicable to an artificial life setting, and will assume basic familiarity with artificial life concepts such as matrix games and fitness payoffs.

This model of infinite games was presented by Lindgren [4] to analyze the dynamics of the iterated prisoner's-dilemma game. However, Lindgren's work devotes only a brief paragraph to the techniques used to actually compute the payoffs for his model. Our goal here is to provide a roadmap for artificial life researchers looking to reproduce these techniques and results.

#### A.2 Definitions

We will define our simulation as follows. We have an iterated game in which players can select from a set of actions, *a*_{i} ∈ *A*, at each step. The payoffs received for each player are defined by an |*A*| × |*A*| payoff matrix *M* and depend only on the current moves selected (i.e., not on the history of previous moves). Each player will follow a strategy *S* in which they select their move based on a finite-length history of previous moves, *H*. For each possible history, *S* will define a distribution over *A*, the probability of taking each possible action given that history. Further, whenever a player attempts to make a move, there is a small probability *n* that they will instead make a different move, selected at random.

In general, this setup is compatible with any finite action set; however, for clarity we will restrict our examples to an action set of size 2, as in the prisoner's dilemma. We will call our actions 0 and 1, as this will aid us in succinctly expressing our strategies. In particular, we will define a strategy with history length |*H*| as a string of 2^{|H|} values, where the *i*th value represents the probability of taking action 1 given the history sequence with value *i* when interpreted as binary. For example, a strategy for history length 2 might be *s* = {1.0, 0.5, 0.75, 0.1}. In this case, we would take action 1 with 100% probability given the history 00, with 50% probability given the history 01, with 75% probability given the history 10, and with 10% probability given the history 11.
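As a concrete illustration, the indexing scheme just described can be sketched in a few lines of Python (the function name is ours, not part of the paper's model):

```python
def prob_action_1(strategy, history_bits):
    """Probability of playing action 1 given a history.

    `strategy` is a list of 2**|H| probabilities; `history_bits` lists the
    history's bits from high-order (most distant) to low-order (most
    recent), so the history is simply read as a binary number.
    """
    index = 0
    for bit in history_bits:
        index = (index << 1) | bit
    return strategy[index]

s = [1.0, 0.5, 0.75, 0.1]        # the history-length-2 example above
print(prob_action_1(s, [1, 0]))  # history 10 -> 0.75
```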

Lindgren represents histories as sequences of moves ending with the opponent's most recent previous move, then the player's most recent previous move, then the opponent's second most recent previous move, and so forth. Thus, the low-order bits of the history sequence are the most recent actions, and the high-order bits are the most distant. The choice of history ordering and the choice of whether to include a player's own moves in the history are ultimately arbitrary (at least from the perspective of solving the game—these choices may still have an impact on other aspects of an evolutionary simulation). In our examples, we will follow Lindgren's notation, such that a history of 01 indicates that the opponent previously played action 1 and the player previously played action 0.

Our goal is, given two strategies *s*_{1} and *s*_{2}, to calculate the average payoff received by each if they play the game for an infinite number of rounds. We will approach this problem analytically, rather than by actual simulation of the strategies. We will view the game as a Markov process in which the transitions between the various possible histories are determined by the strategies of the two players. We will then find the stationary distribution of this Markov process, which will tell us the average frequencies with which the players will find themselves in each possible history. From there, we can simply compute the payoff received in the most recent round of each history, and average these payoffs according to their probabilities.

#### A.3 Framing the Game as a Markov Process

To frame our repeated game as a Markov process, we need to first define what constitutes the states of that Markov process. Given two strategies with history length |*H*|,^{4} we will consider each of the 2^{|H|} possible histories as a state. Note that if |*H*| is odd, which in our convention means that a strategy depends on one more of the opponent's moves than of the player's own, we will need to instead consider histories of length |*H*| + 1, so as to capture the necessary information for both players.

Given two histories *h*_{1} and *h*_{2} and two players *A* and *B* with strategies *s*_{A} and *s*_{B}, we will compute the probability of transitioning from *h*_{1} to *h*_{2} after one round of play. Let each history represent the sequence of actions by players A and B as follows: *h* = {*A*_{2}, *B*_{2}, *A*_{1}, *B*_{1}, *A*_{0}, *B*_{0}}, where *A*_{0} represents the most recent move by player A, *A*_{1} the next most recent, and so forth. First, we observe that one round of play will discard the two oldest actions in our history (or the oldest action, if the history is of length 1), will add two new actions to the front of our history (the two most recently played actions), and will shift all other actions two spots into the past. Note that in order for it to be possible to transition from *h*_{1} to *h*_{2}, it must be the case that they are consistent with the above process—the *N*th moves of *h*_{1} must match the (*N* + 1)th moves of *h*_{2}.^{5} If this criterion is not met, the transition probability between *h*_{1} and *h*_{2} is 0. On the other hand, if it is met, then we can consult the strategies to determine the transition probability.

For *h*_{1} to transition to *h*_{2}, it must be the case that player A plays *h*_{2}[*A*_{0}] given history *h*_{1}, and player B plays *h*_{2}[*B*_{0}] given history *h*_{1}. Let *p*_{A} be the probability given to *h*_{2}[*A*_{0}] by *s*_{A} given *h*_{1}, and *p*_{B} the probability given to *h*_{2}[*B*_{0}] by *s*_{B} given *h*_{1}. There are four possible ways for both players to play the needed action to produce *h*_{2}. First, both players may attempt to play the appropriate action, and noise interferes with neither. The probability of this occurring is *p*_{A} × (1 − *n*) × *p*_{B} × (1 − *n*) (recall that *n* is our noise probability). Second, A attempts to play the appropriate action, and noise does not interfere, but B attempts to play the incorrect action, and noise does interfere. The probability of this occurring is *p*_{A} × (1 − *n*) × (1 − *p*_{B}) × *n*. Similarly, we may have that A attempts to play the incorrect action, and noise interferes, while B attempts to play the correct action, and noise does not interfere, with probability (1 − *p*_{A}) × *n* × *p*_{B} × (1 − *n*). Finally, we may have that both players attempt to play the incorrect action, but noise interferes with both, giving probability (1 − *p*_{A}) × *n* × (1 − *p*_{B}) × *n*. So, in total, the probability of transitioning from *h*_{1} to *h*_{2} will be

*p*(*h*_{1} → *h*_{2}) = *p*_{A}(1 − *n*) × *p*_{B}(1 − *n*) + *p*_{A}(1 − *n*) × (1 − *p*_{B})*n* + (1 − *p*_{A})*n* × *p*_{B}(1 − *n*) + (1 − *p*_{A})*n* × (1 − *p*_{B})*n*. (1)

As an example, consider the strategies *s*_{A} = {0, 1} (play 1 if the opponent played 1, and 0 otherwise) and *s*_{B} = {0.5, 0.5} (play randomly regardless of what the opponent does), with a noise value of *n* = 0.01. To capture the dynamics of these strategies, we will need to track two bits of history (so as to know each player's last move). We want to calculate the transition probabilities between the four possible histories 00, 01, 10, and 11. For 00 → 00, the probability is calculated as follows: *s*_{A}[00] = 0, so *p*_{A} = 1 (because there is a 100% chance that A will attempt to play 0), and *s*_{B}[00] = 0.5, so *p*_{B} = 0.5. Following Equation 1, this gives *p*(00 → 00) = 1 × 0.99 × 0.5 × 0.99 + 1 × 0.99 × 0.5 × 0.01 + 0 × 0.01 × 0.5 × 0.99 + 0 × 0.01 × 0.5 × 0.01 = 0.495. Repeating this for the other possible transitions gives the following Markov matrix:

|        | 00    | 01    | 10    | 11    |
|--------|-------|-------|-------|-------|
| **00** | 0.495 | 0.495 | 0.005 | 0.005 |
| **01** | 0.005 | 0.005 | 0.495 | 0.495 |
| **10** | 0.495 | 0.495 | 0.005 | 0.005 |
| **11** | 0.005 | 0.005 | 0.495 | 0.495 |

We can observe a few key facts about this matrix and the transition matrices we will generate in general. First, the sum of each row must be 1, because the total outgoing probability from a particular history must be 1. Second, as we increase the history length of our strategies, the transition matrix will quickly become very sparse, because most pairs of histories will not be compatible for a transition. In fact, each row will have exactly four^{6} nonzero entries. Third, our transition matrix is irreducible, meaning that all states are reachable (possibly requiring more than one step) from one another. Note that this is a consequence of our nonzero noise value, and is the motivation for including noise in our simulation. As we will see, this attribute is critical for our method of computing the fitness payoffs for our strategies.
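To make the construction concrete, the following Python sketch builds the transition matrix for the worked example (two history-length-1 strategies under noise). The state encoding (A's last move as the high-order bit) and all names are our assumptions, not the paper's code; the noise arithmetic is the factored, two-action form of the four-term sum in Equation 1:

```python
import numpy as np

def transition_matrix(s_a, s_b, n=0.01):
    """Transition matrix for two history-length-1 strategies under noise.

    States are pairs (A's last move, B's last move), encoded as the
    two-bit number 2*A0 + B0. Each strategy is a table of probabilities
    of playing action 1, indexed by the opponent's last move.
    """
    T = np.zeros((4, 4))
    for h1 in range(4):
        a0, b0 = h1 >> 1, h1 & 1
        for h2 in range(4):
            a_new, b_new = h2 >> 1, h2 & 1
            # Probability each player *attempts* the move required by h2.
            p_a = s_a[b0] if a_new == 1 else 1 - s_a[b0]
            p_b = s_b[a0] if b_new == 1 else 1 - s_b[a0]
            # Noise flips an attempted move with probability n; this is the
            # factored form of the four-term sum in Equation 1.
            T[h1, h2] = (p_a * (1 - n) + (1 - p_a) * n) * \
                        (p_b * (1 - n) + (1 - p_b) * n)
    return T

T = transition_matrix([0.0, 1.0], [0.5, 0.5])  # s_A = {0, 1}, s_B = {0.5, 0.5}
print(round(T[0, 0], 3))  # p(00 -> 00) = 0.495, matching the text
print(T.sum(axis=1))      # every row sums to 1
```

For longer histories the same loop applies, with the compatibility shift check of the previous section zeroing out most entries.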

#### A.4 Computing the Stationary Distribution

Our first task is to compute the stationary distribution, *D*, of our Markov process.^{7} To do this, we will solve the equation

*T* × *D* = *D*,

where *T* is our transition matrix, and *D* is subject to the constraint Σ_{i}*d*_{i} = 1. In other words, we want to find a distribution of histories that remains the same after being passed through a round of transitions.

This is a system of linear equations with |*D*| unknowns and |*D*| equations. We can rewrite it as follows:

0⃗ = (*T* − *I*) × *D*.

However, it is important to note that our equations are not linearly independent.^{8} We therefore need another independent equation to make this system solvable. Luckily, we have one: our constraint that Σ_{i}*d*_{i} = 1. We can integrate this into our formula by replacing any one row of *T* − *I* with all 1s, and replacing the corresponding entry of the left-hand side with a 1. Let these new values be *T̂* and *0⃗̂*. We can then compute *T̂*^{−1} × *0⃗̂* = *D*.

As a practical matter, it is critical to recall that *T* will be very sparse for larger history lengths. In such cases, it will be computationally infeasible to compute the inverse of *T̂* directly, and instead specialized techniques for solving sparse systems of linear equations will be required. The details of these techniques are beyond the scope of this report, but the authors found success using the **scipy.sparse.linalg** package, in particular the **spsolve()** method.
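A minimal sketch of this procedure using scipy.sparse is below. Because our transition matrix is row-stochastic, we apply the row-replacement trick to the transposed system so that *D* comes out as a column vector (a modeling choice of ours); the matrix values are those of the worked example from the previous section:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

def stationary_distribution(T):
    """Stationary distribution D of a row-stochastic transition matrix T.

    Forms the transposed system, replaces one row with all 1s, sets the
    matching right-hand-side entry to 1 (the normalization constraint
    sum(d_i) = 1), and solves the sparse linear system.
    """
    k = T.shape[0]
    A = T.T - np.eye(k)   # stationary condition, transposed
    A[-1, :] = 1.0        # replace one row with the normalization constraint
    rhs = np.zeros(k)
    rhs[-1] = 1.0
    return spsolve(csr_matrix(A), rhs)

# Transition matrix from the worked example (s_A = {0, 1}, s_B = {0.5, 0.5}, n = 0.01).
T = np.array([[0.495, 0.495, 0.005, 0.005],
              [0.005, 0.005, 0.495, 0.495],
              [0.495, 0.495, 0.005, 0.005],
              [0.005, 0.005, 0.495, 0.495]])
D = stationary_distribution(T)
print(D)  # this example is doubly stochastic, so D is uniform: 0.25 each
```

For a 4 × 4 matrix a dense solve would do; the sparse machinery pays off at the larger history lengths discussed above.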

#### A.5 Computing Fitness Payoffs

Once the task of computing the stationary distribution is accomplished, it is comparatively trivial to compute the fitness payoffs for the strategies. For each entry in the stationary distribution, we compute the payoff received in the final (most recent) round of the corresponding history. We then take an average of these payoffs, weighted by their respective probabilities.
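This weighted average can be sketched as follows, assuming a symmetric two-action game in which *M*[*i*, *j*] is the payoff for playing action *i* against action *j*; the payoff values and the state encoding (matching the earlier sketches) are illustrative assumptions of ours, not taken from the paper:

```python
import numpy as np

def average_payoffs(D, M):
    """Average per-round payoffs implied by a stationary distribution D.

    Histories h are encoded as 2*A0 + B0 (A's and B's most recent moves).
    Player A's payoff in the final round of history h is M[A0, B0]; for a
    symmetric game, player B's is M[B0, A0].
    """
    pay_a = pay_b = 0.0
    for h, p in enumerate(D):
        a0, b0 = (h >> 1) & 1, h & 1
        pay_a += p * M[a0, b0]
        pay_b += p * M[b0, a0]
    return pay_a, pay_b

# Illustrative prisoner's-dilemma-style payoffs (rows: own move, columns: opponent's).
M = np.array([[3, 0],
              [5, 1]])
pay_a, pay_b = average_payoffs(np.full(4, 0.25), M)
print(pay_a, pay_b)  # 2.25 2.25 under the uniform distribution
```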