## Abstract

Selection is an essential component of any evolutionary system and analysing this fundamental force in evolution can provide relevant insights into the evolutionary development of a population. The 1990s and early 2000s saw a substantial number of publications that investigated selection pressure through methods such as takeover time and Markov chain analysis. Over the last decade, however, interest in the analysis of selection in evolutionary computing has waned. The established methods for analysis of selection pressure provide little insight when selection is based on more than comparison-of-fitness values. This can, for instance, be the case in coevolutionary systems, when measures unrelated to fitness affect the selection process (e.g., niching) or in systems that lack a crisply defined objective function. This article proposes two metrics that holistically consider the statistics of the evolutionary process to quantify selection pressure in evolutionary systems and so can be applied where traditionally used methods fall short. The metrics are based on a statistical analysis of the relation between reproductive success and a quantifiable trait: one method builds on an estimate of the probability that this relation is random; the other uses a correlation measure. These metrics provide convenient tools to analyse selection pressure and so allow researchers to better understand this crucial component of evolutionary systems. Both metrics are straightforward to implement and can be used in post-hoc analyses as well as during the evolutionary process, for example, to inform parameter control mechanisms. A number of case studies and a critical analysis show that the proposed metrics provide relevant and reliable measures of selection pressure.

## 1 Introduction

It is hard to overestimate the importance of selection in any evolutionary system: it is an essential component of evolution, weeding out genetic material that is associated with maladaptive behaviour or suboptimal solutions. To understand the evolutionary process, it is therefore essential to understand the selection mechanism that drives it. In the 1990s and 2000s, a range of publications on this topic reflected the research community's recognition of its importance. Over the last few years, the number of publications considering selection pressure per se, in particular its quantification, has decreased, and contemporary evolutionary computing research pays scant explicit attention to this aspect of evolutionary systems: a search for evolutionary computing articles published since 2010 with “selection pressure” as an author keyword returned six papers in ACM's digital library and seven papers in IEEE Xplore. We argue that truly understanding the effect of developments in evolutionary computing (ranging from informed mutation strategies to methods to improve population diversity) requires the analysis of their impact on selection pressure. This article proposes two metrics to quantify selection pressure that are straightforward to implement for any evolutionary system and so provide a convenient tool for such analyses.

The 1990s saw a spate of publications where selection pressure in evolutionary algorithms was characterised in terms of *takeover time* (e.g., Goldberg and Deb, 1991; Bäck, 1994; Blickle and Thiele, 1995; Miller and Goldberg, 1996). This line of research established a standard technique for studying and comparing selection algorithms: variation operators such as mutation and recombination are turned off, leaving only unaltered reproduction. With selection the only active operator, the growth rate of copies of the best initial individual over time is monitored. The *takeover time* is the time it takes for the single best individual to conquer the whole population. A shorter takeover time thus means a higher selection pressure (Goldberg and Deb, 1991). Such analyses led, for instance, to the recognition that fitness proportionate selection leads to selection pressure that varies across problem instances or that tournament selection allows for adjustable selection pressure independent of the fitness function (Blickle and Thiele, 1995; Miller and Goldberg, 1996). More recently, Rudolph (2001) used a graph-based model to derive theoretical takeover times for selection methods in spatially structured populations.
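The protocol described above can be illustrated with a minimal simulation sketch. It assumes tournament selection with replacement, a generational population in which every individual has a unique fitness, and hypothetical names (`takeover_time`, `mean_takeover_time`) that do not come from the cited studies. Because selection is stochastic, the best individual can be lost before it takes over; in that case the sketch returns `None`.

```python
import random


def takeover_time(pop_size, tournament_size, rng, max_generations=1000):
    """Generations until copies of the initial best individual fill the population.

    Variation is switched off: children are unaltered copies of tournament winners.
    Returns None if the best individual is lost or max_generations is reached.
    """
    population = list(range(pop_size))  # fitness values 0..pop_size-1, one of each
    best = pop_size - 1
    for generation in range(1, max_generations + 1):
        # Each slot in the next generation is filled by the winner of one tournament.
        population = [
            max(rng.choice(population) for _ in range(tournament_size))
            for _ in range(pop_size)
        ]
        copies = population.count(best)
        if copies == 0:
            return None  # the best individual became extinct
        if copies == pop_size:
            return generation
    return None


def mean_takeover_time(pop_size, tournament_size, runs, seed=0):
    """Average takeover time over the runs in which takeover actually happened."""
    rng = random.Random(seed)
    times = [takeover_time(pop_size, tournament_size, rng) for _ in range(runs)]
    completed = [t for t in times if t is not None]
    return sum(completed) / len(completed) if completed else None


if __name__ == "__main__":
    for size in (2, 5, 10):
        print(size, mean_takeover_time(100, size, runs=30, seed=1))
```

Averaged over repeated runs, larger tournaments should give shorter takeover times, in line with the relation between takeover time and selection pressure noted above.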

Takeover time analysis does not provide a complete picture if there is a possibility that the best individual becomes extinct. This realisation led to the development of takeover probability analysis, where the probability that the best individual has taken over the population at some time is measured. Chakraborty et al. (1996) numerically estimated takeover probabilities on the basis of simulations, while Rudolph (2000) used Markov-chain analysis to calculate takeover probability theoretically.

Takeover analyses fall short when considering evolutionary systems where selection is modified by considerations unrelated to fitness per se, for example, to maintain diversity or to structure the population into species. This disqualifies these methods for analysis of selection pressure under commonly applied techniques such as niching as well as more recent developments such as novelty search as a secondary objective (Mouret and Doncieux, 2012). Multiobjective evolutionary algorithms such as NSGA strive to develop multiple Pareto optimal solutions (Deb and Gupta, 2005) and therefore pose similar problems for analyses based on takeover time or probability.

De Jong and Sarma (1995) based an analysis of selection pressure on a quantitative model of selection intensity in local neighbourhood selection which was derived from takeover time analysis. When comparing the performance of a neighbourhood-based selection scheme with that of a serial tournament selection scheme, they found that the neighbourhood-based selection scheme was consistently outperformed, even though the selection pressures were—according to their measure—equivalent. Further statistics-based analysis of the “emergent” selection pressure revealed that the variance in selection pressure was much higher in the distributed setup than it was in the serial case, and that this caused the difference in performance. This caused them to underscore the “importance of an analysis of the variance of selection schemes,” implicitly making the case for a statistical analysis of selection pressure and highlighting the importance of studying the effects of selection schemes in the context of the evolutionary system as a whole.

An alternative method of characterising selection pressure relies on Markov-chain analysis (Rudolph, 1998) to analyse convergence time. This method may accommodate analysis of algorithms with measures to maintain population diversity, but it does rely on the supposition that evolution is used towards optimising some goal and analysis is geared towards estimating the time it takes to reach that goal. This metric provides relevant quantification for evolutionary computing applications that seek to find a (near-) optimal solution to some problem. The problem can range from numerical optimisation, planning, and design to robot control. In evolutionary computing, this often serves well: the selection of individual solutions usually depends on some numerical measure of how well they solve the problem at hand. Selection pressure then is a direct consequence of comparing these numerical values to determine which individuals survive and procreate.

The focus on the time it takes to achieve some level of performance, however, makes it hard to apply such methods to evolutionary systems where selection pressure does not derive straightforwardly from a comparison of some (numeric) measure of performance. An obvious example of a system where evolution is not guided by a crisply defined objective is natural evolution; biologists, in fact, consider fitness not as an a priori *determinant*, but as an a posteriori *measure* of reproductive success. Similar considerations apply to many artificial evolutionary systems that are coevolutionary or objective-free, both common in artificial life research (e.g., Ray's (1991) Tierra system or Sims' (1994) artificial creatures evolved through competition). In cases such as these, the concepts of convergence to optimality and takeover time simply do not apply. Selection pressure is, however, very much present in such evolutionary systems: it is, after all, an essential ingredient of evolution.

Research into the evolutionary dynamics of such systems would benefit from a method that allows an objective characterisation or quantification of selection pressure even without any measurable objective. Although these kinds of systems do not base selection on an individual's performance on some objective function, there will be traits that influence reproductive success. Analysing the relationship between selection and such traits (e.g., the length of a program in Tierra) can provide relevant insights into the evolutionary process. Such analyses require a metric that can quantify selection pressure related to traits that are implicitly linked to reproductive success.

The loss of diversity metric introduced by Blickle and Thiele (1995) and further investigated by Motoki (2002) expresses the loss of genetic diversity during selection as the proportion of individuals that are not selected. This method quantifies selection pressure without reference to any performance metric: it can therefore be relevant when studying selection in evolutionary systems without explicit selection. However, because loss of diversity does not quantify the link between the individuals' traits and their reproductive success, it cannot shed light on the relation between selection pressure and any traits.
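As a rough illustration of why this metric is blind to traits, the sketch below computes loss of diversity from offspring counts alone. It assumes that "not selected" means "produced no offspring"; the function name `loss_of_diversity` is hypothetical.

```python
def loss_of_diversity(offspring_counts):
    """Fraction of the parent population that produced no offspring."""
    if not offspring_counts:
        raise ValueError("population must not be empty")
    unselected = sum(1 for count in offspring_counts if count == 0)
    return unselected / len(offspring_counts)


if __name__ == "__main__":
    # Only the offspring counts enter the calculation; trait values play no role,
    # so reassigning the same counts to different individuals changes nothing.
    counts = [0, 0, 1, 0, 2, 1, 0, 2, 2, 2]
    print(loss_of_diversity(counts))
    print(loss_of_diversity(list(reversed(counts))))
```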

Droop and Hickinbotham (2012) proposed a frequentist approach to analyse the benefit of genome components. On the premise that beneficial components will occur more frequently as evolution progresses, they compare the observed and expected frequency of components to quantify the reproductive advantage of the components. It may be possible to extend this method to phenotypic or behavioural traits so that their reproductive advantage can be assessed.

In addition to supporting analysis, an implementable metric of selection pressure has the important benefit that it enables informed control of selection at runtime. In a recent overview of research into parameter control in evolutionary algorithms, Karafotias et al. (2015) show that the research effort into control schemes that adapt the selection operators at runtime is a fraction of the research into adaptive variation operators. This disparity seems illogical when one considers that evolution is a process that emerges from the interplay between heredity, variation, and selection as famously formulated by Darwin (1859). Parameter control offers substantial benefits (Smith, 1998), and controlling as crucial an aspect as selection can be expected to deliver many of these benefits. We think that further research into the possibilities of control of selection operators on the basis of metrics as proposed in this article is worthwhile, but elaborating on this possibility, for example by developing a concrete adaptive control heuristic, is beyond the scope of this article.

This article proposes two straightforward methods to quantify selection pressure. Both holistically consider the evolutionary process and analyse the association between performance (or, more generally, some quantifiable trait) of each individual in a population and its number of offspring. The first metric we propose is based on a probability estimate; the second uses a correlation measure to quantify selection pressure. Early versions of these metrics were used to analyse selection pressure in an evolving population of robots by Haasdijk et al. (2014) and Haasdijk (2015).

## 2 Quantifying Selection Pressure

This article investigates two methods of quantifying selection pressure that characterise the relation between number of offspring and some quantifiable trait(s). When researching evolutionary algorithms as optimisers with a crisply defined fitness function—the traditional evolutionary computing mindset—an individual's fitness is the obvious candidate for this trait. For multiobjective algorithms, the individual fitness components are equally obvious choices. In cases where selection is not directly related to an objective, for instance in artificial life systems without an objective function, it can be less obvious which traits to analyse. Section 4.2 shows an example of an analysis of such a system where different aspects of robot behaviour determine reproductive success.

For the sake of simplicity, we use the term *trait* for both implicit and explicit fitness measures. Also, we describe the metrics in terms of increased fitness implying more offspring. For minimisation problems where lower fitness indicates higher quality, this can be trivially converted, as one of the case studies will show.

The metrics we propose quantify selection pressure for a population. The exact definition of a population for the purposes of these metrics depends on the particularities of the evolutionary system. For generational algorithms, the set of all individuals in a particular generation is an obvious choice. When a new generation has been created, the strength of selection pressure for that generation's parents can be calculated and logged. Plotting the selection pressure for subsequent generations allows the experimenter to analyse the development of selection pressure over the run of the evolutionary algorithm. When individuals are not replaced per generation, for example, in steady state systems or in many artificial life implementations, a population can be defined as all individuals that were considered in a particular time interval. One can also define the population as all the individuals considered during the whole course of an evolutionary run, but this will obscure any trends over the course of evolution when selection pressure is not constant.

### 2.1 Probability Based: Fisher's Exact Test

The first metric considers the selection process from the viewpoint of probability. It is based on the premise that an increasing level of certainty that the relation between trait and fecundity is not random indicates a higher selection pressure. If there were no selection pressure related to the trait, the relationship between trait and fecundity would be random; conversely, if an individual's chances of generating offspring depend on the trait, the relationship is systematic.

Fisher's exact test operates on a 2×2 contingency table whose columns separate individuals with a trait value at or below the median from those above it, and whose rows do the same for number of offspring. The test sums the likelihoods of the observed case and more extreme cases (cases with a larger proportion of observations in the low-trait/low-offspring and high-trait/high-offspring cells). It constructs these more extreme cases by incrementing both of these cells by 1 while keeping the totals intact by decrementing the other two cells. When either of those two cells reaches 0, no more extreme cases can be constructed. The sum of these probabilities, the p-value, is the probability of finding an association at least this strong if there were in fact no association between the columns (above or below median trait) and the rows (above or below median offspring). A high p-value indicates that the observed data are compatible with the absence of an association and therefore that the relation between offspring and performance is weak, suggesting a low selection pressure. A low p-value, on the other hand, indicates a strong relation, evidence of high selection pressure. The metric employs the one-tailed version of Fisher's exact test because it quantifies a known or at least suspected association between trait and fecundity. The threshold for the cells in the contingency table is set so that the median value is included in the “low-value” sets: cases with median trait value count towards the two low-trait cells and those with median number of children count towards the two low-offspring cells. This increases strictness for the association test and avoids empty sets when high selection pressure leads to 0 as the median number of offspring.

This use of Fisher's exact test is somewhat different from its typical use, which is to establish whether a significant relationship exists by comparing the p-value to some threshold (usually 5%). In this metric, we do not compare the p-value with a predefined threshold but interpret it as a measure of the strength of the relationship between trait and number of offspring. P-values per se give no indication of the effect size, but in populations of the same size larger effects will result in lower p-values (Rosenthal and Rosnow, 1984). Thus, under the assumption of fixed population size, it is valid to interpret the metric as an indication of the strength of selection pressure.
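The procedure described above can be sketched in a few lines of plain Python. The cell names `a`, `b`, `c`, `d` and the function names are assumptions made for this illustration. The sketch returns the raw one-tailed p-value only; any further rescaling applied in the metric itself is not reproduced here, so its results need not coincide with values quoted elsewhere in the text.

```python
from math import comb
from statistics import median


def contingency_table(traits, offspring):
    """Median-split 2x2 table; median values count towards the low sets.

    Returns (a, b, c, d) with
      a: low trait, low offspring     b: high trait, low offspring
      c: low trait, high offspring    d: high trait, high offspring
    """
    trait_median, offspring_median = median(traits), median(offspring)
    a = b = c = d = 0
    for trait, children in zip(traits, offspring):
        high_trait = trait > trait_median
        high_offspring = children > offspring_median
        if high_offspring:
            if high_trait:
                d += 1
            else:
                c += 1
        elif high_trait:
            b += 1
        else:
            a += 1
    return a, b, c, d


def fisher_one_tailed(a, b, c, d):
    """Sum the probabilities of the observed table and all more extreme tables."""
    low_offspring = a + b          # row total, fixed across all tables
    low_trait, high_trait = a + c, b + d   # column totals, fixed as well
    denominator = comb(a + b + c + d, low_offspring)
    p_value = 0.0
    # More extreme tables: a and d grow by one while b and c shrink by one.
    while b >= 0 and c >= 0:
        p_value += comb(low_trait, a) * comb(high_trait, b) / denominator
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p_value


if __name__ == "__main__":
    # Example population of Table 1.
    traits = [0, 1, 1, 2, 3, 4, 5, 5, 7, 9]
    offspring = [0, 0, 1, 0, 2, 1, 0, 2, 2, 2]
    table = contingency_table(traits, offspring)
    print(table)  # (4, 2, 1, 3), the counts of Table 1(b)
    print(fisher_one_tailed(*table))
```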

#### 2.1.1. Example Calculation

To clarify the calculation of the probability-based metric, consider the example of a small population with trait and number of offspring as listed in Table 1.

Table 1 (a): Individuals with trait and number of offspring.

| Individual | Trait | Offspring |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 1 | 0 |
| 3 | 1 | 1 |
| 4 | 2 | 0 |
| 5 | 3 | 2 |
| 6 | 4 | 1 |
| 7 | 5 | 0 |
| 8 | 5 | 2 |
| 9 | 7 | 2 |
| 10 | 9 | 2 |
| Median | 3.5 | 1 |

Table 1 (b): Contingency table.

| | Trait ≤ median | Trait > median |
|---|---|---|
| Offspring ≤ median | 4 | 2 |
| Offspring > median | 1 | 3 |


### 2.2 Correlation Based: Kendall's τ-b

The second metric measures the correlation between trait and offspring. To be applicable for arbitrary evolutionary systems, a metric of selection pressure must not make any assumptions about the distributions of trait and offspring or about the shape (e.g., linearity) of the relation between them. Also, there can be large numbers of cases with the same trait or the same number of offspring in the population, and the test must be able to handle these ties. Kendall's rank correlation coefficient with ties, Kendall's τ-b for short, meets these requirements.

Kendall's τ-b considers every possible pairing of two individuals $i$ and $j$ with trait values $t_i$, $t_j$ and numbers of offspring $o_i$, $o_j$. Such a pairing is *concordant* if ($t_i < t_j$ and $o_i < o_j$) or ($t_i > t_j$ and $o_i > o_j$), *discordant* if ($t_i < t_j$ and $o_i > o_j$) or if ($t_i > t_j$ and $o_i < o_j$), and a *tie* in $t$, respectively $o$, if ($t_i = t_j$) or if ($o_i = o_j$). Let $n_c$ denote the number of concordant pairs, $n_d$ the number of discordant pairs, and $n_t$, respectively $n_o$, the number of possible pairings with a tie in $t$, respectively $o$. Then:

$$\tau_b = \frac{n_c - n_d}{\sqrt{(n_0 - n_t)(n_0 - n_o)}}, \qquad n_0 = \frac{n(n-1)}{2},$$

where $n$ is the population size. The denominator calculates the number of possible ways of selecting distinct pairs, corrected for the number of ties. Kendall's τ-b returns a value in the interval $[-1, 1]$ where $1$ indicates a strong positive association and $-1$ a strong negative association.

Thus, an extreme value for the metric indicates a strong correlation between trait and offspring and therefore high selection pressure. A negative value indicates that lower trait value correlates with more offspring, indicating a minimisation problem.
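A direct, quadratic-time sketch of this calculation is given below. The function names and the symbols `n_c`, `n_d`, `n_t`, `n_o` are chosen for this illustration, and pairs tied in both trait and offspring are assumed to count towards both tie totals, as in the usual definition of τ-b.

```python
from itertools import combinations
from math import sqrt


def pair_counts(traits, offspring):
    """Count concordant, discordant and tied pairings over all distinct pairs."""
    n_c = n_d = n_t = n_o = 0
    for (t1, o1), (t2, o2) in combinations(zip(traits, offspring), 2):
        if t1 == t2:
            n_t += 1  # tie in trait
        if o1 == o2:
            n_o += 1  # tie in offspring (a joint tie is counted in both totals)
        if t1 != t2 and o1 != o2:
            if (t1 < t2) == (o1 < o2):
                n_c += 1
            else:
                n_d += 1
    return n_c, n_d, n_t, n_o


def kendall_tau_b(traits, offspring):
    """Kendall's tau-b; returns NaN when every pair is tied in one variable."""
    n_c, n_d, n_t, n_o = pair_counts(traits, offspring)
    n = len(traits)
    n_0 = n * (n - 1) // 2  # number of distinct pairs
    denominator = sqrt((n_0 - n_t) * (n_0 - n_o))
    if denominator == 0:
        return float("nan")
    return (n_c - n_d) / denominator


if __name__ == "__main__":
    # Example population of Table 1.
    traits = [0, 1, 1, 2, 3, 4, 5, 5, 7, 9]
    offspring = [0, 0, 1, 0, 2, 1, 0, 2, 2, 2]
    print(pair_counts(traits, offspring))    # (25, 5, 2, 13), the totals of Table 2
    print(kendall_tau_b(traits, offspring))  # about 0.54
```

For large populations a library routine would be faster; the explicit loop is used here only because it mirrors the definition.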

#### 2.2.1 Example Calculation

Revisiting the example from Table 1, we calculate the numbers of discordant, concordant, and tied pairings for each individual as shown in Table 2. Note that each possible pairing is considered only once: for instance, the concordant pairing of individuals 1 and 10 is counted only in the row for individual 1.

Table 2: Discordant, concordant, and tied pairings for the example population.

| Individual | Trait | Offspring | Discordant | Concordant | Ties-Trait | Ties-Offspring |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 6 | 0 | 3 |
| 2 | 1 | 0 | 0 | 5 | 1 | 2 |
| 3 | 1 | 1 | 2 | 4 | 0 | 1 |
| 4 | 2 | 0 | 0 | 5 | 0 | 1 |
| 5 | 3 | 2 | 2 | 0 | 0 | 3 |
| 6 | 4 | 1 | 1 | 3 | 0 | 0 |
| 7 | 5 | 0 | 0 | 2 | 1 | 0 |
| 8 | 5 | 2 | 0 | 0 | 0 | 2 |
| 9 | 7 | 2 | 0 | 0 | 0 | 1 |
| 10 | 9 | 2 | 0 | 0 | 0 | 0 |
| Total | | | 5 | 25 | 2 | 13 |


## 3 Analysis of Behaviour

To investigate the behaviour of the metrics for varying ranges of selection pressure, we perform an experiment on artificially generated data with varying levels of randomness in the relation between trait and offspring. The artificial data sets simulate populations of 100 individuals, with the best having a trait of 100, the next best of 99, and so on. In many evolutionary systems, and in particular in those in evolutionary computing, a substantial proportion of the population is tied in their number of offspring: many individuals have 0 or 1 offspring, for instance. To reflect this large proportion of ties in the generated data set, the top 20% have 4 offspring, the next 20% have 3, and so on. This represents the results of a deterministic selection scheme with high selection pressure (in reality, such a scheme would rapidly lead to unmanageable population sizes, but that is irrelevant to the current analysis).

To simulate variations in selection pressure, the offspring count is changed to a random value between 0 and 4 for a fraction of the individuals. For example, a fraction of 0.1 means that a random number of offspring is entered for approximately 10 randomly selected individuals. The remaining 90 individuals are entered as for the aligned case. Thus, increasing the randomised fraction simulates the effects of lowering selection pressure. To assess the metrics' sensitivity to selection pressure, we varied this fraction from 0 to 1. Fifty data sets were generated for each value of the fraction, and both metrics were calculated on each set. Figure 1 shows the median value and interquartile range of both metrics over the sets against the randomised fraction.
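The data generation described in this section can be sketched as follows. The sketch assumes that each individual is perturbed independently with a probability equal to the randomised fraction (which gives "approximately" the stated number of perturbed individuals) and that random offspring counts are drawn uniformly; the function name `generate_population` and its parameters are hypothetical.

```python
import random


def generate_population(fraction, rng, size=100, max_offspring=4):
    """Artificial population: traits size..1, offspring aligned with trait rank.

    With probability `fraction`, an individual's offspring count is replaced
    by a uniformly random value between 0 and max_offspring.
    """
    traits = list(range(size, 0, -1))          # best first: 100, 99, ..., 1
    group_size = size // (max_offspring + 1)   # 20 individuals per offspring level
    offspring = [
        max_offspring - min(index // group_size, max_offspring)
        for index in range(size)
    ]
    for index in range(size):
        if rng.random() < fraction:
            offspring[index] = rng.randint(0, max_offspring)
    return traits, offspring


if __name__ == "__main__":
    rng = random.Random(0)
    traits, offspring = generate_population(0.1, rng)
    changed = sum(
        1 for index, count in enumerate(offspring) if count != 4 - index // 20
    )
    # At most about 10 counts differ from the aligned case; a random draw
    # can coincide with the original value.
    print(changed)
```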

Both metrics show a smooth monotonic trend, supporting the conclusion that both provide a relevant and reliable metric of selection pressure. The plot shows a linear trend for the correlation-based metric, which implies that it is equally sensitive across a wide range of selection pressure. The probability-based metric, on the other hand, is clearly nonlinear and seems more sensitive to differences when selection pressure is high. Thus, the probability-based metric may be preferable to detect differences when selection pressure is comparatively high (as is very often the case in evolutionary algorithms) and to detect trends in the development of selection pressure over time.

Figure 1 shows that the interquartile ranges of both metrics are minimal when the randomised fraction is small and increase with it. The probability-based metric's interquartile range decreases again for very high fractions, but that of the correlation-based metric does not. The low variability for small fractions is simply the consequence of the minimal number of perturbations in the data: there is little change in the cell counts of the contingency table (when calculating the probability-based metric) or in the numbers of concordant and discordant pairs (when calculating the correlation-based metric). When the fraction becomes very high, the effect of perturbations on the cell counts decreases again because these have an effect only when an individual's record crosses the median. Consider again the example calculation in Table 1: changing the offspring count for individual 2 from 0 to 1 would have no effect on the calculation of the probability-based metric because that individual still counts towards the low-trait, low-offspring cell, but the pair counts do change, causing a different value for the correlation-based metric. The following section takes a closer look at the metrics' variability.

### 3.1 Variability

De Jong and Sarma (1995) noted that it is important to analyse the variance of selection schemes. Therefore, it is important that a metric of selection pressure provides consistent results for similar levels of selection pressure. To investigate the variability of both metrics, consider the 50 populations that were generated with a randomised fraction of 0.5. The simulated selection pressure is more or less the same in each of these samples: on average, 50% of the offspring count is aligned with the trait and 50% is randomly assigned. Random effects will cause a difference in the emergent selection pressure, and the measures should reflect this difference, but fluctuations in the metrics must reflect those in the data and not introduce additional noise.

Figure 2a shows the distribution of the values of both metrics for the data sets. The interquartile ranges for the probability-based and the correlation-based metric run from 3.79 to 6.40 and 0.391 to 0.486, respectively. If the variation in the metrics is due to the fluctuations in the data, the metrics should agree on the level of selection pressure for the samples. To assess the level of agreement, we calculated the correlation between the results of the two metrics using Pearson's r, which reports a correlation of 0.8257. The scatterplot in Figure 2b summarises the results, along with a fitted trend line.

The strong correlation indicates that the two metrics mostly agree on the level of selection pressure, although in some cases the variations in the data affect the metrics differently. This suggests that at least a substantial part of the variation in the metrics is the result of variation in the data, not of instabilities in the metrics. Thus, it appears that variations in the values of both metrics reliably indicate variations in selection pressure of the investigated evolutionary system.

### 3.2 Influence of Population Size

As stated earlier, the probability-based metric is a valid indication of the strength of selection pressure *under the assumption of fixed population size*. Probability-based metrics are by nature susceptible to varying levels of sample size: larger populations imply more evidence and therefore increased certainty about the relation between trait and fecundity, even if the effect size is the same.

To illustrate this effect, we generated 10 sample populations of size 100 with the same randomised fraction as before and calculated both metrics for each population. We also calculated both metrics for the union of these 10 populations as if it were a single population of 1000 individuals. Metrics that are not susceptible to effects of population size will return similar values for the 10 smaller populations and the large combined population. Figure 3 shows the results of this comparison.

The correlation-based metric returns comparable (although not entirely equal) values for similar sets of individuals regardless of size, while the probability-based metric shows a much elevated value for larger populations. This can be easily understood from Formula 1: multiplying the four cell counts in, for example, the contingency table shown in Table 1 by 2 (which has a similar effect as combining the populations of two runs of an evolutionary algorithm) would result in a p-value of 0.01 instead of 0.24.

Thus, the probability-based metric is unsuitable when comparing selection pressure between evolutionary algorithm runs with different population sizes or for systems where the population size is not constant,^{1} and in such cases, the correlation-based metric would be preferable. For comparisons where the population size is (almost) constant, such as those in the case studies presented here, the probability-based metric does provide an appropriate metric to quantify selection pressure. An interesting consequence of the probability-based metric's sensitivity to population size is that differences and trends in selection pressure can be highlighted in larger or combined populations, even if these are not apparent for smaller populations, as we will see in the next section.

## 4 Case Studies

### 4.1 Genetic Algorithm

This is a minimisation problem, so lower trait values indicate better performance. Consequently, the correlation-based metric will show a negative correlation between trait and fecundity. The settings for the algorithm, listed in Table 3, are based on a preliminary parameter sweep. Code for these experiments (including the settings used here) is available at https://github.com/ci-group/SelectionPressure.

Table 3: Settings for the genetic algorithm.

| Parameter | Setting |
|---|---|
| crossover rate | 1 |
| mutation rate | 0.25 |
| mutation operator | Gaussian perturbation |
| mutation step size | 1 |
| population size | 200 |
| elitism | 1 |
| initial temperature | 1000 |
| temperature decrement | 0.02 |


When a new generation is created, the trait and number of offspring for each individual in the parent population are logged. Both metrics are then calculated to accommodate post-hoc analysis of the selection pressure over the course of the runs.

The algorithm implements tournament selection: each tournament consists of a number of individuals drawn randomly from the population and the best individual is then selected as parent for an individual in the next generation. To generate each individual in the new population, two parents are selected, after which recombination and mutation are applied. Figure 4 shows the selection pressure over time for tournament sizes of 2, 5, and 10 individuals.
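The logging step for this selection scheme can be sketched as below. The sketch is not taken from the published code; it assumes tournaments are drawn with replacement, that a child counts as offspring of both of its parents, and it leaves out recombination, mutation, and elitism. The function name `offspring_counts_tournament` is hypothetical.

```python
import random


def offspring_counts_tournament(fitness, tournament_size, rng, minimise=True):
    """Offspring count per parent after filling one new generation.

    Each child needs two parents; each parent is the winner of one tournament.
    """
    pop_size = len(fitness)
    counts = [0] * pop_size
    pick_winner = min if minimise else max
    for _ in range(pop_size):      # one child per slot in the new population
        for _ in range(2):         # two parents per child
            contestants = [rng.randrange(pop_size) for _ in range(tournament_size)]
            winner = pick_winner(contestants, key=lambda index: fitness[index])
            counts[winner] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    fitness = [rng.uniform(0.0, 100.0) for _ in range(200)]
    counts = offspring_counts_tournament(fitness, tournament_size=5, rng=rng)
    # The (fitness, counts) pairs are the per-generation log entries from which
    # either metric can be calculated afterwards.
    print(sum(counts), max(counts))
```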

It is well known that selection pressure increases with tournament size, and this is confirmed by the consistently more extreme values of the correlation-based metric for larger tournament sizes, although changing the tournament size from 5 to 10 has a much smaller (if any) effect than changing it from 2 to 5. The probability-based metric does increase when tournament size changes from 2 to 5, but it shows very similar values for tournament sizes 5 and 10. The plots also show that for this scheme, selection pressure is more or less constant for any particular tournament size, in line with the findings reported by Blickle and Thiele (1995).

To analyse the results in more detail, we consider three populations (one for each tournament size setting) with both metrics close to their respective median values at generation 10000. Table 4 shows the calculations for the correlation-based metric in these populations. Because the case study concerns a minimisation objective, the number of discordant pairs is greater than the number of concordant pairs, resulting in a negative correlation as mentioned earlier. Increasing selection pressure implies lower values for the number of concordant pairs and/or higher values for the number of discordant pairs. Indeed, the number of concordant pairs decreases substantially when the tournament size increases from 2 to 5: there are fewer pairings with lower fitness and higher number of offspring. The increase in the number of discordant pairs is less marked. Both counts remain more or less stable when tournament size increases further to 10.

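Table 4: Calculation of Kendall's τ-b and population diversity for three tournament sizes.

| Tournament size | 2 | 5 | 10 |
|---|---|---|---|
| **τ-b calculation** | | | |
| Concordant pairs | 5350 | 3273 | 3131 |
| Discordant pairs | 10133 | 10711 | 10454 |
| Ties in offspring | 4324 | 3529 | 3694 |
| Ties in trait | 125 | 2853 | 3160 |
| τ-b | −0.27 | −0.45 | −0.44 |
| **Population diversity** | | | |
| | 498.98 | 430.59 | 391.95 |
| Copies of the best individual | 4 | 76 | 80 |
| Tournaments won by these copies | 15 | 246 | 275 |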


A striking result of increasing tournament size is that diversity decreases, as evidenced by the data in Table 4. The table lists the number of copies of the best individual in the population; for example, the population with binary tournament contains 4 exact copies of the individual with optimum fitness. This number increases to almost 40% of the population for tournament sizes 5 and 10. Selection among these individuals is essentially random. The table also lists the count of tournaments won by these individuals combined. Because of the larger tournament size, the copies are very likely to dominate the tournaments: they win approximately two-thirds of the tournaments for tournament sizes 5 and 10. Thus, the numbers of concordant and discordant pairs are for a large part determined by random ordering within the top individuals and barely change for higher tournament sizes. Consequently, there is little difference in the correlation between fitness and offspring because a substantial, identical part of the population dominates the remainder.

As noted in Section 3.2, larger populations increase the certainty of Fisher's Exact Test. This can be exploited to emphasise differences in selection pressure that are hard to detect in small populations. In this case, we can emphasise the small difference in selection pressure between tournament sizes 5 and 10 by analysing combined populations as illustrated by Figure 5.

Taking data from the 10000th generation of ten runs, we calculated the selection pressure for each of the ten populations individually as well as for the populations combined. The results are shown in blue and red, respectively. The grey error bars indicate the interquartile ranges for each metric over the ten populations. The plot compares the median selection pressure for the two metrics over the ten populations of size 200 to the result when these populations are combined into a single set of 2000 individuals. The difference in the probability-based values for tournament sizes 5 and 10 is much more pronounced in the combined population, so small differences in selection pressure can indeed be detected by analysing larger or combined populations with the probability-based metric.

To further illustrate the metrics' usefulness, we consider a situation where selection pressure is not constant over time as it is with roulette wheel selection. To this end, we modified the algorithm to include Boltzmann selection (Goldberg, 1990).

Figure 6 shows the selection pressure over time for standard tournament selection, Boltzmann selection (both with tournament size 5) and for fitness-proportionate (roulette wheel) selection. The fitnesses for roulette wheel selection are scaled so that the individual fitnesses add up to 1. The results clearly show how the selection pressure for Boltzmann selection rises as the temperature decreases, following the exponential trend dictated by the Boltzmann equation. Towards the end of the runs, the temperature reaches 0 and the selection pressure is the same as for deterministic tournament selection. The results also show that the selection pressure for fitness-proportionate selection is substantially lower than for the alternative selection operators. Fitness-proportionate selection is associated with decreasing selection pressure as evolution focusses on highly fit solutions. In this case, this is counteracted by fitness scaling and the selection pressure remains stable.
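To make the role of the temperature concrete, the following sketch shows one common way to combine a tournament with Boltzmann weighting. It is an assumption, not necessarily the variant used in these experiments: contestants win with probability proportional to exp(fitness / temperature), and the function name, the fitness values and the three temperatures are hypothetical.

```python
import math
import random


def boltzmann_tournament(fitnesses, temperature, size=5, rng=random):
    """Pick one parent index from a random tournament of `size` contestants.

    Contestants win with probability proportional to exp(fitness / temperature);
    at temperature 0 the fittest contestant always wins.
    """
    contestants = rng.sample(range(len(fitnesses)), size)
    best = max(contestants, key=lambda i: fitnesses[i])
    if temperature <= 0:
        return best
    # Subtract the best fitness before exponentiating to avoid overflow.
    weights = [math.exp((fitnesses[i] - fitnesses[best]) / temperature)
               for i in contestants]
    return rng.choices(contestants, weights=weights, k=1)[0]


rng = random.Random(0)
fitnesses = [float(f) for f in range(100)]  # made-up fitness values
mean_winner = {}
for temperature in (50.0, 5.0, 0.0):
    winners = [boltzmann_tournament(fitnesses, temperature, rng=rng)
               for _ in range(2000)]
    mean_winner[temperature] = sum(fitnesses[i] for i in winners) / len(winners)
    print(f"T = {temperature:4.1f}: mean winner fitness = {mean_winner[temperature]:.1f}")
```

At high temperature the weights are nearly uniform and the winner is close to a random contestant; as the temperature approaches 0 the choice becomes deterministic, which matches the convergence with deterministic tournament selection described above.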

As noted earlier, the nonlinear nature of the probability-based metric can emphasise trends in selection pressure. This can be seen in Figure 6, where the exponential trend for Boltzmann selection is more pronounced in panel 6a than in 6b.

These analyses confirm established relationships between selection schemes, their settings, and the resulting selection pressure. In themselves, they therefore do not yield any new insights, but they do show that both metrics are appropriate, and that with metrics such as these it is possible to trace the development of selection pressure as evolution progresses. Tracing this development is particularly interesting when analysing systems where selection depends on more than merely comparative performance. The following section provides an example of such a case.

### 4.2 MONEE

The second case study considers an instance of MONEE (Haasdijk et al., 2014) where selection occurs at two levels. The first level is environment-driven, with implicit selection, defined by the mechanics of procreation in the environment only. The second level adds explicit selection for task performance. The environmental level can be seen as survivor selection and the task level as parent selection.

In these experiments, a collection of simulated robots is placed in an arena with obstacles and pucks scattered across it. The robots are equipped with sensors to detect obstacles and pucks. Each robot is controlled by a neural network, and the weights of this neural network form the robot's genotype. A robot runs a controller for a fixed amount of time (2000 time steps), after which it switches to a quiescent “egg” state for 200 time steps during which it collects genomes from passing active controllers.

#### 4.2.1 Environmental Selection

The first level of selection is entirely environment driven: whenever a robot comes within a small distance of the egg, it transmits its genome, and the egg stores the genome in an internal cache. When the egg phase terminates, the robot will select one of the genomes in its cache and activate a controller with a mutated copy of the selected genome. The robot then clears its cache and the cycle repeats. Bredeche et al. (2012) showed that, if the selection of an individual from the cache is random, this system promotes movement: robots that move create more opportunities to pass their genome to other robots than robots that remain stationary. All robots change controllers after the same fixed amount of time, but they do so asynchronously (one could say that they do not have the same “age”). Because the robots change controllers asynchronously, the definition of a population for the calculation of the metrics is less straightforward than for a generational algorithm. In this case, we define a population as all the controllers that ran to completion within a time frame of 5000 ticks.
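The windowed definition of a population can be sketched as follows. The record layout and field names are hypothetical, and the sketch assumes non-overlapping windows keyed on the tick at which a controller finished; whether the article uses fixed or sliding windows is not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ControllerRecord:
    genome_id: int
    end_tick: int     # tick at which the controller completed its active phase
    distance: float   # trait logged for post-hoc analysis
    offspring: int    # number of controllers later activated from this genome


def populations_by_window(records, window=5000):
    """Group completed controllers into pseudo-populations per time window."""
    windows = defaultdict(list)
    for rec in records:
        windows[rec.end_tick // window].append(rec)
    # Windows in which no controller completed are simply absent.
    return [windows[key] for key in sorted(windows)]


# Made-up log entries for four controllers.
records = [
    ControllerRecord(1, 2000, 310.0, 2),
    ControllerRecord(2, 4999, 120.5, 0),
    ControllerRecord(3, 5000, 450.0, 1),
    ControllerRecord(4, 12000, 80.0, 0),
]
populations = populations_by_window(records)
print([len(p) for p in populations])
```

Each resulting group can then be analysed with the metrics in the same way as one generation of a generational algorithm.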

Note that there is no objective function that drives evolution: whether a genome spreads through the population is only an effect of the environment and the robot behaviour determined by that genome. In particular, movement is not explicitly selected for, but the rules of the environment are such that movement provides a reproductive advantage. Figure 7 shows the development of the resulting selection pressure over time. Both metrics show that the selection pressure starts low and then rises to a maximum at approximately 250000 time steps. Selection pressure then decreases as rapid movement becomes common in the population. The pressure that results from the implicit selection defined by the environment is much lower than that of the explicit selection in the previous case study. Still, there is, as expected, a clear relationship between movement and reproductive success. Note that the variance in selection pressure is much higher in this system than it is for a traditional evolutionary algorithm (compare, for instance, with Fig. 6). This increased volatility results from the randomness caused by local selection: robots must “meet” to be able to transmit their genomes, so reproductive success depends not only on an individual's behaviour, but also on the behaviour of other robots and the circumstances a robot is in (e.g., a crowded or relatively sparse part of the arena).

This analysis provides a good example of selection pressure in an evolutionary system without an objective function. There is an expected relation between a behavioural trait (movement) and reproductive success, but the trait is not explicitly selected for. It is, in fact, not even measured by the individuals except for logging to enable post-hoc analysis. A similar measure would be the number of instructions required to generate offspring in ALife systems such as Tierra (Ray, 1991) or Avida (Adami and Brown, 1994).

#### 4.2.2 Task-Based Selection

MONEE extends this objective-free scheme by adding task-based selection. The task for the robots is to pick up pucks. Robots pick up pucks by simply driving over them, and a replacement puck is automatically placed in a random location so that the supply of pucks is limitless. When robots transmit their genomes, they annotate the genome with the number of pucks they have picked up so far running that genome's controller. Now, when a robot selects a new controller from the cache of received genomes, it does so using a binary tournament with the number of pucks picked up as trait, adding a second level of selection to the system.
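The two ways of choosing a genome from the cache can be contrasted in a short sketch. The function name and cache layout are hypothetical, and drawing the two tournament contestants without replacement is an assumption; the article only states that a binary tournament on the number of pucks is used.

```python
import random
from collections import Counter


def select_from_cache(cache, task_based, rng=random):
    """Choose the genome that seeds the next controller.

    `cache` holds (genome, pucks) pairs received during the egg phase.
    """
    if not cache:
        return None  # nothing received; how this case is handled is not covered here
    if not task_based:
        return rng.choice(cache)[0]  # environmental selection only: random pick
    # Binary tournament on the annotated puck count.
    contestants = rng.sample(cache, min(2, len(cache)))
    return max(contestants, key=lambda entry: entry[1])[0]


rng = random.Random(1)
cache = [("g1", 0), ("g2", 3), ("g3", 7)]  # made-up genomes and puck counts
wins = Counter(select_from_cache(cache, task_based=True, rng=rng) for _ in range(3000))
print(dict(wins))
```

With three distinct puck counts, the genome with the fewest pucks can never win a tournament in this sketch, whereas under purely environmental selection every cached genome is equally likely to be chosen.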

With two forces that drive selection, there are now two (behavioural) traits that relate to selection pressure: one is the distance a robot covered using a controller (selected for indirectly through the environment); the other is the number of pucks a robot collected running that controller. Thus, we now measure selection pressure twice: once by analysing the relation between fecundity and distance covered, and once by analysing the relation between fecundity and the number of pucks collected. Note that only the number of pucks collected is an explicitly defined fitness. Quantifying selection pressure can help understand how these two selection pressures combine.
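Measuring selection pressure once per trait amounts to running the same analysis on two columns of the log. The sketch below uses Kendall's tau-b as the correlation measure, which is an assumption (the measure itself is defined earlier in the article and not restated in this section); the data are made up.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation between two equally long sequences."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from the denominator
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical per-controller logs from one time window.
distance = [120, 340, 560, 410, 90, 610]
pucks = [0, 2, 5, 3, 0, 4]
offspring = [0, 1, 3, 2, 0, 3]

tau_distance = kendall_tau_b(distance, offspring)
tau_pucks = kendall_tau_b(pucks, offspring)
print(f"fecundity vs distance: {tau_distance:.2f}")
print(f"fecundity vs pucks:    {tau_pucks:.2f}")
```

A rank correlation of this kind is signed, so a trait that is minimised would yield negative values, and it handles the many ties that occur when most controllers have zero or one offspring.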

The results of this analysis in Figure 8 show similar, but more pronounced, dynamics in the amount of selection pressure than before: initially, selection pressure is low, but it then rapidly builds up as both movement and picking up pucks become more common, to finally settle at an intermediate level when appropriate behaviour becomes prevalent. Also note that the selection pressure now is substantially higher than it was with only implicit selection through the environment in Figure 7. The selection pressure related to robot movement also increases compared to the implicit selection runs. This is a result of the correlation between movement and picking up pucks: robots must move to pick up pucks, not only to spread their genomes. The trends in developing selection pressure are more pronounced for the probability-based metric than for the correlation-based metric.

In systems such as this, selection pressure is to a large extent determined by the environment: it is the environment that determines how easy it is to transmit genomes, how long a lifetime is, how easy it is to find a puck. Quantifying selection pressure can help understand the implications of setting particular environmental parameters. As an example, consider the ease with which robots can transmit their genomes successfully; in this case, this can be adjusted by changing the range at which two robots can communicate. A larger communication range makes it easier for the robots to successfully transmit genomes to each other, effectively increasing the number of genomes in each robot's cache. Similar to increasing the tournament size in regular evolutionary algorithms, this can be expected to increase selection pressure: genomes now must compete with more of their peers to win the chance for procreation. On the other hand, movement becomes less relevant to reproductive success as it becomes easier to transmit genomes, and therefore selection pressure (at least with respect to distance covered) should decrease. An analysis of effective selection pressure can help determine how these two forces combine.

Figure 9 shows the selection pressure for different communication ranges. For this illustrative example, only one of the two metrics is shown. Each of the four plots shows the selection pressure over time for a particular communication distance setting. Selection pressure increases substantially when changing the communication range from 10 to 20, but the effect of increasing it from 20 to 30 is much less pronounced, while there is no evident change when further increasing it to 40. Note that the trend of initially increasing and then relaxing selection pressure persists. The analysis shows that the results of increasing the number of successful transmissions outweigh the effects of making movement less relevant to successful transmission of genomes.

The analyses in this case study concern a relatively straightforward setup where the environmental and task-based selection pressures both encourage movement and do not oppose each other. However, quantifying selection pressure can yield relevant insights in more complex settings, for example, when the behaviour required by the task is at odds with that required by the environment or when multiple tasks play a role. Haasdijk (2015) shows an example of analysis of selection pressures when environment and task require mutually exclusive behaviour.

## 5 Conclusion

The number of publications regarding the analysis of selection pressure in evolutionary algorithms has declined after a spate of papers in the 1990s, but understanding the selection process remains crucial to understanding the evolutionary process as a whole. This article proposed two metrics to quantify selection pressure that holistically consider the effects of selection. The metrics do not require a model of the selection process and so are also applicable when the forces on selection are complex, unknown or poorly understood. Thus, these metrics can provide insight where traditionally considered methods such as takeover time and Markov chain analysis fall short, including evolutionary processes where selection is more intricate than comparing fitness values or may even eschew explicit comparisons. Both metrics quantify selection pressure by analysing the relation between individual traits and an individual's number of offspring. One metric is based on an estimate of the probability of a relation between some quantifiable trait(s) and reproductive success; the other is based on the correlation between trait and reproductive success.

A critical analysis provides some insight into the drawbacks and benefits of the two metrics introduced in this article. The probability-based metric seems better at providing detail to differentiate between (trends in) the level of selection pressure, particularly when the pressure is high. On the other hand, the probability-based metric is sensitive to varying population size because larger populations imply more certainty. The correlation-based measure does not have this issue and returns similar values regardless of population size. Moreover, it indicates not only the magnitude of selection pressure, but also its direction (i.e., it returns negative values for minimisation problems). A consequence of the probability-based metric's sensitivity to population size is that trends in selection pressure over the course of the evolutionary process can be highlighted in larger or combined populations, even if these trends are not apparent for smaller populations.

Two case studies showed that the metrics provide a relevant and reliable measure of selection pressure. The first case study showed that both metrics corroborate established effects of increasing tournament size and of the development of selection pressure with Boltzmann selection, confirming the relevance of these metrics. In one instance (when tournament size was increased from 5 to 10) the metric seemed at odds with established effects, but this seemingly anomalous result could be explained by the lack of diversity in the analysed populations.

The study of systems where evolution results from a more intricate interaction between phenotypes than numerical comparisons of fitness values can benefit from better understanding how the evolutionary system's components influence selection pressure. Metrics such as we propose provide the tools to perform the analyses that can underpin this understanding. In particular, the second case study showed how the metrics apply in these situations, allowing for the analysis of coexisting drivers of selection. It also showed that the metrics allow researchers to analyse and quantify the effect of changes to the environment that influence selection pressure indirectly.

These metrics provide convenient tools to analyse selection pressure and so allow researchers to better understand this crucial component of evolutionary systems. We hope that this contributes to renewed interest in the analysis of the selection procedure in evolutionary systems, both in the presence of crisply defined selection criteria and in objective-free evolution.

The metrics can also be utilised to implement informed adaptation of the selection pressure at runtime. A detailed elaboration is beyond the scope of this article, but we hope that these metrics will increase the awareness of the possibility of controlling selection operators, a chance for improving evolutionary algorithms that is somewhat neglected compared to the research effort dedicated to control of variation operators.

## Acknowledgments

The work presented here was funded through the European Union's Horizon 2020 research and innovation program under grant agreement No. 640891 (DREAM).

## Note

^{1} In other words, if the population size is not the same, the p-value returned by Fisher's Exact Test is not a reliable indicator of effect size.