Abstract

Dynamic optimisation is an area of application where randomised search heuristics like evolutionary algorithms and artificial immune systems are often successful. The theoretical foundation of this important topic suffers from a lack of a generally accepted analytical framework as well as a lack of widely accepted example problems. This article tackles both problems by discussing necessary conditions for useful and practically relevant theoretical analysis as well as introducing a concrete family of dynamic example problems that draws inspiration from a well-known static example problem and exhibits a bi-stable dynamic. After the stage has been set this way, the framework is made concrete by presenting the results of thorough theoretical and statistical analysis for mutation-based evolutionary algorithms and artificial immune systems.

1  Introduction

Optimisation problems are ubiquitous, and not all optimisation problems have the property that they stay fixed while they are solved by some optimisation algorithm. Some change over time, and if the change is sufficiently fast in comparison to the optimisation process, the changing nature of the problem has to be taken into account. In these cases we speak of dynamic optimisation problems and are often confronted with the situation that no good problem-specific algorithms are available for solving them. In practice, in such situations heuristic optimisers are often used. There are many different randomised search heuristics that can be applied in this context, among them evolutionary algorithms and artificial immune systems.

Evolutionary algorithms have been successfully applied in dynamic optimisation, as witnessed by works devoted to precisely this topic (Branke, 2002; Weicker, 2003; Morrison, 2004; Yang and Yao, 2013). The theoretical analysis, however, is lagging far behind. While it is common that the theoretical analysis of randomised search heuristics follows after their successful application, the last two decades have witnessed an immense development in the theory of randomised search heuristics for static optimisation (Neumann and Witt, 2010; Auger and Doerr, 2011; Jansen, 2013). This article contributes to the endeavour to carry over this success in the analysis of static optimisation to dynamic optimisation.

1.1  Our Contribution

There are five main contributions of this article, three of them general and the other two concrete. The first general contribution is pointing out that the perspective of theoretical analysis of dynamic optimisation should change to adopt the fixed budget computations perspective, a paradigm shift that recently occurred in the analysis of static optimisation (Jansen and Zarges, 2012; Doerr et al., 2013; Jansen and Zarges, 2014b; 2014c; Nallaperuma et al., 2014; Lengler and Spooner, 2015). The second general contribution is clearly pointing out that the rate of change of the dynamic optimisation problem and the speed of the execution platform that runs the (heuristic) optimiser are two different things that should not be confused. We discuss in Section 1.2 why this is important. The third general contribution is the presentation of an example function that we hope is sufficiently simple to attract attention in the further analytical study of randomised search heuristics but sufficiently interesting to capture important properties of real dynamic optimisation problems. It is a bi-stable function that exhibits phases of stability and rapid change. Its definition is motivated by characteristics of practical problems in, for example, pharmaceutical design (see Tifenbach, 2013, p. 126, for a discussion of this aspect). The first concrete contribution is the analysis of a class of mutation-based evolutionary algorithms and artificial immune systems on this bi-stable dynamic optimisation problem. This demonstrates that the analytical perspective of fixed budget computations and our new example problem both provide feasible settings for theoretical analysis. We choose to study not only evolutionary algorithms but also artificial immune systems because there is reason to believe that in situations of rapid change artificial immune systems may have an advantage over evolutionary algorithms (Jansen and Zarges, 2014c). The second concrete contribution is the first in-depth analysis of a variant of a well-known artificial immune system that was suggested by earlier theoretical analysis of artificial immune systems in static optimisation (Jansen et al., 2011).

1.2  State of the Art in the Theoretical Analysis of Dynamic Optimisation

While dynamic optimisation is an important area of application for many randomised search heuristics, the theoretical analysis lags behind even more than it does for static optimisation. Bu and Zheng (2010) and Nguyen et al. (2012) both point this out when discussing the state of the art. Both articles provide an overview of performance measures for dynamic optimisation; Nguyen et al. (2012) also provide an extensive overview of benchmark problems. While these benchmarks have value for empirical studies, they have not proven particularly useful and popular in theoretical studies. The same holds for the different performance measures that are discussed in both overview articles. Complex performance measures tend to elude theoretical analysis, and as a consequence quite simplistic performance measures dominate in theoretical analyses of randomised search heuristics in dynamic optimisation. Alternatively, only very limited aspects of the algorithm are analysed theoretically, and the major parts of the analysis are based on experiments (Stanhope and Daida, 1999).

When considering theoretical analyses we see that they are either based on very simple example functions, derived from the most popular static example functions, or they consider very specific (and sometimes complicated) functions that are designed with a very specific purpose in mind. Typical instances of theoretical analysis based on extremely simple static benchmark functions include work by Droste (2002; 2003), who analyses a dynamic variant of OneMax, where the fitness of a search point equals the number of bits this string has in common with a target string. He concentrates on the first hitting time of the optimum and the (1+1) evolutionary algorithm (EA), a very simple mutation-based evolutionary algorithm that has a population of size 1 and creates only one offspring in each generation. For such algorithms that perform only a very small number of function evaluations per generation (two in this case) it is reasonable to assume that the fitness function does not change during a generation. Other examples of this kind of research include work by Rohlfshagen et al. (2009) and Kötzing and Molter (2012). In both articles very specific example functions are designed in order to prove a specific point. In the case of Kötzing and Molter (2012) the example function is derived from OneMax, and the very specific change that is defined is used to make explicit the difference between ant colony optimisers and evolutionary algorithms in dealing with the speed of change. In the case of Rohlfshagen et al. (2009) the custom-designed dynamic fitness function is used to prove the point that sometimes dynamic functions can be easier to optimise than static variants, contrary to common belief and intuition. Another research direction aims at analysing the random process as a Markov chain in the same way Vose (1998) and others established this for static functions. This was done by Tinos and Yang (2010; 2013) but with very limited tangible theoretical results. Tinos and Yang (2014) follow a similar approach, consider a wide range of classes of dynamic optimisation problems, and present a benchmark problem generator. In all three papers the most significant results are empirical, gained from experiments with example functions.

When the number of function evaluations per generation is larger than just two, it becomes dubious whether one can simply assume that the dynamic fitness function does not change during a generation. To the best of our knowledge, Branke and Wang (2003) are the only ones to consider the scenario of change during a generation and to provide a detailed analysis for this case, namely of a simple (1, 2) evolution strategy, an algorithm that also performs only two function evaluations per generation. Other articles, among them work by Jansen and Schellbach (2005), Kötzing et al. (2015), Lissovoi and Witt (2013), Lissovoi and Witt (2014), Oliveto and Zarges (2013), and Oliveto and Zarges (2015), take larger population sizes or offspring population sizes into account but (sometimes implicitly) assume that the fitness function does not change within a generation. This assumption becomes critical when the effects of the choice of the population size and offspring population size are studied. Increasing the size and consequently the number of function evaluations in a generation effectively means slowing down the rate of change in the dynamic objective function. It then becomes unclear if improved performance is actually due to the increased (offspring) population size or the slower rate of change.

What most articles also have in common is that they concentrate on the expected first hitting time of the global optimum (or similar measures). This is motivated by the fact that the expected optimisation time is the most used and most successful performance measure in the analysis of static optimisation. The step from the analysis of the expected optimisation time to the analysis of the expected solution quality, as performed when using the fixed budget perspective, has not yet been made in theoretical analyses of dynamic optimisation. It is worth noting that in empirical studies it is much more common to concentrate on the average solution quality (Bu and Zheng, 2010; Nguyen et al., 2012).

1.3  Organisation of this Article

Using the perspective of fixed budget computations alone is not sufficient to guarantee that the results of theoretical analyses will be meaningful. It is also required that the considered algorithms be practically relevant, the range of parameters considered make sense in practical settings, and the considered dynamic problem be either relevant itself or exhibit properties that are believed to be relevant. We consider the aspect of the interplay between properties of the problem, parameter settings, and algorithmic properties in Section 2 and point out how this can be taken into account in analysis so that it is no longer overlooked. In Section 3 we introduce and carefully motivate the bi-stable dynamic example problem. Section 4 introduces the classes of evolutionary algorithms and artificial immune systems we consider. For the artificial immune systems we consider a relatively new variant that in some sense hybridises artificial immune systems and evolutionary algorithms and exhibits improved performance in a number of circumstances. Section 5 contains our analysis presenting results for evolutionary algorithms and artificial immune systems for our benchmark problem in a wide variety of settings. We present theoretical results as well as thorough statistical studies of the results of experiments. This helps us to gain a deeper understanding and identify open problems. We summarise and show directions for future research in Section 6.

2  Analysing Randomised Search Heuristics on Dynamic Problems

A dynamic optimisation problem is one that changes over time. Formally, we model this by saying that the quality of a point x in the search space is given by the function value f(x, t) at time step t. Thus, a static optimisation problem can be described as a degenerate dynamic one where f(x, t) = f(x, t′) holds for all time steps t and t′. Note that the time steps are an important property of the problem and as such are independent of the means to solve this problem, in particular independent of the heuristic optimisation method we plan to use.

This independence has important consequences. It implies that the speed of change of the dynamic optimisation problem (expressed in discrete time steps) is not related to the speed at which our heuristic optimisation method is executed. For instance, if we employ a larger population in an evolutionary algorithm, the speed of change of the optimisation problem appears to be faster when measured in the number of generations. The reason is that the speed of change of the optimisation problem is unchanged but each generation of the evolutionary algorithm now takes longer because of the larger population size. As we pointed out in Section 1.2, this has often been overlooked in the past.

In the analysis of randomised search heuristics one usually considers one evaluation of the objective function f to be an atomic event. Usually, the performance of a randomised search heuristic is measured in the number of such function evaluations. This is true for the perspective of runtime or optimisation time analysis (Jansen, 2013) as well as for the perspective of fixed budget computations where the computational budget is measured as the number of function evaluations (Jansen and Zarges, 2012).

We adopt this point of view and assume that evaluating the objective function once, that is, computing f(x, t) for any x and any t, can be carried out in one time step of the dynamic objective function f. We argue that this makes sense because the computation of the function value f(x, t) is obviously connected to properties of the function. If one allowed the objective function to change faster, it would be difficult to see how a function is supposed to be optimisable because in the extreme case it could change arbitrarily during the time it takes to evaluate a single function value.

Clearly, it may be possible to compute f(x, t) in much less time than a complete time step of the dynamic objective function. We characterise the speed of the computation platform that executes the heuristic optimiser and computes function values by the number of function evaluations it can make in one time step of the dynamic objective function.

Definition 1:

Let s be the number of function evaluations that can be carried out between time steps t and t + 1. This number is independent of t and characterises the speed of the execution platform.

Having defined the speed s of an execution platform, we can now investigate two different kinds of questions. On the one hand, we can investigate how the speed of the execution platform influences the performance of a fixed optimiser. We expect the performance to increase with increasing speed of the execution platform. Finding out how this happens can help to understand when it makes sense to invest in better hardware when being confronted with a dynamic optimisation problem. On the other hand, we can investigate how different optimisers or the same optimiser with different parameter settings perform on the same dynamic problem when executed on the same execution platform, namely, on an execution platform with a fixed speed s.

When considering optimisers that work in rounds (or generations), as evolutionary algorithms and artificial immune systems do, it is important to see if the dynamic optimisation problem can change within one such round. As pointed out in Section 1.2, almost all previous work assumes that this does not happen (with the exception of the analysis by Branke and Wang, 2003). In this article we also work under the assumption that no change happens within one round. This means that s needs to be sufficiently large so that one round can be executed completely (or, the other way around, that the optimiser needs to be parameterised so that it performs at most s function evaluations per round). We make the assumption that optimisers are unaware of the time steps and do not know when the dynamic optimisation problem may change. The execution platform makes sure that one round of an optimiser is carried out in one time step of the dynamic optimisation problem so that the values of search points do not change during one round. If the optimiser makes v function evaluations in each round, then the execution platform performs ⌊s/v⌋ rounds in one time step of the dynamic optimisation problem.
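
To make the bookkeeping explicit, the following minimal sketch shows how an execution platform of speed s could drive an optimiser that performs v function evaluations per round; the function names and the interface are illustrative assumptions, not part of the formal framework.

```python
def run_on_platform(optimiser_round, v, s, time_steps):
    """Drive an optimiser on a platform that allows s function evaluations
    per time step of the dynamic problem; optimiser_round is a callable
    executing one complete round (generation) at problem time t
    (an assumed, illustrative interface)."""
    assert v <= s, "one round must fit completely into one time step"
    rounds_per_step = s // v  # the platform executes floor(s/v) rounds
    for t in range(time_steps):
        # the objective is frozen during rounds; it may only change
        # between time steps, i.e., after rounds_per_step complete rounds
        for _ in range(rounds_per_step):
            optimiser_round(t)
```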

As mentioned in Section 1, we adopt the fixed budget perspective for our analysis. We believe it makes more sense to have a statement about the performance of an optimiser tackling a dynamic optimisation problem at any point of time than to concentrate on one specific aspect like the first hitting time of a (potentially moving) global optimum. Instead of considering the function values, as is common in fixed budget analyses (Jansen and Zarges, 2012), we consider the difference in function value to the optimal function value. This slight change of perspective allows for a somewhat more natural formulation of performance. Note that the following definition formalises the notion of efficiency roughly and only makes sense for objective functions with certain properties. It makes the implicit assumption that it is very easy to find search points within some distance in function value to the optimal value and that it becomes increasingly difficult to improve over that.

Definition 2:

For a search point x, let f(x, t) denote the function value in time step t of the dynamic optimisation problem and let g denote the round (or generation) of an optimiser for f. Let v(g) denote the number of function evaluations that the optimiser makes in round g and let x_1, x_2, …, x_{v(g)} denote the search points that it evaluates in this round. Let f*(t) denote the optimal function value in time step t. We define the distance in function value to the optimal value in round g as D(g) = f*(t) − max{f(x_i, t) : i ∈ {1, 2, …, v(g)}}.

We say that the optimiser has perfect performance in generation g if D(g) = 0 holds. We say that its performance is good in generation g if it does not perform perfectly but D(g) remains suitably small. We say that its performance is mediocre in generation g if its performance is neither perfect nor good but D(g) remains moderately bounded. We say that its performance is bad in generation g otherwise.

Note that Definition 2 is based on the distance D(g). Since we consider randomised search heuristics (namely, evolutionary algorithms and artificial immune systems), D(g) is a random variable. Therefore, we make statements about performance that is better than bad by giving bounds on the probability for such a performance.
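
As an illustration of Definition 2, one round could be classified as follows; the concrete thresholds separating good, mediocre, and bad performance are passed in as parameters because their exact values are fixed by the formal definition and are treated as assumptions here.

```python
def classify_performance(distance, good_bound, mediocre_bound):
    """Classify the performance of one round based on the distance D(g)
    from Definition 2; good_bound and mediocre_bound are placeholders
    for the thresholds of the formal definition."""
    if distance == 0:
        return "perfect"
    if distance <= good_bound:
        return "good"
    if distance <= mediocre_bound:
        return "mediocre"
    return "bad"
```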

3  A Dynamic Example Problem

The example problem we introduce is inspired by the example problem OneMax, the most commonly studied example problem when analysing the performance of randomised search heuristics in static optimisation. Like OneMax, it has a very simple structure that facilitates analysis and understanding while having properties that are natural in some sense. It is a bi-stable function, that is, it oscillates between two different global optima, where it is stable for some time. In phases of change the change is rapid.

The function is a pseudo-Boolean function, that is, it operates on bit strings. At any point in time t the function value is given as the number of bits where a bit string agrees with the current global optimum. We see that the function values are always integers between 0 and n and that n is the global maximum. This is precisely the same as a generalised OneMax where the unique global optimum is some fixed bit string. For our example function the two stable global optima are o and its bitwise complement, ō, where the bit string o ∈ {0, 1}^n is a parameter of the function.

The length of the stable phases is also a parameter of the function, called τ. The search point o is the unique global optimum for a duration of τ steps. After this stable phase the global optimum moves gradually, in a random but orderly fashion, toward its bitwise complement, ō. Once ō is the unique global optimum, the function is stable again for τ time steps. After this stable phase the global optimum moves gradually back to o, structurally using the same path but avoiding repetition of any intermediate points. In the nonstable phases where the optimum moves, we have it move by changing exactly one bit in one time step. We see that this implies that the example function has a cycle length of 2(τ + n). We use binary masks to define the transition formally in Definition 3, similarly to the way Yang and Yao (2005) introduced binary masks to define dynamic problem generators.

In the transition where the global optimum moves from o to ō, the bits change their values in an order such that the bits still agreeing with o always form one contiguous block. One could also consider a somewhat simpler variant of this function where the next bit to change its value is selected uniformly at random among the bits that have not yet changed their value. While for mutation-based evolutionary algorithms such a change in the dynamic problem is unimportant, it can have consequences for other algorithms. Those potentially affected include evolutionary algorithms with k-point crossover and some artificial immune systems, notably the B-cell algorithm. Since we consider the latter, we consider the variant where unchanged bits form a contiguous block.

We define the example function, which we call the bi-stable optimisation problem, formally and precisely in the following. For this we make use of the well-known notation for the concatenation of letters. For a letter b ∈ {0, 1} and a length i ∈ ℕ₀ we define b^i as the concatenation of i copies of b, for example, 0^3 = 000. For i = 0 we obtain the empty word. We allow the concatenation of such expressions, for instance, 0^2 1^2 0 = 00110. For two bits u, v ∈ {0, 1} let u ⊕ v denote the exclusive OR of u and v, that is, u ⊕ v = 0 if u = v, and u ⊕ v = 1 if u ≠ v. For two bit strings x, y ∈ {0, 1}^n of equal length n let x ⊕ y denote the bitwise exclusive OR of x and y, for example, 0011 ⊕ 0101 = 0110. Finally, let H(x, y) denote the Hamming distance of x and y, that is, the number of positions where x and y differ. Note that x[i] denotes the bit at position i in x and that the leftmost position is position 1.

Definition 3:
For n ∈ ℕ, o ∈ {0, 1}^n, and τ ∈ ℕ we define the bi-stable optimisation problem. We define the cycle length c = 2(τ + n). For t ∈ ℕ₀ let t′ = t mod c denote the time index in the current period. Let o_t ∈ {0, 1}^n be the unique global optimum at time step t. We define o_t with the help of transition masks m_i^{(p)} that are defined later.
$$o_t = \begin{cases} o & \text{if } 0 \le t' < \tau,\\ o \oplus m^{(p)}_{t'-\tau+1} & \text{if } \tau \le t' < \tau+n,\\ \bar{o} & \text{if } \tau+n \le t' < 2\tau+n,\\ \bar{o} \oplus m^{(p)}_{t'-(2\tau+n)+1} & \text{if } 2\tau+n \le t' < 2(\tau+n). \end{cases}$$
Given o_t, we define f(x, t) = n − H(x, o_t).

To define the transition from o to ō we use transition masks m^{(p)}_i ∈ {0, 1}^n (for i ∈ {1, 2, …, n}), where p = ⌊t/c⌋ denotes the number of the period (so that the random transition is potentially different in each period). Note that with m^{(p)}_n = 1^n we have o ⊕ m^{(p)}_n = ō. When we define m ∈_{u.a.r.} M for some set M, this means that m is selected uniformly at random from M.

We define m^{(p)}_1 ∈_{u.a.r.} {0^{i−1} 1 0^{n−i} : i ∈ {1, 2, …, n}}. For p ∈ ℕ₀ and i ∈ {2, 3, …, n} we define
$$m^{(p)}_i \in_{\text{u.a.r.}} \left\{\, m \in \{0,1\}^n : m \text{ extends the cyclic block of 1-bits of } m^{(p)}_{i-1} \text{ by one position to the left or to the right} \,\right\}.$$

The definition of the transition masks ensures that the unique global optimum moves from o to ō in a very specific way. The bits where o_t and o differ always form one contiguous block if one allows for blocks that wrap around, that is, blocks not ending at the right end of the bit string but continuing at its beginning. This can help artificial immune systems that make use of contiguous hypermutations (Kelsey and Timmis, 2003), since those mutations always flip a contiguous block of bits and have a much better chance of performing such a mutation than the standard bit mutations that are used in evolutionary algorithms.
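
The following sketch generates one period's transition masks with the contiguous wrap-around block property and computes the current global optimum; the exact placement of the phase boundaries is one consistent reading of Definition 3 and should be treated as an assumption.

```python
import random

def transition_masks(n):
    """One random sequence m_1, ..., m_n of transition masks: m_1 has
    exactly one 1-bit, and m_i extends the cyclic contiguous block of
    1-bits of m_{i-1} by one position to the left or to the right."""
    start = random.randrange(n)  # position of the single 1-bit of m_1
    masks = []
    for length in range(1, n + 1):
        mask = [0] * n
        for j in range(length):
            mask[(start + j) % n] = 1
        masks.append(mask)
        if random.random() < 0.5:
            start = (start - 1) % n  # extend the block to the left;
            # otherwise the block grows to the right (start is kept)
    return masks

def optimum_at(t, o, tau, masks_of_period):
    """Global optimum at time step t for stable phase length tau;
    masks_of_period(p) returns the (memoised) masks of period p."""
    n = len(o)
    c = 2 * (tau + n)                 # cycle length
    tp = t % c                        # time index in the current period
    m = masks_of_period(t // c)
    o_bar = [1 - b for b in o]
    if tp < tau:                      # first stable phase
        return o
    if tp < tau + n:                  # moving from o towards its complement
        return [b ^ mb for b, mb in zip(o, m[tp - tau])]
    if tp < 2 * tau + n:              # second stable phase
        return o_bar
    return [b ^ mb for b, mb in zip(o_bar, m[tp - (2 * tau + n)])]
```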

To further clarify the definition we present an example for n = 4 and o = 0110. We consider the location of the unique global optimum for each step of the first period. To do this we need to fix the random transition masks and decide m^{(0)}_1, m^{(0)}_2, m^{(0)}_3, m^{(0)}_4. Note that the first mask is selected uniformly at random from the n masks with exactly one 1-bit and the other masks are selected uniformly at random from the two possible masks that extend the current block of 1-bits either to the left or to the right. The final mask, m^{(0)}_4 = 1111, is not actually random, since there is only one choice left. We depict the sequence of unique global optima in Figure 1.

Figure 1:

Sequence of unique global optima for n = 4 and o = 0110 with random transition masks m^{(0)}_1, …, m^{(0)}_4. The stable phases are marked in gray. Bits in the global optima differing from the predecessor are printed in bold and red.

Using the same visualisation as in Figure 1, we depict an abstract version of the way the unique global optimum moves in Figure 2, this time for two complete periods. It also shows the change of the transition masks from m^{(0)} to m^{(1)} from the first to the second period. The random choice of the transition masks indicates that it is much more useful to remember the unique global optima from the stable phases, o and its bitwise complement ō, than it is to remember the intermediate points.

Figure 2:

Sequence of unique global optima for bit string length n: first cycle and first global optimum of the second cycle. The stable phases are marked in gray.

4  Evolutionary Algorithms and Artificial Immune Systems

We consider one evolutionary algorithm and two variants of an artificial immune system. The evolutionary algorithm is known as the (μ+λ) evolutionary algorithm (EA). It uses a population of size μ, uniform selection for reproduction, generates λ offspring independently and identically distributed by means of standard bit mutations (Algorithm 1) with mutation probability 1/n (and no crossover), and employs plus-selection to select the population for the next generation (see, e.g., Jansen, 2013). A formal description is given as Algorithm 2.

Algorithm 1: Standard bit mutation.
Algorithm 2: The (μ+λ) EA.
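
A minimal sketch of Algorithms 1 and 2 under the assumptions stated above (mutation probability 1/n); breaking ties in favour of offspring during plus-selection is one common convention and an assumption here.

```python
import random

def standard_bit_mutation(x):
    """Algorithm 1: flip each bit independently with probability 1/n."""
    n = len(x)
    return [b ^ 1 if random.random() < 1.0 / n else b for b in x]

def ea_generation(population, fitness, lam):
    """One generation of the (mu+lam) EA (Algorithm 2): uniform parent
    selection, lam offspring by standard bit mutation, plus-selection."""
    mu = len(population)
    offspring = [standard_bit_mutation(random.choice(population))
                 for _ in range(lam)]
    # plus-selection; listing offspring first makes the stable sort
    # prefer offspring in case of equal fitness
    pool = offspring + population
    pool.sort(key=fitness, reverse=True)
    return pool[:mu]
```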

The artificial immune system is the B-cell algorithm as introduced by Kelsey and Timmis (2003). It uses a population of size μ, generates λ clones for each member of the population, applies somatic contiguous hypermutations (Algorithm 3) to all of them, and additionally applies standard bit mutation (Algorithm 1) to one of them. It applies plus-selection between each member of the population and its clones. A formal description is given as Algorithm 4.

Algorithm 3: Somatic contiguous hypermutation.
Algorithm 4: The B-cell algorithm (BCA).
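
A sketch of Algorithms 3 and 4 along the same lines, reusing standard_bit_mutation from the previous sketch; it assumes the hypermutation variant that flips every bit of the chosen wrap-around region, and the order in which the two mutations are applied to the selected clone is also an assumption.

```python
import random

def contiguous_hypermutation(x):
    """Algorithm 3: flip a contiguous, possibly wrapping region of bits
    with a random start position and a random length."""
    n = len(x)
    start = random.randrange(n)
    length = random.randint(0, n)
    y = x[:]
    for j in range(length):
        y[(start + j) % n] ^= 1
    return y

def bca_generation(population, fitness, lam):
    """One generation of the BCA (Algorithm 4): lam clones per cell, all
    hypermutated, one clone additionally mutated by standard bit mutation,
    plus-selection between each cell and its clones."""
    next_population = []
    for cell in population:
        clones = [contiguous_hypermutation(cell) for _ in range(lam)]
        i = random.randrange(lam)
        clones[i] = standard_bit_mutation(clones[i])
        next_population.append(max(clones + [cell], key=fitness))
    return next_population
```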

We additionally consider a variant of the B-cell algorithm suggested by Jansen et al. (2011). The only difference is that somatic contiguous hypermutations are applied to the individual undergoing standard bit mutation only with some probability p. Thus, with probability 1 − p one of the offspring is subject to standard bit mutations only. To obtain a formal description, we replace lines 6–11 in Algorithm 4 by Algorithm 5. Usually we want the probability p to be some positive constant strictly less than 1. For experiments we use a constant value of p and invite the reader to think of p as this value in all contexts. For theoretical results we are more general and mention for what range of probabilities the statements hold.

Algorithm 5: Replacement for lines 6–11 of Algorithm 4.
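
Only the treatment of the clone receiving standard bit mutation changes; a sketch of the replaced step, with p as introduced above and reusing the mutation operators from the previous sketches:

```python
import random

def bca_star_clones(cell, lam, p):
    """Algorithm 5 (sketch): the clone receiving standard bit mutation
    undergoes contiguous hypermutation only with probability p; with
    probability 1 - p it is subject to standard bit mutation only."""
    clones = [contiguous_hypermutation(cell) for _ in range(lam - 1)]
    special = cell[:]
    if random.random() < p:
        special = contiguous_hypermutation(special)
    clones.append(standard_bit_mutation(special))
    return clones
```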

We summarise the three algorithms considered in this work.

Definition 4:

In the following, we refer to Algorithm 2 simply as EA. We call Algorithm 4 BCA and denote the variant of Algorithm 4 that uses Algorithm 5 as BCA*.

For the algorithms we assume that whenever a fitness value is computed this is automatically translated to the appropriate evaluation with parameters o and τ and the correct current time step t without the need for the algorithm to be aware of the values of o, τ, and t. We keep track of time inside the algorithms by means of a generation counter g.

As discussed in Section 2, we assume that the execution platform ensures that values of search points do not change during one round, that is, one generation can be carried out within one time step of the dynamic optimisation problem. For the algorithms considered here this means that the number of function evaluations per generation of the BCA and the BCA* as well as of the EA must not exceed the speed s of the execution platform.

5  Analysis of Evolutionary Algorithms and Artificial Immune Systems for the Dynamic Bi-Stable Example Problem

5.1  Heuristic Dynamic Optimisation with a Slow Execution Platform

We start our analyses by considering slow execution platforms with s = 1 and compare the three algorithms introduced in Section 4 for three different lengths of the stable interval in this setting: short, long, and very long. Note that s = 1 implies μ = 1 and λ = 1, since we assume that one generation can be carried out within one time step of the dynamic optimisation problem.

5.1.1  Short Stable Intervals

We show that all three algorithms fail to catch up with the global optimum if the stable phase is short. We start with an analysis of the EA and then transfer the theoretical results to the two variants of the BCA.

Theorem 5:

Let , , , and with . The EA is always bad in the first steps with probability converging to 1.

Proof:

Since the initial search points are selected uniformly at random, all of them have linear Hamming distance to the optimum with probability exponentially close to 1. We first consider the situation during a stable phase, that is, while the optimum does not move. Let d denote the current Hamming distance to the optimum. The expected decrease in distance in one mutation is bounded above by d/n. We see that the expected decrease in distance over n consecutive generations is therefore bounded above by a constant fraction of n. Application of Chernoff bounds yields that the probability to decrease the distance by a sufficiently large linear amount within a stable phase is exponentially small, and the claim for the stable phases follows.
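
For reference, the argument uses a multiplicative Chernoff bound; in one standard form, for independent random variables X_1, …, X_m with values in [0, 1] and X = X_1 + ⋯ + X_m,

$$\Pr[X \ge (1+\delta)\,\mathrm{E}[X]] \le \exp\!\left(-\frac{\delta^2\,\mathrm{E}[X]}{3}\right) \quad \text{for all } 0 < \delta \le 1.$$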

Outside the stable phases the optimum is moving. If the optimum is moving away from the current search point, it is even more unlikely that the Hamming distance decreases. For the case where it moves toward the current search point, assume that the optimum moves from o to ō (the other case is symmetric). We consider all points with equal distance from the search point o. Because of symmetry all these points have equal probability to become the current search point. The moving global optimum will hit exactly one of these points. Since the current search point has linear Hamming distance to o, there are exponentially many such search points, and it is exponentially unlikely that the global optimum decreases the Hamming distance to the current search point below εn for some sufficiently small constant ε > 0.

Theorem 6:

Let , , , and with . The BCA is always bad in the first steps with probability converging to 1.

Proof:

Again, after initialisation all search points have linear Hamming distance to the optimum with probability exponentially close to 1. For the BCA we observe that the probability for any specific mutation is Θ(1/n²). This implies that the probability to decrease the Hamming distance to the global optimum by any constant amount is significantly smaller than for the EA. Since the initial search point is selected uniformly at random, the probability that the Hamming distance can be decreased by a linear amount is superpolynomially small. When optimising OneMax-like functions this does not change significantly (Jansen and Zarges, 2011). This implies that the BCA performs at most as well as the EA.

Theorem 7:

Let , , , and with and . The BCA* is always bad in the first steps with probability converging to 1.

Proof:

The proof of Theorem 6 showed that the probability to decrease the Hamming distance by a linear amount by means of contiguous hypermutations is superpolynomially small. Additionally applying one standard bit mutation to one of the offspring does not change this. It is not important whether we apply only standard bit mutation or standard bit mutation together with contiguous hypermutations. Therefore, the value of p is not important and the result follows in the same way.

The preceding theorems prove that all three algorithms perform badly in the considered setting and fail to catch up with the global optimum. However, the theoretical results obtained are rather abstract and only provide a coarse picture of the situation at hand. We therefore consider the results of experiments to provide a more concrete and clear picture. All experiments consist of 100 independent runs of the three considered algorithms for n = 100. For the BCA* we use a constant value of p as discussed in Section 4. We display average fitness function values over these runs in Figure 3a. Note that the x-axis displays time steps as defined by the dynamic problem, not generations or function evaluations. In one time step, up to s function evaluations are made (potentially wasting function evaluations if there are not enough function evaluations left in the current time step to accommodate a complete generation). The algorithms we consider perform either λ function evaluations (the EA) or μλ function evaluations (the BCAs) per generation. Since we have s = 1 and μ = λ = 1 here, time steps and generations coincide in this case.

Figure 3:

Visualisation of experiments showing average fitness values over time over 100 independent runs for all three algorithms on the bi-stable problem with n = 100 and a slow execution platform for different lengths of the stable interval. Vertical lines indicate start and end of stable intervals.

We see that all three algorithms are bad as predicted by Theorems 5, 6, and 7 and are not able to reduce the distance to 20 or below (corresponding to fitness 80 or above). We see that the EA clearly gets closer to the global optimum in the stable phases than the BCA and has worse fitness when the optimum moves rapidly. Note that our theoretical results are too coarse to reveal these differences. The BCA* seems to combine the advantages of the other two algorithms and thus performs best with respect to the observed average function values.

In order to investigate the significance of these experimental results we performed Wilcoxon signed rank tests (Lehmann, 2006) for each pair of algorithms and each iteration. Because of the large number of tests for each pair, we perform Holm-Bonferroni correction (Holm, 1979) and depict the resulting p-values in Figure 4a along with the standard significance level of 0.05.
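
A sketch of this testing procedure for one pair of algorithms, assuming the per-time-step fitness samples of the paired runs as input; the signed rank test is taken from scipy and the Holm-Bonferroni step-down correction is implemented directly.

```python
from scipy.stats import wilcoxon

def holm_bonferroni(p_values):
    """Holm-Bonferroni step-down adjustment of a list of p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [1.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def compare_per_time_step(samples_a, samples_b):
    """samples_x[t] holds the fitness values of the runs of one algorithm
    at time step t; returns Holm-Bonferroni corrected p-values per step."""
    raw = [wilcoxon(a, b).pvalue for a, b in zip(samples_a, samples_b)]
    return holm_bonferroni(raw)
```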

Figure 4:

Visualisation of experiments showing p-values of the Wilcoxon tests after Holm-Bonferroni correction over 100 independent runs for all three algorithms on the bi-stable problem with n = 100 and a slow execution platform for different lengths of the stable interval (see Theorems 5, 6, and 7). Vertical lines indicate start and end of stable intervals.

Here and in the following, the diagrams showing the results of the Wilcoxon signed rank tests confirm that, roughly speaking, differences in function values that are clearly visible tend to be statistically significant. In time steps where the function values of the potential solutions are very similar or even intersect, there are no statistically significant differences, of course. The interested reader can see the details in those plots.

5.1.2  Long Stable Intervals

We now consider a longer stable period. We see that this length is still not sufficient for the BCA; however, both the EA and the BCA* are now able to catch up with the global optimum.

Theorem 8:

Let , , , and with . The BCA is always bad in the first steps with probability converging to 1.

Proof:

The statement follows from the proof of Theorem 6. There we argued that the expected decrease in distance for the BCA per generation is small. Thus, the expected decrease within one stable phase is bounded above by a value too small to catch up with the optimum, and the result follows, since in the phases where the optimum moves the distance increases with probability converging to 1, as shown in the proof of Theorem 6.

Theorem 9:

Let , , , and with . The EA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes bad again in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

The statement about the repeating behaviour is a direct consequence of the first statement when applying the union bound. The EA becomes perfect after O(n log n) steps, corresponding to its optimisation time on OneMax, and remains perfect as long as the optimum does not change, as in the proof of Theorem 5. Now consider the subsequent n steps where the optimum changes by 1 bit in each of the steps. If the Hamming distance between the global optimum and the current search point is d, the probability that the EA is able to decrease the Hamming distance is bounded above by d/n. We see that in n steps a linear Hamming distance is reached with probability exponentially close to 1. After the global optimum has reached either o or ō, we are in a situation very similar to the one after initialisation and we can repeat the argument.

Theorem 10:

Let , , , and with and with . The BCA* becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes bad again in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

We consider only the offspring that are created with application of standard bit mutation. With probability 1 − p such an offspring is not also subject to contiguous hypermutations. For those offspring the probability distribution is identical to that of the offspring of the EA. Since all values involved (the speed of the execution platform, the number of individuals, and the number of offspring) are constants, the introduction of the constant factor 1 − p does not change anything significantly. Therefore, the statements about becoming and remaining perfect follow from the proof of Theorem 9. We have already seen in the proof of Theorem 8 that contiguous hypermutations do not help in avoiding becoming bad. Therefore, the complete statement follows.

We again perform experiments to get a more complete picture and depict the results in Figures 3b and 4b in the same way as before. We see that the BCA is still not able to keep up with the global optimum. While the EA reaches the global optimum just before the end of the stable phase, the BCA* is not quite as successful. At first sight this seems to be a contradiction to Theorem 10. However, we stress that all our results are asymptotic and thus need not be visible for a bit string length as small as n = 100. We remark that for larger values of n we obtain the results predicted by Theorem 10. We omit a visualisation of these results because of space restrictions.

5.1.3  Very Long Stable Intervals

Finally, we consider very long stable intervals. It does not come as a surprise that in settings where an algorithm was already good for shorter stable intervals this continues to be the case here. Moreover, the longer stable phases allow the BCA to catch up with the global optimum, too, whereas the shorter stable phases considered before were too short for this. All results here are quite direct consequences of results in the earlier sections.

Corollary 11:

Let , , , and with . The EA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes bad again in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

Follows directly from Theorem 9, since a longer stable interval cannot decrease the performance of the algorithm.

Theorem 12:

Let , , , and with . The BCA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. We call this a phase. In a polynomial number of subsequent phases this behaviour is repeated in each phase with probability converging to 1.

Proof:

Theorem 8 by Jansen and Zarges (2011) proves an upper bound of O(n² log n) for the BCA on OneMax. This implies that the BCA becomes perfect after O(n² log n) steps and remains perfect for the remaining steps of the stable phase. That it becomes mediocre or bad in the next n steps, when the optimum changes rapidly, follows from the proof of Theorem 6. Since all this happens with probability very close to 1, the statement about the repetition in a polynomial number of phases follows.

Corollary 13:

Let , , , and with and with . The BCA* becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes bad again in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

Follows directly from Theorem 10, since a longer stable interval cannot decrease the performance of the algorithm.

Again we perform experiments to get a more complete picture and depict the results for the most interesting time steps (around the unstable interval) in Figures 3c and 4c. The experiments match the predictions in our theorems.

5.2  Heuristic Dynamic Optimisation with a Fast Execution Platform

In this section we consider the situation when we have a faster execution platform, one that is able to make substantially more function evaluations in one time step of the dynamic optimisation problem. This allows us to consider the algorithms with a larger number of search points and study the effects of this. The comparison is fair and meaningful, since all algorithms have the same computational budget each time the dynamic problem has the chance to change. In this article we restrict our attention to larger offspring population sizes λ and leave the number of search points an algorithm uses as basis for its search restricted to μ = 1. Studying the effects of larger populations is beyond the scope of this article. We conjecture that having larger values for μ only makes sense with either more complex dynamic optimisation problems or when one additionally employs mechanisms to make the population maintain some level of diversity (or both). See work by Oliveto and Zarges (2015) for an example of such a study.

5.2.1  Short Stable Intervals

We begin with the consideration of very small offspring populations, that is, λ = 1. This means we invest the increased speed of the execution platform into having more generations per time step.

Theorem 14:

Let , , , and with . The EA is always perfect after the first step for steps with probability converging to 1.

Proof:

It is well known that the expected optimisation time of the EA with λ = 1 on OneMax is Θ(n log n) and that the probability not to be finished in a slightly larger polynomial number of generations is exponentially small (Jansen, 2013). This implies that the EA performs perfectly after one step with overwhelming probability. After this it will remain perfect while the optimum does not move. When the optimum moves, it changes by 1 bit in each step. In a single time step the EA generates as many offspring by means of standard bit mutations as the speed of the execution platform allows. The probability that none of these equals the new optimum is exponentially small. Thus, the probability to be always perfect for the next steps is exponentially close to 1.
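
The key estimate can be made explicit. A single standard bit mutation of the current search point creates a specific target at Hamming distance 1 with probability

$$\frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1} \ge \frac{1}{en},$$

so if g such mutations are performed while the optimum is one bit ahead, the probability that all of them miss the new optimum is at most (1 − 1/(en))^g, which is exponentially small whenever g is of order n^{1+ε} for some constant ε > 0.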

Theorem 15:

Let , , , and with . The BCA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes bad again in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

In each step the BCA performs a large number of generations. In the beginning we have a stable phase and the function is OneMax-like during this phase. It is known that the BCA optimises this function on average and with probability very close to 1 in O(n² log n) generations (Jansen and Zarges, 2011). Thus, it reaches the optimum in a stable phase within its first steps with probability converging to 1. The second part of the statement follows from the proof of Theorem 6. When the optimum starts moving, the probability to decrease the Hamming distance by 1 in one generation is O(1/n²). Thus, the BCA remains perfect in a single step of the moving phase only with probability bounded away from 1. Remember that also for larger Hamming distances the expected decrease in distance is small. Thus, with probability converging to 1, the Hamming distance becomes linear in n steps.

Theorem 16:

Let , , , , and with . The BCA* is always perfect after the first step for steps with probability converging to 1.

Proof:

The proof makes use of the same idea as the proof of Theorem 10. In each generation we consider only the offspring created by means of standard bit mutation that, with probability 1 − p, is not also subject to a contiguous hypermutation. Again, the introduction of this constant factor does not change anything significantly and the result is a direct consequence of Theorem 14.

We again perform experiments and present their results in Figures 5a and 6a.

Figure 5:

Visualisation of experiments showing average fitness values over time over 100 independent runs for all three algorithms on the bi-stable problem with n = 100, the faster execution platform, and small offspring population size for different lengths of the stable interval. Vertical lines indicate start and end of stable intervals.

Figure 6:

Visualisation of experiments showing p-values of the Wilcoxon tests after Holm-Bonferroni correction over 100 independent runs for all three algorithms on the bi-stable problem with n = 100, the faster execution platform, and small offspring population size for different lengths of the stable interval. Vertical lines indicate start and end of stable intervals.

The alternative to sticking with small offspring population sizes and having as many generations per time step as possible is to increase the offspring population size. Of course, the execution platform needs to be fast enough to execute at least one complete iteration of the three considered algorithms. The number of function evaluations per generation is determined by λ for the EA as well as for the two BCA variants (since we have μ = 1). This implies that λ must not exceed the speed of the execution platform. We restrict our attention to the extreme case where the offspring population size is so large that the number of generations per time step is bounded above by a constant.

Theorem 17:

Let , , , and with . The EA is always perfect after the first steps for the next steps with probability converging to 1.

Proof:
We consider the situation directly after initialisation and denote by d the current Hamming distance to the optimum. Clearly, d ≤ n holds. Since a single standard bit mutation decreases the Hamming distance with probability at least d · (1/n) · (1 − 1/n)^{n−1} ≥ d/(en), the probability to decrease the Hamming distance by at least 1 in a single generation of the EA is at least
$$1 - \left(1 - \frac{d}{en}\right)^{\lambda}$$
and the probability to see sufficiently many such decreases in subsequent generations remains overwhelmingly large. Thus, after at most n generations, the EA is perfect. Since the EA executes these n generations within the first steps, that is, before the optimum starts moving, the first claim follows.

Outside the stable phases the optimum is moving; however, in each step only a single bit of the current optimum changes. Thus, again with overwhelming probability, the EA is able to catch up with this change, and the probability to be always perfect for the next steps converges to 1.

For the BCA things are rather tight in this kind of setting. We start with proving a statement about the expected number of iterations the BCA needs to optimise OneMax. We use this result to establish a result for the BCA in a setting that is a bit weaker than what we normally consider.

Lemma 18:

The BCA with μ = 1 and offspring population size λ finds the optimum of OneMax in an expected number of O(n + (n²/λ) log n) iterations.

Proof:
Let d denote the Hamming distance of the current search point to the global optimum. Then, a single contiguous hypermutation decreases the Hamming distance with probability Ω(d/n²), since it suffices to flip a block of length 1 placed on one of the d positions where the current search point and the optimum disagree. The probability that at least one of the λ offspring decreases the Hamming distance is at least 1 − (1 − Ω(d/n²))^λ = Ω(min{1, λd/n²}). Using the standard fitness-level argument, we derive an upper bound on the expected number of iterations T:
$$\mathrm{E}[T] \le \sum_{d=1}^{n} O\!\left(1 + \frac{n^2}{\lambda d}\right) = O\!\left(n + \frac{n^2}{\lambda}\cdot H_n\right),$$
where H_n is the nth harmonic number. With H_n = O(log n) we get the claimed bound.
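
The improvement probability underlying the fitness-level argument can also be estimated empirically; the following sketch approximates, by simulation, the probability that a single contiguous hypermutation reduces the Hamming distance to a OneMax optimum, with the positions of the wrong bits randomised (an assumption made for illustration).

```python
import random

def estimate_improvement_probability(n, d, trials=100_000):
    """Monte Carlo estimate of the probability that one contiguous
    hypermutation strictly reduces the Hamming distance d to the optimum."""
    improved = 0
    for _ in range(trials):
        wrong = set(random.sample(range(n), d))  # positions of wrong bits
        start = random.randrange(n)
        length = random.randint(0, n)
        region = {(start + j) % n for j in range(length)}
        # flipping corrects wrong bits inside the region and breaks
        # correct bits inside the region
        new_d = d - len(wrong & region) + len(region - wrong)
        if new_d < d:
            improved += 1
    return improved / trials
```
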
Theorem 19:

Let λ = cs for a positive constant c ≤ 1, where s is the speed of the execution platform. If the stable phase is sufficiently long, the BCA becomes perfect during the stable phase, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. In the next steps this behaviour is repeated with probability converging to 1.

Proof:

After initialisation the Hamming distance to the global optimum is linear with probability exponentially close to 1. We know from Lemma 18 that the BCA becomes perfect after an expected number of iterations as given there if the optimum does not move. If the stable phase is long enough, this will happen with probability converging to 1 and the BCA remains perfect for the remaining steps of the stable phase.

At the end of the stable phase the optimum starts to move by 1 bit per step. As long as the BCA is good, the Hamming distance between the optimum and the current search point of the BCA is small, and for a single offspring the probability to exactly hit the optimum is O(1/n²). Thus, the probability that one of the λ offspring finds the optimum in a single generation is O(λ/n²) and bounded away from 1. We conclude that the BCA becomes mediocre or worse.

With small offspring population size, λ = 1, we have seen that the BCA* had a performance comparable to that of the EA and much better than that of the BCA. The reason is that on a OneMax-like function standard bit mutations are much more efficient than contiguous hypermutations in further reducing an already small Hamming distance to the global optimum. Since the BCA* uses only standard bit mutation for one of the offspring with probability 1 − p, the expected fraction of offspring that have the same probability distribution as in the EA equals (1 − p)/λ. With λ = 1 and a constant p this is a constant fraction of the offspring population. Since we only perform an asymptotic analysis, it is not surprising that the performances of the EA and the BCA* are comparable. With the extreme offspring population size the fraction (1 − p)/λ becomes vanishingly small and we can expect to see significant differences.

Theorem 20:

Let λ = cs for a positive constant c ≤ 1, where s is the speed of the execution platform. If the stable phase is sufficiently long, the BCA* becomes perfect during the stable phase, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. In the next steps this behaviour is repeated with probability converging to 1.

Proof:

If we ignore the one offspring in each generation that is created by standard bit mutation, the result follows from Theorem 19. Now we consider this offspring. We see that the probability that this offspring is able to follow the moving optimum in a single step is O(1/n). However, there is only one such offspring per generation and therefore the probability that any of the offspring (including the one that is created by standard bit mutation) locates the moving optimum remains bounded away from 1. Therefore the result follows from Theorem 19.

We again perform experiments and present their results in Figures 7a and 8a. We see that already λ = s (i.e., c = 1) is sufficiently large for the BCA and the BCA* to locate the optimum during the stable phases.

Figure 7:

Visualisation of experiments showing average fitness values over time over 100 independent runs for all three algorithms on the bi-stable problem with n = 100, the faster execution platform, and large offspring population size for different lengths of the stable interval. Vertical lines indicate start and end of stable intervals.

Figure 8:

Visualisation of experiments showing p-values of the Wilcoxon tests after Holm-Bonferroni correction over 100 independent runs for all three algorithms on the bi-stable problem with n = 100, the faster execution platform, and large offspring population size for different lengths of the stable interval. Vertical lines indicate start and end of stable intervals.

5.2.2  Long Stable Intervals

We now consider the situation when the length of the stable interval is considerably longer, giving the algorithms a much better chance to catch up with the global optimum. Clearly, algorithm performance can only improve in comparison to shorter stable intervals. We start our investigation with small offspring population size, where the performance was already quite good with much shorter stable phases.

Corollary 21:

Let , , , and with . The EA is always perfect after the first step for the next steps with probability converging to 1.

Proof:

The statement is a direct consequence of the proof of Theorem 14, since making the stable phase longer by a polynomial number of steps cannot adversely affect the performance of the algorithm.

Theorem 22:

Let , , , and with . The BCA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. We call this a phase. In a polynomial number of subsequent phases this behaviour is repeated in each phase with probability converging to 1.

Proof:

The proof is similar to the proof of Theorem 9. The bound for the time needed to become perfect follows from the expected optimisation time of the BCA on OneMax (Jansen and Zarges, 2011) and the number of generations the BCA performs per time step. The probability to reduce the Hamming distance by d in a single generation is small, and so is the probability to see such an event in n steps before the Hamming distance becomes linear. The expected number of phases where we see this behaviour is therefore large, and an application of Chernoff bounds yields the result.

Corollary 23:

Let , , , , and with . The BCA* is always perfect after the first step for steps with probability converging to 1.

Proof:

The statement is a direct consequence of Theorem 16, since making the stable phase longer by a polynomial number of steps cannot adversely affect the performance of the algorithm.

We again perform experiments and present their results in Figures 5b and 6b.

We now consider the case of using larger offspring populations. As before we concentrate only on the extreme case.

Corollary 24:

Let , , , and with . The EA is always perfect after the first steps for the next steps with probability converging to 1.

Proof:

This is a direct consequence of Theorem 17 since a longer stable interval cannot decrease the performance of the algorithm.

For the BCA with large offspring population size the behaviour changes considerably.

Theorem 25:

Let , , , and with . The BCA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

The first part of the theorem follows directly from Lemma 18, since the first steps after initialisation are stable and the underlying problem corresponds to OneMax. Once the BCA is perfect in a stable phase, it remains perfect until the end of the stable phase.

After the stable phase the optimum is moving, and in each step the Hamming distance of the new global optimum to the global optimum of the stable phase is increased by 1. Let d denote the Hamming distance of the current search point to the optimum after d such steps. As in the proof of Lemma 18, the probability to decrease the Hamming distance by at least 1 in one generation is Ω(min{1, λd/n²}). For the offspring population size considered here this probability is large enough that, with probability converging to 1, d will not increase too far between two stable phases.

We repeat these arguments to conclude the theorem.

Theorem 26:

Let , , , (with ), and with . The BCA* becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

If we ignore the offspring created using standard bit mutation, the result is a direct consequence of Theorem 25. Taking this one offspring per generation into account can only make things better. First we observe that it cannot significantly speed up the expected number of generations needed to become perfect, because standard bit mutations would need at least as many steps to achieve this. Since in the phase of change the optimum moves in each step by 1 bit and we only have constantly many generations per step, this single standard bit mutation is insufficient to keep up with it.

We again perform experiments and present their results in Figures 7b and 8b.

5.2.3  Very Long Stable Intervals

Here, we consider the same settings as before but now with very long stable intervals. We have already seen that longer stable phases imply better performance. As before we begin with very small offspring population sizes.

Corollary 27:

Let , , , and with . The EA is always perfect after the first step for the next steps with probability converging to 1.

Proof:

This is a direct consequence of Corollary 21, since a longer stable interval cannot decrease the performance of the algorithm.

Corollary 28:

Let , , , and with . The BCA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre or bad in the next n steps with probability converging to 1. We call this a phase. In a polynomial number of subsequent phases this behaviour is repeated in each phase with probability converging to 1.

Proof:

This is a direct consequence of Theorem 22 (longer stable interval).

Corollary 29:

Let , , , (with ), and with . The BCA* is always perfect after the first step for steps with probability converging to 1.

Proof:

This is a direct consequence of Corollary 23 (longer stable interval).

Here, we also consider what happens if we invest into a larger offspring population size at the expense of the number of generations. As before we restrict our analysis to the extreme case.

Corollary 30:

Let , , , and with . The EA is always perfect after the first steps for the next steps with probability converging to 1.

Proof:

This is a direct consequence of Corollary 24 (longer stable interval).

Corollary 31:

Let , , , and with . The BCA becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

This is a direct consequence of Theorem 25 (longer stable interval).

Corollary 32:

Let , , , (with ), and with . The BCA* becomes perfect after steps, remains perfect for the remaining steps of the stable phase, and becomes mediocre in the next n steps with probability converging to 1. This behaviour is repeated in the next steps with probability converging to 1.

Proof:

This is a direct consequence of Theorem 26 (longer stable interval).

5.3  Heuristic Dynamic Optimisation with a Very Fast Execution Platform

Now we consider an even faster execution platform, one that can carry out even more function evaluations in one time step of the dynamic optimisation problem. We consider the same lengths of the stable interval and the same offspring population sizes as before. We additionally analyse the extreme case where the offspring population size matches the speed of this platform. The previous extreme case is now an intermediate case.

5.3.1  Short Stable Intervals

We start with the case λ = 1 and gradually increase the offspring population size. Most of the theoretical results follow directly from previous theorems, since increasing the speed of the execution platform does not decrease the performance of the considered algorithms.

Corollary 33:

Let , , , and with . The EA is always perfect after the first step for steps with probability converging to 1.

Proof:

This is a direct consequence of Theorem 14, since increasing the speed cannot decrease the performance of the algorithm.

Theorem 34:

Let , , , and with . The BCA is always perfect after the first step for the next steps with probability converging to 1.

Proof:

For the BCA's performance after the first step and during the stable phase it suffices to remember that the expected optimisation time of the BCA with λ = 1 on OneMax is Θ(n² log n) and that the probability not to be finished in a slightly larger polynomial number of generations is exponentially small (Jansen and Zarges, 2011). Analogously to the proof of Theorem 14, the probability that no offspring equals the new optimum is exponentially small. Thus, the probability to be always perfect for the next steps is exponentially close to 1.

Corollary 35:

Let , , , , and with . The BCA* is always perfect after the first step for steps with probability converging to 1.

Proof:

This is a direct consequence of Theorem 16, since increasing the speed cannot decrease the performance of the algorithm.

Next, we consider the intermediate offspring population size.

Corollary 36:

Let , , , and