## Abstract

There has been renewed interest in modelling the behaviour of evolutionary algorithms (EAs) by more traditional mathematical objects, such as ordinary differential equations or Markov chains. The advantage is that the analysis becomes greatly facilitated due to the existence of well established methods. However, this typically comes at the cost of disregarding information about the process. Here, we introduce the use of stochastic differential equations (SDEs) for the study of EAs. SDEs can produce simple analytical results for the dynamics of stochastic processes, unlike Markov chains which can produce rigorous but unwieldy expressions about the dynamics. On the other hand, unlike ordinary differential equations (ODEs), they do not discard information about the stochasticity of the process.

We show that these are especially suitable for the analysis of fixed budget scenarios and present analogues of the additive and multiplicative drift theorems from runtime analysis. In addition, we derive a new more general multiplicative drift theorem that also covers non-elitist EAs. This theorem simultaneously allows for positive and negative results, providing information on the algorithm's progress even when the problem cannot be optimised efficiently. Finally, we provide results for some well-known heuristics namely Random Walk (RW), Random Local Search (RLS), the (1+1) EA, the Metropolis Algorithm (MA), and the Strong Selection Weak Mutation (SSWM) algorithm.

## 1 Introduction

Nowadays, the theoretical study of randomised heuristics mainly focuses on rigorous runtime analysis; however, we can trace back literature about general models of evolutionary computation to Holland (1975). Roughly a decade later, Goldberg (1987) focused on modelling the evolving population of a genetic algorithm (GA). This was extended by Vose (1995) who used a Markov chain approach and fixed points analysis. In the same spirit of dynamical systems, we can find an analysis for steady state GAs by Wright and Rowe (2001).

However, Schema theory was proven wrong (Altenberg, 1994) and Markov chain approaches are, in general, intractable (He and Yao, 2003). The remaining mentioned techniques share a common rationale, the stochastic process describing an EA varies according to its expected change. The same idea is used in the so-called ordinary differential equation (ODE) method by Wormald (1995). Here, the EA is considered as a noisy version of an ODE that tracks the expected rate of change. This method has been applied several times in EC (see, e.g., Yin et al., 1995 or Akimoto et al., 2012). Due to this deterministic approach of the expected change, these methods disregard the random component of EAs.

In this article we follow up on a new method presented by Paixão and Pérez Heredia (2017) based on stochastic differential equations. This approach naturally includes the stochasticity implicit in randomised heuristics through the well established tools from Itô calculus. SDEs have already been used in EC for convergence analysis by Schaul (2012); however, the results did not improve the state of the art.

Other modelling attempts such as the application of statistical physics by Prügel-Bennett and Shapiro (1994) yielded highly accurate results when modelling finite populations and fluctuations were also modelled by Prügel-Bennett (1997). However, this method requires that the algorithm's selection operator weights each search point according to the Boltzmann distribution. This is the case for algorithms like Metropolis or SSWM but it will not hold in general. On the other hand, this method and some of the previously mentioned approaches were tackling more general and ambitious problems than what this article covers.

Stochastic differential equations have proven to be a powerful tool when dealing with stochastic processes (see Øksendal, 2003 for many examples). SDEs have accomplished great achievements in physics, starting with the explanation of Brownian motion by Einstein (1905). In finance, especially for Black and Scholes (1973) work on options prices that yielded a Nobel Prize in Economics in 1997. SDEs have also been used in the more related field of Population Genetics (PG) to track the frequency of genes in a population. This approach was used by Kimura (1957) and has ever since been an integral part of the so-called modern synthesis. Many classical results were obtained through this approach, including the steady state distribution of gene frequencies of a finite population under mutation and selection by Kimura (1964), and the probability of fixation of a gene by Kimura (1968). Our work also contributes to the renewed interest of unifying EC and PG (Paixo et al., 2015; Pérez Heredia et al., 2017).

Building on top of the so-called *diffusion approximation*, we derive an SDE that will be used to model the dynamics of EAs. For some scenarios we will be able to solve this equation analytically. We show that this is the case by obtaining equivalent statements to the well-known additive drift by He and Yao (2001) and multiplicative drift by Doerr et al. (2012). Furthermore, we will derive a new version of the multiplicative drift, extending its applicability to non-elitist algorithms even when the drift is pushing the algorithm away from the optimum. The work we present here has several aims:

- •
improving the understating of EA dynamics,

- •
deriving analytic expressions for the expectation and the variance,

- •
introducing SDE to the field of EC and

- •
translation of drift theorems from runtime analysis to fixed budget analysis.

This article focuses on a fixed budget analysis, first introduced by Jansen and Zarges (2012). In this perspective, the research question is to derive the state of the algorithm after some specific time, this approach gives more information about the process than the classical runtime analysis where the question is to find the time needed to optimise a problem. Despite being introduced recently, fixed budget is gaining importance in the theoretical community (Doerr et al., 2016, 2013; Jansen and Zarges, 2012; Lengler and Spooner, 2015; Nallaperuma, Neumann et al., 2017).

Although we build on top of the same mathematical tool, this article widely extends the extended abstract presented by Paixão and Pérez Heredia (2017). First, we will consider more elaborated algorithms such as Metropolis and SSWM. And secondly but more important, we present what to our best knowledge is the first fixed budget analysis of non-elitist EAs. Furthermore, our formulation considers any selection strength. This constitutes an important advantage since drift theorems require to bound the process by a positive and monotonic non-increasing distance function (He and Yao, 2001; Doerr et al., 2012; Johannsen, 2010). In the case of non-elitist EAs, this implies increasing the selection strength such that worsening moves are not very likely to be accepted. However, in our results the drift can even be negative and yet we will be able to provide the expected behaviour, and the variance.

The disadvantage of the SDE method, unlike runtime analysis or other fixed budget techniques, is that it requires performing two approximations. The first approximation consists of considering the system as a time-continuous process. In the second approximation, we will assume that the change of the process $Xt+1-Xt$ follows a distribution without higher statistical moments than the variance. Hence, this method, although is mathematically based, is not completely rigorous. Due to this reason, the last section of the article presents an extensive validation of these approximations. We will compare the obtained results against rigorous literature for runtime and fixed budget. In addition, we present a graphical comparison against the simulated results, and we will experimentally measure the error of the approximation.

The article is structured as follows. In the preliminaries section, we introduce the studied algorithms, fitness functions, a crash course on SDEs, and the diffusion approximation. Then, we derive three drift theorems for fixed budget: additive, multiplicative, and non-elitist multiplicative. In the applications section, we will apply these theorems for five different algorithms on problems. Finally, we present a validation section where we compare our results against the literature and the experimental error.

## 2 Preliminaries

### 2.1 Algorithms

In this article, we consider the maximisation of pseudo-Boolean functions $f:{0,1}n\u2192R$. Although we will be studying five different algorithms, they all fit in the family of trajectory-based algorithms. These heuristics evolve a single individual rather than using a population. The two principal operators that will specify the algorithm used within this class are *mutation* and *selection* (see Table 1). We will consider algorithms with two mutation operators:

- •
*Local mutations:*Flip a randomly chosen bit from $x$. - •
*Global mutations:*Flip each bit of $x$ independently at random with probability $1/n$.

As mentioned in the introduction, we are interested in algorithms that deviate from elitism. This property is obtained by the action of the selection operator through an acceptance probability $pacc:R\u2192[0,1]$ (see Figure 1). For the sake of clarity in the analysis, when analysing non-elitist EAs, we will consider only local mutations yielding a simple but general family of algorithms. Within this class of algorithms we can recover well-known heuristics depending on the acceptance probability. In following sections we present results for this general scheme and we will also introduce four widely extended selection operators to compare our results against the literature.

### 2.2 Fitness Functions

We will consider two of the most theoretically studied functions. First, the OneMax problem represents an easy hill climbing task. The goal is to maximise the number of bit matches with a specific bitstring which is the optimum. Since our algorithms are unbiased, they do not use problem knowledge; without loss of generality we can assume that the optimal search point is the all ones bitstring. Hence, OneMax just counts the number of 1-bits of a search point.

Let $x\u2208{0,1}n$, then $OneMax(x)=\u2211i=1nxi$.

LeadingOnes is a function that counts the number of 1-bits at the beginning of the bitstring. Using again the landscape metaphor, this problem represents a fitness ridge since bit matches do not always increase the fitness and losing one bit match can yield a huge fitness loss.

Let $x\u2208{0,1}n$, then $LeadingOnes(x)=\u2211i=1n\u220fj=1ixj$.

### 2.3 Stochastic Differential Equations

This subsection contains the minimal knowledge of SDEs to be able to understand this article. For more extended and rigorous details we refer the reader to the textbook by Øksendal (2003). Stochastic differential equations are equations that deal with the behaviour of stochastic processes. We can define these processes as a collection of random variables in $R$ over time $t$ (i.e., ${Xt}t\u22650$), together with an underlying probability space $(\Omega ,F,P)$ that from now on we will take for granted.

In the following, we will refer as Itô integrals to integrals over a Brownian process $\u222bgdBs$. Again, for a formal definition of these integrals and the set of functions $V(S,T)$ on where they are defined, we refer the reader to Øksendal (2003). These integrals will have the following properties (adapted from Theorem 3.2.1 in Øksendal, 2003).

$(PropertiesoftheItintegral):$ Let $f,g\u2208V(0,T)$ and let $0\u2264S<U<T$. Then

$\u222bSTfdBt=\u222bSUfdBt+\u222bUTfdBt$

$\u222bST(cf+g)dBt=c\xb7\u222bSTfdBt+\u222bSTgdBt$ (c constant)

$E\u222bSTfdBt=0$.

Performing a variable change on a stochastic integral is a bit more involved than for ordinary integrals. The usual rules do not apply and we will have to use the following statement (Theorem 4.1.2 in Øksendal, 2003) which is known as Itô's formula.

Finally, another useful property specially when computing higher moments, such as the variance, is the Itô Isometry (Corollary 3.1.7 in Øksendal, 2003).

### 2.4 The Diffusion Approximation

#### Are Evolutionary Algorithms Diffusive Processes?

The previous hypothesis would be a rigorous theorem if evolutionary algorithms were a pure diffusion process, but they are not. First of all, we used differential and not difference equations to model a discrete process. This approximation is intrinsic to the chosen method and carries an error of constant size. Secondly, the Gaussian approximation depends on each algorithm and the fitness function. The variable of interest is the transition probability $\Delta (\delta \u2223x)$ from Equation (6), which reads as the probability of reaching the state $x$ from the state $x-\delta $ in one iteration. It was shown before that the Gaussian approximation will carry an error of size $O(\delta 2)$. Hence, depending on the evolutionary operators of the EA this approximation will or will not be accurate.

To motivate the use of this approximation for the algorithms from Subsection 2.1, we can focus just on the mutation operator. The reason is that selection cannot increase the distance $\delta $ between the parent and child solutions. It will only decide if the new individual is accepted or rejected.

The question resides then in how big is the $\delta $ produced by mutation. To illustrate this, let us consider a random walk on the Boolean hypercube and its number of ones $x$. First, with local mutations, the RW will have a constant $\delta =1$. On the other hand, if global mutations are used $\delta $ can be any integer number in ${0,1,\cdots ,n}$. Fortunately, the probability of large mutations decays exponentially with their size $\delta $. Actually, the two smallest values of $\delta $ (0 and 1) will carry together a probability mass of approximately $2/e$. Hence, all the larger jumps together will only have a probability of $\u22481-2/e$. Since $\delta $ is concentrated, the higher order moments should be close to zero, suggesting that the Gaussian approximation is also accurate for global mutations. However, on some fitness problems all the relevant events could belong to the neglected probability mass of $1-2/e$. In this article, we consider only problems (OneMax and LeadingOnes) where, as we will see, global mutations are decently Gaussian approximated. Nevertheless, at the end we provide an extensive validation for Hypothesis ^{7}.

## 3 Drift Theorems for Fixed Budget Analysis

The Diffusion Approximation (Hypothesis ^{7}) considers a scenario which in general does not have an analytic solution. This will be dependent on the mathematical expressions of the drift $b(Xt,t)$ and diffusion $\sigma (Xt,t)$ coefficients. In this article we will consider three simple scenarios where we can obtain equivalent results to drift theorems for runtime analysis.

The methods to solve SDEs are similar to those to solve ODE, but we have to use the Itô formula (Theorem ^{5}) when performing a change of variables. In the following subsections our approach is to use a priori knowledge. We will find another process $Yt$, described by a simpler function $g(x,t)$ that we know is the solution of the equivalent ODE (SDE with $\sigma =0$). For example, if our process is of the form $dXt=Xtdt+XtdBt$ our candidate solution will be $Yt=ln(Xt)$. When the coefficients are more complex, we will try to find a variable change that leads to an already known SDE.

Before starting the derivation of these theorems, we would like to emphasize that this section is purely rigorous. No approximations are used unlike in the derivation of Hypothesis ^{7}.

### 3.1 Additive Drift

The simplest case is when the drift $b(Xt,t)$ does not depend on the time $t$ or the current state $Xt$, i.e., $b(Xt,t)=b$. This corresponds with the additive drift for runtime analysis (He and Yao, 2001).

^{4}. Finally, to compute the variance we use the results obtained in (8) and (9) as follows

$\u25a1$

As in runtime analysis it is not typically the case that we know exactly the drift's value, but we can bound it within some range. It follows directly that knowing those bounds we will be able to bound the expected progress after $t$ iterations.

^{8}, if $b\u2113\u2264b\u2264bu$ and $\sigma \u2113\u2264\sigma \u2264\sigma u$, with $b\u2113,bu\u2208R$ and $\sigma \u2113,\sigma u\u2208R0+$. Then

### 3.2 Multiplicative Drift

Analogous to runtime analysis, the assumption of additive drift is in general too loose since it does not take into account the current state of the process. This is solved by the multiplicative drift theorem from Doerr et al. (2012) when the drift is proportional to the current state $Xt$.

Once more, if we can only compute bounds rather than exact values it follows straightforward the next corollary. Notice that the result for the expectation is very similar to a previous fixed-budget multiplicative drift by Lengler and Spooner (2015) (Theorem ^{4}).

^{10}, if $b\u2113\u2264b\u2264bu$ and $\sigma \u2113\u2264\sigma \u2264\sigma u$ with $b\u2113,bu\u2208R+$ and $\sigma \u2113,\sigma u\u2208R0+$. Then,

### 3.3 Non-Elitist Multiplicative Drift

Drift theorems for the runtime share a common condition: the drift must be bounded by a monotonic function towards the optimum (He and Yao, 2001; Doerr et al., 2012; Johannsen, 2010). This limitation is also shared by previous fixed budget drift theorems (Lengler and Spooner, 2015 and Theorem ^{10}). In the following statement we relax this condition—the process also follows a monotonic behaviour in expectation, but towards the stationary distribution (which can be far away from the optimum). In fact, as we will see in the experimental section, even when the process starts near the optimum it will follow an exponential decay towards its equilibrium state, moving away from the optimum.

Although we have named it *Non-Elitist Multiplicative Drift* this theorem can also be applied for elitist algorithms. This will be the limit case when $b=\alpha =0$ and $\beta <0$ where we recover Theorem ^{10}.

^{5}) we obtain

^{4}) yielding the second result of the theorem. Finally, to compute the variance we will start from the formula $Var(Xt)=E(Xt-E(Xt))2$, which after introducing the two previous results yields

^{6}) we obtain

$\u25a1$

Once more, if exact values for the drift and diffusion coefficients are hard to compute, but can be bounded we can make use of the following corollary.

^{12}, if $a\u2113(b\u2113-Xt)\u2264a(b-Xt)\u2264au(bu-Xt)\u2208R+$ for all $Xt$, and $\alpha \u2113-\beta \u2113Xt\u2264\alpha -\beta Xt\u2264\alpha u-\beta uXt\u2208R0+$ for all $Xt$. Then,

## 4 Applications

Although the results derived in the previous section are rigorous, its application for evolutionary algorithms is just an approximation (see Subsection 2.4). In this section, we will take for granted that the diffusion approximation holds and we will relax our statements to the category of *application* rather than *theorem*. It will be finally in Section 5 where we will extensively validate this assumption (Hypothesis ^{7}).

### 4.1 Elitist Algorithms on LeadingOnes

First, we start with a study case for the additive drift (Theorem ^{8}). A good problem candidate, although not ideal, is the LeadingOnes function. Many algorithms like RLS or the (1+1) EA have an *almost* additive drift on this function; however, at the end of the optimisation process the drift depends on the state. The reason this fitness function is not ideal for an additive drift approach are the fitness boundary conditions ($0\u2264x\u2264n$). Our formulation of additive drift does not consider boundaries on the process. SDEs can cope with these conditions but this will highly increase the complexity and extension of the article, deviating from our introductory interests. To circumvent this problem, we will add a clause in the application's statement excluding our results after the optimum is reached. This approach has already been used in the literature as in Nallaperuma, Neumann et al. (2017).

*Under the Diffusion Approximation, after $t$ iterations, RLS on*Leading-Ones

*will find the optimum or reach an expected value and variance of:*

*where $X0$ is the number of leading ones of the initial search point.*

^{7}is a good approximation for the dynamics of $Xt$. As explained in the introduction, the drift and diffusion coefficients will be the first and second moments of the rate of change of the process. Or mathematically speaking

^{9}); using this corollary with the previously computed coefficients directly leads to the claimed results.

$\u25a1$

The application to the (1+1) EA is equivalent but we have to deal with global mutations, which makes computing the exact value for the drift and diffusion coefficients harder.

*Under the Diffusion Approximation, after $t$ iterations, the (1+1) EA on*LeadingOnes

*will find the optimum or reach an expected value and variance of:*

*where $X0$ is the number of leading ones of the initial search point.*

Let us denote by $Xt$ the number of leading ones in the bitstring at time $t$. And assume that Hypothesis ^{7} is a good approximation for the dynamics of $Xt$.

^{9}proves the result.

$\u25a1$

### 4.2 Elitist Algorithms on OneMax

Secondly, we apply the multiplicative drift for the same algorithms but for the OneMax problem, which is ideal for this drift theorem. Unlike in LeadingOnes, the boundary conditions are not a problem since our multiplicative drift bounds cannot push the system out of the fitness interval $[0,n]$. Notice that the drift is always increasing the number of ones and vanishes when approaching the optimum.

*Under the Diffusion Approximation, the expected fitness reached by RLS in*OneMax

*after $t$ iterations is*

*where $EX0$ is the expected number of ones of the initial search point.*

^{7}is a good approximation for the dynamics of $Zt$. The drift coefficient, as always, will be the first moment (expectation) of the process. The diffusion coefficient will be the second moment.

^{10}). Applying this theorem with $b=-1/n$ and $\sigma =1/n$ we obtain the following results

$\u25a1$

The next application is regarding the (1+1) EA; again in this case we have computed bounds on the coefficients rather than exact values due to the difficulty to obtain exact results with global mutations.

*Under the Diffusion Approximation, the expected fitness reached by the (1+1) EA in*OneMax

*after $t$ iterations is bounded by*

*where $EX0$ is the expected number of ones of the initial search point.*

^{7}is a good approximation for the dynamics of $Zt$. The drift and diffusion coefficients (first and second statistic moments of the process) can be expressed as

^{11}with $-1/en\u2264b\u2264-3.1/n$ and $1/en\u2264\sigma 2\u22643.1/n$ leads to

$\u25a1$

### 4.3 Non-Elitist Algorithms on OneMax

In this subsection, we will consider algorithms with local mutation but different selection operators. We keep general the acceptance probability, with the only assumption that it is monotonically non-decreasing. Afterwards, in the validation section we will plug in the expressions for a RW, MA, and SSWM.

$\u25a1$

Once the coefficients are known, the following application derives the fixed budget results by using the non-elitist multiplicative drift (Theorem ^{12}).

*Consider any Algorithm 1 with a monotonically non-decreasing acceptance function $pacc(\Delta f)$ on*OneMax.

*Under the Diffusion Approximation the following statements hold*

*where $Xt$ denotes the number of 1-bits in the search point at time $t$, $p\u2191=pacc(\Delta f=1)$ and $p\u2193=pacc(\Delta f=-1)$.*

^{7}is a good approximation for the dynamics of $Xt$. Then, the statement is proven after a direct application of Theorem

^{12}with the coefficients from Lemma

^{14}.

$\u25a1$

## 5 Validation of the Diffusion Approximation

The previous section showed several case studies of the application of the three rigorous drift theorems for fixed budget analysis. However, this method was the result of an approximation which we were assuming to be accurate (Hypothesis ^{7}). Finally, we provide an extensive validation to this assumption. We will tackle the accuracy of the method from several angles including experiments but also comparing the results against those obtained by rigorous methods.

### 5.1 Comparison with the Literature

We now compare the results obtained in Subsections 4.1 and 4.2 with some of the fixed budget literature. For the comparison on the LeadingOnes problem, we use the article that presented the first fixed budget analysis (Jansen and Zarges, 2012). In Table 2, we observe a high agreement for the upper bounds estimations (even an exact match for the RLS). Both approaches obtained the same growth term of $2t/n$ and the discrepancies for the (1+1) EA are present only on small terms. However, on the lower bounds the discrepancy is bigger, our method only obtained a growth term of $t/n$ since we had to pessimistically use the smallest drift value (when $Xt=n-1$) to be able to apply the additive drift (recall the proof of Application 10). On the other hand, the results from the literature only hold for time budget values which are sufficiently smaller than the expected optimisation, whereas our results hold until the optimum is found.

Algorithm | $EXt$ | Literature | Diffusion Approximation |

RLS | $\u2265$ | $1+2t/n-2-\Omega (1-\beta )n$ | $1+t/n-2-n$ |

$\u2264$ | $1+2t/n-2-n$ | $1+2t/n-2-n$ | |

(1+1) EA | $\u2265$ | $1+2t/n-ot/n$ | $1+t/(en)-2-n$ |

$\u2264$ | $1+2t/n-ot/n$ | $1+2t/n-2-n$ |

Algorithm | $EXt$ | Literature | Diffusion Approximation |

RLS | $\u2265$ | $1+2t/n-2-\Omega (1-\beta )n$ | $1+t/n-2-n$ |

$\u2264$ | $1+2t/n-2-n$ | $1+2t/n-2-n$ | |

(1+1) EA | $\u2265$ | $1+2t/n-ot/n$ | $1+t/(en)-2-n$ |

$\u2264$ | $1+2t/n-ot/n$ | $1+2t/n-2-n$ |

In Table 3 we consider the OneMax problem. Again, for RLS we compare against Jansen and Zarges (2012); we observe how both results (first two rows) can be recovered from our method noticing that $e-t/n=(e-1/n)t\u2248(1-1/n)t$, which is a good approximation for large $n$. In the case of the (1+1) EA we were able to obtain exactly the same lower bound as Lengler and Spooner (2015). The upper bounds are also in high agreement; however, the constant inside the exponential term is not as tight as in the literature.

Algorithm | $EXt$ | Literature | Diffusion Approximation |

RLS | $=$ | $n1-1-1/nt$ | $n1-e-t/n$ |

$=$ | $n-n/21-1/nt$ | $n-n/2e-t/n$ | |

(1+1) EA | $\u2265$ | $n1-12e-t/(en)$ | $n1-12e-t/(en)$ |

$\u2264$ | $n1-12e-t/n-Ot/n$ | $n1-12e-(3.1t)/n$ |

Algorithm | $EXt$ | Literature | Diffusion Approximation |

RLS | $=$ | $n1-1-1/nt$ | $n1-e-t/n$ |

$=$ | $n-n/21-1/nt$ | $n-n/2e-t/n$ | |

(1+1) EA | $\u2265$ | $n1-12e-t/(en)$ | $n1-12e-t/(en)$ |

$\u2264$ | $n1-12e-t/n-Ot/n$ | $n1-12e-(3.1t)/n$ |

Finally, since to our best knowledge, there are no previous fixed budget analysis of non-elitist algorithm we could not perform this sanity check for RW, MA, and SSWM. Nonetheless, the following subsections circumvents this problem by recurring to the concept of stationary distribution.

### 5.2 Stationary Distribution

*stationary distribution*$\pi (x)$ for when the probabilities of leaving each state cancel out with the probabilities of reaching that state (see e.g. Ross, 1996)

It is straightforward to notice that elitist algorithms do not have a stationary distribution. This is due to the fact that these algorithms always accept moves towards points of higher fitness but never accept moves towards points of lower fitness. This breaks the ergodicity required for a Markov Chain to have a stationary distribution. Hence, this sanity check for our method can only be applied for a RW, MA, and SSWM.

Consider Metropolis and SSWM on OneMax once they have reached the stationary distribution. Then for any initialisation, the expected number of ones will be $EXt=n1+e-\gamma $, where $\gamma =\alpha $ for Metropolis and $\gamma =2\beta (N-1)$ for SSWM.

$\u25a1$

As discussed earlier, the long term (limit for $t\u2192\u221e$) expectation of the non-elitist multiplicative drift tends to the $b$ term from the drift coefficient which according to Equation (14) has a value of $b=(np\u2191)/(p\u2191+p\u2193)$. We can observe in Table 4 how this value matches exactly the real expected fitness at equilibrium $EXt$ showing the strength of the SDE approach for this scenario.

RW | Metropolis | SSWM | |

$p\u2191$ | 1 | 1 | $1-e-2\beta 1-e-2N\beta $ |

$p\u2193$ | 1 | $e-\alpha $ | $e2\beta -1e2N\beta -1$ |

$b$ | $n/2$ | $n1+e-\alpha $ | $n1+e-2\beta (N-1)$ |

$EXt$ | $n/2$ | $n1+e-\alpha $ | $n1+e-2\beta (N-1)$ |

RW | Metropolis | SSWM | |

$p\u2191$ | 1 | 1 | $1-e-2\beta 1-e-2N\beta $ |

$p\u2193$ | 1 | $e-\alpha $ | $e2\beta -1e2N\beta -1$ |

$b$ | $n/2$ | $n1+e-\alpha $ | $n1+e-2\beta (N-1)$ |

$EXt$ | $n/2$ | $n1+e-\alpha $ | $n1+e-2\beta (N-1)$ |

### 5.3 Reconciling Fixed Budget with Runtime Analysis

As argued by Jansen and Zarges (2012), the fixed budget perspective provides more insight on the optimisation process. Runtime analysis focuses on the expected first hitting time of the optimum, this time is expressed as a function of the problem size $ET=f(n)$. On the other hand, fixed budget considers the expected fitness obtained after $t$ iterations. Hence, the result will also depend on the time $EXt=f(n,t)$.

However, both approaches should be in concordance. If we know the first hitting time $T(X)$ of any specific fitness value $X$, it looks natural to compute the inverse function to obtain the equivalent fixed budget result. As shown by Doerr et al. (2013) this is the case for deterministic algorithms, but it will not be true in general for randomised algorithms. The authors circumvent this problem by using sharp bounds on the probability of deviating from the expected times. Finally, they developed a method to recycle expected optimisation times to derive fixed budget results.

In this section, we consider the inverse approach, that is, deriving optimisation times from fixed budget results. Again, inverting the function $EXt=f(n,t)$ for the time will not give, in general, the correct answer. However, we do not aim for a rigorous translation tool as achieved by Doerr et al. (2013). Our purpose is to obtain some time measurement that can be compared with the runtime literature (up to the use of Landau's notation).

#### Additive Drift

Before jumping into conclusions for the studied algorithms, let us consider an additive process. This is a collection of stochastic variables in time $Xt\u22650\u2208R$ whose expected change remains constant $EXt-Xt-1\u2223Xt=b$. When $b$ is positive, the expectation $EXt$ is constantly being increased and $EXt$ always reaches any finite target value $x$ (given enough, but finite time). Let us recall that this will not be the case for EAs reaching the optimum $xopt$ of a fitness function, here $Xt\u2264xopt$ and therefore $Prob(Xt=xopt)=1$ is required in order for $EXt=xopt$. We will consider first a pure additive process and afterwards we will relate it with RLS and the (1+1) EA optimising LeadingOnes.

We mentioned before that computing the inverse function of $EXt$ for the time $t$ will not yield, in general, the correct result. However, this approach is correct for an additive drift process. Here, we can solve $EXt=f(n,t)$ for the time, obtaining a function of the form $t=f(EX,n)$. Finally, if we introduce any target value for $EX$, it will yield the time when the expectation reaches the target value.

$\u25a1$

It is straightforward to notice that this theorem is not applicable for optimisation algorithms. If $Xt$ describes the fitness of the EA, we observe that the boundary condition $Xt\u2264xopt$ breaks the additive requirement: $EXt-Xt-1\u2223Xt=xopt\u22640\u2260b$. As we mentioned earlier, we do not aim for a rigorous translation tool. However, as shown in Subsection 4.1, we can approximate the optimisation process of some algorithms on LeadingOnes with additive drift processes.

In Applications 10 and 11, we found additive drift bounds for the real drift of RLS and the (1+1) EA on LeadingOnes respectively. By introducing $x=n$ (the optimal fitness of LeadingOnes) in Theorem ^{16}, we will compare, in a non-rigorous way, the $\tau (n)$-times for these processes with the runtimes from the literature. Table 5 presents these results showing a high agreement, up to smaller terms, of the SDE method with the literature. Since we used bounds for the drift coefficient $b$, the table includes an upper $\tau u$ and a lower $\tau \u2113$ bound for the time $\tau (n)$.

LeadingOnes | RLS | (1+1) EA |

$b\u2113$ | $1/n$ | $1/en$ |

$bu$ | $2/n$ | $2/n$ |

$EX0$ | 0 | 0 |

$\tau \u2113(n)$ | $n2/2$ | $n2/2$ |

$\tau u(n)$ | $n2$ | $en2$ |

$ET$ | $\Theta (n2)$ | $\Theta (n2)$ |

LeadingOnes | RLS | (1+1) EA |

$b\u2113$ | $1/n$ | $1/en$ |

$bu$ | $2/n$ | $2/n$ |

$EX0$ | 0 | 0 |

$\tau \u2113(n)$ | $n2/2$ | $n2/2$ |

$\tau u(n)$ | $n2$ | $en2$ |

$ET$ | $\Theta (n2)$ | $\Theta (n2)$ |

#### Multiplicative Drift

In the case of the multiplicative drift (Theorems ^{10} and ^{12}) if we were to perform the same sanity check we would obtain an infinite time. This is due to the exponential decay of $EXt$ through the term $e-t$. However, we can circumvent this problem by recurring to the concept of half-life from exponential decaying radioactive processes (see, e.g., Krane, 1987). The half-life is defined as the time taken until half of the atoms from a radionuclide have followed a decay process. In the same spirit, the following definition introduces the notion of $\delta $-life.

As used in Subsection 4, we will focus on the number of 1-bits on linear functions; then we establish our satisfiability criterion as $\delta =1/n$. This implies that in the worst case, where the system starts at a distance $n$ from the long term expectation we will be satisfied when there is only a difference of one 1-bit with the optimal solution.

^{12}) is given by

^{12}can be rewritten as $EXt-b=EX0-b\xb7e-at$. It is straightforward to see that the process $EXt-b$ follows an exponential decay; then using Definition

^{17}for $\delta =1/n$ leads to

$\u25a1$

Table 6 includes the $1/n$-lives ($\tau 1/n$) derived from the SDE method with the above expression and expected optimisation times $ET$ from the literature. Before making the comparison let us remember that optimisation times refer to hitting the optimum whereas the $1/n$-life refers to approaching equilibrium. If the parameters of the algorithm allow the stationary distribution, to be close enough to the optimum this will be a fair comparison. This is the reason why the results for the RW highly differ in the table; a RW cannot optimise OneMax in polynomial time but can reach its stationary distribution. On the other hand, the results for the three other algorithms are in high concordance with the literature (up to smaller terms).

OneMax | RLS | RW | Metropolis | SSWM |

$p\u2191$ | 1 | 1 | 1 | $1-e-2\beta 1-e-2N\beta $ |

$\tau 1/n$ | $nlnn$ | $nlnn$ | $nlnn$ | $nlnnp\u2191$ |

$ET$ | $nlnn+O(n)$ | $\Omega 2n$ | $nlnn+O(n)$ | $nlnnp\u2191+O(n)$ |

$\gamma $ | - | - | $\u2265lnn$ | $\u2265lnn$ |

OneMax | RLS | RW | Metropolis | SSWM |

$p\u2191$ | 1 | 1 | 1 | $1-e-2\beta 1-e-2N\beta $ |

$\tau 1/n$ | $nlnn$ | $nlnn$ | $nlnn$ | $nlnnp\u2191$ |

$ET$ | $nlnn+O(n)$ | $\Omega 2n$ | $nlnn+O(n)$ | $nlnnp\u2191+O(n)$ |

$\gamma $ | - | - | $\u2265lnn$ | $\u2265lnn$ |

### 5.4 Simulations

Another sanity check that we have considered to validate Hypothesis ^{7} is to perform numerical simulations. We built several graphical representations where the experimental results are represented by solid lines (expectation) and a shadowed region (one standard deviation around the expectation). On the other hand, theoretical results for the expectation are represented by dashed lines. For aesthetics reasons we have not included the variance estimations in the figures. This would have implied plotting two extra shadowed zones (upper and lower bounds estimations) per algorithm.

Firstly, the left graph from Figure 2 shows the application of the additive drift for LeadingOnes. We can observe that for RLS this approach is very precise until the last part of the optimisation process. Whereas for the (1+1) EA there is a large gap; nevertheless, the method properly represents the evolution of the (1+1) EA's fitness on this problem. In our opinion, the discrepancy gap is mainly due to two reasons:

- •
choosing an additive drift approach for a process whose drift varies depending on the state and

- •
pessimistic estimations for the additive constant $b$ (see Theorem

^{8}and Applications 10 and 11).

This deviation from additivity can be seen at the end of the simulation when the experimental lines start curving, whereas the theoretical lines remain straight until they hit the optimum.

Secondly, the right graph in Figure 2 compares RLS and the (1+1) EA on the OneMax problem. The comparison between experiments and theory for RLS is excellent, even better than in LeadingOnes due to the absence of the boundary condition. Again for the (1+1) EA the theory collects the exponential decrease of the process. However, there is still a gap which in our opinion is mainly due to equivalent reasons:

- •
using a multiplicative drift when the drift is not exactly multiplicative and

- •
pessimistic estimations for the multiplicative constant $b$ (see Theorem

^{10}and Application 13).

Figure 3 shows the application of the non-elitist drift for OneMax. The left graph in Figure 3 shows an excellent match between theoretical and experimental results for all the algorithms, describing the dynamics of these processes even when they are not able to optimise the problem. With the same accuracy, the right graph in Figure 3 shows a case study where two algorithms (Metropolis and RW) started in between their stationary expected value and the optimum but due to their weak selection strength, they move away the optimum but towards the equilibrium's value.

We would like to point out that our applications were just study cases to motivate the use of SDEs as an analysis tool for EAs. If we were aiming for precision, we could have recycled already known and better bounds for the drift such us $b(Zt)\u2264-Zten1+16Ztn$ from Doerr et al. (2011), but then it would become a variable drift process (Johannsen, 2010).

### 5.5 Experimental Error

The previous sanity check was based on a visual representation of experimental and theoretical results. However, graphical representations can sometimes be misleading and we have to be sure that the results obtained were not just an aesthetic effect. In order to objectively validate the SDE method (Hypothesis ^{7}) we perform an experimental measurement of the error. We will use the widely used concepts of mean absolute percentage error (MAPE) and root mean squared error (RMSE) (see, e.g., Hyndman and Koehler, 2006).

The advantage of the MAPE is that it takes into consideration the magnitude of the observed values. It produces a result in the form of a percentage that is interpreted as the magnitude of the deviation with reality. However, for small values of the observed values this measurement is not appropriate due to the singularity at $Yi=0$. This is a problem for measuring the error of the variance due to the deterministic initialisation used in the experiments and the concentration of the independent runs towards the same fitness value at the end of the simulation. For this reason, we have decided to use the MAPE for the expectations and the RMSE for the variance.

Tables 7 and 8 show agreement with Figures 2 and 3. The results for all the algorithms with local mutations, that is, all but the (1+1) EA, are excellent on OneMax. With $MAPE$ of the expectation being even smaller than $1%$. Only for the (1+1) EA we can see a significant difference, due to the reasons explained in the previous section.

OneMax | $MAPE(EXt)$ | $RMSE(VarXt)$ |

RLS | $0.15%$ | 0.41 |

(1+1) EA lower bound | $16.34%$ | 0.71 |

(1+1) EA upper bound | $22.18%$ | 1.30 |

RW | $0.88%$ | 0.42 |

MA | $0.62%$ | 0.44 |

SSWM | $0.45%$ | 0.37 |

OneMax | $MAPE(EXt)$ | $RMSE(VarXt)$ |

RLS | $0.15%$ | 0.41 |

(1+1) EA lower bound | $16.34%$ | 0.71 |

(1+1) EA upper bound | $22.18%$ | 1.30 |

RW | $0.88%$ | 0.42 |

MA | $0.62%$ | 0.44 |

SSWM | $0.45%$ | 0.37 |

LeadingOnes | $MAPE(EXt)$ | $RMSE(VarXt)$ |

RLS lower bound | $47.90%$ | 7.82 |

RLS upper bound | $3.39%$ | 7.73 |

(1+1) EA lower bound | $72.27%$ | 8.03 |

(1+1) EA upper bound | $23.74%$ | 7.93 |

LeadingOnes | $MAPE(EXt)$ | $RMSE(VarXt)$ |

RLS lower bound | $47.90%$ | 7.82 |

RLS upper bound | $3.39%$ | 7.73 |

(1+1) EA lower bound | $72.27%$ | 8.03 |

(1+1) EA upper bound | $23.74%$ | 7.93 |

Finally, the comparison against LeadingOnes (see Table 8) is not as successful as in the previous case. Here, the errors are quite large as shown in the left graph of Figure 2. But again, we think that the reasons for the discrepancy are not intrinsic to using SDEs; they are due to the chosen drift technique.

## 6 Conclusions

For the last decade, rigorous runtime analysis has become immensely popular, in part due to the emergence of the so-called “drift theorems” that inform about the time to reach a particular state or set of states. However, these results are not directly translatable for fixed budget scenarios which ask the opposite question: how much improvement can be obtained in a fixed number of fitness evaluations? This problem is relevant in practice: in real problems it is usually impossible to know whether the optimum has been found. In this case, it is arguably more useful to know how much improvement an algorithm can achieve in a fixed number of steps.

Here, we have introduced the use of SDEs to the study of EAs which seem particularly suited to obtain results for fixed budget scenarios. Even though SDEs are approximations of the dynamics of EAs, they can produce analytical insight which is not possible with other tools, such as Markov chains. At the same time, SDEs do not discard the stochasticity of the dynamics and allow for the estimation of the variance of relevant quantities, such as the state at which the algorithm will find itself after $t$ iterations.

Here we made a simple preliminary exploration of the potential of these tools for the study of EAs. We did not make use of other techniques for the study of SDEs that allow for greater insight into the dynamics of a stochastic process. For example, the Fokker-Planck equation associated with an SDE can be used to track the full probability distribution of the process. Even though this is, in many instances, impossible to solve analytically it can reveal what are the crucial parameters of a stochastic process. Furthermore, it is often solvable for the steady state, allowing for the characterization of the stationary distribution of the process. In fact, these techniques are commonly used in the related field of population genetics where they have been used to provide foundational results. The results we presented here are in themselves interesting, especially when applied to fixed budget analysis, but they also hold the promise that SDEs can become a standard tool in EC.

On the other hand, it is fair to say that previous modelling attempts (those mentioned in the introduction) were aiming for more general and ambitious problems than what this article has covered. This is why we believe that the natural direction for future work will be considering more complex functions or more elaborated heuristics. But the main challenge would be a rigorous translation of the SDE method presented here.

## Acknowledgments

The author would like to thank Dirk Sudholt and Tiago Paixão for valuable input and discussions on this manuscript. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 618091 (SAGE).

## References

^{2}-composite functions. In

## Note

^{*}

An extended abstract of this article with preliminary results was presented at FOGA '17 by Paixão and Pérez Heredia (2017).