## Abstract

We give a detailed analysis of the optimization time of the (1+1)-Evolutionary Algorithm under two simple fitness functions (OneMax and LeadingOnes). The problem has been approached in the evolutionary algorithm literature in various ways and with different degrees of rigor. Our asymptotic approximations for the mean and the variance represent the strongest of their kind. The approach we develop is based on an asymptotic resolution of the underlying recurrences and can also be extended to characterize the corresponding limiting distributions. While most of our approximations can be derived by simple heuristic calculations based on the idea of matched asymptotics, the rigorous justifications are challenging and require a delicate error analysis.

## 1 Introduction

The last two decades or so have seen an explosion of application areas of evolutionary algorithms (EAs) in diverse scientific and engineering disciplines. An EA is a random search heuristic, using evolutionary mechanisms such as crossover and mutation, for finding a solution that often aims at optimizing an objective function. EAs have proved to be extremely useful for combinatorial optimization problems because they can find good solutions for complicated problems using only basic mathematical modeling and simple operators with reasonable efficiency; see Coello Coello (2006), Deb (2001), and Horn (1997) for more information. Although EAs have been widely applied in solving practical problems, the analysis of their performance and efficiency, which often provides better modeling prediction for potential uses in practice, is much less developed, and only computer simulation results are available for most of the EAs in use; see, for example, Beyer et al. (2002), Droste et al. (1998), Garnier et al. (1999), and He and Yao (2001, 2002). We are concerned in this article with a precise probabilistic analysis of a simple algorithm called the (1+1)-EA.

A typical EA comprises several ingredients: the coding of solutions, the population of individuals, the selection for reproduction, the operations for generating new individuals, and the fitness function to evaluate the new individuals. The mathematical analysis of the time complexity is often challenging, mostly because the stochastic dynamics is difficult to capture. It proves more insightful to look instead at simplified versions of the algorithm, seeking a compromise between mathematical tractability and general predictability. Such a consideration was first attempted in Bäck (1992) and Mühlenbein (1992) in the early 1990s for the (1+1)-EA, which uses only one individual with a single mutation operator at each stage. An outline of the procedure is as follows.

**Algorithm (1+1)-EA**

Choose an initial string $x\in\{0,1\}^n$ uniformly at random

Repeat until a terminating condition is reached

Create $y$ by flipping each bit of $x$ (with probability $1/n$ of flipping), each bit independently of the others

Replace $x$ by $y$ iff $f(y)\ge f(x)$

Step 1 is often realized by tossing a fair coin for each of the $n$ bits, independently of the others, and the terminating condition is either exhausting the number of assigned iterations or reaching a state where no further improvement has been observed for a given amount of time.
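To fix ideas, the following minimal Python sketch implements the procedure above for the OneMax fitness (counting 1-bits); the mutation rate $1/n$, the acceptance rule $f(y)\ge f(x)$, and the iteration cap are the standard choices assumed here, not details quoted from the paper.

```python
import random

def one_max(x):
    """OneMax fitness: the number of 1-bits in the string."""
    return sum(x)

def one_plus_one_ea(n, fitness=one_max, max_iters=10**7, rng=random):
    """Minimal (1+1)-EA sketch. Keeps a single parent, flips each bit
    independently with probability 1/n, and accepts the offspring iff
    its fitness is not smaller. Returns the number of mutation steps
    until the optimum (the all-ones string) is reached."""
    x = [rng.randint(0, 1) for _ in range(n)]  # Step 1: uniform random string
    fx = fitness(x)
    if fx == n:
        return 0
    for step in range(1, max_iters + 1):
        y = [b ^ (rng.random() < 1.0 / n) for b in x]  # flip each bit w.p. 1/n
        fy = fitness(y)
        if fy >= fx:              # replace x by y iff f(y) >= f(x)
            x, fx = y, fy
        if fx == n:               # optimum reached
            return step
    return max_iters              # fallback: iteration budget exhausted
```

Averaging `one_plus_one_ea(n)` over many runs gives Monte Carlo estimates of the optimization time studied throughout this article.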

Mühlenbein (1992) analyzed the number of steps taken by the (1+1)-EA to reach the optimum under OneMax (the *optimization time* of OneMax). The expected time was argued to be of order $n\log n$, indicating the efficiency of the (1+1)-EA. Bäck (1992) derived expressions for the success, failure, and stagnation probabilities of mutation. A finer asymptotic approximation of the form $e\,n\log n + c_1 n + o(n)$ was derived by Garnier et al. (1999) when the mutation rate equals $1/n$. They went further by characterizing the limiting distribution of the suitably centered and normalized optimization time in terms of a log-exponential distribution (which is indeed a double exponential or a Gumbel distribution). However, some of their proofs, notably the error analysis, seem incomplete (as indicated in their paper). Thus a precise result such as (1) has remained obscure in the EA literature.
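The order $n\log n$ of the leading term can be made plausible by a standard fitness-level (coupon-collector type) heuristic; the following sketch ignores multi-bit gains and is only an illustration consistent with (1), not the proof.

```latex
% From a state with j zeros, a step gains a 1-bit roughly with the
% probability of flipping exactly one of the j zeros and nothing else:
\[
  p_j \approx \binom{j}{1}\,\frac1n\Bigl(1-\frac1n\Bigr)^{n-1}
      \approx \frac{j}{e\,n}.
\]
% Summing the geometric waiting times en/j over the levels passed from a
% typical starting point with about n/2 zeros gives the leading term:
\[
  \sum_{1\le j\le n/2}\frac{e\,n}{j} \;=\; e\,n\,H_{n/2}\;\sim\; e\,n\log n .
\]
```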

While the (1+1)-EA under simple fitness functions may seem too simplified to be of much practical value, the study of its complexity continues to attract attention in the literature (see, for example, Auger and Doerr, 2011, and Neumann and Witt, 2010, for more recent developments) for several reasons. First, the (1+1)-EA (under some fitness functions) represents one of the simplest models whose behavior is *mathematically tractable*. Second, the stochastic behaviors under such a simple formulation often have, although hard to prove, a wider range of applicability or predictability, either for more general models or for other meta-heuristics. Such a *complexity robustness* can, on the other hand, be checked by simulations, and in such a case the theoretical results are useful in directing good guesses. For example, Neumann and Witt (2009) showed that the 1-ANT behaves identically to the (1+1)-EA in some situations, and Sudholt and Witt (2010) showed that a similar behavior carries over to Particle Swarm Optimization algorithms. Third, although tractable, most of the analyses of the (1+1)-EA are far from trivial, and different mathematical tools have been introduced or developed for such a purpose, leading to more general *methodological developments*, which may also be useful for the analysis of other EAs or heuristics. Fourth, from a mathematical point of view, few models can be solved precisely, and those that can be often exhibit additional *structural properties* that are fundamental and may be of interest for further investigation. Finally, understanding the expected complexity of algorithms may help in identifying hard inputs and in *improving the efficiency* of the algorithm; see, for example, Doerr and Doerr (2014) and Doerr et al. (2013).

The expected optimization time required by the (1+1)-EA has undergone successive improvements, yet none of them has reached the precision of Garnier et al.'s result (1); we summarize in Figure 1 some recent findings; a brief account of earlier results can be found in Garnier et al. (1999).

Note that, by a result of Doerr, Johannsen, and Winzen (2010, 2012), which showed that OneMax is the easiest function among all functions with a unique optimum (in particular, among all linear functions), any lower bound for OneMax also provides a lower bound for linear functions. Thus the precise asymptotic bounds we derive in this article may also serve as effective lower bounds for other fitness functions. On the other hand, Sudholt (2013) established that the (1+1)-EA is, for OneMax, the fastest nonadaptive EA that uses only standard bit mutations to create new offspring, starting with a single search point.

These expressions, as well as the numerical values, are consistent with those given in Garnier et al. (1999). From the expression of the linear coefficient $c_1$, it is clear that its characterization lies much deeper than that of the dominant term $e\,n\log n$. Numerically, such a characterization is also important because $\log n$ grows so slowly as to be close to a constant for moderate values of $n$, so that the overshoot ($c_1$ being negative) from the leading term is not small in such a situation.

Finer properties, such as more precise expansions for the mean, the variance, and the limiting distribution, will also be established. In particular, the study of the variance provides a measure of spread of the asymptotic distribution and is in line with recent research on tail probabilities; see, for example, Witt (2014) and Zhou et al. (2012). The extension to other mutation rates does not lead to additional new phenomena, as already discussed in Garnier et al. (1999); it is thus omitted in this article.

Our approach relies essentially on the asymptotic resolution of the underlying recurrence relation for the optimization time, and the method of proof is different from all previous approaches (including Markov chains, coupon collection, coupling, drift analysis, etc.). It consists of three major steps depicted in the following diagram.

Briefly, due to the recursive nature of the algorithm, we first derive the corresponding recurrence relation satisfied by the random variables that capture the remaining optimization time from different states of the algorithm. In the case when the recurrence can be solved by techniques from *analytic combinatorics* through the use of generating functions, the corresponding asymptotic approximations can often be obtained by suitable complex-analytic tools such as singularity analysis and the saddle-point method; see the authoritative book by Flajolet and Sedgewick (2009) for more information. The analysis of the (1+1)-EA under LeadingOnes belongs to such a case; see Section 7 for details. However, when such a generating-function-based approach fails to provide more manageable forms (in terms of functional or differential equations), a different route through “matched asymptotics” may be attempted, which is the one we adopt for the analysis of the (1+1)-EA under OneMax. Roughly, we identify the terms with the largest contribution on the right-hand side and guess the right form (the *Ansatz*) by matching the asymptotic expansions on both sides of the recurrence. The Ansatz, once postulated, is often easily checked by direct numerical calculations. The final stage is to justify the Ansatz by a proper *error analysis*, which often involves a delicate asymptotic analysis; see Wong (2014) for a recent survey of techniques for recurrences of linear type. Our recurrences are, however, of a nonlinear nature and involve two parameters. Note that these two approaches are not exclusive, but complementary in many cases. For example, we rely on generating functions and complex-analytic tools for the proof of several auxiliary results in this article.

More precisely, we consider and study the random variables that count the number of steps taken by the (1+1)-EA before reaching the optimum state when starting from a state with exactly $k$ 1s. We will derive very precise asymptotic approximations for each of these variables. In particular, their distribution is for large $n$ well approximated by a sum of exponential distributions, and this in turn implies a Gumbel limit law as the number of remaining zeros becomes large. Then the time needed by the (1+1)-EA to reach the optimum state when starting with a random initial configuration (every bit being Bernoulli) can be readily characterized because the binomial distribution is highly concentrated near its mean; see Table 1 for a summary of our major results.

Table 1: Summary of the major results for the (1+1)-EA under OneMax and LeadingOnes.

| Fitness | OneMax | LeadingOnes |
| --- | --- | --- |
| Mean | see Section 4 | see Section 7 |
| Variance | see Section 5 | see Section 7 |
| Limit law | Gumbel distribution | Gaussian distribution |
| Approach | Ansatz & error analysis | Analytic combinatorics |


In addition to its own methodological merit of obtaining stronger asymptotic approximations and its potential use in other problems for similar EAs, our approach provides, to the best of our knowledge, the first rigorous justification of the far-reaching results of Garnier et al. (1999), published more than seventeen years ago. It also sheds new light on the further potential use of similar techniques for related problems of a recursive nature.

This article is organized as follows. We begin with deriving the recurrence relation satisfied by the optimization times (when the initial configuration is not random). From this recurrence, it is straightforward to characterize inductively their distribution when the number of remaining zeros is small. The hard case when this number grows with $n$ requires the development of more asymptotic tools, which we elaborate in Section 3. Asymptotics of the mean values are presented in Section 4 with a complete error analysis. Section 5 then addresses the asymptotics of the variance. Limit laws are established in Section 6 by an inductive argument and a fine error analysis. Finally, we consider briefly in Section 7 the optimization time of the (1+1)-EA for the LeadingOnes problem. We summarize the major results in Table 1. Note that all results for LeadingOnes have previously been obtained in Ladret (2005), and we will sketch a different, self-contained method of proof for them.

Some technical material is collected in Appendices A–F of *HPRTC*.

**Notation.** *Throughout this article, all asymptotic $O$-terms are with respect to $n\to\infty$ unless otherwise stated. We say that a quantity $f(n,k)$ satisfies $f(n,k)=O(g(n,k))$ uniformly for bounded $k$ as $n\to\infty$, or in words "$f=O(g)$ uniformly for bounded $k$ and large $n$," if for every $K>0$ there exist a $C>0$ and an $n_0$ such that $|f(n,k)|\le C|g(n,k)|$ for all $n\ge n_0$ and $0\le k\le K$. Here $C$ and $n_0$ may depend on $K$. The definition extends similarly when the estimate holds uniformly for $k$ in other ranges.*

## 2 Recurrence and the Limit Laws near the Optimum

In this section, we derive first a recurrence relation satisfied by the probability generating functions of the optimization times, each counting the number of steps taken by the (1+1)-EA to reach the optimum for the first time when starting from a given initial state. From this recurrence, and starting from the optimum state itself, we can then get closed-form expressions one after another by iterating the recurrence, but the expressions soon become too cumbersome. We then use a simple inductive argument to derive the corresponding limit laws when the number of remaining zeros stays bounded, together with an asymptotic approximation to the mean and one to the variance. These results not only reveal the complexity of the analytic problem when viewed from a generating function perspective but also serve to introduce the prototype forms of the mean and variance asymptotics, which we will examine in more detail later.

### 2.1 Recurrence for the Probability Generating Functions

The leading factor $(1-1/n)^{n-1}$ in Eq. (5) is the origin of the pervasive presence of “$e$” in our asymptotic approximations.

While this simple recurrence relation is not new in the EA literature (see, for example, Bäck, 1992; Garnier et al., 1999; and He and Yao, 2003), tools for a direct asymptotic resolution have been lacking; we develop them in detail in this article.

From a computational point of view (notably for higher moments), it is often preferable to use the following recurrence because fewer terms are involved.

This follows from dividing both sides of Eq. (6) by the common factor and then rearranging terms.
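For concreteness, the expected optimization times can be computed exactly from such a recurrence by one backward sweep, because under OneMax the number of 1s forms a Markov chain. The following Python sketch uses the standard transition probabilities for bitwise mutation with rate $1/n$ and acceptance $f(y)\ge f(x)$; it is our illustration of the computation, not a transcription of Eq. (6).

```python
from math import comb

def expected_times(n):
    """Exact expected optimization times E[k] of the (1+1)-EA on OneMax,
    for every starting number k of 1-bits, via a backward sweep.
    Standard model assumed: each bit flips w.p. 1/n; the offspring is
    accepted iff it has at least as many 1s, so the 1s-count is Markov."""
    p, q = 1.0 / n, 1.0 - 1.0 / n

    def step_prob(k, j):
        # Probability that one mutation turns k ones into k+j ones (j >= 1):
        # flip b+j zeros to ones and b ones to zeros, for all feasible b.
        return sum(comb(n - k, b + j) * comb(k, b)
                   * p ** (2 * b + j) * q ** (n - 2 * b - j)
                   for b in range(0, min(k, n - k - j) + 1))

    E = [0.0] * (n + 1)                      # E[n] = 0 at the optimum
    for k in range(n - 1, -1, -1):
        probs = [(j, step_prob(k, j)) for j in range(1, n - k + 1)]
        up = sum(pr for _, pr in probs)      # total prob of strict improvement
        # E[k] = (1 + sum_j P(k -> k+j) * E[k+j]) / P(improve): rejected and
        # accepted equal-fitness moves both leave the 1s-count unchanged.
        E[k] = (1.0 + sum(pr * E[k + j] for j, pr in probs)) / up
    return E
```

For instance, `expected_times(100)[50]` can be compared directly with the asymptotic approximations derived in the following sections.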

### 2.2 One Remaining Zero: from Geometric to Exponential

If the sequence of characteristic functions $\phi_n$ of the random variables $Y_n$ converges pointwise to a function $\phi$ as $n\to\infty$ for every real argument, and $\phi$ is continuous at zero, then $Y_n$ converges in distribution to the random variable whose characteristic function is $\phi$.
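Applied to the present case of a single remaining zero (mutation rate $1/n$ assumed; the notation $T_n$ is ours, introduced only for this display), this gives the passage from geometric to exponential:

```latex
% The optimum is reached in one step iff the unique 0-bit flips and all
% other bits stay put, so the waiting time T_n is geometric:
\[
  \Pr(T_n > m) = (1-p_n)^m, \qquad
  p_n = \frac1n\Bigl(1-\frac1n\Bigr)^{n-1} \sim \frac{1}{e\,n}.
\]
% Rescaling by the asymptotic mean en yields, for every fixed t > 0,
\[
  \Pr\Bigl(\frac{T_n}{e\,n} > t\Bigr) = (1-p_n)^{\lceil e n t\rceil}
  \;\longrightarrow\; e^{-t},
\]
% that is, T_n/(en) converges in distribution to an Exp(1) random variable.
```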

Such a limit law indeed extends to the case when the number of remaining zeros stays bounded, which we formulate in the next subsection.

### 2.3 The Distribution when the Number of Remaining Zeros Is Bounded

Let $H_n^{(r)}:=\sum_{1\le j\le n}j^{-r}$ denote the $r$-th order harmonic numbers and $H_n:=H_n^{(1)}$. For convenience, we define $H_0:=0$.

Before proving this theorem, we derive a simple estimate, which will be useful in our analysis below.

**Proof of Theorem 3: Limit laws.** Using the fact that a characteristic function is bounded in modulus by 1, we then deduce, by Equations (4), (10), and (11), the corresponding estimate. Since the factor involved is an analytic function near unity and nonzero there, there exists an interval in which it stays bounded away from zero. Thus we can rewrite the above relation with an error term holding uniformly in the stated range. By iterating this relation and using the initial case, we obtain an approximation that again holds uniformly in the stated range. This and Lévy's continuity theorem (van der Vaart, 1998, Section 2.3) imply Eq. (9).

**Proof of Theorem 3: Mean value.** We now turn to the mean, which satisfies, by taking the derivative with respect to the variable and then substituting in Eq. (4), a corresponding recurrence. By the same reasoning used above based on Eq. (10), we see that the largest contribution on the left-hand side comes from the dominant terms, and we expect an estimate which, by iteration, yields the claimed approximation. The error analysis needed to justify this is not difficult but less interesting. Isolating first the dominant terms and then rearranging all others, we get, again by Eq. (10), a relation that we can rewrite with two error terms, both of which may be negative. Rearranging this recurrence and iterating it directly then proves the required estimate for the mean.

**Proof of Theorem 3: Variance.** To compute the variance, one may start with the second moment and then consider the difference with the square of the mean; however, it is computationally more advantageous to study directly the recurrence satisfied by the variances themselves.


The simple inductive argument we used here extends to a wider range (as is obvious from the error terms established) but fails when the number of remaining zeros grows with $n$. In order to cover the whole range of initial states, we will need more refined uniform estimates for the error terms, which will be dealt with in Section 4. Some of the tools needed are developed in the next section.

### 2.4 Asymptotic Expansions and Ansätze

**Normalizing the mean values.** For simplicity, we consider normalized mean values, which satisfy, by Eq. (13), a simpler-looking recurrence. From these relations, we obtain the first few values explicitly. In general, they are all rational functions of $n$, but their expressions become long as the index increases. We thus turn to asymptotic approximation.

**Asymptotic expansions.** Our uniform asymptotic approximation was largely motivated by intensive symbolic computations for small cases. We briefly summarize these computations here; they will also be crucial in specifying the initial conditions for the differential equations satisfied by the functions involved in the full asymptotic expansion; see Appendix D of *HPRTC*.

**An Ansatz for the small range.** These computations also suggest an Ansatz involving some undetermined functions. Using this form and the above expansions to match the undetermined coefficients of the polynomials involved, we obtain the first few terms successively, and we observe a general pattern with an explicitly computable sequence of coefficients. A crucial complication arises here: the general form of each coefficient holds only for sufficiently large indices, and correction terms are needed for the smaller ones; these corrections are conveniently written with the Iverson bracket notation, which equals 1 if the stated property holds, and 0 otherwise. It is such a complication that makes the determination of smaller-order terms more involved.
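The mechanics of such coefficient matching can be mechanized with a computer algebra system. The following sympy sketch applies the same idea to a toy recurrence $a_n=a_{n-1}+n^{-2}$ (our example, not the paper's recurrence), whose exact solution $a_n=H_n^{(2)}=\zeta(2)-n^{-1}+\tfrac12 n^{-2}-\cdots$ confirms the matched coefficients.

```python
import sympy as sp

n, x = sp.symbols('n x', positive=True)
c1, c2 = sp.symbols('c1 c2')

# Toy model: a_n = a_{n-1} + 1/n^2.  Ansatz: a_n ~ C + c1/n + c2/n^2,
# where the constant C cancels when we take differences.
ansatz = c1 / n + c2 / n**2
residual = ansatz - ansatz.subs(n, n - 1) - 1 / n**2

# Expand the residual in powers of 1/n (via n = 1/x) and force the
# coefficients of x^2 and x^3 to vanish, as matched asymptotics demands.
ser = sp.expand(sp.series(residual.subs(n, 1 / x), x, 0, 4).removeO())
print(sp.solve([ser.coeff(x, 2), ser.coeff(x, 3)], [c1, c2]))
# -> {c1: -1, c2: 1/2}, matching H_n^(2) = zeta(2) - 1/n + 1/(2n^2) - ...
```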

**An Ansatz for the large range.** All the expansions here hold only in the small range. When the number of remaining zeros grows, rewriting the expansions leads us to an Ansatz involving a function of a normalized variable, and it is exactly this form that naturally motivated our choice. This will be seen to be equivalent to the desired approximation. Note that the omnipresence of the harmonic numbers may be traced to the asymptotic estimate (10); see also Lemma 11.

**Formal calculations.** The next formal question then is how to guess this function (before proving all the assumptions). Here is a quick sketch of our ideas.

Using two estimates established later (see Corollary 15 and Lemma 13), we expect an approximation that leads us to the study of sums of a special form for a given sequence. As we will see in the next section, the limiting function has to satisfy a differential equation; choosing the initial condition properly, we then conclude that it coincides with the function defined in Eq. (3). The justification of all these estimates, as well as of more refined expansions, turns out to be highly nontrivial and requires several asymptotic tools, which will be developed in the following two sections.

## 3 Asymptotics of Sums of a Special Form

### 3.1 Asymptotics of the First Sum

### 3.2 Asymptotics of the Second Sum

We now show that the two quantities in Eq. (22) can be expressed as linear combinations of the functions introduced above.

## 4 The Expected Values and Their Asymptotics

We will derive in this section a more precise expansion for the mean.

Our analysis will be based on the recurrence (16) and uses the idea of successive asymptotic iteration (or bootstrapping; see de Bruijn, 1981, or Flajolet and Sedgewick, 2009), which proceeds as follows. We consider first the difference between the mean and its conjectured leading term, which itself satisfies a recurrence of the same type but with a different nonhomogeneous part. We bound this difference by Lemma 11 and a transfer technique, which deduces a uniform bound for the difference from that of the nonhomogeneous part. Then we repeat the same procedure by subtracting more terms and get a refined expansion. The same procedure can then be extended to yield an even more precise expansion; see Appendix D of *HPRTC*.

Instead of starting from a state with a fixed number of 1s, the first step of the Algorithm (1+1)-EA described in the introduction corresponds to the situation when the initial state (the number of 1s) is not fixed but random. Assume that this input follows a binomial distribution with parameter $p$ (each bit being 1 with probability $p$ and 0 with probability $1-p$), and consider the number of steps taken by the (1+1)-EA to reach the optimum state from such a random initial state. The following result describes precisely the asymptotic behavior of the expected optimization time.
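In terms of the fixed-start expectations, the random-start expectation is a plain binomial mixture; here we write $X_{n,k}$ for the optimization time from a state with $k$ 1s and $Y_n$ for the time from the random start (notation ours, introduced only for this display):

```latex
\[
  \mathbb{E}[Y_n]
  = \sum_{0\le k\le n} \binom{n}{k} p^{k}(1-p)^{n-k}\,
    \mathbb{E}[X_{n,k}],
\]
% and since the binomial weights concentrate around k = pn, the sum is
% dominated by the fixed-start expectations with k close to pn.
```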

A direct consequence of the precise estimates we derived is the following asymptotic approximation measuring the difference between the expected optimization times under random input and under fixed input, which improves the corresponding bound derived in the recent paper by Doerr and Doerr (2014).

Since the proof is straightforward either from the expansions in Theorems 8 and 9 or by the same method of proof as for Theorem 9, we omit the details, which can be readily carried out by standard symbolic computation tools. See Figure 6 for a graphical illustration of the difference.

### 4.1 More Asymptotic Tools

We develop here some other asymptotic tools that will be used in proving Theorem 8.

The following lemma is very helpful in obtaining error estimates to be addressed below. It also sheds new light on the occurrence of the harmonic numbers in Eq. (39).

Applying this lemma to the recurrence (16), we then get a simple upper bound, valid uniformly in the whole range.

The uniformity of the estimate in the lemma and the asymptotic expansion (29) play a crucial rôle in our analysis.

The approximation can be easily extended and refined if more smoothness properties of the function involved are known, which is the case for all functions appearing in our analysis (they are all infinitely differentiable).

### 4.2 Proof of Theorem 8

Our method of proof consists of three steps: first a heuristic calculation to get the dominant term, then an error analysis to justify the dominant term with an explicit error term, and finally another refined analysis (by the same inductive argument) to complete the proof of Theorem 8. The main idea of the error analysis is to express the error term through another recurrence of the same type but with a different nonhomogeneous part. Showing the smallness of the nonhomogeneous part then leads to the required order estimate for the error.

We start with the following identity whose proof is straightforward.

Note that the sum vanishes in the boundary case. With a suitable choice of the parameter, we obtain the following identity, which is itself an asymptotic expansion for large $n$.

**Formal calculations.** Assuming the validity of Eq. (19), we can make the formal calculations in Eq. (20) more precise by Eq. (29), Eq. (39), Lemma 13, and Corollary 15. We thus see that the limiting function satisfies the stated differential equation. We now specify the initial condition: since the postulated form (19) holds at the boundary of the range, the constant of integration is determined accordingly. The first few terms in the Taylor expansion of the solution read as follows, and they can be checked against the explicit expressions obtained for the small cases (see Section 2.4).