We give a detailed analysis of the optimization time of the (1+1)-Evolutionary Algorithm under two simple fitness functions (OneMax and LeadingOnes). The problem has been approached in the evolutionary-algorithm literature in various ways and with different degrees of rigor. Our asymptotic approximations for the mean and the variance represent the strongest results of their kind. The approach we develop is based on an asymptotic resolution of the underlying recurrences and can also be extended to characterize the corresponding limiting distributions. While most of our approximations can be derived by simple heuristic calculations based on the idea of matched asymptotics, the rigorous justifications are challenging and require a delicate error analysis.
The last two decades or so have seen an explosion of application areas of evolutionary algorithms (EAs) in diverse scientific and engineering disciplines. An EA is a random search heuristic, using evolutionary mechanisms such as crossover and mutation, for finding a solution that often aims at optimizing an objective function. EAs have proved to be extremely useful for combinatorial optimization problems because they can find good solutions for complicated problems using only basic mathematical modeling and simple operators with reasonable efficiency; see Coello Coello (2006), Deb (2001), and Horn (1997) for more information. Although EAs have been widely applied in solving practical problems, the analysis of their performance and efficiency, which often provides better modeling prediction for potential uses in practice, is much less developed, and only computer simulation results are available for most of the EAs in use; see, for example, Beyer et al. (2002), Droste et al. (1998), Garnier et al. (1999), and He and Yao (2001, 2002). We are concerned in this article with a precise probabilistic analysis of a simple algorithm called the (1+1)-EA.
A typical EA comprises several ingredients: the coding of solutions, the population of individuals, the selection for reproduction, the operations for generating new individuals, and the fitness function to evaluate new individuals. The mathematical analysis of the time complexity is often challenging, mostly because the stochastic dynamics is difficult to capture. It proves more insightful to look instead at simplified versions of the algorithm, seeking a compromise between mathematical tractability and general predictability. Such a consideration was first attempted in Bäck (1992) and Mühlenbein (1992) in the early 1990s for the (1+1)-EA, which uses only one individual with a single mutation operator at each stage. An outline of the procedure is as follows.
1. Choose an initial string uniformly at random.
2. Repeat until a terminating condition is reached:
3. Create a new string by flipping each bit of the current one (each bit with probability 1/n of flipping), each bit independently of the others.
4. Replace the current string by the new one iff the new string's fitness is at least that of the current one.
Step 1 is often realized by tossing a fair coin for each of the n bits, independently of the others, and the terminating condition is either exhausting the number of assigned iterations or reaching a state in which no further improvement has been observed for a given amount of time.
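The outline above can be sketched as a short simulation (a minimal illustration under the OneMax fitness, with the standard mutation rate 1/n; the function names are ours, not the paper's):

```python
import random

def one_max(x):
    """OneMax fitness: the number of 1-bits in the string."""
    return sum(x)

def one_plus_one_ea(n, rng=random):
    """Run the (1+1)-EA on OneMax with per-bit mutation probability 1/n;
    return the number of steps until the all-ones optimum is reached."""
    x = [rng.randint(0, 1) for _ in range(n)]  # Step 1: uniform random string
    steps = 0
    while one_max(x) < n:                      # terminate once the optimum is found
        # Step 3: flip each bit independently with probability 1/n
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        # Step 4: accept iff the fitness does not decrease
        if one_max(y) >= one_max(x):
            x = y
        steps += 1
    return steps
```

A single call such as `one_plus_one_ea(100)` returns one realization of the optimization time; averaging over many runs gives a numerical counterpart to the expectations analyzed below.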
While the (1+1)-EA under simple fitness functions may seem too simplified to be of much practical value, the study of its complexity continues to attract attention in the literature (see, for example, Auger and Doerr, 2011 and Neumann and Witt, 2010 for more recent developments) for several reasons. First, the (1+1)-EA (under some fitness functions) represents one of the simplest models whose behavior is mathematically tractable. Second, the stochastic behaviors under such a simple formulation often have, although this is hard to prove, a wider range of applicability or predictability, either for more general models or for other meta-heuristics. Such robustness of the complexity can, on the other hand, be checked by simulations, in which case the theoretical results are useful for directing good guesses. For example, Neumann and Witt (2009) showed that the 1-ANT behaves identically to the (1+1)-EA in some situations. Also, Sudholt and Witt (2010) showed a similar translation of behavior to Particle Swarm Optimization algorithms. Third, although tractable, most of the analyses of the (1+1)-EA are far from trivial, and different mathematical tools have been introduced or developed for the purpose, leading to more general methodological developments that may also be useful for the analysis of other EAs or heuristics. Fourth, from a mathematical point of view, few models can be solved precisely, and those that can often exhibit additional structural properties that are fundamental and may be of interest for further investigation. Finally, understanding the expected complexity of algorithms may help in identifying hard inputs and in improving the efficiency of the algorithm; see, for example, Doerr and Doerr (2014) and Doerr et al. (2013).
The expected optimization time required by the (1+1)-EA has undergone successive improvements, yet none of them reached the precision of Garnier et al.'s result (1); we summarize in Figure 1 some recent findings; a brief account of earlier results can be found in Garnier et al. (1999).
Note that, by a result of Doerr, Johannsen, and Winzen (2010, 2012) showing that OneMax is the easiest function among all functions with a unique optimum (in particular, among all linear functions), any lower bound for OneMax also provides a lower bound for linear functions. Thus the precise asymptotic bounds we derive in this article may also serve as effective lower bounds for other fitness functions. On the other hand, Sudholt (2013) established that the (1+1)-EA is, for OneMax, the fastest nonadaptive EA that uses only standard bit mutations to create new offspring, starting with a single search point.
These expressions, as well as the numerical values, are consistent with those given in Garnier et al. (1999). From its expression, it is clear that the characterization of the second-order term lies much deeper than that of the dominant term. Numerically, such a characterization is also important because the second-order term is close to being a constant for moderate values of n, so that the overshoot (its coefficient being negative) from the leading term is not small in such a situation.
Finer properties, such as more precise expansions for the mean, the variance, and the limiting distribution will also be established. In particular, the study of the variance provides a measure of the spread of the asymptotic distribution, and is in line with recent research on tail probabilities; see, for example, Witt (2014) and Zhou et al. (2012). The extension does not lead to additional new phenomena, as already discussed in Garnier et al. (1999); it is thus omitted in this article.
Our approach relies essentially on the asymptotic resolution of the underlying recurrence relation for the optimization time, and the method of proof is different from all previous approaches (including Markov chains, coupon collection, coupling, drift analysis, etc.). It consists of three major steps depicted in the following diagram.
Briefly, due to the recursive nature of the algorithm, we first derive the corresponding recurrence relation satisfied by the random variables that capture the remaining optimization time from the different states of the algorithm. When the recurrence can be solved by techniques from analytic combinatorics through the use of generating functions, the corresponding asymptotic approximations can often be obtained by suitable complex-analytic tools such as singularity analysis and the saddle-point method; see the authoritative book by Flajolet and Sedgewick (2009) for more information. The analysis of the (1+1)-EA under LeadingOnes belongs to this case; see Section 7 for details. However, when such a generating-function approach fails to provide manageable forms (in terms of functional or differential equations), a different route through “matched asymptotics” may be attempted, which is the one we adopt for the analysis of the (1+1)-EA under OneMax. Roughly, we identify the terms with the largest contribution on the right-hand side and guess the right form (the Ansatz) by matching the asymptotic expansions on both sides of the recurrence. Once postulated, the Ansatz is often easily checked by direct numerical calculations. The final stage is to justify the Ansatz by a proper error analysis, which often involves a delicate asymptotic analysis; see Wong (2014) for a recent survey of techniques for recurrences of linear type. Our recurrences are, however, of a nonlinear nature and involve two parameters. Note that these two approaches are not exclusive but complementary in many cases; for example, we rely on generating functions and complex-analytic tools for the proofs of several auxiliary results in this article.
More precisely, we consider and study the random variables that count the number of steps taken by the (1+1)-EA before reaching the optimum state when starting from a state with a given number of 1s. We will derive very precise asymptotic approximations for each of them. In particular, their distribution is, for large n, well approximated by a sum of exponential distributions, which in turn implies a Gumbel limit law. Then the time needed by the (1+1)-EA to reach the optimum state when starting with a random initial configuration (every bit being Bernoulli) can be readily characterized because the binomial distribution is highly concentrated near its mean; see Table 1 for a summary of our major results.
| Fitness | OneMax | LeadingOnes |
| --- | --- | --- |
| Limit law | Gumbel distribution | Gaussian distribution |
| Approach | Ansatz & error analysis | Analytic combinatorics |
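The claim that a sum of exponential distributions leads to a Gumbel limit can be made concrete by a small deterministic check (a generic sketch, not in the paper's notation): if T_n is a sum of independent exponentials with rates 1, 2, …, n, then E[T_n] = H_n and Var(T_n) = H_n^{(2)}, so the centered variable T_n − log n has mean tending to Euler's constant γ and variance tending to π²/6 — precisely the mean and variance of the standard Gumbel distribution.

```python
import math

def harmonic(n, r=1):
    """Generalized harmonic number H_n^{(r)} = sum_{k=1}^{n} k^(-r)."""
    return sum(1.0 / k ** r for k in range(1, n + 1))

n = 10 ** 6
mean_Tn = harmonic(n)       # E[T_n] = H_n for T_n = sum of Exp(k), k = 1..n
var_Tn = harmonic(n, 2)     # Var(T_n) = H_n^{(2)}

gamma = 0.5772156649015329  # Euler-Mascheroni constant
print(mean_Tn - math.log(n))        # close to gamma (Gumbel mean)
print(var_Tn, math.pi ** 2 / 6)     # close to pi^2/6 (Gumbel variance)
```

The two printed comparisons agree to five or more decimal places for n = 10^6, consistent with the limit law stated above.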
In addition to the methodological merit of obtaining stronger asymptotic approximations, and its potential use for similar problems in other EAs, our approach, to the best of our knowledge, provides the first rigorous justification of the far-reaching results published by Garnier et al. (1999) more than seventeen years ago. It also sheds new light on further potential use of similar techniques for related problems of a recursive nature.
This article is organized as follows. We begin by deriving the recurrence relation satisfied by the random variables of interest (when the initial configuration is not random). From this recurrence, it is straightforward to characterize inductively their distribution when the number of missing 1s remains small. The harder general case requires the development of more asymptotic tools, which we elaborate in Section 3. Asymptotics of the mean values are presented in Section 4 with a complete error analysis. Section 5 then addresses the asymptotics of the variance. Limit laws are established in Section 6 by an inductive argument and a fine error analysis. Finally, we consider briefly in Section 7 the optimization time of the (1+1)-EA for the LeadingOnes problem. We summarize the major results in Table 1. Note that all results for LeadingOnes have previously been obtained in Ladret (2005); we will sketch a different, self-contained method of proof for them.
Some technical material is collected in Appendices A–F of HPRTC.
Notation. Throughout this article, all O-terms are with respect to n → ∞ unless otherwise stated. We say that a quantity is O of another uniformly for bounded parameters and large n if there exists a constant such that, for any bound on the parameters, there is a threshold beyond which the estimate holds with that constant for all admissible parameters. Here the constant may depend on the parameter bound. The definition extends similarly when the estimate holds uniformly over other ranges.
2 Recurrence and the Limit Laws of when
In this section, we first derive a recurrence relation satisfied by the probability generating function of the number of steps taken by the (1+1)-EA to reach the optimum for the first time when starting from a given initial state. From this recurrence, starting from the boundary case, we can then obtain closed-form expressions one after another by iterating the recurrence, but the expressions soon become too cumbersome. We then use a simple inductive argument to derive the corresponding limit laws when the number of missing 1s remains bounded, together with an asymptotic approximation to the mean and one to the variance. These results not only reveal the complexity of the analytic problem when viewed from a generating-function perspective, but also serve to introduce the prototypical forms of the mean and variance asymptotics, which we will examine in more detail later.
2.1 Recurrence for
The leading factor in Eq. (5) is the origin of the pervasive presence of “e” in our asymptotic approximations.
While this simple recurrence relation is not new in the EA literature (see, for example, Bäck, 1992, Garnier et al., 1999, and He and Yao, 2003), tools for its direct asymptotic resolution have been lacking; we develop them in detail in this article.
From a computational point of view (notably for higher moments), it is often preferable to use the following recurrence because fewer terms depending on are involved.
This follows from dividing both sides of Eq. (6) and then rearranging terms.
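The backward structure behind these recurrences can be sketched numerically for the expected values (a sketch in our own notation, not the paper's): from a OneMax state with m ones, conditioning on the first accepted improvement gives a linear recurrence solvable backwards from the optimum. The transition probabilities below are the standard ones for the (1+1)-EA with per-bit flip probability 1/n; `transition` and `expected_times` are illustrative names.

```python
from math import comb

def transition(n, m, k):
    """Probability that one mutation step moves a OneMax state with m ones
    to a state with k ones (k > m), with per-bit flip probability 1/n."""
    p = 1.0 / n
    d = k - m                           # net gain of 1-bits
    total = 0.0
    # flip i ones to zero and d + i zeros to one (net gain d)
    for i in range(0, min(m, n - m - d) + 1):
        flips = d + 2 * i
        total += (comb(m, i) * comb(n - m, d + i)
                  * p ** flips * (1 - p) ** (n - flips))
    return total

def expected_times(n):
    """Solve E[m] = (1 + sum_{k>m} p_{m,k} E[k]) / q_m backwards,
    where q_m is the probability of a strict improvement from state m."""
    E = [0.0] * (n + 1)                 # E[n] = 0: already optimal
    for m in range(n - 1, -1, -1):
        ks = range(m + 1, n + 1)
        probs = [transition(n, m, k) for k in ks]
        q = sum(probs)                  # probability of improving at all
        E[m] = (1 + sum(pk * E[k] for pk, k in zip(probs, ks))) / q
    return E
```

For n = 2 one checks by hand that the expected time from either non-optimal state is 4, which the code reproduces; for larger n the values grow like the e·n·log n order discussed in the surrounding text.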
2.2 : from Geometric to Exponential
If the characteristic functions of a sequence of random variables converge pointwise, as n → ∞, to a function that is continuous at zero, then the random variables converge in distribution to a limit whose characteristic function is that pointwise limit.
for . Such a limit law indeed extends to the case when , which we formulate in the next subsection.
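The passage from geometric to exponential can be illustrated directly by a deterministic tail computation (a generic sketch, with parameters λ and x of our choosing): if G_n is geometric with success probability λ/n, then P(G_n/n > x) = (1 − λ/n)^⌊nx⌋ → e^{−λx}, i.e., G_n/n converges in distribution to an exponential with rate λ.

```python
import math

def geom_tail(p, k):
    """P(G > k) for a geometric variable G counting Bernoulli(p) trials
    until the first success."""
    return (1 - p) ** k

lam, x = 2.0, 0.7
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    tail = geom_tail(lam / n, int(n * x))   # P(G_n / n > x)
    print(n, tail, math.exp(-lam * x))      # tail approaches e^{-lam*x}
```

Already at n = 10^6 the geometric tail matches the exponential tail to about six decimal places.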
2.3 The Distribution of when
Let denote the -th order harmonic numbers and . For convenience, we define .
Before proving this theorem, we derive a simple estimate for , which will be useful in our analysis below.
Proof of Theorem 3: Variance. To compute the variance, one may start with the second moment and then subtract the square of the mean; however, it is computationally more advantageous to study directly the recurrence satisfied by the variances themselves.
The simple inductive argument used here extends to a wider range (as is obvious from the error terms established) but eventually fails. In order to cover the whole range, we will need more refined uniform estimates for the error terms, which will be dealt with in Section 4. Some of the tools needed are developed in the next section.
2.4 Asymptotic Expansions and Ansätze for
Asymptotic expansions for . Our uniform asymptotic approximation to was largely motivated by intensive symbolic computations for small . We briefly summarize them here, which will also be crucial in specifying the initial conditions for the differential equations satisfied by the functions () involved in the full asymptotic expansion of ; see Appendix D of HPRTC.
Formal calculations. The next question is how to guess this function (before proving all the assumptions). Here is a quick sketch of our ideas.
3 Asymptotics of Sums of the Form
3.1 Asymptotics of
3.2 Asymptotics of
We now show that the quantities in Eq. (22) can be expressed as linear combinations of the sums analyzed above.
4 The Expected Values of and Their Asymptotics
We will derive in this section a more precise expansion for the mean .
Our analysis will be based on the recurrence (16) for the mean and uses the idea of successive asymptotic iteration (or bootstrapping; see de Bruijn, 1981 or Flajolet and Sedgewick, 2009), which proceeds as follows. We consider first the difference between the mean and its dominant approximation, which itself satisfies a recurrence of the same type but with a different nonhomogeneous part. We bound this difference by Lemma 11 and a transfer technique, which deduces a uniform bound for the difference from that of the nonhomogeneous part. We then repeat the procedure, subtracting more terms to obtain a refined expansion; further iteration yields a still more precise expansion; see Appendix D of HPRTC.
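The bootstrapping idea can be illustrated on a classical toy example (a generic illustration in the spirit of de Bruijn's book, not this paper's recurrence): solving $x e^x = n$ for large $n$. Taking logarithms gives the fixed-point form

```latex
x = \log n - \log x .
```

A first guess $x = \log n\,(1 + o(1))$, substituted into the right-hand side, yields $x = \log n - \log\log n + o(1)$; one more iteration refines this to

```latex
x = \log n - \log\log n + \frac{\log\log n}{\log n}\,\bigl(1 + o(1)\bigr),
```

and the process can be repeated to any desired precision. Our treatment of the mean follows the same pattern, each iteration subtracting the terms already identified and transferring the bound on the new nonhomogeneous part.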
Instead of starting from a state with a fixed number of 1s, the first step of the (1+1)-EA described in the introduction corresponds to the situation when the initial state (the number of 1s) is not fixed but random. Assume that this input follows a binomial distribution (each bit being 1 with probability 1/2 and 0 with probability 1/2), and consider the number of steps taken by the (1+1)-EA to reach the optimum state from such a random initial state. The following result describes precisely the asymptotic behavior of the expected optimization time.
A direct consequence of the precise estimates we derived is the following asymptotic approximation, measuring the difference between the random-input and fixed-input expected optimization times, which improves the bound derived in the recent paper by Doerr and Doerr (2014).
Since the proof is straightforward either from the expansions in Theorems 8 and 9 or by the same method of proof of Theorem 9, we omit the details, which can be readily manipulated by standard symbolic computation tools. See Figure 6 for a graphical illustration of the difference .
4.1 More Asymptotic Tools
We develop here some other asymptotic tools that will be used in proving Theorem 8.
The following lemma is very helpful in obtaining error estimates to be addressed below. It also sheds new light on the occurrence of the harmonic numbers in Eq. (39).
Applying this lemma to the recurrence (16), we then get a simple upper bound for .
For , the inequality holds.
Uniformity of the estimate in the lemma and the asymptotic expansion (29) play a crucial rôle in our analysis.
The approximation can be easily extended and refined if more smoothness properties of are known, which is the case for all functions appearing in our analysis (they are all ).
4.2 Proof of Theorem 8
Our method of proof consists of three steps: first, a heuristic calculation to obtain the dominant term; then an error analysis to justify the dominant term with an explicit error bound; and finally a refined analysis (by the same inductive argument) to complete the proof of Theorem 8. The main idea of the error analysis is to express the error term through another recurrence of the same type but with a different nonhomogeneous part. Showing the smallness of the nonhomogeneous part then leads to the required order estimate for the error.
We start with the following identity whose proof is straightforward.
Note that the sum vanishes for . Take . Then we obtain the following identity, which is itself an asymptotic expansion for large (and ).