Abstract

We give a detailed analysis of the optimization time of the (1+1)-Evolutionary Algorithm under two simple fitness functions (OneMax and LeadingOnes). The problem has been approached in the evolutionary algorithm literature in various ways and with different degrees of rigor. Our asymptotic approximations for the mean and the variance represent the strongest of their kind. The approach we develop is based on an asymptotic resolution of the underlying recurrences and can also be extended to characterize the corresponding limiting distributions. While most of our approximations can be derived by simple heuristic calculations based on the idea of matched asymptotics, the rigorous justifications are challenging and require a delicate error analysis.

1  Introduction

The last two decades or so have seen an explosion of application areas of evolutionary algorithms (EAs) in diverse scientific and engineering disciplines. An EA is a random search heuristic that uses evolutionary mechanisms such as crossover and mutation to find solutions that optimize, or nearly optimize, an objective function. EAs have proved extremely useful for combinatorial optimization problems because they can find good solutions for complicated problems using only basic mathematical modeling and simple operators with reasonable efficiency; see Coello Coello (2006), Deb (2001), and Horn (1997) for more information. Although EAs have been widely applied to practical problems, the analysis of their performance and efficiency, which often yields better predictions for potential uses in practice, is much less developed, and only computer simulation results are available for most of the EAs in use; see, for example, Beyer et al. (2002), Droste et al. (1998), Garnier et al. (1999), and He and Yao (2001, 2002). We are concerned in this article with a precise probabilistic analysis of a simple algorithm called the (1+1)-EA.

A typical EA comprises several ingredients: the coding of solutions, the population of individuals, the selection for reproduction, the operations for generating new individuals, and the fitness function used to evaluate new individuals. The mathematical analysis of the time complexity is often challenging, mostly because the stochastic dynamics is difficult to capture. It proves more insightful to look instead at simplified versions of the algorithm, seeking a compromise between mathematical tractability and general predictability. Such a consideration was first attempted in Bäck (1992) and Mühlenbein (1992) in the early 1990s for the (1+1)-EA, which uses only one individual and a single mutation operator at each stage. An outline of the procedure is as follows.

Algorithm (1+1)-EA

1. Choose an initial bit string uniformly at random

2. Repeat until a terminating condition is reached

• Create an offspring by flipping each bit of the current string independently of the others (each bit with the same flipping probability)

• Replace the current string by the offspring iff the offspring's fitness is at least as large

Step 1 is often realized by tossing a fair coin for each of the bits, independently of the others, and the terminating condition is either exhausting the number of assigned iterations or reaching a state in which no further improvement has been observed for a given amount of time.
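For concreteness, the loop above can be realized in a few lines of code. The sketch below is ours, not the authors': it assumes bit strings of length n, the OneMax fitness, and mutation rate 1/n (the rate studied later in this article); the function names are hypothetical.

```python
import random

def one_max(x):
    # OneMax fitness: the number of 1-bits in the string
    return sum(x)

def one_plus_one_ea(n, seed=0):
    """A minimal sketch of the (1+1)-EA on OneMax with mutation rate 1/n.
    Returns the number of iterations until the all-ones optimum is reached."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]  # Step 1: uniform random string
    steps = 0
    while one_max(x) < n:  # terminate at the optimum
        # Step 2: flip each bit independently with probability 1/n ...
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        # ... and keep the offspring iff its fitness is at least as large
        if one_max(y) >= one_max(x):
            x = y
        steps += 1
    return steps
```

For moderate n, the returned optimization time is typically of the order of the e·n·ln n leading term derived below.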

Mühlenbein (1992) considered in detail the complexity of the (1+1)-EA under the fitness function OneMax, which counts the number of 1s in the string. More precisely, consider the time needed to reach the optimum value (often referred to as the optimization time of OneMax). The expected time was argued to be of order n log n, indicating the efficiency of the (1+1)-EA. Bäck (1992) derived expressions for the success, failure, and stagnation probabilities of mutation. A finer asymptotic approximation of the form
1
was derived by Garnier et al. (1999), with explicitly given constants, when the mutation rate equals 1/n. They went further by characterizing the limiting distribution in terms of a log-exponential distribution (which is indeed a double exponential or a Gumbel distribution). However, some of their proofs, notably the error analysis, seem incomplete (as indicated in their paper). Thus a precise result such as (1) has remained elusive in the EA literature.

While the (1+1)-EA under simple fitness functions may seem too simplified to be of much practical value, the study of its complexity continues to attract attention in the literature (see, for example, Auger and Doerr, 2011 and Neumann and Witt, 2010 for more recent developments) for several reasons. First, the (1+1)-EA (under some fitness functions) represents one of the simplest models whose behavior is mathematically tractable. Second, the stochastic behaviors under such a simple formulation often have, although this is hard to prove, a wider range of applicability or predictability, either for more general models or for other metaheuristics. Such robustness of the complexity can, on the other hand, be checked by simulations, in which case the theoretical results are useful for directing good guesses. For example, Neumann and Witt (2009) showed that the 1-ANT algorithm behaves identically to the (1+1)-EA in some situations. Also, Sudholt and Witt (2010) showed that a similar behavior carries over to Particle Swarm Optimization algorithms. Third, although tractable, most of the analyses of the (1+1)-EA are far from trivial, and different mathematical tools have been introduced or developed for this purpose, leading to more general methodological developments, which may also be useful for the analysis of other EAs or heuristics. Fourth, from a mathematical point of view, few models can be solved precisely, and those that can often exhibit additional structural properties that are fundamental and may be of interest for further investigation. Finally, understanding the expected complexity of algorithms may help in identifying hard inputs and in improving the efficiency of the algorithm; see, for example, Doerr and Doerr (2014) and Doerr et al. (2013).

The expected optimization time required by the (1+1)-EA has undergone successive improvements, yet none of them reached the precision of Garnier et al.'s result (1); we summarize in Figure 1 some recent findings; a brief account of earlier results can be found in Garnier et al. (1999).

Figure 1:

Some known lower (indicated by an up-arrow) and upper (marked by a down-arrow) bounds for the optimization time of the (1+1)-EA under OneMax.


Note that, by a result of Doerr, Johannsen, and Winzen (2010, 2012), which showed that OneMax is the easiest function among all functions with a unique optimum (in particular, among all linear functions), any lower bound for OneMax also provides a lower bound for linear functions. Thus the precise asymptotic bounds we derive in this article may also extend as effective lower bounds for other fitness functions. On the other hand, Sudholt (2013) established that the (1+1)-EA is, for OneMax, the fastest nonadaptive EA that uses only standard bit mutations to create new offspring, starting with a single search point.

In this article, we focus on the mutation rate 1/n and prove that the expected number of steps taken by the (1+1)-EA to reach the optimum of the OneMax function satisfies
2
where and are explicitly computable constants. More precisely,
where is Euler's constant, and
3
with an entire function defined by
See Eq. (40) for an analytic expression and numerical value for . Note that from an algorithmic point of view, a mutation rate leads to a complexity ; see Witt (2013) for more information.

These expressions, as well as the numerical values, are consistent with those given in Garnier et al. (1999). From the expression of , it is clear that its characterization lies much deeper than the dominant term . Numerically, such a characterization is also important because is close to being a constant for moderate values of , so that the overshoot ( being negative) from the leading term is not small in such a situation.

Finer properties, such as more precise expansions for the mean, the variance, and the limiting distribution, will also be established. In particular, the study of the variance provides a measure of the spread of the asymptotic distribution, and is in line with recent research on tail probabilities; see, for example, Witt (2014) and Zhou et al. (2012). The extension to other mutation rates does not lead to additional new phenomena, as already discussed in Garnier et al. (1999); it is thus omitted in this article.

Our approach relies essentially on the asymptotic resolution of the underlying recurrence relation for the optimization time, and the method of proof is different from all previous approaches (including Markov chains, coupon collection, coupling, drift analysis, etc.). It consists of three major steps depicted in the following diagram.

Briefly, due to the recursive nature of the algorithm, we first derive the corresponding recurrence relation satisfied by the random variables that capture the remaining optimization time from different states of the algorithm. In the case when the recurrence can be solved by techniques from analytic combinatorics through the use of generating functions, the corresponding asymptotic approximations can often be obtained by suitable complex-analytic tools such as singularity analysis and the saddle-point method; see the authoritative book by Flajolet and Sedgewick (2009) for more information. The analysis of the (1+1)-EA under LeadingOnes belongs to such a case; see Section 7 for details. However, when such a generating function-based approach fails to provide more manageable forms (in terms of functional or differential equations), a different route through “matched asymptotics” may be attempted, which is the one we adopt for the analysis of the (1+1)-EA under OneMax. Roughly, we identify terms with the largest contribution on the right-hand side and guess the right form (the Ansatz) by matching the asymptotic expansions on both sides of the recurrence. The Ansatz is, once postulated, often easily checked by direct numerical calculations. The final stage is to justify the Ansatz by a proper error analysis, which often involves a delicate asymptotic analysis; see Wong (2014) for a recent survey of techniques for recurrences of linear type. Our recurrences are, however, of a nonlinear nature, and involve two parameters. Note that these two approaches are not exclusive, but complementary in many cases. For example, we rely on generating functions and complex-analytic tools for the proof of several auxiliary results in this article.

More precisely, we consider and study the random variables that count the number of steps taken by the (1+1)-EA before reaching the optimum state when starting with a given number of 1s. We will derive very precise asymptotic approximations for each of them. In particular, their distribution is, for large n, well approximated by a sum of exponential distributions, and this in turn implies a Gumbel limit law. Then the time needed by the (1+1)-EA to reach the optimum state when starting with a random initial configuration (every bit being Bernoulli) can be readily characterized because the binomial distribution is highly concentrated near its mean; see Table 1 for a summary of our major results.
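The binomial concentration just invoked is easy to check numerically. The short routine below is our own, for illustration only: it computes the exact probability that a uniform random n-bit initial string deviates from n/2 ones by more than a given amount.

```python
from math import comb

def binom_tail(n, dev):
    """P(|#ones - n/2| > dev) when each of the n bits is 1 with probability 1/2."""
    return sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) > dev) / 2 ** n
```

For n = 400 the standard deviation of the number of ones is 10, and a deviation of four standard deviations already has probability below one in a thousand, so the starting state is essentially n/2.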

Table 1:
A summary of our findings for the time complexity of the (1+1)-EA under the OneMax and LeadingOnes fitness functions, respectively, when starting from a random initial state and when the mutation probability is 1/n. The constant is defined in Eq. (2) and the function in Eq. (3). The symbol here means that the ratio of the two sides tends to 1.
Properties   OneMax                    LeadingOnes
Mean
Variance
Limit law    Gumbel distribution       Gaussian distribution

Approach     Ansatz & error analysis   Analytic combinatorics

In addition to its methodological merit of obtaining stronger asymptotic approximations and its potential use in other problems on similar EAs, our approach provides, to the best of our knowledge, the first rigorous justification of the far-reaching results published by Garnier et al. (1999) more than seventeen years ago. It also sheds new light on the further potential use of similar techniques for related problems of a recursive nature.

This article is organized as follows. We begin by deriving the recurrence relation satisfied by the random variables (when the initial configuration is not random). From this recurrence, it is straightforward to characterize inductively their distributions when the number of missing 1s is small. The harder cases require the development of more asymptotic tools, which we elaborate in Section 3. Asymptotics of the mean values are presented in Section 4 with a complete error analysis. Section 5 then addresses the asymptotics of the variance. Limit laws are established in Section 6 by an inductive argument and a fine error analysis. Finally, we consider briefly in Section 7 the optimization time of the (1+1)-EA for the LeadingOnes problem. We summarize the major results in Table 1. Note that all results for LeadingOnes have previously been obtained in Ladret (2005), and we will sketch a different, self-contained method of proof for them.

Some technical material is collected in Appendices A–F of HPRTC.

Notation. Throughout this article, all asymptotic estimates are with respect to n unless otherwise stated. We say that a quantity holds uniformly for a bounded parameter and large n if there exists a constant such that, for every admissible parameter value, there is a threshold beyond which the stated bound holds for all larger n; the threshold may depend on the parameter and the constant. The definition extends similarly to the other uniform estimates used below.

2  Recurrence and the Limit Laws of when

In this section, we first derive a recurrence relation satisfied by the probability generating function of the number of steps taken by the (1+1)-EA to reach the optimum for the first time when starting from a given initial state. From this recurrence, and starting from the states nearest the optimum, we can then obtain closed-form expressions one after another by iterating the recurrence, but the expressions soon become too cumbersome. We then use a simple inductive argument to derive the corresponding limit laws when the number of missing 1s remains bounded, together with an asymptotic approximation to the mean and one to the variance. These results not only reveal the complexity of the analytic problem when viewed from a generating function perspective but also serve to introduce the prototype forms of the mean and variance asymptotics, respectively, which we will examine in more detail later.

2.1  Recurrence for

Lemma 1:
The probability generating function satisfies the recurrence
4
with , where
5

The leading factor in Eq. (5) is the origin of the pervasive presence of “e” in our asymptotic approximations.

Proof:
Start from the state and run the two steps inside the loop of Algorithm ()-EA. The new state becomes with if bits in the group and bits in the other group toggled their values, where and . Thus, the probability from state to is given by
which is identical to Eq. (5). Since (see Figure 2)
where the symbol denotes distributional equivalence, we see that
6
and this proves the lemma.
Figure 2:

The transition of states and their probabilities.


While this simple recurrence relation is not new in the EA literature (see, for example, Bäck, 1992, Garnier et al., 1999, and He and Yao, 2003), tools have been lacking for a direct asymptotic resolution, which we will develop in detail in this article.
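For a concrete check of such transition probabilities, the sum in Eq. (5) can be evaluated numerically. The routine below is a sketch under our own naming, assuming j ones among n bits and per-bit flip probability 1/n: the offspring has k ones exactly when i ones flip to 0 and m = (k - j) + i zeros flip to 1.

```python
from math import comb

def transition_prob(n, j, k, p=None):
    """Probability that one mutation round (before selection) maps a string
    with j ones to a string with k ones; p is the per-bit flip probability."""
    if p is None:
        p = 1.0 / n
    total = 0.0
    for i in range(j + 1):      # i ones flipped to zeros
        m = (k - j) + i         # m zeros flipped to ones, so k = j - i + m
        if 0 <= m <= n - j:
            total += (comb(j, i) * comb(n - j, m)
                      * p ** (i + m) * (1 - p) ** (n - i - m))
    return total
```

Summed over all k, these probabilities add up to 1, which gives a quick sanity check of the formula.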

From a computational point of view (notably for higher moments), it is often preferable to use the following recurrence because fewer terms depending on are involved.

Corollary 1:
For
7
Proof:

This follows from dividing both sides of Eq. (6) by and then rearranging terms there.

For convenience, define
which can be interpreted as the probability that a mutation is successful in increasing the fitness (objective function value).

2.2  From Geometric to Exponential

As the first nontrivial case beyond , here we study in detail or, equivalently, . In this case,
so that, by Eq. (4),
This is a standard geometric distribution (taking only positive integer values) with success probability
This implies that , and it is natural to consider the normalized random variable for which we have, by taking () and by using the expansions and for bounded ,
8
as , for bounded real . Note that the error term for the last convergence is of the form , which holds uniformly for bounded , but we do not need this uniform estimate. Also observe that is the characteristic function of an exponential distribution with parameter 1. The passage from the convergence of a sequence of characteristic functions to that of the corresponding distribution functions can be justified by Lévy's continuity theorem (van der Vaart, 1998, Section 2.3; or Flajolet and Sedgewick, 2009, Section IX.4.2):

If the sequence of characteristic functions of the random variables converges pointwise to as for , and is continuous at zero, then converges in distribution to whose characteristic function is .

This and Eq. (8) imply the convergence in distribution
where denotes an exponential distribution with parameter . Equivalently, this can be rewritten as

for . Such a limit law indeed extends to the case when , which we formulate in the next subsection.
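Numerically, this limit is already visible from the exact geometric mean. In the sketch below (our own naming; mutation rate 1/n), a successful step from the state with a single 0 must flip that 0 and none of the other n - 1 bits.

```python
from math import e

def success_prob(n):
    """Probability that a mutation corrects the unique remaining 0
    without flipping any of the other n - 1 bits (flip probability 1/n)."""
    p = 1.0 / n
    return p * (1 - p) ** (n - 1)

def expected_last_phase(n):
    # Mean of the geometric optimization time from the state with one 0 left.
    return 1.0 / success_prob(n)
```

The ratio expected_last_phase(n) / (e * n) tends to 1 as n grows, matching the normalization by e·n used for the exponential limit.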

2.3  The Distribution of when

Let denote the -th order harmonic numbers and . For convenience, we define .

Theorem 1:
Assume . Then the time used by the (1+1)-EA to reach the optimum state, when starting from , converges (after normalization) to a sum of exponential random variables
9
moreover, the mean of is asymptotic to and the variance to .
From the integral representation (by induction) for the moment generating function of :
we see that the convergence in distribution Eq. (9) can alternatively be expressed in the more transparent form
see Figure 3 for some plots for the density functions of .
Figure 3:

Histograms of for (in left to right order) and (in top-down order when starting from the peak in each figure), and their corresponding limit laws.


Before proving this theorem, we derive a simple estimate for , which will be useful in our analysis below.

Lemma 2:
Assume . Then
10
where the -term holds uniformly in and .
Proof:
When
so Eq. (10) holds. Assume now . By the sum definition (5) of , we see that when , the binomial factor is also bounded, and the other factors decrease with (for fixed and ), which means that the largest term comes from , all other terms being of a smaller order. More precisely, the term with equals
and the contribution from the remaining terms (with ) is bounded above by
The last series then satisfies
for and . So we have for in the same range
Collecting these estimates, we then obtain Eq. (10).
Corollary 2:
For , the probability that a mutation succeeds in increasing the fitness is asymptotic to
11
where the -term holds uniformly in .
Proof:
By Eq. (10)
from which Eq. (11) follows.
Proof of Theorem 1: Limit laws. Using the fact that a characteristic function is bounded by 1 in modulus, we then deduce, by Equations (4), (10), and (11), that
Since is an analytic function of near unity and , there exists an interval in which for some . Thus we can rewrite the above relation as
where
the -term holding uniformly for bounded and . By iterating this relation times using , we obtain
12
uniformly for bounded and . This and Lévy's continuity theorem (van der Vaart, 1998, Section 2.3) imply Eq. (9).
Note that if we compute formally the Taylor expansion at of the right-hand side of the uniform asymptotic estimate (12), we obtain
so we expect that the mean and the variance will be asymptotic to these two quantities, respectively, which is true and consistent with the results obtained by the Quasi-power framework (see Hwang, 1998 or Flajolet and Sedgewick, 2009, Section IX.9). To keep the article self-contained and to pave the way for more refined arguments later, we will instead prove these two estimates by a direct, independent approach.
Proof of Theorem 1: Mean value. We now turn to the mean, which satisfies, by taking the derivative with respect to the variable and then substituting in Eq. (4), the recurrence
13
By the same reasoning used above for based on Eq. (10), we see that the largest contribution on the left-hand side comes from terms with , and we expect the estimate
or
which, by iteration, yields . The error analysis to justify this is not difficult but less interesting. Isolating first the terms corresponding to and then rearranging all other terms, we get, again by Eq. (10),
14
Thus we can rewrite this as
where both . Note that and may be negative. Rearranging this recurrence as
A direct iteration gives, by ,
This proves the required estimate for the mean when .
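The recurrence for the means can also be solved exactly by backward iteration, which provides a numerical check of the estimates above. The sketch below is our own, not the authors': it assumes mutation rate 1/n and the selection rule that keeps the offspring iff it has at least as many ones, so the number of ones never decreases and the expected time from state j involves only states k > j.

```python
from math import comb, e, log

def mean_opt_times(n):
    """Exact expected optimization times of the (1+1)-EA on OneMax, indexed by
    the number j of ones in the starting string (entry n is 0 at the optimum)."""
    p = 1.0 / n

    def pjk(j, k):
        # probability that mutation turns j ones into k ones (before selection)
        total = 0.0
        for i in range(j + 1):          # ones flipped to zeros
            m = (k - j) + i             # zeros flipped to ones
            if 0 <= m <= n - j:
                total += (comb(j, i) * comb(n - j, m)
                          * p ** (i + m) * (1 - p) ** (n - i - m))
        return total

    E = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        s = sum(pjk(j, k) for k in range(j + 1, n + 1))  # prob. of improving
        E[j] = (1.0 + sum(pjk(j, k) * E[k] for k in range(j + 1, n + 1))) / s
    return E
```

For moderate n, the value E[0] produced this way is of order e·n·ln n, in line with the expansions of this section, and E[n-1] coincides with the exact geometric mean from the state with a single 0.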

Proof of Theorem 1: Variance. To compute the variance, one may start with the second moment and then consider the difference with the square of the mean; however, it is computationally more advantageous to study directly the recurrence satisfied by the variances themselves.

For that purpose, we begin with Eq. (7) and substitute , obtaining
To derive a recurrence satisfied by the variance, we consider the moment generating function for the centered random variables
which then satisfies the recurrence
for with . Let be the variance of . Then satisfies the recurrence
15
Heuristically, the dominant terms on both sides come from when , and we expect that ()
or
iterating this recurrence times leads to the required estimate for the variance.
For the justification, we follow the same argument used above for . First, we rearrange the terms in Eq. (15) as
Then by the estimates (10) and (14), we obtain
and we are led to the form
where and when . By a direct iteration, we obtain
This implies the asymptotic estimate when is bounded, and completes the proof of Theorem 1.
To summarize, we saw that both the mean and the variance satisfy the same type of recurrence
with suitable initial conditions, for some given sequences . We also observe the transfer between the asymptotics of and that of
Such a correspondence will be useful in guiding our guess of the right Ansatz to be explored below when lies in the most interesting range (the symbol means that the left-hand side is of the same growth order as the right-hand side).

The simple inductive argument we used here extends to a wider range than (as obvious from the error terms established) but fails when, say . In order to cover the whole range , we will need more refined uniform estimates for the error terms, which will be dealt with in Section 4. Some of the tools needed are developed in the next section.

2.4  Asymptotic Expansions and Ansätze for

Normalizing the mean values. Let . For simplicity, we consider
where , so that satisfies, by Eq. (13), the simpler-looking recurrence
16
with , where
17
From these relations, we obtain , and
18
In general, the 's are all rational functions of but their expressions become long as increases. We thus turn to asymptotic approximation.

Asymptotic expansions for . Our uniform asymptotic approximation to was largely motivated by intensive symbolic computations for small . We briefly summarize them here; they will also be crucial in specifying the initial conditions for the differential equations satisfied by the functions () involved in the full asymptotic expansion of ; see Appendix D of HPRTC.

Starting from the closed-form expressions (18), we readily obtain , , and
Similarly, we have
From these expansions, we first observe that the leading sequence is exactly ()
An Ansatz for small . These also suggest the Ansatz
for some functions of . Using this form and the above expansions to match the undetermined coefficients of the polynomials (in ), we obtain successively
So we observe the general pattern
for some explicitly computable sequence and coefficients . A crucial complication arises here: the general form of each holds only for , and correction terms are needed for smaller . For example,
where we use the Iverson bracket notation if holds, and 0, otherwise. It is such a complication that makes the determination of smaller-order terms more involved.
An Ansatz for large . All the expansions here hold only for small . When grows, we write and see that
and it is exactly this form that motivated naturally our choice of the Ansatz
19
for some function . This will be seen to be equivalent to the approximation . Note that the omnipresence of the harmonic numbers may be traced to the asymptotic estimate (10); see also Lemma 4.

Formal calculations. The next question is how to guess this function (before proving all assumptions). Here is a quick sketch of our ideas.

Substituting formally Equations (19) into (16) using the expansions (see Corollary 15) and (see Lemma 13), we expect that
20
We are then led to the study of sums of the form for a given sequence . As we will see in the next section (defined in the introduction), so that has to satisfy the differential equation
Choosing properly the initial condition, we then conclude that , as defined in Eq. (3). The justification of all these estimates, as well as more refined expansions, turns out to be highly nontrivial and requires several asymptotic tools that will be developed in the following two sections.

3  Asymptotics of Sums of the Form

Sums of the form
will appear frequently in our analysis. We thus digress in this section to develop tools for deriving the asymptotic behaviors of such sums. We consider first general and then specialize the discussion to the cases when for .

3.1  Asymptotics of

Observe that, by Eq. (10), most of the contribution to comes from small , say , provided that does not grow too fast. Indeed, we expect more precisely that
21
The last step can be justified by bounding all errors involved, but the calculations become messy, especially when one needs more terms in the expansion. We use instead a more elegant approach via generating functions.
Lemma 3:
Let be a given sequence whose generating function has a nonzero radius of convergence in the complex -plane. Then
22
where and are entire functions of defined by
23
and ()
24
Proof:
Observe that the sum on the left-hand side of Eq. (17) is itself a Cauchy product, namely,
Let denote the coefficients of in the Taylor expansion of . Then the right-hand side equals
Our analytic proof then starts from the relation (Cauchy's integral representation)
25
where . The relation (25) holds a priori for , but the right-hand side becomes zero for . It follows, by multiplying both sides by and summing over for the left-hand side and over all for the right-hand side, that
where , being the radius of convergence of . By the asymptotic expansion
for large and bounded and , where the -term holds uniformly for on the integration path, and the integral representations
26
we deduce Eq. (22). The expression (23) is then obtained by first expanding (in decreasing powers of ) and then by integrating term-by-term:
Note that . For (24), we apply an integration by parts using the relation
and the decomposition
giving
27
Substituting the series expansion and then integrating term by term, we get Eq. (24).
When tends to the two boundaries 0 and 1, we have
where is the smallest integer such that .

3.2  Asymptotics of

We now discuss special sums of the form (when ) and define
which will be repeatedly encountered below. Define also , so that . For convenience, we write
28
Let denote the modified Bessel functions

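The modified Bessel functions that appear here can be evaluated directly from their standard power series; the short routine below (our own, for illustration only) is accurate for the moderate arguments needed in this section.

```python
from math import gamma

def bessel_i(nu, x, terms=40):
    """Modified Bessel function I_nu(x) computed from its power series
    I_nu(x) = sum_{m>=0} (x/2)**(2*m+nu) / (m! * Gamma(m+nu+1))."""
    return sum((x / 2.0) ** (2 * m + nu) / (gamma(m + 1) * gamma(m + nu + 1))
               for m in range(terms))
```

With 40 terms the series is accurate to machine precision for arguments of order 1, e.g. bessel_i(0, 1.0) and bessel_i(1, 1.0) match the tabulated values of I_0(1) and I_1(1).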
We now show that and in Eq. (22) can be expressed as linear combinations of and .

Corollary 3:
Uniformly for
29
for , where both and are entire functions given by
and
30
In particular,
31
These are sufficient for our uses.
Proof:
By applying Equations (22), (23), and (26) with , we see that
32
where . It remains to simplify . To that purpose, we start with the integral representation (see Eq. (27))
33
When , we have . Thus
which proves Eq. (30) for . For , we have
From this and the relation Eq. (33), we obtain Eq. (30). Note that the coefficient of is zero.
The corollary implies in particular that
34
uniformly for . Since as , we have the uniform bound
35
meaning that the ratio of remains bounded away from zero and infinity for all in the specified range.
The following expansions for and as will be used later
36
for . See also Appendix A of HPRTC for other properties of .

4  The Expected Values of and Their Asymptotics

We will derive in this section a more precise expansion for the mean .

Theorem 2:
The expected optimization time of the (1+1)-EA when starting with 1s satisfies the asymptotic approximation
37
uniformly for , where is defined in Eq. (3) and is an analytic function defined by
38
the integrand having a removable singularity at .
As discussed above, we consider for simplicity , and we will prove the slightly simpler expansion
39
for , which is identical to Eq. (37) by using the relation and the asymptotic expansions
and
See Figure 4 for graphic renderings. More figures are collected in Appendix C of HPRTC.
Figure 4:

Left: the differences for (normalized to the unit interval) and (in top-down order); Right: the normalized differences for (in bottom-up order when viewing from the point .


Our analysis will be based on the recurrence (16) for and will use the idea of successive asymptotic iteration (or bootstrapping; see de Bruijn, 1981 or Flajolet and Sedgewick, 2009), which proceeds as follows. We consider first the difference , which itself satisfies a recurrence of the same type but with a different nonhomogeneous part. We bound this difference by Lemma 4 and a transfer technique, which deduces a uniform bound for the difference from a bound on the nonhomogeneous part. We then repeat the same procedure, subtracting more terms each time, to obtain successively more precise expansions; see Appendix D of HPRTC.

Instead of starting from a state with a fixed number of 1s, the first step of Algorithm (1+1)-EA described in the introduction corresponds to the situation in which the initial state (the number of 1s) is not fixed but random. Assume that this input follows a binomial distribution of parameter (each bit being 1 with probability and 0 with probability ). Denote by the number of steps taken by the (1+1)-EA to reach the optimum state. The following result describes precisely the asymptotic behavior of the expected optimization time.

Theorem 3:
The expected value of satisfies
where .
Note that is an increasing function of , which is consistent with the intuition that it takes fewer steps to reach the final state if we start with more 1s (small means closer to 1, or 1 occurring with higher probability). Also
The constant in Eq. (2) can now be computed and has the value ()
40
Numerically, to compute the value of for , the most natural way is to use the Taylor expansion
and after a term-by-term integration
While is an entire function with rapidly decreasing coefficients, such an expansion converges slowly when is close to 1, the main reason being that the smallest nonzero for which occurs when (see Figure 5), implying that the radius of convergence of this series is only slightly larger than unity. Note that but the simple pole is removed by subtracting . A better idea is then to expand at and integrate term by term, giving
This expansion is numerically more efficient and stable because of better convergence when . The same technique also applies to the calculation of and other functions in this article.
Figure 5:

has an infinite number of zeros on .


A direct consequence of the precise estimates we derived is the following asymptotic approximation measuring the difference between (random input) and (fixed input), which improves the -bound for derived in the recent paper by Doerr and Doerr (2014).

Corollary 4:
The difference between and satisfies
(41)
uniformly for , where
Note that the dominant term on the right-hand side is bounded when is so. Also the coefficients here can be completely written in terms of the 's; for example

Since the proof follows directly either from the expansions in Theorems 8 and 9 or from the method of proof of Theorem 9, we omit the details, which can be carried out with standard symbolic computation tools. See Figure 6 for a graphical illustration of the difference .

Figure 6:

The differences when for (left) and the -term of Eq. (41) (right). The blue curves (left) correspond to the right-hand side of Eq. (41) without the -term. The periodic fluctuations come from , where denotes the fractional part of . Numerically, when is even (odd), the first term on the right-hand side is approximately (0.47564). Note here that starts with 1s.


4.1  More Asymptotic Tools

We develop here some other asymptotic tools that will be used in proving Theorem 8.

The following lemma is very useful for the error estimates we derive below. It also sheds new light on the occurrence of the harmonic numbers in Eq. (39).

Lemma 4:
Consider the recurrence
where is defined for and . Assume that for , where . If holds uniformly for and , where , then
Proof:
The result is true for . For , we start from the simple inequality
because all terms in the sum expression (17) are positive and taking only one term ( and ) gives the lower bound. Then, by the induction hypothesis,
proving the lemma.

Applying this lemma to the recurrence (16), we then get a simple upper bound for .

Corollary 5:

For , the inequality holds.

Lemma 5:
If is a -function (twice continuously differentiable in the unit interval), then
uniformly for .

Uniformity of the estimate in the lemma and the asymptotic expansion (29) play a crucial rôle in our analysis.

Proof:
A direct Taylor expansion with remainder gives
uniformly for , since for . The lemma follows from the estimates (29).

The approximation can be easily extended and refined if more smoothness properties of are known, which is the case for all functions appearing in our analysis (they are all ).
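The conclusion of Lemma 5 rests on the C² smoothness of the function involved; as a generic numerical sketch of how such smoothness yields a first-order error term when a sum of sampled values is compared with the corresponding integral (with g(x) = exp(x) as our own stand-in integrand, not one from the paper):

```python
import math

def riemann_error(n):
    """Error of the left Riemann sum of g(x) = exp(x) on [0, 1] against the
    exact integral e - 1; for a C^2 integrand this error is Theta(1/n)."""
    g = math.exp
    s = sum(g(k / n) for k in range(n)) / n
    return abs(s - (math.e - 1))

e100, e200 = riemann_error(100), riemann_error(200)
print(e100 / e200)  # doubling n roughly halves the error
```

The uniformity of such estimates in the second argument is what the lemma adds over this pointwise picture.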

Another standard tool we need is Stirling's formula for factorials
(42)
where denotes Euler's Gamma function.
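A minimal numerical check of the leading terms of Stirling's formula, log n! = n log n − n + ½ log(2πn) + 1/(12n) + O(n⁻³), using the standard-library lgamma for the exact value:

```python
import math

def stirling_log_factorial(n):
    """Leading terms of Stirling's formula for log(n!)."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)

n = 100
exact = math.lgamma(n + 1)        # log(100!) via the Gamma function
approx = stirling_log_factorial(n)
print(exact - approx)  # remaining error is close to 1/(12n)
```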

4.2  Proof of Theorem 8

Our method of proof consists of three steps: first a heuristic calculation to obtain the dominant term, then an error analysis to justify the dominant term with an explicit error bound, and finally a refined analysis (based on the same inductive argument) to complete the proof of Theorem 8. The main idea of the error analysis is to express the error term through another recurrence of the same type but with a different nonhomogeneous part. Showing that this nonhomogeneous part is small will then yield the required order estimate for the error.

Lemma 6:
For a given sequence , let denote the backward difference operator . Then for

Note that the sum vanishes for . Take . Then we obtain the following identity, which is itself an asymptotic expansion for large (and ).
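The backward difference operator behaves like a discrete derivative; as a minimal generic sketch (our own check of the defining telescoping property, not the identity of Lemma 6 itself):

```python
def backward_diff(a):
    """Backward differences (nabla a)_k = a_k - a_{k-1} of a sequence a_0..a_n."""
    return [a[k] - a[k - 1] for k in range(1, len(a))]

a = [k * k for k in range(11)]   # a_k = k^2
d = backward_diff(a)             # (nabla a)_k = 2k - 1
total = sum(d)                   # telescopes to a_10 - a_0
print(d[:3], total)
```

Summing the differences recovers the increment of the sequence, which is the elementary mechanism behind identities of the kind stated in the lemma.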

Corollary 6:
For and ,
Formal calculations. Assuming the validity of Eq. (19), we can make the formal calculations in Eq. (20) more precise by Eq. (29), Eq. (39), Lemma 13, and Corollary 15, obtaining
Thus we see that satisfies
We now specify the initial condition . Since the postulated form (19) holds for (indeed also true for ), we take and see that because . This implies that . The first few terms in the Taylor expansion of read as follows.
(43)
which can then be checked with the explicit expressions of for small (see Section 2.4).
Error analysis. To justify the form (19) (with ), we consider the difference
which satisfies, by Eq. (16), the recurrence
(44)
where
We first show that , and this will imply, by Lemma 11, the estimate .
By the asymptotic relation (29) with and the definition of , we have
and thus
By Lemma 13, we see that
uniformly for . On the other hand, we have the upper bounds (see Corollary 15)
Note that the first estimate is uniform only for . When is close to , say , where , the left-hand side blows up with but the right-hand side remains bounded. Thus we split the sum at