## Abstract

The behavior of the (μ/μ_I, λ)-Evolution Strategy (ES) with cumulative step size adaptation (CSA) on the ellipsoid model is investigated using dynamic systems analysis. First, a nonlinear system of difference equations is derived that describes the mean value evolution of the ES. This system is successively simplified until closed-form solutions of the steady state behavior can be derived in the asymptotic limit case of large search space dimensions. It is shown that the system exhibits linear convergence order. The steady state mutation strength is calculated, and it is shown that, compared to standard settings in self-adaptive ESs, the CSA control rule allows for an approximately μ-fold larger mutation strength. This explains the superior performance of the CSA in non-noisy environments. The results are used to derive a formula for the expected running time. Conclusions regarding the choice of the cumulation parameter *c* and the damping constant *D* are drawn.

## 1 Introduction

The performance of evolution strategies (ESs) depends crucially on the optimal control of the mutation strength σ, which determines the length of the search steps used to generate offspring from parents. There are basically four established methods to learn/control the mutation strength: Rechenberg’s 1/5-rule (Rechenberg, 1973), self-adaptation (SA) (Rechenberg, 1973; Schwefel, 1977), meta-ES (Herdy, 1992; Rechenberg, 1994), and cumulative step size adaptation (CSA) (Ostermeier et al., 1994). Understanding and analyzing the working principles of these adaptation techniques by considering the ES in conjunction with the objective functions to be optimized allows for a well-grounded choice of strategy-specific parameters, such as the learning parameter and the damping constant. The analysis approach that has been most fruitful so far considers the ES together with the objective function as a *dynamic system* (Meyer-Nieberg and Beyer, 2012). That is, the goal of the analysis is to determine the time evolution of this system. However, since ESs are probabilistic algorithms, this analysis concerns the dynamics of stochastic, most often nonlinear, systems. Because of the difficulties of such an analysis, progress in this field has been rather slow. The first fully analyzed algorithm was the (1, λ)-σSA-ES on the sphere model (Beyer, 1996). In the following years progress was made in different directions concerning more complex, that is, recombinative ESs and also more complex objective functions such as ridge functions and a subset of positive definite quadratic forms (PDQFs); see, for instance, Arnold and Beyer (2004); Jägersküpper (2006); Beyer and Meyer-Nieberg (2006); Arnold (2007); and Beyer and Finck (2010).

The most advanced dynamic systems analysis of ESs was presented by Beyer and Melkozerov (2014), who investigated the (μ/μ_I, λ)-σSA-ES on the ellipsoid model. In that paper a new progress measure, the *quadratic progress rate*, was introduced to model the dynamics of the squared distances of the parental state to the optimizer. While that work completed the analysis of the isotropic self-adaptive standard ES, a similar analysis of CSA control has not advanced that far. The fitness models considered so far comprise the cigar function (Arnold, 2007; Arnold and Beyer, 2010) and another special case of PDQFs consisting of a mixture of two sphere models (Beyer and Finck, 2010). These analyses were performed along the lines developed in Arnold (2002). Because of the inherent symmetries of those fitness models (cigar and mixture of two spheres) the dynamics in the search space can be lumped together, reducing the dynamics to a few state variables describing the approach to the optimizer. This aggregation concerns not only the parental state in the search space but also the *evolution path cumulation* that is used to measure the average length of the actually realized change steps between two consecutive parental states, which in turn is used to control the mutation strength σ. As a result, the analysis presented by Beyer and Melkozerov (2014) for the (μ/μ_I, λ)-σSA-ES cannot be directly transferred to the corresponding CSA-ES. It is the aim of this paper to extend the analysis method developed by Beyer and Melkozerov (2014) to handle the path cumulation in the CSA-ES. To this end, we deviate from the standard analysis developed by Arnold (2002) and fall back on the analysis method that has been successful in the analysis of SA. That is, we derive a *self-adaptation response* (SAR) function for the cumulative step size adaptation. This approach allows for an analysis of the CSA-ES similar to that of the (μ/μ_I, λ)-σSA-ES in Beyer and Melkozerov (2014).

The derivation of a SAR function for CSA is challenging, since a SAR function for CSA in the proper meaning of the word does not exist. The original SAR function (Beyer, 2001) is a *one-generation* progress measure that describes the expected relative σ change from generation *g* to generation g + 1. Path cumulation is, however, a process that is *nonlocal* in time; that is, it is the result of a cumulation process that incorporates a fading record of the parental steps taken. As a result, a SAR function for CSA must necessarily be a quantity that depends not only on the damping constant *D* (which corresponds to the learning parameter in the SA) but also on the cumulation time constant 1/*c*.

The analysis to be presented can be regarded as an important preparatory step toward an analysis of the CMA-ES (Hansen and Ostermeier, 2001), since it provides a system of difference equations that describes the evolution on ellipsoid models. Since the CMA-ES transforms quadratic models via the square root of the covariance matrix into other quadratic models, this analysis also holds for these transformed models. That is, the analysis presented is a building block for a complete analysis of the CMA-ES.

The paper is organized as follows. After a short recap of the ideas of CSA and the ellipsoid model, results of the analysis of the self-adaptive ES are reviewed as the basis for the derivations presented in subsequent sections. Next, the CSA-relevant path cumulation is analyzed in Section 5, yielding a system of recurrence equations that are combined in Section 6 to form a system of evolution equations describing the mean value dynamics of the (μ/μ_I, λ)-CSA-ES. In a next step, simplifications are introduced in order to obtain a tractable system of evolution equations that allows for the calculation of a function describing the expected generational σ change in the steady state, similar to the self-adaptation response function used in the analysis of the σSA-ES. The resulting system of evolution equations is then treated by an *Ansatz* similar to the one used by Beyer and Melkozerov (2014). The steady state of the normalized mutation strength dynamics is considered, and the influence of the strategy parameters *c* and *D* is discussed. In Section 8 the steady state mutation strength and the convergence rates are calculated in terms of closed-form expressions. The results are used to estimate the expected running time. Conclusions are drawn in Section 9.

## 2 The (μ/μ_I, λ)-CSA-ES Algorithm

The focus of this paper is on the dynamic behavior of the (μ/μ_I, λ)-CSA-ES. As indicated by the notation, nonelitist selection (“,” selection) rather than elitist selection (“+” selection) is considered. That is, in each generation only the offspring population is involved in the selection process. Comma selection is advantageous in real-valued parameter optimization because it allows for the use of greater mutation strengths, which is, for example, beneficial when optimizing multimodal objective functions. The (μ/μ_I, λ)-CSA-ES controls the mutation strength, also referred to as the step size, by cumulative step size adaptation (see Ostermeier et al., 1994). CSA gathers previously successful search steps in a fading search path history. The mutation strength is then adjusted depending on the length of this search path. Another option for adapting the mutation strength of the ES is self-adaptation. In contrast to CSA, it provides each offspring with an individual mutation strength computed from the recombined mutation strengths of the μ best offspring of the previous generation. The σSA-ES was investigated by Beyer and Melkozerov (2014).

The pseudocode of the (μ/μ_I, λ)-CSA-ES is presented in Table 1. First the initial parental parameter vector, also referred to as the *parental centroid*, the initial mutation strength σ, and the initial search path s are specified. Then λ offspring are generated (lines 3–5) by adding the product of the mutation strength σ and an *N*-dimensional random mutation vector to the parental centroid. The components of each mutation vector are independent and identically distributed standard normal variates. The corresponding fitness function value *F _{l}* of each offspring is calculated in line 6. In line 8 the mutation vectors of the μ best offspring with respect to fitness are recombined to create their centroid. As indicated by the index *I*, this centroid is generated using intermediate recombination. In this context the subscript *m*; λ denotes the *m*th best of the λ offspring (i.e., in case of minimization the offspring with the *m*th smallest fitness value). The centroid of the μ best mutation vectors is used in line 9 to compose a new parental centroid and in line 10 to update the search path s. This search path contains a fading record of the strategy’s previous steps. The length of its memory is determined by the choice of the constant *cumulation parameter* *c*. The mutation strength is then updated in line 11 by multiplication with an exponential value depending on the squared length of the search path as well as on the *damping parameter* *D*. *D* is a constant parameter that determines the magnitude of the mutation strength updates. The sign of ‖s‖² − N determines whether the mutation strength is increased or decreased. Long search paths indicate that the steps made by the ES collectively point in one direction and could be replaced with fewer but longer steps. Short search paths suggest that the strategy steps back and forth and thus that smaller step sizes should be beneficial. After termination the strategy returns the current parental centroid, which is considered an approximation of the optimizer of the objective function *F*.
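Since Table 1 itself is not reproduced here, the following Python sketch mirrors the textual description of the algorithm. The path normalization constant √(μc(2−c)) and the exact form of the σ update, exp((‖s‖² − N)/(2DN)), are the standard CSA choices and are assumptions on our part, as are the default values of *c* and *D*:

```python
import numpy as np

def csa_es(f, y0, sigma0, mu=3, lam=12, c=None, D=None, max_gen=2000):
    """Sketch of a (mu/mu_I, lambda)-CSA-ES following the textual description
    of Table 1 (names, defaults, and normalizations are illustrative)."""
    N = len(y0)
    c = c if c is not None else 1.0 / np.sqrt(N)       # assumed cumulation parameter
    D = D if D is not None else np.sqrt(N)             # assumed damping parameter
    y = np.asarray(y0, dtype=float)                    # parental centroid
    sigma = sigma0
    s = np.zeros(N)                                    # search path
    for _ in range(max_gen):
        Z = np.random.randn(lam, N)                    # standard normal mutations (line 5)
        F = np.array([f(y + sigma * z) for z in Z])    # offspring fitness (line 6)
        best = np.argsort(F)[:mu]                      # comma selection: mu best of lam
        z_rec = Z[best].mean(axis=0)                   # intermediate recombination (line 8)
        y = y + sigma * z_rec                          # new parental centroid (line 9)
        s = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_rec  # path cumulation (line 10)
        sigma *= np.exp((s @ s - N) / (2.0 * D * N))   # sigma update (line 11)
    return y, sigma
```

On a sphere model this sketch converges log-linearly, which matches the qualitative behavior described in the text.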

## 3 The Ellipsoid Model

The ellipsoid model is defined by the objective function

F(y) = Σ_{i=1}^{N} a_i y_i² → min,   (1)

where *N* represents the search space dimension and the a_i > 0 are the coefficients of the ellipsoid model. Throughout the investigations the coefficients a_i are chosen exemplarily as a_i = 1, a_i = i, and a_i = i². The optimizer of (1) resides at the origin of the coordinate system, ŷ = 0. Model (1) represents the general case of positive definite quadratic forms for the (μ/μ_I, λ)-CSA-ES. This is owing to the isotropy of the mutation vectors in Table 1, line 5, which secures the algorithm’s invariance with respect to arbitrary rotations of the coordinate system.
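As a minimal illustration of the model (function and variable names are ours, not from the paper), the three coefficient choices can be evaluated directly:

```python
import numpy as np

def ellipsoid(y, coeffs):
    """Ellipsoid model F(y) = sum_i a_i * y_i^2; the minimizer is y = 0."""
    return float(np.sum(coeffs * np.asarray(y) ** 2))

N = 5
models = {
    "a_i = 1  ": np.ones(N),
    "a_i = i  ": np.arange(1, N + 1),
    "a_i = i^2": np.arange(1, N + 1) ** 2,
}
for name, a in models.items():
    print(name, ellipsoid(np.ones(N), a))  # 5.0, 15.0, 55.0 for y = (1, ..., 1)
```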

The dynamic behavior of the (μ/μ_I, λ)-CSA-ES on ellipsoid model (1) is illustrated in Figure 1. It presents the results of typical runs of the ES, focusing on the squared components of the parental centroid as well as on the mutation strength dynamics. Approaching the optimizer, the strategy continuously decreases the mutation strength over the generations.

## 4 Extending Previous Results to CSA-ES

The first-order progress rate along the *i*th axis of ellipsoid model (1) is the expected change of the parental parameter vector component y_i from generation *g* to generation g + 1. Its closed-form approximation involves the *generalized progress coefficients* (Beyer, 2001), which are defined in terms of Φ, the cumulative distribution function of the standard normal variate. Using the first-order progress rate, one obtains good component-wise predictions of the expected approach of the parental centroid toward the optimizer as long as the distance to the optimizer is sufficiently large compared to the respective progress rate values. Otherwise the perturbations of the evolutionary process superimpose the mean value dynamics, Eq. (4). That is, the predictive quality of the first-order progress rate decreases when approaching the optimizer. As a consequence, the more stable quadratic progress measure was conceived.

The quadratic progress rate along the *i*th axis of ellipsoid model (1) is the expected change of the squared component of the parental centroid between two consecutive generations *g* and g + 1. Provided that there are no dominating ellipsoid coefficients a_i, that is, if the conditions (11) are fulfilled, the normalized quadratic progress rate attains an asymptotically exact form. Its renormalization is obtained by applying Eqs. (3) and (5) and yields Eq. (13). As shown by Beyer and Melkozerov (2014), the quadratic progress rate can be used to describe the expected approach to the optimizer for each squared component of the parental centroid. It exhibits the typical features of a well-defined progress measure, namely a gain and a loss term, which depend on the mutation strength. This ensures the existence of an optimal mutation strength and of an evolution criterion (also referred to as a convergence criterion). Additionally, the loss term is inversely proportional to the parental population size μ. That is, the *genetic repair effect* of recombination can be observed on the ellipsoid model (Beyer and Melkozerov, 2014). The renormalized quadratic progress rate (Eq. (13)) will be useful for the derivation of the evolution equations of the CSA-ES in the next section.

## 5 The Mutation Strength Dynamics

In order to describe the σ dynamics, a recurrence relating the mutation strength in generation *g* to that in generation g + 1 is necessary. As it turns out, unlike for the (μ/μ_I, λ)-σSA-ES, there is no simple formula for the expected change of the mutation strength σ of the CSA-ES. Because of the influence of the search path update on the mutation strength evolution, the description of the mutation strength dynamics is more complex. In the first place, we deal with this problem by describing the expected change of σ by a system of recurrence equations. The first difference equation concerns the mutation strength control. According to Table 1, line 11, the expected mutation strength in generation g + 1 is given by Eq. (15). In the asymptotic limit N → ∞ this can be expressed as Eq. (16). Obtaining a recurrence equation requires knowledge of the squared length of the search path in generation g + 1, which follows from Table 1, line 10, as Eq. (17). For the calculation of the expected change of the search path’s squared norm one needs to determine the expected values of the scalar product of search path and mutation centroid as well as of the squared norm of the parental centroid’s mutation vector in generation *g*. For the latter, considering the expected values of the squared single components of the mutation vector results in Eq. (19). These sums of product moments were derived by Beyer and Melkozerov (2014). Inserting the corresponding expressions into Eq. (19) and combining the fractions yields Eq. (20). Finally, the aggregation of all *N* components of the mutation vector in generation g + 1 yields Eq. (21).

Expressions (20) and (21) can be simplified considerably if the conditions (11) hold. That is, provided that there exist no dominating ellipsoid coefficients a_i and that the search space dimension N is large compared to the parental population size μ, the respective correction term in Eq. (20) can be neglected. The validity of this approach also requires that the σ dynamics behave “nicely.” Depending on the initialization of the strategy this condition might be violated in the beginning, but it is always fulfilled in the asymptotic limit N → ∞. The consistency of the approach was checked by Beyer and Melkozerov (2014) by reinserting the final results into the quadratic progress rate. Consequently, Eq. (20) becomes Eq. (22), and Eq. (21) becomes asymptotically Eq. (23). The suitability of these asymptotically correct formulas will additionally be justified by comparing the two iteratively generated dynamics (using Eq. (21) or Eq. (23), respectively) with the dynamics of real (μ/μ_I, λ)-CSA-ES runs (see Section 6).

The *i*th component of the mutation vector of the parental centroid in generation *g* can be modeled as the sum of its expected value and a fluctuation term, Eq. (25). According to Eq. (25), the expected value of the *i*th component in generation g + 1 is given by Eq. (27). For further analysis it is desirable to represent the right-hand side of Eq. (27) by means of the ES state in the previous generation *g*. This is achieved by taking into account Table 1, line 9, and the definition of the quadratic progress rate in Eq. (8), yielding Eq. (28). Provided that the quotient of quadratic progress rate and squared component is sufficiently small, Taylor expansion of the square root transforms Eq. (28) into Eq. (29). The remaining correction term is small for large *N*. Consequently, its small contribution to the expected value in Eq. (30) is ignored in the following in order to keep the theoretical analysis tractable. Thus we obtain Eq. (31). Analogously to Eq. (26), the *i*th component of the mutation vector of the parental centroid in generation g + 1 can be described by the sum of its expected value and a term denoting the stochastic fluctuations, Eq. (32).

Now Eqs. (25) and (32) are used to derive the difference equation for the scalar product. According to Table 1, line 10, the search path components are updated by Eq. (33). After multiplication with the corresponding mutation vector component, Eq. (33) changes into Eq. (34), and by insertion of Eqs. (31) and (32) the *i*th addend of the scalar product in generation g + 1 reads as Eq. (35). Expansion of the products yields Eq. (36). By rearranging the terms and making use of Eqs. (25) and (26) we obtain Eq. (37). Taking expected values in Eq. (37), the perturbation terms vanish by definition, leading to Eq. (38). In order to justify the approximation of (37) by expected value expressions, the iterative dynamics resulting from Eq. (38) are compared to experimental runs of the ES (see Section 6). Now considering Eq. (25), we obtain Eq. (39), and with the use of (22), Eq. (39) transforms into Eq. (40). Notice that the scalar product of the search path and the centroid of the μ best mutation vectors in generation *g* is the sum of these component-wise contributions. That is, Eqs. (39) and (40), respectively, provide a component-wise representation of the difference equation that describes the one-generation change of the scalar product. The results in Eqs. (20) and (39) allow for the compilation of a difference equation that describes the expected change of the squared length of the search path (see also Eq. (17)), Eq. (41). Together with Eq. (16), the mutation strength change from generation *g* to generation g + 1 can now be described by means of the difference equations (39) and (41), taking Eq. (21) into account.

## 6 Evolution Equations

The change of the ES state from generation *g* to generation g + 1 can be divided into mean value parts and fluctuation terms as follows (Beyer and Melkozerov, 2014), Eqs. (42) and (43). The mean value parts in Eq. (42) are directly given by the quadratic progress rates (9) and (13), respectively. The self-adaptation response function describes the mean value dynamics of Eq. (43). Unfortunately, in the context of the CSA-ES the derivation of a closed SAR formula appears to be a hard task even in the steady state case. That is, in a first step it has to be substituted by the difference equations (16), (39), and (41) before an asymptotic approximation of the SAR function is derived (see Section 7.1). In order to keep the analysis tractable, the fluctuation terms are disregarded, and asymptotically correct simplifications are derived and compared with experiments.

A first representation of the strategy’s evolution behavior is provided in Table 2, iterative scheme A. The one-generation behavior of the component-wise squared distance to the optimizer is modeled in (A.1) by use of Eq. (9). Using Eq. (20) to express the expected values in Eq. (39) yields the iterative relation (A.2) for the scalar product components . The iterative description of the squared length of the search path in (A.3) is obtained by inserting Eq. (21) into Eq. (41). Finally, the mutation strength adaptation is specified in (A.4) using Eq. (16).

Whether the modeling approach yields meaningful results can be checked by comparing the iteratively generated dynamics of the system of evolution equations A in Table 2 with experimental results of real (μ/μ_I, λ)-CSA-ES runs (see Figure 3). The typical long-term behavior of the ES is observed in the dynamics obtained by iterating scheme A from a fixed initial state with constant cumulation and damping parameters. Considering a small dimension *N*, the experimental data of the ES slightly deviate from the theoretical predictions. These deviations diminish with increasing search space dimension. In fact, in both cases a good agreement of iterative and experimental dynamics can be observed.

The damping parameter *D* usually increases with the search space dimension *N*. The simplifications within the modified iterative scheme B are justified by comparing its dynamics to the dynamics of scheme A (Table 2). In Figure 4 the iteratively generated dynamics of both schemes are displayed for two search space dimensions, with both systems iterated from the same initial state. The dynamics of systems A and B show small deviations for the smaller search space dimension. For the higher dimension, the agreement of both iterative dynamics increases significantly. Both systems of evolution equations show the same long-term behavior and agree for sufficiently high dimensions *N*. Making use of the evolution equations B in Table 3 is reasonable because it allows for a tractable investigation of the CSA-ES. Thus it is the basis for the derivations in the following sections. The compliance of the predictions of iterative scheme B with experimental runs of the ES is illustrated in Figure 5. The good approximation quality justifies the modeling approach by use of iterative scheme A or B, respectively. For small-dimension problems the use of iterative scheme A will provide more precise predictions of the ES dynamics. In higher-dimension search spaces, both systems provide a reliable model of the ES behavior.

In Figure 5 two phases in the dynamics of the (μ/μ_I, λ)-CSA-ES can be observed. After the start of the optimization the ES dynamics enter a transient phase, which is followed by an approach to a steady state behavior. The transient period is characterized by a decrease of the squared component curves and of the mutation strength values. The rate of this decline increases with *i*; that is, *y*_{1} decreases significantly more slowly than *y _{N}*. The steady state behavior features a slower decrease at the same rate for all single components. In particular, these dynamics fall with a log-linear law. The steady state of the σ values exhibits a log-linear behavior as well, but with a different rate of decline.
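The log-linear law is equivalent to linear convergence order: a quantity that shrinks by a constant factor per generation traces a straight line on a logarithmic scale, and the slope of that line recovers the per-generation rate. A toy check (the rate q below is purely illustrative):

```python
import numpy as np

# Toy dynamics with linear convergence order: r(g+1) = (1 - q) * r(g).
q = 0.05                        # illustrative per-generation rate
g = np.arange(200)
r = 10.0 * (1.0 - q) ** g       # closed-form trajectory
slope = np.polyfit(g, np.log(r), 1)[0]
print(slope, np.log(1.0 - q))   # the fitted slope equals log(1 - q)
```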

## 7 Steady State Dynamics

### 7.1 Derivation of Simplified Evolution Equations via Self-Adaptation Response of CSA

In the steady state the ES state variables change at constant relative rates (compared to generation *g*). Applying the mutation strength normalization (Eq. (3)), one obtains the normalized recurrence and, from it, the steady state scalar product (Eqs. (49) and (50)). Now that we have found a description of the steady state of the scalar product, the next step is concerned with the derivation of the steady state of the squared length of the search path vector s. Again the use of the steady state condition combined with Eqs. (41) and (23) (see also (B.3) in Table 3) yields an asymptotically exact expression for the dynamics in the proximity of the steady state, Eq. (51). Solving this for the squared path length and plugging Eq. (49) into Eq. (51) yields Eq. (52). The renormalized version of Eq. (52), using Eq. (3), serves as an approximation for the short-term behavior of the search path, Eq. (53). It describes the progress of the search path between two consecutive generations sufficiently well. Applying Eq. (53) to (45) and recalling Eq. (50), the mutation strength dynamics can be modeled by the single evolution equation (54). The second addend within the braces can be interpreted as the self-adaptation response function of the (μ/μ_I, λ)-CSA-ES in the steady state. Using Eq. (3), its normalized version reads as Eq. (55). Considering Eqs. (54) and (13), the strategy’s mean value dynamics in the steady state can now be described by iterative scheme C (Table 4). A comparison of the iteratively generated dynamics from scheme C to those of scheme B is displayed in Figure 6. The good agreement of both dynamics for small as well as high search space dimensions validates the use of the SAR function (55) to model the ES dynamics.

The denominator of Eq. (55) depends on the ellipsoid coefficients a_i. A simplification of Eq. (55) is obtained by requiring that the second addend in the denominator be sufficiently small, leading to condition (56). Equation (56) has to be satisfied for all *i*, that is, particularly for the largest coefficient, yielding (57). Requiring positive progress in the direction of the optimizer, the normalized steady state mutation strength must be bounded according to (14). Therefore, inserting this bound in (57) and resolving for *c* yields the condition (58). That is, *c* must not be too small. Actually, for the ellipsoid models with coefficients a_i = i and a_i = i², too rapidly vanishing choices of *c* violate (58), whereas more slowly decreasing choices will yield asymptotically exact results. Thus, using condition (58), the squared steady state search path length (52) becomes, after a short calculation neglecting the left-hand side of (56) in the denominator of Eq. (52), Eq. (59).

Equation (59) has an interesting interpretation considering expected values. If the normalized mutation strength is smaller than the optimal sphere model value μc_{μ/μ,λ}, then the steady state squared length of the evolution path is greater than the squared length of a random path (being *N*). Thus, by virtue of Table 1, line 11, σ is increased. As a result, the normalized mutation strength can increase toward μc_{μ/μ,λ}, the optimal value for the sphere model. In the opposite case, a decrease toward μc_{μ/μ,λ} happens. That is, in a static case (also referred to as the scale-invariant case) the control rule in Table 1, line 11, drives the normalized mutation strength to its optimal sphere model value. Note, however, that in the real ES algorithm and its approximation schemes, for instance, Table 5, this does not happen, since the distance dynamics influence the σ evolution and a steady state will result. Determining this real steady state is the subject of the remainder of this paper.
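The reference length in this argument, namely that a "random" path has expected squared length N, can be checked numerically. Under purely random selection the cumulated steps carry no directional information; with the standard normalization √(c(2−c)) (an assumption here, since Table 1 is not reproduced) the stationary expectation of ‖s‖² is exactly N:

```python
import numpy as np

np.random.seed(1)
N, c, gens = 20, 0.2, 20000
s = np.zeros(N)
total = 0.0
for _ in range(gens):
    z = np.random.randn(N)                           # purely random step, no selection
    s = (1.0 - c) * s + np.sqrt(c * (2.0 - c)) * z   # assumed standard cumulation rule
    total += s @ s
print(total / gens)  # fluctuates around N = 20
```

The normalization works because the stationary per-component variance v satisfies v = (1 − c)²v + c(2 − c), that is, v = 1.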

The approximation quality of system D is validated in Figure 7 for different choices of the cumulation parameter *c* and the search space dimension *N*. In Figures 7a and 7c the cumulation parameter *c* is set in such a way that condition (58) is not satisfied (for the ellipsoid coefficients considered). As a consequence, one observes larger deviations between the two iterative schemes C and D. However, these deviations are generally more pronounced in the transient phase of the evolutionary process, which is emphasized in the plots because of the use of a logarithmic scale for the horizontal axes. Figures 7b and 7d display a scenario where condition (58) is fulfilled. Especially with growing dimensionality, a visually good agreement of both systems of evolution equations can be observed.

In order to solve the system of evolution equations C, an exponential *Ansatz* is used. This *Ansatz* was introduced by Beyer and Melkozerov (2014) in order to solve the evolution equations of the (μ/μ_I, λ)-σSA-ES in the asymptotic limit (N → ∞). We therefore only sketch the derivations in the remainder of this section and in Section 7.2. The *Ansatz* (61), (62) already takes the observed different slopes of the squared component dynamics and the σ dynamics correctly into account. Making use of the correct magnitudes removes the time dependence within the asymptotic solutions of system C.

### 7.2 The Eigenvalue Problem

Using the *Ansatz* (61), (62), the squared distance to the optimizer and the mutation strength in generation g + 1 can be expressed by means of their states in generation *g*. From Figure 5 it can be deduced that the per-generation rates must be rather small. That is, the exponentials can be simplified using the Taylor expansion e^x ≈ 1 + x. Neglecting higher-order terms, (61), (62) transform into Eqs. (65) and (66). Inserting Eqs. (65) and (66) into Eqs. (42) and (54), corresponding to (C.1) and (C.2) in Table 4, yields after modification a pair of relations. Taking Eq. (63) into account, we obtain a nonlinear system of equations in which the decay rates and the amplitudes *b _{i}* are unknown time-independent steady state quantities. Notice that Eq. (70) contains the SAR function (55), revealing the relation (71). Comparing with Eq. (62), we see that the rate at which σ evolves in the steady state is given by the (negative) value of the SAR function.

This system can be rewritten as an eigenvalue problem with a matrix **A**, component-wise given by Eq. (72). The matrix has *N* eigenvalues and *N* eigenvectors. Because of the conditions of the *Ansatz*, only those solutions of the eigenvalue problem with positive eigenvalues and componentwise positive eigenvectors are acceptable. The *Ansatz* indicates that larger eigenvalues result in a faster decay of the corresponding squared component and σ values. That is, in comparison to the smallest eigenvalue, the impact of the larger ones decreases with *g*. This corresponds to the faster decay of the squared component values in the initial and transient phases of the evolution process; see, for instance, Figure 1. Therefore, as far as the steady state behavior is concerned, one is interested in the smallest eigenvalue whose corresponding eigenvector satisfies the positivity condition. Considering the models a_i = 1, a_i = i, and a_i = i², and additionally an intermediate ellipsoid model, the corresponding smallest eigenvalues resulting from Eq. (72) are shown in Figure 8 as functions of the normalized mutation strength. The intermediate ellipsoid model is included to better assess the transition from the sphere model to the ellipsoid models. For a_i = i as well as a_i = i², the numerically obtained points exhibit a linear growth over a wide range of normalized mutation strength values. While its coefficients are closer to those of the sphere model, even the intermediate case exhibits this behavior to a certain extent. The general tendency of the eigenvalue dependence is characterized by a linear ascent and a sudden sharp drop in the vicinity of the zero of the normalized progress rate (see Eq. (14)). The numerical results presented in Figure 8 are identical to the results presented by Beyer and Melkozerov (2014) in the context of the self-adaptation ES.

For the sphere model the *N* eigenvalues can be identified explicitly. Since the steady state dynamics of the ES are governed by the smallest positive eigenvalue, one is interested in the linear part that corresponds to the smallest coefficient *a _{i}*. Denoting the smallest coefficient accordingly, the linear parts of the curves in Figure 8 can be expressed by a linear function of the normalized mutation strength. This linear approximation of the steady state mode eigenvalue was derived by Beyer and Melkozerov (2014) and revealed good agreement with numerically obtained results for sufficiently small values of the normalized steady state mutation strength.
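The mode selection described above can be sketched numerically: compute all eigenpairs, discard those with nonpositive eigenvalues or sign-changing eigenvectors, and keep the smallest remaining eigenvalue. The 2×2 matrix below is purely illustrative and is not the matrix of Eq. (72):

```python
import numpy as np

def steady_state_mode(A):
    """Smallest positive eigenvalue of A with a componentwise positive
    eigenvector -- the mode that governs the steady state in the Ansatz."""
    vals, vecs = np.linalg.eig(A)
    best = None
    for lam, v in zip(vals.real, vecs.T.real):
        v = v / v[np.argmax(np.abs(v))]  # normalize so the largest component is +1
        if lam > 0 and np.all(v > 0) and (best is None or lam < best[0]):
            best = (lam, v)
    return best

# Illustrative matrix: eigenvalues 1 and 3; only the eigenvalue 1 has a
# componentwise positive eigenvector, proportional to (1, 1).
A = np.array([[2.0, -1.0], [-1.0, 2.0]])
lam, v = steady_state_mode(A)
print(lam)  # ≈ 1.0
```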

### 7.3 The Normalized Mutation Strength in the Steady State

Applying the *Ansatz* (61), (62) to Eq. (84), the evolution equation after resolving the parentheses reads as Eq. (85). Notice that, because of the complex form of the SAR function, Eq. (55), all mixed product terms of higher order are aggregated in an error term. Under the asymptotic assumptions made above, this error term vanishes at least one order faster than the other terms in the parentheses of (85). It is therefore neglected in the next steps in order to keep the analysis manageable.

Condition (90) allows for the calculation of the normalized steady state mutation strength. Regarding both sides of Eq. (90) as functions of the normalized mutation strength, we see that the curves intersect at the normalized steady state mutation strength of the ES. For the (μ/μ_I, λ)-ES on the sphere model (a_i = 1) as well as on the ellipsoid model, graphs for a representative search space dimension are shown in Figure 9. In each case, three different choices of the cumulation parameter *c* are considered, with the damping parameter *D* held fixed. The numerically computed solution of Eq. (90) is represented by the black dots. According to Eq. (88), the right-hand side of (90) is independent of the choice of the parameters *c* and *D*. On the other hand, the left-hand side depends on *c* and *D*; see Eq. (55). Thus, variations in *c* lead to relocations of the intersection point. From this behavior the existence of an optimal *c* value can be conjectured that tunes the ES to operate at maximal progress rate.

The influence of *D* according to Eq. (55) is displayed in Figure 10 for the sphere and the ellipsoid model. The *D* values are varied while the cumulation parameters are held constant. The red solid line in the figures corresponds to the right-hand side of Eq. (90) together with (88). The curves that depend on the parameter *D* are represented by the marked blue lines. As one can see, *D* influences the slope of the SAR curve: increasing *D* while keeping *c* constant leads to a decrease of the slope. As a consequence, the intersection point of both curves moves to the right; that is, the normalized steady state mutation strength is increased. Independent of the choice of the damping parameter *D*, all graphs intersect at the same point on the *x*-axis, which is the zero of the SAR function. For the sphere model this intersection point is independent of *c*, and the root of (55) is obtained in closed form. In the case of the ellipsoid model, this zero varies with the cumulation parameter *c*; it shifts to the right for smaller *c* values. However, the corresponding value can only be obtained from Eq. (55) by numerical root finding.
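Such numerical root finding amounts to locating the intersection of two one-dimensional curves. A bisection sketch with placeholder curves (they merely stand in for the two sides of Eq. (90), which are not reproduced here):

```python
def bisect(f, a, b, tol=1e-12):
    """Locate a sign change of f in [a, b] by bisection."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "interval must bracket a root"
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:
            b = m              # root lies in [a, m]
        else:
            a, fa = m, f(m)    # root lies in [m, b]
    return 0.5 * (a + b)

# Placeholder curves, NOT Eq. (55) / Eq. (90): a falling SAR-like curve and a
# rising progress-derived curve; their intersection plays the role of the
# normalized steady state mutation strength.
psi = lambda s: 1.0 - 0.5 * s
rhs = lambda s: 0.25 * s
s_ss = bisect(lambda s: psi(s) - rhs(s), 0.0, 4.0)
print(s_ss)  # ≈ 4/3 for these placeholder curves
```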

The red solid line in Figure 10 represents the right-hand side of (90), which is, by virtue of (88) and (87), equal to half the steady state . The latter determines via (61) the rate at which the ES approaches the optimizer in the steady state. Since is determined by , it depends in turn on the choice of *D* and *c*. Figure 11 displays these dependencies. To this end, is multiplied by the term in order to reduce the impact of the considered ellipsoid model as well as that of the population sizes on the realized progress. The resulting values are then plotted versus , the cumulation time constant that governs the fading of the search path memory within the CSA-ES. The sphere model and the ellipsoid case are considered. As one can see in Figure 11, for the ellipsoid with , the different damping constant *D* formulas have almost no influence on the steady state progress rate toward the optimizer. This differs from the sphere model case. The ellipsoid case , not displayed in this paper, lies between these two models.

As for the sphere model (Figures 11a and 11b), seems to be a better choice of the damping parameter than the standard recommendation of Hansen and Ostermeier (2001). However, this ignores the effect of possible oscillations, which were neglected by considering the asymptotic solution of the iterative schemes using the *Ansatz* (61), (62). Small *D* values (Table 1, line 11), such as , result in large generational changes, the driving force of the oscillations observed by Hansen (1998). These oscillations can lead to a considerable regression of the strategy's progress. That is why larger *D* values, such as , are recommended.

## 8 Derivation of Closed-Form Expressions for the Steady State

### 8.1 Derivation for the Sphere Model

The choices of *D* and *c* given by Hansen (1998) and by Hansen and Ostermeier (2001) do not influence the outcome of Eq. (95) as . In all these cases, one gets and , yielding

This estimate for the normalized steady state mutation strength of the -CSA-ES on the sphere model was obtained by Arnold and Beyer (2004) using another approach. It indicates that the steady state is by a factor of too large compared to the optimal sphere model value that guarantees the maximal convergence rate toward the optimizer. As discussed in connection with Figures 11a and 11b, choosing a larger *D* can improve the situation as far as is concerned. Equation (95) can be used to tune *D*, to a certain extent, to a target mutation strength (). Resolving (95) with for *D* yields

The applicability of the resulting *D* in real CSA-ES algorithms must, however, be taken with care. As discussed in Section 7.3, too small *D* values can result in oscillatory behavior.

### 8.2 Derivation for the Ellipsoid Model

*D* can be derived. Requiring the convergence criterion (14), , we obtain According to Figure 8, the linear approximation (77) is valid only up to a certain . Therefore, the validity of Eqs. (102), (103), and (104) is restricted, too. Additionally, the derivation of the equations by use of (92) is only admissible provided that condition (58) holds, which constrains the range of the cumulation parameter. This must be kept in mind when applying these formulas.

Figure 12 compares the theoretical predictions of the steady state convergence rate with measurements from real -CSA-ES runs as well as with iteratively generated results obtained using scheme A. The ellipsoid models and are considered. The experimental convergence rates were obtained by running the -CSA-ES over a sufficiently long time until it reaches the steady state. Then the values of the last of generations were averaged over 100 independent runs. After that, a linear polynomial was fitted to the experimental data, yielding in Figure 12. This curve-fitting technique was also applied to the iterative values resulting from scheme A of Table 2. As one can see, there is good agreement between the linearized theory and the real ES runs. That is why the equations obtained can be used to estimate the expected runtime of the CSA-ES.
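The curve-fitting step described above, a linear polynomial fitted to the logarithmic steady state data, can be sketched as follows. The decay rate 0.05 and the noise level are arbitrary synthetic stand-ins for the averaged run data; only the least-squares slope extraction mirrors the measurement procedure in the text.

```python
import random

def fitted_slope(ys):
    """Slope of the degree-1 least-squares polynomial of ys versus generation index."""
    n = len(ys)
    mx = (n - 1) / 2.0                      # mean of the indices 0..n-1
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

# Synthetic steady state log-fitness data: linear trend (rate -0.05) plus small
# noise, standing in for values averaged over independent runs.
random.seed(0)
logs = [-0.05 * g + 0.001 * random.gauss(0.0, 1.0) for g in range(200)]
rate = -fitted_slope(logs)                  # recovered convergence rate
```

With the noise level chosen here, the fitted slope recovers the underlying rate to well within one percent, which is why a simple linear fit suffices once the strategy has reached the steady state.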

### 8.3 Expected Runtime of the CSA-ES on Ellipsoid Models

*y* components given by Eq. (61), where the inverse time constant is determined by Eq. (77). By inserting (61) into the ellipsoid fitness model (1), the steady state fitness dynamics can be determined. Starting from a generation *g*_{0} sufficiently large such that transient effects have vanished, the fitness value after *g* generations is

Consequently, the objective function value decreases exponentially fast with time constant . Equation (106) allows for the estimation of the expected running time *G* in which the fitness value is improved by a factor of . Considering , from (106) one obtains and resolving this for *G* results in . Inserting (77) finally yields

That is, the expected runtime *G* is asymptotically proportional to the quotient of the sum of the ellipsoid coefficients and the smallest coefficient . For the two ellipsoid models this means that the expected running time increases with order *N*^{2} for , and with *N*^{3} for , respectively. The same result was obtained for the
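The stated runtime scaling can be checked numerically. The sketch below assumes, for illustration, the two coefficient families a_i = i and a_i = i^2 (hypothetical concrete choices consistent with the stated N^2 and N^3 orders) and verifies that sum(a_i)/min(a_i) grows by factors of about 4 and 8, respectively, when N is doubled.

```python
def runtime_order(coeffs):
    # G is asymptotically proportional to sum(a_i) / min(a_i), up to constants.
    return sum(coeffs) / min(coeffs)

# Hypothetical coefficient families: a_i = i (order N^2) and a_i = i^2 (order N^3).
ratios = {}
for name, f in (("linear", lambda i: i), ("quadratic", lambda i: i * i)):
    g_n = runtime_order([f(i) for i in range(1, 101)])    # N = 100
    g_2n = runtime_order([f(i) for i in range(1, 201)])   # N = 200
    ratios[name] = g_2n / g_n   # doubling N: ~4 for order N^2, ~8 for order N^3
```

The growth factors approach 4 and 8 exactly as N increases, matching the N^2 and N^3 orders quoted above.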