Abstract

The behavior of the $(\mu/\mu_I,\lambda)$-Evolution Strategy (ES) with cumulative step size adaptation (CSA) on the ellipsoid model is investigated using dynamic systems analysis. First, a nonlinear system of difference equations is derived that describes the mean value evolution of the ES. This system is successively simplified until closed-form solutions of the steady state behavior can be derived in the asymptotic limit of large search space dimensions. It is shown that the system exhibits linear convergence order. The steady state mutation strength is calculated, and it is shown that, compared to standard settings in self-adaptive ESs, the CSA control rule allows for an approximately $\mu$-fold larger mutation strength. This explains the superior performance of the CSA in non-noisy environments. The results are used to derive a formula for the expected running time. Conclusions regarding the choice of the cumulation parameter $c$ and the damping constant $D$ are drawn.

1  Introduction

The performance of evolution strategies (ESs) depends crucially on the optimal control of the mutation strength $\sigma$, which determines the length of the search steps used to generate offspring from parents. There are basically four established methods to learn/control the mutation strength: Rechenberg's 1/5-rule (Rechenberg, 1973), self-adaptation (SA) (Rechenberg, 1973; Schwefel, 1977), meta-ES (Herdy, 1992; Rechenberg, 1994), and cumulative step size adaptation (CSA) (Ostermeier et al., 1994). Understanding and analyzing the working principles of these adaptation techniques by considering the ES in conjunction with the objective functions to be optimized allows for a well-grounded choice of strategy-specific parameters, such as the learning parameter and the damping constant. The analysis approach that has been the most fruitful one until now considers the ES together with the objective function as a dynamic system (Meyer-Nieberg and Beyer, 2012). That is, the goal of the analysis is to determine the time evolution of this system. However, since ESs are probabilistic algorithms, this analysis concerns the dynamics of stochastic, most often nonlinear, systems. Because of the difficulties of such an analysis, progress in this field has been rather slow. The first fully analyzed algorithm was the $(1,\lambda)$-$\sigma$SA-ES on the sphere (Beyer, 1996). In the following years progress was made in different directions concerning more complex, that is, recombinative, ESs and also more complex objective functions such as ridge functions and a subset of positive definite quadratic forms (PDQFs); see, for instance, Arnold and Beyer (2004), Jägersküpper (2006), Beyer and Meyer-Nieberg (2006), Arnold (2007), and Beyer and Finck (2010).

The most advanced dynamic systems analysis approach for ESs was presented by Beyer and Melkozerov (2014), who investigated the $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES on the ellipsoid model. In that paper a new progress measure, the quadratic progress rate, was introduced to model the dynamics of the quadratic distances of the parental state to the optimizer. While that work completed the analysis of the isotropic self-adaptive standard ES, a similar analysis of CSA control has not advanced that far. The fitness models considered so far comprise the cigar function (Arnold, 2007; Arnold and Beyer, 2010) and another special case of PDQFs consisting of a mixture of two sphere models (Beyer and Finck, 2010). These analyses were performed along the lines developed in Arnold (2002). Because of the inherent symmetries of those fitness models (cigar and mixture of two spheres), the dynamics in the search space can be lumped together, thus reducing the description to a few state variables characterizing the approach to the optimizer. This aggregation concerns not only the parental state in the search space but also the evolution path cumulation that is used to measure the average length of the actually realized change steps between two consecutive parental states, which in turn is used to control the mutation strength $\sigma$. As a result, the analysis presented by Beyer and Melkozerov (2014) for the $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES cannot be directly transferred to the corresponding CSA-ES. It is the aim of this paper to extend the analysis method developed by Beyer and Melkozerov (2014) to handle the path cumulation in the CSA-ES. To this end, we deviate from the standard analysis developed by Arnold (2002) and fall back on the analysis method that has been successful in the analysis of SA. That is, we derive a self-adaptation response (SAR) function for the cumulative step size adaptation. This approach allows for an analysis of the CSA-ES similar to that of the $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES in Beyer and Melkozerov (2014).

The derivation of a SAR function for CSA is challenging, since a SAR function for CSA in the proper meaning of the word does not exist. The original SAR function (Beyer, 2001) is a one-generation progress measure that describes the expected relative change of the mutation strength from generation $g$ to $g+1$. Path cumulation, however, is a process that is nonlocal in time; that is, it is the result of a cumulation process that incorporates a fading record of the parental steps taken. As a result, a SAR function for CSA must necessarily be a quantity that depends not only on the damping constant $D$ (which corresponds to the learning parameter in the SA) but also on the cumulation time constant $1/c$.

The analysis to be presented can be regarded as an important preparatory step for an analysis of the CMA-ES (Hansen and Ostermeier, 2001), since it provides a system of difference equations that describes the evolution on ellipsoid models. Since the CMA-ES transforms a quadratic model via the square root of the covariance matrix into another quadratic model, the analysis also holds for these transformed models. That is, the analysis presented is a building block for a complete analysis of the CMA-ES.

The paper is organized as follows. After a short recap of the ideas of CSA and the ellipsoid model, results of the analysis of the self-adaptive ES are reviewed as the basis for the derivations presented in subsequent sections. Next, the CSA-relevant path cumulation is analyzed in Section 5, yielding a system of recurrence equations that are combined in Section 6 to form a system of evolution equations describing the mean value dynamics of the $(\mu/\mu_I,\lambda)$-CSA-ES. In a next step, simplifications are introduced in order to obtain a system of tractable evolution equations that allows for the calculation of a function describing the expected generational change of the mutation strength in the steady state, similar to the self-adaptation response function used in the analysis of the SA-ES. The resulting system of evolution equations is then treated by an Ansatz similar to the one used by Beyer and Melkozerov (2014). The steady state of the normalized mutation strength dynamics is considered, and the influence of the strategy parameters $c$ and $D$ is discussed. In Section 8 the steady state mutation strength and the convergence rates are calculated in terms of closed-form expressions. The results are used to estimate the expected running time. Conclusions are drawn in Section 9.

2  The $(\mu/\mu_I,\lambda)$-CSA-ES Algorithm

The focus of this paper is on the dynamic behavior of the $(\mu/\mu_I,\lambda)$-CSA-ES. As indicated by the notation, nonelitist ("," selection) rather than elitist ("+" selection) is considered. That is, in each generation only the offspring population is involved in the selection process. Comma selection is advantageous in real-valued parameter optimization because it allows for the use of greater mutation strengths, which is, for example, beneficial when optimizing multimodal objective functions. The $(\mu/\mu_I,\lambda)$-CSA-ES controls the mutation strength, also referred to as the step size, by cumulative step size adaptation (see Ostermeier et al., 1994). CSA gathers previously successful search steps in a fading search path history. The mutation strength is then adjusted depending on the length of this search path. Another option to adapt the mutation strength of the ES is self-adaptation. In contrast to CSA, it provides each offspring with an individual mutation strength computed from the recombined mutation strengths of the $\mu$ best offspring of the previous generation. The SA-ES was investigated by Beyer and Melkozerov (2014).

The pseudocode of the $(\mu/\mu_I,\lambda)$-CSA-ES is presented in Table 1. First, the initial parental parameter vector $\mathbf{y}^{(0)}$, also referred to as the parental centroid, the initial mutation strength $\sigma^{(0)}$, and the initial search path $\mathbf{s}^{(0)}$ are specified. Then $\lambda$ offspring are generated (lines 3–5) by adding the product of the mutation strength and an $N$-dimensional random mutation vector $\mathbf{z}_l$ to the parental centroid. The components of each $\mathbf{z}_l$ are independent and identically distributed standard normal variates. The corresponding fitness value $F_l$ of each offspring is calculated in line 6. In line 8 the mutation vectors of the $\mu$ best offspring with respect to fitness are recombined to create their centroid $\langle\mathbf{z}\rangle$. As indicated by the index $I$ in the strategy notation, this centroid is generated using intermediate recombination. In this context the subscript $m;\lambda$ denotes the $m$th best of the $\lambda$ offspring (i.e., in case of minimization the offspring with the $m$th smallest fitness value). The centroid of the $\mu$ best mutation vectors is used in line 9 to compose a new parental centroid and in line 10 to update the search path $\mathbf{s}$. This search path contains a fading record of the strategy's previous steps. The length of its memory is determined by the choice of the constant parameter $c$, referred to as the cumulation parameter. The mutation strength is then updated in line 11 by multiplication with an exponential factor depending on the length of the search path as well as on the damping parameter $D$, a constant that determines the magnitude of the mutation strength updates. The sign of the exponent determines whether the mutation strength is increased or decreased. Long search paths indicate that the steps made by the ES collectively point in one direction and could be replaced with fewer but longer steps. Short search paths suggest that the strategy steps back and forth and thus that smaller step sizes should be beneficial. After termination the strategy returns the current parental centroid, which is considered an approximation of the optimizer of the objective function $F$.

Table 1:
Pseudocode of the $(\mu/\mu_I,\lambda)$-CSA-ES.
Line  1   Initialize(y^(0), sigma^(0), s^(0) <- 0, g <- 0)
      2   Repeat
      3      For l <- 1 To lambda
      4         z_l <- N_l(0, I)
      5         y_l <- y^(g) + sigma^(g) * z_l
      6         F_l <- F(y_l)
      7      End For
      8      <z> <- (1/mu) * Sum_{m=1}^{mu} z_{m;lambda}
      9      y^(g+1) <- y^(g) + sigma^(g) * <z>
     10      s^(g+1) <- (1 - c) * s^(g) + sqrt(mu * c * (2 - c)) * <z>
     11      sigma^(g+1) <- sigma^(g) * exp((||s^(g+1)||^2 - N) / (2 * D * N))
     12      g <- g + 1
     13   Until (termination condition)
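For concreteness, the following minimal NumPy transcription of Table 1 may help. It is a sketch, not the authors' reference implementation; the notation of lines 10 and 11 follows the reconstruction above, and the parameter names are illustrative assumptions.

    # Minimal sketch of the (mu/mu_I, lambda)-CSA-ES of Table 1 (assumed notation).
    import numpy as np

    def csa_es(F, y0, sigma0, mu, lam, c, D, max_gen=10000, rng=None):
        rng = rng or np.random.default_rng()
        y = np.asarray(y0, dtype=float)
        N = len(y)
        sigma = sigma0
        s = np.zeros(N)                                   # search path s^(0) = 0
        for g in range(max_gen):
            z = rng.standard_normal((lam, N))             # lines 3-5: lambda mutations
            fitness = np.array([F(y + sigma * z_l) for z_l in z])        # line 6
            idx = np.argsort(fitness)[:mu]                # the mu best (minimization)
            z_cent = z[idx].mean(axis=0)                  # line 8: intermediate recombination
            y = y + sigma * z_cent                        # line 9: new parental centroid
            s = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_cent     # line 10
            sigma *= np.exp((s @ s - N) / (2.0 * D * N))                 # line 11
        return y, sigma

A call such as csa_es(F, np.ones(N), 1.0, mu=3, lam=10, c=1/np.sqrt(N), D=np.sqrt(N)) mimics common parameter choices; the concrete values used in the paper's experiments are not reproduced here.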

3  The Ellipsoid Model

This paper is concerned with the analysis of the $(\mu/\mu_I,\lambda)$-CSA-ES on the ellipsoid model
\[ F(\mathbf{y}) = \sum_{i=1}^{N} a_i\, y_i^2, \qquad a_i > 0, \]
1
where $N$ represents the search space dimension and the $a_i$ are the coefficients of the ellipsoid model. Throughout the investigations, the coefficients are exemplarily chosen as $a_i = 1$, $a_i = i$, and $a_i = i^2$. The optimization problem is formulated as
\[ \mathbf{y}^{*} = \arg\min_{\mathbf{y} \in \mathbb{R}^{N}} F(\mathbf{y}). \]
2
Its optimizer resides at the origin of the coordinate system, $\mathbf{y}^{*} = \mathbf{0}$. Model (1) represents the general case of positive definite quadratic forms for the $(\mu/\mu_I,\lambda)$-CSA-ES. This is owing to the isotropy of the mutation vectors in Table 1, line 5, which secures the algorithm's invariance with respect to arbitrary rotations of the coordinate system.

The dynamic behavior of the $(\mu/\mu_I,\lambda)$-CSA-ES on ellipsoid model (1) is illustrated in Figure 1. It presents the results of typical runs of the ES, focusing on the squared components $y_i^2$ of the parental centroid as well as on the mutation strength dynamics. Approaching the optimizer, the strategy continuously decreases the mutation strength as the number of generations increases.
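Dynamics such as those in Figure 1 can be re-created with the sketch from Section 2 by recording the state in every generation. The snippet below is illustrative only; the ellipsoid coefficients $a_i = i$ are one of the choices named above, and all parameter values are assumptions.

    # Running the CSA-ES sketch on the ellipsoid F(y) = sum_i a_i y_i^2 with a_i = i.
    import numpy as np

    N = 100
    a = np.arange(1, N + 1)                    # coefficients a_i = i
    ellipsoid = lambda y: np.dot(a, y * y)     # ellipsoid model (1)
    y_end, sigma_end = csa_es(ellipsoid, np.ones(N), 1.0, mu=3, lam=10,
                              c=1.0 / np.sqrt(N), D=np.sqrt(N))

Averaging the recorded $y_i^2$ and $\sigma$ trajectories over many independent runs produces the smooth mean value curves shown in the figure.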

Figure 1:

Dynamics of the $(\mu/\mu_I,\lambda)$-CSA-ES on the ellipsoid model. The quadratic deviation $y_i^2$ from the optimizer is displayed for selected components. The blue dashed line represents the mutation strength $\sigma$. All curves are averaged over 10,000 independent runs.


4  Extending Previous Results to CSA-ES

This section deals with several results obtained within the analysis of the SA-ES on the ellipsoid model in Melkozerov and Beyer (2010) and Beyer and Melkozerov (2014). These results keep their validity in the context of the CSA-ES and provide useful equipment for the further analysis. For reasons of clarity and comprehensibility we first fix the necessary abbreviations and introduce the mutation strength normalization
formula
3
Next, the first-order progress rate is introduced.
Definition 1:
The progress rate $\varphi_i$ of the $(\mu/\mu_I,\lambda)$-ES along the $i$th axis of ellipsoid model (1) is the expected change of the parental parameter vector component $y_i$ from generation $g$ to generation $g+1$:
formula
4
Using the progress rate normalization (Melkozerov and Beyer, 2010)
formula
5
the normalized progress rate reads
formula
6
The term $c_{\mu/\mu,\lambda}$ denotes the progress coefficient of the $(\mu/\mu_I,\lambda)$-ES. It is a special case of the generalized progress coefficients (Beyer, 2001)
formula
7
where $\Phi$ denotes the cumulative distribution function of the standard normal variate. Using the first-order progress rate, one obtains good component-wise predictions of the expected approach of the parental centroid toward the optimizer as long as the distance to the optimizer is sufficiently large compared to the respective progress rate values. Otherwise, the perturbations of the evolutionary process superimpose the mean value dynamics, Eq. (4). That is, the predictive quality of the first-order progress rate decreases when approaching the optimizer. As a consequence, the more stable quadratic progress measure was conceived.
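As an aside, the progress coefficient $c_{\mu/\mu,\lambda}$ appearing above equals the expectation of the average of the $\mu$ largest of $\lambda$ independent standard normal samples (Beyer, 2001). This admits a simple Monte Carlo check; the sketch below is an illustration and not part of the original derivation.

    # Monte Carlo estimate of the progress coefficient c_{mu/mu,lambda}.
    import numpy as np

    def progress_coefficient(mu, lam, trials=10**6, seed=0):
        z = np.random.default_rng(seed).standard_normal((trials, lam))
        z.sort(axis=1)                   # ascending order per trial
        return z[:, -mu:].mean()         # mean of the mu largest, averaged over trials

For example, progress_coefficient(3, 10) returns a value close to 1.07.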
Definition 2:
The quadratic progress rate $\varphi_i^{\mathrm{II}}$ of the $(\mu/\mu_I,\lambda)$-ES along the $i$th axis of ellipsoid model (1) is the expected change of the squared component $y_i^2$ of the parental centroid between two consecutive generations $g$ and $g+1$:
formula
8
It was first introduced by Beyer and Melkozerov (2014) and reads
formula
9
Normalizing with Eq. (5) and neglecting the terms in the denominators yields
formula
10
Neglecting those terms is admissible under the conditions demonstrated by Beyer and Melkozerov (2014). The quadratic progress rate formula depends on the first-order progress rate (see Eq. (6)) as well as on a negative term that corresponds to the progress rate loss.
Because expression (10) is still too complex for theoretical analysis, a simplified formula was derived by Beyer and Melkozerov (2014). Under the assumption that the search space dimension $N$ is considerably greater than the parental population size $\mu$, and if there exists no dominating coefficient $a_i$, that is, if the conditions
formula
11
are fulfilled, the normalized progress rate is asymptotically
formula
12
Renormalization can be obtained by applying Eqs. (3) and (5) and yields
formula
13
As shown by Beyer and Melkozerov (2014), the quadratic progress rate can be used to describe the expected approach to the optimizer for each squared component of the parental centroid. It exhibits the typical features of a well-defined progress measure, namely, a gain and a loss term, both of which depend on the mutation strength. This ensures the existence of an optimal mutation strength and of an evolution criterion (also referred to as a convergence criterion). Additionally, the loss term is inversely proportional to the parental population size $\mu$. That is, the genetic repair effect of recombination can be observed on the ellipsoid model (Beyer and Melkozerov, 2014). The renormalized quadratic progress rate (Eq. (13)) will be useful for the derivation of the evolution equations of the CSA-ES in the next section.
According to Eq. (8), convergence to the optimizer in expectation requires a positive quadratic progress rate; therefore, $\varphi_i^{\mathrm{II}} > 0$ must hold for every component $i$. By use of the asymptotically exact ($N \to \infty$) progress rate formula (Eq. (12)), this inequality directly yields the convergence criterion (Beyer and Melkozerov, 2014)
formula
14
Hence, continuous positive progress toward the optimizer requires that the normalized mutation strength reside in the bounded interval above zero given by Eq. (14).
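To make the reasoning behind Eq. (14) explicit: if, as in the asymptotic case of Eq. (12), the normalized progress rate has the familiar gain-loss form (cf. the sphere model expression (89) in Section 7.3), the criterion follows from elementary algebra. The following display is a reconstruction under this assumption:

\[
\varphi^{*}(\sigma^{*}) \;=\; c_{\mu/\mu,\lambda}\,\sigma^{*} \;-\; \frac{\sigma^{*2}}{2\mu} \;>\; 0
\quad\Longleftrightarrow\quad
0 \;<\; \sigma^{*} \;<\; 2\mu\, c_{\mu/\mu,\lambda}.
\]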

5  The Mutation Strength Dynamics

In order to build the evolution equations of the CSA-ES on ellipsoid model (1), a description of the mutation strength adaptation from generation $g$ to generation $g+1$ is necessary. As it turns out, unlike the $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES, the CSA-ES lacks a simple formula for the expected change of the mutation strength $\sigma$. Because of the influence of the search path update on the mutation strength evolution, the description of the mutation strength dynamics is more complex. We deal with this problem by describing the expected change of $\sigma$ by means of a system of recurrence equations. The first difference equation concerns the mutation strength control. According to Table 1, line 11, the expected mutation strength in generation $g+1$ is
formula
15
In the asymptotic limit $N \to \infty$ this can be expressed as
formula
16
Obtaining a recurrence equation requires knowledge of the squared length of the search path in generation $g+1$. From Table 1, line 10, it follows that
formula
17
For the calculation of the expected change of the search path's squared norm, one needs to determine the expected values of the scalar product of search path and mutation centroid and of the squared norm of the parental centroid's mutation vector in generation $g$. For the latter it holds:
formula
18
Considering the expected value of the squared single components of the mutation vector results in
formula
19
These sums of product moments were derived by Beyer and Melkozerov (2014). Inserting the corresponding expressions into Eq. (19) and combining the fractions yields
formula
20
Finally, the aggregation of all $N$ components of the mutation vector in generation $g$ yields
formula
21
Expressions (20) and (21) can be simplified considerably if the conditions (11) hold. That is, provided that there exist no dominating ellipsoid coefficients $a_i$ and that the search space dimension $N$ is large compared to the parental population size $\mu$, the respective term in Eq. (20) can be neglected. The validity of this approach also requires that the dynamics behave "nicely." Depending on the initialization of the strategy, this condition might be violated in the beginning, but it is always fulfilled in the asymptotic limit $N \to \infty$. The consistency of the approach was checked by Beyer and Melkozerov (2014) by reinserting the final results into the quadratic progress rate. Consequently, Eq. (20) becomes
formula
22
Similarly, Eq. (21) becomes asymptotically
formula
23
The suitability of these asymptotically correct formulas will additionally be justified by comparing the two iteratively generated dynamics (using Eq. (21) or Eq. (23), respectively) with the dynamics of real $(\mu/\mu_I,\lambda)$-CSA-ES runs (see Section 6).
The next step is concerned with the calculation of the expected value of the scalar product of search path and mutation centroid. Coming up with a closed-form solution turns out to be very difficult. Nonetheless, the change of the expected scalar product value between two consecutive generations can be formulated as a recurrence equation. In order to establish this difference equation, the expected values of the respective components need to be calculated. By use of the renormalized version of the first-order progress rate (Eq. (4)) we obtain
formula
24
or, respectively,
formula
25
Adding a term that describes the stochastic fluctuations around the expected value, the $i$th component of the mutation vector of the parental centroid in generation $g$ can be modeled as
formula
26
According to Eq. (25), the expected value of the $i$th component in generation $g+1$ is
formula
27
For further analysis it is desirable to represent the right-hand side of Eq. (27) by means of the ES state in the previous generation g. This is achieved by taking into account Table 1, line 9, and the definition of the quadratic progress rate in Eq. (8):
formula
28
Provided that the quotient
formula
29
is sufficiently small, a Taylor expansion of the square root with respect to this quotient transforms Eq. (28) into
formula
30
Figure 2 provides a verification of this assumption by plotting the experimentally generated mean values of the quotient against the search space dimension $N$. Consequently, the small contribution of this term to the expected value in Eq. (30) is ignored in the following in order to keep the theoretical analysis tractable. Thus we obtain
formula
31
Analogously to Eq. (26), the $i$th component of the mutation vector of the parental centroid in generation $g+1$ can be described by the sum of its expected value and a term denoting the stochastic fluctuations:
formula
32
Now Eqs. (25) and (32) are used to derive the difference equation for the scalar product. According to Table 1, line 10, the search path components are updated by
formula
33
After multiplication with the corresponding mutation vector component, Eq. (33) changes to
formula
34
and by insertion of Eqs. (31) and (32), the $i$th addend of the scalar product in generation $g+1$ reads
formula
35
Expansion of the products yields
formula
36
By rearranging the terms and making use of Eqs. (25) and (26) we obtain
formula
37
Taking expected values in Eq. (37), the perturbation terms vanish by definition, leading to
formula
38
In order to justify the approximation of (37) by expected value expressions, the iterative dynamics resulting from Eq. (38) are compared to experimental runs of the ES (see Section 6). Now considering Eq. (25), we obtain
formula
39
With the use of (22), Eq. (39) transforms into
formula
40
Notice that the quantity considered is the scalar product of the search path and the centroid of the $\mu$ best mutation vectors in generation $g$. That is, Eqs. (39) and (40), respectively, provide a component-wise representation of the difference equation that describes the one-generation change of this scalar product. The results in Eqs. (20) and (39) allow for the compilation of a difference equation that describes the expected change of the squared length of the search path; see also Eq. (17):
formula
41
Together with Eq. (16), the mutation strength change from generation $g$ to generation $g+1$ can be described by means of the difference equations (39) and (41), taking Eq. (21) into account.
Figure 2:

Steady state mean values of the quotient (29) plotted against the search space dimension $N$. Each data point is obtained by averaging over 100,000 generations and independent runs of a $(\mu/\mu_I,\lambda)$-CSA-ES.


6  Evolution Equations

In the case of the SA-ES on the general quadratic fitness model, the framework of the dynamic systems approach (Beyer, 2001) can be applied. It states that the stochastic process of the ES from generation $g$ to generation $g+1$ can be divided into mean value parts and fluctuation terms as follows (Beyer and Melkozerov, 2014):
formula
42
formula
43
The mean value parts in Eq. (42) are directly given by the quadratic progress rates (9) and (13), respectively. The self-adaptation response function describes the mean value dynamics of Eq. (43). Unfortunately, in the context of the CSA-ES the derivation of a closed SAR formula appears to be a hard task even in the steady state case. That is, in a first step it has to be substituted by the difference equations (16), (39), and (41) before an asymptotic approximation of the SAR function is derived (see Section 7.1). In order to keep the analysis tractable, the fluctuation terms are disregarded, and asymptotically correct simplifications are derived and compared with experiments.
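In generic form (the notation below is assumed for illustration; Eqs. (42) and (43) are the precise statements), the decomposition reads

\[
\left(y_i^{(g+1)}\right)^{2} \;=\; \left(y_i^{(g)}\right)^{2} \;-\; \varphi_i^{\mathrm{II}}\!\left(\sigma^{(g)}\right) \;+\; \epsilon_{y_i},
\qquad
\sigma^{(g+1)} \;=\; \sigma^{(g)}\!\left(1 + \psi\!\left(\sigma^{(g)}\right)\right) \;+\; \epsilon_{\sigma},
\]

where the mean value parts are the quadratic progress rate and the SAR function, and the $\epsilon$ terms collect the fluctuations that are disregarded in the following.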

A first representation of the strategy’s evolution behavior is provided in Table 2, iterative scheme A. The one-generation behavior of the component-wise squared distance to the optimizer is modeled in (A.1) by use of Eq. (9). Using Eq. (20) to express the expected values in Eq. (39) yields the iterative relation (A.2) for the scalar product components . The iterative description of the squared length of the search path in (A.3) is obtained by inserting Eq. (21) into Eq. (41). Finally, the mutation strength adaptation is specified in (A.4) using Eq. (16).

Table 2:
Iterative scheme A summarizes the evolution equations of the $(\mu/\mu_I,\lambda)$-CSA-ES using Eqs. (9), (39), (41), and (16).
Iterative Scheme of Evolution Equations A
formula
A.1
 
formula
A.2
 
formula
A.3
 
formula
A.4
 

Whether the modeling approach yields meaningful results can be checked by comparing the iteratively generated dynamics of the system of evolution equations A in Table 2 with experimental results of real $(\mu/\mu_I,\lambda)$-CSA-ES runs (see Figure 3). The typical long-term behavior of the ES is observed; it is obtained by iterating scheme A from the given initial values. For small search space dimension $N$, the experimental data of the ES deviate slightly from the theoretical predictions. These deviations diminish with increasing search space dimension. Overall, a good agreement of iterative and experimental dynamics can be observed.

Figure 3:

Dynamics generated by scheme A, Table 2, compared to real runs of the $(\mu/\mu_I,\lambda)$-CSA-ES. The results of the evolution equations in A for the $y_i^2$ dynamics as well as the $\sigma$ dynamics are depicted by solid lines. The discrete data points display the corresponding experimental results, which are averaged over 10,000 independent runs of the ES.


However, with regard to further theoretical investigations we are interested in a more convenient system of evolution equations. To this end, different approximative schemes are developed. In order to obtain iterative scheme B (Table 3), previously obtained simplifications are applied as follows. The progress rate representation (13) is utilized in (B.1) to express the $y_i^2$ dynamics. Instead of using Eq. (39), we obtain the dynamics of the scalar product components in (B.2) by using recurrence equation (40). Inserting (23) into Eq. (41) transforms (A.3) into (B.3). Moreover, using the first two terms of the Taylor expansion of the exponential function
\[ e^{x} = 1 + x + \mathcal{O}(x^{2}), \]
44
allows for approximating the mutation strength dynamics by
formula
45
see also (B.4). Notice that higher-order terms in Eq. (44) can be neglected if the argument of the exponential function is small. This is true for standard CSA implementations in the asymptotic limit case, since $D$ usually increases with the search space dimension $N$. The simplifications within the modified iterative scheme B are justified by comparing its dynamics to the dynamics of scheme A (Table 2). In Figure 4 the iteratively generated dynamics are displayed for two search space dimensions. Both systems are iterated beginning with the same initial values. The dynamics of systems A and B show small deviations for the smaller search space dimension. For the higher dimension, the agreement of both iterative dynamics increases significantly. Both systems of evolution equations show the same long-term behavior and agree for sufficiently high dimension $N$. Making use of the evolution equations B in Table 3 is reasonable because it allows for a tractable investigation of the CSA-ES. Thus it is the basis for the derivations in the following sections. The compliance of the predictions from iterative scheme B with experimental runs of the ES is illustrated in Figure 5. The good approximation quality justifies the modeling approach by use of iterative scheme A or B, respectively. For low-dimensional problems the use of iterative scheme A will provide more precise predictions of the ES dynamics. In higher-dimensional search spaces, both systems provide a reliable modeling of the ES behavior.
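Written out explicitly under the update rule of Table 1, line 11, the linearization (45) takes the form (a reconstruction; the exact display is Eq. (45))

\[
\sigma^{(g+1)} \;\simeq\; \sigma^{(g)}\left(1 \;+\; \frac{\big\|\mathbf{s}^{(g+1)}\big\|^{2} - N}{2DN}\right),
\]

whose neglected remainder is quadratic in the exponent's argument and therefore vanishes quickly when $D$ grows with $N$.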
Table 3:
Iterative scheme B is obtained by simplification of iterative scheme A using Eqs. (13), (40), (41) with (23), and (45).
Iterative Scheme of Evolution Equations B
formula
B.1
 
formula
B.2
 
formula
B.3
 
formula
B.4
 
Figure 4:

Comparison of the dynamics of the iterative schemes A and B simulating a $(\mu/\mu_I,\lambda)$-CSA-ES. The iteratively generated $y_i^2$ dynamics as well as the iterative $\sigma$ dynamics are displayed. The data points illustrate the results of the evolution equations from A; the results of B are represented by the solid lines.


In Figure 5 two phases in the dynamics of the $(\mu/\mu_I,\lambda)$-CSA-ES can be observed. After the start of the optimization the ES dynamics enter a transient phase, which is followed by an approach to steady state behavior. The transient period is characterized by a decrease of the $y_i^2$ curves and the $\sigma$ values. The rate of this decline increases with $i$; that is, $y_1$ decreases significantly slower than $y_N$. The steady state behavior features a slower decrease with the same rate for all single components $y_i^2$. In particular, the $y_i^2$ dynamics fall with a log-linear law. The steady state of the $\sigma$ values exhibits a log-linear behavior as well, but with a different rate of decline.

Figure 5:

Iteratively generated dynamics of scheme B, Table 3, compared to real runs of the $(\mu/\mu_I,\lambda)$-CSA-ES. The results of the evolution equations in B for the $y_i^2$ dynamics as well as the $\sigma$ dynamics are illustrated by solid lines. The discrete data points display the corresponding experimental results, which are averaged over 10,000 independent runs of the ES.


7  Steady State Dynamics

7.1  Derivation of Simplified Evolution Equations via Self-Adaptation Response of CSA

After having validated the modeling by the system of evolution equations B (Table 3), finding a closed-form solution to this system of nonlinear difference equations still appears to be a difficult task. The goal is to substitute the equations that model the $\sigma$ dynamics, that is, (B.2)–(B.4), by one single evolution equation. As it turns out, considering the steady state dynamics allows for the derivation of a suitable approximation. That is, the search path dynamics within the strategy's steady state can be used to provide an asymptotically exact approximation of a function analogous to the self-adaptation response function of the self-adaptive ES (Beyer and Melkozerov, 2014). To this end, the steady state dynamics of the difference equations (B.2) and (B.3) are investigated. In a first step the steady state dynamics of the scalar product components are examined. Then the focus is on the resulting steady state dynamics of the complete search path. Using the steady state condition for the scalar product components
formula
46
and solving (B.2) for yields
formula
47
Here, the steady state mutation strength denotes the value of $\sigma$ that is attained by the ES after the transient phase (i.e., after a sufficiently large number of generations $g$). Applying the mutation strength normalization (Eq. (3)), we obtain
formula
48
and the steady state scalar product becomes
formula
49
Now that we have found a description of the steady state of the scalar product, the next step is concerned with the derivation of the steady state of the squared length of the search path vector $\mathbf{s}$. Again, the use of the steady state condition
formula
50
combined with Eqs. (41) and (23) (see also (B.3) in Table 3) yields an asymptotically exact expression for the dynamics in the proximity of the steady state
formula
51
Solving this equation for the squared search path length and plugging Eq. (49) into Eq. (51) yields
formula
52
The renormalized version of Eq. (52) using Eq. (3),
formula
53
serves as an approximation for the short-term behavior of the search path. It describes the progress of the search path between two consecutive generations sufficiently well. Applying Eq. (53) to (45) and recalling Eq. (50), the mutation strength dynamics can be modeled by the single evolution equation
formula
54
The second addend within the braces can be interpreted as the self-adaptation response function of the $(\mu/\mu_I,\lambda)$-CSA-ES in the steady state. Using Eq. (3), the normalized version of this SAR function reads
formula
55
Considering Eqs. (54) and (13), the strategy’s mean value dynamics in the steady state can be described now by iterative scheme C (Table 4). A comparison of the iteratively generated dynamics from scheme C to those of scheme B is displayed in Figure 6. The good agreement of both dynamics for small-dimension as well as for high-dimension validates the use of the SAR function (55) to model the ES dynamics.
Table 4:
Iterative scheme C is composed of the difference equations describing the $y_i^2$ dynamics (see Table 3, (B.1)) as well as the $\sigma$ dynamics, which are now modeled by use of Eq. (54).
Iterative Scheme of Evolution Equations C
formula
C.1
 
formula
C.2
 
Figure 6:

Comparison of the iterative systems B and C simulating a $(\mu/\mu_I,\lambda)$-CSA-ES. The dynamics of the component-wise squared distances to the optimizer $y_i^2$ resulting from system B are illustrated by solid lines. The results of scheme C are indicated by the discrete data points.


The second equation in iterative scheme C (Table 4) is still too complicated to allow for the calculation of closed-form expressions of the steady state variables. The critical part in (C.2) is the sum in the SAR function (55), the denominator of which depends on $a_i$. A simplification of Eq. (55) is obtained by requiring that the second addend in the denominator be sufficiently small, leading to the condition
formula
56
Equation (56) has to be satisfied for all $i$, that is, particularly for the largest coefficient $a_i$, yielding
formula
57
Requiring positive progress in the direction of the optimizer, the normalized steady state mutation strength must be bounded according to (14). Therefore, inserting this bound in (57) and resolving for $c$ yields the condition
formula
58
That is, $c$ must not be too small. Actually, for the ellipsoid models with the coefficient choices considered here, too small $c$ values violate this condition, whereas sufficiently large choices yield asymptotically exact results. Thus, using condition (58) and neglecting the left-hand side of (56) in the denominator of Eq. (52), the squared steady state search path length (52) becomes after a short calculation
formula
59

Equation (59) has an interesting interpretation considering expected values. If the normalized mutation strength is smaller than the optimal sphere model value, the steady state squared length of the evolution path is greater than the squared length of a random path (which is $N$). Thus, by virtue of Table 1, line 11, $\sigma$ is increased. As a result, the normalized mutation strength can increase toward the optimal value for the sphere model. In the opposite case, a decrease toward that value happens. That is, in a static case (also referred to as the scale-invariant case) the control rule in Table 1, line 11, drives the normalized mutation strength to its optimal sphere model value. Note, however, that in the real ES algorithm and its approximation schemes, for instance, Table 5, this does not happen, since the $y_i^2$ dynamics influence the evolution and a steady state will result. Determining the real steady state is done in the remainder of this paper.
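A quick numerical sanity check of this interpretation (a sketch under the reconstructed update rules, not an experiment from the paper): without selection, $\sqrt{\mu}\,\langle\mathbf{z}\rangle$ is standard normally distributed, and the path update then keeps the stationary expected squared path length at exactly $N$, so the rule of line 11 is neutral on purely random steps.

    # Without selection the stationary E||s||^2 equals N, making line 11 neutral.
    import numpy as np

    rng = np.random.default_rng(1)
    N, c, gens, burn_in = 100, 0.1, 100000, 1000
    s, acc = np.zeros(N), 0.0
    for g in range(gens):
        u = rng.standard_normal(N)              # plays the role of sqrt(mu)*<z>
        s = (1.0 - c) * s + np.sqrt(c * (2.0 - c)) * u
        if g >= burn_in:
            acc += s @ s
    print(acc / (gens - burn_in))               # close to N = 100

Selection correlates consecutive steps, lengthening or shortening the path relative to $N$ and thereby driving the mutation strength in the direction described above.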

Table 5:
Iterative scheme D is composed of the difference equations for the $y_i^2$ dynamics (see Table 4, (C.1)) as well as the $\sigma$ dynamics, which are modeled using the simplified SAR function in Eq. (60) under the assumption that condition (58) holds.
Iterative Scheme of Evolution Equations D
formula
D.1
 
formula
D.2
 
Provided that (58) holds, the SAR function (55) can also be simplified, yielding after renormalization with Eq. (3):
formula
60
In turn, the evolution equation (54) simplifies as well. Thus, (C.2) in Table 4 changes to (D.2) in the new scheme D in Table 5.

The approximation quality of system D is validated in Figure 7 for different choices of the cumulation parameter $c$ and the search space dimension $N$. In Figures 7a and 7c the cumulation parameter $c$ is set in such a way that condition (58) is not satisfied. As a consequence, one observes larger deviations between the two iterative schemes C and D. However, these deviations are generally more pronounced in the transient phase of the evolutionary process, which is emphasized in the plots because of the use of a logarithmic scale for the horizontal axes. Figures 7b and 7d display a scenario where condition (58) is fulfilled. Especially with growing dimensionality, a visually good agreement of both systems of evolution equations can be observed.

Figure 7:

Comparison of the $y_i^2$ dynamics simulating a $(\mu/\mu_I,\lambda)$-CSA-ES by use of the iterative schemes C and D for the ellipsoid model. The cumulation parameter $c$ is chosen such that condition (58) is violated in graphs (a) and (c) and satisfied in graphs (b) and (d). Graphs (a) and (b) show the results for the smaller search space dimension, and graphs (c) and (d) for N = 400.


Now that we have qualitatively validated the modeling approach, the deduction of closed-form solutions to system C can be performed according to the approach introduced by Beyer and Melkozerov (2014). Taking a look at Figure 6, one observes that after transition periods of different lengths all $y_i^2$ curves approach a log-linear behavior with the same slope. The same log-linear trend holds true for the $\sigma$ dynamics, but with a different slope. This observation suggests that system C asymptotically reaches linear systems behavior. Therefore, modeling closed-form solutions to the system by exponential functions is reasonable. Thus, in the proximity of the steady state the following Ansatz is used to solve the system of evolution equations C:
formula
61
formula
62
This Ansatz was introduced by Beyer and Melkozerov (2014) in order to solve the evolution equations of the $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES in the asymptotic limit $N \to \infty$. We therefore only sketch the derivations in the remainder of this section and in Section 7.2. The Ansatz (61), (62) already takes the observed different slopes of the $y_i^2$ and $\sigma$ dynamics correctly into account. Making use of the correct magnitudes removes the time dependence within the asymptotic solutions of system C.
As a consequence, by inserting (61) and (62) into the normalization formula (3), one obtains a constant normalized steady state mutation strength:
formula
63
In the next step, system C is solved with the help of the Ansatz (61), (62). To this end, an eigenvalue problem is established.

7.2  The Eigenvalue Problem

In order to keep this work self-contained, the eigenvalue problem arising from (61), (62) is considered in detail. This is done analogously to Beyer and Melkozerov (2014). Making use of the Ansatz (61), (62), the squared distance to the optimizer and the mutation strength in generation $g+1$ can be expressed by means of their states in generation $g$:
formula
64
From Figure 5 it can be deduced that the decay rate per generation must be rather small. That is, the exponential terms can be simplified using a Taylor expansion. Neglecting higher-order terms, (61), (62) transform to
formula
65
formula
66
Inserting Eqs. (65) and (66) into Eqs. (42) and (54), corresponding to (C.1) and (C.2) in Table 4, yields after modification
formula
67
formula
68
Taking Eq. (63) into account, we obtain a nonlinear system of equations:
formula
69
formula
70
where the coefficients $b_i$ and the remaining quantities are unknown time-independent steady state quantities. Notice that Eq. (70) contains the SAR function (55), revealing the relation
formula
71
Comparing with Eq. (62), we see that the rate at which $\sigma$ evolves in the steady state is given by the (negative) value of the SAR function.
Equation (69) can be rewritten in matrix form, resulting in an eigenvalue problem
formula
72
with eigenvector $\mathbf{b} = (b_1, \ldots, b_N)^{\mathrm{T}}$, the steady state mutation strength, and an $N \times N$ matrix $A$ given component-wise as
formula
73
formula
74
The matrix has $N$ eigenvalues and $N$ eigenvectors. Because of the conditions of the Ansatz, only those solutions of the eigenvalue problem with a positive eigenvalue and a component-wise positive eigenvector are acceptable. The Ansatz indicates that larger eigenvalues result in a faster decay of the corresponding contributions to the $y_i^2$ values. That is, in comparison to the smallest eigenvalue, the impact of the larger ones decreases with the number of generations. This corresponds to the faster decay of some $y_i^2$ values in the initial and transient phases of the evolution process; see, for instance, Figure 1. Therefore, as far as the steady state behavior is concerned, one is interested in the smallest eigenvalue. Additionally, the corresponding eigenvector has to satisfy the positivity condition. Considering the ellipsoid models with the coefficient choices introduced in Section 3 and one additional model, the corresponding smallest eigenvalues resulting from Eq. (72) are shown in Figure 8 as functions of the normalized mutation strength. The additional ellipsoid model is included to better assess the transition from the sphere model to the other ellipsoid models. For most of the models the numerically obtained points exhibit linear growth over a wide range of mutation strength values; even the remaining case, whose coefficients are closer to those of the sphere model, exhibits this behavior to a certain extent. The general tendency of the eigenvalue dependence is characterized by a linear ascent and a sudden sharp drop in the vicinity of the zero of the progress rate (see Eq. (14)). The numerical results presented in Figure 8 are identical to the results presented by Beyer and Melkozerov (2014) in the context of the self-adaptation ES.
Figure 8:

Dependence of the numerically calculated smallest eigenvalues on the steady state mutation strength. The results of the $(\mu/\mu_I,\lambda)$-CSA-ES are presented for the ellipsoid model with different coefficient choices and two search space dimensions (left and right).


Because of the linear behavior of the numerically obtained data points for sufficiently small values of the normalized mutation strength, it is possible to derive an analytical expression that describes the early growth behavior of the eigenvalues. To this end, the quadratic terms within the eigenvalue problem (72) are neglected. This transforms Eqs. (72) and (73) into a diagonalized problem:
formula
75
Consequently, the $N$ eigenvalues can be read off directly. Since the steady state dynamics of the ES are governed by the smallest positive eigenvalue, the relevant linear part is the one that corresponds to the smallest coefficient $a_i$. Writing
formula
76
for the smallest coefficient, the linear parts of the curves in Figure 8 can be expressed by
formula
77
This linear approximation of the steady state mode eigenvalue was derived by Beyer and Melkozerov (2014) and revealed good agreement with numerically obtained results for sufficiently small values of the normalized steady state mutation strength.
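Numerically, the steady state mode can be obtained directly from the eigenvalue problem (72). The sketch below assumes that the matrix A has already been assembled from Eqs. (73) and (74) (which are not reproduced here); it merely selects the admissible solution as required by the Ansatz.

    # Select the steady state mode: the smallest positive eigenvalue of A
    # whose eigenvector is componentwise positive (conditions of the Ansatz).
    import numpy as np

    def steady_state_mode(A):
        eigvals, eigvecs = np.linalg.eig(A)
        best = None
        for lam_k, b_k in zip(eigvals.real, eigvecs.T.real):
            if b_k.sum() < 0:
                b_k = -b_k                       # eigenvectors are defined up to sign
            if lam_k > 0 and np.all(b_k > 0):
                if best is None or lam_k < best[0]:
                    best = (lam_k, b_k)
        return best                              # (eigenvalue, eigenvector) or None

Scanning this routine over a grid of normalized mutation strengths reproduces curves like those in Figure 8.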

7.3  The Normalized Mutation Strength in the Steady State

This section aims at the calculation of the normalized steady state mutation strength. To this end, a difference equation describing the change of the normalized mutation strength between two consecutive generations of the ES is derived. The starting point is the iterative scheme C of evolution equations. Considering (C.2) and the respective Eqs. (54) and (55), and proceeding to normalized quantities, Eq. (3), leads to
formula
78
This equation can be rearranged to
formula
79
Concentrating on the quotient of square roots and taking Eq. (42) into account (neglecting fluctuations) yields
formula
80
Provided that the respective term is sufficiently small (see also Eq. (29)), Eq. (80) can be approximated by
formula
81
Making use of Eq. (12) and the normalization (5), the fraction on the right-hand side of (81) becomes
formula
82
and in conclusion
formula
83
Thus, the evolution equation (79) of the normalized mutation strength becomes asymptotically
formula
84
Applying the Ansatz (61), (62) to Eq. (84), the evolution equation after resolving the parentheses reads
formula
85
Notice that because of the complex form of the SAR function, Eq. (55), all mixed product terms of higher order are aggregated in an error term. This error term vanishes faster than the other terms in the parentheses of (85) and is therefore neglected in the next steps in order to keep the analysis manageable.
By demanding stationarity of the normalized mutation strength, Eq. (85) can now be used to determine the steady state. This implies that the expression in the large parentheses becomes 1, which, taking Eq. (71) into account, yields the condition
formula
86
By factorization of the quotient this transforms into
formula
87
The bracketed term on the right-hand side of (87),
formula
88
shares some similarities with the progress rate of the sphere model (Beyer, 2001, p. 217),
\[ \varphi^{*}(\sigma^{*}) = c_{\mu/\mu,\lambda}\,\sigma^{*} - \frac{\sigma^{*2}}{2\mu}, \]
89
Actually, Eq. (89) is recovered in (88) for the sphere model, that is, for identical coefficients $a_i$.
Recalling the identity in Eq. (71) finally yields the steady state condition
formula
90
Introducing the sphere model into (90) and considering the corresponding SAR function yields the well-known sphere model steady state condition (Meyer-Nieberg and Beyer, 2005)
formula
91
This is remarkable: the novel analysis approach describes the steady state by an equation, (90), that is formally similar to the one known from the sphere model theory of the SA-ES.

Condition (90) allows for the calculation of the normalized steady state mutation strength. Regarding both sides of Eq. (90) as functions of the normalized mutation strength, we see that the curves intersect at the normalized steady state mutation strength of the ES. For the $(\mu/\mu_I,\lambda)$-ES on the sphere model as well as on the ellipsoid model, graphs for a fixed search space dimension are shown in Figure 9. In each case, three different choices of the cumulation parameter $c$ are considered. The numerically computed solution of Eq. (90) is represented by the black dots. According to Eq. (88), the right-hand side of (90) is independent of the choice of the parameters $c$ and $D$. The SAR function, on the other hand, depends on $c$ and $D$; see Eq. (55). Thus, variations in $c$ lead to relocations of the intersection point. From this behavior the existence of an optimal $c$ value can be conjectured that tunes the ES to operate at maximal progress rate.

Figure 9:

Steady state condition (Eq. (90)) for a $(\mu/\mu_I,\lambda)$-CSA-ES. Both sides of Eq. (90) are plotted against the normalized mutation strength. The normalized steady state mutation strength is localized at the intersection of the respective curves.


According to Eq. (55), the influence of $D$ is displayed in Figure 10 for the sphere and the ellipsoid model. The $D$ values are varied while the respective cumulation parameters are held constant. The red solid line in the figures corresponds to the right-hand side of Eq. (90) together with (88). The curves of the SAR function, which depend on the parameter $D$, are represented by the marked blue lines. As one can see, $D$ influences the slope of the SAR function: increasing $D$ while keeping $c$ constant leads to a decrease of the slope. As a consequence, the intersection point of both curves moves to the right; that is, the normalized steady state mutation strength is increased. Independent of the choice of the damping parameter $D$, all graphs intersect at the same point on the x-axis, namely the zero of the SAR function. For the sphere model this intersection point is independent of $c$, and the root of (55) can be given in closed form. In the case of the ellipsoid model, this zero varies with the cumulation parameter $c$; it shifts to the right for smaller $c$ values. However, the corresponding steady state can then only be obtained from Eq. (55) by numerical root finding.

Figure 10:

Influence of the damping parameter $D$ on the numerical solution of Eq. (90). The red solid line depicts the right-hand side of Eq. (90). The $(\mu/\mu_I,\lambda)$-CSA-ES operates on the sphere and on the ellipsoid model, using two different cumulation parameter settings (graphs (a) and (b) vs. graphs (c) and (d)).


The red solid line in Figure 10 represents the right-hand side of (90), which by virtue of (88) and (87) is equal to half the steady state eigenvalue. The latter determines via (61) the rate at which the ES approaches the optimizer in the steady state. Since this rate is determined by the steady state mutation strength, it depends in turn on the choice of $D$ and $c$. Figure 11 displays these dependencies. To this end, the eigenvalue is multiplied with a scaling term in order to reduce the impact of the considered ellipsoid model as well as the impact of the population sizes on the realized progress. The resulting values are then plotted versus $1/c$, the cumulation time constant that controls the fading of the search path memory within the CSA-ES. The sphere model and one ellipsoid case are considered. As one can see in Figure 11, for the ellipsoid there is almost no influence of the different damping constant $D$ formulas on the progress rate toward the optimizer in the steady state. This is different for the sphere model. A further ellipsoid case, not displayed in this paper, lies in between these two models.

Figure 11:

The numerically calculated steady state mode eigenvalue, multiplied by a scaling term, is plotted against the cumulation time constant $1/c$. The curves for three different choices of $D$ in two search space dimensions are illustrated using a $(\mu/\mu_I,\lambda)$-CSA-ES on the sphere (graphs (a) and (b)) and on the ellipsoid (graphs (c) and (d)). Note that the curves regarding the ellipsoid overlap almost completely.


As for the sphere model (Figures 11a and 11b), other choices of the damping parameter seem to be better than the standard recommendation of Hansen and Ostermeier (2001). However, this ignores the effect of possible oscillations that were neglected by considering the asymptotic solution of the iterative schemes using the Ansatz (61), (62). Using small $D$ values (Table 1, line 11) results in large generational changes of the mutation strength, the driving force of the oscillations observed by Hansen (1998). These oscillations can lead to considerable regression of the strategy's progress. That is why larger $D$ values are recommended.

8  Derivation of Closed-Form Expressions for the Steady State

In this section the normalized steady state mutation strength is derived, which in turn yields the convergence rate and the expected runtime of the CSA-ES. In order to find a closed-form solution it is necessary to start with assumption (56),
formula
This allows for the use of the simplified SAR function (60) from iterative scheme D (Table 5), which after normalization (3) reads
formula
92
in the strategy’s steady state. In the first step, considering only the sphere model (), we are able to replicate previous findings from Arnold and Beyer (2004). Then, with the linear approximation of the steady state mode eigenvalue, the results are extended to more general ellipsoid models.

8.1  Derivation for the Sphere Model

On the basis of Eq. (90), using (92) and (88) and inserting the coefficients of the sphere model, the equality
formula
93
is obtained. In order to solve Eq. (93) for the normalized steady state mutation strength, it is rearranged to
formula
94
Resolving for the normalized steady state mutation strength yields after a short calculation
formula
95
The different recommendations for the choice of D and c given by Hansen (1998),
formula
96
and Hansen and Ostermeier (2001),
formula
97
do not influence the outcome of Eq. (95) as $N \to \infty$. In all these cases one obtains the same limit behavior, yielding
formula
98
This estimate for the normalized steady state mutation strength of the $(\mu/\mu_I,\lambda)$-CSA-ES on the sphere model was obtained by Arnold and Beyer (2004) using another approach. The result indicates that the steady state mutation strength is by a constant factor too large compared to the optimal sphere model value that guarantees the maximal convergence rate toward the optimizer. As discussed in connection with Figures 11a and 11b, choosing larger $D$ can improve the situation as far as the steady state mutation strength is concerned. Equation (95) can be used to tune $D$ to a certain extent toward a target mutation strength. Resolving (95) for $D$ yields
formula
99
The applicability of the resulting $D$ in real CSA-ES algorithms must, however, be considered with care. As discussed in Section 7.3, too small $D$ values can result in oscillatory behavior.

8.2  Derivation for the Ellipsoid Model

For ellipsoid models other than the special case of the sphere model, the analytical derivation of the normalized steady state mutation strength from Eq. (90) is not possible because a closed-form expression for (88) is not available. Therefore, Eq. (71) is used as the starting point, and the linear approximation for the smallest eigenvalue, Eq. (77), is applied. That is, the resulting equation must be solved for the normalized steady state mutation strength. Using expressions (92) and (77), we obtain
formula
100
Equation (100) can then be solved for the normalized steady state mutation strength ,
formula
101
yielding finally the normalized steady state mutation strength:
formula
102
This equation depends only on the ellipsoid model coefficients as well as on the choice of strategy parameters of the ES. Inserting (102) into Eq. (77) yields the linear approximation of the steady state mode eigenvalue,
formula
103
That is, provided that the linear eigenvalue approximation holds, the CSA-ES approaches the optimizer, by virtue of Eq. (61), with the convergence rate (103). With the help of (102), a condition on the choice of $D$ can be derived. Requiring the convergence criterion (14) to hold, we obtain
formula
104
According to Figure 8, the linear approximation (77) is valid only up to a certain normalized mutation strength. Therefore, the validity of Eqs. (102), (103), and (104) is restricted, too. Additionally, the derivation of these equations by use of (92) is only admissible assuming that condition (58) holds, which constrains the range of the cumulation parameter,
formula
105
This must be kept in mind when applying these formulas. Figure 12 compares the theoretical predictions of the steady state convergence rate with measurements from real $(\mu/\mu_I,\lambda)$-CSA-ES runs as well as with iteratively generated results making use of scheme A. Two ellipsoid models are considered. The experimental convergence rates were obtained by running the $(\mu/\mu_I,\lambda)$-CSA-ES over a sufficiently long time until it reached the steady state. Then the values of the last part of the generations were averaged over 100 independent runs. After that, a linear polynomial was fitted to the logarithmic experimental data, yielding the rates displayed in Figure 12. This curve-fitting technique was also applied to the iterative values resulting from scheme A of Table 2. As one can see, there is good agreement of the linearized theory with the real ES runs. That is why the equations obtained can be used to estimate the expected runtime of the CSA-ES.
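The curve-fitting step just described amounts to a linear least-squares fit on a logarithmic scale. A minimal version (the data arrays are assumed to hold the recorded steady state generations and fitness values):

    # Empirical convergence rate: slope of a linear fit to log F over generations.
    import numpy as np

    def convergence_rate(gens, F_vals):
        slope, _intercept = np.polyfit(gens, np.log(F_vals), 1)
        return -slope          # positive rate of the log-linear fitness decrease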
Figure 12:

Convergence rates (steady state eigenvalues) realized by the $(\mu/\mu_I,\lambda)$-CSA-ES plotted against the search space dimension $N$ for two ellipsoid models (left and right). The experimental data are averaged over 100 independent runs.


8.3  Expected Runtime of the CSA-ES on Ellipsoid Models

The strategy’s steady state dynamics are governed by an exponential decrease of the yi components given by Eq. (61), where the inverse time constant is determined by Eq. (77). By inserting (61) into the ellipsoid fitness model (1) the steady state fitness dynamics can be determined. Starting from generation g0 sufficiently large such that transient effects have vanished, the fitness value after g generations is
formula
106
Consequently, the objective function value decreases exponentially fast. Equation (106) allows for the estimation of the expected running time $G$ within which the fitness value is improved by a given factor. Resolving (106) for $G$ and inserting (77) finally yields the expected runtime expression.
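Making the omitted algebra explicit under the exponential-decay model of Eq. (106), and writing the steady state mode eigenvalue as $\lambda_{\min}$ (notation assumed here):

\[
F^{(g_0+G)} \;=\; F^{(g_0)}\, e^{-\lambda_{\min} G}
\quad\Longrightarrow\quad
G \;=\; \frac{1}{\lambda_{\min}}\,\ln\!\frac{F^{(g_0)}}{F^{(g_0+G)}},
\]

so the expected number of generations needed to improve the fitness by a fixed factor is inversely proportional to $\lambda_{\min}$; inserting the linear approximation (77) expresses $G$ in terms of the search space dimension, the population sizes, the progress coefficient, and the ellipsoid coefficients.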