The behavior of the (μ/μ_I, λ)-Evolution Strategy (ES) with cumulative step size adaptation (CSA) on the ellipsoid model is investigated using dynamic systems analysis. First, a nonlinear system of difference equations is derived that describes the mean value evolution of the ES. This system is successively simplified, finally allowing for closed-form solutions of the steady state behavior in the asymptotic limit case of large search space dimensions. It is shown that the system exhibits linear convergence order. The steady state mutation strength is calculated, and it is shown that compared to standard settings in self-adaptive ESs, the CSA control rule allows for an approximately μ-fold larger mutation strength. This explains the superior performance of CSA in non-noisy environments. The results are used to derive a formula for the expected running time. Conclusions regarding the choice of the cumulation parameter c and the damping constant D are drawn.
The performance of evolution strategies (ESs) depends crucially on the optimal control of the mutation strength σ, which determines the length of the search steps used to generate offspring from parents. There are basically four established methods to learn/control the mutation strength: Rechenberg’s 1/5-rule (Rechenberg, 1973), self-adaptation (SA) (Rechenberg, 1973; Schwefel, 1977), meta-ES (Herdy, 1992; Rechenberg, 1994), and cumulative step size adaptation (CSA) (Ostermeier et al., 1994). Understanding and analyzing the working principles of these adaptation techniques by considering the ES in conjunction with the objective functions to be optimized allows for a well-grounded choice of strategy-specific parameters, such as the learning parameter and the damping constant. The analysis approach that has been most fruitful until now considers the ES plus objective function as a dynamic system (Meyer-Nieberg and Beyer, 2012). That is, the goal of the analysis is to determine the time evolution of the system. However, since ESs are probabilistic algorithms, this analysis concerns the dynamics of stochastic, most often nonlinear, systems. Because of the difficulties of such an analysis, progress in this field has been rather slow. The first fully analyzed algorithm was the (1, λ)-σSA-ES on the sphere (Beyer, 1996). In the following years progress was made in different directions concerning more complex, that is, recombinative ESs and also more complex objective functions such as ridge functions and a subset of positive definite quadratic forms (PDQFs); see, for instance, Arnold and Beyer (2004); Jägersküpper (2006); Beyer and Meyer-Nieberg (2006); Arnold (2007); and Beyer and Finck (2010).
The most advanced dynamic systems analysis of ESs was presented by Beyer and Melkozerov (2014), who investigated the (μ/μ_I, λ)-σSA-ES on the ellipsoid model. In that paper a new progress measure, the quadratic progress rate, was introduced to model the dynamics of the squared distances of the parental state to the optimizer. While that work completed the analysis of the isotropic self-adaptive standard ES, a similar analysis of CSA control has not advanced that far. The fitness models considered so far concern the cigar function (Arnold, 2007; Arnold and Beyer, 2010) and another special case of PDQFs consisting of a mixture of two sphere models (Beyer and Finck, 2010). These analyses were performed along the lines developed in Arnold (2002). Because of the inherent symmetries of those fitness models (cigar and mixture of two spheres), the dynamics in the search space can be lumped together, thus reducing the dynamics to a few state variables describing the approach to the optimizer. This aggregation concerns not only the parental state in the search space but also the evolution path cumulation that is used to measure the average length of the actually realized change steps between two consecutive parental states, which in turn is used to control the mutation strength σ. As a result, the analysis presented by Beyer and Melkozerov (2014) for the (μ/μ_I, λ)-σSA-ES cannot be directly transferred to the corresponding CSA-ES. It is the aim of this paper to extend the analysis method developed by Beyer and Melkozerov (2014) to handle the path cumulation in the CSA-ES. To this end, we deviate from the standard analysis developed by Arnold (2002) and fall back on the analysis method that has been successful in the analysis of SA. That is, we derive a self-adaptation response (SAR) function for the cumulative step size adaptation. This approach allows for an analysis of the CSA-ES similar to that of the (μ/μ_I, λ)-σSA-ES in Beyer and Melkozerov (2014).
The derivation of a SAR function for CSA is challenging, since a SAR function for CSA in the proper meaning of the word does not exist. The original SAR function (Beyer, 2001) is a one-generation progress measure that describes the expected relative σ change from generation g to g + 1. Path cumulation, however, is a process that is nonlocal in time; that is, it is the result of a cumulation process that incorporates a fading record of the parental steps taken. As a result, a SAR function for CSA must necessarily be a quantity that depends not only on the damping constant D (which corresponds to the learning parameter τ in SA) but also on the cumulation time constant 1/c.
The analysis to be presented can be regarded as an important preparatory step for an analysis of the CMA-ES (Hansen and Ostermeier, 2001), since it provides a system of difference equations that describes the evolution for ellipsoid models. Since CMA-ES transforms quadratic models via the square root of the covariance matrix into another quadratic model, this analysis also holds for these transformed models. That is, the analysis presented is a building block for a complete analysis of the CMA-ES.
The paper is organized as follows. After a short recap of the ideas of CSA and the ellipsoid model, results of the analysis of the self-adaptive ES are reviewed as the basis for the derivations presented in subsequent sections. Next, the CSA-relevant path cumulation is analyzed in Section 5, yielding a system of recurrence equations that are combined in Section 6 to form a system of evolution equations describing the mean value dynamics of the (μ/μ_I, λ)-CSA-ES. In a next step, simplifications are introduced in order to obtain a system of tractable evolution equations that allows for the calculation of a function describing the expected generational σ change in the steady state, similar to the self-adaptation response function used in the analysis of the SA-ES. The resulting system of evolution equations is then treated by an Ansatz similar to the one used by Beyer and Melkozerov (2014). The steady state of the normalized mutation strength dynamics is considered, and the influence of the strategy parameters c and D is discussed. In Section 8 the steady state mutation strength and the convergence rates are calculated in terms of closed-form expressions. The results are used to estimate the expected running time. Conclusions are drawn in Section 9.
2 The (μ/μ_I, λ)-CSA-ES Algorithm
The focus of this paper is on the dynamic behavior of the (μ/μ_I, λ)-CSA-ES. As indicated by the notation, nonelitist selection (“,” selection) rather than elitist selection (“+” selection) is considered. That is, in each generation only the λ offspring are involved in the selection process. Comma selection is advantageous in real-valued parameter optimization because it allows for the use of greater mutation strengths, which is, for example, beneficial when optimizing multimodal objective functions. The (μ/μ_I, λ)-CSA-ES controls the mutation strength, also referred to as step size, by cumulative step size adaptation (see Ostermeier et al., 1994). CSA gathers previously successful search steps in a fading search path history. The mutation strength is then adjusted depending on the length of this search path. Another option to adapt the mutation strength of the ES is self-adaptation. In contrast to CSA, it provides each offspring with an individual mutation strength computed from the recombined mutation strengths of the μ best offspring of the previous generation. This SA variant was investigated by Beyer and Melkozerov (2014).
The pseudocode of the (μ/μ_I, λ)-CSA-ES is presented in Table 1. First, the initial parental parameter vector, also referred to as the parental centroid, the initial mutation strength σ, and the initial search path are specified. Then λ offspring are generated (lines 3–5) by adding the product of the mutation strength and an N-dimensional random mutation vector to the parental centroid. The components of each mutation vector are independent and identically distributed standard normal variates. The corresponding fitness function value F_l of each offspring is calculated in line 6. In line 8 the mutation vectors of the μ best offspring with respect to fitness are recombined to create their centroid. As indicated by the index I, this centroid is generated using intermediate recombination. In this context the subscript m;λ denotes the mth best of the λ offspring (i.e., in the case of minimization, the offspring with the mth smallest fitness value). The centroid of the μ best mutation vectors is used in line 9 to compose a new parental centroid and in line 10 to update the search path. This search path contains a fading record of the strategy’s previous steps. The length of its memory is determined by the choice of the constant cumulation parameter c. The mutation strength is then updated in line 11 by multiplication with an exponential factor depending on the length of the search path as well as on the damping parameter D, a constant that determines the magnitude of the mutation strength updates. The sign of the exponent’s argument determines whether the mutation strength is increased or decreased. Long search paths indicate that the steps made by the ES collectively point in one direction and could be replaced with fewer but longer steps. Short search paths suggest that the strategy steps back and forth and thus that smaller step sizes should be beneficial.
After termination the strategy returns the current parental centroid, which is considered an approximation of the optimizer of the objective function.
[Table 1: Pseudocode of the (μ/μ_I, λ)-CSA-ES.]
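The loop structure described above can be sketched compactly. The following is a minimal, non-authoritative sketch: it assumes the common CSA variant in which the path is cumulated as s ← (1 − c)s + √(μ c(2 − c)) ⟨z⟩ and σ is updated from the squared path length; the exact normalization constants of Table 1 may differ.

```python
import numpy as np

def csa_es(F, y0, sigma0, mu, lam, c, D, n_gen, rng):
    """Sketch of a (mu/mu_I, lambda)-CSA-ES minimizing F with comma selection."""
    N = len(y0)
    y = np.asarray(y0, dtype=float)   # parental centroid
    sigma = float(sigma0)             # mutation strength (step size)
    s = np.zeros(N)                   # search path (fading step record)
    for _ in range(n_gen):
        Z = rng.standard_normal((lam, N))             # lambda mutation vectors
        fitness = np.array([F(y + sigma * z) for z in Z])
        best = np.argsort(fitness)[:mu]               # mu best of lambda offspring
        z_mean = Z[best].mean(axis=0)                 # intermediate recombination
        y = y + sigma * z_mean                        # new parental centroid
        s = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_mean  # path cumulation
        sigma *= np.exp((np.dot(s, s) - N) / (2.0 * D * N))       # sigma update
    return y, sigma

# Demo run on the sphere model (a stochastic run; no exact output guaranteed)
rng = np.random.default_rng(0)
sphere = lambda y: float(np.dot(y, y))
y_end, s_end = csa_es(sphere, np.full(10, 10.0), 1.0, mu=3, lam=10,
                      c=0.3, D=1.0, n_gen=300, rng=rng)
print(sphere(y_end) < sphere(np.full(10, 10.0)))  # fitness improved
```

The parameter values in the demo (c = 0.3, D = 1) are illustrative only; the paper derives recommendations for c and D.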
3 The Ellipsoid Model
The dynamic behavior of the (μ/μ_I, λ)-CSA-ES on ellipsoid model (1) is illustrated in Figure 1. It presents the results of typical runs of the ES, focusing on the squared components of the parental centroid as well as on the mutation strength dynamics. As it approaches the optimizer, the strategy continuously decreases the mutation strength over the generations.
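The ellipsoid model (1) is an axis-aligned PDQF. A minimal sketch, assuming the common form F(y) = Σ_i a_i y_i² with positive coefficients a_i (the specific coefficient choice a_i = i below is an assumption for illustration, not the paper's setting):

```python
def ellipsoid(y, a):
    """Ellipsoid model F(y) = sum_i a_i * y_i**2 (axis-aligned PDQF)."""
    return sum(ai * yi * yi for ai, yi in zip(a, y))

# Example: coefficients a_i = i, evaluated at the all-ones point
a = [i + 1 for i in range(5)]
print(ellipsoid([1.0] * 5, a))  # 1+2+3+4+5 = 15.0
```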
4 Extending Previous Results to CSA-ES
5 The Mutation Strength Dynamics
6 Evolution Equations
A first representation of the strategy’s evolution behavior is provided in Table 2, iterative scheme A. The one-generation behavior of the component-wise squared distance to the optimizer is modeled in (A.1) by use of Eq. (9). Using Eq. (20) to express the expected values in Eq. (39) yields the iterative relation (A.2) for the scalar product components . The iterative description of the squared length of the search path in (A.3) is obtained by inserting Eq. (21) into Eq. (41). Finally, the mutation strength adaptation is specified in (A.4) using Eq. (16).
Whether the modeling approach yields meaningful results can be checked by comparing the iteratively generated dynamics of the system of evolution equations A in Table 2 with experimental results of real (μ/μ_I, λ)-CSA-ES runs (see Figure 3). The typical long-term behavior of the ES is observed for the dynamics obtained by iterating scheme A from the chosen initial values; cumulation and damping parameters were held fixed. For small search space dimensions N, the experimental data of the ES deviate slightly from the theoretical predictions. These deviations diminish with increasing search space dimension. In fact, in both cases a good agreement of iterative and experimental dynamics can be observed.
In Figure 5 two phases in the dynamics of the (μ/μ_I, λ)-CSA-ES can be observed. After the start of the optimization the ES dynamics enter a transient phase, which is followed by the approach to a steady state behavior. The transient period is characterized by a decrease of the squared component curves and the σ values. The rate of this decline increases with i; that is, y_1 decreases significantly more slowly than y_N. The steady state behavior is characterized by a slower decrease at the same rate for all single components. In particular, the dynamics fall according to a log-linear law. The steady state of the σ values exhibits log-linear behavior as well, but with a different rate of decline.
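A log-linear steady state decline corresponds to linear convergence order; the rate can be estimated from a simulated time series by a least-squares fit in log space. A small illustrative helper (the synthetic series and its rate below are placeholders, not data from the paper):

```python
import math

def log_linear_rate(values):
    """Least-squares slope of log(values) versus generation index.

    For a dynamics y(g) ~ y(0) * exp(r * g), the returned slope
    estimates r (negative for a converging ES)."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    g_mean = (n - 1) / 2.0
    l_mean = sum(logs) / n
    num = sum((g - g_mean) * (l - l_mean) for g, l in enumerate(logs))
    den = sum((g - g_mean) ** 2 for g in range(n))
    return num / den

# Synthetic steady state dynamics decaying at rate -0.05 per generation
series = [10.0 * math.exp(-0.05 * g) for g in range(100)]
print(round(log_linear_rate(series), 6))  # -0.05
```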
7 Steady State Dynamics
7.1 Derivation of Simplified Evolution Equations via Self-Adaptation Response of CSA
Equation (59) has an interesting interpretation considering expected values. If the normalized mutation strength σ* is below its optimal sphere model value, then the steady state squared length of the evolution path is greater than the squared length of a random path (being N). Thus, by virtue of Table 1, line 11, σ is increased. As a result, σ* can increase toward the optimal value for the sphere model. In the opposite case, σ* decreases toward that value. That is, in a static case (also referred to as the scale-invariant case) the control rule in Table 1, line 11, drives σ* to its optimal sphere model value. Note, however, that in the real ES algorithm and its approximation schemes, for instance, Table 5, this does not happen, since the dynamics influence the σ evolution and a steady state will result. Determining the real steady state is done in the remainder of this paper.
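The direction of this control behavior can be made concrete with the σ update assumed in the sketches of this rewrite, σ ← σ exp((‖s‖² − N)/(2DN)); the precise normalization in Table 1, line 11, may differ:

```python
import math

def sigma_update(sigma, s_sq, N, D):
    """Assumed CSA step size update:
    sigma <- sigma * exp((||s||^2 - N) / (2 * D * N))."""
    return sigma * math.exp((s_sq - N) / (2.0 * D * N))

N, D = 100, 10.0
# Path longer than a random path (||s||^2 > N): sigma is increased
print(sigma_update(1.0, 130.0, N, D) > 1.0)  # True
# Path shorter than a random path (||s||^2 < N): sigma is decreased
print(sigma_update(1.0, 70.0, N, D) < 1.0)   # True
```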
The approximation quality of system D is validated in Figure 7 for different choices of the cumulation parameter c and the search space dimension N. In Figures 7a and 7c the cumulation parameter c is set such that condition (58) is not satisfied (given the ellipsoid coefficients considered). As a consequence, one observes larger deviations between the two iterative schemes C and D. However, these deviations are generally more pronounced in the transient phase of the evolutionary process, which is emphasized in the plots because of the logarithmic scale of the horizontal axes. Figures 7b and 7d display a scenario where condition (58) is fulfilled. Especially with growing dimensionality, a visually good agreement of both systems of evolution equations can be observed.
7.2 The Eigenvalue Problem
7.3 The Normalized Mutation Strength in the Steady State
Condition (90) allows for the calculation of the normalized steady state mutation strength σ*. Regarding both sides of Eq. (90) as functions of σ*, we see that the curves intersect at the normalized steady state mutation strength of the ES. For the ES on the sphere model as well as on the ellipsoid model, graphs for the considered search space dimension are shown in Figure 9. In each case, three different choices of the cumulation parameter c are considered; the damping parameter D is held fixed. The numerically computed solution of Eq. (90) is represented by the black dots. According to Eq. (88), the right-hand side of (90) is independent of the choice of the parameters c and D. The left-hand side, on the other hand, depends on c and D; see Eq. (55). Thus, variations in c lead to relocations of the intersection point. From this behavior the existence of an optimal c value can be conjectured that tunes the ES to operate at maximal progress rate.
The influence of D according to Eq. (55) is displayed in Figure 10 for the sphere and the ellipsoid. The D values are varied while holding the respective cumulation parameter constant. The red solid line in the figures corresponds to the right-hand side of Eq. (90) together with (88). The curves that depend on the parameter D are represented by the marked blue lines. As one can see, D influences the slope of these curves: increasing D while keeping c constant decreases the slope. As a consequence, the intersection point of both curves moves to the right; that is, the normalized steady state mutation strength is increased. Independent of the choice of the damping parameter D, all graphs intersect at the same point on the x-axis, which is the zero of the left-hand side of (90). For the sphere model this intersection point is independent of c and follows as the root of (55). In the case of the ellipsoid model, this zero varies with the cumulation parameter c; it shifts to the right for smaller c values. However, the corresponding steady state can only be obtained from Eq. (55) by numerical root finding.
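The numerical root finding mentioned above can be done, for instance, by bisection on the difference of the two curves. A generic sketch; the `lhs` and `rhs` functions below are placeholder stand-ins for the two sides of the fixed-point condition (the actual functions follow from Eqs. (55) and (88)):

```python
def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f on [lo, hi] by bisection; f(lo) and f(hi)
    must have opposite signs."""
    flo = f(lo)
    for _ in range(200):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if flo * f(mid) <= 0.0:
            hi = mid          # root lies in the left half
        else:
            lo, flo = mid, f(mid)  # root lies in the right half
    return 0.5 * (lo + hi)

# Placeholder curves: one rising with x (slope controlled by D in the
# paper), one falling with x; their intersection mimics the steady state.
lhs = lambda x: 0.5 * x
rhs = lambda x: 1.0 - 0.25 * x
x_star = bisect_root(lambda x: lhs(x) - rhs(x), 0.0, 10.0)
print(round(x_star, 6))  # 1.333333  (intersection at x = 4/3)
```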
The red solid line in Figure 10 represents the right-hand side of (90), which by virtue of (88) and (87) equals half the steady state progress measure. The latter determines via (61) the rate at which the ES approaches the optimizer in the steady state. Since it is determined by the steady state mutation strength, it in turn depends on the choice of D and c. Figure 11 displays these dependencies. To this end, the progress measure is rescaled in order to reduce the impact of the considered ellipsoid model as well as the impact of the population sizes on the realized progress. The resulting values are then plotted versus 1/c, the cumulation time constant that influences the fading of the search path memory within the CSA-ES. The sphere model and an ellipsoid case are considered. As one can see in Figure 11, for the ellipsoid there is almost no influence of the different damping constant D formulas on the progress rate toward the optimizer in the steady state. This differs from the case of the sphere model. A further ellipsoid case, not displayed in this paper, lies in between these two models.
As for the sphere model (Figures 11a and 11b), a smaller damping value seems to be a better choice than the standard recommendation of Hansen and Ostermeier (2001). However, this ignores the effect of possible oscillations that were neglected by considering the asymptotic solution of the iterative schemes using the Ansatz (61), (62). Using small D values in the update rule (Table 1, line 11) results in large generational σ changes, the driving force of the oscillations observed by Hansen (1998). These oscillations can lead to considerable regression of the strategy’s progress. That is why larger D values are recommended.