## Abstract

Theoretical analyses of evolution strategies are indispensable for gaining a deep understanding of their inner workings. For constrained problems, current research still focuses on rather simple test problems. This work presents a theoretical analysis of a multi-recombinative evolution strategy with cumulative step size adaptation applied to a conically constrained linear optimization problem. The state of the strategy is modeled by random variables and a stochastic iterative mapping is introduced. For the analytical treatment, fluctuations are neglected and the mean value iterative system is considered. Nonlinear difference equations are derived based on one-generation progress rates. Based on those, expressions for the steady state of the mean value iterative system are derived. By comparison with real algorithm runs, it is shown that under the assumptions considered, the theoretical derivations are able to predict the dynamics and the steady state values of the real runs.

## 1  Introduction

Thorough theoretical investigations of evolution strategies (ESs) are necessary for gaining a deep understanding of how they work. Much research has been done on analyzing ESs applied to unconstrained problems. For the constrained setting, there are still many aspects for which a deep theoretical understanding is missing. Theoretical comparisons of different infeasibility handling techniques are of interest. In addition, parameter choices have not yet been theoretically investigated in depth for constrained problems. Furthermore, theoretical comparisons of mutation strength control mechanisms are not fully explored. Moreover, as all of these heavily depend on the problem under consideration, more problems that model specific parts of practical constrained optimization problems need to be considered. As a step in that direction, this work theoretically analyzes a $(\mu/\mu_I,\lambda)$-ES with cumulative step size adaptation (CSA) applied to a conically constrained linear problem.

Regarding related work, a $(1,λ)$-ES with constraint handling by discarding infeasible offspring has been analyzed by Arnold (2011b) for a single linear constraint. Repair by projection has been considered (Arnold, 2011a) and a comparison with repair by reflection and repair by truncation has been performed by Hellwig and Arnold (2016). In all of that work, analysis of the respective algorithms' one-generation and steady state behavior considering CSA is performed. Expressions with integrals are numerically integrated. Based on Lagrangian constraint handling, Arnold and Porter (2015) presented a $(1+1)$-ES applied to a single linear inequality constraint with the sphere model. The one-generation behavior has been analyzed in that work. The insights gained from that analysis have been used to develop an algorithm that is experimentally evaluated.

A theoretical investigation based on Markov chains for a multirecombinative variant with Lagrangian constraint handling has been presented by Atamna et al. (2016). Investigation of a single linear constraint in that work has been extended to multiple linear constraints (Atamna et al., 2017).

The simple linearly constrained optimization problem with a single linear constraint used in Arnold (2011b), Arnold (2011a), and Hellwig and Arnold (2016) has been used as one step toward a better understanding of ESs for constrained optimization. However, as it does not have an optimal solution, it models only parts of the difficulties in handling constrained optimization problems. In particular, if the linear objective function gradient points toward the constraint, the ES has to avoid premature convergence. As a step further, Arnold (2013) has considered a conically constrained linear problem in which the linear objective function gradient coincides with the cone axis. As an extension, Arnold (2014) has proposed the more general problem that does not assume that the objective function gradient points in the direction of the cone axis. However, in the work presented in our article, the choice of the objective function gradient in the direction of the cone axis is used because it simplifies the theoretical analysis. This problem models the situation of the optimum lying on the boundary of the feasible region in addition to the premature convergence difficulty. Hence, the ES has to avoid premature convergence, but at the same time the steps need to get smaller and smaller during the process of approaching the optimum. Arnold (2013) has considered a $(1,\lambda)$-ES applied to that problem with constraint handling by discarding infeasible offspring. Integral expressions for the one-generation behavior have been derived. Numerical integration has been used to analyze the steady state behavior. For the $\sigma$ control strategy, CSA has been used. Spettel and Beyer (2018a) have considered the same problem and have analyzed a $(1,\lambda)$-$\sigma$-Self-Adaptation ES ($\sigma$SA-ES). It has been extended to the multirecombinative $(\mu/\mu_I,\lambda)$ variant (Spettel and Beyer, 2018b).
In contrast to the work by Arnold, closed-form expressions have been derived under asymptotic assumptions for the one-generation behavior, the mean value dynamics, and the steady state behavior. This enables one to directly analyze the influence of the different algorithmic parameters on the algorithm's behavior.

The contribution of this article is the analysis considering CSA instead of $\sigma$SA for the mutation strength control mechanism in the $(\mu/\mu_I,\lambda)$-ES with repair by projection applied to the conically constrained linear problem. Based on the derivations for the microscopic behavior in Spettel and Beyer (2018b), approximate closed-form expressions are derived for predicting the mean value dynamics and the steady state behavior of the multirecombinative CSA-ES with projection. Multirecombinative ESs with projection have already shown their suitability for constrained problems in real-world applications. Beyer and Finck (2012) have considered repair by projection in the context of a portfolio optimization problem. This approach gave rise to a more principled design using projection for optimization problems with linear constraints (Spettel et al., 2018). In line with those empirical investigations, the theoretical analysis in this work concerns multirecombinative ESs. In order to simplify the analysis, isotropic mutations are considered. This removes the need to analyze the covariance matrix adaptation. Again, the aim is to derive closed-form expressions under asymptotic considerations for predicting the mean value dynamics and the steady state behavior. Because the CSA is a completely different approach for adapting the mutation strength $\sigma$ in comparison with $\sigma$SA, one main interest is the comparison of those two $\sigma$ control mechanisms. CSA uses information gained from the observed steps taken in the search space, whereas $\sigma$SA uses the information from the objective function space. Furthermore, it is not immediately apparent how the cumulation parameter in the CSA should be chosen. The theoretical analysis gives insights in this regard.
A particular focus of the analysis lies on the difference between the choice of $c=\frac{1}{\sqrt{N}}$ proposed in early publications (Hansen and Ostermeier, 1997) on the covariance matrix adaptation evolution strategy (CMA-ES) and $c=\Theta\!\left(\frac{1}{N}\right)$ that is proposed in newer publications (Hansen and Ostermeier, 2001; Hansen, 2016).

The remainder of the article is organized as follows. Section 2 introduces the optimization problem under consideration and describes the algorithm that is analyzed. Section 3 concerns the theoretical analysis. First, a mean value iterative system that models the dynamics of the ES is derived in Section 3.1. Second, steady state considerations are shown in Section 3.2. For the theoretical considerations, plots comparing them with results of real ES runs are presented to show the approximation quality. Section 4 discusses the results and Section 5 concludes the article.

## 2  Problem and Algorithm

Minimization of
$f(\mathbf{x}) = x_1$
(1)
subject to constraints
$x_1^2 - \xi\sum_{k=2}^{N} x_k^2 \geq 0$
(2)
$x_1 \geq 0$
(3)
is considered in this work ($\mathbf{x} = (x_1,\ldots,x_N)^T \in \mathbb{R}^N$ and $\xi > 0$).

The state of an ES individual can be uniquely described in the $(x,r)^T$-space. It consists of $x$, the distance from $0$ in $x_1$-direction (cone axis), and $r$, the distance from the cone axis. Because isotropic mutations are considered in the ES, the coordinate system can be rotated (w.l.o.g.) such that $(\tilde{x},\tilde{r})^T$ corresponds to $(\tilde{x},\tilde{r},0,\ldots,0)^T$ in the parameter space. Figure 1 visualizes the problem. The equation for the cone boundary is $r = \frac{x}{\sqrt{\xi}}$, which follows from Equation (2). The projection line can be derived using the cone direction vector $\left(1,\frac{1}{\sqrt{\xi}}\right)^T$ and its counterclockwise rotation by 90 degrees $\left(-\frac{1}{\sqrt{\xi}},1\right)^T$, yielding $r = -\sqrt{\xi}\,x + q\left(\sqrt{\xi}+\frac{1}{\sqrt{\xi}}\right)$. The values $x$ and $\tilde{x}$ denote a parental individual and an offspring individual, respectively. The corresponding mutation is indicated as $\sigma\tilde{\mathbf{z}}$. The values of $x$ and $r$ after projection are denoted by $q$ and $q_r$, respectively.

Figure 1:

The conically constrained optimization problem in $N$ dimensions shown in the $(x,r)^T$-space. As shown in the figure, the offspring individual $\tilde{\mathbf{x}}$ is infeasible and therefore projected onto the cone boundary at $(q,q_r)^T$.

As mentioned in the introduction, the algorithm to be analyzed is a $(\mu/\mu_I,\lambda)$-CSA-ES with repair by projection applied to the problem introduced above. Its pseudocode is shown in Algorithm 1. In the beginning, the parameters are initialized (Lines 1 to 2). In the generational loop, $\lambda$ offspring are created (Lines 6 to 16). Each offspring's parameter vector is sampled from a multivariate normal distribution with mean $\mathbf{x}^{(g)}$ and standard deviation $\sigma^{(g)}$ in Lines 7 and 8. If the generated offspring is infeasible ($\text{isFeasible}(\mathbf{x}) = x_1 \geq 0 \wedge x_1^2 - \xi\sum_{k=2}^{N} x_k^2 \geq 0$), its parameter vector is projected onto the point on the boundary of the feasible region that minimizes the Euclidean distance to the offspring point. The mutation vector leading to this repaired point is then computed backwards (Lines 9 to 12). Projection means solving the optimization problem
$\hat{\mathbf{y}} = \operatorname*{arg\,min}_{\mathbf{y}} \|\mathbf{y}-\mathbf{x}\|_2 \quad \text{s.t.} \quad y_1^2 - \xi\sum_{k=2}^{N} y_k^2 \geq 0, \;\; y_1 \geq 0,$
(4)
where $x$ is the individual to be projected. The function
$\hat{\mathbf{y}} = \text{repair}(\mathbf{x})$
(5)
is introduced, which returns $y^$ of the problem stated in (4). Appendix A in the supplementary material of Spettel and Beyer (2018b) shows a geometrical approach for deriving a closed-form solution to the projection optimization problem (4) that this work builds on. Given an infeasible individual $x$, it reads
$\hat{\mathbf{y}} = \begin{cases} \frac{\xi}{\xi+1}\left(x_1+\frac{\|\mathbf{r}\|}{\sqrt{\xi}}\right)\left(1,\frac{x_2}{\sqrt{\xi}\,\|\mathbf{r}\|},\ldots,\frac{x_N}{\sqrt{\xi}\,\|\mathbf{r}\|}\right)^T & \text{if } \frac{\xi}{\xi+1}\left(x_1+\frac{\|\mathbf{r}\|}{\sqrt{\xi}}\right) > 0 \\ \mathbf{0} & \text{otherwise,} \end{cases}$
(6)
where $\|\mathbf{r}\| = \sqrt{\sum_{k=2}^{N} x_k^2}$. After possible repair, the offspring's fitness is determined in Line 13. The next generation's parental individual $\mathbf{x}^{(g+1)}$ (Line 18) and the next generation's mutation strength $\sigma^{(g+1)}$ (Line 20) are computed next. The next generation's parental parameter vector is set to the mean of the $\mu$ best (w.r.t. fitness) offspring parameter vectors.1 For the mutation strength update, first the cumulative $\mathbf{s}$-vector is updated. The cumulation parameter $c$ determines the fading strength. The mutation strength is then updated using this $\mathbf{s}$-vector. The parameter $D$ acts as a damping factor. If the squared length of the $\mathbf{s}$-vector is smaller than $N$, the step size is decreased. Otherwise, the step size is increased. Intuitively, this means that multiple correlated steps allow a larger step size and vice versa. The update of the generation counter ends one iteration of the generation loop. The values $x^{(g)}$, $r^{(g)}$, $q_l$, $q_l'$, $\langle q\rangle$, and $\langle q_r\rangle$ are only needed in the theoretical analysis and can be removed in practical applications of the ES. They are indicated in the algorithm in Lines 4, 5, 14, 15, 21, and 22, respectively.
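As a concrete illustration of the repair step, the closed-form projection of Equation (6) can be sketched in Python. This is an illustrative sketch only (plain list-based vectors, function names chosen by us), not the paper's pseudocode:

```python
import math

def is_feasible(x, xi):
    """Feasibility according to Equations (2) and (3)."""
    return x[0] >= 0 and x[0] ** 2 - xi * sum(v * v for v in x[1:]) >= 0

def repair(x, xi):
    """Project an infeasible point onto the cone boundary, following Equation (6).

    An infeasible point with ||r|| = 0 has x_1 < 0, for which q <= 0 and the
    zero vector is returned before any division by ||r|| takes place.
    """
    r_norm = math.sqrt(sum(v * v for v in x[1:]))
    # q is the x_1-coordinate of the projected point (intersection of the
    # projection line with the cone boundary r = x / sqrt(xi)).
    q = (xi / (xi + 1.0)) * (x[0] + r_norm / math.sqrt(xi))
    if q > 0:
        return [q] + [q * v / (math.sqrt(xi) * r_norm) for v in x[1:]]
    return [0.0] * len(x)
```

Projecting an infeasible point this way places it exactly on the cone boundary, that is, $y_1^2 = \xi\sum_{k=2}^{N} y_k^2$ holds for the repaired point (up to floating-point rounding).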

Figure 2 shows an example of the $x$- and $r$-dynamics of Algorithm 1 (solid line) in comparison with results of the closed-form approximate iterative system (dotted line) that is derived in the sections that follow. As one can see, the theoretical prediction comes close to the real dynamics for the case shown.

Figure 2:

Real $(3/3_I,10)$-CSA-ES run mean value dynamics (solid line) in comparison to the iteration of the closed-form (approximate) iterative system (dotted line).

## 3  Theoretical Analysis

To completely describe the state of the ES, the random variables $\sigma$, $\mathbf{s}$, and the squared length $\|\mathbf{s}\|^2$ need to be modeled in addition to the variables for the position in the parameter space, $x$ and $r$. The random vector $\mathbf{s}$ is decomposed into its component $s_1^{(g)}$ along the cone axis and the remaining part in the direction of the parental individual's $2..N$ components
$s_\odot^{(g)} := \frac{1}{r^{(g)}}\sum_{k=2}^{N}\left(\mathbf{x}^{(g)}\right)_k\left(\mathbf{s}^{(g)}\right)_k.$
(7)

### 3.1  Derivation of a Mean Value Iterative System for Modeling the Dynamics of the ES

The aim of this section is to derive (approximate) closed-form expressions for predicting the mean value dynamics of the ES. To make the distinction between the random variable and its mean value in the iterative system clear, $\bar{z} := \mathrm{E}[z]$ is used to denote the expected value of a random variate $z$. Thus, the iterative system can be written as
$\begin{pmatrix} \overline{x^{(g+1)}} \\ \overline{r^{(g+1)}} \\ \overline{s_1^{(g+1)}} \\ \overline{s_\odot^{(g+1)}} \\ \overline{\|\mathbf{s}^{(g+1)}\|^2} \\ \overline{\sigma^{(g+1)}} \end{pmatrix} \leftarrow \begin{pmatrix} \overline{x^{(g)}} \\ \overline{r^{(g)}} \\ \overline{s_1^{(g)}} \\ \overline{s_\odot^{(g)}} \\ \overline{\|\mathbf{s}^{(g)}\|^2} \\ \overline{\sigma^{(g)}} \end{pmatrix} + \begin{pmatrix} \epsilon_x^{(g)} \\ \epsilon_r^{(g)} \\ \epsilon_{s_1}^{(g)} \\ \epsilon_{s_\odot}^{(g)} \\ \epsilon_{\|\mathbf{s}\|^2}^{(g)} \\ \epsilon_{\sigma}^{(g)} \end{pmatrix},$
(8)
where $\overline{x^{(g)}}$, $\overline{r^{(g)}}$, $\overline{s_1^{(g)}}$, $\overline{s_\odot^{(g)}}$, $\overline{\|\mathbf{s}^{(g)}\|^2}$, and $\overline{\sigma^{(g)}}$ represent the expected value parts and $\epsilon_x^{(g)}$, $\epsilon_r^{(g)}$, $\epsilon_{s_1}^{(g)}$, $\epsilon_{s_\odot}^{(g)}$, $\epsilon_{\|\mathbf{s}\|^2}^{(g)}$, and $\epsilon_\sigma^{(g)}$ represent the fluctuation parts. Assuming that the fluctuation parts are negligible, the mean value iterative system
$\begin{pmatrix} \overline{x^{(g+1)}} \\ \overline{r^{(g+1)}} \\ \overline{s_1^{(g+1)}} \\ \overline{s_\odot^{(g+1)}} \\ \overline{\|\mathbf{s}^{(g+1)}\|^2} \\ \overline{\sigma^{(g+1)}} \end{pmatrix} \leftarrow \begin{pmatrix} \overline{x^{(g)}} \\ \overline{r^{(g)}} \\ \overline{s_1^{(g)}} \\ \overline{s_\odot^{(g)}} \\ \overline{\|\mathbf{s}^{(g)}\|^2} \\ \overline{\sigma^{(g)}} \end{pmatrix}$
(9)
can be stated. This section presents derivations of difference equations for the system (9). Using those derived difference equations allows one to approximately predict the mean value behavior by iterating the mean value system. In Section 3.1.1, difference equations are presented for expressing $\overline{x^{(g+1)}}$ with $\overline{x^{(g)}}$ and $\overline{r^{(g+1)}}$ with $\overline{r^{(g)}}$ by using the respective local progress rates. Sections 3.1.2, 3.1.3, and 3.1.4 deal with the derivation of difference equations for $\overline{s_1}$, $\overline{s_\odot}$, and $\overline{\|\mathbf{s}\|^2}$, respectively. They are derived from the corresponding update steps of Algorithm 1 and they also make use of the local progress rates. Finally, the difference equation for $\overline{\sigma}$ is stated in Section 3.1.5, the derived system of equations is summarized, and it is compared to real ES runs in Section 3.1.6.

#### 3.1.1  Derivation of Mean Value Difference Equations for $x$ and $r$

The starting points for the derivation of mean value difference equations for $x$ and $r$ are the progress rates in $x$ and $r$ direction. Their definitions read
$\varphi_x\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right) := \mathrm{E}\!\left[\overline{x^{(g)}}-x^{(g+1)}\,\middle|\,\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right]$
(10)
$\varphi_r\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right) := \mathrm{E}\!\left[\overline{r^{(g)}}-r^{(g+1)}\,\middle|\,\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right].$
(11)
They describe the one-generation expected change in the parameter space. The normalizations
$\varphi_x^* := \frac{N\varphi_x}{\overline{x^{(g)}}},$
(12)
$\varphi_r^* := \frac{N\varphi_r}{\overline{r^{(g)}}},$
(13)
and
$\sigma^* := \frac{N\sigma}{\overline{r^{(g)}}}$
(14)
are introduced in order to have quantities that are independent of the position in the search space. Using Equation (10) with Equation (12) and Equation (11) with Equation (13), the equations
$\overline{x^{(g+1)}} = \overline{x^{(g)}} - \overline{x^{(g)}}\,\frac{\varphi_x^{(g)*}}{N} = \overline{x^{(g)}}\left(1-\frac{\varphi_x^{(g)*}}{N}\right)$
(15)
$\overline{r^{(g+1)}} = \overline{r^{(g)}} - \overline{r^{(g)}}\,\frac{\varphi_r^{(g)*}}{N} = \overline{r^{(g)}}\left(1-\frac{\varphi_r^{(g)*}}{N}\right)$
(16)
follow. Approximations for $\varphi_x^{(g)*}$ and $\varphi_r^{(g)*}$ have already been derived by Spettel and Beyer (2018b, Equations (37) and (38)). They are recapped here to make this work self-contained. Note that in this regard the present work builds on Spettel and Beyer (2018b), in which expressions for $\varphi_x^{(g)*}$ and $\varphi_r^{(g)*}$ have been derived under the asymptotic assumptions of sufficiently large values of $\xi$ and $N$. In those derivations, two cases have been distinguished. If one considers the ES being far from the cone boundary, offspring are feasible with overwhelming probability. The opposite case of being in the vicinity of the cone boundary results in infeasible offspring almost surely. These observations allow simplifications for the former case because the projection can be ignored. Both cases are combined into single equations by weighting the feasible and infeasible cases with an approximation for the offspring feasibility and offspring infeasibility probability, respectively. The $r$-distribution in those derivations has been approximated by a normal distribution $\mathcal{N}(\bar{r},\sigma_r^2)$ where
$\bar{r} = \overline{r^{(g)}}\,\sqrt{1+\frac{\overline{\sigma^{(g)*}}^2}{N}\left(1-\frac{1}{N}\right)}$
(17)
and
$\sigma_r = \overline{r^{(g)}}\,\frac{\overline{\sigma^{(g)*}}}{N}\,\sqrt{\frac{1+\frac{\overline{\sigma^{(g)*}}^2}{2N}\left(1-\frac{1}{N}\right)}{1+\frac{\overline{\sigma^{(g)*}}^2}{N}\left(1-\frac{1}{N}\right)}}$
(18)
(refer to Appendix B in the supplementary material of Spettel and Beyer (2018b) for the detailed derivation). The results that build the basis for the following CSA analysis are briefly recapped here in order to make the present work self-contained.2 The expression for $\varphi_x^{(g)*}$ has been derived as
$\varphi_x^{(g)*} \approx P_{\mathrm{feas}}\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right)\frac{\overline{r^{(g)}}}{\overline{x^{(g)}}}\,\overline{\sigma^{(g)*}}\,c_{\mu/\mu,\lambda}+\left[1-P_{\mathrm{feas}}\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right)\right]\times\underbrace{\left[\frac{N}{1+\xi}\left(1-\frac{\sqrt{\xi}\,\overline{r^{(g)}}}{\overline{x^{(g)}}}\sqrt{1+\frac{\overline{\sigma^{(g)*}}^2}{N}}\right)+\frac{\xi}{1+\xi}\,\frac{\sqrt{\xi}\,\overline{r^{(g)}}}{\overline{x^{(g)}}}\,\overline{\sigma^{(g)*}}\,c_{\mu/\mu,\lambda}\,\frac{\sqrt{1+\frac{1}{\xi}}\left(1+\frac{\overline{\sigma^{(g)*}}^2}{2N}\right)}{1+\frac{\overline{\sigma^{(g)*}}^2}{N}}\right]}_{=:\,\varphi_x^{*\,\mathrm{infeas}\,(g)}}$
(19)
in Spettel and Beyer (2018b) and the one for $φr(g)*$ has been derived as
$\varphi_r^{(g)*} \approx P_{\mathrm{feas}}\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right) N\left(1-\sqrt{1+\frac{\overline{\sigma^{(g)*}}^2}{\mu N}}\right)+\left[1-P_{\mathrm{feas}}\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right)\right] N\left(1-\frac{\overline{x^{(g)}}}{\sqrt{\xi}\,\overline{r^{(g)}}}\left(1-\frac{\varphi_x^{*\,\mathrm{infeas}\,(g)}}{N}\right)\frac{\sqrt{1+\frac{\overline{\sigma^{(g)*}}^2}{\mu N}}}{\sqrt{1+\frac{\overline{\sigma^{(g)*}}^2}{N}}}\right)$
(20)
in Spettel and Beyer (2018b). An approximation for the offspring feasibility probability has been derived in Spettel and Beyer (2018b) as
$P_{\mathrm{feas}}\!\left(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}\right) \simeq \Phi\!\left[\frac{1}{\overline{\sigma^{(g)}}}\left(\frac{\overline{x^{(g)}}}{\sqrt{\xi}}-\bar{r}\right)\right],$
(21)
where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution. $\varphi_x^{*\,\mathrm{infeas}\,(g)}$ denotes the infeasible part of Equation (19). The constant $c_{\mu/\mu,\lambda}$ is a so-called progress coefficient. A definition is given in Beyer (2001, Eq. 6.102, p. 247). It reads
$c_{\mu/\mu,\lambda} := \frac{\lambda-\mu}{2\pi}\binom{\lambda}{\mu}\int_{t=-\infty}^{t=\infty} \mathrm{e}^{-t^2}\left[\Phi(t)\right]^{\lambda-\mu-1}\left[1-\Phi(t)\right]^{\mu-1}\mathrm{d}t.$
(22)
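The integral in Equation (22) has no elementary closed form, but it is straightforward to evaluate numerically. The following sketch uses the trapezoidal rule on a truncated interval (the integration bounds and step count are ad hoc choices of ours):

```python
import math

def std_normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def c_mu_lambda(mu, lam, lo=-8.0, hi=8.0, steps=4000):
    """Progress coefficient c_{mu/mu,lambda} of Equation (22), trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        phi = std_normal_cdf(t)
        val = math.exp(-t * t) * phi ** (lam - mu - 1) * (1.0 - phi) ** (mu - 1)
        total += val if 0 < i < steps else 0.5 * val  # trapezoid endpoint weights
    return (lam - mu) / (2.0 * math.pi) * math.comb(lam, mu) * total * h
```

As a sanity check, for $\mu=1$, $\lambda=2$ the integral reduces to $\int \mathrm{e}^{-t^2}\mathrm{d}t = \sqrt{\pi}$, so $c_{1,2} = 1/\sqrt{\pi} \approx 0.564$ is reproduced by this quadrature.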

#### 3.1.2  Derivation of a Mean Value Difference Equation for $s_1$

For $s_1$, a mean value difference equation can be derived using the update rule from Line 19 of Algorithm 1. Computation of the expected value with $\overline{s_1^{(g+1)}} := \mathrm{E}\!\left[s_1^{(g+1)}\right]$ directly yields
$\overline{s_1^{(g+1)}} = (1-c)\,\overline{s_1^{(g)}} + \sqrt{\mu c(2-c)}\;\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right].$
(23)
$\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right]$ can be expressed with the progress rate in $x$-direction $\varphi_x$. From the definition of the $x$ progress rate,
$\varphi_x^{(g)} = \mathrm{E}\!\left[x^{(g)}-x^{(g+1)}\right] = \mathrm{E}\!\left[x^{(g)}-\left(x^{(g)}+\sigma^{(g)}\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right)\right] = -\overline{\sigma^{(g)}}\,\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right]$
(24)
follows. Therefore,
$\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right] = -\frac{\varphi_x^{(g)}}{\overline{\sigma^{(g)}}}$
(25)
holds. Using Equation (25) and Equation (14),
$\overline{s_1^{(g+1)}} = (1-c)\,\overline{s_1^{(g)}} + \sqrt{\mu c(2-c)}\left(-\frac{N\varphi_x^{(g)}}{\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}}\right)$
(26)
follows.

#### 3.1.3  Derivation of a Mean Value Difference Equation for $s_\odot$

The update rule from Line 19 of Algorithm 1 with consideration of Equation (7) can be used to derive a mean value difference equation for $s_\odot$. The detailed derivation is provided in the supplementary material (Appendix A), available at https://www.mitpressjournals.org/doi/suppl/10.1162/evco_a_00261, leading to
$\overline{s_\odot^{(g+1)}} \simeq (1-c)\left[1+\frac{\overline{\sigma^{(g)*}}}{N}\left(-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}-\frac{\overline{\sigma^{(g)*}}}{2\mu}\right)\right]\overline{s_\odot^{(g)}}+\sqrt{\mu c(2-c)}\left(-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}+\frac{\overline{\sigma^{(g)*}}}{2\mu}\right).$
(27)

#### 3.1.4  Derivation of a Mean Value Difference Equation for $\|\mathbf{s}\|^2$

Again, using the update rule from Line 19 of Algorithm 1,
$\|\mathbf{s}^{(g+1)}\|^2 = \left\|(1-c)\,\mathbf{s}^{(g)}+\sqrt{\mu c(2-c)}\,\langle\tilde{\mathbf{z}}^{(g)}\rangle\right\|^2$
(28)
$= (1-c)^2\|\mathbf{s}^{(g)}\|^2 + 2(1-c)\sqrt{\mu c(2-c)}\,\mathbf{s}^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle + \mu c(2-c)\,\|\langle\tilde{\mathbf{z}}^{(g)}\rangle\|^2$
(29)
can be derived. For treating $\mathbf{s}^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle$, the vector $\mathbf{s}^{(g)}$ can be decomposed into a sum of vectors in the direction of the cone axis $\mathbf{e}_1^{(g)}$, in the direction of the parental individual's $2..N$ components $\mathbf{e}_\odot^{(g)}$, and in a direction $\mathbf{e}_\ominus^{(g)}$ that is orthogonal to $\mathbf{e}_1^{(g)}$ and $\mathbf{e}_\odot^{(g)}$. Formally, this can be written as
$\mathbf{s}^{(g)} = s_1^{(g)}\mathbf{e}_1^{(g)} + s_\odot^{(g)}\mathbf{e}_\odot^{(g)} + s_\ominus^{(g)}\mathbf{e}_\ominus^{(g)},$
(30)
where $\|\mathbf{e}_1^{(g)}\| = \|\mathbf{e}_\odot^{(g)}\| = \|\mathbf{e}_\ominus^{(g)}\| = 1$ and $\mathbf{e}_1^{(g)T}\mathbf{e}_\odot^{(g)} = \mathbf{e}_1^{(g)T}\mathbf{e}_\ominus^{(g)} = \mathbf{e}_\odot^{(g)T}\mathbf{e}_\ominus^{(g)} = 0$. $s_1^{(g)}$, $s_\odot^{(g)}$, and $s_\ominus^{(g)}$ denote the corresponding projections in those directions. Consequently,
$\mathbf{s}^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle = s_1^{(g)}\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1 + s_\odot^{(g)} z_\odot^{(g)} + s_\ominus^{(g)}\mathbf{e}_\ominus^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle$
(31)
and subsequently
$\mathrm{E}\!\left[\mathbf{s}^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle\right] = \mathrm{E}\!\left[s_1^{(g)}\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right] + \mathrm{E}\!\left[s_\odot^{(g)} z_\odot^{(g)}\right] + \mathrm{E}\!\left[s_\ominus^{(g)}\mathbf{e}_\ominus^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle\right]$
(32)
follow. Taking into account the statistical independence between the cumulation path and a particular generation's mutations allows writing
$\mathrm{E}\!\left[\mathbf{s}^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle\right] = \mathrm{E}\!\left[s_1^{(g)}\right]\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right] + \mathrm{E}\!\left[s_\odot^{(g)}\right]\mathrm{E}\!\left[z_\odot^{(g)}\right] + \mathrm{E}\!\left[s_\ominus^{(g)}\right]\mathrm{E}\!\left[\mathbf{e}_\ominus^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle\right]$
(33)
$\simeq \mathrm{E}\!\left[s_1^{(g)}\right]\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right] + \mathrm{E}\!\left[s_\odot^{(g)}\right]\mathrm{E}\!\left[z_\odot^{(g)}\right].$
(34)
Again, $\mathrm{E}\!\left[\mathbf{e}_\ominus^{(g)T}\langle\tilde{\mathbf{z}}^{(g)}\rangle\right]$ vanishes because those mutations are selectively neutral and isotropic. Taking the expectation of Equation (29), considering Equations (14), (25), (34), and (A.25), and using $\mathrm{E}\!\left[\|\langle\tilde{\mathbf{z}}^{(g)}\rangle\|^2\right] \simeq \frac{N}{\mu}$,
$\overline{\|\mathbf{s}^{(g+1)}\|^2} \simeq (1-c)^2\,\overline{\|\mathbf{s}^{(g)}\|^2} + 2(1-c)\sqrt{\mu c(2-c)}\times\left[\overline{s_1^{(g)}}\,\mathrm{E}\!\left[\left(\langle\tilde{\mathbf{z}}^{(g)}\rangle\right)_1\right] + \overline{s_\odot^{(g)}}\,\mathrm{E}\!\left[z_\odot^{(g)}\right]\right] + c(2-c)N$
(35)
$≃(1-c)2||s(g)||2¯+2(1-c)μc(2-c)×s1(g)¯-Nφx(g)σ(g)*¯r(g)¯+s⊙(g)¯-Nφr2(g)2σ(g)*¯r(g)¯2-σ(g)*¯2μ+c(2-c)N$
(36)
follows.
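The orthogonal decomposition in Equations (30) and (31) can be verified numerically. The following sketch (with arbitrary synthetic vectors $\mathbf{x}$, $\mathbf{s}$, and $\mathbf{z}$, not data from the paper) constructs $\mathbf{e}_1$, $\mathbf{e}_\odot$, and $\mathbf{e}_\ominus$ explicitly and checks that $\mathbf{s}^T\mathbf{z}$ splits into the three directional contributions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20

x = rng.normal(size=N)   # synthetic parent position
s = rng.normal(size=N)   # synthetic cumulative path vector

# e1: unit vector along the cone axis (the first coordinate direction).
e1 = np.zeros(N)
e1[0] = 1.0
# e_odot: unit vector in the direction of the parent's components 2..N.
e_odot = np.concatenate(([0.0], x[1:] / np.linalg.norm(x[1:])))
# Projections of s onto e1 and e_odot; the remainder defines e_ominus.
s1 = s @ e1
s_odot = s @ e_odot
rem = s - s1 * e1 - s_odot * e_odot
s_ominus = np.linalg.norm(rem)
e_ominus = rem / s_ominus

# Equation (31): s^T z splits into the three directional contributions.
z = rng.normal(size=N)
lhs = s @ z
rhs = s1 * z[0] + s_odot * (e_odot @ z) + s_ominus * (e_ominus @ z)
```

The identity holds exactly (up to floating-point rounding) because the remainder `rem` is by construction orthogonal to both $\mathbf{e}_1$ and $\mathbf{e}_\odot$.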

#### 3.1.5  Derivation of a Mean Value Difference Equation for $\sigma$

From the update rule of $\sigma$ in Line 20 of Algorithm 1, $\sigma^{(g+1)} = \sigma^{(g)}\exp\!\left(\frac{\|\mathbf{s}^{(g+1)}\|^2-N}{2DN}\right)$ follows for the update of the mutation strength. Taking expected values and noting that $\sigma^{(g)}$ is constant w.r.t. $\|\mathbf{s}^{(g+1)}\|^2$, one obtains $\overline{\sigma^{(g+1)}} = \overline{\sigma^{(g)}}\,\mathrm{E}\!\left[\exp\!\left(\frac{\|\mathbf{s}^{(g+1)}\|^2-N}{2DN}\right)\right]$. Assuming that the fluctuations of $\|\mathbf{s}^{(g+1)}\|^2$ around its expected value are sufficiently small, the expected value can be pulled into the exponential function, yielding
$\overline{\sigma^{(g+1)}} \simeq \overline{\sigma^{(g)}}\exp\!\left(\frac{\overline{\|\mathbf{s}^{(g+1)}\|^2}-N}{2DN}\right).$
(37)
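For reference, the two update rules of Algorithm 1 that Sections 3.1.2 to 3.1.5 model (the path cumulation of Line 19 and the mutation strength update of Line 20) can be sketched as follows (a sketch only; the function and variable names are ours):

```python
import numpy as np

def csa_update(s, z_mean, sigma, mu, c, D, N):
    """One CSA step: cumulate the search path (Line 19), then adapt sigma (Line 20).

    s      -- cumulative path vector s^(g)
    z_mean -- recombined mutation vector <z~^(g)> of the mu best offspring
    """
    s_new = (1.0 - c) * s + np.sqrt(mu * c * (2.0 - c)) * z_mean
    # ||s||^2 > N signals correlated steps -> increase sigma; < N -> decrease.
    sigma_new = sigma * np.exp((s_new @ s_new - N) / (2.0 * D * N))
    return s_new, sigma_new
```

A path whose squared length equals exactly $N$ leaves $\sigma$ unchanged, which is the neutrality point the $\sigma$ update is built around.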

#### 3.1.6  Summary of the Mean Value Difference Equations

$\overline{x^{(g+1)}} = \overline{x^{(g)}}\left(1-\frac{\varphi_x^{(g)*}}{N}\right)$
(38)
$\overline{r^{(g+1)}} = \overline{r^{(g)}}\left(1-\frac{\varphi_r^{(g)*}}{N}\right)$
(39)
$\overline{s_1^{(g+1)}} \simeq (1-c)\,\overline{s_1^{(g)}} + \sqrt{\mu c(2-c)}\left(-\frac{N\varphi_x^{(g)}}{\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}}\right)$
(40)
$\overline{s_\odot^{(g+1)}} \simeq (1-c)\left[1+\frac{\overline{\sigma^{(g)*}}}{N}\left(-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}-\frac{\overline{\sigma^{(g)*}}}{2\mu}\right)\right]\overline{s_\odot^{(g)}}+\sqrt{\mu c(2-c)}\left(-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}+\frac{\overline{\sigma^{(g)*}}}{2\mu}\right)$
(41)
$\overline{\|\mathbf{s}^{(g+1)}\|^2} \simeq (1-c)^2\,\overline{\|\mathbf{s}^{(g)}\|^2} + 2(1-c)\sqrt{\mu c(2-c)}\times\left[\overline{s_1^{(g)}}\left(-\frac{N\varphi_x^{(g)}}{\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}}\right) + \overline{s_\odot^{(g)}}\left(-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}-\frac{\overline{\sigma^{(g)*}}}{2\mu}\right)\right] + c(2-c)N$
(42)
$\overline{\sigma^{(g+1)}} \simeq \overline{\sigma^{(g)}}\exp\!\left(\frac{\overline{\|\mathbf{s}^{(g+1)}\|^2}-N}{2DN}\right)$
(43)
$\overline{\sigma^{(g+1)*}} = \frac{N\,\overline{\sigma^{(g+1)}}}{\overline{r^{(g+1)}}}.$
(44)
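To illustrate the structure of Equations (38) and (39): if the normalized progress rates were held constant over the generations (a simplifying assumption made here purely for illustration; in the actual system they depend on the state), the mean values $\overline{x}$ and $\overline{r}$ would decay geometrically with factor $1-\varphi^*/N$ per generation. A minimal sketch:

```python
def iterate_mean_values(x0, r0, phi_x_star, phi_r_star, N, generations):
    """Iterate Equations (38) and (39) with (artificially) constant normalized
    progress rates; x and r then decay geometrically with factor (1 - phi*/N)."""
    x, r = x0, r0
    for _ in range(generations):
        x = x * (1.0 - phi_x_star / N)
        r = r * (1.0 - phi_r_star / N)
    return x, r
```

After $g$ generations this yields $\overline{x^{(g)}} = \overline{x^{(0)}}\left(1-\varphi_x^*/N\right)^g$, that is, log-linear convergence as long as the progress rates stay positive.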
The mean value dynamics of the $(3/3_I,10)$-CSA-ES on the conically constrained problem are shown in Figure 3 for $N=400$, $\xi=10$, $c=\frac{1}{\sqrt{N}}$, and $D=\frac{1}{c}$. The agreement of the simulations and the derived expressions is not perfect due to the asymptotic assumptions used, but the prediction comes close to the real behavior. In particular, one observes that the lines of the iteration with one-generation experiments are very similar to the lines generated by real ES runs. Consequently, the modeling of the system with Equations (38) to (44) is appropriate, and the deviations of the theoretically derived expressions are mainly due to approximations in the derivations of the local progress rates. For this, refer to the additional figures provided in the supplementary material (Appendix B). They show a larger deviation for smaller values of $\xi$ and smaller values of $N$. But notice that in those figures the iteration with one-generation experiments for the local progress measures coincides well with the results of real ES runs. This again shows the appropriateness of the modeling in Equations (38) to (44). The deviations for small $N$ stem from asymptotic assumptions using $N\to\infty$. They help simplify expressions, resulting in a theoretical analysis that is tractable. The deviations for small $\xi$ are due to approximations in the derivation of the offspring cumulative distribution function after the projection step in $x_1$-direction, $P_Q(q)$ (for the details, refer to Section 3.1.2.1.2.3 in Spettel and Beyer (2018c), in particular to the step from Equations (3.73) to (3.74)).
Figure 3:

Real run and approximation comparison of the $(3/3_I,10)$-CSA-ES mean value dynamics ($N=400$, $\xi=10$). The agreement of the iteration with the theoretically derived expressions and the real ES runs is not perfect due to the asymptotic assumptions used. But the prediction comes close to the real behavior. In addition, the iteration with the one-generation experiments for the local progress rates is very similar to the mean value dynamics of the real ES runs. Consequently, the modeling of the system with Equations (38) to (44) is appropriate.

For the figures, results of 100 real runs of the ES have been averaged for generating the solid lines. The lines for the iteration by approximation have been computed by iterating the mean value iterative system (Equations (38) to (44)) with Equations (19), (20), and (A.21) for $\varphi_x^{(g)}$ (and $\varphi_x^{(g)*}$), $\varphi_r^{(g)}$ (and $\varphi_r^{(g)*}$), and $\varphi_{r^2}^{(g)}$, respectively. The lines for the iteration with one-generation experiments have been generated by iterating the system (Equations (38) to (44)) and simulating $\varphi_x^{(g)}$ (and $\varphi_x^{(g)*}$), $\varphi_r^{(g)}$ (and $\varphi_r^{(g)*}$), and $\varphi_{r^2}^{(g)}$. It can happen that in a generation of iterating the system (Equations (38) to (44)), infeasible $(x^{(g)},r^{(g)})^T$ are created. In such circumstances, the corresponding $(x^{(g)},r^{(g)})^T$ have been projected back.
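Structurally, the iteration just described can be sketched as follows. The progress measures enter as callables, mirroring the two variants in the text (closed-form approximations or one-generation experiments); the constant placeholder rates used in the usage example below are purely illustrative and are not the expressions of Equations (19), (20), and (A.21):

```python
import numpy as np

def iterate_system(state, phi_x_star, phi_r_star, d_r2, mu, c, D, N, generations):
    """Iterate the mean value system of Equations (38) to (44).

    phi_x_star(st), phi_r_star(st): normalized progress rates for state st.
    d_r2(st): the term -N*phi_{r^2} / (2 * sigma_star * r^2) for state st.
    state: dict with keys x, r, s1, s_odot, s_sq (for ||s||^2), sigma_star.
    """
    st = dict(state)
    k = np.sqrt(mu * c * (2.0 - c))
    for _ in range(generations):
        px, pr, dr2 = phi_x_star(st), phi_r_star(st), d_r2(st)
        sig = st["sigma_star"]
        # E[(<z>)_1] = -N*phi_x / (sigma_star * r) = -(x/r) * phi_x_star / sigma_star
        ax = -(st["x"] / st["r"]) * px / sig
        s1 = (1.0 - c) * st["s1"] + k * ax                              # Eq. (40)
        s_odot = ((1.0 - c) * (1.0 + sig / N * (dr2 - sig / (2.0 * mu)))
                  * st["s_odot"] + k * (dr2 + sig / (2.0 * mu)))        # Eq. (41)
        s_sq = ((1.0 - c) ** 2 * st["s_sq"]
                + 2.0 * (1.0 - c) * k
                * (st["s1"] * ax + st["s_odot"] * (dr2 - sig / (2.0 * mu)))
                + c * (2.0 - c) * N)                                    # Eq. (42)
        x = st["x"] * (1.0 - px / N)                                    # Eq. (38)
        r = st["r"] * (1.0 - pr / N)                                    # Eq. (39)
        sigma = sig * st["r"] / N * np.exp((s_sq - N) / (2.0 * D * N))  # Eq. (43)
        st = {"x": x, "r": r, "s1": s1, "s_odot": s_odot,
              "s_sq": s_sq, "sigma_star": N * sigma / r}                # Eq. (44)
    return st
```

Plugging in the closed-form approximations for the progress measures would reproduce the "iteration by approximation" variant, while supplying simulated one-generation estimates would reproduce the "iteration with one-generation experiments" variant.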

### 3.2  Behavior of the ES in the Steady State

The goal of this section is to analyze the steady state behavior of the mean value iterative system that is summarized in Section 3.1.6. A working ES should steadily decrease $x$ and $r$ (Equations (38) and (39), respectively) in order to move toward the optimizer. For determining the steady state normalized mutation strength value, the fixed point of the system of nonlinear equations (Equations (40) to (44)) is to be computed.

Section 3.2.1 comprises a first step toward closed-form approximations for the steady state values of the system summarized in Section 3.1.6. Expressions are derived that finally lead to a steady state equation for the normalized mutation strength. A closed-form solution of that derived equation (Equation (62)) is not apparent. Hence, further assumptions for different cases are considered in Sections 3.2.2 and 3.2.3.

Section 3.2.2 considers $c=\frac{1}{\sqrt{N}}$, which has been proposed in early CMA-ES papers such as Hansen and Ostermeier (1997). Section 3.2.3 uses $c=\Theta\!\left(\frac{1}{N}\right)$ proposed in newer publications (see Hansen and Ostermeier, 2001 and Hansen, 2016). Again, to arrive at closed-form expressions, asymptotic considerations ($N\to\infty$) are taken into account.

#### 3.2.1  Derivations Toward Closed-Form Steady State Expressions

To compute the fixed point of the system described by Equations (40) to (44), stationary state expressions $\varphi_{x\,ss}^*$, $\varphi_{r\,ss}^*$, $\left(-\frac{N\varphi_x}{\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}}\right)_{ss}$, and $\left(-\frac{N\varphi_{r^2}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}\right)_{ss}$ for $\varphi_x^{(g)*}$, $\varphi_r^{(g)*}$, $-\frac{N\varphi_x^{(g)}}{\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}}$, and $-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}$, respectively, need to be derived first because they depend on the position in the parameter space. The bottom left subplot of Figure 3 shows that the ES moves in the vicinity of the cone boundary in the steady state. This can be seen because the dynamics of $x$ and $r$ are plotted by converting them into each other for the cone boundary case. Notice that those lines coincide in the steady state. In this situation, $P_{\mathrm{feas}} \simeq 0$ for $N\to\infty$. This follows from Equation (21). By the cone boundary equation (Equation (2)), a parental individual $(x^{(g)},r^{(g)})^T$ is on the cone boundary for $r^{(g)} = x^{(g)}/\sqrt{\xi}$. Using this together with Equations (14) and (21) yields
$P_{\mathrm{feas}} \simeq \Phi\!\left[\frac{N}{\sigma^{(g)*}\,r^{(g)}}\left(r^{(g)}-\bar{r}\right)\right].$
(45)
By taking into account Equation (17),
$P_{\mathrm{feas}} \simeq \Phi\!\left[\frac{N}{\sigma^{(g)*}}\left(1-\sqrt{1+\frac{\sigma^{(g)*2}}{N}\left(1-\frac{1}{N}\right)}\right)\right]$
(46)
follows. If $\sigma^{(g)*}$ is sufficiently large, $P_{\mathrm{feas}} \simeq 0$.
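This limit behavior can be checked by evaluating Equation (46) directly (a quick numerical sketch; the sample values of $\sigma^{(g)*}$ and $N$ below are arbitrary):

```python
import math

def p_feas(sigma_star, N):
    """Offspring feasibility probability on the cone boundary, Equation (46)."""
    arg = (N / sigma_star) * (1.0 - math.sqrt(1.0 + sigma_star ** 2 / N * (1.0 - 1.0 / N)))
    # Phi(arg) via the error function.
    return 0.5 * (1.0 + math.erf(arg / math.sqrt(2.0)))
```

For instance, with $N=400$ the probability is essentially zero for $\sigma^{(g)*}=10$ but about $0.3$ for $\sigma^{(g)*}=1$, illustrating that the feasibility probability vanishes only for sufficiently large normalized mutation strength.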
Steady state expressions for the distance ratio and the progress rates have already been derived in Spettel and Beyer (2018b). This work builds on those derived expressions. They are recapped here to make the present work self-contained. The steady state distance ratio reads
$\left(\frac{x}{\sqrt{\xi}\,r}\right)_{ss} = \frac{\sqrt{1+\frac{\sigma_{ss}^{*2}}{N}}}{\sqrt{1+\frac{\sigma_{ss}^{*2}}{\mu N}}}$
(47)
and the steady state progress rates read
$\varphi_{r\,ss}^* = \varphi_{x\,ss}^* \approx \frac{N}{1+\xi}\left(-\frac{\sigma_{ss}^{*2}}{2\mu N}\right)+\frac{\sigma_{ss}^*\,c_{\mu/\mu,\lambda}}{1+\xi}.$
(48)
A steady state expression for $-\frac{N\varphi_x^{(g)}}{\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}}$ is derived next. With Equations (12) and (48),
$\left(-\frac{N\varphi_x}{\sigma^* r}\right)_{ss} = -\left(\frac{x}{r}\right)_{ss}\frac{\varphi_{x\,ss}^*}{\sigma_{ss}^*}$
(49)
can be derived. Use of Equation (47) for the fraction $\left(\frac{x}{r}\right)_{ss}$ results in
$\left(-\frac{N\varphi_x}{\sigma^* r}\right)_{ss} = -\sqrt{\xi}\,\frac{\sqrt{1+\frac{\sigma_{ss}^{*2}}{N}}}{\sqrt{1+\frac{\sigma_{ss}^{*2}}{\mu N}}}\,\frac{\varphi_{x\,ss}^*}{\sigma_{ss}^*}.$
(50)
Similarly, a steady state expression for $-\frac{N\varphi_{r^2}^{(g)}}{2\,\overline{\sigma^{(g)*}}\,\overline{r^{(g)}}^2}$ can be derived. Considering the infeasible case (because $P_{\mathrm{feas}} \simeq 0$ in the steady state) of Equation (A.21), we have
$\left(-\frac{N\varphi_{r^2}}{2\sigma^* r^2}\right)_{ss} = -\frac{N}{2\sigma_{ss}^*}\left[1-\frac{1}{\xi\,r^2_{ss}}\left(\mathrm{E}\!\left[\langle q\rangle\right]_{\mathrm{infeas}}\right)^2\frac{1+\frac{\sigma_{ss}^{*2}}{\mu N}}{1+\frac{\sigma_{ss}^{*2}}{N}}\right].$
(51)
According to Lines 4 and 21 of Algorithm 1, $\mathrm{E}[\langle q\rangle] = \mathrm{E}\!\left[x^{(g+1)}\right] = \overline{x^{(g+1)}}$ holds. Hence, Equation (51) can be rewritten using Equation (15) for the infeasible case $\mathrm{E}[\langle q\rangle]_{\mathrm{infeas}}$ and Equation (47) for $\left(\frac{x^2}{\xi r^2}\right)_{ss}$, resulting in
$\left(-\frac{N\varphi_{r^2}}{2\sigma^* r^2}\right)_{ss} = -\frac{N}{2\sigma_{ss}^*}\left[1-\left(1-\frac{\varphi_{x\,ss}^*}{N}\right)^2\right].$
(52)
Using Equations (50) and (52), steady state expressions for Equations (40) to (44) can be derived. Requiring $\overline{s_1^{(g+1)}} = \overline{s_1^{(g)}} = s_{1\,ss}$ in Equation (40) and using Equation (50) yields
$s_{1\,ss} = -\frac{\sqrt{\mu c(2-c)}}{c}\,\sqrt{\xi}\,\frac{\sqrt{1+\frac{\sigma_{ss}^{*2}}{N}}}{\sqrt{1+\frac{\sigma_{ss}^{*2}}{\mu N}}}\,\frac{\varphi_{x\,ss}^*}{\sigma_{ss}^*}.$
(53)
Analogously, requiring $\overline{s_\odot^{(g+1)}} = \overline{s_\odot^{(g)}} = s_{\odot\,ss}$ in Equation (41) and using Equation (52) results in
$s_{\odot\,ss} = \frac{\sqrt{\mu c(2-c)}\left[-\frac{N}{2\sigma_{ss}^*}\left(1-\left(1-\frac{\varphi_{x\,ss}^*}{N}\right)^2\right)+\frac{\sigma_{ss}^*}{2\mu}\right]}{c-(1-c)\frac{\sigma_{ss}^*}{N}\left[-\frac{N}{2\sigma_{ss}^*}\left(1-\left(1-\frac{\varphi_{x\,ss}^*}{N}\right)^2\right)-\frac{\sigma_{ss}^*}{2\mu}\right]}.$
(54)
In the same way, setting $\overline{\|\mathbf{s}^{(g+1)}\|^2} = \overline{\|\mathbf{s}^{(g)}\|^2} = \|\mathbf{s}\|^2_{ss}$ in Equation (42) and using Equations (50) and (52) gives
$\|\mathbf{s}\|^2_{ss} = N + \frac{2(1-c)\sqrt{\mu c(2-c)}}{2c-c^2}\times\left[s_{1\,ss}\left(-\sqrt{\xi}\,\frac{\sqrt{1+\frac{\sigma_{ss}^{*2}}{N}}}{\sqrt{1+\frac{\sigma_{ss}^{*2}}{\mu N}}}\,\frac{\varphi_{x\,ss}^*}{\sigma_{ss}^*}\right)+s_{\odot\,ss}\left(-\frac{N}{2\sigma_{ss}^*}\left[1-\left(1-\frac{\varphi_{x\,ss}^*}{N}\right)^2\right]-\frac{\sigma_{ss}^*}{2\mu}\right)\right].$
(55)
For the mutation strength,
$\frac{\sigma^{(g+1)*}\,r^{(g+1)}}{N} = \frac{\sigma^{(g)*}\,r^{(g)}}{N}\exp\!\left(\frac{\|\mathbf{s}^{(g+1)}\|^2-N}{2DN}\right)$
(56)
follows from Line 20 of Algorithm 1 with the use of Equation (14). Rewriting Equation (56) and using Equation (A.8) together with Equation (A.9) for the fraction $r^{(g)}/r^{(g+1)}$, we have
$\sigma^{(g+1)*} \simeq \sigma^{(g)*}\,\frac{1}{\sqrt{1+\frac{2\sigma^{(g)*}}{N}z_\odot^{(g)}+\frac{\sigma^{(g)*2}}{\mu N}}}\,\exp\!\left(\frac{\|\mathbf{s}^{(g+1)}\|^2-N}{2DN}\right)$
(57)
$\sigma^{(g+1)*2}\left(1+\frac{2\sigma^{(g)*}}{N}z_\odot^{(g)}+\frac{\sigma^{(g)*2}}{\mu N}\right) \simeq \sigma^{(g)*2}\exp\!\left(\frac{\|\mathbf{s}^{(g+1)}\|^2-N}{DN}\right).$
(58)
Use of the Taylor expansion $\exp(x) \simeq 1+x$ (around zero and neglecting terms of quadratic and higher order) results in
$\sigma^{(g+1)*2}\left(1+\frac{2\sigma^{(g)*}}{N}z_\odot^{(g)}+\frac{\sigma^{(g)*2}}{\mu N}\right) \simeq \sigma^{(g)*2}\left(1+\frac{\|\mathbf{s}^{(g+1)}\|^2-N}{DN}\right).$
(59)
Computing the expectation of Equation (59) and requiring $\overline{\sigma^{(g+1)*}} = \overline{\sigma^{(g)*}} = \sigma_{ss}^*$, we get
$\sigma_{ss}^{*2}\left[\frac{1}{2}+\frac{1}{N}\left(\sigma_{ss}^*\,\mathrm{E}[z_\odot]+\frac{\sigma_{ss}^{*2}}{2\mu}\right)\right] = \sigma_{ss}^{*2}\left[\frac{1}{2}+\frac{\|\mathbf{s}\|^2_{ss}-N}{2DN}\right]$
(60)
$\sigma_{ss}^*\,\mathrm{E}[z_\odot]+\frac{\sigma_{ss}^{*2}}{2\mu} = \frac{\|\mathbf{s}\|^2_{ss}-N}{2D}.$
(61)
Usage of Equation (A.25) together with the steady state expression derived in Equation (52) for $\mathrm{E}[z_\odot]$ results in
$\sigma_{ss}^*\left(-\frac{N}{2\sigma_{ss}^*}\left[1-\left(1-\frac{\varphi_{x\,ss}^*}{N}\right)^2\right]-\frac{\sigma_{ss}^*}{2\mu}\right)+\frac{\sigma_{ss}^{*2}}{2\mu} = \frac{\|\mathbf{s}\|^2_{ss}-N}{2D}.$
(62)
Consideration of Equations (48) and (53) to (55) allows numerically solving Equation (62) for $\sigma_{ss}^*$. By doing this, the influence of the cumulation parameter can be further investigated. To this end, Figure 4 shows the influence of the cumulation parameter for the considered $(3/3_I,10)$-CSA-ES for the dimensions 40 and 400. Results for dimensions 1000 and 10000 are provided in the supplementary material (Appendix C, Figure 23). The steady state normalized $x$ progress rate (Equation (48)) is plotted against the steady state normalized mutation strength for different values of $\xi$. Steady state values achieved for different settings of $c$ with $D=\frac{1}{c}$ have been computed numerically (using Equation (62) with Equations (48) and (53) to (55)) and are shown with the filled black markers. A particular emphasis in the comparison is on the choices $c=\Theta\!\left(\frac{1}{\sqrt{N}}\right)$ and $c=\Theta\!\left(\frac{1}{N}\right)$ that have been proposed in CMA-ES papers (see Hansen and Ostermeier, 1997, Hansen and Ostermeier, 2001, and Hansen, 2016, respectively).
Figure 4:

Influence of the cumulation parameter for the considered $(3/3_I,10)$-CSA-ES for $N=40$ (top) and $N=400$ (bottom). For different values of $\xi$, the steady state progress rate is plotted against the steady state normalized mutation strength. The filled black markers indicate the steady state values achieved by different $c$ parameter settings with $D=1/c$.

Interestingly, the choice of $c=\Theta(1/N)$ results in steady state normalized $x$ progress values that are nearly independent of $\xi$: they stay almost constant with increasing $\xi$ for the dimensions considered, whereas for $c=\Theta(1/\sqrt{N})$ the steady state normalized $x$ progress values decrease with increasing $\xi$ for larger dimensions. For the considered dimensions, only for $N=40$ does the choice $c=1/\sqrt{N}$ achieve constant progress with increasing $\xi$.

In addition, the choice of $c=\Theta(1/N)$ results in better steady state normalized $x$ progress compared to $c=\Theta(1/\sqrt{N})$ the higher $N$ and $\xi$ get. For example, for the configurations considered in the plots, the choice $c=\frac{\mu+2}{N+\mu+5}$ results in the highest steady state $x$ progress for $N=400$ and $\xi\geq25$. For $N=40$, the choices $c=1/\sqrt{N}$ and $c=2/N$ result in the highest steady state $x$ progress for $\xi\geq25$ and $\xi<25$, respectively.

In order to analyze the influence of the parameters further, the aim of the following two sections is to derive closed-form expressions using asymptotic assumptions.

#### 3.2.2  Derivation of Closed-Form Approximations for the Steady State with the Assumptions $c=\Theta(1/\sqrt{N})$ and $N\to\infty$

The goal of this section is to simplify the expressions derived in Section 3.2.1 further using additional asymptotic assumptions in order to arrive at closed-form steady state approximations.

The expression derived for $-\frac{N\varphi_{r^{2}}}{2\sigma^{*}r^{2}}\big|_{ss}$ in Equation (52) is simplified further, yielding
$-\frac{N\varphi_{r^{2}}}{2\sigma^{*}r^{2}}\bigg|_{ss}=-\frac{N}{2\sigma_{ss}^{*}}\left(1-\left(1-\frac{\varphi_{x\,ss}^{*}}{N}\right)^{2}\right)=-\frac{N}{2\sigma_{ss}^{*}}\left(\frac{2\varphi_{x\,ss}^{*}}{N}-\frac{\varphi_{x\,ss}^{*2}}{N^{2}}\right)$
(63)
$=-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}+\frac{\varphi_{x\,ss}^{*2}}{2\sigma_{ss}^{*}N}\simeq-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}.$
(64)
In Equation (64), $\sigma_{ss}^{*}N\gg\varphi_{x\,ss}^{*2}$ has been assumed and therefore the second summand has been neglected.
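The expansion in Equations (63) and (64) can be checked symbolically; only dropping the last summand is an approximation. A minimal sketch (assuming SymPy is available; `sigma`, `phi`, and `N` mirror $\sigma_{ss}^{*}$, $\varphi_{x\,ss}^{*}$, and $N$):

```python
import sympy as sp

N, sigma, phi = sp.symbols('N sigma phi', positive=True)

# Expression from Equation (52) as it appears in Equation (63)
lhs = -(N / (2 * sigma)) * (1 - (1 - phi / N) ** 2)
# Expanded form of Equation (64) before the second summand is neglected
rhs = -phi / sigma + phi ** 2 / (2 * sigma * N)

# The expansion itself is exact; only dropping phi^2/(2*sigma*N) is approximate
assert sp.simplify(lhs - rhs) == 0
```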
Insertion of Equation (64) into Equation (62), replacing $-\frac{N}{2\sigma_{ss}^{*}}\left(1-\left(1-\frac{\varphi_{x\,ss}^{*}}{N}\right)^{2}\right)$, yields (after simplification)
$-\varphi_{x\,ss}^{*}=\frac{\|\mathbf{s}\|_{ss}^{2}-N}{2D}$
(65)
for the steady state mutation strength equation. Equation (64) can also be inserted into Equation (54), replacing $-\frac{N}{2\sigma_{ss}^{*}}\left(1-\left(1-\frac{\varphi_{x\,ss}^{*}}{N}\right)^{2}\right)$. This results in
$s_{\odot\,ss}\simeq\frac{\sqrt{\mu c(2-c)}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}+\frac{\sigma_{ss}^{*}}{2\mu}\right)}{c-(1-c)\frac{\sigma_{ss}^{*}}{N}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}-\frac{\sigma_{ss}^{*}}{2\mu}\right)}=\frac{\sqrt{\mu c(2-c)}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}+\frac{\sigma_{ss}^{*}}{2\mu}\right)}{c-(1-c)\left(-\frac{\varphi_{x\,ss}^{*}}{N}-\frac{\sigma_{ss}^{*2}}{2\mu N}\right)}$
(66)
$=\frac{\sqrt{\mu c(2-c)}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}+\frac{\sigma_{ss}^{*}}{2\mu}\right)}{c+\frac{\varphi_{x\,ss}^{*}}{N}+\frac{\sigma_{ss}^{*2}}{2\mu N}-\frac{c\varphi_{x\,ss}^{*}}{N}-\frac{c\sigma_{ss}^{*2}}{2\mu N}}.$
(67)
With the assumptions $N\to\infty$ and $c=\Theta(1/\sqrt{N})$, the expression $\frac{\varphi_{x\,ss}^{*}}{N}+\frac{\sigma_{ss}^{*2}}{2\mu N}-\frac{c\varphi_{x\,ss}^{*}}{N}-\frac{c\sigma_{ss}^{*2}}{2\mu N}$ is an order of magnitude smaller than $c$ and can therefore be neglected w.r.t. $c$. Hence, Equation (67) simplifies to
$s_{\odot\,ss}\simeq\frac{\sqrt{\mu c(2-c)}}{c}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}+\frac{\sigma_{ss}^{*}}{2\mu}\right).$
(68)
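The neglect of the $O(1/N)$ denominator terms against $c=\Theta(1/\sqrt{N})$ in the step from Equation (66) to Equation (68) can be illustrated numerically. A small sketch with hypothetical steady state values (the chosen numbers are purely illustrative, not taken from the paper):

```python
import math

# Hypothetical steady state values for illustration
N, mu = 10**6, 3
c = 1 / math.sqrt(N)              # c = Theta(1/sqrt(N)) as assumed in this section
sigma, phi = 5.0, 1.0             # stand-ins for sigma*_ss and phi*_x,ss

num = math.sqrt(mu * c * (2 - c)) * (-phi / sigma + sigma / (2 * mu))
den_full = c + (1 - c) * (phi / N + sigma**2 / (2 * mu * N))  # denominator of Eq. (66)
s_full = num / den_full                                        # Equation (66)
s_approx = num / c                                             # Equation (68)

# The O(1/N) denominator terms are negligible against c = Theta(1/sqrt(N))
assert abs(s_full - s_approx) / abs(s_approx) < 0.01
```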
Similarly, Equation (64) inserted into Equation (55), replacing $-\frac{N}{2\sigma_{ss}^{*}}\left(1-\left(1-\frac{\varphi_{x\,ss}^{*}}{N}\right)^{2}\right)$, results in
$\|\mathbf{s}\|_{ss}^{2}=N-\frac{2(1-c)\sqrt{\mu c(2-c)}}{-2c+c^{2}}\left(s_{1\,ss}\left(-\sqrt{\xi\frac{1+\frac{\sigma_{ss}^{*2}}{N}}{1+\frac{\sigma_{ss}^{*2}}{\mu N}}}\,\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}\right)+s_{\odot\,ss}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}-\frac{\sigma_{ss}^{*}}{2\mu}\right)\right).$
(69)
Insertion of Equations (68) and (53) into Equation (69) yields (after straight-forward simplification)
$\|\mathbf{s}\|_{ss}^{2}\simeq N+\frac{2(1-c)\mu}{c}\left(\left(\xi\frac{1+\frac{\sigma_{ss}^{*2}}{N}}{1+\frac{\sigma_{ss}^{*2}}{\mu N}}+1\right)\frac{\varphi_{x\,ss}^{*2}}{\sigma_{ss}^{*2}}-\frac{\sigma_{ss}^{*2}}{4\mu^{2}}\right).$
(70)
$\xi\frac{1+\sigma_{ss}^{*2}/N}{1+\sigma_{ss}^{*2}/(\mu N)}\simeq\xi$ for $N\to\infty$ allows writing
$\|\mathbf{s}\|_{ss}^{2}\simeq N+\frac{2(1-c)\mu}{c}\left((\xi+1)\frac{\varphi_{x\,ss}^{*2}}{\sigma_{ss}^{*2}}-\frac{\sigma_{ss}^{*2}}{4\mu^{2}}\right)$
(71)
$=N+\frac{2(1-c)\mu}{c}\left(c_{\mu/\mu,\lambda}^{2}-\frac{\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{\mu\sqrt{1+\xi}}+\frac{\sigma_{ss}^{*2}}{(1+\xi)4\mu^{2}}-\frac{\sigma_{ss}^{*2}}{4\mu^{2}}\right).$
(72)
In the step from Equation (71) to Equation (72), $\varphi_{x\,ss}^{*}$ has been substituted by Equation (48), its square has been calculated, and the resulting expression has been simplified.
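The substitution step from Equation (71) to Equation (72) can be spot-checked numerically: with $\varphi_{x\,ss}^{*}$ taken from Equation (48), the identity is exact. A sketch with illustrative values (the value used for $c_{\mu/\mu,\lambda}$ is a placeholder):

```python
import math

# Illustrative values; c_mu is a placeholder for the progress coefficient
mu, xi, c_mu, sigma = 3, 10.0, 1.065, 6.0

# phi*_x,ss / sigma*_ss according to Equation (48)
phi_over_sigma = c_mu / math.sqrt(1 + xi) - sigma / (2 * mu * (1 + xi))

lhs = (xi + 1) * phi_over_sigma**2                # bracket term of Equation (71)
rhs = (c_mu**2 - sigma * c_mu / (mu * math.sqrt(1 + xi))
       + sigma**2 / (4 * mu**2 * (1 + xi)))       # bracket term of Equation (72)

assert abs(lhs - rhs) < 1e-12
```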
With this, insertion of Equations (48) and (72) into Equation (65) yields the quadratic equation
$\frac{\sigma_{ss}^{*2}}{(1+\xi)2\mu}-\frac{\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}=\frac{2(1-c)\mu}{2cD}\left(c_{\mu/\mu,\lambda}^{2}-\frac{\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{\mu\sqrt{1+\xi}}+\frac{\sigma_{ss}^{*2}}{(1+\xi)4\mu^{2}}-\frac{\sigma_{ss}^{*2}}{4\mu^{2}}\right)$
(73)
for the steady state normalized mutation strength equation. By solving Equation (73) for the positive root (because $σss*>0$) with subsequent simplification of the result we get
$\sigma_{ss}^{*}=\frac{2\mu\sqrt{\xi+1}\,c_{\mu/\mu,\lambda}\left(cD+c-1+\sqrt{c^{2}(D^{2}+\xi+1)-2c(\xi+1)+\xi+1}\right)}{2cD-c\xi+\xi}$
(74)
as an asymptotic ($N\to\infty$) closed-form expression for the steady state normalized mutation strength. Insertion of $c=1/\sqrt{N}$ and $D=1/c=\sqrt{N}$ into Equation (74) results in the expression
$\sigma_{ss}^{*}=\frac{2\mu\sqrt{\xi+1}\,c_{\mu/\mu,\lambda}\left(1+\frac{1}{\sqrt{N}}-1+\sqrt{1+\frac{\xi}{N}+\frac{1}{N}-\frac{2\xi}{\sqrt{N}}-\frac{2}{\sqrt{N}}+\xi+1}\right)}{2-\frac{\xi}{\sqrt{N}}+\xi}.$
(75)
Assuming $N\to\infty$ and $\xi/\sqrt{N}\to0$ allows a further asymptotic simplification of Equation (75) (neglecting $\frac{1}{\sqrt{N}}$, $\frac{\xi}{N}$, $\frac{1}{N}$, $\frac{2\xi}{\sqrt{N}}$, $\frac{2}{\sqrt{N}}$, and $\frac{\xi}{\sqrt{N}}$), resulting in
$\sigma_{ss}^{*}\simeq\frac{2\mu\sqrt{\xi+1}\sqrt{\xi+2}\,c_{\mu/\mu,\lambda}}{\xi+2}=\frac{2\mu\sqrt{\xi+1}\,c_{\mu/\mu,\lambda}}{\sqrt{\xi+2}}.$
(76)
For sufficiently large $\xi$, $\sqrt{\xi+1}\simeq\sqrt{\xi+2}$, and Equation (76) reduces to $\sigma_{ss}^{*}\simeq2\mu c_{\mu/\mu,\lambda}$. Back-insertion of Equation (74) (or Equation (76)) into Equations (47), (48), and (53) to (55) allows calculating the steady state distance from the cone boundary, the normalized steady state progress, $s_{1\,ss}$, $s_{\odot\,ss}$, and $\|\mathbf{s}\|_{ss}^{2}$.
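That Equation (74) is indeed the positive root of the quadratic Equation (73) can be verified numerically. A minimal sketch with illustrative parameter values (`c_mu` is a placeholder for $c_{\mu/\mu,\lambda}$):

```python
import math

def sigma_ss(mu, xi, c, D, c_mu):
    """Closed-form steady state mutation strength, Equation (74)."""
    num = 2 * mu * math.sqrt(xi + 1) * c_mu * (
        c * D + c - 1
        + math.sqrt(c**2 * (D**2 + xi + 1) - 2 * c * (xi + 1) + xi + 1))
    return num / (2 * c * D - c * xi + xi)

# Illustrative parameters; c_mu stands in for the progress coefficient
mu, xi, c, D, c_mu = 3, 10.0, 0.1, 10.0, 1.0
s = sigma_ss(mu, xi, c, D, c_mu)

# Both sides of the quadratic steady state condition, Equation (73)
lhs = s**2 / (2 * mu * (1 + xi)) - s * c_mu / math.sqrt(1 + xi)
rhs = (2 * (1 - c) * mu / (2 * c * D)) * (
    c_mu**2 - s * c_mu / (mu * math.sqrt(1 + xi))
    + s**2 / (4 * mu**2 * (1 + xi)) - s**2 / (4 * mu**2))

assert abs(lhs - rhs) < 1e-9
```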

Figure 5 shows plots of the steady state computations. Results computed by Equation (74) have been compared to real $(\mu/\mu_I,\lambda)$-CSA-ES runs for $\lambda=10$ and $\mu\in\{1,2,3\}$. Further experimental results (including results for $\mu\in\{4,6,8\}$) are provided in the supplementary material (Figures 15 to 18 in Appendix B). The values for the points denoting the approximations have been determined by computing the normalized steady state mutation strength $\sigma_{ss}^{*}$ using Equation (74) for different values of $\xi$. The results for $\varphi_{x}^{*}$ and $\varphi_{r}^{*}$ have been determined by using the computed steady state $\sigma_{ss}^{*}$ values with Equation (48). The approximations for $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$ have been determined by evaluating Equation (47). The values for the points denoting the experiments have been determined by computing the averages of the particular values in real ES runs. The figures show that the derived expressions become more accurate for larger values of $\xi$ and $N$. Again, the deviations for small $\xi$ are due to approximations in the derivation of the local progress rates. The deviations for small $N$ stem from the use of the asymptotic assumption $N\to\infty$.

Figure 5:

Steady state closed-form approximation and real run comparison of the $(\mu/\mu_I,\lambda)$-CSA-ES with repair by projection applied to the conically constrained problem for $c=1/\sqrt{N}$, $D=1/c$, $\lambda=10$, and $\mu\in\{1,2,3\}$.


#### 3.2.3  Derivation of Closed-Form Approximations for the Steady State with the Assumptions $c=\Theta(1/N)$ and $N\to\infty$

In Section 3.2.2, it has been assumed that $c=\Theta(1/\sqrt{N})$ in the step from Equation (67) to Equation (68). This section presents a derivation for the case $c=\Theta(1/N)$. To this end, Equation (66) is rewritten as
$s_{\odot\,ss}\simeq\frac{\sqrt{\mu c(2-c)}}{c}\,\frac{\frac{1}{\sigma_{ss}^{*}}\left(-\varphi_{x\,ss}^{*}+\frac{\sigma_{ss}^{*2}}{2\mu}\right)}{1+\frac{1}{N}\left(\frac{1}{c}-1\right)\left(\varphi_{x\,ss}^{*}+\frac{\sigma_{ss}^{*2}}{2\mu}\right)}.$
(77)
With $c=\Theta(1/N)$, we have $1/c\gg1$. This together with the assumption $|\varphi_{x\,ss}^{*}|\ll\frac{\sigma_{ss}^{*2}}{2\mu}$ allows rewriting Equation (77) as
$s_{\odot\,ss}\simeq\frac{\sqrt{\mu c(2-c)}}{c}\,\frac{\sigma_{ss}^{*}}{2\mu+\frac{\sigma_{ss}^{*2}}{cN}}.$
(78)
With the additional assumption $2\mu\ll\frac{\sigma_{ss}^{*2}}{cN}$, Equation (78) simplifies to
$s_{\odot\,ss}\simeq\frac{\sqrt{\mu c(2-c)}}{c}\,\frac{cN}{\sigma_{ss}^{*}}.$
(79)
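The approximation chain from Equation (77) to Equation (79) can be illustrated numerically for parameter values that satisfy the stated assumptions (the chosen numbers are purely illustrative, not taken from the paper):

```python
import math

# Illustrative values satisfying the section's assumptions
N, mu = 10**4, 3
c = 1 / N                          # c = Theta(1/N)
sigma, phi = 20.0, 0.1             # |phi| << sigma^2/(2 mu) and 2 mu << sigma^2/(c N)

pref = math.sqrt(mu * c * (2 - c)) / c
s_77 = pref * ((-phi + sigma**2 / (2 * mu)) / sigma) / (
    1 + (1 / N) * (1 / c - 1) * (phi + sigma**2 / (2 * mu)))   # Equation (77)
s_79 = pref * c * N / sigma                                     # Equation (79)

# The two-step approximation stays within a few percent here
assert abs(s_77 - s_79) / s_79 < 0.05
```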
Insertion of Equations (79) and (53) into Equation (69) yields
$\|\mathbf{s}\|_{ss}^{2}\simeq N-\frac{2(1-c)\mu c(2-c)}{(-2c+c^{2})c}\left(\xi\frac{1+\frac{\sigma_{ss}^{*2}}{N}}{1+\frac{\sigma_{ss}^{*2}}{\mu N}}\frac{\varphi_{x\,ss}^{*2}}{\sigma_{ss}^{*2}}+\frac{cN}{\sigma_{ss}^{*}}\left(-\frac{\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*}}-\frac{\sigma_{ss}^{*}}{2\mu}\right)\right)$
(80)
$=N+\frac{2(1-c)\mu}{c}\left(\xi\frac{\varphi_{x\,ss}^{*2}}{\sigma_{ss}^{*2}}-\frac{cN\varphi_{x\,ss}^{*}}{\sigma_{ss}^{*2}}-\frac{cN}{2\mu}\right).$
(81)
In the step from Equation (80) to Equation (81), $\xi\frac{1+\sigma_{ss}^{*2}/N}{1+\sigma_{ss}^{*2}/(\mu N)}\simeq\xi$ for $N\to\infty$ has been used.
Insertion of Equations (48) and (81) into Equation (65) yields
$\frac{\sigma_{ss}^{*2}}{(1+\xi)2\mu}-\frac{\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}=\frac{2(1-c)\mu}{2cD}\left(\xi\left(\frac{c_{\mu/\mu,\lambda}^{2}}{1+\xi}-\frac{\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{\mu\sqrt{1+\xi}(1+\xi)}+\frac{\sigma_{ss}^{*2}}{(1+\xi)^{2}4\mu^{2}}\right)-\frac{cNc_{\mu/\mu,\lambda}}{\sqrt{1+\xi}\,\sigma_{ss}^{*}}+\frac{cN}{(1+\xi)2\mu}-\frac{cN}{2\mu}\right)$
(82)
for the steady state mutation strength equation. By assuming $c\ll1$, Equation (82) simplifies. Together with grouping the powers of $\sigma_{ss}^{*}$, it reads
$\frac{\sigma_{ss}^{*2}}{(1+\xi)2\mu}-\frac{\mu\xi\sigma_{ss}^{*2}}{cD(1+\xi)^{2}4\mu^{2}}-\frac{\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}+\frac{\mu\xi\sigma_{ss}^{*}c_{\mu/\mu,\lambda}}{cD\mu\sqrt{1+\xi}(1+\xi)}-\frac{\mu\xi c_{\mu/\mu,\lambda}^{2}}{cD(1+\xi)}-\frac{cN}{2cD(1+\xi)}+\frac{cN}{2cD}+\frac{\mu cNc_{\mu/\mu,\lambda}}{cD\sqrt{1+\xi}\,\sigma_{ss}^{*}}=0.$
(83)
Introducing common denominators allows rewriting Equation (83) as
$\sigma_{ss}^{*2}\,\frac{2cD(1+\xi)-\xi}{cD(1+\xi)^{2}4\mu}+\sigma_{ss}^{*1}\,\frac{\xi c_{\mu/\mu,\lambda}-c_{\mu/\mu,\lambda}cD(1+\xi)}{cD\sqrt{1+\xi}(1+\xi)}+\sigma_{ss}^{*0}\,\frac{-2\mu\xi c_{\mu/\mu,\lambda}^{2}-cN+cN(1+\xi)}{2cD(1+\xi)}+\sigma_{ss}^{*-1}\,\frac{\mu cNc_{\mu/\mu,\lambda}}{cD\sqrt{1+\xi}}=0.$
(84)
Simplification of Equation (84) using $cD=1$ and $cN≃1$ results in
$\sigma_{ss}^{*2}\,\frac{2+\xi}{(1+\xi)^{2}4\mu}+\sigma_{ss}^{*}\,\frac{-c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}(1+\xi)}+\frac{\xi-2\mu\xi c_{\mu/\mu,\lambda}^{2}}{2(1+\xi)}+\frac{1}{\sigma_{ss}^{*}}\,\frac{\mu c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}=0.$
(85)
Note that multiplying Equation (85) by $\sigma_{ss}^{*}$ results in a cubic equation that can be solved. However, the expressions for the closed-form solutions are rather long. Hence, a quadratic equation is aimed for. To this end, Equation (85) is approximated quadratically. Neglecting $\sigma_{ss}^{*-1}\frac{\mu c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}$ and $\sigma_{ss}^{*}\frac{-c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}(1+\xi)}$ in Equation (85) results in
$\sigma_{ss}^{*2}\,\frac{2+\xi}{(1+\xi)^{2}4\mu}+\frac{\xi-2\mu\xi c_{\mu/\mu,\lambda}^{2}}{2(1+\xi)}=0.$
(86)
Solving Equation (86) for the positive root with subsequent simplification yields
$\sigma_{ss}^{*}\approx\sqrt{\frac{2\mu\xi\left(2\mu c_{\mu/\mu,\lambda}^{2}-1\right)(1+\xi)}{2+\xi}}.$
(87)
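The regrouping and the final solution step can be spot-checked numerically: Equation (85) agrees with Equation (83) once $cD=1$ and $cN=1$ are inserted, and Equation (87) solves Equation (86). A sketch with illustrative values (`c_mu` is a placeholder for $c_{\mu/\mu,\lambda}$):

```python
import math

# Illustrative values; c_mu stands in for the progress coefficient
mu, xi, c_mu = 3, 10.0, 1.065
sq = math.sqrt(1 + xi)

# 1) Equation (83) with cD = 1 and cN = 1 versus the regrouped Equation (85)
sigma = 17.0
e83 = (sigma**2 / ((1 + xi) * 2 * mu)
       - mu * xi * sigma**2 / ((1 + xi)**2 * 4 * mu**2)
       - sigma * c_mu / sq
       + mu * xi * sigma * c_mu / (mu * sq * (1 + xi))
       - mu * xi * c_mu**2 / (1 + xi)
       - 1 / (2 * (1 + xi)) + 1 / 2
       + mu * c_mu / (sq * sigma))
e85 = (sigma**2 * (2 + xi) / ((1 + xi)**2 * 4 * mu)
       - sigma * c_mu / (sq * (1 + xi))
       + (xi - 2 * mu * xi * c_mu**2) / (2 * (1 + xi))
       + mu * c_mu / (sq * sigma))
assert abs(e83 - e85) < 1e-12

# 2) Equation (87) solves the quadratic Equation (86)
s87 = math.sqrt(2 * mu * xi * (2 * mu * c_mu**2 - 1) * (1 + xi) / (2 + xi))
residual = (s87**2 * (2 + xi) / ((1 + xi)**2 * 4 * mu)
            + (xi - 2 * mu * xi * c_mu**2) / (2 * (1 + xi)))
assert abs(residual) < 1e-12
```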

Figure 6 shows plots of the steady state computations comparing the $(\mu/\mu_I,\lambda)$-CSA-ES for $\lambda=10$ and $\mu\in\{1,2,3\}$. Further experimental results (including results for $\mu\in\{4,6,8\}$) are provided in the supplementary material (Figures 19 to 22 in Appendix B). Results computed by Equation (87) have been compared to real ES runs. The values for the points denoting the approximations have been determined by computing the normalized steady state mutation strength $\sigma_{ss}^{*}$ using Equation (87) for different values of $\xi$. The results for $\varphi_{x}^{*}$ and $\varphi_{r}^{*}$ have been determined by using the computed steady state $\sigma_{ss}^{*}$ values with Equation (48). The approximations for $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$ have been determined by evaluating Equation (47). The values for the points denoting the experiments have been determined by computing the averages of the particular values in real ES runs. The rather large deviations of $\varphi_{x\,ss}^{*}$ and $\varphi_{r\,ss}^{*}$ for values of $\xi$ ranging from about 3 to 30 stem from the $\sigma_{ss}^{*}$ approximation (Equation (87)). A further investigation resulted in the plots shown in Figures 13 and 14 in the supplementary material. They compare the numerical solution of Equation (62) with real ES runs and show a better agreement.

Figure 6:

Steady state closed-form approximation and real run comparison of the $(\mu/\mu_I,\lambda)$-CSA-ES with repair by projection applied to the conically constrained problem for $c=1/N$, $D=1/c$, $\lambda=10$, and $\mu\in\{1,2,3\}$.


## 4  Discussion

The comparison of the mean value iterative system summarized in Section 3.1.6 with real ES runs shows that the theoretical prediction is not perfect because of the asymptotic assumptions used in the derivations (see Figure 3). For large $\xi$ and large $N$, the prediction comes close to the real behavior. The deviations for small $N$ are due to the asymptotic assumption $N\to\infty$ that is used in the derivations of the microscopic and macroscopic aspects of the ES. It is used to simplify the expressions and thus make a theoretical analysis tractable. The deviations for small $\xi$ stem from the derivation of the offspring cumulative distribution function after the projection step in $x_{1}$-direction, $P_{Q}(q)$ (for the details, refer to Section 3.1.2.1.2.3 in Spettel and Beyer (2018c), in particular to the step from Equations (3.73) to (3.74)). The same observations regarding the deviations can be made for the derived steady state expressions (see Figures 5 and 6).

For the steady state derivations, it is of particular interest to compare the results obtained in this work for the CSA-ES with the results obtained for the $\sigma$SA-ES. The $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES applied to the same conically constrained problem has been theoretically analyzed by Spettel and Beyer (2018b). In that work, the microscopic and macroscopic aspects of the $(\mu/\mu_I,\lambda)$-$\sigma$SA-ES have been investigated. For the microscopic aspects, expressions for the local progress in $x$ and $r$ and the self-adaptation response (SAR) function have been derived using asymptotic assumptions. These results have then been used for the macroscopic analysis. The mean value dynamics generated by iteration using those local measures have been compared to real runs. In addition, steady state expressions have been derived and discussed. They show that the $\sigma$SA-ES is able to achieve sufficiently high mutation strengths to keep the progress almost constant for increasing $\xi$. Surprisingly, for the CSA-ES, the choice of the cumulation parameter $c$ has a qualitative influence on the behavior. As detailed in the following paragraphs, of the two cases $c=\Theta(1/\sqrt{N})$ and $c=\Theta(1/N)$ considered, only the choice $c=\Theta(1/N)$ results in a qualitative behavior of the CSA-ES that is similar to that of the $\sigma$SA-ES.

Considering the choice of $c=1/\sqrt{N}$ proposed in early publications on CMA-ES (Hansen and Ostermeier, 1997), the steady state mutation strengths attained flatten with increasing $\xi$. As a consequence, the steady state progress decreases for higher values of $\xi$. This can be seen by considering Equation (76), which leads to $\sigma_{ss}^{*}\simeq2\mu c_{\mu/\mu,\lambda}$ for sufficiently large $\xi$. For $\mu=3$ and $\lambda=10$, this results in a steady state normalized mutation strength of approximately 6.39. Note that this value corresponds to the approximations for the larger values of $\xi$ shown in Figure 5 (subplot showing $\sigma_{ss}^{*}$ for $N=10000$). Equation (76) can be inserted into the steady state progress rate (Equation (48)), yielding
$\varphi_{r\,ss}^{*}=\varphi_{x\,ss}^{*}\approx\frac{2\mu\sqrt{\xi+1}\,c_{\mu/\mu,\lambda}^{2}}{\sqrt{\xi+1}\sqrt{\xi+2}}-\frac{4\mu^{2}(\xi+1)c_{\mu/\mu,\lambda}^{2}}{(\xi+2)(\xi+1)2\mu}=\frac{2\mu c_{\mu/\mu,\lambda}^{2}}{\sqrt{\xi+2}}-\frac{2\mu c_{\mu/\mu,\lambda}^{2}}{\xi+2}.$
(88)
From the simplified result of Equation (88), one immediately notices that $\varphi_{ss}^{*}\to0$ for $\xi\to\infty$ (respecting $\xi/\sqrt{N}\to0$, which was used in the derivations leading to Equation (76)). This is exactly what one sees in Figure 5: the stationary state progress decreases with increasing $\xi$.
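The value $2\mu c_{\mu/\mu,\lambda}\approx6.39$ for $\mu=3$, $\lambda=10$ and the decay of Equation (88) can be reproduced with a short Monte Carlo sketch (the progress coefficient is estimated as the expected average of the $\mu$ largest of $\lambda$ standard normal samples; seed and sample size are arbitrary):

```python
import math

import numpy as np

rng = np.random.default_rng(0)
mu, lam = 3, 10

# Monte Carlo estimate of the progress coefficient c_{mu/mu,lambda}
z = rng.standard_normal((200_000, lam))
c_mu = float(np.sort(z, axis=1)[:, -mu:].mean())

# Large-xi limit of Equation (76): sigma*_ss ~= 2*mu*c_{mu/mu,lambda} (about 6.39)
assert abs(2 * mu * c_mu - 6.39) < 0.05

# Simplified Equation (88): positive progress that vanishes with growing xi
def phi_ss(xi):
    return 2 * mu * c_mu**2 / math.sqrt(xi + 2) - 2 * mu * c_mu**2 / (xi + 2)

values = [phi_ss(x) for x in (10.0, 100.0, 1000.0)]
assert all(v > 0 for v in values) and values[0] > values[1] > values[2]
```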
In contrast, for the case $c=\Theta(1/N)$ that is proposed in newer publications ($\frac{4}{N+4}$ by Hansen and Ostermeier (2001) or $\frac{\mu+2}{N+\mu+5}$ by Hansen (2016), both of which are in $\Theta(1/N)$ for $\mu$ independent of $N$), the steady state mutation strength increases with increasing $\xi$. The ES is therefore able to achieve a constant progress rate for increasing $\xi$. The steady state progress is less than that of the $\sigma$SA-ES. Due to the increase of $\sigma_{ss}^{*}$ with increasing $\xi$, the increasing deviations of the approximation from the simulations can be explained: In the derivations leading to Equation (A.10), it has been assumed that $\mu N\gg\sigma^{(g)*2}$. As the steady state $\sigma^{*}$ increases with $\xi$, $N$ must be increased in order to have the same approximation quality for higher values of $\xi$. This can be explained more formally. Assuming large $\xi$, $(1+\xi)/(2+\xi)\simeq1$ holds in Equation (87). Hence,
$\sigma_{ss}^{*}\simeq\sqrt{2\mu\xi\left(2\mu c_{\mu/\mu,\lambda}^{2}-1\right)}$
(89)
follows, which, for large $\xi$, is of the same order as the result for the $\sigma$SA-ES (see Eq. (62) in Spettel and Beyer (2018b)). While it has been common practice to use $c=\Theta(1/N)$ since the seminal CMA-ES paper (Hansen and Ostermeier, 2001), this is the first theoretical result that shows an advantage of the $\Theta(1/N)$ choice.

A further aspect for discussion is the special case of the non-recombinative $(1,\lambda)$-CSA-ES that is contained in the derivations for the $(\mu/\mu_I,\lambda)$-CSA-ES. The iterative system for the non-recombinative ($\mu=1$) case can be derived analogously to the multi-recombinative ($\mu>1$) case. The resulting equations differ in the expressions for $\varphi_{x}^{(g)}$, $\varphi_{r}^{(g)}$, and $\varphi_{r^2}^{(g)}$. Further investigation (not shown here) reveals that for $\frac{N}{2}(1+1/\xi)\gg\sigma^{(g)*2}$ the mean value iterative systems of the $(1,\lambda)$-CSA-ES and the $(\mu/\mu_I,\lambda)$-CSA-ES agree. An interesting difference between the case $\mu=1$ and the case $\mu>1$ is the evolution near the cone boundary. Whereas the CSA-ES with $\mu=1$ evolves on the boundary, the CSA-ES with $\mu>1$ attains a certain steady state distance from the boundary (refer to the subplots of Figures 5 and 6 showing $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$). This can be explained as follows: Considering a parental individual on the cone boundary, the offspring are infeasible with overwhelming probability for sufficiently large $N$. Hence, they are repaired by projection and lie on the boundary after projection. In particular, the best of them is on the boundary. Therefore, for $\mu=1$, the ES evolves on the boundary. For $\mu>1$, the centroid computation after projection results in a parent that lies inside the feasible region due to the convexity of the cone.
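The convexity argument can be illustrated with a small simulation. The sketch below assumes the cone orientation $x_{1}\geq\sqrt{\xi}\,r$ (consistent with the notation used here, but an assumption) and the standard second-order cone projection formula; offspring scattered around a boundary parent are projected, and their centroid lies strictly inside the cone:

```python
import numpy as np

rng = np.random.default_rng(1)
N, xi, lam = 40, 4.0, 10
a = 1 / np.sqrt(xi)  # cone: x_1 >= sqrt(xi)*r, i.e. r <= a*x_1

def project(x1, v):
    """Projection onto the cone r <= a*x1 (assumed second-order cone formula)."""
    r = np.linalg.norm(v)
    if r <= a * x1:
        return x1, v            # already feasible
    if a * r <= -x1:
        return 0.0, 0.0 * v     # apex of the cone
    t = (x1 + a * r) / (1 + a**2)
    return t, a * t * v / r     # point on the cone boundary

# Parent exactly on the cone boundary; offspring scatter around it
x1_parent, v_parent = 1.0, np.zeros(N - 1)
v_parent[0] = a * x1_parent
projected = [project(x1_parent + 0.1 * rng.standard_normal(),
                     v_parent + 0.1 * rng.standard_normal(N - 1))
             for _ in range(lam)]

# The centroid of the (boundary) projections lies strictly inside the cone
cx = np.mean([p[0] for p in projected])
cv = np.mean([p[1] for p in projected], axis=0)
assert np.linalg.norm(cv) < a * cx
```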

## 5  Conclusion

In this work, the $(\mu/\mu_I,\lambda)$-CSA-ES has been theoretically analyzed. For this, a mean value iterative system has been introduced and compared to real ES runs. Based on this derived system, steady state expressions have been derived and compared to ES simulations. It has been shown that for $c=\Theta(1/N)$ a behavior that is similar to that of the $\sigma$SA-ES can be achieved. For $c=1/\sqrt{N}$, the mutation strength achieved is not high enough to reach a steady state progress similar to that of the $\sigma$SA-ES. Similar to the $\sigma$SA-ES, the CSA-ES evolves on the boundary in the non-recombinative case and at a certain distance from the boundary in the multi-recombinative case.

To conclude the article, topics for future work are outlined. In addition to the $σ$SA and the CSA for the $σ$ control mechanism, it is of interest to investigate the behavior of Meta-ESs applied to the conically constrained problem. Meta-ESs can be used as a further approach to control the mutation strength. The main idea is to evolve the mutation strength by competition of two ESs running with different mutation strengths for an isolation period. A Meta-ES controls the mutation strength evolution by picking and mutating the mutation strength of the better performing ES.

Comparison of the repair by projection approach with other repair methods is another topic for further research. Analysis of ESs applied to other constrained problems is another research direction for the future. A possibility for a direct next step is the analysis of the conically constrained linear optimization problem for which the objective function gradient does not coincide with the cone axis (Arnold, 2014).

## Acknowledgments

This work was supported by the Austrian Science Fund FWF under grant P29651-N32.

## Notes

1

Note that the order statistic notation $m;\lambda$ is used to denote the $m$-th best (w.r.t. fitness) out of $\lambda$ values. The notation $(\mathbf{x})_{k}$ is used to denote the $k$-th element of a vector $\mathbf{x}$. It is equivalent to writing $x_{k}$.

2

In the further considerations, the symbols “$\simeq$” and “$\approx$” are used. Expressions of the form $lhs\simeq rhs$ denote that $lhs$ is asymptotically equal to $rhs$ under given asymptotic assumptions (e.g., $N\to\infty$). The particular assumptions are stated explicitly for every use of “$\simeq$”. That is, in the limit case of the given assumptions, $lhs$ is equal to $rhs$. The form $lhs\approx rhs$ is used for cases where $rhs$ is an approximation of $lhs$ under given assumptions that are not of an asymptotic nature. In this sense, “$\approx$” is weaker than “$\simeq$”.

## References

Arnold, D. V. (2011a). Analysis of a repair mechanism for the (1, $\lambda$)-ES applied to a simple constrained problem. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pp. 853–860.

Arnold, D. V. (2011b). On the behaviour of the (1, $\lambda$)-ES for a simple constrained problem. In Proceedings of the 11th Workshop Proceedings on Foundations of Genetic Algorithms, pp. 15–24.

Arnold, D. V. (2013). On the behaviour of the (1, $\lambda$)-ES for a conically constrained problem. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pp. 423–430.

Arnold, D. V. (2014). On the behaviour of the (1, $\lambda$)-ES for conically constrained linear problems. Evolutionary Computation, 22(3):503–523.

Arnold, D. V., and Porter, J. (2015). Towards an augmented Lagrangian constraint handling approach for the (1+1)-ES. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 249–256.

Atamna, A., Auger, A., and Hansen, N. (2016). Augmented Lagrangian constraint handling for CMA-ES—Case of a single linear constraint. In International Conference on Parallel Problem Solving from Nature, pp. 181–191.

Atamna, A., Auger, A., and Hansen, N. (2017). Linearly convergent evolution strategies via augmented Lagrangian constraint handling. In Proceedings of the 14th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, pp. 149–161.

Beyer, H.-G. (2001). The theory of evolution strategies. Natural Computing Series. Berlin: Springer.

Beyer, H.-G., and Finck, S. (2012). On the design of constraint covariance matrix self-adaptation evolution strategies including a cardinality constraint. IEEE Transactions on Evolutionary Computation, 16(4):578–596.

Hansen, N. (2016). The CMA evolution strategy: A tutorial.

Hansen, N., and Ostermeier, A. (1997). Convergence properties of evolution strategies with the derandomized covariance matrix adaptation: The ($\mu/\mu_I$, $\lambda$)-CMA-ES. Eufit, 97:650–654.

Hansen, N., and Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195.

Hellwig, M., and Arnold, D. V. (2016). Comparison of constraint-handling mechanisms for the (1, $\lambda$)-ES on a simple constrained problem. Evolutionary Computation, 24(1):1–23.

Spettel, P., and Beyer, H.-G. (2018a). Analysis of the $(1,\lambda)$-$\sigma$-self-adaptation evolution strategy with repair by projection applied to a conically constrained problem. Theoretical Computer Science. http://www.sciencedirect.com/science/article/pii/S0304397518306571

Spettel, P., and Beyer, H.-G. (2018b). Analysis of the $(\mu/\mu_I,\lambda)$-$\sigma$-self-adaptation evolution strategy with repair by projection applied to a conically constrained problem.

Spettel, P., and Beyer, H.-G. (2018c). Analysis of the $(1,\lambda)$-$\sigma$-self-adaptation evolution strategy with repair by projection applied to a conically constrained problem.

Spettel, P., Beyer, H.-G., and Hellwig, M. (2018). A covariance matrix self-adaptation evolution strategy for optimization under linear constraints. IEEE Transactions on Evolutionary Computation, p. 1.