## Abstract

We study the behaviour of multi-recombination evolution strategies for the problem of maximising a linear function with a single linear constraint. Two variants of the algorithm are considered: a strategy that resamples infeasible candidate solutions and one that applies a simple repair mechanism. Integral expressions that describe the strategies’ one-generation behaviour are derived and used in a simple zeroth order model for the steady state attained when operating with constant step size. Applied to the analysis of cumulative step size adaptation, the approach provides an intuitive explanation for the qualitative difference in the algorithm variants’ behaviour. The findings have implications for the design of constraint handling techniques to be used in connection with cumulative step size adaptation.

## 1 Introduction

The number of constraint handling techniques used in evolutionary algorithms is large and growing. Approaches range from penalty methods and repair mechanisms to algorithms based on ideas from multi-objective optimisation, and they include those described by Oyman et al. (1999), Runarsson and Yao (2000, 2005), Mezura-Montes and Coello Coello (2005), and Kramer and Schwefel (2006). A survey of techniques was compiled by Coello Coello (2008). Knowledge of the relative capabilities of the approaches is almost always gained by comparing their performance on sets of test functions that are believed to be difficult. The influence of the choice of those functions, as well as of parameters implicit in the performance criteria, is often not obvious, and the results are not easy to interpret. If a mechanism fails, it often remains unclear why; likewise, it is not always clear what "good" performance on an involved problem means. Test function generators, such as the one proposed by Michalewicz et al. (2000), automatically generate test functions with parameters controlling properties such as the connectedness of the feasible region and the degree of multimodality of the objective, and they provide some additional insight into scaling properties. However, the functions generated are too complex to afford an understanding of algorithm behaviour on a microscopic level.

The approach pursued here is complementary to benchmarking studies on large testbeds. We study the behaviour of algorithms on a very simple class of test functions for which analytically based findings can be derived. The results offer the advantage of being easy to interpret and reveal scaling properties and the influence of parameters on optimisation performance. It has been known at least since the early work of Schwefel (1975) that the presence of constraints can cause step size adaptation mechanisms to fail even for the simplest of problems. Understanding the interplay of step size adaptation mechanisms and constraint handling techniques on a microscopic level will benefit the developer of evolutionary algorithms for constrained optimisation.

A sizable number of studies pursue a similar approach in the realm of unconstrained evolutionary optimisation; much less such work has been published for constrained settings. Among the few exceptions is the early work by Rechenberg (1973), who studies the performance of the (1+1)-ES^{1} for the axis-aligned corridor model, and upon which the one-fifth success rule for step size adaptation is partly based. Schwefel (1975) considers the performance of evolution strategies in the same environment. Beyer (1989) analyses the behaviour of the (1+1)-ES for a constrained discus-like function. All of those analyses have in common that the normal vectors of the constraint planes are perpendicular to the gradient vector of the objective function.

More recent work considers constrained problems that do not have that specific property. Arnold and Brauer (2008) study the behaviour of the (1+1)-ES for a linear problem with a single linear constraint of general orientation. They describe the distance of the parental candidate solution from the constraint plane and investigate its limit behaviour using two simple statistical models. Schwefel (1975) points out that under the one-fifth rule, the presence of constraints may result in the step size being reduced in situations where the angle between the gradient direction and the normal vector of the active constraint is small, leading to convergence to a nonstationary point. The findings derived by Arnold and Brauer (2008) provide a quantitative confirmation of this. Arnold (2011b) and Arnold (2011a) study the behaviour of the (1, λ)-ES with cumulative step size adaptation in the same environment. The former reference considers the case where infeasible candidate solutions are resampled until they are feasible; the latter uses a simple repair mechanism that orthogonally projects infeasible candidate solutions onto the boundary of the feasible region. It is found that cumulative step size adaptation may fail in scenarios similar to those in which the success probability based mechanism fails, though for different reasons, and that employing the repair mechanism results in more robust performance in the sense that convergence to a nonstationary point is avoided, provided that the cumulation parameter used in the step size adaptation mechanism is set appropriately.

In contrast to the work in Arnold (2011a, 2011b), this paper considers the behaviour of evolution strategies employing nonsingleton populations and recombination. The results derived earlier appear as special cases. Being able to study the influence of selection pressure (quantified by the truncation ratio) on algorithm performance adds a new dimension to the analysis. The remainder of the paper is organised as follows. Section 2 briefly formalises the algorithms and problem and introduces notational conventions used throughout the paper. Section 3 investigates the behaviour of the strategy for fixed step size employing a simple zeroth-order model for the distribution of distances from the constraint plane. Section 4 considers cumulative step size adaptation. Section 5 concludes with a brief discussion and directions for future research. Calculations that underlie Sections 3 and 4 but that are not essential to the understanding of the analysis have been relegated to Appendices A and B in order to improve readability.

## 2 Problem and Algorithms

We consider the problem of maximising^{2} a linear function subject to a single linear constraint. We assume that the gradient vector of the objective function forms an acute angle with the normal vector of the constraint plane. Without loss of generality, we choose a Euclidean coordinate system with its origin located on the constraint plane, and with its axes oriented such that the *x*_{1} axis coincides with the gradient direction and the *x*_{2} axis lies in the two-dimensional plane spanned by the gradient vector and the normal vector of the constraint plane. The angle between those two vectors is denoted by θ as illustrated in Figure 1(a), and it is referred to as the constraint angle. Constraint angles of interest are in (0, π/2). The unit normal vector of the constraint plane expressed in the coordinate system described above is **n** = (cos θ, sin θ, 0, …, 0)ᵀ. The signed distance of a point **x** from the constraint plane is thus g(**x**) = −**n**ᵀ**x**, resulting in the optimisation problem of maximising f(**x**) = ξ*x*_{1}, for some constant ξ > 0, subject to g(**x**) ≥ 0. Notice that due to the choice of coordinate system, variables *x*_{3} through *x*_{N} enter neither the objective function nor the constraint inequality. While simple, the linear environment models other constrained optimisation problems in the immediate vicinity of the constraint plane as illustrated in Figure 1(b).
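The geometry just described is easy to set down in code. The sketch below is ours rather than part of the paper's formal development: it assumes the unit normal **n** = (cos θ, sin θ, 0, …, 0)ᵀ and treats nonnegative signed distance as feasible.

```python
import math

def unit_normal(theta, n_dims):
    """Unit normal of the constraint plane; by choice of coordinate
    system only its first two components are nonzero."""
    n = [0.0] * n_dims
    n[0] = math.cos(theta)
    n[1] = math.sin(theta)
    return n

def signed_distance(x, theta):
    """Signed distance of x from the constraint plane through the
    origin; nonnegative values are feasible under our convention."""
    n = unit_normal(theta, len(x))
    return -sum(ni * xi for ni, xi in zip(n, x))

theta = math.pi / 6          # constraint angle of 30 degrees
x = [1.0, -3.0, 0.5]         # a candidate solution
feasible = signed_distance(x, theta) >= 0.0
```

Note that the third coordinate does not enter the distance computation, mirroring the observation that variables *x*_{3} through *x*_{N} play no role in either the objective or the constraint.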

In Algorithm 1, superscript *t* denotes the iteration number, while index *i* numbers offspring. The loop in lines 1 through 14 implements mutation, resampling, and repair. Mutation samples offspring from an *N*-dimensional normal distribution with mean **x**^{(t)} and covariance matrix (σ^{(t)})²**I**, where σ^{(t)} denotes the mutation strength and **I** denotes the *N* × *N* identity matrix. Selection and recombination are implemented in lines 15 and 16. Lines 17 and 18 update the search path and the mutation strength, respectively, and implement cumulative step size adaptation as proposed by Ostermeier et al. (1994).
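As the pseudocode of Algorithm 1 is not reproduced here, the following sketch gives one plausible rendering of a single generation. Function and variable names, the sign convention for the constraint, and the exact form of the mutation strength update (with cumulation parameter *c* and damping *D* as described in the text) are our assumptions rather than a verbatim transcription.

```python
import math
import random

def one_generation(x, sigma, s, theta, lam, mu, c, D, repair=True):
    """One generation of a (mu/mu, lam)-ES with constraint handling by
    repair (projection) or resampling, followed by cumulative step
    size adaptation.  Sign conventions and the exact update constants
    are our assumptions, not a transcription of Algorithm 1."""
    N = len(x)
    n = [math.cos(theta), math.sin(theta)] + [0.0] * (N - 2)

    def g(y):  # signed distance from the constraint plane; >= 0 is feasible
        return -sum(ni * yi for ni, yi in zip(n, y))

    population = []
    for _ in range(lam):
        while True:
            z = [random.gauss(0.0, 1.0) for _ in range(N)]
            y = [xi + sigma * zi for xi, zi in zip(x, z)]
            d = g(y)
            if d >= 0.0:
                break
            if repair:  # orthogonal projection onto the constraint plane
                y = [yi + d * ni for yi, ni in zip(y, n)]
                z = [(yi - xi) / sigma for yi, xi in zip(y, x)]
                break
            # otherwise: resample until the offspring is feasible
        population.append((y[0], z))  # objective value grows with y[0]

    # truncation selection and intermediate recombination
    population.sort(key=lambda p: p[0], reverse=True)
    z_avg = [sum(z[i] for _, z in population[:mu]) / mu for i in range(N)]

    x = [xi + sigma * zi for xi, zi in zip(x, z_avg)]
    s = [(1.0 - c) * si + math.sqrt(c * (2.0 - c) * mu) * zi
         for si, zi in zip(s, z_avg)]
    sigma *= math.exp((sum(si * si for si in s) - N) / (2.0 * D * N))
    return x, sigma, s

random.seed(7)
x, sigma, s = one_generation([0.0, -1.0], 1.0, [0.0, 0.0], math.pi / 4,
                             lam=10, mu=3, c=0.5, D=1.0)
```

With `repair=False` infeasible offspring are resampled; with `repair=True` they are orthogonally projected onto the constraint plane and their mutation vectors recomputed accordingly.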

Constant *c* ∈ (0, 1] is referred to as the cumulation parameter and determines the effective length of the memory implemented in the search path, and *D* > 0 is a damping constant that determines the magnitude of the mutation strength updates.^{3} Clearly, the sign of ‖**s**^{(t+1)}‖² − *N* determines whether the mutation strength is increased or decreased. The underlying idea is that long search paths indicate that the steps made by the strategy point predominantly in one direction and could beneficially be replaced with fewer but longer steps. A short search path indicates that the strategy steps back and forth and that it should operate with a smaller step size. The search path has an expected squared length of *N* if in the unconstrained setting offspring candidate solutions are selected at random (i.e., if the ordering of offspring candidate solutions in line 15 of Algorithm 1 is random), in which case no change in mutation strength is effected on average. It is thus not unexpected that in constrained settings cumulative step size adaptation may fail under some conditions. When combined with the adaptive encoding mechanism described by Hansen (2008), the algorithm considered here in the unconstrained case in essence yields the CMA-ES as proposed by Hansen and Ostermeier (2001).
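The claim that the search path has expected squared length *N* under random selection can be checked empirically: with random selection, √μ **z**^{(avg)} is a standard normal vector, and the recursion **s** ← (1 − *c*)**s** + √(*c*(2 − *c*)) √μ **z**^{(avg)} then has N(**0**, **I**) as its stationary distribution. The sketch below uses our assumed form of the update.

```python
import math
import random

def mean_sq_path_length(N, c, iters, seed=0):
    """Long-run average of ||s||^2 when sqrt(mu) * z_avg is a standard
    normal vector, i.e., when selection among offspring is random."""
    rng = random.Random(seed)
    s = [0.0] * N
    total, count = 0.0, 0
    for t in range(iters):
        s = [(1.0 - c) * si
             + math.sqrt(c * (2.0 - c)) * rng.gauss(0.0, 1.0)
             for si in s]
        if t >= iters // 10:  # discard a burn-in phase
            total += sum(si * si for si in s)
            count += 1
    return total / count

est = mean_sq_path_length(N=5, c=0.2, iters=20000)
```

The stationary per-component variance is *c*(2 − *c*)/(1 − (1 − *c*)²) = 1, so the long-run average approaches *N* regardless of *c*.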

## 3 Behaviour for Fixed Mutation Strength

Let **z**^{(i,t)} denote the mutation vector of the *i*th offspring candidate solution after line 12 in Algorithm 1 (i.e., after resampling or repair). We are interested in the evolution of the normalised distance δ^{(t)} of the population centroid from the constraint plane. The update of that distance resulting from a single iteration of Algorithm 1 is described by Equation (1), where, letting *k*; λ denote the index of the *k*th best of the λ offspring generated, **z**^{(avg,t)} denotes the average of the mutation vectors corresponding to the μ offspring candidate solutions selected for survival. From Equation (1), the distribution of δ^{(t+1)} is determined by the (time independent) joint density *p*^{(avg)}_{1,2} of the first and second components *z*^{(avg,t)}_{1} and *z*^{(avg,t)}_{2} of **z**^{(avg,t)}, conditional on the normalised distance of the population centroid from the constraint plane. Using Leibniz's integral rule to compute the derivative with respect to *x* yields the probability density of the normalised distance from the constraint plane after a step, conditional on that distance before the step. The probability density of the normalised distance from the constraint plane at time step *t* + 1 is thus given by Equation (2), where *p*^{(t)} is the density of δ^{(t)} values at time step *t*. As a result of iterating the update in Equation (1), the distribution of normalised distances from the constraint plane approaches a stationary limit distribution, the density of which is a fixed point of the mapping in Equation (2).
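The fixed point of the mapping in Equation (2) can also be approximated by direct simulation: run the strategy at fixed mutation strength and record the normalised distance of the centroid after a burn-in phase. The sketch below uses our conventions (δ = g(**x**)/σ, resampling variant, two dimensions, which suffice since selection involves only the first two coordinates).

```python
import math
import random

def stationary_distances(theta, lam, mu, sigma, iters, seed=0):
    """Simulate the centroid's normalised distance delta = g(x)/sigma
    under fixed mutation strength for the resampling variant and
    return the distances observed after a burn-in phase."""
    rng = random.Random(seed)
    n = (math.cos(theta), math.sin(theta))
    x = [0.0, 0.0]  # start on the constraint plane
    deltas = []
    for t in range(iters):
        pop = []
        for _ in range(lam):
            while True:  # resample until feasible
                z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
                y = (x[0] + sigma * z[0], x[1] + sigma * z[1])
                if -(n[0] * y[0] + n[1] * y[1]) >= 0.0:
                    break
            pop.append((y[0], z))
        pop.sort(key=lambda p: p[0], reverse=True)
        zx = sum(z[0] for _, z in pop[:mu]) / mu
        zy = sum(z[1] for _, z in pop[:mu]) / mu
        x = [x[0] + sigma * zx, x[1] + sigma * zy]
        if t >= iters // 5:  # discard burn-in
            deltas.append(-(n[0] * x[0] + n[1] * x[1]) / sigma)
    return deltas

deltas = stationary_distances(math.pi / 4, lam=10, mu=3,
                              sigma=1.0, iters=2000, seed=2)
```

Since selected offspring are feasible and the feasible region is convex, the centroid remains feasible, and the recorded distances settle into the stationary distribution described above.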

## 4 Mutation Strength Adaptation

We now abandon the assumption that the mutation strength of the strategy is fixed. For the constrained linear environment considered here, the (μ/μ, λ)-ES with cumulative step size adaptation does not attain a stationary limit state. The mutation strength is either increased or decreased indefinitely. As decreasing mutation strengths lead to stagnation and convergence to a nonstationary point, increasing mutation strengths are desirable. Figure 3 shows four runs of the (3/3, 10)-ES with cumulative step size adaptation for each of the two constraint handling approaches. In all cases, *N* = 2, *c* = 0.5, and **s**^{(0)} = **0**. While the mutation strength and the objective function value of the population centroid increase exponentially fast for the strategy that uses the repair mechanism, the strategy that employs the resampling approach decreases its mutation strength exponentially fast, and objective function values stagnate. It will be seen below that a lower setting for the cumulation parameter would enable the resampling strategy to avoid stagnation.

The dynamic state of the strategy comprises the normalised distance of the population centroid from the constraint plane as well as the search path **s** (at least its components *s*_{1} and *s*_{2}). Rather than attempting to solve the difficult problem of examining its dynamic behaviour, we consider the logarithmic adaptation response log(σ^{(t+1)}/σ^{(t)}) of the strategy operating out of a stationary state. A positive logarithmic adaptation response indicates that the mutation strength is increased; a negative logarithmic adaptation response implies a decrease in step size. The expected logarithmic adaptation response depends on the normalised distance of the parental centroid from the constraint plane as well as on the search path. In order to predict whether the strategy will converge or diverge, we assume that it has been run with a fixed step size until time step *t*, where *t* is large enough for the strategy to operate in the limit state considered in Section 3, and we compute the expected logarithmic adaptation response at time step *t* + 1. Experiments show that the resulting predictions of the strategy's dynamic behaviour are accurate across a wide range of parameter settings.

As **s**^{(0)} = **0**, from line 17 in Algorithm 1 it follows that the search path at time step *t* + 1 can be expressed as a weighted sum of the averaged mutation vectors from all preceding time steps. Its expected squared length can be computed from the squares of its components. Due to the assumption that the strategy is operating out of a stationary state, the distribution of the *z*^{(avg,t)}_{i} is independent of time. Omitting time superscripts, using the geometric series identities Σ_{k=0}^{t} (1 − *c*)^{k} = (1 − (1 − *c*)^{t+1})/*c* and Σ_{k=0}^{t} (1 − *c*)^{2k} = (1 − (1 − *c*)^{2(t+1)})/(*c*(2 − *c*)), and for brevity writing *e*_{i,j} for the *j*th moment about zero of *z*^{(avg)}_{i} yields Equation (5) in the limit of large *t*. Components of mutation vectors are independently standard normally distributed, and thus the *z*^{(avg)}_{i} for *i* > 2 are independently normally distributed with mean zero and variance 1/μ. Thus, *e*_{i,1} = 0 and *e*_{i,2} = 1/μ for *i* > 2. As a result, Equation (6) follows from Equation (5). Note that this result depends on the search space dimensionality only through the cumulation parameter *c*, which is usually set to an *N*-dependent value.
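The two geometric series identities invoked above are elementary but easy to misremember; a quick numerical check of the partial sums against the closed forms:

```python
def geom_sums(c, t):
    """Partial sums of (1 - c)^k and (1 - c)^(2k) for k = 0, ..., t."""
    s1 = sum((1.0 - c) ** k for k in range(t + 1))
    s2 = sum((1.0 - c) ** (2 * k) for k in range(t + 1))
    return s1, s2

c, t = 0.2, 500
s1, s2 = geom_sums(c, t)
closed1 = (1.0 - (1.0 - c) ** (t + 1)) / c
closed2 = (1.0 - (1.0 - c) ** (2 * (t + 1))) / (c * (2.0 - c))
```

For large *t* the sums approach 1/*c* and 1/(*c*(2 − *c*)), the limits that enter Equation (5).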

Equations for the *e*_{i,j} for *i* ∈ {1, 2} are derived in Appendices A and B. Numerically evaluating Equations (19), (20), (21), and (22) using the Dirac model for the average distance of the population centroid from the constraint plane yields the curves in Figure 4. Figure 4(a) shows the same results as Figure 2(d), but with logarithmic scales for both axes. Comparison of the curves with the values obtained experimentally in runs of the respective evolution strategies shows good visual agreement for the first moments *e*_{1,1} and *e*_{2,1}. The quality of the approximation relying on the Dirac model is not as good for the moment *e*_{1,2} when constraint angles are highly acute. However, the values in that scenario are generally small and have limited impact on the results below. The agreement of the values for *e*_{2,2} derived from the Dirac model with experimental results deteriorates with increasing constraint angle, but appears visually good for small values of θ.

Whether the expected logarithmic adaptation response is positive or negative depends on the setting of the cumulation parameter *c*. For a given constraint angle and values of *c* close enough to zero, the term that involves the squared first moments is large enough for the sum of the first two terms to outweigh the negative third term, ensuring positive expected logarithmic adaptation response and thus an expected increase in the step size. As *c* approaches 1.0, however, the second term in Equation (6) decreases to zero. Depending on the magnitude of the first term, the expected logarithmic adaptation response may be negative, resulting in decreasing step sizes. Demanding that the expected logarithmic adaptation response be nonnegative yields the condition in Equation (7) for the cumulation parameter. The maximum value of *c* for which convergence to a nonstationary point is avoided (obtained by considering equality in Equation (7), where the *e*_{i,j} are computed using the Dirac model for the average distance from the constraint plane) is illustrated in Figure 5. It can be seen from Figure 5(a) that for the strategy that resamples infeasible offspring candidate solutions, that value decreases to zero as the constraint angle becomes increasingly acute. As a result, for sufficiently acute constraint angles that strategy will converge to a nonstationary limit point, no matter what the value of *c* is. The strategy that repairs infeasible candidate solutions avoids this type of behaviour for any θ > 0, as the maximum value of *c* for which convergence is avoided tends toward a finite limit value. That limit value is shown in Figure 5(b) for several values of the population size parameters μ and λ. As *c* ≤ 1.0 must hold, values exceeding 1.0 are not shown. It can be seen that for a fixed truncation ratio, larger values of λ allow larger values of *c* and thus faster adaptation. Only for truncation ratios exceeding one-half do admissible maximum values of *c* become very small.

An intuitive explanation for the failure of cumulative step size adaptation for small constraint angles in the strategy that resamples infeasible offspring candidate solutions can be derived from Figure 4. In the limit of very small constraint angles, both *e*_{1,1} and *e*_{1,2} assume very small values for both strategies, as both stay in the immediate vicinity of the constraint plane (compare Figures 2(b), 4(a), and 4(b)). In order to have positive expected logarithmic adaptation response, the terms in Equation (6) that involve the moments *e*_{2,1} and *e*_{2,2} of the step in the direction of the *x*_{2} axis must be large enough to compensate for the lack of a contribution to the expected squared length of the search path from the components in the direction of the *x*_{1} axis. The strategy that resamples infeasible offspring in the limit of very small constraint angles performs an almost random walk in the direction of the *x*_{2} axis, resulting in *e*_{2,1} ≈ 0 and *e*_{2,2} ≈ 1/μ (compare Figures 4(e) and 4(f)), which is insufficient to ensure positive expected logarithmic adaptation response unless *c* is small enough to sufficiently amplify the term involving *e*^{2}_{1,1} in Equation (6).

The strategy that repairs infeasible offspring does not perform a random walk in the direction of the *x*_{2} axis. In the limit of very small θ, on average half of the offspring are infeasible and are projected onto the constraint plane. On average, half of those that are projected have a negative *z*_{2} component and are likely to prevail under selection as their *z*_{1} components are positive. Unless the population size is very small or the truncation ratio is large, the majority of mutation vectors being averaged through recombination will thus point in the direction of the negative *x*_{2} axis and lead to relatively large values of *e*^{2}_{2,1} and *e*_{2,2} that ensure positive expected logarithmic adaptation response (compare Figures 4(e) and 4(f)). It is important to note that this is not a consequence of a bias built into the step size adaptation mechanism as a result of changing the length of individual steps through projection. It is easy to see that projection of mutation vectors never increases their length, and that thus any bias resulting from projection will be toward smaller step sizes, not toward larger ones. Positive expected logarithmic adaptation response is a result of the coherence of the selected steps in the direction of the negative *x*_{2} axis, not of the length of the mutation vectors.
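The observation that projection never increases the length of a mutation vector can be verified directly: projection onto a half-space is nonexpansive, and the feasible parent is a fixed point of that projection, so the repaired mutation vector can never be longer than the original one. A numerical check under our conventions:

```python
import math
import random

def project_onto_plane(y, n):
    """Orthogonal projection of y onto the hyperplane {x : n.x = 0}."""
    d = sum(ni * yi for ni, yi in zip(n, y))
    return [yi - d * ni for yi, ni in zip(y, n)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

rng = random.Random(0)
theta = 0.1                       # a small constraint angle
n = [math.cos(theta), math.sin(theta), 0.0]
x = [-0.5 * ni for ni in n]       # feasible parent: n.x = -0.5
never_longer, projected = True, 0
for _ in range(1000):
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    y = [xi + zi for xi, zi in zip(x, z)]   # offspring with sigma = 1
    if sum(ni * yi for ni, yi in zip(n, y)) > 0.0:  # infeasible
        projected += 1
        z_rep = [yi - xi
                 for yi, xi in zip(project_onto_plane(y, n), x)]
        if norm(z_rep) > norm(z) + 1e-12:
            never_longer = False
```

Every repaired mutation vector is at most as long as the one it replaces, consistent with the bias argument above.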

Finally, the accuracy of the predictions made on the basis of the Dirac model remains to be verified. The lines in Figure 6 were obtained from Equation (7) for several population size parameter settings. For some of those settings with constraint handling by repair, the values computed consistently exceed 1.0, resulting in no curves in the respective plots. Figure 7 shows the corresponding limit values obtained from the same equation. Each dot in either figure marks the results from 10 independent runs of the respective evolution strategies with cumulative step size adaptation. Each run was initialised with **s**^{(0)} = **0** and uses *N* = 2 and *D* = 1.0. We ran further experiments with other settings for the search space dimensionality and damping parameter and observed virtually identical results. A run was terminated when the mutation strength reached a value either smaller than 10^{-20} or greater than 10^{20}. The very small step sizes in the former case suggest that stagnation is likely to ensue, while the large step sizes in the latter case indicate continued progress at increasingly faster rates. The + symbol in the plots indicates that at least nine of the 10 runs terminated with a diverging mutation strength; the × symbol indicates that the mutation strength had collapsed upon termination in at least nine of the 10 runs. If neither symbol is present at a grid location in the plots, then each termination criterion was satisfied in at least two of the 10 runs. It can be seen that the agreement of predictions based on Equation (7) with experimentally observed values is very good across the entire range of population size parameters and constraint angles considered.

## 5 Summary and Conclusions

We studied the behaviour of the (μ/μ, λ)-ES with cumulative step size adaptation applied to a linear problem with a single linear inequality constraint. Two constraint handling techniques were considered: the resampling of infeasible candidate solutions, and their repair through projection onto the constraint plane. We have argued that while it is simple, the scenario of a linear function with a single linear constraint is an interesting one to consider as it reflects microscopic properties of more general constrained, differentiable problems.

Using a simple model, we derived an approximation to the average distance of the population centroid from the constraint plane when the strategy is run with fixed mutation strength. We then used that approximation in the context of adaptive step sizes and derived an expression for the expected logarithmic adaptation response of the strategy operating out of a stationary state. Experiments showed that those results allow predicting whether the strategy is capable of sustained progress or converges to a nonstationary point across a wide range of population size parameter settings and constraint angles. Most interestingly, the findings revealed a qualitative difference between the behaviours resulting from the two constraint handling mechanisms for highly acute constraint angles. While the strategy that resamples infeasible candidate solutions will converge to a nonstationary point for any setting of the cumulation parameter *c*, the strategy that employs a simple repair mechanism is capable of sustained progress as long as that parameter is set to not too large a value as governed by Equation (7). The difference in the behaviour of the two strategy variants has been explained by considering the components of the search path in the two-dimensional subspace spanned by the gradient of the objective function and the normal vector of the constraint plane. For small constraint angles, both constraint handling mechanisms result in very small steps being made in the direction of the gradient vector. However, while for the strategy that resamples infeasible offspring the steps in the perpendicular direction become increasingly random as θ decreases, they tend to point in the same direction for any θ > 0 for the strategy that applies a repair mechanism, resulting in longer search paths and consequently in a systematic increase of the step size.

Clearly, these results have strong implications for the choice of constraint handling approach to be used in combination with cumulative step size adaptation. They also provide guidance to the algorithm engineer developing new constraint handling techniques, suggesting that if step size adaptation is to be based on the (squared) length of the search path, the impact of constraint handling on that path needs to be considered. It is desirable to extend the analyses presented here to investigate properties of further constraint handling techniques as well as of other step size adaptation mechanisms, such as mutative self-adaptation. Future work will also consider nonlinear objective functions where the local constraint angle is not constant, nonlinear constraints, and problems with multiple constraints.

## Acknowledgments

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

## References

## Appendix A Single Step Behaviour: Mutation

This appendix computes the marginal density of the *z*_{1} components as well as the first two moments about zero of the *z*_{2} components of mutation vectors for both constraint handling techniques. Both will be used in Appendix B in order to compute the corresponding moments after selection and averaging. Superscripts denoting time or indexing offspring candidate solutions are omitted as only individual offspring are considered.

### A.1 Resampling

The joint density of the *z*_{1} and *z*_{2} components of mutation vectors for the strategy that resamples infeasible offspring is the truncated normal density given in Equation (8), where Φ denotes the cumulative distribution function of the standard normal distribution. The normalising factor in the denominator is the probability of an offspring being feasible. The marginal density of the *z*_{1} component of feasible mutation vectors is thus given by Equation (9). The corresponding cumulative distribution function cannot be expressed in closed form.
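The normalising factor admits a quick sanity check by simulation: with unit normal **n** and the parental centroid at normalised distance δ from the plane, the component of **z** along **n** is standard normal, so the probability of an offspring being feasible is Φ(δ). The symbols and the convention δ ≥ 0 here are ours.

```python
import math
import random

def feasibility_probability(delta, trials, seed=0):
    """Monte Carlo estimate of the probability that an offspring is
    feasible when the centroid is at normalised distance delta from
    the constraint plane; only the component of z along the unit
    normal matters, and it is standard normally distributed."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.gauss(0.0, 1.0) <= delta)
    return hits / trials

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

delta = 0.7
est = feasibility_probability(delta, 200000, seed=3)
```

The empirical feasibility rate matches Φ(δ), the normaliser appearing in the truncated density.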

### A.2 Repair

For the strategy that repairs infeasible offspring, the probability of a repaired offspring having *z*_{1} component less than some *x* is obtained by integration of the initial density, where the indicator 1_{cond} equals one if cond holds true and zero otherwise. Computing the derivative with respect to *x* yields, after some straightforward simplifications, the density of the *z*_{1} components of initially infeasible offspring after projection. Together with the contribution from the initially feasible offspring, the marginal density of *z*_{1} components is thus given by Equation (12). Integrating with respect to *x* yields the corresponding cumulative distribution function in Equation (13).

It remains to compute the first two moments about zero of the *z*_{2} components of mutation vectors after repair conditional on *z*_{1} = *x*. The probability of an offspring being infeasible, and thus in need of repair, conditional on *z*_{1} = *x* is obtained by computing the relative weight of the second term in the marginal density *p*^{repair}_{1}(*x*) in Equation (12). Initially feasible candidate solutions are not projected and have moments described by Equations (10) and (11). Initially infeasible candidate solutions after projection have a *z*_{2} component that is uniquely determined by the condition that the projected point lie on the constraint plane, so their conditional moments follow directly. Weighting the terms with the probabilities of their occurrence yields, after some straightforward simplifications, the first two moments about zero of the *z*_{2} component of mutation vectors after repair conditional on *z*_{1} = *x*.

## Appendix B Single Step Behaviour: Selection and Recombination

This appendix computes the first and second moments about zero of the components *z*^{(avg)}_{i} for *i* ∈ {1, 2}, for both the case of resampling and that of repair. Due to the choice of coordinate system, *z*^{(k;λ)}_{1} is the (λ + 1 − *k*)th order statistic of a sample of λ independent realisations of a random variable distributed according to Equation (9) for the case of resampling and according to Equation (13) for the case of repair. The probability density function of the *z*_{1} component of the *k*th best of the λ offspring thus takes the usual order statistic form, where *P*_{1} denotes the cumulative distribution function corresponding to the density *p*_{1}, and superscripts indicating the type of constraint handling have been omitted as the formula holds in both cases. The joint density of the *z*_{1} components of the *k*th and *l*th best of the λ offspring, where *k* < *l*, follows analogously.
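The order-statistic density takes the standard form λ!/((*k* − 1)!(λ − *k*)!) *p*(x) (1 − *P*(x))^{k−1} *P*(x)^{λ−k} for the *k*th best (largest) of λ draws. The sketch below cross-checks that form against direct simulation for standard normally distributed *z*_{1} components (i.e., ignoring the constraint), an assumption made purely for the purpose of the check.

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kth_best_density(x, k, lam):
    """Density of the k-th largest of lam iid standard normal draws."""
    binom = (math.factorial(lam)
             // (math.factorial(k - 1) * math.factorial(lam - k)))
    return binom * phi(x) * (1.0 - Phi(x)) ** (k - 1) * Phi(x) ** (lam - k)

def kth_best_mean_numeric(k, lam, lo=-8.0, hi=8.0, steps=16000):
    """Mean of the k-th best via midpoint-rule integration."""
    h = (hi - lo) / steps
    return h * sum((lo + (i + 0.5) * h)
                   * kth_best_density(lo + (i + 0.5) * h, k, lam)
                   for i in range(steps))

def kth_best_mean_mc(k, lam, trials, seed=0):
    """Mean of the k-th best estimated by direct simulation."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        sample = sorted((rng.gauss(0.0, 1.0) for _ in range(lam)),
                        reverse=True)
        acc += sample[k - 1]
    return acc / trials
```

For the constrained variants, `phi` and `Phi` would be replaced with the densities and distribution functions of Equations (9) or (13); the combinatorial structure of the order-statistic density is unchanged.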

### B.1 Computing First Moments

The first moment about zero of *z*^{(avg)}_{1} can be written as a sum over the densities of the order statistics. An identity from Arnold (2002, p. 113), in which formally *Q*_{0} = 0 and which holds for any integers *k* and λ with 1 ≤ *k* ≤ λ and real numbers *Q*_{1}, …, *Q*_{λ}, allows replacing the summation with an integration. A change of variables defined by *z* = *P*_{1}(*y*) and a subsequent exchange of the order of the integrations results in an expression involving the inner integral *I*_{1}(*y*). Computing *I*_{1}(*y*) using Equation (8) for the case of resampling and Equation (12) for the case of repair is straightforward. The results of the calculations are given in Figure B1.

### B.2 Computing Second Moments

The second moment about zero of *z*^{(avg)}_{1} can be written as a double sum over the joint densities of pairs of order statistics. Using Equation (18) to replace the double summation with a double integration, then changing variables and exchanging the order of the integrations analogously to Appendix B.1, results in an expression involving the inner integral *K*_{1}(*y*). Considering *K*_{1}(*y*), exchanging the order of the integrations and subsequently renaming the variables allows adding the integral expressions in the first and third rows, and thus *K*_{1}(*y*) = (*I*_{1}(*y*))²/2. Similar calculations for the second moment about zero of *z*^{(avg)}_{2} yield an analogous expression where, as above, *K*_{2}(*y*) = (*I*_{2}(*y*))²/2. The solutions for the inner integrals *J*_{i}(*y*) for *i* ∈ {1, 2} are again collected in Figure B1.

## Notes

^{1}

See Beyer and Schwefel (2002) for an overview of evolution strategy terminology.

^{2}

Strictly speaking, the task is one of amelioration rather than maximisation, as a finite maximum does not exist. We do not make that distinction here.

^{3}

The update in Algorithm 1 differs from the original prescription in that Ostermeier et al. (1994) adapt the mutation strength based on the length of the search path rather than on its squared length. With appropriately chosen parameters, both variants behave similarly. Basing the update on the squared length simplifies the analysis.