## Abstract

This paper investigates constraint-handling techniques used in nonelitist single-parent evolution strategies for the problem of maximizing a linear function with a single linear constraint. Two repair mechanisms are considered, and the analytical results are compared to those of earlier repair approaches in the same fitness environment. The first algorithm variant applies reflection to initially infeasible candidate solutions, and the second repair method uses truncation to generate feasible solutions from infeasible ones. The distributions describing the strategies’ one-generation behavior are calculated and used in a zeroth-order model for the steady state attained when operating with fixed step size. Considering cumulative step size adaptation, the qualitative differences in the behavior of the algorithm variants can be explained. The approach extends the theoretical knowledge of constraint-handling methods in the field of evolutionary computation and has implications for the design of constraint-handling techniques in connection with cumulative step size adaptation.

## 1 Introduction

In the field of evolutionary algorithms, approaches to constrained optimization problems make use of different types of constraint-handling techniques. These include penalty methods, repair mechanisms, and methods based on principles of multiobjective optimization. In this context the work of Oyman et al. (1999), Runarsson and Yao (2000), Mezura-Montes and Coello Coello (2005), and Kramer and Schwefel (2006) should be referenced. An overview of constraint-handling techniques is provided in Mezura-Montes and Coello Coello (2011).

Different approaches are usually evaluated by direct comparison of the performances on test functions considered to be difficult. A disadvantage is that the effects of the test environment on the performance of the algorithm are often difficult to interpret. Also, the configuration of specific strategy parameters is not obvious. That is, complex test functions do not contribute insight into the behavior of the algorithm on a microscopic level. In contrast to benchmark studies on large testbeds, this work concentrates on the analysis of the behavior of algorithms on a very simple class of test functions. Simple test environments often provide a description of local regions within more complex environments. This way, analytical results can be obtained, allowing for a deeper understanding of the influence of strategy parameters on algorithm performance.

Early contributions to constrained optimization with evolution strategies (ES) include the work of Rechenberg (1973), who investigated the performance of the -ES for the axis-aligned corridor model. The same environment has been studied by Schwefel (1975) examining the behavior of the -ES, and by Beyer (1989) investigating the dynamics of the -ES for a constrained discus-like function.

Recent work (see Arnold [2011a, 2011b, 2013]) examines the behavior of nonelitist ES using cumulative step size adaptation (CSA) on a linear problem with a single linear constraint. Two constraint-handling techniques are compared. It is found that the resampling of infeasible candidate solutions results in premature convergence of the strategy in the face of small constraint angles, that is, small angles between the objective function’s gradient direction and the normal vector of the constraint plane. This is due to short search paths that result in a systematic reduction of the step size. Handling constraints by projecting infeasible solutions onto the boundary of the feasible region allows setting the parameters of CSA in such a way that premature convergence is avoided for any constraint angle. This is due to successful projected candidate solutions predominantly lying in one direction, resulting in long search paths.

Further repair mechanisms exist. Helwig et al. (2013), among other methods, suggest a reflection method as well as a hyperbolic approach, in conjunction with particle swarm optimization in order to deal with infeasible candidate solutions. These approaches serve as the basis of the repair methods considered here. The first one reflects initially infeasible candidate solutions into the feasible region. The other repair mechanism truncates the mutation vector of infeasible offspring in such a way that the offspring is then located on the boundary of the feasible region.

The goal of this paper is to analyze these constraint-handling techniques in the context of the linear optimization problem suggested by Arnold (2013) and to examine their performance relative to those repair mechanisms considered in that paper. Thereby the theoretical analysis of the behavior of CSA-ES is extended by the analysis of two additional repair mechanisms. This reveals further insight into the potential of constraint-handling techniques to maintain a sufficient population diversity in order to prevent premature convergence when approaching a constraint boundary.

The paper is organized as follows. In Section 2 the optimization problem, the algorithm, and the two suggested constraint-handling techniques are described. Section 3 investigates the behavior of the ES assuming that the mutation strength is kept constant. A simple zeroth-order model is established to characterize the average distance from the constraint plane. Section 4 considers cumulative step size adaptation within the strategy. The paper concludes with a discussion of the results and suggestions for future research in Section 5. For reasons of clarity and comprehensibility calculations that underlie Section 3 are presented in the appendix.

## 2 Problem Formulation and Algorithm

The coordinate system is chosen such that the *x*_{1}-axis matches the gradient direction. Additionally, the *x*_{2}-axis should lie in the two-dimensional subspace spanned by the gradient vector and the normal vector of the constraint plane. The angle between the gradient direction and the normal vector is referred to as the constraint angle. Constraint angles of interest lie strictly between 0 and 90 degrees. The unit normal vector of the constraint plane and the signed distance of a point from the constraint plane are expressed in terms of the considered coordinate system. An illustration of the problem definition can be found in Figure 1. The resulting optimization problem (P) maximizes the linear objective subject to the single linear constraint. Because of the choice of the coordinate system, the remaining variables have no effect on either the objective function or the constraint inequality. Despite its simplicity, the linear environment serves as a model for other constrained optimization problems in the immediate vicinity of the constraint plane (see Arnold [2013]).
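To make the setting concrete, the environment can be sketched in code. The following minimal Python sketch is an illustration only: the choice of orientation for the unit normal and the sign convention for feasibility (nonnegative signed distance) are assumptions, since the paper's exact definitions are not reproduced here.

```python
import numpy as np

def make_problem(theta, N=10):
    """Linear objective with a single linear constraint (sketch).

    Assumptions: f(x) = x_1, the unit normal of the constraint plane
    lies in the x_1-x_2 plane at constraint angle theta to the gradient,
    and a point is feasible iff its signed distance n.x is nonnegative.
    The orientation chosen here makes progress in the gradient
    direction decrease the distance to the constraint plane.
    """
    n = np.zeros(N)
    n[0], n[1] = -np.cos(theta), np.sin(theta)   # hypothetical orientation
    f = lambda x: float(x[0])                    # objective to be maximized
    dist = lambda x: float(x @ n)                # signed distance from the plane
    feasible = lambda x: dist(x) >= 0.0
    return f, dist, feasible
```

As in the text, the coordinates beyond *x*_{1} and *x*_{2} affect neither the objective nor the constraint.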

At this point, the general algorithm of the -CSA-ES dealing with problem (P) is introduced. It replaces infeasible offspring candidate solutions with feasible ones applying a specific repair mechanism. The two repair mechanisms investigated in this paper are explained in the corresponding subsections. Beginning with a feasible candidate solution , the -ES generates offspring by performing the following steps per iteration:

1. For *k* = 1, …, λ:

   a. Generate an offspring candidate solution by adding a normally distributed mutation vector, scaled by the mutation strength, to the parental candidate solution.

   b. If the offspring is infeasible, replace it with a feasible candidate solution generated by the repair mechanism under consideration.

2. Evaluate the offspring candidate solutions, and let denote the (possibly repaired) mutation vector of the best offspring, that is, of the offspring with the largest objective function value.

3. Replace the parental candidate solution with the best offspring candidate solution according to .

4. Modify the mutation strength using cumulative step size adaptation.
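The steps above can be sketched as a single iteration in Python. The isotropic standard normal mutation distribution and the feasibility test are assumptions consistent with the setting described in this section; the `repair` argument stands in for whichever of the two mechanisms is used.

```python
import numpy as np

def one_generation(parent, sigma, lam, f, feasible, repair, rng):
    """One iteration of the nonelitist single-parent ES (sketch).

    repair(parent, y) must return a feasible point; f is maximized.
    Returns the new parent and the (possibly repaired) mutation
    vector of the best offspring.
    """
    best_z, best_f = None, -np.inf
    for _ in range(lam):
        z = rng.standard_normal(parent.size)     # step 1a: mutate
        y = parent + sigma * z
        if not feasible(y):                      # step 1b: repair if infeasible
            y = repair(parent, y)
            z = (y - parent) / sigma             # repaired mutation vector
        if f(y) > best_f:                        # step 2: keep the best offspring
            best_f, best_z = f(y), z
    return parent + sigma * best_z, best_z       # step 3: replace the parent
```

Step 4, the mutation strength update, is discussed next.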

The mutation strength is updated from time step *t* to *t* + 1 conforming to a rule with a damping parameter that scales the magnitude of the updates. Notice that update rule (3) differs from the original mutation strength update in Hansen and Ostermeier (2001) in that it adapts based on the squared length of the search path rather than on the search path's length. With appropriately chosen parameters, both variants often show similar behavior. In accordance with Arnold (2013), this work uses the mutation strength update (3) because it simplifies the analysis of CSA.

The decision whether the mutation strength is decreased or increased depends on the sign of the numerator in Equation (3). The basic idea is that long search paths indicate that selected mutation vectors predominantly point in one direction and could be replaced with fewer but longer steps. A short search path suggests that the strategy is moving back and forth and would thus benefit from smaller step sizes. If the unconstrained setting with randomly selected offspring candidate solutions is considered (i.e., the ordering in step 2 of the algorithm is random), the search path has an expected squared length of *N*. In this case the mutation strength performs a random walk on a log scale; that is, in expectation the logarithm of the mutation strength does not change at all. In constrained settings, however, selection is not random, and CSA may fail under some conditions.
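As a sketch, an update based on the squared path length can be written as follows. The accumulation coefficients and the exact form of the exponent are assumptions modeled on the variant discussed above (cumulation parameter `c`, damping parameter `D`, dimension `N`), not a verbatim reproduction of update rule (3).

```python
import numpy as np

def csa_step(s, z_sel, sigma, c, D, N):
    """Cumulative step size adaptation sketch.

    The search path s accumulates selected mutation vectors; sigma
    is increased when the squared path length exceeds its value N
    under random selection, and decreased when it falls short of N.
    """
    s = (1.0 - c) * s + np.sqrt(c * (2.0 - c)) * z_sel   # assumed path update
    sigma = sigma * np.exp((s @ s - N) / (2.0 * D * N))  # assumed response form
    return s, sigma
```

Under purely random selection the expected squared path length equals *N*, so the exponent is zero in expectation and the logarithm of the mutation strength performs an unbiased random walk, matching the discussion above.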

### 2.1 Repair by Reflection

In this section the repair mechanism denoted as reflection is considered. It enables the -ES to deal with a single linear constraint (see Figure 1). Initially infeasible offspring candidate solutions are mirrored into the feasible region of the search space. The repair step in the algorithm reads
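Under an assumed sign convention (unit normal `n`, feasible iff `n·y >= 0`; the paper's exact formulation is not reproduced here), mirroring an infeasible point across the constraint plane can be sketched as:

```python
import numpy as np

def reflect(n, y):
    """Mirror an infeasible point into the feasible region (sketch).

    Assumes n is the unit normal of the constraint plane and that y
    is feasible iff n.y >= 0. Reflection negates the normal component
    of y relative to the plane and leaves tangential components
    unchanged.
    """
    d = float(n @ y)            # signed distance; negative if infeasible
    if d < 0.0:
        y = y - 2.0 * d * n     # mirror image across the plane
    return y
```

The reflected point lies at distance −*d* ≥ 0 from the plane, so repaired offspring are always feasible.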

### 2.2 Repair by Truncation

We now consider the second repair mechanism, which truncates the mutation vector of an initially infeasible offspring candidate solution at the edge of the feasible region of the search space. In the following we refer to this method as truncation. Like repair by reflection, truncation enables the -ES to deal with the previously introduced single linear constraint. The procedure is illustrated in Figure 2. Accordingly, the repair step 1b in the algorithm can be formulated as
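With the same assumed conventions as before (feasible parent, unit normal `n`, feasibility `n·x >= 0`), truncation shortens the step to the point where it crosses the constraint plane:

```python
import numpy as np

def truncate(n, parent, y):
    """Truncate the mutation vector at the constraint boundary (sketch).

    Assumes the parent is feasible (n.parent >= 0) and feasibility
    means n.x >= 0. The step from parent to y is scaled by the factor
    t in [0, 1) at which the segment crosses the plane, so the
    repaired offspring lies exactly on the boundary.
    """
    dp, dy = float(n @ parent), float(n @ y)
    if dy >= 0.0:
        return y                       # offspring already feasible
    t = dp / (dp - dy)                 # solves n.(parent + t*(y - parent)) = 0
    return parent + t * (y - parent)
```

In contrast to reflection, the repaired offspring always sits on the boundary of the feasible region, which is why truncated mutation vectors tend to be short.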

## 3 Behavior for Fixed Mutation Strength

Throughout this section the mutation strength is assumed to be held constant for all time steps *t*.

In the following a characterization of the distribution of offspring candidate solutions conditional on is provided, and the expected step made in a single iteration of the algorithm is calculated considering both repair mechanisms. Finally, the average normalized distance from the constraint plane realized by the respective repair methods when operating with constant mutation strength is investigated.
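The steady-state distance can be probed numerically. The Monte Carlo sketch below runs the strategy with reflection repair at fixed mutation strength and averages the parent's distance from the constraint plane; the problem orientation, sign conventions, and all parameter values are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def mean_distance(theta, lam=10, sigma=1.0, N=10, steps=4000, seed=0):
    """Average steady-state distance from the constraint plane
    (sketch: fixed mutation strength, reflection repair)."""
    rng = np.random.default_rng(seed)
    n = np.zeros(N)
    n[0], n[1] = -np.cos(theta), np.sin(theta)     # assumed orientation
    parent, dists = np.zeros(N), []
    for _ in range(steps):
        best_z, best_f = None, -np.inf
        for _ in range(lam):
            z = rng.standard_normal(N)
            y = parent + sigma * z
            d = float(n @ y)
            if d < 0.0:
                y = y - 2.0 * d * n                # reflection repair
            if y[0] > best_f:                      # objective f(x) = x_1
                best_f, best_z = y[0], (y - parent) / sigma
        parent = parent + sigma * best_z
        dists.append(float(n @ parent))
    return float(np.mean(dists[steps // 2:]))      # discard the transient
```

Since repaired offspring are feasible, the parent never leaves the feasible region, and the averaged distance is nonnegative.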

### 3.1 Single Step Behavior: Mutation

The distributions of the *z*_{1} components and of the *z*_{2} components conditional on *z*_{1} after repair are derived in the appendix for both repair mechanisms. Taking into account Equation (34), in the case of reflection the marginal density of the *z*_{1} components after reflection can be stated. Notice that in this context, denotes the cumulative distribution function of the standard normal distribution and the abbreviation is used. According to Equation (36), the corresponding cumulative distribution function follows.

### 3.2 Single Step Behavior: Selection

The first two moments of the *z*_{2} components conditional on *z*_{1} are presented in the appendix for both constraint-handling methods. In the case of reflection they can be found in Equations (40) and (41). For repair by truncation the respective expressions are derived in Equations (60) and (61).

### 3.3 Steady State Behavior

## 4 Mutation Strength Control

In this section we drop the assumption that the mutation strength of the strategy is fixed. That is, the strategy's step size is adapted using cumulative step size adaptation as described in Section 2. This affects the long-term behavior of the -ES in the constrained linear environment in such a way that the strategy either increases or decreases the mutation strength on average. With respect to optimization problem (P), increasing the mutation strength is required, because decreasing it would lead to premature convergence.

It is assumed that the strategy operates in a stationary state at time step *t*. Then the expected logarithmic adaptation response is computed. The derivation of the expected value was carried out in Arnold (2013), resulting in Equation (24). This equation depends on the first two moments of the first and second components of the mutation vector that provides the best offspring candidate solution in the respective generation. These moments are specified in Equations (17)–(20) for both repair mechanisms considered. Notice that Equation (24) depends on the search space dimension *N* only through the choice of the cumulation parameter *c*, which is usually specified as an *N*-dependent value.

The theoretical results for the moments can be validated by comparing them to measurements from experimental runs of the evolution strategies. Using the Dirac model to compute the average distance of the parental candidate solution from the constraint plane by solving (22), the moments can be evaluated numerically. For reflection and truncation the resulting curves are plotted against the constraint angle in Figure 4. Comparison of the curves with the corresponding values obtained experimentally in runs of the respective ES with fixed mutation strength shows good visual agreement for the first moments. The approximation quality of the Dirac model deteriorates for the second moments. For the first component, the approximation exhibits small deviations for highly acute constraint angles; however, the values in that scenario are generally small and have limited impact on the results. For the second component, deviations occur with increasing constraint angle, but the approximation remains visually good for small constraint angles.
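The moments discussed here can also be estimated directly by Monte Carlo simulation, without the Dirac model. The sketch below works in a two-dimensional normalized setting (unit mutation strength, parent at normalized distance `delta`) for reflection repair; the reduction to two dimensions, the orientation of the normal, and selection by the largest *z*_{1} value are assumptions consistent with the setup described earlier.

```python
import numpy as np

def selected_moments(delta, theta, lam, trials=100000, seed=0):
    """Monte Carlo estimate of the first two moments of the selected
    mutation vector's z_1 and z_2 components under reflection repair,
    conditional on the parent's normalized distance delta (sketch)."""
    rng = np.random.default_rng(seed)
    n = np.array([-np.cos(theta), np.sin(theta)])  # assumed orientation
    z = rng.standard_normal((trials, lam, 2))
    d = delta + z @ n                              # offspring distances
    bad = d < 0.0                                  # infeasible offspring
    z = np.where(bad[..., None], z - 2.0 * d[..., None] * n, z)
    best = np.argmax(z[..., 0], axis=1)            # select the largest z_1
    zb = z[np.arange(trials), best]
    return (zb[:, 0].mean(), (zb[:, 0] ** 2).mean(),
            zb[:, 1].mean(), (zb[:, 1] ** 2).mean())
```

Such estimates play the role of the experimental measurements that the curves in Figure 4 are compared against.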

As the cumulation parameter *c* approaches 1, the second term decreases to zero. That is, depending on the first term, the expected logarithmic adaptation response may be negative, resulting in decreasing step sizes. In order to achieve a positive expected logarithmic response, the negative third term in Equation (24) has to be outweighed by the first two terms. This can be realized for *c* values that are close enough to zero. Requiring a positive expected logarithmic adaptation response yields condition (25) for the cumulation parameter. Condition (25) allows for the computation of the maximal *c* value for which stagnation, namely, convergence to a nonstationary point, can be avoided. The calculation is performed by considering equality in (25), where the moments are computed by making use of the Dirac model for the average distance from the constraint plane. The results for the strategies using reflection and truncation are illustrated in Figure 5 and compared to the corresponding results for projection and resampling obtained by Arnold (2011a, 2011b). It can be observed that for the strategy that reflects initially infeasible candidate solutions, the maximal cumulation parameter *c* capable of avoiding premature convergence decreases to zero as the constraint angle becomes increasingly acute. For the strategies applying truncation and resampling, the same behavior can be deduced from Figure 5. That is, provided that the constraint angle is small enough, all three strategies will converge to a nonstationary limit point independent of the choice of the cumulation parameter *c*. This behavior had already been established for the strategy that resamples infeasible offspring candidate solutions. The approach of the maximal cumulation parameter toward a finite limit value for decreasing constraint angles, as observed by Arnold (2013), is realized only by the strategy that repairs infeasible candidate solutions by projection. Thus, only projection is able to ensure continuous progress for arbitrarily small constraint angles. Note that, since the cumulation parameter satisfies *c* ≤ 1, values exceeding 1 are not shown in Figure 5.

The failure of cumulative step size adaptation in the limit of small constraint angles (see Figure 4) can be explained as follows. The moments of the parental mutation vector's first component describe the behavior in *x*_{1} direction. For small constraint angles both moments assume very small values near zero, as both strategies operate in close vicinity to the constraint plane. In order to compensate for the small contribution of these moments to the expected squared length of the search path on the *x*_{1}-axis, the moments of the second component have to be large enough to ensure a positive expected logarithmic adaptation response. In terms of the first two moments of the first and second components, repair by reflection resembles the strategy that resamples infeasible solutions. Arnold (2013) found that the behavior in *x*_{2} direction almost amounts to a random walk. The same observation can be made for the strategy that applies reflection (see Figure 4). This can be explained by considering the way reflection acts on initially infeasible offspring candidate solutions. In the limit of very small constraint angles, on average, half of the offspring are infeasible and will be reflected into the feasible region. On average, half of those that are reflected end up with a positive *z*_{2} component, and the other half with a negative *z*_{2} component. But offspring reflected to a negative *z*_{2} component are, for small constraint angles, only slightly more likely to be selected. That is, in contrast to the observations made for projection (see Arnold [2013]), the strategy using reflection does not exhibit a strong correlation between candidates being successful and a large negative value of their *z*_{2} components. As a consequence, reflection behaves similarly to resampling.

With the strategy that truncates the mutation vectors of initially infeasible candidate solutions at the boundary of the feasible area, the moments of the *z*_{2} components approach even smaller values and tend to zero for decreasing constraint angles. Thus they are not suitable to compensate for the low contribution of the first component's moments on the *x*_{1}-axis. To prevent convergence to a nonstationary limit point, the strategy using truncation requires even lower choices of the cumulation parameter *c*. In the limit of very small constraint angles, the probability of generating an initially infeasible offspring is again approximately 0.5. The area that provides positive *z*_{1} components after truncation shrinks considerably with decreasing constraint angle. That is, the mutation vectors of successful candidate solutions after truncation are on average very small. The same correlation that was observed in the case of projection can be inferred for truncation, namely, that successful offspring after truncation tend to be associated with negative *z*_{2} components. But this correlation is not able to compensate for the drawback of small mutation vectors. As a consequence, repair by truncation is not a well-suited constraint-handling method for the CSA-ES, especially for rather small constraint angles.

The accuracy of the predictions based on the Dirac model is verified in Figure 6. There, all curves are obtained from condition (25) for both reflection and truncation as well as for several offspring population sizes. Each mark in Figure 6 represents experimental results obtained from 10 independent runs of the respective evolution strategies with cumulative step size adaptation; the initialization, search space dimensionality, and damping parameter were fixed across runs. The runs were terminated after the mutation strength assumed a value either smaller than a small lower threshold or greater than a large upper threshold. Very small step sizes suggest that stagnation is likely to ensue, whereas large step sizes point to continuing progress at increasing rates. The + symbol in Figure 6 indicates that at least 9 of the 10 runs terminated at the lower threshold; the × symbol indicates that termination at the upper threshold took place in at least 9 of the 10 runs. Neither symbol is present at a grid location in the plots if each of the two termination events occurred in at least 2 of the 10 runs. Within the range of considered constraint angles and population size parameters, the agreement of the predictions based on condition (25) with the experimentally generated results is very good. Only for small population sizes does the strategy using truncation show deviations from the theoretical predictions. This points toward the Dirac model providing an insufficient approximation of the evolution strategy's behavior in the range of small constraint angles and small population sizes.
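The termination protocol described above can be sketched as follows. The threshold values, initialization, and CSA coefficients are illustrative assumptions (the path update and response form mirror the earlier sketch), so the function only illustrates how runs are classified as stagnating or progressing, not the paper's exact experimental setup.

```python
import numpy as np

def classify_run(theta, lam, c, D=2.0, N=10, sigma0=1.0,
                 lo=1e-6, hi=1e6, max_gen=20000, seed=0):
    """Run a CSA-ES with reflection repair until sigma leaves [lo, hi]
    (sketch). Returns 'stagnation', 'progress', or 'undecided'."""
    rng = np.random.default_rng(seed)
    n = np.zeros(N)
    n[0], n[1] = -np.cos(theta), np.sin(theta)     # assumed orientation
    parent, s, sigma = np.zeros(N), np.zeros(N), sigma0
    for _ in range(max_gen):
        best_z, best_f = None, -np.inf
        for _ in range(lam):
            z = rng.standard_normal(N)
            y = parent + sigma * z
            d = float(n @ y)
            if d < 0.0:
                y = y - 2.0 * d * n                # reflection repair
            if y[0] > best_f:                      # objective f(x) = x_1
                best_f, best_z = y[0], (y - parent) / sigma
        parent = parent + sigma * best_z
        s = (1.0 - c) * s + np.sqrt(c * (2.0 - c)) * best_z
        sigma *= np.exp((s @ s - N) / (2.0 * D * N))
        if sigma < lo:
            return 'stagnation'
        if sigma > hi:
            return 'progress'
    return 'undecided'
```

Repeating such runs over a grid of constraint angles and cumulation parameters yields classifications of the kind marked with + and × in Figure 6.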

## 5 Summary and Conclusions

This paper investigated constraint-handling techniques for the -ES using cumulative step size adaptation on a linear problem with a single linear inequality constraint. Two different repair mechanisms were considered: the reflection of infeasible points into the feasible region and the truncation of the mutation vectors at the boundary of the feasible area. As pointed out earlier, an interesting aspect of the simple scenario of a linear function with a single linear constraint is that it can be used to model microscopic properties of more general constrained problems.

The use of the simple Dirac model allowed for approximating the average distance of the parental candidate solution from the constraint plane, provided that the strategy runs with fixed mutation strength. With cumulative step size adaptation and the assumption that the evolution strategy operates in a stationary state, this approximation was then used to derive an expression for the expected logarithmic adaptation response. Hence, it was possible to calculate the maximal cumulation parameter up to which premature convergence can be avoided. The theoretical predictions were verified by experiments over a wide range of constraint angles and varying numbers of offspring. The validation revealed generally very good agreement.

While projection still allows for positive logarithmic adaptation response for arbitrarily small constraint angles, neither reflection nor truncation turned out to enable the strategy to make sustained progress by continually increasing the mutation strength for small . The reason for the better performance of projection is that it exhibits a strong correlation between an offspring being successful and a large negative value of its *z*_{2} component. This is not true for repair by reflection, since an initially infeasible offspring with negative *z*_{2} component is not necessarily successful after reflection. Thus reflection shows a behavior similar to resampling. For truncation the strong correlation between a successful offspring and a negative value of its *z*_{2} component can also be found. However, on average, truncation generates short mutation vectors, and consequently the correlation turns out to be insufficient to make up for the short length. There is also empirical evidence in Beyer and Finck (2012) that for more complex constraints, repair by projection provides better performance compared to truncation and resampling.

The results provide insight into the interaction of constraint-handling techniques with cumulative step size adaptation. Possible future work includes the extension of the analysis to the more general multirecombinant ES. Furthermore, the interplay of repair by reflection with other step size adaptation methods, such as mutative self-adaptation, remains to be investigated. A theoretical attempt to address problems with a nonlinear constraint was made by Arnold (2014), who investigated a linear optimization problem with a conically constrained feasible region, using resampling to deal with infeasible candidate solutions. A future analysis of this type of problem using repair by projection seems reasonable. Likewise, investigations of nonlinear objective functions and further nonlinear constraints are worth considering.

## Acknowledgments

This work was supported by the Austrian Science Fund (FWF) under grant P22649-N23.

## References


### Appendix Computation of the Distributions

This appendix computes the distribution of the mutation vectors’ first components *z*_{1} after the repair step for both constraint-handling techniques, that is, for repair by reflection as well as the repair method using truncation. Subsequently, the distribution of the mutation vectors’ *z*_{2} components conditional on *z*_{1} is computed for the respective strategies.

#### A.1 Reflection

In the first step the probability distribution function of the *z*_{1} components of the mutation vector after reflection is calculated. Then we derive the first two moments about zero of its *z*_{2} components conditional on *z*_{1}.

##### A.1.1 Distribution of the *z*_{1} Components after Reflection

The marginal density of the *z*_{1} components after reflection is the sum of the density of the corresponding components of immediately feasible offspring candidate solutions and the density of those candidates that are initially infeasible and thus repaired by reflection into the feasible area. Reflection transforms the mutation vector of infeasible candidate solutions as stated in Equations (4) and (5). A feasible offspring candidate solution has to satisfy the constraint condition, into which the normalized distance of the parental candidate solution to the constraint plane and the constraint angle enter. The joint probability density of the initially feasible *z*_{1} and *z*_{2} components is a truncated normal density. Notice that the factor in its denominator is the probability of an offspring being feasible. Accordingly, the joint probability density of the initially infeasible *z*_{1} and *z*_{2} components before reflection is a truncated normal density as well. The strategy reflects infeasible candidate solutions into the feasible region of the search space. That is, a transformation of the random variables *x* and *y* according to the repair step (26) applied to Equation (29) yields the joint probability density of the initially infeasible *z*_{1} and *z*_{2} components after reflection. The Jacobian matrix of this transformation and its determinant are computed, and after some straightforward simplifications the resulting joint probability density of the initially infeasible components after reflection is obtained. Weighting the joint densities of the feasible and the reflected components with the probabilities of their occurrence yields the overall joint probability density of the *z*_{1} and *z*_{2} components after the repair step (26) for those *x* and *y* that satisfy the constraint condition, and zero otherwise.

The marginal density of the *z*_{1} components after reflection is then derived by integration over all feasible *z*_{2} values. Solving the integral, introducing an abbreviation, and integrating the marginal density with respect to *x* yields the corresponding cumulative distribution function of the *z*_{1} components after reflection.

##### A.1.2 Distribution of the *z*_{2} Components after Reflection

First the distribution of the *z*_{2} components after reflection conditional on *z*_{1} is considered. On the basis of this distribution the first two moments about zero can be computed. The probability of an offspring being infeasible, and thus being in need of repair, conditional on *z*_{1} is obtained by computing the relative weight of the second term in the marginal density in Equation (34). The contributions to the first two moments after reflection conditional on *z*_{1} that arise from initially feasible candidate solutions can be computed from the conditional probability density, in which the marginal density function of the *z*_{1} components of initially feasible candidate solutions according to Equation (27) appears. Taking into account Equation (38), the contributions to the first two moments follow. The next step is concerned with the calculation of the contributions caused by initially infeasible candidate solutions after reflection. To this end, the marginal density function of the *z*_{1} components of initially infeasible candidate solutions is derived from the joint density in Equation (31). From it, the conditional density of the *z*_{2} components follows, and the contributions of initially infeasible, and thus reflected, offspring to the first two moments of *z*_{2} can be computed. Weighting the terms with the probabilities of their occurrence and applying some simplifications, one obtains the first two moments about zero of the *z*_{2} components of mutation vectors after reflection conditional on *z*_{1}.

#### A.2 Truncation

Now for repair by truncation, the probability distribution function of the *z*_{1} components of the mutation vector and the first two moments about zero of its *z*_{2} components conditional on *z*_{1} are computed.

##### A.2.1 Distribution of the *z*_{1} Components after Truncation

The contribution of initially feasible offspring to the marginal density of the *z*_{1} components is independent of the repair method and thus equals the first addend in Equation (34). That is, only the contribution of truncated offspring to the marginal density of the *z*_{1} components has to be derived. The probability of an initially infeasible candidate solution having a *z*_{1} component less than some *x* can be expressed as a sum. Computing the derivative with respect to *x* and applying some straightforward simplifications yields the integral representation in (52). After solving the integral in (52) and introducing an abbreviation, the contribution of initially infeasible offspring after truncation is obtained. Finally, combining (49) and (54) yields the marginal density of the *z*_{1} components of mutation vectors after repair by truncation in Equation (55). Integration of Equation (55) with respect to *x* gives the cumulative distribution function of the *z*_{1} components.

##### A.2.2 Distribution of the *z*_{2} Components after Truncation

Finally, the distribution of the *z*_{2} components of mutation vectors after truncation conditional on *z*_{1} is regarded. The probability of an offspring being in need of repair conditional on *z*_{1} is the relative weight of the corresponding term in the marginal density (55). Initially feasible offspring are not truncated; the moments of their *z*_{2} components are described by Equations (40) and (41). Initially infeasible candidate solutions after truncation lie on the constraint boundary, so that their *z*_{2} component is determined by *z*_{1}, and the corresponding conditional moments follow immediately. Weighting both terms with the probabilities of their occurrence yields the moments of the *z*_{2} components of mutation vectors after truncation conditional on *z*_{1}. After some straightforward simplifications these moments are obtained in closed form.