We study the behaviour of multi-recombination evolution strategies for the problem of maximising a linear function with a single linear constraint. Two variants of the algorithm are considered: a strategy that resamples infeasible candidate solutions and one that applies a simple repair mechanism. Integral expressions that describe the strategies’ one-generation behaviour are derived and used in a simple zeroth order model for the steady state attained when operating with constant step size. Applied to the analysis of cumulative step size adaptation, the approach provides an intuitive explanation for the qualitative difference in the algorithm variants’ behaviour. The findings have implications for the design of constraint handling techniques to be used in connection with cumulative step size adaptation.
The number of constraint handling techniques used in evolutionary algorithms is large and growing. Approaches range from penalty methods and repair mechanisms to algorithms based on ideas from multi-objective optimisation, and they include those described by Oyman et al. (1999), Runarsson and Yao (2000, 2005), Mezura-Montes and Coello Coello (2005), and Kramer and Schwefel (2006). A survey of techniques was compiled by Coello Coello (2008). Knowledge with regard to the relative capabilities of the approaches is almost always gained by comparing their performances on sets of test functions that are believed to be difficult. The influence of the choice of those functions as well as of parameters implicit in the performance criteria is often not obvious. The results are not easy to interpret. If a mechanism fails, it often remains unclear why. Likewise, it is not always straightforward what “good” performance for an involved problem means. Test function generators such as the one proposed by Michalewicz et al. (2000) that automatically generate test functions with parameters controlling properties such as the connectedness of the feasible region and the degree of multimodality of the objective provide some extra insights with regard to scaling properties. However, the functions generated are too complex to afford an understanding of algorithm behaviour on a microscopic level.
The approach pursued here is complementary to benchmarking studies on large testbeds. We study the behaviour of algorithms on a very simple class of test functions for which analytically based findings can be derived. The results offer the advantage of being easy to interpret and reveal scaling properties and the influence of parameters on optimisation performance. It has been known at least since the early work of Schwefel (1975) that the presence of constraints can cause step size adaptation mechanisms to fail even for the simplest of problems. Understanding the interplay of step size adaptation mechanisms and constraint handling techniques on a microscopic level will benefit the developer of evolutionary algorithms for constrained optimisation.
A sizable number of studies pursue a similar approach in the realm of unconstrained evolutionary optimisation. Much less such work has been published in constrained settings. Among the sparse references is the early work by Rechenberg (1973), who studies the performance of the (1+1)-ES1 for the axis-aligned corridor model, and upon which the one-fifth success rule for step size adaptation is partly based. Schwefel (1975) considers the performance of the -ES in the same environment. Beyer (1989) analyses the behaviour of the (1+1)-ES for a constrained discus-like function. All of those analyses have in common that the normal vectors of the constraint planes are oriented such that they are perpendicular to the gradient vector of the objective function.
More recent work considers constrained problems that do not have that specific property. Arnold and Brauer (2008) study the behaviour of the (1+1)-ES for a linear problem with a single linear constraint of general orientation. They describe the distance of the parental candidate solution from the constraint plane and investigate its limit behaviour using two simple statistical models. Schwefel (1975) points out that using the one-fifth rule, the presence of constraints may result in the step size being reduced in situations where the angle between the gradient direction and the normal vector of the active constraint is small, leading to convergence to a nonsingular point. The findings derived by Arnold and Brauer (2008) provide a quantitative confirmation of this. Arnold (2011b) and Arnold (2011a) study the behaviour of the -ES with cumulative step size adaptation in the same environment. The former reference considers the case that infeasible candidate solutions are resampled until they are feasible; the latter one uses a simple repair mechanism that orthogonally projects infeasible candidate solutions onto the boundary of the feasible region. It is found that cumulative step size adaptation may fail in scenarios similar to those where the success probability based mechanism fails, though for different reasons, and that employing the repair mechanism results in more robust performance in the sense that convergence to a nonsingular point is avoided, provided that the cumulation parameter used in the step size adaptation mechanism is set appropriately.
In contrast to the work in Arnold (2011a, 2011b), this paper considers the behaviour of evolution strategies employing nonsingleton populations and recombination. The results derived earlier appear as special cases. Being able to study the influence of selection pressure (quantified by the truncation ratio) on algorithm performance adds a new dimension to the analysis. The remainder of the paper is organised as follows. Section 2 briefly formalises the algorithms and problem and introduces notational conventions used throughout the paper. Section 3 investigates the behaviour of the strategy for fixed step size employing a simple zeroth-order model for the distribution of distances from the constraint plane. Section 4 considers cumulative step size adaptation. Section 5 concludes with a brief discussion and directions for future research. Calculations that underlie Sections 3 and 4 but that are not essential to the understanding of the analysis have been relegated to Appendices A and B in order to improve readability.
2 Problem and Algorithms
3 Behaviour for Fixed Mutation Strength
4 Mutation Strength Adaptation
We now abandon the assumption that the mutation strength of the strategy is fixed. For the constrained linear environment considered here, the -ES with cumulative step size adaptation does not assume a stationary limit state. The mutation strength is either increased or decreased indefinitely. As decreasing mutation strengths lead to stagnation and convergence to a nonsingular point, increasing is desirable. Figure 3 shows four runs of the (3/3, 10)-ES with cumulative step size adaptation for each of the two constraint handling approaches. In all cases , N=2, c=0.5, s(0)=0, and . While the mutation strength and objective function value of the population centroid increase exponentially quickly for the strategy that uses the repair mechanism, the strategy that employs the resampling approach decreases its mutation strength exponentially quickly and objective function values stagnate. It will be seen below that a lower setting for the cumulation parameter would enable the resampling strategy to avoid stagnation.
Equations for ei,j for are derived in Appendices A and B. Numerically evaluating Equations (19), (20), (21), and (22) using the Dirac model for the average distance of the population centroid from the constraint plane yields the curves in Figure 4. Figure 4(a) shows the same results as Figure 2(d), but with logarithmic scales for both axes. Comparison of the curves with the values obtained experimentally in runs of the respective evolution strategies suggests good visual agreement for the first moments e1,1 and e2,1. The quality of the approximation relying on the Dirac model is not as good for moments e1,2 when constraint angles are highly acute. However, the values in that scenario are generally small and have limited impact on the results below. The agreement of values for e2,2 derived based on the Dirac model with experimental results deteriorates with increasing constraint angle but appears visually good for small values of .
An intuitive explanation for the failure of cumulative step size adaptation for small constraint angles in the strategy that resamples infeasible offspring candidate solutions can be derived from Figure 4. In the limit of very small constraint angles, both e1,1 and e1,2 assume very small values for both strategies as both stay in the immediate vicinity of the constraint plane (compare Figures 2(b) and 4(a) and 4(b)). In order to have positive expected logarithmic adaptation response, the terms in Equation (6) that involve the moments e2,1 and e2,2 of the step in the direction of the x2 axis must be large enough to compensate for the lack of a contribution to the expected squared length of the search path from the components in the direction of the x1 axis. The strategy that resamples infeasible offspring in the limit of very small constraint angles performs an almost random walk in the direction of the x2 axis, resulting in and (compare Figures 4(e) and 4(f)), which is insufficient to ensure positive expected logarithmic adaptation response unless c is small enough to sufficiently amplify the term involving e21,1 in Equation (6).
The strategy that repairs infeasible offspring does not perform a random walk in the direction of the x2 axis. In the limit of very small , on average, half of the offspring are infeasible and will be projected onto the constraint plane. On average, half of those that are projected have a negative z2 component and are likely to prevail under selection as their z1 components are positive. Unless is very small or the truncation ratio is large, the majority of mutation vectors being averaged through recombination will thus point in the direction of the negative x2 axis and lead to relatively large values of e22,1 and e2,2 that ensure positive expected logarithmic adaptation response (compare Figures 4(e) and 4(f)). It is important to note that this is not a consequence of a bias built into the step size adaptation mechanism as a result of changing the length of individual steps through projection. It is easy to see that projection of mutation vectors never increases their length, and that thus any bias as a result of projection will be toward smaller step sizes, not toward larger ones. Positive expected logarithmic adaptation response is a result of the coherence of the selected steps in the direction of the negative x2 axis, not the length of the mutation vectors.
Finally, the accuracy of the predictions made on the basis of the Dirac model remain to be verified. The lines in Figure 6 were obtained from Equation (7) for several population size parameter settings. For and constraint handling by repair, the values computed consistently exceed 1.0, resulting in no curves in the respective plots. Figure 7 shows limit values for obtained from the same equation. Each dot in either figure marks the results from 10 independent runs of the respective evolution strategies with cumulative step size adaptation. Each run was initialised with , , and s(0)=0 and uses N=2 and D=1.0. We ran further experiments with other settings for the search space dimensionality and damping parameter and observed virtually identical results. A run was terminated when the mutation strength reached a value either smaller than 10-20 or greater than 1020. The very small step sizes in the former case suggest that stagnation is likely to ensue while the large step sizes in the latter case indicate continued progress at increasingly faster rates. The + symbol in the plots indicates that at least nine out of the 10 runs terminated with ; the × symbol indicates that upon termination in at least nine out of the 10 runs. If neither symbol is present at a grid location in the plots, then either termination criterion was satisfied in at least two out of the 10 runs. It can be seen that the agreement of predictions based on Equation (7) with experimentally observed values is very good across the entire range of population size parameters and constraint angles considered.
5 Summary and Conclusions
We studied the behaviour of the -ES with cumulative step size adaptation applied to a linear problem with a single linear inequality constraint. Two constraint handling techniques were considered: the resampling of infeasible candidate solutions, and their repair through projection onto the constraint plane. We have argued that while it is simple, the scenario of a linear function with a single linear constraint is an interesting one to consider as it reflects microscopic properties of more general constrained, differentiable problems.
Using a simple model, we derived an approximation to the average distance of the population centroid from the constraint plane if the strategy is run with fixed mutation strength. We then used that approximation in the context of adaptive step sizes and derived an expression for the expected logarithmic adaptation response of the strategy if it operates out of a stationary state. Experiments showed that predictions of whether the strategy is capable of sustained progress or converges to a nonsingular point can be made using those results across a wide range of population size parameter settings and constraint angles. Most interestingly, the findings revealed a qualitative difference between the behaviours resulting from the two constraint handling mechanisms for highly acute constraint angles. While the strategy that resamples infeasible candidate solutions will converge to a nonsingular point for any setting of the cumulation parameter c, the strategy that employs a simple repair mechanism is capable of sustained progress as long as that parameter is set to not too large a value as governed by Equation (7). The difference in the behaviour of the two strategy variants has been explained by considering the components of the search path in the two-dimensional subspace spanned by the gradient of the objective function and the normal vector of the constraint plane. For small constraint angles, both constraint handling mechanisms result in very small steps in the direction of the gradient vector being made. However, while for the strategy that resamples infeasible offspring steps in the perpendicular direction become increasingly random as decreases, they tend to point in the same direction for any for the strategy that applies a repair mechanism, resulting in longer search paths and consequently in a systematic increase of the step size.
Clearly, these results have strong implications for the choice of constraint handling approach to be used in combination with cumulative step size adaptation. They also provide guidance to the algorithm engineer developing new constraint handling techniques, suggesting that if step size adaptation is to be based on the (squared) length of the search path, the impact of constraint handling on that path needs to be considered. It is desirable to extend the analyses presented here to investigate properties of further constraint handling techniques as well as of other step size adaptation mechanisms, such as mutative self-adaptation. Future work will also consider nonlinear objective functions where the local constraint angle is not constant, nonlinear constraints, and problems with multiple constraints.
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
Appendix A Single Step Behaviour: Mutation
This appendix computes the marginal density of the z1 components as well as the first two moments about zero of the z2 components of mutation vectors for both constraint handling techniques. Both will be used in Appendix B in order to compute the corresponding moments after selection and averaging. Superscripts denoting time or indexing offspring candidate solutions are omitted as only individual offspring are considered.
Appendix B Single Step Behaviour: Selection and Recombination
B.1 Computing First Moments
B.2 Computing Second Moments
See Beyer and Schwefel (2002) for an overview of evolution strategy terminology.
Strictly speaking, the task is one of amelioration rather than maximisation, as a finite maximum does not exist. We do not make that distinction here.
The update in Algorithm 1 differs from the original prescription in that Ostermeier et al. (1994) adapt the mutation strength based on the length of the search path rather than on its squared length. With appropriately chosen parameters, both variants behave similarly. Basing the update on the squared length simplifies the analysis.