Abstract
Active-set approaches are commonly used in algorithms for constrained numerical optimization. We propose that active-set techniques can beneficially be employed for evolutionary black-box optimization with explicit constraints and present an active-set evolution strategy. We experimentally evaluate its performance relative to those of several algorithms for constrained optimization and find that the active-set evolution strategy compares favourably for the problem set under consideration.
1 Introduction
Black-box optimization refers to problems where the objective is a black box in the sense that objective function values of points can be obtained, but no further information is available. Gradients can only be approximated through finite differencing. Obtaining an objective function value may require running a possibly expensive computer simulation or even conducting a physical experiment. The total number of objective function evaluations is commonly used as a measure for the computational cost of a black-box optimization process.
In constrained black-box optimization, the objective function is a black box in the sense described above. However, the constraint functions may or may not be black boxes. Le Digabel and Wild (2015) propose a taxonomy of constraints along the following four axes:
Quantifiable or nonquantifiable. A quantifiable constraint is one where constraint function values represent degrees of violation or feasibility. An example of a nonquantifiable constraint is one where the constraint function returns a Boolean value.
Relaxable or unrelaxable. Relaxable constraints allow obtaining meaningful objective function values for infeasible points; unrelaxable ones do not.
Simulation based or a priori. Simulation-based constraints are black boxes and their evaluation may involve substantial cost. A priori constraints are cheap to evaluate, and gradients can be obtained at an insubstantial cost either directly or through finite differencing.
Known or hidden. Hidden constraints are not specified in the problem formulation and may occur for example when a simulation crashes.
The axes are not mutually orthogonal as, for example, hidden constraints cannot be a priori. We refer to constraints that are a priori and quantifiable as explicit constraints. An example of an explicit constraint is a bound constraint, which can be evaluated easily and for which gradients are readily available.
Most work on constrained optimization with evolutionary algorithms (EAs) does not distinguish between different types of constraints, and many algorithms compute both objective and constraint function values for all candidate solutions generated in the course of a run. However, if the constraints are explicit, then it is conceivable that algorithms may benefit from evaluating some points using the (cheap) constraint functions, but not the (expensive) objective function. If the constraints are unrelaxable, then the use of the constraint functions for guiding the search toward the feasible region is unavoidable. Among the few instances of EAs that consider nonlinear constraints as explicit is work by Arnold (2016, 2017), Sakamoto and Akimoto (2019), and Spettel and Beyer (2020).
The primary goal of this article is to show that the use of active-set techniques in an EA can result in a powerful tool for black-box optimization problems where constraints are explicit and objective function evaluations are costly. Toward that goal, we present an active-set evolution strategy for explicitly constrained black-box optimization. The article extends our preliminary work (Arnold, 2016, 2017) in two directions. First, it introduces several modifications and algorithmic tweaks that result in improved performance on a wider range of test problems. A detailed discussion of those modifications can be found in Appendix A. Second, it presents an extensive experimental comparison with other algorithms for constrained black-box optimization.
2 Background
This section briefly introduces evolution strategy related terminology and discusses issues with step size adaptation resulting from the presence of constraints in Section 2.1. Benchmarks for the experimental evaluation and comparison of EAs for constrained optimization are discussed in Section 2.2. EAs for constrained optimization that are relevant in the context of this work are surveyed in Section 2.3.
2.1 Evolution Strategies, Step Size Adaptation, and Constraints
EAs are stochastic algorithms for black-box optimization. They iteratively apply variation and selection to populations of candidate solutions. Evolution strategies constitute a class of EAs where variation may involve recombination of more than a single candidate solution—implemented as some form of arithmetic averaging—and always involves mutation—most often implemented as adding a random vector drawn from a normal distribution with zero mean. Hansen et al. (2015) provide a comprehensive overview of evolution strategies. The simplest evolution strategy is referred to as (1 1)-ES and does not employ recombination. In each iteration it generates a single offspring candidate solution from a single parental candidate solution by means of mutation. It selects the better of the two with respect to objective function value as the parental candidate solution for the next iteration.
The covariance matrix of the mutation distribution is adapted throughout a run of the algorithm. Covariance matrix adaptation evolution strategies (CMA-ES) have been introduced by Hansen and Ostermeier (2001) and adapt the full matrix. For well-conditioned problems, the covariance matrix may be restricted to a scalar multiple of the identity matrix, leaving only a single parameter to be adapted. That parameter is referred to as the step size parameter and for the (1 1)-ES can be adapted by means of the 1/5th rule (Rechenberg, 1973): decrease the step size if fewer than one in five offspring are successful in the sense of being superior to their parent; increase it otherwise.
Considering Figure 1, an active-set approach would at some point add the constraint into the working set and henceforth treat it as an equality constraint. An active-set evolution strategy would then be constrained to searching on the constraint boundary, where it faces an unconstrained convex problem and is capable of linear convergence.
2.2 Evaluating Algorithms for Constrained Black-Box Optimization
In contrast to mathematical programming approaches that are developed with the aim of being able to guarantee convergence to stationary points under certain conditions, EAs are commonly evaluated and compared experimentally, using sets of test problems. Partly drawing from earlier collections, such as those assembled by Hock and Schittkowski (1981) and Floudas and Pardalos (1987), a set of eleven test problems for constrained evolutionary optimization was gathered by Michalewicz and Schoenauer (1996). That set was expanded to 24 test problems to form the basis of a competition held in connection with the Special Session on Constrained Real-Parameter Optimization at the 2006 IEEE Congress on Evolutionary Computation (Liang et al., 2006). Detailed information on these test problems that we refer to as the CEC 2006 test set can be found in Appendix B. More recent competitions held in connection with the IEEE Congress on Evolutionary Computation have used sets of constrained test problems with variable dimension (Mallipeddi and Suganthan, 2010; Wu et al., 2016). However, the numbers of constraints present in those problems are small, and the problems' characteristics differ significantly from those present in other commonly used test sets. The most recent competition employed test problems closer in character to the 2006 test set, but also included mixed-integer problems (Kumar et al., 2020).
A central factor in the experimental evaluation and comparison of algorithms is the choice of performance measure. Hansen et al. (2016) discuss the relative merits of fixed-target approaches versus fixed-budget approaches and argue for running times measured as the number of objective function evaluations required to reach target objective function values as “the only available measure with a generic, meaningful, and quantitative interpretation.” We adopt that measure in place of the fixed-budget approach used in the CEC benchmarks.
Mallipeddi and Suganthan (2010) state that “it has become impossible to demonstrate the superior performance of newly designed algorithms” with the CEC 2006 benchmark since it “has been solved satisfactorily by several methods.” However, the number of objective function evaluations required to solve the CEC 2006 test problems is large even for those algorithms that were the most successful in the competitions. We argue that optimal solutions can be obtained with much less computational effort if the constraints are a priori.
2.3 Algorithms for Constrained Black-Box Optimization
Several strategies exist for equipping EAs for unconstrained optimization with constraint handling techniques. The strategies often combine all constraint function values of a point in a single, scalar-constraint violation value. An order is defined on the product space of objective function and constraint violation values and is used for selection. An example is the constrained method due to Takahama and Sakai (2006, 2010b). The order defined in that method initially prioritizes the minimization of objective function values over the minimization of constraint violation, but then progressively shifts more emphasis toward the latter. Variants of constrained differential evolution have ranked first in both the CEC 2006 and CEC 2010 competitions. Takahama and Sakai (2006) successfully solve all but two of the problems from the CEC 2006 test set, but require numbers of objective function evaluations that range from the low thousands to the hundreds of thousands. The constrained method has been used in evolution strategies by Hellwig and Beyer (2018), whose MAg-ES adapts the full covariance matrix of its mutation distribution.
Some of the algorithms that employ the constrained method approximate the gradients of the constraint functions to repair some infeasible candidate solutions and thus implicitly assume that the constraints are a priori, while others do not. All of them depend upon the constraints to be relaxable. Crucially, the method requires the specification of a parameter that determines the duration of the phase during which selection for objective function values may take precedence over selection for constraint violation. Both Takahama and Sakai (2010a) and Hellwig and Beyer (2018) set the number of iterations after which the minimization of constraint violation is given absolute priority over the minimization of the objective function to 1,000. It is unclear how the parameter should be set if optimal solutions can be obtained with far fewer objective function evaluations. In the remainder of this section, we focus on work that does not assume large function evaluation budgets.
Martínez and Sobral (2013) contend that many derivative-free approaches for constrained optimization are not efficient on “thin” domains (e.g., those that include nonlinear equality constraints) and that other algorithms do not exploit constraints being a priori. They propose Skinny as an algorithm that in a first phase strives to decrease infeasibility without evaluating the objective function while in a second phase minimizes the objective function on a relaxed feasible set. The admissible constraint violation is tightened gradually. Using test problems from Hock and Schittkowski (1981) they find that Skinny requires fewer objective function evaluations than an augmented Lagrangian method and solves more problems than a derivative-free trust region approach.
Regis (2014) proposes COBRA, an algorithm for inequality constrained black-box optimization that uses radial basis function based surrogate models for both the objective and constraint functions. The approach does not require the constraint functions to be a priori and proceeds in two phases. In the first phase it aims to find a feasible point that is then improved upon in the second phase. The strategy has several parameters that need to be set. Bagheri et al. (2017) introduce SACOBRA, a self-adaptive variant of COBRA that uses an EA for the online optimization of those parameters, removing the need for problem specific tuning. Bagheri et al. (2016) enable SACOBRA to solve equality and inequality constrained problems by using a conjugate gradient method to minimize the sum of (surrogate model approximated) equality constraint violations in each iteration.
Regis (2018) proposes CONOPUS, a surrogate model assisted particle swarm algorithm for inequality constrained black-box optimization. As COBRA, the approach does not assume that the constraints are a priori and fits surrogate models to both the objective and constraint functions. In each iteration, those models are used to refine the best point located so far. The algorithm is demonstrated to outperform particle swarm variants that do not employ surrogate models. Two further surrogate model assisted approaches applicable in the face of simulation based inequality constraints are a differential evolution variant proposed by Wang et al. (2019) and an algorithm by Li and Zhang (2021). In both cases, the experimental evaluation on the inequality constrained subset of problems from the CEC 2006 test set employs function evaluation budgets much smaller than those used in connection with the CEC 2006 special session.
Sakamoto and Akimoto (2019) design ARCH as an algorithm specifically for black-box optimization problems with constraints that are quantifiable and a priori, but not required to be relaxable. Their algorithm involves a constraint handling technique similar to an approach for handling bound constraints in the CMA-ES introduced by Hansen et al. (2009). Infeasible points are not evaluated directly but instead associated with the objective function value of the closest corresponding point in the feasible region, subject to inequality constraints being violated to remain tight. This is akin to the projection of points in active-set evolution strategies, but differs in that in ARCH projection is performed only for the purpose of computing objective function values; infeasible points remain within the infeasible region. Selection employs rankings of the objective and constraint function values that are based on a linear combination with an adaptive coefficient. The resulting algorithm is shown to possess desirable invariance properties.
Both Spettel et al. (2019) and Spettel and Beyer (2020) propose interior-point evolution strategies that repair infeasible candidate solutions through projection onto the feasible region before evaluating them. The former algorithm is applicable only to linearly constrained problems, the latter only to equality constrained problems.
3 Active-Set Evolution Strategy
This section introduces a (1 1)-ES that uses an active-set approach for constraint handling and is referred to as (1 1)-AS-ES. It is the result of making several modifications to the strategies presented by Arnold (2016, 2017); differences between the algorithm variants are described in Appendix A. The algorithm employs a working set of indices of inequality constraints that are treated as equality constraints. We write for the subset of the feasible region where all of the constraints in the working set are tight and refer to it as the reduced search space. Offspring candidate solutions are projected onto the reduced search space, thus ensuring their feasibility. If an offspring candidate solution that improves on the parental candidate solution is selected to replace the latter, then any constraints that are tight at and that are not yet present in are added to the working set. As it is possible that constraints not in an optimal active set are added to the working set at some point during an optimization process, constraints in need to periodically be considered for removal. This is accomplished by temporarily releasing a constraint from the working set and generating an offspring candidate solution for which that constraint is not tight. If the offspring improves on the parental candidate solution, then the constraint is removed from the working set.
3.1 Algorithm
3.2 Details
Numerical issues arise both due to the limited precision of floating point number representations on digital computers and possibly due to objective or constraint functions for some inputs returning values that are not real numbers. We have implemented the (1 1)-AS-ES in MATLAB,1 which uses double-precision floating-point numbers. The presence of equality constraints necessitates the specification of a parameter that determines the maximally admissible constraint violation and that is referred to as the constraint tolerance. An equality constraint is thus interpreted as ; an inequality constraint as . We refer to a solution that is feasible to within the constraint tolerance as a viable solution.
Projection in Line 9 of the algorithm in Figure 2 requires the minimization of a squared Euclidean norm as described above and can be accomplished by sequential quadratic programming. Sequential quadratic programming may or may not result in a viable solution, potentially necessitating more than a single iteration of the loop in Lines 7 through 10. Not shown in Figure 2 in order not to impede readability, in some situations projection may repeatedly fail to generate a viable candidate solution that satisfies the conditions in Line 10. We abort an iteration of the algorithm after 400 unsuccessful tries. If considering a constraint for removal from the working set, then that constraint is simply removed with a probability of 20 percent and we set otherwise. As a result, the next iteration of the algorithm may see either another constraint or no constraint at all considered for removal from . The activation of this safeguard is observed only on a small subset of the test problems considered in Section 4.
The MATLAB Optimization Toolbox in the fmincon function implements two sequential quadratic programming algorithms: active-set and sqp. The former can be observed to be faster on some problems. However, it may enter infinite loops or crash in runs where the constraint functions return values that are not real numbers. A further difference is that active-set interprets tolerances as absolute while sqp considers them relative. As a consequence, sqp may for some problems return solutions that do not satisfy the (absolute) constraint tolerance . In order to benefit from the relative strengths of both algorithms, we first attempt to project using active-set. In order to avoid crashes and infinite loops, we catch exceptions and use an output function to abort runs in which values that are not real are observed. If projection using active-set fails, then sqp is employed instead. Moreover, fmincon can either be supplied with the gradients of the constraint functions or it can approximate them through finite differencing. Differences between the two options are most obvious where values that are not real are encountered. If constraint gradients are available explicitly, then we first attempt to use them; we switch to gradients that are approximated numerically if sequential quadratic programming with the former fails. Both active-set and sqp are used with default parameter settings. Only those solutions that absolutely satisfy the constraints to within the constraint tolerance are used.
Finally, Line 14 of the algorithm in Figure 2 requires determining which constraints are newly tight. In order for a constraint to be added to the working set, we require that the corresponding Lagrange multiplier obtained in the process of projection and returned by fmincon is strictly positive and that the value of the constraint function is within the constraint tolerance in that .
4 Evaluation
We experimentally evaluate the (1 1)-AS-ES with the primary goal of showing that the use of active-set techniques in an EA can result in a powerful tool for solving black-box optimization problems where constraints are explicit and objective function evaluations are costly. For that purpose, we compare the performance of the (1 1)-AS-ES with that of several deterministic algorithms that use active-set and other techniques for handling constraints as well as with a stochastic algorithm that does not assume the explicitness of the constraints and has been found to be successful in a small function-evaluation budget scenario. We first describe our testing environment in Section 4.1 and discuss the algorithms that the (1 1)-AS-ES is compared with in Section 4.2. We then present results from the experimental comparison in Section 4.3.
4.1 Testing Environment
We employ the CEC 2006 test problem set gathered by Liang et al. (2006). We choose this set as its problems have been designed or selected to exhibit specific intricacies that pose difficulties for algorithms for constrained optimization and thus present a diverse set of challenges. None of the algorithms that we consider have been designed to be applicable to mixed-integer problems, thus precluding the use of the 2020 test set. Moreover, the widespread use of the 2006 test set makes it possible to put into perspective performance figures from a wide range of published papers. The problems of the 2006 set are well understood and optimal function values are known, making it possible to not simply present but to interpret experimental results.
Problems of the CEC 2006 test set are labelled from g01 to g24 and have bound constraints as well as further inequality and equality constraints in the bound constrained region. In accordance with the definition of constrained optimization in Section 1, we require that for all problems the feasible region is closed and that the objective function is well defined anywhere on the feasible region, necessitating changes to the definitions of problems g14 and g20. Those modifications are detailed in Appendix B, along with a summary of characteristics of the entire test set. Starting points for all algorithms are sampled uniformly within the bound constrained region and may or may not be feasible. Experiments are run with constraint tolerance unless noted otherwise.
Our primary tool for comparing algorithm performance is empirical cumulative running-time distributions, which accumulate data from multiple runs and for multiple objective function value targets in easy to interpret figures. The results are “anytime” in that they do not assume specific function evaluation budgets. Hansen et al. (2021) provide a comprehensive discussion of the use and relative merits of empirical cumulative running-time distributions when compared with other approaches to assessing the performance of black-box optimization algorithms.
With the exception of SACOBRA, which samples random points with a small probability, we do not consider restart algorithms and report success rates instead in order to avoid termination criteria from impacting the results. A problem is considered solved to within target accuracy if a solution is found that is feasible to within the constraint tolerance and that has an objective function value , where is the value of an optimal solution to the problem. Target objective function values for generating empirical cumulative running-time distributions are obtained for each problem by selecting twenty values that are logarithmically uniformly spaced between a minimum value of and the median objective function value of 101 points uniformly sampled in the bound constrained domain and subsequently projected onto the feasible region.
4.2 Algorithms
We compare the performance of the (1 1)-AS-ES with those of several other algorithms: two sequential quadratic programming approaches and an interior point algorithm implemented in the MATLAB Optimization Toolbox, as well as Skinny and SACOBRA as described in Section 2.3. Approaches that employ the constrained method for constraint handling are less suitable for use in a fixed-target comparison as they require providing a schedule for the parameter that controls the constraint violation below which selection for objective function values takes precedence over the minimization of constraint violation. With their default schedules, the algorithms of Takahama and Sakai (2010a) and Hellwig and Beyer (2018) may require thousands of objective function evaluations before generating feasible candidate solutions and are thus not competitive in a scenario where targets are to be attained with small numbers of function evaluations. While SACOBRA will be seen below to require about 220 objective function evaluations to reach 50 percent of all targets, MAgES does not attain that milestone until in excess of 20,000 function evaluations. Although ARCH is designed for nonlinearly constrained problems, Sakamoto and Akimoto (2019) provide test results only for problems with linear constraints. The evolution strategies of Spettel et al. (2019) and Spettel and Beyer (2020) are designed for the solution of linearly constrained and equality constrained optimization problems, respectively, and thus not applicable for the full CEC 2006 test set.
The sequential quadratic programming approaches and the interior point algorithm are available through the fmincon interface of the MATLAB Optimization Toolbox and are referred to as active-set, sqp, and interior-point, respectively. All of them are deterministic, and some differences between active-set and sqp are described in Section 3.2. In order to prevent the algorithms from stopping before a viable solution of the required target accuracy is found, we set the optimality tolerance, step tolerance, and function tolerance parameters to zero and use an output function to stop when the termination criteria given in Section 4.1 are satisfied. Runs are aborted as unsuccessful if either the number of objective function evaluations exceeds or if values that are not real numbers are encountered. In the latter case, the algorithms would not be able to make further progress even if allowed to continue running. Skinny is run with default parameter settings and uses the DFO/Algencan packages as described by Martínez and Sobral (2013). We use the SACOBRA version current as of this writing (Version 1.2) with default parameter settings. The active-set evolution strategy is initialized with a value for the step size parameter equal to one-fifth of the minimum extent of the bound constrained region across the dimensions.
4.3 Results
All six algorithms have been used to solve the problems in the CEC 2006 test set with constraint tolerance . For each algorithm/problem combination we have conducted 101 runs. This number is sufficient for Fisher's exact test to affirm that an absolute difference between observed success rates of ten percentage points or more affords at least 95 percent confidence that the underlying success probabilities differ.
Presumably as a result of generating models based on increasing amounts of data, runs of SACOBRA slow down with increasing number of iterations. As Bagheri et al. (2016), we have limited each run to a maximum of 500 objective function evaluations. The corresponding empirical cumulative running-time distributions are thus shown only up to that point.2 Longer runs of SACOBRA would likely result in higher success rates for some of the problems. At the same time, the median number of objective function evaluations across successful runs for those problems would increase.
For the active-set evolution strategy, the safeguard described in Section 3.2 that results in aborting iterations in which the algorithm repeatedly fails to generate viable candidate solutions can be observed to kick in infrequently on problems g02 and g07, and frequently on g22, which is highly ill-conditioned and where the vast majority of projections fail due to numerical issues. It is not triggered on any of the remaining problems.
Performance data for two objective function value targets are summarized in Table 1. Each cell in the table contains results for target accuracy on the left and for on the right. The numbers in the upper row are the median number of objective function evaluations across all successful runs. The numbers in the lower row indicate the fraction of runs that were successful. It can be seen from the table that the active-set evolution strategy solves 15 out of the 24 problems to within target accuracy in all runs. Of the 9 problems that the (1 1)-AS-ES does not solve in all of the runs, g02, g08, g12, g13, g18, and g23 are multimodal, and those runs that do not locate the optimal solution converge to locally optimal solutions. The evolution strategy solves problem g03 to within target accuracy in all runs, but in some cases fails to attain a solution satisfying the higher target accuracy. A reduced constraint tolerance of would allow it to succeed in all runs. Problem g17 is multimodal and 42 of the 101 runs converge to the merely local optimizer. However, about one-third of the runs fail to locate either of the two locally optimal solutions. Those runs approach the globally optimal solution, but fail to locate it to within the required accuracy due to an unwarranted decrease of the step size parameter, likely as a result of the discontinuity of the objective function that creates a ridge-like structure. Finally, one of the 101 runs on problem g20 fails. Problem g20 is the problem with the highest dimension and the largest number of constraints tight at the optimal solution. The algorithm in the failed run does not add all of the constraints that are tight to its working set and as a result generates many candidate solutions that after projection are inferior to their parent, resulting in a systematic decrease of the step size that prevents it from converging to the optimal solution.
. | (1 1)-AS-ES . | active-set . | sqp . | interior-point . | Skinny . | SACOBRA . |
---|---|---|---|---|---|---|
g01 | 22/22 | 28/35 | 28/42 | 352/438 | 1582/1589 | 30/120 |
1.00/1.00 | 0.17/0.12 | 0.14/0.09 | 0.56/0.56 | 0.98/0.98 | 1.00/0.63 | |
g02 | —/— | —/— | —/— | —/— | —/— | —/— |
0.00/0.00 | 0.00/0.00 | 0.00/0.00 | 0.00/0.00 | 0.00/0.00 | 0.00/0.00 | |
g03 | 285/555 | 653/674 | 707/714 | 872/943 | —/— | 141/— |
1.00/0.87 | 0.71/0.67 | 0.39/0.39 | 1.00/1.00 | 0.00/0.00 | 0.97/0.00 | |
g04 | 18/18 | 24/30 | 24/30 | 61/66 | 186/378 | 18/45 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | |
g05 | 37/79 | 45/45 | 50/50 | 61/61 | 419/425 | 567/567 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.51/0.51 | |
g06 | 5/5 | 36/36 | 37/37 | 104/104 | 207/207 | 8/23 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | |
g07 | 279/449 | 155/155 | 157/157 | 187/242 | —/— | 38/277 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.00/0.00 | 1.00/0.82 | |
g08 | 98/183 | 66/72 | 64/68 | 50/57 | 75/112 | 149/245 |
0.54/0.54 | 0.10/0.10 | 0.20/0.20 | 0.73/0.73 | 1.00/1.00 | 0.85/0.74 | |
g09 | 250/523 | 341/341 | 290/290 | 280/332 | 641/1147 | 279/— |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.79/0.00 | |
g10 | 119/219 | 342/342 | 360/360 | 1302/1355 | 12809/12953 | 122/165 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.90/0.90 | 1.00/1.00 | 0.66/0.49 | |
g11 | 27/78 | 21/21 | 21/21 | 27/27 | 144/144 | 95/97 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | |
g12 | 68/174 | 26/26 | 50/50 | —/— | 60/81 | 9/9 |
0.94/0.94 | 0.17/0.10 | 0.01/0.01 | 0.00/0.00 | 1.00/1.00 | 0.76/0.76 | |
g13 | 82/166 | 98/120 | 109/126 | 144/144 | 423/450 | 362/462 |
0.84/0.84 | 0.37/0.21 | 0.41/0.31 | 0.29/0.29 | 0.59/0.59 | 0.28/0.19 | |
g14 | 205/1885 | 376/578 | 284/415 | 177/335 | —/— | 136/— |
1.00/1.00 | 1.00/0.99 | 1.00/1.00 | 1.00/1.00 | 0.00/0.00 | 0.81/0.00 | |
g15 | 7/55 | 44/44 | 40/40 | 45/48 | 142/142 | 121/125 |
1.00/1.00 | 0.92/0.90 | 0.88/0.88 | 0.84/0.84 | 0.93/0.93 | 0.96/0.96 | |
g16 | 15/15 | 109/109 | 187/187 | 537/580 | 543/543 | 113/279 |
1.00/1.00 | 0.14/0.14 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.71/0.02 | |
g17 | 81/197 | —/— | 351/368 | 179/— | 654/722 | 259/— |
0.40/0.27 | 0.00/0.00 | 0.16/0.16 | 0.18/0.00 | 0.09/0.09 | 0.34/0.00 | |
g18 | 33/36 | 100/100 | 110/110 | 233/382 | 1333/1374 | 324/— |
0.74/0.74 | 0.67/0.64 | 0.67/0.67 | 0.74/0.72 | 1.00/1.00 | 0.03/0.00 | |
g19 | 257/380 | 176/240 | 176/226 | 384/496 | —/— | —/— |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.00/0.00 | 0.00/0.00 | |
g20 | 51/51 | 525/538 | 300/300 | —/— | 1023/1118 | —/— |
0.99/0.99 | 0.26/0.23 | 0.78/0.73 | 0.00/0.00 | 0.24/0.23 | 0.00/0.00 | |
g21 | 44/44 | —/— | 143/143 | 409/712 | 442/574 | 853/— |
1.00/1.00 | 0.00/0.00 | 0.01/0.01 | 0.76/0.70 | 0.63/0.41 | 0.02/0.00 | |
g22 | 88/88 | —/— | —/— | 3714/3980 | —/— | —/— |
1.00/1.00 | 0.00/0.00 | 0.00/0.00 | 0.40/0.40 | 0.00/0.00 | 0.00/0.00 | |
g23 | 75/75 | 160/170 | 160/160 | 271/302 | 877/901 | 191/229 |
0.97/0.97 | 0.91/0.90 | 0.97/0.97 | 0.95/0.95 | 0.96/0.96 | 0.79/0.72 | |
g24 | 17/17 | 18/21 | 21/21 | 27/42 | 170/170 | 20/175 |
1.00/1.00 | 0.44/0.43 | 0.52/0.43 | 0.42/0.42 | 1.00/1.00 | 1.00/0.27 |
. | (1 1)-AS-ES . | active-set . | sqp . | interior-point . | Skinny . | SACOBRA . |
---|---|---|---|---|---|---|
g01 | 22/22 | 28/35 | 28/42 | 352/438 | 1582/1589 | 30/120 |
1.00/1.00 | 0.17/0.12 | 0.14/0.09 | 0.56/0.56 | 0.98/0.98 | 1.00/0.63 | |
g02 | —/— | —/— | —/— | —/— | —/— | —/— |
0.00/0.00 | 0.00/0.00 | 0.00/0.00 | 0.00/0.00 | 0.00/0.00 | 0.00/0.00 | |
g03 | 285/555 | 653/674 | 707/714 | 872/943 | —/— | 141/— |
1.00/0.87 | 0.71/0.67 | 0.39/0.39 | 1.00/1.00 | 0.00/0.00 | 0.97/0.00 | |
g04 | 18/18 | 24/30 | 24/30 | 61/66 | 186/378 | 18/45 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | |
g05 | 37/79 | 45/45 | 50/50 | 61/61 | 419/425 | 567/567 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.51/0.51 | |
g06 | 5/5 | 36/36 | 37/37 | 104/104 | 207/207 | 8/23 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | |
g07 | 279/449 | 155/155 | 157/157 | 187/242 | —/— | 38/277 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.00/0.00 | 1.00/0.82 | |
g08 | 98/183 | 66/72 | 64/68 | 50/57 | 75/112 | 149/245 |
0.54/0.54 | 0.10/0.10 | 0.20/0.20 | 0.73/0.73 | 1.00/1.00 | 0.85/0.74 | |
g09 | 250/523 | 341/341 | 290/290 | 280/332 | 641/1147 | 279/— |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.79/0.00 | |
g10 | 119/219 | 342/342 | 360/360 | 1302/1355 | 12809/12953 | 122/165 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.90/0.90 | 1.00/1.00 | 0.66/0.49 | |
g11 | 27/78 | 21/21 | 21/21 | 27/27 | 144/144 | 95/97 |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | |
g12 | 68/174 | 26/26 | 50/50 | —/— | 60/81 | 9/9 |
0.94/0.94 | 0.17/0.10 | 0.01/0.01 | 0.00/0.00 | 1.00/1.00 | 0.76/0.76 | |
g13 | 82/166 | 98/120 | 109/126 | 144/144 | 423/450 | 362/462 |
0.84/0.84 | 0.37/0.21 | 0.41/0.31 | 0.29/0.29 | 0.59/0.59 | 0.28/0.19 | |
g14 | 205/1885 | 376/578 | 284/415 | 177/335 | —/— | 136/— |
1.00/1.00 | 1.00/0.99 | 1.00/1.00 | 1.00/1.00 | 0.00/0.00 | 0.81/0.00 | |
g15 | 7/55 | 44/44 | 40/40 | 45/48 | 142/142 | 121/125 |
1.00/1.00 | 0.92/0.90 | 0.88/0.88 | 0.84/0.84 | 0.93/0.93 | 0.96/0.96 | |
g16 | 15/15 | 109/109 | 187/187 | 537/580 | 543/543 | 113/279 |
1.00/1.00 | 0.14/0.14 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.71/0.02 | |
g17 | 81/197 | —/— | 351/368 | 179/— | 654/722 | 259/— |
0.40/0.27 | 0.00/0.00 | 0.16/0.16 | 0.18/0.00 | 0.09/0.09 | 0.34/0.00 | |
g18 | 33/36 | 100/100 | 110/110 | 233/382 | 1333/1374 | 324/— |
0.74/0.74 | 0.67/0.64 | 0.67/0.67 | 0.74/0.72 | 1.00/1.00 | 0.03/0.00 | |
g19 | 257/380 | 176/240 | 176/226 | 384/496 | —/— | —/— |
1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 1.00/1.00 | 0.00/0.00 | 0.00/0.00 | |
g20 | 51/51 | 525/538 | 300/300 | —/— | 1023/1118 | —/— |
0.99/0.99 | 0.26/0.23 | 0.78/0.73 | 0.00/0.00 | 0.24/0.23 | 0.00/0.00 | |
g21 | 44/44 | —/— | 143/143 | 409/712 | 442/574 | 853/— |
1.00/1.00 | 0.00/0.00 | 0.01/0.01 | 0.76/0.70 | 0.63/0.41 | 0.02/0.00 | |
g22 | 88/88 | —/— | —/— | 3714/3980 | —/— | —/— |
1.00/1.00 | 0.00/0.00 | 0.00/0.00 | 0.40/0.40 | 0.00/0.00 | 0.00/0.00 | |
g23 | 75/75 | 160/170 | 160/160 | 271/302 | 877/901 | 191/229 |
0.97/0.97 | 0.91/0.90 | 0.97/0.97 | 0.95/0.95 | 0.96/0.96 | 0.79/0.72 | |
g24 | 17/17 | 18/21 | 21/21 | 27/42 | 170/170 | 20/175 |
1.00/1.00 | 0.44/0.43 | 0.52/0.43 | 0.42/0.42 | 1.00/1.00 | 1.00/0.27 |
With the exception of problem g02, which has such a large number of local minima that none of the algorithms locates the globally optimal solution in even a single run, and problem g17, for which it achieves the highest success rates among all of the algorithms considered, the (1 1)-AS-ES locates the globally optimal solutions in the majority of runs. Embedding it in a simple restart scheme would allow it to locate globally optimal solutions with overwhelming probability.
Across all problems, the active-set evolution strategy locates globally optimal solutions in 89 percent of runs for target accuracy and in 88 percent of runs for target accuracy . The corresponding rates for active-set are 57 percent and 56 percent, for sqp are 63 percent and 62 percent, for interior-point are 70 percent and 69 percent, for Skinny are 64 percent and 63 percent, and for SACOBRA are 60 percent and 38 percent. The figures for SACOBRA are negatively impacted by the constraint on the number of objective function evaluations that is not shared by the other algorithms. However, the relatively low success rate of SACOBRA is likely also a result of SACOBRA not exploiting that the constraint functions are a priori and relying on inexact surrogate models instead.
Comparing the numbers of objective function evaluations across successful runs, Table 1 shows lower or equal median values for the evolution strategy than for all of the other algorithms on all of those problems where the effective dimension of the reduced search space at the optimal solution is zero. Wilcoxon rank sum tests confirm that the null hypothesis of equal medians is rejected at the 5 percent significance level in all of those cases, with three exceptions. For problem g04, SACOBRA performs similarly to the evolution strategy at target accuracy , but is significantly slower at target accuracy . For problem g21, the null hypothesis is not rejected for sqp. However, while all of the runs of the evolution strategy succeed, only a single run of the sequential quadratic programming approach does. For problem g24, both active-set and sqp perform similarly to the evolution strategy in those runs where they succeed, but have much lower success rates. In contrast, the evolution strategy requires a relatively larger number of function evaluations where is large and where the deterministic approaches that approximate gradients through finite differencing and that are equipped with techniques to handle ill-conditioning in some cases have advantages.
The (1 1)-AS-ES dominates the empirical cumulative running-time distributions of the other algorithms in Figure 3 for any computational budget. In contrast to the other algorithms, it generates feasible candidate solutions from the first iteration and as a result reaches about 4 percent of the targets with a single objective function evaluation. The evolution strategy eventually reaches almost 90 percent of targets. The interior-point fmincon algorithm attains 71 percent, Skinny 70 percent, the sequential quadratic programming approaches implemented in fmincon 65 percent and 59 percent, and SACOBRA 60 percent, albeit with a much more limited computational budget. For most computational budgets of up to about 200 objective function evaluations SACOBRA is the algorithm that attains the second largest number of targets.
Looking at the individual empirical cumulative running-time distributions in Figures 4 and 5, the (1 1)-AS-ES dominates the other algorithms on twelve out of the 24 problems (g01, g06, g10, g13, g15, g16, g17, g20, g21, g22, g23, and g24). For none of these does the effective dimension of the reduced search space at the optimal solution exceed two. The evolution strategy reaches all targets for problems g04, g05, g07, g09, g11, g14, and g19, but algorithms from the MATLAB Optimization Toolbox reach the more stringent targets with fewer objective function evaluations. Skinny is the only algorithm to reach all targets on multimodal problems g08 and g18. SACOBRA intermittently dominates the empirical cumulative running-time distributions for problems g03, g07, g12, and g14. We interpret this dominance to be a result of the use of surrogate models of the objective function. Nearly half of the empirical cumulative running-time distributions of SACOBRA suggest that a significant number of further targets could be attained if the constraint on the number of objective function evaluations were relaxed.
In summary, if it is assumed that the constraints are explicit, then the problems from the CEC 2006 test set can be solved with vastly fewer objective function evaluations than were available in the context of the competition. The (1 1)-AS-ES is successful primarily on those problems where the effective dimension of the reduced search space at the optimal solution is low. If there are many unconstrained search directions at the optimal solution, then the relatively slow convergence of the evolution strategy puts it at a disadvantage in comparison with algorithms that are able to exploit the smoothness of most of the test problems. A large majority of the failures to achieve target objective function values can be traced to the multimodality of the problems.
5 Conclusions
The primary goal of this article is to show the potential usefulness of active-set techniques for evolutionary black-box optimization with explicit constraints. While active-set approaches are commonly used in mathematical programming, they have not received widespread use in evolutionary computation. We show that a simple evolution strategy that uses active-set techniques and that can be implemented in under 200 lines of code can solve problems from the CEC 2006 test set with numbers of objective function evaluations comparable with those required by both sequential quadratic programming and by interior-point methods, and that for some problems it appears to exhibit greater robustness. The evolution strategy is two orders of magnitude faster than the algorithms successful in the CEC competitions, which did not assume that constraint function evaluations have negligible cost compared with objective function evaluations. Embedding the (1 1)-AS-ES in a simple restart scheme would allow solving all of the problems from the test set with the exception of g02 with high probability and with comparatively few objective function evaluations. There are no fundamental obstacles to using active-set techniques in more sophisticated evolution strategies, such as those that employ covariance matrix adaptation or surrogate modelling approaches. If constraints are not explicit, then algorithms such as SACOBRA, which use vastly fewer constraint function evaluations than the (1 1)-AS-ES, need to be considered.
Acknowledgments
This research was supported by the Austrian Science Fund FWF through grant P29651-N32 and by the Natural Sciences and Engineering Research Council of Canada (NSERC) though grant RGPIN-2020-04833. We are grateful to the anonymous reviewers for their constructive comments on an earlier version of the manuscript.
Notes
Code implementing the algorithm is available at web.cs.dal.ca/∼dirk/AS_ES.tar.
SACOBRA may perform multiple objective function evaluations between checking the termination criteria, resulting in runs that exceed the mandated maximum number of objective function evaluations. As a consequence, the corresponding empirical cumulative running-time distributions may extend past 500 evaluations, but can no longer be considered accurate after that point.
References
Appendix A Relation to Earlier Algorithm Variants
The algorithm described in Section 3 has evolved from and combines aspects of two prototypical variants proposed by Arnold (2016, 2017). All variants share the same fundamental design: an active-set approach implemented in a simple (1 1)-ES. All variants project candidate solutions generated through mutation onto the feasible region before evaluating them. Constraints are added to the working set if a new best candidate solution has newly tight constraints. Constraints are periodically considered for removal from the working set in order to prevent convergence to solutions in reduced subspaces that do not contain optimal solutions.
The most significant difference between the variants is the mechanism for considering constraints for removal from the working set. That mechanism is highly important as failure to remove constraints from the working set may result in convergence to non-optimal solutions. On the other hand, considering constraints for removal from the working set too frequently results in computational overhead as each attempt incurs an objective function evaluation. The strategy described by Arnold (2016) suspends the use of the entire working set in an attempt to release constraints, either if the effective dimension of the reduced search space is zero or otherwise with a fixed probability. That simple approach allows effectively solving most of the test problems from g01 to g11, but results in occasional failure to converge to the optimal solution on problem g07. Suboptimal performance ensues when a constraint from the working set needs to be removed and the working set is relatively large. In that case, chances of generating successful offspring by suspending the use of the entire working set are small and many iterations may be needed to attain the optimal working set. On problem g07, numerical issues can sometimes be observed to arise before being able to remove the constraint, resulting in failure to locate the optimal solution.
Arnold (2017) in addition considers individual constraints for release. Whenever the step size parameter has decreased by a certain factor or the effective dimension of the reduced search space is zero, the algorithm loops through all of the constraints in the working set and releases them one by one. If the effective dimension of the reduced search space is zero, then the step size parameter is reduced rapidly. The modifications not only address the issues observed with the earlier variant on problem g07 but can also be observed to result in significantly improved performance on some linearly constrained sphere functions. Advantages of the latter algorithm variant are most pronounced on problems where constraints need to be removed from relatively large working sets. However, the algorithm can be observed to behave suboptimally on several problems of the CEC 2006 test set. First, on multimodal and severely constrained problems, such as g01, the rapid reduction of the step size if the effective dimension of the reduced search space is zero further impairs the (1 1)-ES's limited capacity for global search. And second, the rule used for triggering a review of all of the constraints present in the working set can be observed to result in unnecessary overhead for problems where there are many active constraints and the dimension of the search space is low.
The algorithm described in Section 3only attempts to remove single constraints and regains the simplicity of the original variant. The conditions that trigger a constraint to be considered for release are those from Arnold (2016), and no rapid decrease of the step size parameter is needed. A further significant change in the algorithm is the reduction of the damping constant used in the adaptation of the step size from a value proportional to the effective dimension of the reduced search space to a value proportional to the square root. The value adopted in Section 3 is in line with the setting proposed by Hansen et al. (2015) and results in the ability to converge faster on those problems where the effective dimension of the reduced search space at the optimal solution is relatively large.
Finally, some of the problems from g12 to g24 pose difficulties not present in the smaller test set considered in the earlier papers and may result in numerical issues when attempting to project points onto the reduced search space. These issues were not considered by Arnold (2016, 2017) and are addressed though the measures described in Section 3.2. As a result of the sum of these changes, the algorithm described in Section 3 can be used to solve all of the problems from the 2006 CEC test set with the exception of g02. Moreover, it can be observed to require smaller median numbers of function evaluations across successful runs than both preliminary variants on all but three of the problems from g01 to g11 (Wilcoxon rank sum test at 5% significance level).
Appendix B Test Problems
Liang et al. (2006) list 24 test problems for constrained nonlinear optimization that they have gathered from earlier sources and labelled from g01 to g24. In order to conform with the definition of constrained optimization problems in Section 1 and to ensure that optimal solutions exist, we require that for all problems the feasible region is a closed set and that the objective function is well defined anywhere on the feasible region. This necessitates changes to two problems:
- Problem g14 has lower bound constraints for . We replace those constraints with constraints for . This change results in the objective functionbeing undefined where one or more lower bound constraints are active. We continue the objective function to the constraint boundary by defining (i.e., the contribution of the th term to the sum is zero if the lower bound constraint is tight). None of the bound constraints is tight at the optimal solution.
- Problem g20 has constraintsThose constraint functions are undefined if or . In order to ensure that the feasible region is closed, we replace the constraints with constraintsFor zero constraint tolerance, points that are feasible or infeasible according to the original definition retain their feasibility or infeasibility.
Some of the remaining problems have the objective or constraint functions undefined at some points outside of the bound constrained region. Others have constraint normals diverge as the boundary of the feasible region is approached. We have not made any further modifications to address those issues and leave it to the algorithms to deal with them.
Table 2 summarizes characteristics of the test problems, listing the dimension of the search space, the number of equality constraints, the number of inequality constraints active at the optimal solutions, the effective dimension of the reduced search space at those solutions, and optimal objective function values . Problem g18 has multiple optimal solutions with different values of and . The optimal objective function values are given to twelve significant digits and have been obtained analytically where possible and by using sequential quadratic programming with a constraint tolerance of started in close proximity to the known optimal solutions otherwise. Some of the optimal function values differ by small amounts from those reported by Liang et al. (2006), presumably because the latter were obtained with a higher constraint tolerance. Our values are stable in that increasing the constraint tolerance by a factor of ten has no impact on the values reported in Table 2. Larger deviations between our figures and those reported by Liang et al. (2006) can be observed for two problems. For problem g20, Liang et al. (2006) state that no feasible solution to the problem has been found. Ensuring that the feasible region is a closed set as detailed above results in there being a feasible solution with the objective function value indicated in the table. For problem g22 the accurate solution to the problem is better than the best solution found by Liang et al. (2006).
. | . | . | . | . | . |
---|---|---|---|---|---|
g01 | 13 | 0 | 17 | 0 | 15 |
g02 | 20 | 0 | 1 | 19 | 0.803619104126 |
g03 | 10 | 1 | 0 | 9 | 1 |
g04 | 5 | 0 | 5 | 0 | 30665.5386718 |
g05 | 4 | 3 | 0 | 1 | 5126.49810960 |
g06 | 2 | 0 | 2 | 0 | 6961.81387558 |
g07 | 10 | 0 | 6 | 4 | 24.3062090682 |
g08 | 2 | 0 | 0 | 2 | 0.0958250414180 |
g09 | 7 | 0 | 2 | 5 | 680.630057374 |
g10 | 8 | 0 | 6 | 2 | 7049.24802053 |
g11 | 2 | 1 | 0 | 1 | 0.75 |
g12 | 3 | 0 | 0 | 3 | 1 |
g13 | 5 | 3 | 0 | 2 | 0.0539498477703 |
g14 | 10 | 3 | 0 | 7 | 47.7610908594 |
g15 | 3 | 2 | 0 | 1 | 961.715172130 |
g16 | 5 | 0 | 6 | 0 | 1.90515525853 |
g17 | 6 | 4 | 1 | 1 | 8853.53989133 |
g18 | 9 | 0 | 8 to 11 | 0 to 2 | 0.866025403784 |
g19 | 15 | 0 | 13 | 2 | 32.6555929502 |
g20 | 24 | 14 | 28 | 0 | 0.147466071547 |
g21 | 7 | 5 | 2 | 0 | 193.786925260 |
g22 | 22 | 19 | 3 | 0 | 236.370313315 |
g23 | 9 | 4 | 9 | 0 | 400 |
g24 | 2 | 0 | 2 | 0 | 5.50801327160 |
. | . | . | . | . | . |
---|---|---|---|---|---|
g01 | 13 | 0 | 17 | 0 | 15 |
g02 | 20 | 0 | 1 | 19 | 0.803619104126 |
g03 | 10 | 1 | 0 | 9 | 1 |
g04 | 5 | 0 | 5 | 0 | 30665.5386718 |
g05 | 4 | 3 | 0 | 1 | 5126.49810960 |
g06 | 2 | 0 | 2 | 0 | 6961.81387558 |
g07 | 10 | 0 | 6 | 4 | 24.3062090682 |
g08 | 2 | 0 | 0 | 2 | 0.0958250414180 |
g09 | 7 | 0 | 2 | 5 | 680.630057374 |
g10 | 8 | 0 | 6 | 2 | 7049.24802053 |
g11 | 2 | 1 | 0 | 1 | 0.75 |
g12 | 3 | 0 | 0 | 3 | 1 |
g13 | 5 | 3 | 0 | 2 | 0.0539498477703 |
g14 | 10 | 3 | 0 | 7 | 47.7610908594 |
g15 | 3 | 2 | 0 | 1 | 961.715172130 |
g16 | 5 | 0 | 6 | 0 | 1.90515525853 |
g17 | 6 | 4 | 1 | 1 | 8853.53989133 |
g18 | 9 | 0 | 8 to 11 | 0 to 2 | 0.866025403784 |
g19 | 15 | 0 | 13 | 2 | 32.6555929502 |
g20 | 24 | 14 | 28 | 0 | 0.147466071547 |
g21 | 7 | 5 | 2 | 0 | 193.786925260 |
g22 | 22 | 19 | 3 | 0 | 236.370313315 |
g23 | 9 | 4 | 9 | 0 | 400 |
g24 | 2 | 0 | 2 | 0 | 5.50801327160 |