We propose a novel constraint-handling technique for the covariance matrix adaptation evolution strategy (CMA-ES). The proposed technique is aimed at solving explicitly constrained black-box continuous optimization problems, in which an explicit constraint is a constraint whose violation and (numerical) gradient can be computed in negligible time compared with that for the objective function. The method is designed to realize two invariance properties: invariance to affine transformations of the search space, and invariance to increasing transformations of the objective and constraint functions. The CMA-ES is designed to possess these properties for handling difficulties that appear in black-box optimization problems, such as non-separability, ill-conditioning, ruggedness, and different orders of magnitude in the objective. The proposed constraint-handling technique (CHT), named ARCH, modifies the underlying CMA-ES only in terms of the ranking of the candidate solutions. It employs a repair operator and an adaptive ranking aggregation strategy to compute the ranking. We developed test problems to evaluate the effects of the invariance properties, and performed experiments to empirically verify the invariance of the algorithm. We compared the proposed method with other CHTs on the CEC 2006 constrained optimization benchmark suite to demonstrate its efficacy. Empirical studies reveal that ARCH is able to exploit the explicitness of the constraint functions effectively, sometimes even more efficiently than an existing box-constraint handling technique on box-constrained problems, while exhibiting the invariance properties. Moreover, in terms of the number of objective function calls, ARCH overwhelmingly outperforms CHTs that do not exploit the explicit constraints.

We consider explicitly constrained black-box continuous minimization problems defined as
\[ \min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g_j(x) \le 0 \ \text{ for } j = 1, \dots, m, \tag{1} \]
where f: R^n → R is the objective function and g_j: R^n → R (j = 1, …, m) are the constraint functions. The equality constraints h_k(x) = 0, k = 1, …, l, are assumed to be transformed into inequality constraints g_k(x) = |h_k(x)| − ε_eq ≤ 0, with a numerical tolerance ε_eq > 0. In the black-box optimization scenario, the evaluation of the objective function f often requires computationally expensive simulation, while the explicit constraints can be computed independently of f-calls, and their computational cost is significantly lower than that of f-calls. These can be presented in a relatively simple mathematical expression, and their gradients are available symbolically or can be estimated numerically with a relatively low computational cost. The most common example of an explicit constraint is the box constraint, in which each coordinate of the design variable x is constrained to a closed interval. Explicit constraints often appear in engineering optimization as prerequisites for executing the simulation to compute the objective function. Taking into account the abovementioned situations, in this study, we assume that
  • the computational time for g_j(x) and its (numerical) gradient ∇g_j(x) is negligible compared to that for f(x); and

  • the objective function f(x) is not necessarily defined for an infeasible solution x, which violates some gj(x).

The covariance matrix adaptation evolution strategy (CMA-ES) (Hansen and Ostermeier, 2001; Hansen et al., 2003; Hansen, 2016) is employed in this study as the baseline optimization method for solving the problem (1). The CMA-ES is known as one of the state-of-the-art zeroth-order optimization algorithms for unconstrained black-box continuous optimization. In particular, it has been demonstrated as efficient for difficult objective functions such as non-convex, ill-conditioned, and nonseparable functions. Since these difficulties of objective functions naturally appear in engineering optimization, whether or not constraints exist, it is desirable for search algorithms for constrained optimization to operate efficiently under these difficulties. Although the CMA-ES is designed for unconstrained optimization, it can be applied to constrained optimization with the aid of constraint-handling techniques (CHTs).

One key characteristic of the CMA-ES is its invariance to several transformations of the objective function and search space (Hansen et al., 2011). The invariance properties induce equivalent problem classes. All instances (problems) in an equivalent class are regarded as equivalent under the corresponding transformation of the initial search distribution. Owing to the invariance properties of the CMA-ES, if its initial state is transformed properly, it is empirically observed that the CMA-ES minimizes ill-conditioned and nonseparable quadratic functions as efficiently as well-conditioned and separable spherical functions. This is key to the success of the CMA-ES in real-world problems because it is designed to adapt the distribution to such a proper state. Although the invariance properties themselves do not imply algorithm efficacy, they are useful for generalizing observations. That is, they are essential for assessing the performance of algorithms empirically. The invariance further contributes to the quasi-parameter-free feature of the CMA-ES, which is an important characteristic of the CMA-ES that has attracted attention from practitioners. As opposed to many other evolutionary approaches to continuous black-box optimization, in which the hyper-parameters are required to be tuned depending on the problem characteristics to enable efficient performance (Karafotias et al., 2015), default values are prepared for all hyper-parameters of the CMA-ES, depending only on the search space dimension n. As many advantages of the CMA-ES for unconstrained optimization originate from its invariance properties, we hypothesize that a CHT for explicitly constrained optimization should preserve these invariance properties as much as possible when the CMA-ES is used for constrained optimization.

However, the CHTs for explicit constraints employed in variants of the ES are not designed to preserve the invariance of the baseline search algorithms, as we discuss briefly in Section 3. When the CMA-ES is applied to an explicitly constrained black-box optimization problem with these CHTs, it loses the invariance properties exhibited by the CMA-ES. Certain CHTs, such as death penalty or resampling techniques, are applicable to the CMA-ES and preserve its invariance properties. However, as these approaches do not exploit the fact that the constraint violations are cheap to evaluate and their gradient information is available, they are often inefficient, as observed in Section 6.

We propose a novel CHT for the CMA-ES in solving the explicitly constrained optimization problem (1), named adaptive ranking-based constraint handling (ARCH). ARCH is designed to possess the following two invariance properties: invariance to the element-wise increasing transformation of the objective and constraint functions, and invariance to the affine transformation of the search space. ARCH replaces the evaluation step in the sampling–evaluation–update cycle of the CMA-ES, as follows: A candidate solution generated by the CMA-ES is first repaired onto a boundary of the feasible domain. The objective function value is evaluated at the repaired point. A penalty for the repair operation is computed as the Mahalanobis distance between the original and repaired solutions under the covariance matrix of the current search distribution. The candidate solutions are ranked based on the adaptive weighted sum of the rankings of the objective function values and the rankings of the penalty values. An adaptive penalty coefficient is introduced to control the balance between the rankings; it is adapted so that the search distribution does not move away from the feasible domain in terms of the Mahalanobis distance, while infeasible but near-boundary solutions are still allowed to receive high rankings.

The contributions of this study are summarized as follows. Firstly, we present ARCH, which can handle explicit and nonlinear constraints.1 We prove that ARCH is invariant to any element-wise increasing transformation of f and gj, and to any affine transformation of the search space coordinates. To the best of the authors' knowledge, this is the first approach for explicit constraints that is invariant to these transformations. Secondly, we empirically evaluate the effectiveness of the proposed approach from an invariance perspective. We develop test problems to demonstrate the effects of the invariance to affine transformation. In test cases, we empirically observe that ARCH is invariant to affine transformations, and illustrate that ARCH is even more effective for box constrained optimization problems than an existing CHT specialized for the box constraint. Thirdly, we compare ARCH with existing CHTs for non-explicit constraints on problems in which the objective function is defined for the infeasible domain. We use the CEC 2006 constrained optimization benchmark suite to demonstrate that ARCH overwhelmingly outperforms other CHTs for non-explicit constraints. This indicates that the explicit constraints should be treated as explicit constraints even if the objective function is defined on an infeasible domain. Compared to the previous publication in Sakamoto and Akimoto (2019), we i) improve the algorithm to prevent the search distribution from being unnecessarily biased toward the boundary if the population size is larger than the default, ii) prove the invariance properties of ARCH, and iii) compare ARCH with existing approaches on the CEC 2006 testbed.

The remainder of this article is organized as follows. We summarize the mathematical notations applied throughout the article below. Section 2 introduces the baseline optimization algorithm, namely the CMA-ES, and related CHTs. We discuss the invariance properties for constrained optimization problems in Section 3. Our proposed CHT, ARCH, is described in Section 4. The invariance properties of ARCH are demonstrated in Section 5. In Section 6, we empirically demonstrate how the invariance properties operate in practice, by means of numerical experiments on linearly constrained quadratic problems, to observe the efficacy of ARCH. We present our comparison of ARCH with other CHTs using the CEC 2006 constrained optimization testbed in Section 7. The article is concluded in Section 8.

Notations In the following, R is the set of real numbers and R_+ is the set of strictly positive real numbers. Let x ∈ R^n be an n-dimensional column vector, where x^T is its transpose, ‖x‖ denotes the Euclidean norm of x, and [x]_i denotes the ith coordinate of x. Note that the ith coordinate of the kth vector x_k is denoted by [x_k]_i. The (i, j)th element of a matrix A is denoted by [A]_{i,j}. The identity matrix is denoted by I_n. The indicator function 1{condition} returns 1 if condition is true, and 0 otherwise. The sign function sgn(a) returns 1 if a > 0, −1 if a < 0, and 0 otherwise. The integer interval between and including a and b is denoted by [[a, b]].

In this section, we introduce the CMA-ES, which is our baseline algorithm for unconstrained continuous optimization, followed by an overview of the existing CHTs for the CMA-ES.

2.1 CMA-ES

The CMA-ES (Hansen et al., 2003; Hansen, 2016) is a stochastic multi-point search algorithm for black-box continuous minimization of f: R^n → R. The CMA-ES samples λ candidate solutions x_k for k ∈ {1, …, λ} from the multivariate normal distribution N(m, σ²C), where m ∈ R^n is the mean vector, σ ∈ R_+ is the step size, and C ∈ R^{n×n} is the covariance matrix. These distribution parameters are updated using the candidate solutions and their ranking information.

Step0. Initialize m^{(0)}, σ^{(0)}, C^{(0)} according to the initial problem search domain, and initialize the two evolution paths p_c^{(0)} = p_σ^{(0)} = 0 and their correction factors γ_σ^{(0)} = γ_c^{(0)} = 0. All parameters that appear in the following are set to the default values listed in Table 1. The meanings of these parameters are described in Hansen (2016). These are designed based on theoretical ES research (e.g., see Akimoto et al., 2020) and extensive experiments. The CMA-ES repeats the following steps at each iteration, t = 0, 1, …, until a termination criterion is satisfied.

Table 1:

Default parameter setting for CMA-ES.

λ = 4 + ⌊3 ln n⌋,   μ = ⌊λ/2⌋
w_i = (ln((λ+1)/2) − ln i) / Σ_{k=1}^{μ} (ln((λ+1)/2) − ln k),   c_σ = (μ_w + 2)/(n + μ_w + 5),   c_c = (4 + μ_w/n)/(n + 4 + 2μ_w/n)
c_1 = 2/((n + 1.3)² + μ_w),   c_μ = min(1 − c_1, 2(μ_w − 2 + 1/μ_w)/((n + 2)² + μ_w))
d_σ = 1 + c_σ + 2 max(0, √((μ_w − 1)/(n + 1)) − 1)

Step1. Draw λ samples z_k for k ∈ {1, …, λ} independently from N(0, I_n). Compute y_k = √(C^{(t)}) z_k and x_k = m^{(t)} + σ^{(t)} y_k. Then, x_k (k = 1, …, λ) are the candidate solutions, which are independently N(m^{(t)}, (σ^{(t)})² C^{(t)}) distributed. Here, √(C^{(t)}) is the symmetric matrix satisfying C^{(t)} = (√(C^{(t)}))².
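
As a concrete illustration of Step 1, the following sketch samples λ candidate solutions from N(m, σ²C) via an eigendecomposition of C. It is a minimal sketch for illustration only; the function and variable names are not part of the original algorithm description.

```python
import numpy as np

def sample_candidates(m, sigma, C, lam, rng=None):
    """Sample lam candidate solutions from N(m, sigma^2 C), as in Step 1."""
    rng = np.random.default_rng() if rng is None else rng
    # Symmetric square root of C via eigendecomposition: C = B diag(d) B^T.
    d, B = np.linalg.eigh(C)
    sqrt_C = B @ np.diag(np.sqrt(d)) @ B.T
    z = rng.standard_normal((lam, len(m)))   # z_k ~ N(0, I_n)
    y = z @ sqrt_C                           # y_k = sqrt(C) z_k (sqrt_C is symmetric)
    x = m + sigma * y                        # x_k = m + sigma * y_k
    return z, y, x
```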

Step2. Evaluate the candidate solutions xk, for k{1,,λ}, on the loss function L, and sort them in ascending order. In an unconstrained optimization scenario, usually, L=f. Let the ith best candidate solution be denoted by xi:λ. In the same manner, we denote the corresponding steps and normalized steps as yi:λ and zi:λ, respectively.

Step3. Compute the weighted sum of the μ best steps of the candidate solutions, y_w = Σ_{i=1}^{μ} w_i y_{i:λ}, and update the mean vector m^{(t)} as follows:
\[ m^{(t+1)} = m^{(t)} + \sigma^{(t)} y_w , \]
where w_i is the recombination weight for the ith best candidate, which satisfies w_1 ≥ w_2 ≥ … ≥ w_μ > 0 and Σ_{i=1}^{μ} w_i = 1.
Step4. Update the evolution paths according to
\[ p_\sigma^{(t+1)} = (1 - c_\sigma)\, p_\sigma^{(t)} + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_w}\; \sqrt{C^{(t)}}^{-1} y_w , \]
\[ p_c^{(t+1)} = (1 - c_c)\, p_c^{(t)} + h_\sigma^{(t+1)} \sqrt{c_c (2 - c_c)\, \mu_w}\; y_w , \]
where c_σ and c_c are the cumulation factors for the evolution paths, μ_w = 1 / Σ_{i=1}^{μ} w_i²,
and χ_n = E[‖N(0, I_n)‖] ≈ n^{1/2} (1 − 1/(4n) + 1/(21n²)) is the expectation of the norm of the n-variate standard normal distribution. The Heaviside function h_σ^{(t+1)} stalls the update of p_c^{(t+1)} if ‖p_σ^{(t+1)}‖ is large. The correction factors for the evolution paths2 are updated as follows:
\[ \gamma_\sigma^{(t+1)} = (1 - c_\sigma)^2 \gamma_\sigma^{(t)} + c_\sigma (2 - c_\sigma), \qquad \gamma_c^{(t+1)} = (1 - c_c)^2 \gamma_c^{(t)} + h_\sigma^{(t+1)} c_c (2 - c_c). \]
Step5. Update the step size and covariance matrix as follows:
\[ \sigma^{(t+1)} = \sigma^{(t)} \exp\!\left( \frac{c_\sigma}{d_\sigma} \left( \frac{\| p_\sigma^{(t+1)} \|}{\sqrt{\gamma_\sigma^{(t+1)}}\, \chi_n} - 1 \right) \right), \]
\[ C^{(t+1)} = C^{(t)} + c_1 \left( p_c^{(t+1)} \big(p_c^{(t+1)}\big)^{\mathrm{T}} - \gamma_c^{(t+1)} C^{(t)} \right) + c_\mu \sum_{i=1}^{\mu} w_i \left( y_{i:\lambda} y_{i:\lambda}^{\mathrm{T}} - C^{(t)} \right), \]
where d_σ is the damping parameter for the step-size adaptation, while c_1 and c_μ are the learning rates for the rank-one and rank-μ updates of the covariance matrix, respectively.

2.2 CHTs for ESs

We briefly review the CHTs employed in variants of ESs.

2.2.1 Resampling and Death Penalty

The resampling technique is the simplest CHT. Candidate solutions are resampled repeatedly until λ feasible candidate solutions are generated. To guarantee that the resampling stops within a finite time, the maximum number of samples drawn in one iteration is set to a finite number. If the number of feasible candidate solutions is less than λ, the current population is filled with infeasible solutions, and the worst loss value (for example, +∞) is assigned to them. When the maximum sampling number is set to λ, the resampling technique is known as the death penalty.
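
The following is a minimal sketch of the resampling loop described above, assuming user-supplied `sample_one` and `constraints` callables (both hypothetical names); with `max_resamples` equal to λ it degenerates to the death penalty.

```python
import numpy as np

def resample_population(sample_one, constraints, lam, max_resamples=500):
    """Resampling CHT: draw solutions until lam feasible ones are found or the
    sampling budget is exhausted; remaining slots are filled with infeasible
    samples that receive the worst possible loss (+inf)."""
    feasible, infeasible = [], []
    for _ in range(max_resamples):
        x = sample_one()
        if all(g(x) <= 0 for g in constraints):
            feasible.append((x, None))       # loss evaluated later on f
        else:
            infeasible.append((x, np.inf))   # worst loss value
        if len(feasible) == lam:
            break
    # Fill the population with infeasible samples if too few feasible ones were found.
    return feasible + infeasible[: lam - len(feasible)]
```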

The resampling and death penalty methods are easy to implement and are applicable to any constraint type. However, they are not appropriate if the optimum is located on the boundary of the feasible domain, as candidate solutions are biased in the feasible domain and the search distribution tends to approach the boundary slowly, as we observe in Section 6. Moreover, the search algorithm cannot conduct a meaningful ranking of the candidate solutions if the probability of sampling feasible solutions is rather low and the population is filled with infeasible solutions. If this is the case, the parameter update will result in random fluctuation.

2.2.2 Penalty Function Methods

CHTs based on the penalization of infeasible candidate solutions are the most extensively used CHTs for real-world engineering optimization problems. The main concept of penalty function methods is transforming a constrained optimization problem into an unconstrained optimization problem by defining the penalized loss function:
\[ L(x) = f(x) + p(x), \tag{2} \]
where p is a penalty function. If the objective function value is not well defined for infeasible solutions, a repair operation needs to be applied, and the first term is replaced with the objective function value evaluated at the repaired solution. The penalty function is manually designed in many engineering optimization problems, and a typical choice is the weighted sum of the constraint violations max(gj(x),0) or its monotone transformation, such as square.
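
As a hedged illustration of (2), the sketch below uses the typical choice mentioned above, a weighted sum of squared constraint violations; the weights w_j are user-chosen and are not prescribed by the general formulation.

```python
def penalized_loss(x, f, constraints, weights):
    """Penalty function method: L(x) = f(x) + sum_j w_j * max(g_j(x), 0)^2."""
    violation = sum(w * max(g(x), 0.0) ** 2 for g, w in zip(constraints, weights))
    return f(x) + violation
```
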
The adaptive penalty box constraint handling (AP-BCH) method (Hansen et al., 2009) is a penalty function-based box CHT. The loss function is defined as
\[ L(x) = f(x^{\mathrm{feas}}) + \sum_{i=1}^{n} \gamma_i \big( [x]_i - [x^{\mathrm{feas}}]_i \big)^2 , \tag{3} \]
where γ_i is an adaptive penalty coefficient and x^feas is the feasible vector closest to the infeasible solution x; that is, x^feas = argmin_{y ∈ [LB, UB]} ‖x − y‖. The feasible solution x^feas is only used for evaluating the objective function and the penalty; that is, the repair operator is used in the Darwinian manner. This approach does not assume that the objective function is well defined outside the feasible domain, but it is only applicable to box-constrained problems.
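
For a box constraint, the nearest feasible point x^feas is obtained by coordinate-wise clipping, so the AP-BCH loss (3) can be sketched as below; the adaptation rule for the coefficients γ_i from Hansen et al. (2009) is not reproduced here, and `gamma` is simply passed in.

```python
import numpy as np

def apbch_loss(x, f, lb, ub, gamma):
    """Sketch of the AP-BCH penalized loss for box constraints: the objective is
    evaluated only at the clipped point x_feas (Darwinian repair), and a quadratic
    penalty weighted by the adaptive coefficients gamma is added."""
    x_feas = np.clip(x, lb, ub)
    return f(x_feas) + np.sum(gamma * (x - x_feas) ** 2)
```
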
The adaptive augmented Lagrangian constraint handling (AL) method (Arnold and Porter, 2015; Atamna et al., 2016) adapts the augmented Lagrangian
where γ_j ∈ R is a Lagrange factor, ω_j ∈ R_+ is a penalty coefficient, and both are adapted during the optimization process. AL was initially proposed for the (1+1)-ES (Arnold and Porter, 2015), and was extended to the CMA-ES in the single-constraint case (Atamna et al., 2016), where the median success rule was applied for the step-size adaptation. The AL is designed for implicit constraints, where an implicit constraint means that the evaluation is expensive or is computed at the same time as the objective function. It requires the objective function to be defined in the infeasible domain.

2.2.3 Ranking-Based Methods

Ranking-based CHTs aggregate the rankings of the objective function values and constraint function values to create the final rankings of the candidate solutions, instead of aggregating the function values. The stochastic ranking technique (Runarsson and Yao, 2000) attempts to balance the objective and constraint functions by sorting the candidate solutions according to the objective function with a probability Pf, and the sum of the constraint function values with a probability 1-Pf. The multiple constraint ranking (MCR) technique (de Paula Garcia et al., 2017) ranks the candidate solutions according to the sum of the rankings of the objective values and rankings of each constraint violation value. Other techniques included in ranking-based CHTs such as the lexicographic ordering and the ɛ-lexicographic ordering have been proposed in GA and DE, and imported to ES (Oyman et al., 1999; Hellwig and Beyer, 2018). The advantage of these approaches over penalty function-based approaches is that they are invariant to strictly increasing transformation of the objective function and constraint functions, which we later refer to as element-wise increasing transformation. Therefore, practitioners do not need to tune the balance between the objective function values and constraint violation values manually. As with the AL, however, this method requires the objective function to be defined in the infeasible domain.

2.2.4 Active Constraint Handling

There are certain CHTs that modify the covariance matrix adaptation mechanism to actively shrink the variance in the direction of the constraint function gradient so as to decrease the likelihood of infeasible solutions. We refer to such methods as active constraint handling (ACH) in this article. Arnold and Hansen (2012) proposed the use of the active covariance matrix update in the (1+1)-CMA-ES. Similar concepts have been employed in other variants of the CMA-ES, such as the (μ,λ)-CMA-ES in Chocat et al. (2015), MA-ES in Spettel and Beyer (2019), and xCMA-ES in Krause and Glasmachers (2015). As ACH techniques use only binary information, namely whether or not the constraint is violated, they are invariant to monotone transformations of the objective and constraint violations. Such methods can be applied to problems in which the constraints only return the outcome that the given solution is either feasible or not. However, this approach tends to be less efficient than other CHTs on quantifiable constraints, as it does not utilize the amount of constraint violation.

2.2.5 Other Explicit CHTs

The linear constraint covariance matrix self-adaptation ES (lcCMSA-ES) (Spettel et al., 2019) handles explicit and linear constraints in a variant of the CMA-ES, known as CMSA-ES. It basically samples only feasible solutions, and updates the distribution parameters by using the feasible solutions. Active-set ES (Arnold, 2016, 2017) is also designed for explicit, but not necessarily linear, constraints. This approach is the most relevant one from the perspective of the assumptions on the constrained problem. It is a (1+1)-ES based approach that applies a repair operator in the Lamarckian manner; that is, the repaired solution is used as the candidate solution, and not only to compute the loss function value. Unfortunately, it is not possible to directly extend it to the state-of-the-art variant of the ES, namely the (μ, λ)-CMA-ES.

2.3 Formal Classification of CHTs

We summarize the formal classification of the abovementioned CHTs in Table 2. The taxonomy proposed in Le Digabel and Wild (2015) classifies constraints as follows: Quantifiable/Nonquantifiable, Relaxable/Unrelaxable, Simulation-based/A priori, and Known/Hidden. A Quantifiable constraint is a constraint for which the amount of feasibility and/or violation can be quantified, while a Nonquantifiable constraint returns a binary output indicating whether or not the solution satisfies the constraint. A Relaxable constraint is a constraint that does not need to be satisfied to compute the objective function, while an Unrelaxable constraint is a prerequisite for executing the simulation for the objective function computation. An A priori constraint is a constraint for which the feasibility can be confirmed without running a simulation; that is, this constraint can be formulated using optimization variables, such as g(x) = Σ_{i=1}^{n} [x]_i ≤ 1, while a Simulation-based constraint is only computed through a computationally expensive simulation. A Known constraint is a constraint that is explicitly provided in the problem formulation; for example, min f(x) s.t. g(x) ≤ 0. Constrained problems can be expressed by combining these initial letters as an acronym, such as QRSK. Refer to Le Digabel and Wild (2015) for further descriptions of each type of constraint and example situations. Our assumption on the constraints is QUAK in this terminology.

Table 2:

Classification of CHTs, where bnds/lc/nlc means that the CHT can handle bound constraints, linear constraints, or nonlinear constraints, respectively.

CHT (bnds/lc/nlc) | Taxonomy | Invariance (increasing / affine)
ARCH [proposed CHT, described in Section 4] (nlc) | QUAK | increasing / affine
AP-BCH (Hansen et al., 2009) (bnds) | QUAK | × / ×
lcCMSA-ES (Spettel et al., 2019) (lc) | QUAK | increasing / ×
Active-set ES (Arnold, 2017) (nlc) | QUAK | increasing / ×
AL (Atamna et al., 2016) (nlc) | QRSK | × / affine
Stochastic ranking (Runarsson and Yao, 2000) (nlc) | QRSK | × / affine
MCR (de Paula Garcia et al., 2017) (nlc) | QRSK | increasing / affine
(1+1)-CMA-ES with ACH (Arnold and Hansen, 2012) (nlc) | NUSK | increasing / affine
(μ,λ)-CMA-ES with ACH (Chocat et al., 2015) (nlc) | NRSK | increasing / affine
xCMA-ES with ACH (Krause and Glasmachers, 2015) (nlc) | NUSK | increasing / affine
MA-ES with ACH (Spettel and Beyer, 2019) (nlc) | NUSK | increasing / affine
Resampling technique (nlc) | NUSH | increasing / affine

CHTs assuming Unrelaxable constraints can be applied to QUAK constraints. For example, the resampling technique can be applied to QUAK constraints. However, CHTs assuming weaker conditions on the constraints utilize less of the information that is actually available to the optimization approach. Therefore, we expect CHTs that assume QUAK constraints to be more efficient for solving QUAK constrained optimization problems. Only two approaches in Table 2, including the proposed approach, are designed for QUAK nonlinear constraints. As it is not clear how the active-set ES can be extended to a variant of the CMA-ES, the proposed approach is the only one applicable to the CMA-ES. However, if the constraints are nonlinear but Relaxable, many of the CHTs listed in Table 2 can be applied.

We describe two invariance properties that are desirable for a CHT that is designed for variants of the CMA-ES.

3.1 Element-Wise Increasing Transformation of Functions

A strictly increasing transformation h: R → R is a function satisfying h(t) < h(s) if t < s. Invariance to an increasing transformation of the objective function in an unconstrained optimization scenario refers to the property whereby the algorithm does not change its behavior (which is possibly characterized by the sequence of the solutions generated by the algorithm) when solving f and its composite h∘f. Algorithms that are invariant to any increasing transformation can solve, for example, a non-convex discontinuous function h∘f as easily as a convex continuous function f. The importance of this invariance property is extensively recognized in engineering optimization: if the search algorithm is not invariant, the objective function needs to be tuned for the search algorithm to perform effectively, which is time consuming. Numerous evolutionary algorithms, including the CMA-ES, are invariant to any increasing transformation of the objective function, because they use only the objective function value rankings of the candidate solutions.

In constrained optimization, we consider invariance to an element-wise increasing transformation H = (h_0, …, h_m): R^{m+1} → R^{m+1} of the objective and constraint functions F = (f, g_1, …, g_m), where h_j: R → R (j = 0, …, m) is an increasing transformation and h_j for j = 1, …, m satisfies h_j(0) = 0. Invariance to an element-wise increasing transformation of the objective and constraint functions refers to the property whereby the algorithm does not change its behavior when solving the constrained problem F' = (h_0∘f, h_1∘g_1, …, h_m∘g_m) = H∘F instead of F. The original constrained optimization problem F and its transformation H∘F model the same optimization problem in the sense that they define the same feasible domain X and the same total order on X regarding the objective function value.3
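
The mechanism behind this invariance for ranking-based algorithms can be checked directly: a strictly increasing h_j preserves pairwise comparisons, so the rankings with respect to each component are unchanged. A small self-contained check for a single component (the objective values and the transformation below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 3))             # 10 candidate solutions in R^3
f = np.sum(x ** 2, axis=1)                   # example objective values
h = lambda t: np.exp(5.0 * t) - 1.0          # a strictly increasing transformation

rank = lambda v: np.argsort(np.argsort(v))   # ranking of each value
assert np.array_equal(rank(f), rank(h(f)))   # rankings are unchanged by h
```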

In real-world applications, the ranges of the objective function values and constraint violations are often quite different. Algorithms without this invariance will suffer from this difference and place implicit priority on the objective or certain constraints depending on their values. Therefore, practitioners may need to determine a reasonable transformation H. As described above for unconstrained optimization cases, this can be time consuming, and requires domain knowledge of the problem and deep insight into the optimization algorithm.

Although this invariance property is a straightforward extension of invariance to the increasing transformation of the objective function and it is seemingly important, it is not exhibited by the frequently used penalty function-based techniques that take the sum of the objective and constraint function values as loss values. For example, it is clear that the AL does not exhibit this invariance: it is not invariant to increasing transformations of f and g_j. Although the AP-BCH is invariant to any increasing transformation of g_j, it is not invariant to an increasing transformation of f in general, as the quadratic penalty term is added directly to the objective function value. It has been demonstrated in Sakamoto and Akimoto (2017) that the AP-BCH deteriorates when the objective function is, for example, an exponential function, where the objective function value is more sensitive than the quadratic penalty term.

3.2 Affine Transformation of Search Space Coordinates

To formulate the invariance properties for constrained optimization problems, we consider the fact that our minimization problem is defined on an n-dimensional inner product space (V, ⟨·,·⟩) over the real field R:
\[ \min_{p \in V} f_V(p) \quad \text{subject to} \quad g_j^V(p) \le 0 \ \text{ for } j = 1, \dots, m, \tag{4} \]
where f_V: V → R and g_j^V: V → R are the objective and constraint functions, respectively. The constrained problem (1) is considered a realization of (4) under an orthonormal basis {e_i ∈ V}_{i=1}^{n} and a bias vector e_0 ∈ V, where ⟨e_i, e_i⟩ = 1 for i = 1, …, n and ⟨e_i, e_j⟩ = 0 for any j ≠ i (i, j ≥ 1), and x ∈ R^n corresponds to p = e_0 + Σ_{i=1}^{n} [x]_i e_i ∈ V. Let the map p ↦ x be denoted by ψ_e: V → R^n. If the optimization algorithm is defined on the inner product space, its behavior (which is possibly characterized by the sequence of the solutions generated by the algorithm) is identical on any basis {v_i ∈ V}_{i=1}^{n} with a bias vector v_0 ∈ V, where the basis vectors are not necessarily orthonormal to one another. Let the map p = v_0 + Σ_{i=1}^{n} [x̃]_i v_i ↦ x̃ be denoted by ψ_v: V → R^n. The map A = ψ_e ∘ ψ_v^{-1} from the coordinate system ψ_v to the coordinate system ψ_e is an affine transformation. Invariance to an affine transformation of the search space coordinates refers to the property whereby the algorithm behaves the same on the original coordinate system and its affine-transformed coordinate system.
The objective function f_V(p) = (1/2)⟨p, p⟩ is expressed as f_e(x) = f_V(ψ_e^{-1}(x)) = Σ_{i=1}^{n} [x − ψ_e(e_0)]_i² in the system ψ_e, while it is expressed as f_v(x̃) = f_e(A(x̃)) = Σ_{i=1}^{n} [A x̃ − ψ_e(v_0)]_i² in the system ψ_v. The former is the sphere function (well conditioned and separable), while the latter is a convex quadratic function that is ill conditioned if A is, and is nonseparable if A is not diagonal. If the algorithm is invariant to any affine transformation of the search space coordinates, the algorithm solves the ill-conditioned and nonseparable function f_v as efficiently as it solves the well-conditioned and separable f_e under the corresponding transformation of the initial search distribution. Not all variants of the CMA-ES, including that presented in Section 2.1, are (proven to be) invariant to any affine transformation of the search space coordinate,4 although we empirically observe statistically invariant behaviors under arbitrary affine transformations (see Figure 3). Numerous other evolutionary computation approaches are not invariant to these transformations, and we empirically observe rather different performances depending on the affine transformation properties.
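
To make the coordinate change concrete, the following sketch constructs the two expressions of the same quadratic problem: the sphere f_e in the system ψ_e and its image f_v in the system ψ_v. The particular matrix A and shift x0 are arbitrary examples, not taken from the article.

```python
import numpy as np

n = 5
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)   # example invertible linear part of the affine map
x0 = rng.standard_normal(n)                          # example shift

f_e = lambda x: np.sum((x - x0) ** 2)                # sphere in coordinate system psi_e
f_v = lambda x_tilde: f_e(A @ x_tilde)               # the same problem in coordinate system psi_v

# f_v is a convex quadratic with Hessian 2 A^T A: ill conditioned iff A is,
# and nonseparable iff A^T A is not diagonal.
print(np.linalg.cond(A.T @ A))
```
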
Figure 1: For example, J(x_1) = {1, 2, 3} and A(x_1) = ∅; then x_1 is repaired by using Eq. (6).

Figure 2: Same problem with different coordinate systems.

Figure 3: Median (line) and 25% to 75%-ile range (band) over 100 trials.

In real-world applications, the change in the coordinate system corresponds to the change in the features describing the object to be optimized, or the change in the unit in each feature. In a constrained optimization scenario, the constraint function gj and the objective function f are transformed by the same transformation A, resulting in gjA and fA in the transformed coordinate system, respectively. The linear constraints are again linear in the transformed coordinate system. However, a box constraint will no longer be a box constraint, but rather a set of linear constraints (forming an n-parallelotope shape).

Suppose that the underlying unconstrained optimization algorithm is invariant to any affine transformation of the search space coordinates. As summarized in Table 2, CHTs that only touch the loss function values, such as the resampling technique, ACH, and AL, do not disturb the invariance property of the underlying algorithm, resulting in invariance to the affine transformation of the search space under the constraints (see Atamna et al. (2020) for proof of the affine invariance of AL). However, CHTs for unrelaxable constraints that require a repair operator often lose the invariance property. For example, the AP-BCH does not exhibit this invariance property, as the repair operator used in this approach exploits the fact that the feasible domain is composed of an interval in each coordinate. Repair operators that use the inner product or the distance in Euclidean space are generally affected by the affine transformation of the coordinate system.

We propose an explicit constraint-handling method based on adaptive ranking, known as ARCH. ARCH virtually transforms the constrained optimization problem into an unconstrained problem by defining an adaptive loss function L. The loss L is defined by the weighted sum of the rankings of the objective and constraint violations, denoted as L=RT below. The proposed algorithm is designed to exhibit the invariance properties listed in Section 3, and it does not require the objective function to be well defined in the infeasible domain.

4.1 Repair Operator

To make ARCH applicable to a problem in which the objective function values are not defined in the infeasible domain, we employ a repair operator in the Darwinian manner. Given a candidate solution x, ARCH determines a repaired solution x˜. The repaired solution x˜ is only used for the objective function value computation.

The repair operator Repair(·) is defined as follows: Let S = {x ∈ R^n : g_j(x) ≤ 0, j ∈ [[1, m]]} be the feasible domain. Given a solution x, let J(x) = {j ∈ [[1, m]] : g_j(x) > 0} be the set of indices of the violated constraints. Let A(x) = {y ∈ R^n : g_j(y) = 0, j ∈ J(x)} be the intersection of the violated constraint boundaries (see Figure 1). We introduce the repair operator defined as follows:
\[ \mathrm{Repair}(x) = \operatorname*{argmin}_{y \in A(x) \cap S} \; \|x - y\|_{\Sigma^{-1}}^2 \qquad \text{if } A(x) \cap S \neq \emptyset, \tag{5} \]
\[ \mathrm{Repair}(x) = \operatorname*{argmin}_{y \in S} \; \|x - y\|_{\Sigma^{-1}}^2 \qquad \text{otherwise,} \tag{6} \]
where ‖x − y‖²_{Σ^{-1}} = (x − y)^T Σ^{-1} (x − y) is the squared Mahalanobis distance between x and y under the inverse matrix Σ^{-1} = (σ²C)^{-1}. This is a constrained minimization of a quadratic function, which is solved by standard numerical optimization routines. If the constraints are all linear, the problem reduces to a quadratic programming problem. Equation (6) (see Figure 1) returns the feasible solution nearest to x, while Eq. (5) returns the feasible solution nearest to x under the constraint that the constraints violated at x remain active at x̃. Eq. (5) is preferred over Eq. (6).
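
A hedged sketch of this repair operation for the general nonlinear case is given below, using SciPy's SLSQP solver to minimize the squared Mahalanobis distance; when `keep_active` is set, the equality constraints of Eq. (5) are added, otherwise the projection of Eq. (6) is computed. For purely linear constraints, a dedicated quadratic programming solver would be the natural choice instead.

```python
import numpy as np
from scipy.optimize import minimize

def repair(x, constraints, Sigma_inv, keep_active=True):
    """Sketch of Repair(x): minimize the squared Mahalanobis distance to x
    subject to g_j(y) <= 0 for all j and, if keep_active is True, g_j(y) = 0
    for the constraints violated at x (Eq. (5)); otherwise only Eq. (6)."""
    violated = [j for j, g in enumerate(constraints) if g(x) > 0]
    if not violated:
        return x                                     # x is already feasible
    dist2 = lambda y: (y - x) @ Sigma_inv @ (y - x)
    cons = [{'type': 'ineq', 'fun': (lambda y, g=g: -g(y))} for g in constraints]
    if keep_active:
        cons += [{'type': 'eq', 'fun': (lambda y, g=constraints[j]: g(y))} for j in violated]
    res = minimize(dist2, x, method='SLSQP', constraints=cons)
    return res.x
```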

The condition A(x) ∩ S ≠ ∅ is indeed simplified to A(x) ≠ ∅ when the constraints are all linear and not redundant. Here, a redundant constraint g_j is defined as a constraint whose boundary is never in contact with the feasible domain; that is, {x ∈ R^n : g_j(x) = 0} ∩ S = ∅. Note that determining whether A(x) ≠ ∅ is as easy as checking whether a system of linear equations has a solution. In explicit linear constraint situations, such a redundant constraint can be removed in advance.
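
For linear constraints g_j(y) = a_j^T y − b_j, checking whether A(x) ≠ ∅ amounts to checking whether the linear system a_j^T y = b_j (j ∈ J(x)) is consistent, for example by comparing matrix ranks; a minimal sketch under that assumption:

```python
import numpy as np

def intersection_nonempty(A_violated, b_violated):
    """Check whether {y : a_j^T y = b_j for all violated j} is nonempty, i.e.,
    whether the linear system is consistent (rank(A) == rank([A | b]))."""
    A_aug = np.column_stack([A_violated, b_violated])
    return np.linalg.matrix_rank(A_violated) == np.linalg.matrix_rank(A_aug)
```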

In this approach, the Mahalanobis distance is employed rather than the Euclidean distance ‖x − y‖² = (x − y)^T (x − y), in order to achieve invariance to affine transformations of the search space coordinate system. This point is discussed formally in Section 5.

4.2 Total Ranking

The total ranking R_T(x_k) for k ∈ {1, …, λ} at each iteration t ∈ {0, 1, …} is a weighted sum of the rankings of the candidate solutions based on the objective function values, R_f(x_k), and based on the Mahalanobis distance to the feasible domain, R_g(x_k), namely:
\[ R_T(x_k) = R_f(x_k) + \alpha^{(t+1)} R_g(x_k), \tag{7} \]
where α(t+1) is known as the ranking coefficient that controls the balance between the objective and constraints. The rankings Rf(xk) and Rg(xk) are defined as follows:
The f-ranking, Rf(xk), is the number of better candidate solutions in terms of f, plus the number of tie candidate solutions divided by 2, namely
\[ R_f(x_k) = \sum_{i=1}^{\lambda} 1\{ f(\tilde{x}_i) < f(\tilde{x}_k) \} + \frac{1}{2} \sum_{i=1}^{\lambda} 1\{ f(\tilde{x}_i) = f(\tilde{x}_k) \}, \tag{8} \]
where x̃_i = Repair(x_i). Note that the sum of R_f(x_k) for k ∈ {1, …, λ} is λ²/2. Moreover, in an unconstrained optimization scenario, the probability of sampling tie solutions is often zero, while the second term above may be nonzero with a nonzero probability in our situation owing to the repair operation.
The g-ranking, R_g(x_k), is defined analogously, by simply replacing f∘Repair with g_Σ:
\[ R_g(x_k) = \sum_{i=1}^{\lambda} 1\{ g_\Sigma(x_i) < g_\Sigma(x_k) \} + \frac{1}{2} \sum_{i=1}^{\lambda} 1\{ g_\Sigma(x_i) = g_\Sigma(x_k) \}, \tag{9} \]
where g_Σ(x) = ‖x − x̃‖²_{Σ^{-1}} and Σ = (σ^{(t)})² C^{(t)}.
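
A minimal sketch of the f-ranking (8), the g-ranking (9), and the total ranking (7), assuming the objective values of the repaired solutions and the penalty values g_Σ have already been computed. Ties contribute 1/2, so each ranking sums to λ²/2.

```python
import numpy as np

def ranking(values):
    """R(v_k) = #{i : v_i < v_k} + 0.5 * #{i : v_i = v_k} (the tie count includes k itself)."""
    v = np.asarray(values)
    return np.array([np.sum(v < vk) + 0.5 * np.sum(v == vk) for vk in v])

def total_ranking(f_repaired, g_penalty, alpha):
    """R_T = R_f + alpha * R_g, with R_f computed on f evaluated at the repaired
    solutions and R_g on the Mahalanobis penalties g_Sigma."""
    return ranking(f_repaired) + alpha * ranking(g_penalty)
```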

4.3 Adaptation of Ranking Coefficient

The ranking coefficient α controls the balance between Rf and Rg. If α is too large, the search distribution is biased toward the feasible domain. If α is too small, the search distribution is biased toward the infeasible domain. Therefore, the adaptation of α significantly influences the search performance.

The adaptation of α is based on a theoretical study of the weighted recombination ES on a spherical function. It was reported in Akimoto et al. (2020) that, in the optimal situation in terms of quality gain (Beyer, 2001)
(10)
where m_sph and x*_sph are the mean vector and optimum on a spherical function, respectively; σ̄* is the optimal normalized step size, which is approximated by σ̂; c = −Σ_{i=1}^{μ} w_i E[N_{i:λ}] is the weighted average of the expected values of the normal order statistics from λ samples and is usually in O(1); and μ_eff = (Σ_{i=1}^{μ} w_i²)^{-1} is the variance effective selection mass.

Ideally, the CMA-ES with the proposed constraint handling should treat a constrained sphere function as though it is an unconstrained sphere function. That is, we wish (10) to hold even for a constrained sphere problem.

Assuming that the optimum of a constrained n-dimensional sphere problem having n constraints is located on the n boundaries, we estimate the left-hand side of (10) using the Mahalanobis distance between m violating n constraints and m˜=Repair(m):
(11)
We define d_m^{(t+1)} using the parameters at iteration t (m = m^{(t)}, Σ^{-1} = ((σ^{(t)})² C^{(t)})^{-1}), as follows:
(12)
where c_act = |{j : g_j(m̃) = 0}| is the number of active constraints at the repaired mean m̃. From the perspective of (10) and (11), we wish to maintain d_m ≈ 1 if the denominator is n². However, in (12), we replace n² with n(n/2 + c_act), leading to a variation in the denominator in [n²/2, 3n²/2]. The motivation is to incorporate the fact that the numerator is expected to be smaller if c_act is smaller, as the projection is performed only onto this c_act-dimensional subspace. Moreover, we introduce the exponential term to prevent the search distribution from being unnecessarily biased toward the boundary if the population size λ is larger than the default λ_def. If λ ∈ Ω(n), σ̂ will be in O(n), while σ̂ ∈ O(μ_w) if λ ≪ n. Then, to maintain d_m ≈ 1, m needs to be closer to m̃ as λ is greater. That is, a larger λ results in the search distribution being more biased toward the feasible domain, and less efficient search performance. To mitigate this problem, we allow m to be up to e times further away from m̃ by introducing the exponential term when λ is greater than λ_def. We adapt α so that d_m will remain approximately 1, as follows:
Ranking Coefficient Adaptation Initialize α(0)=1, dm(0)=0. At iteration t, α(t) is updated as
(13)
only if sgn(dm(t+1)-1)=sgn(dm(t+1)-dm(t)) or dm(t+1)=0, the latter of which is necessary to decrease α when the mean vector remains in the feasible domain; that is, dm(t)=dm(t+1)=0. Following the update, α(t+1) is clipped to [1/λ,λ], because α<1/λ and α>λ result in Rg and Rf being ignored, respectively.5
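
The exact update rule (13) is not reproduced above; the sketch below only illustrates the trigger condition and the clipping described in the text, and assumes a simple multiplicative update with a hypothetical rate `c_alpha`, which is not the rule used in ARCH.

```python
import numpy as np

def update_alpha(alpha, dm_new, dm_old, lam, c_alpha=0.1):
    """Hedged sketch of the ranking-coefficient adaptation (Section 4.3).

    The trigger condition and the clipping range [1/lam, lam] follow the text;
    the exponential form and the rate c_alpha are assumptions, not Eq. (13)."""
    if np.sign(dm_new - 1.0) == np.sign(dm_new - dm_old) or dm_new == 0.0:
        # Assumed multiplicative update: increase alpha when dm > 1 (the mean is
        # too far outside the feasible domain), decrease it when dm < 1.
        alpha = alpha * np.exp(c_alpha * (dm_new - 1.0))
    # Clip so that neither R_g (alpha < 1/lam) nor R_f (alpha > lam) is ignored.
    return float(np.clip(alpha, 1.0 / lam, lam))
```
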
We prove that ARCH is invariant to the problem transformations described in Section 3. ARCH receives the candidate solutions (x_k)_{k=1}^{λ} and the distribution parameters m and Σ = σ²C from the CMA-ES, and returns the total rankings (R_T(x_k))_{k=1}^{λ} of the candidate solutions to the CMA-ES. Meanwhile, it maintains the penalty coefficient α and retains d_m for the next update. Therefore, the functionality of ARCH can be formulated as
\[ \big( (R_T(x_k))_{k=1}^{\lambda}, \; \alpha^{(t+1)}, \; d_m^{(t+1)} \big) = \mathrm{ARCH}\big( (x_k)_{k=1}^{\lambda}, \; m, \; \Sigma^{-1}, \; \alpha^{(t)}, \; d_m^{(t)}, \; F \big), \tag{14} \]
where F: x ↦ (f(x), g_1(x), …, g_m(x)) is the vector-valued function consisting of the objective and constraints. In the following, we prove that the outputs of (14) remain unchanged by the abovementioned transformations.

5.1 ARCH on Inner Product Space

We begin by defining ARCH on an inner product space (V, ⟨·,·⟩).

A random vector X ∈ V on the inner product space (V, ⟨·,·⟩) has a normal distribution N(μ_V, Σ_V) if ⟨v, X⟩ has the normal distribution N(⟨v, μ_V⟩, ⟨v, Σ_V(v)⟩) for any v ∈ V, where μ_V ∈ V is the mean vector and Σ_V: V → V is the covariance (Eaton, 2007). Let F_V = (f_V, g_1^V, …, g_m^V) be the objective and constraint functions defined on V as in (4). ARCH on the inner product space receives these distribution parameters μ_V and Σ_V, the candidate solutions p_1, …, p_λ drawn from the normal distribution, and the objective and constraint functions F_V.

Let p and q be vectors in V. The Mahalanobis distance between p and q under the covariance Σ_V is defined as
\[ \| p - q \|_{\Sigma_V^{-1}}^2 = \langle p - q, \; \Sigma_V^{-1}(p - q) \rangle. \]
Let S_V = {p ∈ V : g_j^V(p) ≤ 0, j ∈ [[1, m]]} be the feasible domain. Given a solution p ∈ V, let J_V(p) = {j ∈ [[1, m]] : g_j^V(p) > 0} be the set of indices of the unsatisfied constraints, and let A_V(p) = {q ∈ V : g_j^V(q) = 0, j ∈ J_V(p)} be the intersection of the violated constraint boundaries. The repair operation is defined as follows:
\[ \mathrm{Repair}_V(p) = \operatorname*{argmin}_{q \in A_V(p) \cap S_V} \|p - q\|_{\Sigma_V^{-1}}^2 \ \ \text{if } A_V(p) \cap S_V \neq \emptyset, \qquad \mathrm{Repair}_V(p) = \operatorname*{argmin}_{q \in S_V} \|p - q\|_{\Sigma_V^{-1}}^2 \ \ \text{otherwise.} \tag{15} \]
Let d_m and α be computed as in (12) and (13), where m, m̃ = Repair(m), Σ^{-1}, and c_act are replaced with μ_V, μ̃_V = Repair_V(μ_V), Σ_V^{-1}, and c_act^V = |{j : g_j^V(μ̃_V) = 0}|, respectively. Analogously to Eqs. (7) to (9), we compute
\[ R_T(p_k) = R_f(p_k) + \alpha^{(t+1)} R_g(p_k), \tag{16} \]
where
\[ R_f(p_k) = \sum_{i=1}^{\lambda} 1\{ f_V(\tilde{p}_i) < f_V(\tilde{p}_k) \} + \frac{1}{2} \sum_{i=1}^{\lambda} 1\{ f_V(\tilde{p}_i) = f_V(\tilde{p}_k) \}, \tag{17} \]
\[ R_g(p_k) = \sum_{i=1}^{\lambda} 1\{ g_\Sigma^V(p_i) < g_\Sigma^V(p_k) \} + \frac{1}{2} \sum_{i=1}^{\lambda} 1\{ g_\Sigma^V(p_i) = g_\Sigma^V(p_k) \}, \tag{18} \]
in which g_Σ^V(p) = ‖p − p̃‖²_{Σ_V^{-1}} and p̃ = Repair_V(p).
The operation of ARCH on the inner product space (V, ⟨·,·⟩) is then expressed as
\[ \big( (R_T(p_k))_{k=1}^{\lambda}, \; \alpha^{(t+1)}, \; d_m^{(t+1)} \big) = \mathrm{ARCH}_V\big( (p_k)_{k=1}^{\lambda}, \; \mu_V, \; \Sigma_V^{-1}, \; \alpha^{(t)}, \; d_m^{(t)}, \; F_V \big). \tag{19} \]
Note that ARCHV is defined without relying on any coordinate system. This implies that ARCHV is invariant to any coordinate system transformation.

5.2 Invariance to Affine Transformation of Search Space Coordinates

Firstly, we demonstrate that ARCH is invariant to an arbitrary affine transformation of the search space coordinate system. For this purpose, it is sufficient to prove that the operation of ARCH (14) on any given coordinate system x = ψ(p) is equivalent to the operation of ARCH (19) on the inner product space (V, ⟨·,·⟩).

Note that the normal distribution N(μ_V, Σ_V) on V corresponds to the normal distribution N(μ_ψ, Σ_ψ) on R^n in the coordinate system ψ: V → R^n, where μ_ψ = ψ(μ_V) and [Σ_ψ]_{i,j} = ⟨e_i, Σ_V(e_j)⟩. The objective and constraint function F: R^n → R^{m+1} corresponding to F_V: V → R^{m+1} is expressed as F = F_V ∘ ψ^{-1}.

Theorem 1.
Let ψ: V → R^n be an arbitrary coordinate system. Let μ_V ∈ V be an arbitrary vector and Σ_V: V → V be an arbitrary positive definite symmetric linear transformation. Let μ_ψ = ψ(μ_V), [Σ_ψ]_{i,j} = ⟨e_i, Σ_V(e_j)⟩, and F = F_V ∘ ψ^{-1}, and assume that the feasible domain S is convex. Then, for any p_k ∈ V for k = 1, …, λ, α > 0, and d_m > 0,
\[ \mathrm{ARCH}\big( (\psi(p_k))_{k=1}^{\lambda}, \; \mu_\psi, \; \Sigma_\psi^{-1}, \; \alpha, \; d_m, \; F \big) = \mathrm{ARCH}_V\big( (p_k)_{k=1}^{\lambda}, \; \mu_V, \; \Sigma_V^{-1}, \; \alpha, \; d_m, \; F_V \big). \]

5.3 Invariance to Element-Wise Increasing Transformation

Next, we demonstrate that ARCH is invariant to an arbitrary element-wise increasing transformation. That is, the ranking and internal parameter updates are not affected by the transformation.

Theorem 2.
Let H = (h_0, h_1, …, h_m) be an arbitrary element-wise increasing transformation, and assume that the feasible domain S is convex. For any x_k ∈ R^n for k = 1, …, λ, α > 0, and d_m > 0,
\[ \mathrm{ARCH}\big( (x_k)_{k=1}^{\lambda}, \; m, \; \Sigma^{-1}, \; \alpha, \; d_m, \; H \circ F \big) = \mathrm{ARCH}\big( (x_k)_{k=1}^{\lambda}, \; m, \; \Sigma^{-1}, \; \alpha, \; d_m, \; F \big). \]

5.4 Invariance Properties of Entire Algorithm

Finally, we discuss the invariance properties of the search algorithm including ARCH.

Suppose that the underlying unconstrained optimization algorithm is defined on an inner product space, and the following steps are repeated:

  1. p_1, …, p_λ = sample(μ_V, Σ_V, θ_V);

  2. R_f(p_1), …, R_f(p_λ) = evaluate((p_k)_{k=1}^{λ}, f_V); and

  3. μ_V, Σ_V, θ_V = update((p_k, R_f(p_k))_{k=1}^{λ}, μ_V, Σ_V, θ_V),

where θ_V contains all of the information used in the algorithm. By nature, it is invariant to any strictly increasing transformation of the objective function f_V, as the outputs of evaluate are the rankings of the objective function values, and it is invariant to any strictly increasing transformation h. Moreover, as it is defined independently of a coordinate system, its implementation under a given coordinate system ψ produces ψ(p_1), …, ψ(p_λ), μ_ψ, Σ_ψ, θ_ψ at any iteration t, where p_1, …, p_λ are the candidate solutions mentioned above, while μ_ψ, Σ_ψ, and θ_ψ are the expressions of μ_V, Σ_V, and θ_V, respectively, in the coordinate system ψ.

When solving an explicitly constrained minimization problem FV, we simply replace the evaluation step as follows:

  1. p_1, …, p_λ = sample(μ_V, Σ_V, θ_V);

  2. R_T(p_1), …, R_T(p_λ), α, d_m = ARCH_V((p_k)_{k=1}^{λ}, μ_V, Σ_V^{-1}, α, d_m, F_V); and

  3. μ_V, Σ_V, θ_V = update((p_k, R_T(p_k))_{k=1}^{λ}, μ_V, Σ_V, θ_V).

As all of the operations are defined independently from a coordinate system, the entire algorithm is invariant to any affine coordinate system transformation. Moreover, because ARCHV is invariant to any element-wise increasing transformation H, and the underlying algorithm only relies on the rankings of the candidate solutions, all of the operations are invariant to H.

ARCH can be combined with a search algorithm that is not generalized to an inner product space. For example, our baseline CMA-ES, which is described in Section 2.1, does not generalize to an inner product space (as pσ is coordinate dependent), although we empirically observe quite uniform behaviors under different coordinate systems, as we observe in Section 6. ARCH does not disturb the uniform behavior, as ARCH itself is invariant to any affine coordinate transformation.

Our first numerical experiments are aimed at demonstrating the effects of the invariance to affine coordinate transformations of ARCH. We develop a set of linearly constrained test problems. We compare ARCH with other CHTs for the CMA-ES, demonstrating the manner in which affine coordinate transformations affect the performances of CHTs that are not invariant thereto, while ARCH performs equally effectively under different transformations. Moreover, we compare ARCH with CHTs that are specialized for box constraints. Note that box CHTs are expected to be more efficient for box-constrained optimization problems than other CHTs for general linear constraints. We observe that ARCH is competitive with and sometimes outperforms these, even on box-constrained problems.

6.1 Linearly Constrained Quadratic Minimization Problems

We consider a linearly constrained problem (P0), defined in the n-dimensional inner product space (V, ⟨·,·⟩) over the real field R as in (4), where the constraints are defined as g_i(p) = −(⟨v_i, p⟩ − [LB]_i) and g_{n+i}(p) = ⟨v_i, p⟩ − [UB]_i, where v_i ∈ V is the normal vector of the ith constraint and these are orthogonal to one another, that is, ⟨v_i, v_j⟩ = δ_{i,j}, and [LB]_i, [UB]_i ∈ R are the lower and upper bounds, respectively.

We consider the coordinate system ψ: V → R^n with {v_i} as the basis vectors. That is, ψ: p = Σ_{i=1}^{n} [x]_i v_i ↦ x. On this coordinate system, (P0) can be expressed as the box-constrained minimization problem (P1), as follows:
\[ \min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g^{(P1)}(x) = A x - b \le 0, \tag{20} \]
where A = [−I_n, I_n]^T and b = [−LB^T, UB^T]^T, and g^{(P1)} = (g_1, …, g_m) is a vector form of the m = 2n constraint functions. We define an initial mean vector and initial covariance matrix as m^{(0)} and C^{(0)}, respectively, on this coordinate system.
By taking another basis {wi}, the general linearly constrained optimization problem (P2) can be defined as
\[ \min_{y \in \mathbb{R}^n} f(P y) \quad \text{subject to} \quad g^{(P1)}(P y) = A P y - b \le 0. \tag{21} \]
On this coordinate system, the coordinate vector y ∈ R^n is transformed as x = P y using a basis-transformation matrix P, which transforms {v_i} into {w_i}. The initial mean vector and covariance matrix are transformed as P^{-1} m^{(0)} and P^{-1} C^{(0)} (P^{-1})^T, respectively.

These optimization problems (P1) and (P2) are equivalent to (P0). However, (P1) has a box constraint, while (P2) has a set of linear constraints. Algorithms that are invariant to any affine transformation of the search space must perform equivalently on these problems. We use these problems to assess the invariance properties.

6.2 Settings

For the test problem (P1) in (20), we use three objective functions: the sphere function f_sph(x) = Σ_{i=1}^{n} [x]_i², the ellipsoid function f_ell(x) = Σ_{i=1}^{n} 10^{6(i−1)/(n−1)} [x]_i², and the rotated ellipsoid function f_rotell(x) = f_ell(Q_θ x), where Q_θ ∈ R^{n×n} is a block diagonal matrix such that each block is a 2×2 orthogonal (that is, rotation) matrix and all blocks share the same matrix. We let the counter-clockwise rotation angle of each block be θ = π/6 in this experiment.
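
Written out as code for reference, with the block-diagonal rotation matrix Q_θ constructed from identical 2×2 rotation blocks as described above (n is assumed even):

```python
import numpy as np

def block_rotation(n, theta):
    """Block-diagonal matrix with identical 2x2 counter-clockwise rotation blocks."""
    c, s = np.cos(theta), np.sin(theta)
    Q = np.zeros((n, n))
    for i in range(0, n, 2):
        Q[i:i + 2, i:i + 2] = [[c, -s], [s, c]]
    return Q

f_sph = lambda x: np.sum(x ** 2)
f_ell = lambda x: np.sum(10.0 ** (6.0 * np.arange(len(x)) / (len(x) - 1)) * x ** 2)
f_rotell = lambda x, Q: f_ell(Q @ x)   # with Q = block_rotation(n, np.pi / 6)
```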

Problem (P1) is a box-constrained problem. The lower and upper bounds are set as LB = [−1, 1, …, −1, 1]^T and UB = LB + [5, …, 5]^T, respectively. The optimal solution, which is obtained from the Karush–Kuhn–Tucker (KKT) condition, is located at x* = [0, 1, …, 0, 1]^T for f_sph and f_ell, and x* ≈ [0.37, 1, …, 0.37, 1]^T for f_rotell. That is, the even-numbered coordinates of the optimum are on the boundary, while the others are not. The number of active constraints at the optimum is n/2.

We use the following two matrices for the transformation matrix P in (21): P_rot = Q_{θ'} and P_illrot = Q_{θ'}^T D Q_{θ'} with θ' = π/4, where Q_{θ'} is defined as above with θ replaced by θ', and D = diag(1, 10, …, 1, 10) is a diagonal matrix. The transformation matrix P_rot only rotates the search space, while P_illrot transforms the rectangular feasible domain into an n-parallelotope shape. Hereunder, problem (P1) is denoted by Box, the linearly constrained problem (P2) using P_rot is denoted by rotBox, and (P2) using P_illrot is denoted by illrotBox. Their feasible domains and the level sets of the objective function are illustrated in Figure 2.
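
A sketch of how the rotBox and illrotBox instances can be generated from the box-constrained instance, following the transformations above; this is an illustration, not the original benchmark code.

```python
import numpy as np
from scipy.linalg import block_diag

n = 20
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
Q4 = block_diag(*[np.array([[c, -s], [s, c]])] * (n // 2))   # Q_{theta'} with theta' = pi/4
D = np.diag(np.tile([1.0, 10.0], n // 2))                    # D = diag(1, 10, ..., 1, 10)

P_rot = Q4                       # rotBox: pure rotation of the search space
P_illrot = Q4.T @ D @ Q4         # illrotBox: the rectangle becomes an n-parallelotope

# (P2) in the y coordinate system: minimize f(P y) subject to A P y - b <= 0,
# with the initial distribution transformed as m0 -> P^{-1} m0, C0 -> P^{-1} C0 P^{-T}.
lb = np.tile([-1.0, 1.0], n // 2)
ub = lb + 5.0
A = np.vstack([-np.eye(n), np.eye(n)])
b = np.concatenate([-lb, ub])
P = P_illrot
g_y = lambda y: A @ (P @ y) - b  # vector of the 2n linear constraint values at y
```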

For the box-constrained optimization problem (P1), the initial mean vector is m^{(0)} = (UB + LB)/2 + U(−1, 1)^n, and the initial covariance matrix is C^{(0)} = I_n. For the linearly constrained optimization problem (P2), m^{(0)} and C^{(0)} are transformed in the manner described in Section 6.1. The search space dimension is n ∈ {20, 50}, the initial step size is σ^{(0)} = (1/n) Σ_{i=1}^{n} ([UB]_i − [LB]_i)/4 = 1.25, and the other parameters are set to their default values, as defined in Table 1.

6.3 Results and Discussion

Figure 3 presents the performance of ARCH, the resampling technique with a maximum resampling number of 500, and AP-BCH (Hansen et al., 2009) on f_sph, f_ell, and f_rotell under three different coordinate systems (Box, rotBox, and illrotBox) on n = 20 and n = 50 dimensions. As AP-BCH is a box CHT, the results are illustrated only for Box. The optimization progress is measured by the Mahalanobis distance ‖m − x*‖²_H = (m − x*)^T H (m − x*) between the mean vector and the optimal solution, given the Hessian matrix H ∈ R^{n×n} of the objective function. Figure 4 presents the results of typical runs of the CMA-ES on unconstrained problems, resampling, AP-BCH, and ARCH on f_sph and f_ell under the Box constraint on n = 20 dimensions. For AP-BCH and ARCH, the ratio of the number of constraints satisfied by the mean vector m,
\[ r_{\mathrm{feas}} = \frac{1}{m} \left| \{ j \in [[1, m]] : g_j(m) \le 0 \} \right|, \tag{22} \]
is illustrated for discussion.
Figure 4: Typical single runs of the CMA-ES (first row) on unconstrained f_sph and f_ell, resampling (second row), AP-BCH (third row), and ARCH (fourth row) on f_sph and f_ell with Box. The figures indicate the Mahalanobis distance ‖m − x*‖²_H between the mean vector m and the optimal solution x*, given the Hessian matrix H of the objective f; the step size σ; the eigenvalues eig(√C) of the square root of C; the coordinates of m; the ratio r_feas of the constraints satisfied by m; and the coefficient α and the parameter d_m used for updating α, versus the number of iterations.

Firstly, we focus on the results of ARCH and the resampling technique in Figure 3. Although the underlying CMA-ES is not mathematically proven to be invariant to affine transformation of the search space, we observe in Figure 3 that the lines of ARCH for Box, rotBox, and illrotBox overlap one another, owing to the invariance of ARCH to the affine transformation of the search space. For resampling as well, we observe the same. This indicates that the performance comparison can be conducted on the most convenient case, namely the Box constraint case. In the following, we focus on analyzing the results on Box, which can be generalized to general cases.

Next, we investigate the behavior of ARCH. We observe in Figure 4 that the parameter d_m is maintained at approximately 1. This is the desired behavior, as the adaptation of α is designed to maintain d_m ≈ 1. Moreover, we can observe from the results that the behavior of the CMA-ES on the unconstrained problem and the behaviors of the CMA-ES with ARCH and with AP-BCH are similar, in that the covariance matrix tends to become proportional to the inverse of the Hessian matrix of the objective function. However, in Figure 3, we observe a difference between these two algorithms in terms of the adaptation speed of the covariance matrix. On f_ell and f_rotell, the CMA-ES with ARCH adapts the covariance matrix significantly faster than that with AP-BCH. By adapting the coefficient α, ARCH appears to reproduce the selection of candidate solutions on an unconstrained problem more closely than AP-BCH.

The resampling technique exhibits different behavior. The eigenvalues of the covariance matrix are divided into two groups, with the smaller values corresponding to the axes along which the constraints are active at the optimum, and the greater values corresponding to the axes along which the constraints are inactive at the optimum. If we do not use any information from the infeasible domain and generate only feasible candidate solutions, making the distribution narrow in the directions of the active constraints is a reasonable strategy. This concept has indeed been employed in Arnold and Hansen (2012). However, this strategy significantly slows the approach of the mean vector to the optimum on the boundary.

The aim of our second numerical experiments is twofold.

Firstly, we demonstrate that ARCH can be applied to nonlinearly as well as linearly constrained problems. To evaluate its efficacy, we compare ARCH with the active-set ES (Arnold, 2017), which can deal with explicit (a priori) and nonlinear constraints, because it is currently the only other CHT designed for such constraints. It is a (1+1)-ES-based approach, and covariance matrix adaptation is not incorporated.

Secondly, we show that, even if the infeasible solutions can be evaluated on the objective, ARCH, which only evaluates the solutions in the feasible domain, exhibits advantages in solving constrained optimization with explicit (a priori) constraints. For this purpose, we compare ARCH with the AL (Atamna et al., 2016) and MCR (de Paula Garcia et al., 2017), which are designed for simulation-based constraints under the relaxed assumption that any infeasible solutions can be evaluated on the objective.

7.1 CEC 2006 testbed

To compare the CHTs, we use the CEC 2006 testbed (Liang et al., 2006), which consists of 24 constrained problems including linear/nonlinear and equality/inequality constraints, where the equality constraints h_j(x) = 0 are transformed into inequalities, as described below Eq. (1), with ε_eq = 10^{-4}. We reuse the CEC 2006 testbed since this testbed is widely used in existing works, which makes it easy to compare algorithms. All optimization variables are bounded as [LB]_i ≤ [x]_i ≤ [UB]_i (i ∈ {1, …, n}). The bound constraints are transformed into a set of linear inequality constraints, as in Eq. (20). Note that ARCH is independent of the manner in which they are transformed, but certain existing CHTs, such as MCR, are dependent on this. Table 3 summarizes the features of the constrained problems. Our implementation of the CEC 2006 testbed is available in the repository (https://github.com/naoking158/ARCH).
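
The preprocessing described above can be sketched as follows: equality constraints are relaxed into inequalities with tolerance ε_eq = 10^{-4}, and the variable bounds are rewritten as 2n linear inequality constraints as in Eq. (20); this is an illustrative sketch, not the repository code.

```python
import numpy as np

EPS_EQ = 1e-4

def equality_to_inequality(h):
    """h(x) = 0 becomes g(x) = |h(x)| - eps_eq <= 0."""
    return lambda x: abs(h(x)) - EPS_EQ

def bounds_to_linear_constraints(lb, ub):
    """Box [LB, UB] rewritten as A x - b <= 0 with A = [-I; I] and b = [-LB; UB]."""
    n = len(lb)
    A = np.vstack([-np.eye(n), np.eye(n)])
    b = np.concatenate([-np.asarray(lb), np.asarray(ub)])
    return lambda x: A @ x - b
```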

Table 3:

Details of 24 constrained problems of CEC 2006 testbed: n is the search space dimension; LI, NI, LE, and NE are the number of linear/nonlinear inequality constraints and linear/nonlinear equality constraints, respectively; and mact is the number of active constraints at the optimum.

Prob. | n | Type of f | LI | NI | LE | NE | m_act
g01 | 13 | quadratic
g02 | 20 | nonlinear
g03 | 10 | polynomial
g04 | | quadratic
g05 | | cubic
g06 | | cubic
g07 | 10 | quadratic
g08 | | nonlinear
g09 | | polynomial
g10 | | linear
g11 | | quadratic
g12 | | quadratic
g13 | | nonlinear
g14 | 10 | nonlinear
g15 | | quadratic
g16 | | nonlinear | 34
g17 | | nonlinear
g18 | | quadratic | 13
g19 | 15 | nonlinear
g20 | 24 | linear | 12 | 16
g21 | | linear
g22 | 22 | linear | 11 | 19
g23 | | linear
g24 | | linear

7.2 ARCH vs Active-Set ES

In this section, we compare ARCH with the active-set ES (Arnold, 2017).

7.2.1 Settings

We follow the experimental setup of Arnold (2017). Note that the test problems used in Arnold (2017), known as the Michalewicz and Schoenauer (1996) test set, are equivalent to the first 11 constrained problems in the CEC 2006 testbed. The initial mean vector m^(0) is sampled from the uniform distribution U[LB, UB] and, if the sampled point is infeasible, projected onto the boundary of the feasible domain by the repair operator defined by each CHT. The initial step size and covariance matrix are set to σ^(0) = 0.2 min_i([UB]_i − [LB]_i) and C^(0) = I_n, respectively. A run is terminated if the algorithm reaches 1200 iterations or locates a feasible candidate solution x with objective value f(x) < f* + ɛ|f*|, where f* is the optimal objective function value reported in Liang et al. (2006), and ɛ ∈ {10^-4, 10^-8} is referred to as the target accuracy. The other parameters of the CMA-ES are set to the default values listed in Table 1.
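The sketch below illustrates this initialization and the success criterion; the variable and function names are ours, and the repair step, which depends on the CHT, is left abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_parameters(lb, ub):
    """Initialization following the setup above: m(0) ~ U[LB, UB] (repair omitted),
    sigma(0) = 0.2 * min_i(UB_i - LB_i), and C(0) = identity."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    m0 = rng.uniform(lb, ub)          # would be repaired by the CHT if infeasible
    sigma0 = 0.2 * np.min(ub - lb)
    C0 = np.eye(lb.size)
    return m0, sigma0, C0

def reached_target(f_x, f_star, eps):
    """Success criterion: a feasible x with f(x) < f* + eps * |f*|."""
    return f_x < f_star + eps * abs(f_star)
```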

7.2.2 Results and Discussion

We conduct 100 runs of each algorithm for each problem. The results are summarized in Table 4. The success rate is defined as the number of runs in which the algorithm can reach the target within 1200 iterations, divided by the total number of runs (100).

Table 4:

Median number of function evaluations (NFES), iterations (# Iterations), and success rate (SR) among 100 independent runs for each problem and each algorithm. The results of the active-set ES were obtained from Arnold (2017).

Target Accuracy | Active-set ES | ARCH
Prob. | ɛ | NFES (# Iterations) | SR | NFES | # Iterations | SR
g01 10-4 30 61% 154 14 96% 
 10-8 30 62% 154 14 96% 
g02 10-4 – 0% – – 0% 
 10-8 – 0% – – 0% 
g03 10-4 463 100% 900 90 100% 
 10-8 863 100% 1720 172 100% 
g04 10-4 22 100% 176 22 100% 
 10-8 24 100% 176 22 100% 
g05 10-4 36 100% 624 78 97% 
 10-8 82 100% 1160 145 41% 
g06 10-4 100% 1 100% 
 10-8 100% 1 100% 
g07 10-4 325 100% 1635 163.5 100% 
 10-8 557 100% 2705 270.5 100% 
g08 10-4 107 38% 510 85 57% 
 10-8 210 44% 636 106 57% 
g09 10-4 307 100% 846 94 100% 
 10-8 582 100% 1620 180 100% 
g10 10-4 117 100% 580 58 100% 
 10-8 236 100% 2985 298.5 100% 
g11 10-4 25 100% 60 10 100% 
 10-8 73 100% 258 43 100% 

Comparing the median number of f-calls required to reach the same accuracy ɛ in Table 4, we observe that the active-set ES achieves the target accuracy with fewer f-calls. However, the median number of iterations is lower for ARCH, as it is a population-based approach. As the number of candidate solutions per iteration is increased, we expect the number of iterations to decrease further. This is an advantage of ARCH when the candidate solutions can be evaluated on f in parallel.

On problems g01 and g08, ARCH exhibits higher success rates than the active-set ES. This is possibly because ARCH is combined with a (μ, λ)-type ES rather than a (1+1)-ES: since these problems have local minima, the non-elitist, population-based search reaches the target at a higher rate than the (1+1)-ES-based active-set ES. Neither algorithm can solve problem g02, because this problem has a strongly multimodal landscape.

Another advantage, which does not appear in this experiment, is that ARCH incorporates covariance matrix adaptation, which is desirable for ill-conditioned and non-separable problems.

7.3 ARCH vs AL and MCR

In this section, we compare ARCH with the AL (Atamna et al., 2016) and MCR (de Paula Garcia et al., 2017) techniques.

7.3.1 Settings

In this experiment, we use all 24 problems in the CEC 2006 testbed. Following the default setup in Liang et al. (2006), we regard each run as successful if a feasible candidate solution x with objective function value f(x) − f* ≤ 10^-4 is located within 5×10^5 f-calls, where f* is the optimal objective value.

Because we consider the constraints to be explicit, it is natural to prepare a set of feasible solutions and begin the optimization process with one feasible solution as the initial search point. The set of feasible solutions is generated as follows. We run the CMA-ES with the following loss function L: R^n → R,
(23)
where g_j^+(x) = max(0, g_j(x)). If the candidate solutions are all feasible, they all receive the same ranking and the distribution parameter update becomes an unbiased random walk. The CMA-ES then tends to produce feasible solutions spread over a connected feasible subset. To obtain diverse feasible solutions, we run the CMA-ES until 10n feasible solutions are generated, and repeat this 50 times with the following initial parameters: σ^(0) = exp((1/n) Σ_{i=1}^n ln(([UB]_i − [LB]_i)/5)), C^(0) = diag((UB − LB)/(5σ^(0)))^2, and m^(0) ~ U[LB, UB].
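A minimal sketch of this feasible-set generation is given below, using the pycma package as a stand-in for the underlying CMA-ES. Since Eq. (23) is not reproduced here, the sketch assumes the loss is the total constraint violation L(x) = Σ_j g_j^+(x); the exact form may differ, and the function name and the omission of the diagonal C^(0) initialization are ours.

```python
import numpy as np
import cma  # pycma, used here only as an illustrative CMA-ES implementation

def generate_feasible_set(gs, lb, ub, n_feasible, seed=0):
    """Collect feasible points by minimizing an (assumed) violation loss
    L(x) = sum_j max(0, g_j(x)); once all samples are feasible, the update
    degenerates to a random walk that spreads over the feasible subset."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    loss = lambda x: sum(max(0.0, g(x)) for g in gs)
    sigma0 = float(np.exp(np.mean(np.log((ub - lb) / 5.0))))  # sigma(0) as in the text
    x0 = rng.uniform(lb, ub)                                   # m(0) ~ U[LB, UB]
    es = cma.CMAEvolutionStrategy(x0, sigma0, {'seed': seed, 'verbose': -9})
    feasible = []
    while len(feasible) < n_feasible and not es.stop():
        xs = es.ask()
        es.tell(xs, [loss(x) for x in xs])
        feasible.extend(x for x in xs if loss(x) == 0.0)
    return feasible[:n_feasible]
```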

Because multimodal functions exist in the CEC 2006 testbed, we use the BIPOP restart strategy (Hansen, 2009), which updates the population size and initial step size at every restart. Instead of randomly sampling an initial mean vector m^(0) from the entire search space, we randomly sample one feasible solution from the feasible set prepared above in each restart. The initial step size and covariance matrix are set to σ^(0) = exp((1/n) Σ_{i=1}^n ln(([UB]_i − [LB]_i)/5)) and C^(0) = diag((UB − LB)/(5σ^(0)))^2, respectively, with the step size rescaled by the BIPOP strategy in each restart. The other parameters of the CMA-ES are set to the default values listed in Table 1, and we follow the restart condition described in Hansen (2009).
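A sketch of the per-restart initialization is shown below; BIPOP's rescaling of σ^(0) and its population-size schedule are omitted, and the function name is ours.

```python
import numpy as np

def restart_initial_parameters(feasible_set, lb, ub, rng):
    """Per-restart initialization: m(0) is drawn from the pre-computed feasible set,
    sigma(0) is the geometric mean of (UB_i - LB_i)/5, and C(0) is diagonal so that
    sigma(0) * sqrt(C(0)_ii) = (UB_i - LB_i)/5.  The BIPOP rescaling of sigma(0)
    is not shown here."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    m0 = feasible_set[rng.integers(len(feasible_set))]
    sigma0 = float(np.exp(np.mean(np.log((ub - lb) / 5.0))))
    C0 = np.diag(((ub - lb) / (5.0 * sigma0)) ** 2)
    return m0, sigma0, C0
```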

7.3.2 Results and Discussion

All results are summarized in Table 5. Statistical significance is tested using the two-sided Mann–Whitney rank test, with a significance level of 5%/m, where m = 42 is the number of tests (Bonferroni correction; see footnote 6).
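The following minimal sketch shows how this test can be carried out with SciPy; the function name and the data layout (per-run f-call counts of two CHTs) are ours.

```python
from scipy.stats import mannwhitneyu

def significantly_different(fcalls_a, fcalls_b, n_tests=42, alpha=0.05):
    """Two-sided Mann-Whitney rank test on the per-run numbers of f-calls of two
    CHTs, at the Bonferroni-corrected level alpha / n_tests (here 5%/42)."""
    _, p_value = mannwhitneyu(fcalls_a, fcalls_b, alternative='two-sided')
    return p_value < alpha / n_tests
```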

Table 5:

Median number of function evaluations (NFES) and success rate (SR) among 25 independent runs for each problem and each algorithm. The subscript of the NFES indicates the average number of restarts for the successful runs. The markers * (AL vs ARCH), + (AL vs MCR), and (MCR vs ARCH) indicate statistical significance according to the two-sided Mann–Whitney rank test with a significance level of 5%/m, where m = 42 is the number of tests (Bonferroni correction).

Prob. | AL: NFES | AL: SR | MCR: NFES | MCR: SR | ARCH: NFES | ARCH: SR
g01 | – | 0% | 320245(4.92) | 100% | 737(0.38) | 100%
g02 | – | 0% | – | 0% | – | 0%
g03 | – | 0% | 10670(0.33) | 100% | 1030(0.75) | 100%
g04 | – | 0% | 7728(0.25) | 100% | 120(0.21) | 100%
g05 | 22056(10.00)+ | 100% | 269447(47.96) | 8% | 737(1.46)* | 100%
g06 | 8981(43.50) | 4% | 1332(0.00)+ | 100% | 90(3.62)* | 100%
g07 | 14415(2.42) | 100% | 19860(0.17) | 100% | 2782(3.88)* | 100%
g08 | 468(0.46) | 100% | 162(0.17) | 100% | 390(3.08) | 100%
g09 | 2079(0.00)+ | 100% | 5328(0.00) | 100% | 1791(2.42) | 100%
g10 | – | 0% | 41780(1.12) | 100% | 2990(0.33) | 100%
g11 | 294(0.00)+ | 100% | 3204(0.88) | 100% | 54(0.17)* | 100%
g12 | 15601(6.08) | 100% | 2870(1.38)+ | 100% | 902(6.21)* | 100%
g13 | 14208(2.12) | 100% | – | 0% | 2686(4.96)* | 100%
g14 | 418341(16.17) | 12% | 16830(0.46)+ | 100% | 2172(2.50)* | 100%
g15 | 2317(0.42)+ | 100% | 72239(11.17) | 100% | 175(0.92)* | 100%
g16 | 5816(0.54) | 100% | 8664(0.46) | 100% | – | 0%
g17 | 29400(5.92) | 100% | – | 0% | 1814(4.17)* | 100%
g18 | 11490(2.75) | 100% | 8810(1.58) | 100% | 3939(6.83)* | 100%
g19 | – | 0% | 127308(1.38) | 100% | 5772(0.08) | 100%
g20 | – | 0% | – | 0% | – | 0%
g21 | – | 0% | – | 0% | – | 0%
g22 | – | 0% | – | 0% | – | 0%
g23 | – | 0% | 96078(34.38) | 4% | 9386(4.62) | 100%
g24 | 7422(3.33) | 100% | 600(0.21)+ | 100% | 54(0.38)* | 100%

The MCR and AL are CHTs that are often used for simulation-based constraints: they do not exploit the fact that the constraints are explicit, but instead assume that f is defined on the infeasible domain. With the exception of problems g08 and g16, ARCH overwhelmingly outperforms the MCR and AL, which do not utilize the fact that g-calls are computationally cheaper than f-calls and therefore perform as many f-calls as g-calls. This indicates that ARCH can efficiently exploit the explicitness of the constraints, even when the objective function is defined on the infeasible domain.

ARCH does not reach the target for problems g02, g16, and g20 to g22. For g02, g20, g21, and g22, no algorithm reaches the target value. Problem g02 has a strongly multimodal landscape, and it is difficult to locate the global optimum among the many local optima. Problems g20 to g22 have many nonlinear equality constraints, and it is difficult to grasp the global landscape without using function values outside of the feasible domain. In particular, no feasible solution has yet been determined for problem g20 (Liang et al., 2006). Problem g16 is the only problem that can be solved by the AL and MCR but not by ARCH: on this problem, the internal optimizer (SLSQP) used in the repair operation fails to locate a repaired point, which may occur when there are too many complex constraints.
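For illustration, the sketch below shows a repair step of the kind referred to above, implemented with SciPy's SLSQP solver; the choice of the Mahalanobis metric for the projection and all names are our assumptions, not the exact repair operator of ARCH.

```python
import numpy as np
from scipy.optimize import minimize

def repair(x, gs, cov_inv):
    """Project an infeasible candidate x onto the feasible set {y : g_j(y) <= 0 for all j}
    by minimizing a (here, Mahalanobis) squared distance to x with SLSQP.
    When SLSQP fails, as observed on g16, no repaired point is returned."""
    dist2 = lambda y: float((y - x) @ cov_inv @ (y - x))
    cons = [{'type': 'ineq', 'fun': lambda y, g=g: -g(y)} for g in gs]  # SLSQP expects fun(y) >= 0
    res = minimize(dist2, x0=x, method='SLSQP', constraints=cons)
    return res.x if res.success else None
```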

A constrained continuous optimization problem has been addressed in this study, in which the constraints are assumed to be explicitly written as mathematical expressions and their evaluation time is negligible compared to that of the objective function. We do not assume that the objective function values are defined outside of the feasible domain. Our proposed CHT for the CMA-ES, known as ARCH, can handle explicit and nonlinear constraints. ARCH is designed to be invariant to any element-wise increasing transformation of the objective and constraint functions, as well as to any affine transformation of the search space coordinates, so as to preserve the invariance properties of the underlying CMA-ES. The invariance properties are mathematically proven and empirically validated. To the best of the authors' knowledge, this is the first approach for explicit constraints that is invariant to these transformations.

Two sets of experiments revealed the effectiveness of ARCH. The first demonstrated that the convergence speed of ARCH is improved over that of BCH, especially on ill-conditioned and nonseparable objective functions, while it is almost the same on the well-conditioned problem. These results can be attributed to the fact that adapting the coefficient α allows for faster adaptation of the covariance matrix C, and the affine invariance allows for a search behavior that resembles that on a well-conditioned function. The second experiment revealed that ARCH is overwhelmingly more efficient than CHTs that do not exploit the explicit constraints. This implies that explicit constraints should be exploited in constraint handling, which ARCH achieves effectively. However, note that the CHTs used in the comparison are limited to those applicable to the CMA-ES; further comparison with state-of-the-art methods is necessary to establish the usefulness of ARCH among existing approaches. On the other hand, the comparison with the active-set (1+1)-ES showed that there is room to further improve the efficiency of ARCH in terms of the number of objective function calls. Incorporating the idea of the active-set ES into the repair operator of ARCH is a possible direction for future work.

ARCH is designed to solve constrained continuous optimization problems in which the constraints are all explicit. Box-constrained optimization problems are one such example, and on these ARCH exhibits superior performance to an existing box-constraint handling technique. However, it is often the case that explicit constraints and simulation-based constraints exist at the same time, where the explicit constraints are prerequisites for a simulator that computes the objective function value and the simulation-based constraint violation values. ARCH is expected to be inefficient for such problems, as it requires a substantial number of constraint evaluations. To deal with the combination of these two types of constraints, one might need to combine an explicit constraint handling such as ARCH with a simulation-based constraint handling such as the multiple constraint ranking (MCR) technique. The invariance properties of ARCH will play an important role in preserving the invariance properties of the entire search algorithm when two CHTs are combined. This is an important direction for future work.

1. Our implementation of ARCH is available in a GitHub repository (https://github.com/naoking158/ARCH).

2. Note that we introduce γ_σ and γ_c, which do not appear in the standard formulation, in order to account for the initialization effect of the evolution paths and to keep the formulas short. See Akimoto and Hansen (2020) for more detail.

3. This transformation is applied after the transformation from equality constraints to inequality constraints described in the introduction.

4. The invariance of the CMA-ES can be proven by using (C^(t))^{-1} p_c instead of p_σ in the step-size update formula of Section 2.1 (note that the meaning changes because what is accumulated is not z). However, the step-size formula using p_σ is generally employed, and no adverse effect of this difference has been observed empirically. Therefore, we also use p_σ in this article.

5. We observed in preliminary experiments that the best result was obtained by using Eq. (13).

6. Since the tests were performed excluding the results in which no runs reached the target, the number of tests was 42, not 24×3 = 72.

This work was supported by the JSPS KAKENHI under Grant Numbers 19H04179 and 19J21892.

Akimoto, Y., Auger, A., and Hansen, N. (2020). Quality gain analysis of the weighted recombination evolution strategy on general convex quadratic functions. Theoretical Computer Science, 832:42–67.

Akimoto, Y., and Hansen, N. (2020). Diagonal acceleration for covariance matrix adaptation evolution strategies. Evolutionary Computation, 28(3):405–435.

Arnold, D. V. (2016). An active-set evolution strategy for optimization with known constraints. In International Conference on Parallel Problem Solving from Nature, pp. 192–202.

Arnold, D. V. (2017). Reconsidering constraint release for active-set evolution strategies. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 665–672.

Arnold, D. V., and Hansen, N. (2012). A (1+1)-CMA-ES for constrained optimisation. In Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation (GECCO), pp. 297–304.

Arnold, D. V., and Porter, J. (2015). Towards an augmented Lagrangian constraint handling approach for the (1+1)-ES. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 249–256.

Atamna, A., Auger, A., and Hansen, N. (2016). Augmented Lagrangian constraint handling for CMA-ES—Case of a single linear constraint. In Proceedings of the 14th International Conference on Parallel Problem Solving from Nature, pp. 181–191.

Atamna, A., Auger, A., and Hansen, N. (2020). On invariance and linear convergence of evolution strategies with augmented Lagrangian constraint handling. Theoretical Computer Science, 832:68–97.

Beyer, H.-G. (2001). The theory of evolution strategies. New York: Springer Science & Business Media.

Chocat, R., Brevault, L., Balesdent, M., and Defoort, S. (2015). Modified covariance matrix adaptation–evolution strategy algorithm for constrained optimization under uncertainty, application to rocket design. International Journal for Simulation and Multidisciplinary Design Optimization, 6:A1.

de Paula Garcia, R., de Lima, B. S. L. P., de Castro Lemonge, A. C., and Jacob, B. P. (2017). A rank-based constraint handling technique for engineering design optimization problems solved by genetic algorithms. Computers & Structures, 187:77–87.

Eaton, M. (2007). Chapter 3: The normal distribution on a vector space. Lecture Notes–Monograph Series, Vol. 53. Institute of Mathematical Statistics.

Hansen, N. (2009). Benchmarking a bi-population CMA-ES on the BBOB-2009 function testbed. In Workshop Proceedings of the GECCO Genetic and Evolutionary Computation Conference, pp. 2389–2395.

Hansen, N. (2016). The CMA evolution strategy: A tutorial. arXiv:1604.00772.

Hansen, N., Muller, S. D., and Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation, 11(1):1–18.

Hansen, N., Niederberger, A. S. P., Guzzella, L., and Koumoutsakos, P. (2009). A method for handling uncertainty in evolutionary optimization with an application to feedback control of combustion. IEEE Transactions on Evolutionary Computation, 13(1):180–197.

Hansen, N., and Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195.

Hansen, N., Ros, R., Mauny, N., Schoenauer, M., and Auger, A. (2011). Impacts of invariance in search: When CMA-ES and PSO face ill-conditioned and non-separable problems. Applied Soft Computing, 11(8):5755–5769.

Hellwig, M., and Beyer, H. (2018). A matrix adaptation evolution strategy for constrained real-parameter optimization. In 2018 IEEE Congress on Evolutionary Computation, pp. 1–8.

Karafotias, G., Hoogendoorn, M., and Eiben, A. E. (2015). Parameter control in evolutionary algorithms: Trends and challenges. IEEE Transactions on Evolutionary Computation, 19(2):167–187.

Kraft, D. (1988). A software package for sequential quadratic programming. Technical Report DFVLR-FB 88-28, DLR German Aerospace Center, Institute for Flight Mechanics, Koln, Germany.

Krause, O., and Glasmachers, T. (2015). A CMA-ES with multiplicative covariance matrix updates. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO), pp. 281–288.

Le Digabel, S., and Wild, S. M. (2015). A taxonomy of constraints in simulation-based optimization. arXiv:1505.07881.

Liang, J., Runarsson, T. P., Mezura-Montes, E., Clerc, M., Suganthan, P. N., Coello, C. C., and Deb, K. (2006). Problem definitions and evaluation criteria for the CEC 2006 special session on constrained real-parameter optimization. Journal of Applied Mechanics, 41(8):8–31.

Michalewicz, Z., and Schoenauer, M. (1996). Evolutionary algorithms for constrained parameter optimization problems. Evolutionary Computation, 4(1):1–32.

Oyman, A. I., Deb, K., and Beyer, H. (1999). An alternative constraint handling method for evolution strategies. In Proceedings of the 1999 Congress on Evolutionary Computation, Vol. 1, pp. 612–619.

Runarsson, T. P., and Yao, X. (2000). Stochastic ranking for constrained evolutionary optimization. IEEE Transactions on Evolutionary Computation, 4(3):284–294.

Sakamoto, N., and Akimoto, Y. (2017). Modified box constraint handling for the covariance matrix adaptation evolution strategy. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Companion, pp. 183–184.

Sakamoto, N., and Akimoto, Y. (2019). Adaptive ranking based constraint handling for explicitly constrained black-box optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), p. 700.

Spettel, P., Beyer, H., and Hellwig, M. (2019). A covariance matrix self-adaptation evolution strategy for optimization under linear constraints. IEEE Transactions on Evolutionary Computation, 23(3):514–524.

Spettel, P., and Beyer, H.-G. (2019). A multi-recombinative active matrix adaptation evolution strategy for constrained optimization. Soft Computing, 23(16):6847–6869.