## Abstract

We propose a novel constraint-handling technique for the covariance matrix adaptation evolution strategy (CMA-ES). The proposed technique is aimed at solving explicitly constrained black-box continuous optimization problems, in which the explicit constraint is a constraint whereby the computational time for the constraint violation and its (numerical) gradient are negligible compared to that for the objective function. This method is designed to realize two invariance properties: invariance to the affine transformation of the search space, and invariance to the increasing transformation of the objective and constraint functions. The CMA-ES is designed to possess these properties for handling difficulties that appear in black-box optimization problems, such as non-separability, ill-conditioning, ruggedness, and the different orders of magnitude in the objective. The proposed constraint-handling technique (CHT), known as ARCH, modifies the underlying CMA-ES only in terms of the ranking of the candidate solutions. It employs a repair operator and an adaptive ranking aggregation strategy to compute the ranking. We developed test problems to evaluate the effects of the invariance properties, and performed experiments to empirically verify the invariance of the algorithm. We compared the proposed method with other CHTs on the CEC 2006 constrained optimization benchmark suite to demonstrate its efficacy. Empirical studies reveal that ARCH is able to exploit the explicitness of the constraint functions effectively, sometimes even more efficiently than an existing box-constraint handling technique on box-constrained problems, while exhibiting the invariance properties. Moreover, ARCH overwhelmingly outperforms CHTs by not exploiting the explicit constraints in terms of the number of objective function calls.

## 1 Introduction

*explicit*constraints can be computed independently of $f$-calls, and their computational cost is significantly lower than that of $f$-calls. These can be presented in a relatively simple mathematical expression, and their gradients are available symbolically or can be estimated numerically with a relatively low computational cost. The most common example of an explicit constraint is the box constraint, in which each coordinate of the design variable $x$ is constrained in a closed interval. Explicit constraints often appear in engineering optimization as prerequisites for executing the simulation to compute the objective function. Taking into account the abovementioned situations, in this study, we assume that

the computational time for $gj(x)$ and its (numerical) gradient $\u2207gj(x)$ are negligible compared to that for $f(x)$; and

the objective function $f(x)$ is not necessarily defined for an infeasible solution $x$, which violates some $gj(x)$.

The covariance matrix adaptation evolution strategy (CMA-ES) (Hansen and Ostermeier, 2001; Hansen et al., 2003; Hansen, 2016) is employed in this study as the baseline optimization method for solving the problem (1). The CMA-ES is known as one of the state-of-the-art zeroth-order optimization algorithms for unconstrained black-box continuous optimization. In particular, it has been demonstrated as efficient for difficult objective functions such as non-convex, ill-conditioned, and nonseparable functions. Since these difficulties of objective functions naturally appear in engineering optimization, whether or not constraints exist, it is desirable for search algorithms for constrained optimization to operate efficiently under these difficulties. Although the CMA-ES is designed for unconstrained optimization, it can be applied to constrained optimization with the aid of constraint-handling techniques (CHTs).

One key characteristic of the CMA-ES is its *invariance* to several transformations of the objective function and search space (Hansen et al., 2011). The invariance properties induce equivalent problem classes. All instances (problems) in an equivalent class are regarded as equivalent under the corresponding transformation of the initial search distribution. Owing to the invariance properties of the CMA-ES, if its initial state is transformed properly, it is empirically observed that the CMA-ES minimizes ill-conditioned and nonseparable quadratic functions as efficiently as well-conditioned and separable spherical functions. This is key to the success of the CMA-ES in real-world problems because it is designed to adapt the distribution to such a proper state. Although the invariance properties themselves do not imply algorithm efficacy, they are useful for generalizing observations. That is, they are essential for assessing the performance of algorithms empirically. The invariance further contributes to the quasi-parameter-free feature of the CMA-ES, which is an important characteristic of the CMA-ES that has attracted attention from practitioners. As opposed to many other evolutionary approaches to continuous black-box optimization, in which the hyper-parameters are required to be tuned depending on the problem characteristics to enable efficient performance (Karafotias et al., 2015), default values are prepared for all hyper-parameters of the CMA-ES, depending only on the search space dimension $n$. As many advantages of the CMA-ES for unconstrained optimization originate from its invariance properties, we hypothesize that a CHT for explicitly constrained optimization is desirable to preserve the invariance as much as possible to use the CMA-ES for constrained optimization.

However, the CHTs for explicit constraints employed in variants of the ES are not designed to preserve the invariance of the baseline search algorithms, as we discuss briefly in Section 3. When the CMA-ES is applied to an explicitly constrained black-box optimization problem with these CHTs, it loses the invariance properties exhibited by the CMA-ES. Certain CHTs, such as death penalty or resampling techniques, are applicable to the CMA-ES and preserve its invariance properties. However, as these approaches do not exploit the fact that the constraint violations are cheap to evaluate and their gradient information is available, they are often inefficient, as observed in Section 6.

We propose a novel CHT for the CMA-ES in solving the explicitly constrained optimization problem (1), named adaptive ranking-based constraint handling (ARCH). ARCH is designed to include the following two types of invariance properties: invariance to the element-wise increasing transformation of the objective and constraint functions, and invariance to the affine transformation of the search space. ARCH replaces the evaluation step in the sampling–evaluation–update cycle of the CMA-ES, as follows: A candidate solution generated by the CMA-ES is first repaired on a boundary of the feasible domain. The objective function value is evaluated at the repaired point. A penalty for the repair operation is computed by the Mahalanobis distance between the original and repaired solutions under the covariance matrix of the current search distribution. The candidate solutions are ranked based on the adaptive weighted sum of the rankings of the objective function values and rankings of the penalty values. An adaptive penalty coefficient is introduced to control the balance between the rankings, and is adapted for the search distribution so as not to move away from the feasible domain in terms of the Mahalanobis distance, while allowing infeasible but near boundary solutions to exhibit high rankings.

The contributions of this study are summarized as follows. Firstly, we present ARCH, which can handle explicit and nonlinear constraints.^{1} We prove that ARCH is invariant to any element-wise increasing transformation of $f$ and $gj$, and to any affine transformation of the search space coordinates. To the best of the authors' knowledge, this is the first approach for explicit constraints that is invariant to these transformations. Secondly, we empirically evaluate the effectiveness of the proposed approach from an invariance perspective. We develop test problems to demonstrate the effects of the invariance to affine transformation. In test cases, we empirically observe that ARCH is invariant to affine transformations, and illustrate that ARCH is even more effective for box constrained optimization problems than an existing CHT specialized for the box constraint. Thirdly, we compare ARCH with existing CHTs for non-explicit constraints on problems in which the objective function is defined for the infeasible domain. We use the CEC 2006 constrained optimization benchmark suite to demonstrate that ARCH overwhelmingly outperforms other CHTs for non-explicit constraints. This indicates that the explicit constraints should be treated as explicit constraints even if the objective function is defined on an infeasible domain. Compared to the previous publication in Sakamoto and Akimoto (2019), we $i$) improve the algorithm to prevent the search distribution from being unnecessarily biased toward the boundary if the population size is larger than the default, $ii$) prove the invariance properties of ARCH, and $iii$) compare ARCH with existing approaches on the CEC 2006 testbed.

The remainder of this article is organized as follows. We summarize the mathematical notations applied throughout the article below. Section 2 introduces the baseline optimization algorithm, namely the CMA-ES, and related CHTs. We discuss the invariance properties for constrained optimization problems in Section 3. Our proposed CHT, ARCH, is described in Section 4. The invariant properties of ARCH are demonstrated in Section 5. In Section 6, we empirically demonstrate how the invariance properties operate in practice, by means of numerical experiments on linearly constrained quadratic problems, to observe the efficacy of ARCH. We present our comparison of ARCH with other CHTs using the CEC 2006 constrained optimization testbed in Section 7. The article is concluded in Section 8.

**Notations** In the following, $R$ is the set of real numbers and $R+$ is the set of strictly positive real numbers. Let $x\u2208Rn$ be an $n$-dimensional column vector, where $xT$ is its transpose, $\u2225x\u2225$ denotes the Euclidean norm of $x$, and $[x]i$ denotes the $i$th coordinate of $x$. Note that the $i$th coordinate of the $k$th vector $xk$ is denoted by $[xk]i$. The ($i$, $j$)th element of a matrix $A$ is also denoted by $[A]i,j$. The identity matrix is denoted by $In$. The indicator function $1{condition}$ returns 1 if condition is true, and 0 otherwise. The sign function $sgn(a)$ returns 1 if $a>0$, $-1$ if $a<0$, and 0 otherwise. The integer interval between and including $a$ and $b$ is denoted by $[[a,b]]$.

## 2 CMA-ES and Related CHTs

In this section, we introduce the CMA-ES, which is our baseline algorithm for unconstrained continuous optimization, followed by an overview of the existing CHTs for the CMA-ES.

### 2.1 CMA-ES

The CMA-ES (Hansen et al., 2003; Hansen, 2016) is a stochastic multi-point search algorithm for black-box continuous minimization of $f:Rn\u2192R$. The CMA-ES samples $\lambda $ candidate solutions $xk$ for $k\u2208{1,\u2026,\lambda}$ from the multivariate normal distribution $N(m,\sigma 2C)$, where $m\u2208Rn$ is the mean vector, $\sigma \u2208R+$ is the step size, and $C\u2208Rn\xd7n$ is the covariance matrix. These distribution parameters are updated using the candidate solutions and their ranking information.

**Step****0**. Initialize $m(0),\sigma (0),C(0)$ according to the initial problem search domain, and initialize two evolution paths $pc(0)=p\sigma (0)=0$ and their correction factors $\gamma \sigma (0)=\gamma c(0)=0$. All parameters that appear in the following are set to the default values listed in Table 1. The meanings of these parameters are described in Hansen (2016). These are designed based on theoretical ES research (e.g., see Akimoto et al., 2020) and extensive experiments. The CMA-ES repeats the following steps at each iteration, $t=0,1,\u2026$, until a termination criterion is satisfied.

$\lambda =4+\u230a3ln(n)\u230b,\mu =\u230a\lambda /2\u230b$ |

$wi=ln(\lambda +12)-ln(i)\u2211k=1\mu ln(\lambda +12)-ln(k),c\sigma =\mu w+2n+\mu w+5,cc=4+\mu w/nn+4+2\mu w/n$ |

$c1=2(n+1.3)2+\mu w,c\mu =min1-c1,2(\mu w-2+1/\mu w)(n+2)2+\mu w$ |

$d\sigma =1+c\sigma +2\xd7max0,\mu w-1n+1-1$ |

$\lambda =4+\u230a3ln(n)\u230b,\mu =\u230a\lambda /2\u230b$ |

$wi=ln(\lambda +12)-ln(i)\u2211k=1\mu ln(\lambda +12)-ln(k),c\sigma =\mu w+2n+\mu w+5,cc=4+\mu w/nn+4+2\mu w/n$ |

$c1=2(n+1.3)2+\mu w,c\mu =min1-c1,2(\mu w-2+1/\mu w)(n+2)2+\mu w$ |

$d\sigma =1+c\sigma +2\xd7max0,\mu w-1n+1-1$ |

**Step****1**. Draw $\lambda $ samples $zk$ for $k\u2208{1,\u2026,\lambda}$ independently from $N(0,In)$. Compute $yk=C(t)zk$ and $xk=m(t)+\sigma (t)yk$. Then, $xk$ ($k=1,\u2026,\lambda $) are the candidate solutions that are independently $N(m(t),(\sigma (t))2C(t))$ distributed. Here, $C(t)$ is the symmetric matrix satisfying $C(t)=C(t)2$.

**Step****2**. Evaluate the candidate solutions $xk$, for $k\u2208{1,\u2026,\lambda}$, on the loss function $L$, and sort them in ascending order. In an unconstrained optimization scenario, usually, $L=f$. Let the $i$th best candidate solution be denoted by $xi:\lambda $. In the same manner, we denote the corresponding steps and normalized steps as $yi:\lambda $ and $zi:\lambda $, respectively.

**Step**

**3**. Compute the weighted sum of the $\mu $ best steps of the candidate solutions $\u2329y\u232aw=\u2211i=1\mu wiyi:\lambda $ and update the mean vector $m(t)$, as follows:

**Step**

**4**. Update the evolution paths according to

^{2}are updated as follows:

**Step**

**5**. Update the step size and covariance matrix as follows:

### 2.2 CHTs for ESs

We briefly review the CHTs employed in variants of ESs.

#### 2.2.1 Resampling and Death Penalty

The resampling technique is the simplest CHT. Candidate solutions are resampled repeatedly until $\lambda $ feasible candidate solutions are generated. To guarantee that the resampling stops within a finite time, the maximum number of sampling in one iteration is set to a finite number. If the number of feasible candidate solutions is less than $\lambda $, the current population is filled with infeasible solutions and the worst loss value is assigned to them (for example, $+\u221e$). When the maximum sampling number is set to $\lambda $, the resampling technique is known as the death penalty.

The resampling and death penalty methods are easy to implement and are applicable to any constraint type. However, they are not appropriate if the optimum is located on the boundary of the feasible domain, as candidate solutions are biased in the feasible domain and the search distribution tends to approach the boundary slowly, as we observe in Section 6. Moreover, the search algorithm cannot conduct a meaningful ranking of the candidate solutions if the probability of sampling feasible solutions is rather low and the population is filled with infeasible solutions. If this is the case, the parameter update will result in random fluctuation.

#### 2.2.2 Penalty Function Methods

*implicit*constraints, in which the

*implicit*constraint means that the evaluation is expensive or is computed at the same time as the objective function. It requires the objective function to be defined in the infeasible domain.

#### 2.2.3 Ranking-Based Methods

Ranking-based CHTs aggregate the rankings of the objective function values and constraint function values to create the final rankings of the candidate solutions, instead of aggregating the function values. The stochastic ranking technique (Runarsson and Yao, 2000) attempts to balance the objective and constraint functions by sorting the candidate solutions according to the objective function with a probability $Pf$, and the sum of the constraint function values with a probability $1-Pf$. The multiple constraint ranking (MCR) technique (de Paula Garcia et al., 2017) ranks the candidate solutions according to the sum of the rankings of the objective values and rankings of each constraint violation value. Other techniques included in ranking-based CHTs such as the lexicographic ordering and the $\u025b$-lexicographic ordering have been proposed in GA and DE, and imported to ES (Oyman et al., 1999; Hellwig and Beyer, 2018). The advantage of these approaches over penalty function-based approaches is that they are invariant to strictly increasing transformation of the objective function and constraint functions, which we later refer to as *element-wise increasing transformation*. Therefore, practitioners do not need to tune the balance between the objective function values and constraint violation values manually. As with the AL, however, this method requires the objective function to be defined in the infeasible domain.

#### 2.2.4 Active Constraint Handling

There are certain CHTs that modify the covariance matrix adaptation mechanism to shrink the variance actively in the direction of the constraint function gradient so as to decrease the likelihood of infeasible solutions. We refer to such methods as active constraint handling (ACH) in this article. Reference Arnold and Hansen (2012) proposed the use of the active covariance matrix update in the (1+1)-CMA-ES. Similar concepts have been employed in other variants of the CMA-ES, such as the ($\mu ,\lambda $)-CMA-ES in Chocat et al. (2015), MA-ES in Spettel and Beyer (2019), and xCMA-ES in Krause and Glasmachers (2015). As ACH techniques use binary information, whether or not the constraint is violated, they are invariant to monotone transformations of the objective and constraint violations. Such methods can be applied to problems in which the constraints return the outcome that the given solution is either feasible or not. However, this approach tends to be inefficient compared to other CHTs on quantifiable constraints, as it does not utilize the amount of constraint violations.

#### 2.2.5 Other Explicit CHTs

The linear constraint covariance matrix self-adaptation ES (lcCMSA-ES) (Spettel et al., 2019) handles explicit and linear constraints in a variant of CMA-ES, known as CMSA-ES. It basically samples only feasible solutions, and updates the distribution parameters by using the feasible solutions. Active-set ES (Arnold, 2016, 2017) is also designed for *explicit*, but not necessarily linear, constraints. This approach is the most relevant one from the perspective of the assumptions on the constraint problem. It is a (1+1)-ES based approach that applies a repair operator in the Lamarckian manner; that is, the repair solution is used as the candidate solution, and not only to compute the loss function value. Unfortunately, it is not possible to directly extend it to the state-of-the-art variant of the ES, namely ($\mu $, $\lambda $)-CMA-ES.

### 2.3 Formal Classification of CHTs

We summarize the formal classification of the abovementioned CHTs in Table 2. The taxonomy proposed in Le Digabel and Wild (2015) classifies constraints as follows: **Q**uantifiable/**N**onquantifiable, **R**elaxable/**U**nrelaxable, **S**imulation-based/**A** priori, and **K**nown/**H**idden. A **Q**uantifiable constraint is a constraint for which the amount of feasibility and/or violation can be quantified, while a **N**onquantifiable constraint returns a binary output indicating whether or not the solution satisfies the constraint. A **R**elaxable constraint is a constraint that does not need to be satisfied to compute the objective function, while an **U**nrelaxable constraint is a prerequisite for executing the simulation for the objective function computation. An **A** priori constraint is a constraint for which the feasibility can be confirmed without running a simulation; that is, this constraint can be formulated using optimization variables such as $g(x)=\u2211i=1n[x]i\u2a7d1$, while a **S**imulation-based constraint is only computed through a computationally expensive simulation. A **K**nown constraint is a constraint that is explicitly provided in the problem formulation; for example, $minf(x)$ s.t. $g(x)\u2a7d0$. Constrained problems can be expressed by combining these initial letters as an acronym, such as QRSK. Refer to Le Digabel and Wild (2015) for further descriptions of each type of constraint and example situations. Our assumption on the constraints is QUAK in this terminology.

CHT (bnds/lc/nlc) . | Taxonomy . | Invariance . |
---|---|---|

ARCH [proposed CHT, described in Section 4] (nlc) | QUAK | increasing / affine |

AP-BCH (Hansen et al., 2009) (bnds) | QUAK | $\xd7$ / $\xd7$ |

lcCMSA-ES (Spettel et al., 2019) (lc) | QUAK | increasing / $\xd7$ |

Active-set ES (Arnold, 2017) (nlc) | QUAK | increasing / $\xd7$ |

AL (Atamna et al., 2016) (nlc) | QRSK | $\xd7$ / affine |

Stochastic ranking (Runarsson and Yao, 2000) (nlc) | QRSK | $\xd7$ / affine |

MCR (de Paula Garcia et al., 2017) (nlc) | QRSK | increasing / affine |

(1+1)-CMA-ES with ACH (Arnold and Hansen, 2012) (nlc) | NUSK | increasing / affine |

($\mu ,\lambda $)-CMA-ES with ACH (Chocat et al., 2015) (nlc) | NRSK | increasing / affine |

xCMA-ES with ACH (Krause and Glasmachers, 2015) (nlc) | NUSK | increasing / affine |

MA-ES with ACH (Spettel and Beyer, 2019) (nlc) | NUSK | increasing / affine |

Resampling technique (nlc) | NUSH | increasing / affine |

CHT (bnds/lc/nlc) . | Taxonomy . | Invariance . |
---|---|---|

ARCH [proposed CHT, described in Section 4] (nlc) | QUAK | increasing / affine |

AP-BCH (Hansen et al., 2009) (bnds) | QUAK | $\xd7$ / $\xd7$ |

lcCMSA-ES (Spettel et al., 2019) (lc) | QUAK | increasing / $\xd7$ |

Active-set ES (Arnold, 2017) (nlc) | QUAK | increasing / $\xd7$ |

AL (Atamna et al., 2016) (nlc) | QRSK | $\xd7$ / affine |

Stochastic ranking (Runarsson and Yao, 2000) (nlc) | QRSK | $\xd7$ / affine |

MCR (de Paula Garcia et al., 2017) (nlc) | QRSK | increasing / affine |

(1+1)-CMA-ES with ACH (Arnold and Hansen, 2012) (nlc) | NUSK | increasing / affine |

($\mu ,\lambda $)-CMA-ES with ACH (Chocat et al., 2015) (nlc) | NRSK | increasing / affine |

xCMA-ES with ACH (Krause and Glasmachers, 2015) (nlc) | NUSK | increasing / affine |

MA-ES with ACH (Spettel and Beyer, 2019) (nlc) | NUSK | increasing / affine |

Resampling technique (nlc) | NUSH | increasing / affine |

CHTs assuming **U**nrelaxable constraints can be applied to QUAK constraints. For example, the resampling technique can be applied to QUAK constraints. However, CHTs assuming weaker conditions on constraints utilize less information on the constraints than that available to the optimization approaches. Therefore, we expect that CHTs assuming QUAK constraints are more efficient for solving QUAK constrained optimization problems. Only two approaches in Table 2, including the proposed approach, are designed for QUAK nonlinear constraints. As it is not clear how active-set ES is extended to a variant of the CMA-ES, the proposed approach is the only approach applied to the CMA-ES. However, if the constraints are nonlinear but **R**elaxable, many of the CHTs listed in Table 2 can be applied.

## 3 Desired Invariance Properties for Constrained Optimization

We describe two invariance properties that are desirable for a CHT that is designed for variants of the CMA-ES.

### 3.1 Element-Wise Increasing Transformation of Functions

An strictly increasing transformation $h:R\u2192R$ is a function satisfying $h(t)<h(s)$ if $t<s$. Invariance to an increasing transformation of the objective function in an unconstrained optimization scenario refers to the property whereby the algorithm does not change the behavior (which is possibly characterized by the sequence of the solutions generated by the algorithm) when solving $f$ and its composite $h\u2218f$. Algorithms that are invariant to any increasing transformation can solve, for example, a non-convex discontinuous function $h\u2218f$ as easily as a convex continuous functions $f$. The importance of this invariance property is extensively recognized in engineering optimization: if the search algorithm is not invariant, the objective function needs to be tuned for the search algorithm to perform effectively, which is time consuming. Numerous evolutionary algorithms, including the CMA-ES, are invariant to any increasing transformation of the objective function, because they use only the objective function value rankings of the candidate solutions.

In constrained optimization, we consider invariance to an element-wise increasing transformation $H=(h0,\u2026,hm):Rm+1\u2192Rm+1$ of the objective and constraint functions $F=(f,g1,\u2026,gm)$, where $hj:R\u2192R$ ($j=0,\u2026,m$) is an increasing transformation and $hj$ for $j=1,\u2026,m$ satisfies $hj(0)=0$. Invariance to an element-wise increasing transformation of the objective and constraint functions refers to the property whereby the algorithm does not change the behavior when solving a constrained problem $F=(f,g1,\u2026,gm)=(h0\u2218f,h1\u2218g1,\u2026,hm\u2218gm)=H\u2218F$. The original constrained optimization problem $F$ and its transformation $H\u2218F$ models the same optimization problem in the sense that they define the same feasible domain $X$ and the same total order on $X$ regarding the objective function value.^{3}

In real-world applications, the ranges of the objective function values and constraint violations are often quite different. Algorithms without this invariance will suffer from this difference and place implicit priority on an objective or certain constraints depending on their values. Therefore, practitioners may determine a reasonable transformation $H$. As described above for unconstrained optimization cases, this can be time consuming, and requires domain knowledge of the problem and deep insight into the optimization algorithm.

Although this invariance property is a straightforward extension of invariance to the increasing transformation of the objective function and it is seemingly important, it is not exhibited by the frequently used penalty function-based techniques that take the sum of the objective and constraint function values as loss values. For example, it is clear that the AL does not exhibit this invariance: it is not invariant to increasing the transformation of $f$ and $gj$. Although the AP-BCH is invariant to any increasing transformation of $gj$, it is not invariant to an increasing transformation of $f$ in general, as the quadratic penalty term is added to the objective function value directly. It has been demonstrated in Sakamoto and Akimoto (2017) that the AP-BCH deteriorates when the objective function is, for example, an exponential function, where the objective function value is more sensitive than the quadratic penalty term.

### 3.2 Affine Transformation of Search Space Coordinates

^{4}although we empirically observe statistically invariant behaviors under arbitrary affine transformations (see Figure 3). Numerous other evolutionary computation approaches are not invariant to these transformations, and we empirically observe rather different performances depending on the affine transformation properties.

In real-world applications, the change in the coordinate system corresponds to the change in the features describing the object to be optimized, or the change in the unit in each feature. In a constrained optimization scenario, the constraint function $gj$ and the objective function $f$ are transformed by the same transformation $A$, resulting in $gj\u2218A$ and $f\u2218A$ in the transformed coordinate system, respectively. The linear constraints are again linear in the transformed coordinate system. However, a box constraint will no longer be a box constraint, but rather a set of linear constraints (forming an $n$-parallelotope shape).

Suppose that the underlying unconstrained optimization algorithm is invariant to any affine transformation of the search space coordinates. As summarized in Table 2, CHTs that only touch the loss function values, such as the resampling technique, ACH, and AL, do not disturb the invariance property of the underlying algorithm, resulting in invariance to the affine transformation of the search space under the constraints (see Atamna et al. (2020) for proof of the affine invariance of AL). However, CHTs for unrelaxable constraints that require a repair operator often lose the invariance property. For example, the AP-BCH does not exhibit this invariance property, as the repair operator used in this approach exploits the fact that the feasible domain is composed of an interval in each coordinate. Repair operators that use the inner product or the distance in Euclidean space are generally affected by the affine transformation of the coordinate system.

## 4 ARCH

We propose an explicit constraint-handling method based on adaptive ranking, known as ARCH. ARCH virtually transforms the constrained optimization problem into an unconstrained problem by defining an adaptive loss function $L$. The loss $L$ is defined by the weighted sum of the rankings of the objective and constraint violations, denoted as $L=RT$ below. The proposed algorithm is designed to exhibit the invariance properties listed in Section 3, and it does not require the objective function to be well defined in the infeasible domain.

### 4.1 Repair Operator

To make ARCH applicable to a problem in which the objective function values are not defined in the infeasible domain, we employ a repair operator in the Darwinian manner. Given a candidate solution $x$, ARCH determines a repaired solution $x\u02dc$. The repaired solution $x\u02dc$ is only used for the objective function value computation.

The condition that $A(x)\u2229S\u2260\u2205$ is indeed simplified to $A(x)\u2260\u2205$ when the constraints are all linear and not redundant. Here, a redundant constraint $gj$ is defined as a constraint such that the boundary of $gj$ is never in contact with the feasible domain; that is, ${x\u2208Rn\u2223gj(x)=0}\u2209S$. Note that determining whether $A(x)\u2260\u2205$ is as easy as checking whether a system of linear equations has a solution. In explicit linear constraint situations, such a redundant constraint can be removed in advance.

In this approach, the Mahalanobis distance is employed rather than the Euclidean distance $\u2225x-y\u22252=(x-y)T(x-y)$, in order to achieve affine invariance of the search space coordinate system. This point is discussed formally in Section 5.

### 4.2 Total Ranking

### 4.3 Adaptation of Ranking Coefficient

The ranking coefficient $\alpha $ controls the balance between $Rf$ and $Rg$. If $\alpha $ is too large, the search distribution is biased toward the feasible domain. If $\alpha $ is too small, the search distribution is biased toward the infeasible domain. Therefore, the adaptation of $\alpha $ significantly influences the search performance.

Ideally, the CMA-ES with the proposed constraint handling should treat a constrained sphere function as though it is an unconstrained sphere function. That is, we wish (10) to hold even for a constrained sphere problem.

**Ranking Coefficient Adaptation**Initialize $\alpha (0)=1$, $dm(0)=0$. At iteration $t$, $\alpha (t)$ is updated as

^{5}

## 5 Invariance Properties of ARCH

### 5.1 ARCH on Inner Product Space

We begin by defining ARCH on an inner product space $(V,\u2329\xb7,\xb7\u232a)$.

A random vector $XV$ on the inner product space $(V,\u2329\xb7,\xb7\u232a)$ has a normal distribution $N(\mu V,\Sigma V)$ if $\u2329v,X\u232a$ has the normal distribution $N(\u2329v,\mu V\u232a,\u2329v,\Sigma V(v\u232a))$ for any $v\u2208V$, where $\mu V\u2208V$ is the mean vector and $\Sigma V:V\u2192V$ is the covariance (Eaton, 2007). Let $FV=(fV,g1V,\u2026,gmV)$ be the objective and constraint functions defined on $V$ as (4). ARCH on the inner product space receives these distribution parameters $\mu V$ and $\Sigma V$, candidate solutions $p1,\u2026,p\lambda $ drawn from the normal distribution, and the objective and constraint functions $FV$.

### 5.2 Invariance to Affine Transformation of Search Space Coordinates

Firstly, we demonstrate that ARCH is invariant to an arbitrary affine transformation of the search space coordinate system. For this purpose, it is sufficient to prove that the operation of ARCH (14) on any given coordinate system $x=\psi (p)$ is equivalent to the operation of ARCH (19) on the inner product space $(V,\u2329\xb7,\xb7\u232a)$.

Note that the normal distribution $N(\mu V,\Sigma V)$ on $V$ corresponds to the normal distribution $N(\mu \psi ,\Sigma \psi )$ on $Rn$ in the coordinate system $\psi :V\u2192Rn$, where $\mu \psi =\psi (\mu V)$ and $[\Sigma \psi ]i,j=\u2329ei,\Sigma V(ej)\u232a$. The objective and constraint function $F:Rn\u2192Rm+1$ corresponding to $FV:V\u2192Rm+1$ is expressed as $F=FV\u2218\psi -1$.

### 5.3 Invariance to Element-Wise Increasing Transformation

Next, we demonstrate that ARCH is invariant to an arbitrary element-wise increasing transformation. That is, the ranking and internal parameter updates are not affected by the transformation.

### 5.4 Invariance Properties of Entire Algorithm

Finally, we discuss the invariance properties of the search algorithm including ARCH.

Suppose that the underlying unconstrained optimization algorithm is defined on an inner product space, and the following steps are repeated:

$p1,\u2026,p\lambda =sample(\mu V,\Sigma V,\theta V)$;

$Rf(p1),\u2026,Rf(p\lambda )=evaluate((pk)k=1\lambda ,fV)$; and

$\mu V,\Sigma V,\theta V=update((pk,Rf(pk))k=1\lambda ,\mu V,\Sigma V,\theta V)$,

where $\theta V$ contains all of the information used in the algorithm. By nature, it is invariant to any strictly increasing transformation of the objective function $fV$, as the outputs of $evaluate$ are the rankings of the objective function values, and it is invariant to any strictly increasing transformation $h$. Moreover, as it is defined independently of a coordinate system, its implementation under a given coordinate system $\psi $ produces $\psi (p1),\u2026,\psi (p\lambda ),\mu \psi ,\Sigma \psi ,\theta \psi $ at any iteration $t$, where $p1,\u2026,p\lambda $ are the candidate solutions mentioned above, while $\mu \psi $, $\Sigma \psi $, and $\theta \psi $ are the expressions of $\mu V$, $\Sigma V$, and $\theta V$, respectively, in the coordinate system $\psi $.

When solving an explicitly constrained minimization problem $FV$, we simply replace the evaluation step as follows:

$p1,\u2026,p\lambda =sample(\mu V,\Sigma V,\theta V)$;

$RT(p1),\u2026,RT(p\lambda ),\alpha ,dm=ARCHV((pk)k=1\lambda ,\mu V,\Sigma V-1,\alpha ,dm,FV)$; and

$\mu V,\Sigma V,\theta V=update((pk,RT(pk))k=1\lambda ,\mu V,\Sigma V,\theta V)$.

As all of the operations are defined independently from a coordinate system, the entire algorithm is invariant to any affine coordinate system transformation. Moreover, because $ARCHV$ is invariant to any element-wise increasing transformation $H$, and the underlying algorithm only relies on the rankings of the candidate solutions, all of the operations are invariant to $H$.

ARCH can be combined with a search algorithm that is not generalized to an inner product space. For example, our baseline CMA-ES, which is described in Section 2.1, does not generalize to an inner product space (as $p\sigma $ is coordinate dependent), although we empirically observe quite uniform behaviors under different coordinate systems, as we observe in Section 6. ARCH does not disturb the uniform behavior, as ARCH itself is invariant to any affine coordinate transformation.

## 6 Experiments on Linearly Constrained Quadratic Problems

Our first numerical experiments are aimed at demonstrating the effects of the invariance to affine coordinate transformations of ARCH. We develop a set of linearly constrained test problems. We compare ARCH with other CHTs for the CMA-ES, demonstrating the manner in which affine coordinate transformations affect the performances of CHTs that are not invariant thereto, while ARCH performs equally effectively under different transformations. Moreover, we compare ARCH with CHTs that are specialized for box constraints. Note that box CHTs are expected to be more efficient for box-constrained optimization problems than other CHTs for general linear constraints. We observe that ARCH is competitive with and sometimes outperforms these, even on box-constrained problems.

### 6.1 Linearly Constrained Quadratic Minimization Problems

We consider a linearly constrained problem (P0), defined in the $n$-dimensional inner product space $(V,\u2329\xb7,\xb7\u232a)$ on the real field $R$ as (4), where the constraints are defined as $gi(p)=\u2329-vi,p\u232a-[LB]i$ and $gn+i(p)=\u2329vi,p\u232a-[UB]i$, where $vi\u2208V$ is the normal vector of the $i$th constraint and these are orthogonal to one another; that is, $\u2329vi,vj\u232a=\delta i,j$, and $[LB]i,[UB]i\u2208R$ are the lower and upper bounds, respectively.

These optimization problems (P1) and (P2) are equivalent to (P0). However, (P1) has a box constraint, while (P2) has a set of linear constraints. Algorithms that are invariant to any affine transformation of the search space must perform equivalently on these problems. We use these problems to assess the invariance properties.

### 6.2 Settings

For the test problem (P1) in (20), we use three objective functions: sphere ($fsph(x)=\u2211i=1n[x]i2$), ellipsoid ($fell(x)=\u2211i=1n106i-1n-1[x]i2$), and rotated ellipsoid ($frotell(x)=fell(Q\theta x)$), where $Q\theta \u2208Rn\xd7n$ is a block diagonal matrix such that each block is a $2\xd72$ orthogonal (that is, rotation) matrix and all blocks share the same matrix. We let the counter-clockwise rotation angle of each block be $\theta =\pi /6$ in this experiment.

Problem (P1) is a box-constrained problem. The lower and upper bounds are set as $LB=[-1,1,\u2026,-1,1]T$ and $UB=LB+[5,\u2026,5]T$, respectively. The optimal solution, which is obtained by the Karush–Kuhn–Tucker (KKT) condition, is located at $x*=[0,1,\u2026,0,1]T$ for $fsph$ and $fell$, and $x*\u2248[0.37,1,\u2026,0.37,1]T$ for $frotell$. That is, the even-numbered coordinates of the optimum are on the boundary, while the others are not. The number of active constraints at the optimum is $n/2$.

We use the following two matrices for the transformation matrix $P$ in (21): $Prot=Q\theta '$ and $Pillrot=Q\theta 'TDQ\theta '$ with $\theta '=\pi /4$, where $Q\theta '$ is as defined above, with $\theta \u2260\theta '$, and $D=diag(1,10,\u2026,1,10)$ is a diagonal matrix. The transformation matrix $Prot$ only rotates the search space, while $Pillrot$ transforms the rectangular feasible domain into an $n$-parallelotope shape. Hereunder, problem (P1) is denoted by $Box$, the linearly constrained problem (P2) using $Prot$ is denoted by $rotBox$, and (P2) using $Pillrot$ is denoted by $illrotBox$. Their feasible domains and the levelsets of the objective function are illustrated in Figure 2.

For the box-constrained optimization problem (P1), the initial mean vector is $m(0)=UB+LB2+U(-1,1)n$, and the initial covariance matrix is $C(0)=In$. For the linearly constrained optimization problem (P2), $m(0)$ and $C(0)$ are transformed in the manner described in Section 6.1. The search space dimension is $n\u2208{20,50}$, the initial step size is $\sigma (0)=1n\u2211i=1n[UB]i-[LB]i4=1.25$, and the other parameters are set to their default values, as defined in Table 1.

### 6.3 Results and Discussion

Firstly, we focus on the results of ARCH and the resampling technique in Figure 3. Although the underlying CMA-ES is not mathematically proven to be invariant to affine transformation of the search space, we observe in Figure 3 that the lines of ARCH for $Box$, $rotBox$, and $illrotBox$ overlap one another, owing to the invariance of ARCH to the affine transformation of the search space. For resampling as well, we observe the same. This indicates that the performance comparison can be conducted on the most convenient case, namely the $Box$ constraint case. In the following, we focus on analyzing the results on $Box$, which can be generalized to general cases.

Next, we investigate the behavior of ARCH. We observe in Figure 4 that the parameter $dm$ is maintained at approximately 1. This is the desired behavior, as we design the adaptation of $\alpha $ to maintain $dm\u22481$. Moreover, we can observe from the results that the behaviors of the CMA-ES on the unconstrained problem and the behaviors of the CMA-ES with ARCH and AP-BCH are similar, in that the covariance matrix tends to be proportional to the inverse of the Hessian matrix of the objective function. However, in Figure 3, we observe the difference between these two algorithms in terms of the adaptation speed of the covariance matrix. On $fell$ and $frotell$, the CMA-ES with ARCH adapts the covariance matrix significantly faster than that with AP-BCH. By adapting the coefficient $\alpha $, ARCH appears to resemble the selection of candidate solutions on an unconstrained problem better than AP-BCH.

The resampling technique exhibits different behavior. The eigenvalues of the covariance matrix are divided into two, with the smaller values corresponding to the axes where the constraints are active at the optimum, and the greater values corresponding to the axes where the constraints are inactive at the optimum. If we do not use any information from the infeasible domain and generate only feasible candidate solutions, this is a reasonable approach to cause the distribution to be narrow in the directions of the active constraints. This concept has indeed been employed in Arnold and Hansen (2012). However, this strategy significantly reduces the speed of the approach of the mean vector to the optimum on the boundary.

## 7 Experiments on CEC 2006 Constrained Optimization Testbed

The aim of our second numerical experiments is twofold.

Firstly, we demonstrate that ARCH can be applied to nonlinearly as well as linearly constrained problems. To evaluate the efficacy, we compare ARCH with the active-set ES (Arnold, 2017) that can deal with explicit (a priori) and nonlinear constraints, because there is currently only one CHT designed for such a constraint. This is the $(1+1)$-ES-based approach, and the covariance matrix adaptation is not incorporated.

Secondly, we show that, even if the infeasible solutions can be evaluated on the objective, ARCH, which only evaluates the solutions in the feasible domain, exhibits advantages in solving constrained optimization with explicit (a priori) constraints. For this purpose, we compare ARCH with the AL (Atamna et al., 2016) and MCR (de Paula Garcia et al., 2017), which are designed for simulation-based constraints under the relaxed assumption that any infeasible solutions can be evaluated on the objective.

### 7.1 CEC 2006 testbed

To compare the CHTs, we use the CEC 2006 testbed (Liang et al., 2006), which consists of 24 constrained problems including linear/nonlinear and equality/inequality constraints, where the equality constraints $hj(x)=0$ are transformed into inequalities, as described below Eq. (1), with $\u025beq=10-4$. We reuse the CEC 2006 testbed since this testbed is widely used in existing works, which makes it easy to compare algorithms. All optimization variables are bounded as $[LB]i\u2a7d[x]i\u2a7d[UB]i$ ($i\u2208{1,\u2026,n}$). The bound constraints are transformed into a set of linear inequality constraints, as in Eq. (20). Note that ARCH is independent of the manner in which they are transformed, but certain existing CHTs, such as MCR, are dependent on this. Table 3 summarizes the features of the constrained problems. Our implementation of the CEC 2006 testbed is available in the repository (https://github.com/naoking158/ARCH).

Prob. . | $n$ . | Type of $f$ . | LI . | NI . | LE . | NE . | $mact$ . |
---|---|---|---|---|---|---|---|

g01 | 13 | quadratic | 9 | 0 | 0 | 0 | 6 |

g02 | 20 | nonlinear | 0 | 2 | 0 | 0 | 1 |

g03 | 10 | polynomial | 0 | 0 | 0 | 1 | 1 |

g04 | 5 | quadratic | 0 | 6 | 0 | 0 | 2 |

g05 | 4 | cubic | 2 | 0 | 0 | 3 | 3 |

g06 | 2 | cubic | 0 | 2 | 0 | 0 | 2 |

g07 | 10 | quadratic | 3 | 5 | 0 | 0 | 6 |

g08 | 2 | nonlinear | 0 | 2 | 0 | 0 | 0 |

g09 | 7 | polynomial | 0 | 4 | 0 | 0 | 2 |

g10 | 8 | linear | 3 | 3 | 0 | 0 | 6 |

g11 | 2 | quadratic | 0 | 0 | 0 | 1 | 1 |

g12 | 3 | quadratic | 0 | 1 | 0 | 0 | 0 |

g13 | 5 | nonlinear | 0 | 0 | 0 | 3 | 3 |

g14 | 10 | nonlinear | 0 | 0 | 3 | 0 | 3 |

g15 | 3 | quadratic | 0 | 0 | 1 | 1 | 2 |

g16 | 5 | nonlinear | 4 | 34 | 0 | 0 | 4 |

g17 | 6 | nonlinear | 0 | 0 | 0 | 4 | 4 |

g18 | 9 | quadratic | 0 | 13 | 0 | 0 | 6 |

g19 | 15 | nonlinear | 0 | 5 | 0 | 0 | 0 |

g20 | 24 | linear | 0 | 6 | 2 | 12 | 16 |

g21 | 7 | linear | 0 | 1 | 0 | 5 | 6 |

g22 | 22 | linear | 0 | 1 | 8 | 11 | 19 |

g23 | 9 | linear | 0 | 2 | 3 | 1 | 6 |

g24 | 2 | linear | 0 | 2 | 0 | 0 | 2 |

Prob. . | $n$ . | Type of $f$ . | LI . | NI . | LE . | NE . | $mact$ . |
---|---|---|---|---|---|---|---|

g01 | 13 | quadratic | 9 | 0 | 0 | 0 | 6 |

g02 | 20 | nonlinear | 0 | 2 | 0 | 0 | 1 |

g03 | 10 | polynomial | 0 | 0 | 0 | 1 | 1 |

g04 | 5 | quadratic | 0 | 6 | 0 | 0 | 2 |

g05 | 4 | cubic | 2 | 0 | 0 | 3 | 3 |

g06 | 2 | cubic | 0 | 2 | 0 | 0 | 2 |

g07 | 10 | quadratic | 3 | 5 | 0 | 0 | 6 |

g08 | 2 | nonlinear | 0 | 2 | 0 | 0 | 0 |

g09 | 7 | polynomial | 0 | 4 | 0 | 0 | 2 |

g10 | 8 | linear | 3 | 3 | 0 | 0 | 6 |

g11 | 2 | quadratic | 0 | 0 | 0 | 1 | 1 |

g12 | 3 | quadratic | 0 | 1 | 0 | 0 | 0 |

g13 | 5 | nonlinear | 0 | 0 | 0 | 3 | 3 |

g14 | 10 | nonlinear | 0 | 0 | 3 | 0 | 3 |

g15 | 3 | quadratic | 0 | 0 | 1 | 1 | 2 |

g16 | 5 | nonlinear | 4 | 34 | 0 | 0 | 4 |

g17 | 6 | nonlinear | 0 | 0 | 0 | 4 | 4 |

g18 | 9 | quadratic | 0 | 13 | 0 | 0 | 6 |

g19 | 15 | nonlinear | 0 | 5 | 0 | 0 | 0 |

g20 | 24 | linear | 0 | 6 | 2 | 12 | 16 |

g21 | 7 | linear | 0 | 1 | 0 | 5 | 6 |

g22 | 22 | linear | 0 | 1 | 8 | 11 | 19 |

g23 | 9 | linear | 0 | 2 | 3 | 1 | 6 |

g24 | 2 | linear | 0 | 2 | 0 | 0 | 2 |

### 7.2 ARCH vs Active-Set ES

In this section, we compare ARCH with the active-set ES (Arnold, 2017).

#### 7.2.1 Settings

We follow the experimental setup of Arnold (2017). Note that the test problems used in Arnold (2017), known as the Michalewicz/Schoenauer (1996) test set are equivalent to the first 11 constrained problems in the CEC 2006 testbed. The initial mean vector $m(0)$ is sampled from the uniform distribution $U[LB,UB]$, and then projected onto the boundary of the feasible domain by the repair operator defined by each CHT if the sampled point is infeasible. The initial step size and covariance matrix are set to $\sigma (0)=0.2min{UB-LB}$ and $C(0)=In$, respectively. A run is terminated if the algorithm reaches 1200 iterations or locates a feasible candidate solution $x$ with the objective value $f(x)<f*+\u025b|f*|$, where $f*$ is the optimal objective function value, as reported in Liang et al. (2006), and $\u025b\u2208{10-4,10-8}$ is referred to as the target accuracy. The other parameters of the CMA-ES are set to the default values listed in Table 1.

#### 7.2.2 Results and Discussion

We conduct 100 runs of each algorithm for each problem. The results are summarized in Table 4. The success rate is defined as the number of runs in which the algorithm can reach the target within 1200 iterations, divided by the total number of runs (100).

. | Target Accuracy . | Active-set ES . | ARCH . | |||
---|---|---|---|---|---|---|

Prob. . | $\u025b$ . | NFES (# Iterations) . | SR . | NFES . | # Iterations . | SR . |

$g01$ | $10-4$ | 30 | 61% | 154 | $14$ | 96% |

$10-8$ | 30 | 62% | 154 | $14$ | 96% | |

$g02$ | $10-4$ | – | 0% | – | – | 0% |

$10-8$ | – | 0% | – | – | 0% | |

$g03$ | $10-4$ | 463 | 100% | 900 | $90$ | 100% |

$10-8$ | 863 | 100% | 1720 | $172$ | 100% | |

$g04$ | $10-4$ | 22 | 100% | 176 | 22 | 100% |

$10-8$ | 24 | 100% | 176 | $22$ | 100% | |

$g05$ | $10-4$ | $36$ | 100% | 624 | 78 | 97% |

$10-8$ | $82$ | 100% | 1160 | 145 | 41% | |

$g06$ | $10-4$ | 5 | 100% | 6 | $1$ | 100% |

$10-8$ | 5 | 100% | 6 | $1$ | 100% | |

$g07$ | $10-4$ | 325 | 100% | 1635 | $163.5$ | 100% |

$10-8$ | 557 | 100% | 2705 | $270.5$ | 100% | |

$g08$ | $10-4$ | 107 | 38% | 510 | $85$ | 57% |

$10-8$ | 210 | 44% | 636 | $106$ | 57% | |

$g09$ | $10-4$ | 307 | 100% | 846 | $94$ | 100% |

$10-8$ | 582 | 100% | 1620 | $180$ | 100% | |

$g10$ | $10-4$ | 117 | 100% | 580 | $58$ | 100% |

$10-8$ | $236$ | 100% | 2985 | 298.5 | 100% | |

$g11$ | $10-4$ | 25 | 100% | 60 | $10$ | 100% |

$10-8$ | 73 | 100% | 258 | $43$ | 100% |

. | Target Accuracy . | Active-set ES . | ARCH . | |||
---|---|---|---|---|---|---|

Prob. . | $\u025b$ . | NFES (# Iterations) . | SR . | NFES . | # Iterations . | SR . |

$g01$ | $10-4$ | 30 | 61% | 154 | $14$ | 96% |

$10-8$ | 30 | 62% | 154 | $14$ | 96% | |

$g02$ | $10-4$ | – | 0% | – | – | 0% |

$10-8$ | – | 0% | – | – | 0% | |

$g03$ | $10-4$ | 463 | 100% | 900 | $90$ | 100% |

$10-8$ | 863 | 100% | 1720 | $172$ | 100% | |

$g04$ | $10-4$ | 22 | 100% | 176 | 22 | 100% |

$10-8$ | 24 | 100% | 176 | $22$ | 100% | |

$g05$ | $10-4$ | $36$ | 100% | 624 | 78 | 97% |

$10-8$ | $82$ | 100% | 1160 | 145 | 41% | |

$g06$ | $10-4$ | 5 | 100% | 6 | $1$ | 100% |

$10-8$ | 5 | 100% | 6 | $1$ | 100% | |

$g07$ | $10-4$ | 325 | 100% | 1635 | $163.5$ | 100% |

$10-8$ | 557 | 100% | 2705 | $270.5$ | 100% | |

$g08$ | $10-4$ | 107 | 38% | 510 | $85$ | 57% |

$10-8$ | 210 | 44% | 636 | $106$ | 57% | |

$g09$ | $10-4$ | 307 | 100% | 846 | $94$ | 100% |

$10-8$ | 582 | 100% | 1620 | $180$ | 100% | |

$g10$ | $10-4$ | 117 | 100% | 580 | $58$ | 100% |

$10-8$ | $236$ | 100% | 2985 | 298.5 | 100% | |

$g11$ | $10-4$ | 25 | 100% | 60 | $10$ | 100% |

$10-8$ | 73 | 100% | 258 | $43$ | 100% |

Comparing the median number of $f$-calls for reaching the same accuracy, $\u025b$, in Table 4, we can observe that the active-set ES achieves the target accuracy with a lower number of $f$-calls. However, the median number of iterations is lower for ARCH, as it is the population-based approach. As the number of candidate solutions per iteration is increased, we expect the number of iterations to decrease. This is an advantage of ARCH when the candidate solutions can be evaluated on $f$ in parallel.

Focusing on problems g01 and g08, ARCH exhibits higher success rates than the active-set ES. This is possibly because ARCH is combined with the $(\mu ,\lambda )$ type ES, rather than the (1+1)-ES. As these problems have local minima, ARCH can reach the target at a higher rate than the active-set ES, which is (1+1)-ES based. None of the algorithms can solve problem g02, because this problem has a strongly multimodal landscape.

Another advantage that does not appear in this experiment is that we incorporate the covariance matrix adaptation, which is desirable for ill-conditioned and non-separable problems.

### 7.3 ARCH vs AL and MCR

#### 7.3.1 Settings

In this experiment, we use all 24 problems in the CEC 2006 testbed. Following the default setup in Liang et al. (2006), we regard each run as successful if a feasible candidate solution $x$ with the objective function value $f(x)-f*\u2a7d10-4$ is located within $5\xd7105$$f$-calls, where $f*$ is the optimal objective value.

Because multimodal functions exist in the CEC 2006 testbed, we use the BIPOP restart strategy (Hansen, 2009), which updates the population size and initial step size for every restart. Instead of randomly sampling an initial mean vector $m(0)$ from the entire search space, we randomly sample one feasible solution from the above prepared feasible set in each restart. The initial step size and covariance matrix are set to $\sigma (0)=exp1n\u2211i=1nln[UB]i-[LB]i5$ and $C(0)=diag(UB-LB5\sigma (0))2$, respectively, with the step size scaled by the BIPOP strategy in each restart. The other parameters of the CMA-ES are set to the default values listed in Table 1, and we follow the restart condition described in Hansen (2009).

#### 7.3.2 Results and Discussion

All results are summarized in Table 5. The statistical significance is tested using the two-sided Mann–Whitney rank test, with a significance level of $5%/m$, where $m=42$ is the number of tests (Bonferroni correction).^{6}

. | AL . | MCR . | ARCH . | |||
---|---|---|---|---|---|---|

Prob. . | NFES . | SR . | NFES . | SR . | NFES . | SR . |

g01 | – | 0% | $320245(4.92)$ | 100% | $737(0.38)\u2218$ | 100% |

g02 | – | 0% | – | 0% | – | 0% |

g03 | – | 0% | $10670(0.33)$ | 100% | $1030(0.75)\u2218$ | 100% |

g04 | – | 0% | $7728(0.25)$ | 100% | $120(0.21)\u2218$ | 100% |

g05 | $22056(10.00)+$ | 100% | $269447(47.96)$ | 8% | $737(1.46)\u2218*$ | 100% |

g06 | $8981(43.50)$ | 4% | $1332(0.00)+$ | 100% | $90(3.62)\u2218*$ | 100% |

g07 | $14415(2.42)$ | 100% | $19860(0.17)$ | 100% | $2782(3.88)\u2218*$ | 100% |

g08 | $468(0.46)$ | 100% | $162(0.17)$ | 100% | $390(3.08)$ | 100% |

g09 | $2079(0.00)+$ | 100% | $5328(0.00)$ | 100% | $1791(2.42)\u2218$ | 100% |

g10 | – | 0% | $41780(1.12)$ | 100% | $2990(0.33)\u2218$ | 100% |

g11 | $294(0.00)+$ | 100% | $3204(0.88)$ | 100% | $54(0.17)\u2218*$ | 100% |

g12 | $15601(6.08)$ | 100% | $2870(1.38)+$ | 100% | $902(6.21)\u2218*$ | 100% |

g13 | $14208(2.12)$ | 100% | – | 0% | $2686(4.96)*$ | 100% |

g14 | $418341(16.17)$ | 12% | $16830(0.46)+$ | 100% | $2172(2.50)\u2218*$ | 100% |

g15 | $2317(0.42)+$ | 100% | $72239(11.17)$ | 100% | $175(0.92)\u2218*$ | 100% |

g16 | $5816(0.54)$ | 100% | $8664(0.46)$ | 100% | – | 0% |

g17 | $29400(5.92)$ | 100% | – | 0% | $1814(4.17)*$ | 100% |

g18 | $11490(2.75)$ | 100% | $8810(1.58)$ | 100% | $3939(6.83)\u2218*$ | 100% |

g19 | – | 0% | $127308(1.38)$ | 100% | $5772(0.08)\u2218$ | 100% |

g20 | – | 0% | – | 0% | – | 0% |

g21 | – | 0% | – | 0% | – | 0% |

g22 | – | 0% | – | 0% | – | 0% |

g23 | – | 0% | $96078(34.38)$ | 4% | $9386(4.62)\u2218$ | 100% |

g24 | $7422(3.33)$ | 100% | $600(0.21)+$ | 100% | $54(0.38)\u2218*$ | 100% |

. | AL . | MCR . | ARCH . | |||
---|---|---|---|---|---|---|

Prob. . | NFES . | SR . | NFES . | SR . | NFES . | SR . |

g01 | – | 0% | $320245(4.92)$ | 100% | $737(0.38)\u2218$ | 100% |

g02 | – | 0% | – | 0% | – | 0% |

g03 | – | 0% | $10670(0.33)$ | 100% | $1030(0.75)\u2218$ | 100% |

g04 | – | 0% | $7728(0.25)$ | 100% | $120(0.21)\u2218$ | 100% |

g05 | $22056(10.00)+$ | 100% | $269447(47.96)$ | 8% | $737(1.46)\u2218*$ | 100% |

g06 | $8981(43.50)$ | 4% | $1332(0.00)+$ | 100% | $90(3.62)\u2218*$ | 100% |

g07 | $14415(2.42)$ | 100% | $19860(0.17)$ | 100% | $2782(3.88)\u2218*$ | 100% |

g08 | $468(0.46)$ | 100% | $162(0.17)$ | 100% | $390(3.08)$ | 100% |

g09 | $2079(0.00)+$ | 100% | $5328(0.00)$ | 100% | $1791(2.42)\u2218$ | 100% |

g10 | – | 0% | $41780(1.12)$ | 100% | $2990(0.33)\u2218$ | 100% |

g11 | $294(0.00)+$ | 100% | $3204(0.88)$ | 100% | $54(0.17)\u2218*$ | 100% |

g12 | $15601(6.08)$ | 100% | $2870(1.38)+$ | 100% | $902(6.21)\u2218*$ | 100% |

g13 | $14208(2.12)$ | 100% | – | 0% | $2686(4.96)*$ | 100% |

g14 | $418341(16.17)$ | 12% | $16830(0.46)+$ | 100% | $2172(2.50)\u2218*$ | 100% |

g15 | $2317(0.42)+$ | 100% | $72239(11.17)$ | 100% | $175(0.92)\u2218*$ | 100% |

g16 | $5816(0.54)$ | 100% | $8664(0.46)$ | 100% | – | 0% |

g17 | $29400(5.92)$ | 100% | – | 0% | $1814(4.17)*$ | 100% |

g18 | $11490(2.75)$ | 100% | $8810(1.58)$ | 100% | $3939(6.83)\u2218*$ | 100% |

g19 | – | 0% | $127308(1.38)$ | 100% | $5772(0.08)\u2218$ | 100% |

g20 | – | 0% | – | 0% | – | 0% |

g21 | – | 0% | – | 0% | – | 0% |

g22 | – | 0% | – | 0% | – | 0% |

g23 | – | 0% | $96078(34.38)$ | 4% | $9386(4.62)\u2218$ | 100% |

g24 | $7422(3.33)$ | 100% | $600(0.21)+$ | 100% | $54(0.38)\u2218*$ | 100% |

The MCR and AL are CHTs that are often used for simulation-based constraints. They do not exploit the fact that the constraints are explicit, but assume that $f$ is defined on the infeasible domain. With the exception of problems g08 and g16, ARCH overwhelmingly outperforms the MCR and AL, which do not explicitly utilize the fact that $g$-calls are computationally cheaper than $f$-calls, and perform as many $f$-calls as $g$-calls. This indicates that ARCH can efficiently exploit the fact that the constraints are explicit, even if the objective function is defined on the infeasible domain.

ARCH does not reach the target for problems g02, g16, and g20 to g22. For g02, g20, g21, and g22, no algorithms can reach the target value. Problem g02 has a strongly multimodal landscape, and it is difficult to locate the global optimum among many local optima. Problems g20 to g22 have many nonlinear equality constraints, and it is difficult to grasp the global landscape without using the function values outside of the feasible domain. In particular, a feasible solution has not yet been determined for problem g20 (Liang et al., 2006). Problem g16 is a unique problem that can be solved by the AL and MCR, but not ARCH. In ARCH, the internal optimizer (SLSQP) used in the repair operation fails to locate the solution. This may occur if there are too many complex constraints.

## 8 Conclusions

A constrained continuous optimization problem has been addressed in this study, in which the constraints are assumed to be explicitly written as mathematical expressions and their evaluation time is negligible compared to that of the objective function. We do not assume that the objective function values are defined outside of the feasible domain. Our proposed CHT for the CMA-ES, known as ARCH, can handle explicit and nonlinear constraints. ARCH is designed to be invariant to any element-wise transformation of the objective and constraint functions, as well as to any affine transformation of the search space coordinate, so as to preserve the invariance properties of the underlying CMA-ES. The invariance properties are mathematically proven and empirically validated. To the best of the authors' knowledge, this is the first approach for explicit constraints that is invariant to those transformations.

Two sets of experiments revealed the effectiveness of ARCH. The first demonstrated that the convergence speed of ARCH improved compared with BCH, in the especially ill-conditioned and nonseparable objective function while it is almost the same as in the well-conditioned problem. These results could be attributed to the fact that adapting the coefficient $\alpha $ allows for faster adaptation of the covariance matrix $C$, and the affine invariance allows for a search behavior that resembles one on the well-conditioned function. The second experiment revealed that ARCH is overwhelmingly more efficient than the CHTs that do not exploit the explicit constraints. This implies that the explicit constraints should be exploited in the constraint handling, which ARCH effectively achieves. However, note that the CHTs used in the comparison are limited to those available to the CMA-ES. Further comparison with state-of-the-art methods is necessary to show the usefulness of ARCH among existing approaches. On the other hand, the comparison to the active-set (1+1)-ES showed a possibility to further improve the efficacy of ARCH in terms of the number of the objective function calls. Incorporating the idea of the active-set ES into the repair operator in ARCH is a possible direction of a future work.

ARCH is designed to solve the constrained continuous optimization problems where the constraints are all explicit. Box-constrained optimization problems are such examples, where ARCH exhibits superior performance to an existing box constraint handling technique. However, it is often the case that there exist explicit constraints and simulation-based constraints at the same time, where the explicit constraints are prerequisites to a simulator that computes the objective function value and the simulation-based constraint violation values. ARCH is expected to be inefficient for such problems as it requires a substantial amount of constraint evaluations. To deal with the combination of these two types of constraints, one might need to combine an explicit constraint handling such as ARCH and a simulation-based constraint handling such as the multiple constraint ranking (MCR) technique. The invariance properties of ARCH will play an important role to preserve the invariance properties of the entire search algorithm when two CHTs are combined. This is an important direction of future work.

## Notes

^{1}

Our implementation of ARCH is available in a GitHub repository (https://github.com/naoking158/ARCH).

^{2}

Note that we introduce $\gamma \sigma $ and $\gamma c$ that does not appear in a standard formulation in order to treat the initialization effect of the evolution paths and write it short. See Akimoto and Hansen (2020) for more detail.

^{3}

This transformation is applied after the transformation from the equality constraint to the inequality one described in the introduction is performed.

^{4}

The invariance of CMA-ES can be proven by using $\u2225C(t)-1pc\u2225$ instead of $\u2225p\sigma \u2225$ of the step size update formula in Section 2.1 (note the meaning changes because what is accumulated is not z). However, the step size formula using $p\sigma $ is generally employed, and no adverse effect of this difference has been empirically observed. Therefore, we use $p\sigma $ as well in this article.

^{5}

We observed in preliminary experiments that the best result was obtained by using Eq. (13).

^{6}

Since the tests have been performed excluding the results where no runs reached the target, the number of tests was 42, not $24\xd73=72$.

## Acknowledgments

This work was supported by the JSPS KAKENHI under Grant Numbers 19H04179 and 19J21892.

## References

*Theoretical Computer Science*,

*Evolutionary Computation*,

*Theoretical Computer Science*,

*International Journal for Simulation and Multidisciplinary Design Optimization*,

*Computers & Structures*, 187:

*Lecture Notes–Monograph Series. Institute of Mathematical Statistics*,

*A software package for sequential quadratic programming*. Technical Report DFVLR-FB 88-28, DLR German Aerospace Center---Institute for Flight Mechanics, Koln, Germany.