## Abstract

Pareto-based multi-objective evolutionary algorithms face great challenges in solving many-objective optimization problems due to their inability to maintain both convergence and diversity in a high-dimensional objective space. Existing approaches usually modify the selection criteria to overcome this issue. Different from them, we propose a novel meta-objective (MeO) approach that transforms a many-objective optimization problem into a new problem that is easier for Pareto-based algorithms to solve. MeO converts a given many-objective optimization problem into a new one that has the same Pareto optimal solutions and the same number of objectives as the original one. Each meta-objective in the new problem consists of two components, which measure the convergence and diversity performances of a solution, respectively. Since MeO only converts the problem formulation, it can be readily incorporated into any multi-objective evolutionary algorithm, including non-Pareto-based ones. In particular, it can boost the ability of Pareto-based algorithms to solve many-objective optimization problems. Because the convergence and diversity performances of a solution are evaluated separately, traditional density-based selection criteria, for example, crowding distance, will no longer mistake a solution with poor convergence performance for a solution with a low density value. By penalizing a solution in terms of its convergence performance in the meta-objective space, Pareto dominance becomes much more effective for many-objective optimization problems. A comparative study validates the competitive performance of the proposed meta-objective approach in solving many-objective optimization problems.

## 1 Introduction

A great number of real-world applications involve multiple objectives to be optimized simultaneously. These Multi-objective Optimization Problems (MOPs) are common in a variety of disciplines, such as industrial scheduling (Han et al., 2015, 2017; Li et al., 2018). The conflict among objectives implies that there is no single optimal solution to an MOP, but rather a set of trade-off solutions, known as the Pareto optimal solutions.

Over the past two decades, a large number of Multi-Objective Evolutionary Algorithms (MOEAs) have been proposed. Compared to traditional mathematical programming techniques, MOEAs are particularly suited to searching for the Pareto optimal set in a single run. Generally, the task of MOEAs is to find solutions that are not only close to the Pareto optimal front (i.e., convergence) but also uniformly distributed (i.e., diversity).

Among various methods, Pareto-based MOEAs are the most popular, for example, Nondominated Sorting Genetic Algorithm II (NSGA-II) (Deb et al., 2002) and Strength Pareto Evolutionary Algorithm 2 (SPEA2) (Zitzler et al., 2001). These algorithms first adopt the Pareto dominance principle to select nondominated solutions, which are preferred for good convergence. Then a density-based selection criterion is employed to promote diversity among the solutions. Pareto-based MOEAs have proven successful in solving a large number of MOPs.

However, for MOPs with more than three objectives, which are often referred to as Many-objective Optimization Problems (MaOPs) (He and Yen, 2016; Liu, Gong, Sun, and Jin, 2017; Liu et al., 2018b), the performance of Pareto-based MOEAs generally deteriorates appreciably. One main reason is that the Pareto dominance-based selection criterion becomes ineffective: the proportion of nondominated solutions in the population rises considerably as the number of objectives increases, so the density-based criterion alone plays the crucial role in environmental and mating selection. However, the density-based criterion often fails in a high-dimensional space, since a solution far away from the Pareto optimal front tends to lie in a sparse region and therefore receives a lower density value, which makes it more likely to be selected.

Intuitively, there are two approaches to improving the performance of Pareto-based MOEAs in solving MaOPs: (1) redefining the dominance relationship (Laumanns et al., 2002; Jaimes et al., 2010; Sato et al., 2011; Yang et al., 2013; He et al., 2014); (2) improving the diversity-based secondary selection criterion or replacing it with another selection criterion (Deb and Jain, 2013; Li et al., 2014; Zhang et al., 2015). These approaches modify either the Pareto dominance or density-based selection criterion, and have shown great potential in many-objective optimization.

In this study, different from the existing approaches that modify the selection criteria in Pareto-based MOEAs, we propose a novel Meta-Objective approach (MeO for short) that modifies the many-objective optimization problem itself. With the proposed method, a many-objective optimization problem is transformed into a new optimization problem that has the same Pareto optimal solution set as the original one but is easier for Pareto-based MOEAs to solve. As mentioned above, the design goals of Pareto-based MOEAs are to utilize the Pareto dominance and density-based selection criteria to search for solutions with good convergence and diversity, respectively. However, these goals are often compromised in many-objective optimization, due to the low efficiency of Pareto dominance and the misleading convergence information in density estimation. In our proposed meta-objective approach, these goals become easier to reach when solving the new optimization problem. To be specific, the new optimization problem has the same number of objectives as the original optimization problem. Each meta-objective in the new optimization problem consists of two components, which measure the convergence and diversity performances of a solution, respectively. The convergence component is the same in every meta-objective and represents the distance of a solution to the Pareto optimal front. On the other hand, the diversity components differ across the meta-objectives and represent the position of the solution with respect to different orientations. The diversity components lead to conflicts among the meta-objectives. Since all the meta-objectives share the same convergence component, a solution closer to the Pareto optimal front, that is, with better convergence performance, is more likely to dominate another solution in the meta-objective space than in the original objective space.
For the same reason, the density-based selection criterion can focus on the diversity components and will no longer be misguided by convergence information. Therefore, Pareto-based MOEAs are better equipped to solve MaOPs when MeO is incorporated.

The main contributions of this work can be summarized as follows:

First, a meta-objective method, termed MeO, is proposed, which transforms a given MaOP into a new one. Each meta-objective in the new MaOP has two components, which measure the convergence and diversity performances of a solution, respectively. Solving the new MaOP by an MOEA leads to a solution set with good overall performance. This method provides a new way of managing convergence and diversity in many-objective optimization, and it can be seamlessly integrated into any MOEA, including non-Pareto-based ones. Moreover, it is computationally cheap and does not require a set of predefined reference points or vectors.

Next, since the true Pareto optimal front is unknown in practice, three alternatives to the convergence component are proposed, based on the Pareto rank method, the $L_p$ scalarizing method, and the double-rank method, respectively. The first two are feasible but ineffective. Therefore, the third one is recommended, as it has the following advantages: 1) it is theoretically consistent with the Pareto dominance criterion; 2) it provides adequate selection pressure in the high-dimensional objective space; and 3) it is insensitive to the settings of control parameters.

Last but most importantly, MeO provides a general extension of Pareto-based algorithms to many-objective optimization. By integrating MeO, the performance of Pareto-based MOEAs, for example, NSGA-II and SPEA2, is significantly boosted in solving MaOPs. On the one hand, the density-based selection criteria in these algorithms will no longer incorrectly estimate a solution's density value, since the convergence and diversity performances of a solution are evaluated separately in the meta-objective space. On the other hand, Pareto dominance becomes much more effective, since the meta-objective values of a solution are penalized in terms of its convergence performance.

The remainder of this article is organized as follows. Section 2 gives fundamental definitions in multi-objective optimization for completeness and elaborates on the motivation of this work. The proposed MeO is then described in detail in Section 3. Section 4 demonstrates the effectiveness of MeO by incorporating it into two representative MOEAs. The comparison of MeO with state-of-the-art MOEAs is given in Section 5. Section 6 concludes the article and provides pertinent observations and future research directions.

## 2 Preliminaries

### 2.1 Basic Definitions in Multi-Objective Optimization

In multi-objective optimization, the following concepts have been well defined and widely applied.

**Pareto dominance:** For any two different solutions $x^1, x^2 \in S$ to the problem in (1), if $f_m(x^1) \le f_m(x^2)$ for all $m = 1, \dots, M$, and $f_i(x^1) < f_i(x^2)$ for some $i \in \{1, \dots, M\}$, then $x^1$ dominates $x^2$, denoted as $x^1 \prec x^2$.

**Pareto optimal set:** For a solution $x^* \in S$ to the problem in (1), if there is no $x \in S$ satisfying $x \prec x^*$, then $x^*$ is a Pareto optimal solution. All such solutions form a set, often called the Pareto optimal solution set ($PS$).

**Pareto optimal front:** The image of the Pareto optimal solution set in the objective space is known as the Pareto optimal front ($PF$).

**Ideal point:** The ideal point is $z^* = (z_1^*, \dots, z_M^*)^T$, where $z_m^*$ is the infimum of $f_m(x)$, $m = 1, \dots, M$.
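The definitions above translate directly into code. The following minimal Python sketch (the function names are hypothetical choices, not from the article) checks Pareto dominance under minimization, filters the nondominated members of a finite set, and estimates the ideal point by the component-wise minimum of the sampled objective vectors; over a finite sample this is only an estimate of the infimum in the definition.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(f1: Sequence[float], f2: Sequence[float]) -> bool:
    """True if objective vector f1 Pareto-dominates f2 under minimization."""
    no_worse = all(a <= b for a, b in zip(f1, f2))
    strictly_better = any(a < b for a, b in zip(f1, f2))
    return no_worse and strictly_better


def nondominated(points: List[Vector]) -> List[Vector]:
    """Return the points that no other point in the list dominates."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


def ideal_point_estimate(points: List[Vector]) -> List[float]:
    """Component-wise minimum over a finite sample, used as an estimate of z*."""
    return [min(column) for column in zip(*points)]


pts = [(1.0, 3.0), (3.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(nondominated(pts))          # (3.0, 3.0) is dropped: (2.0, 2.0) dominates it
print(ideal_point_estimate(pts))  # [1.0, 1.0]
```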

### 2.2 Many-Objective Evolutionary Optimization

As mentioned in Section 1, many-objective optimization problems have more than three objectives. For traditional Pareto-based MOEAs, the ability to solve MaOPs is usually improved by the following two approaches. The first is redefining the dominance relationship. A large body of research along this line has been reported, for example, dominance area control (Sato et al., 2011), Pareto partial dominance (Jaimes et al., 2010), $\varepsilon$-dominance (Laumanns et al., 2002), and fuzzy Pareto dominance (He et al., 2014). These modified dominance relationships have been shown to improve the convergence of MOEAs for solving MaOPs. Recently, a Grid-based Evolutionary Algorithm (GrEA) (Yang et al., 2013) was developed based on the concept of $\varepsilon$-dominance and showed encouraging performance in many-objective optimization. The second is improving the diversity-based secondary selection criterion or replacing it with another selection criterion. In Laumanns et al. (2002), a diversity management operator is adopted to meet the requirement on diversity in selection. By shifting the position of a solution in sparse regions, a Shift-based Density Estimation (SDE) was proposed to discard solutions with poor convergence (Li et al., 2014). NSGA-III, an extension of NSGA-II, was developed in Deb and Jain (2013); it employs a reference-point-based selection criterion to promote diversity. Recently, a Knee point-driven Evolutionary Algorithm (KnEA) (Zhang et al., 2015) was proposed, which prefers local knee solutions among nondominated ones.

In addition to enhanced Pareto-based MOEAs, decomposition-based MOEAs have been found to be a promising alternative for solving MaOPs. A well-regarded representative of decomposition-based algorithms is MOEA/D (Zhang and Li, 2007). In MOEA/D, a set of well-distributed reference vectors is first defined. Based on these reference vectors, a given MaOP is aggregated into a number of scalarizing functions. Individuals in a population are guided to search towards the Pareto optimal front in the directions specified by the reference vectors by minimizing the values of the scalarizing functions. MOEA/D has been demonstrated to be efficient in solving MaOPs. However, one known problem of MOEA/D is that uniformly distributed reference vectors do not necessarily lead to uniformly distributed solutions, particularly for problems with irregular (i.e., nonuniform) Pareto optimal fronts. Several methods for adaptively generating reference vectors have been proposed to address this issue (Qi et al., 2014; Liu, Gong, Sun, and Zhang, 2017). In addition, Ensemble Fitness Ranking with a Ranking Restriction scheme (EFR-RR) (Yuan et al., 2016) provides another means of achieving uniformly distributed solutions in a high-dimensional objective space, in which only solutions close to a reference vector in the objective space are further considered.

Indicator-Based Evolutionary Algorithms (IBEAs) (Zitzler and Künzli, 2004) are theoretically well-supported alternatives to Pareto-based MOEAs. They adopt a performance indicator that accounts for both the convergence and diversity of a solution, and solutions with the best indicator values are preferred. Among others, hypervolume is a widely used indicator in IBEAs. Unfortunately, the computational complexity of calculating the hypervolume increases exponentially with the number of objectives, which makes it computationally prohibitive for MaOPs. To address this issue, HypE (Bader and Zitzler, 2011) uses Monte Carlo simulation to estimate the hypervolume, where the accuracy of the estimate can be traded off against the available computational resources. Recently, an improved Many-Objective Metaheuristic Based on the R2 Indicator (MOMBI-II) (Hernández Gómez and Coello Coello, 2015) was proposed to further enhance the efficiency of IBEAs for solving MaOPs.

From the above, we can see that various algorithms have been developed for solving MaOPs. Alternatively, if we can extract some characteristics from an MaOP and transform it into a new optimization problem, it may become easier for existing algorithms to obtain the Pareto optimal solutions.

### 2.3 Meta-Objective Methods in Evolutionary Optimization

The meta-objective approach has been used in evolutionary optimization, where a meta-objective refers to a new formulation of the original objective(s) or to a characteristic extracted from the optimization problem. In the meta-objective approach, a new optimization problem is formed from the meta-objective(s), which has the same or similar optimal solutions as the original problem. This means we can solve the original problem by solving the new one, especially when we do not have effective means of solving the original one. For example, in constrained evolutionary optimization (Takahama et al., 2005; Wang and Cai, 2012), a solution's degree of constraint violation is regarded as a meta-objective. Optimizing this new objective can lead to feasible solutions to the original constrained optimization problem. As another example, in multimodal evolutionary optimization (Wang et al., 2015), the meta-objectives are constructed based on the original objective and the solutions' positions in the variable space. By solving the new multi-objective problem formed by these meta-objectives, multiple optimal solutions to the original multimodal optimization problem can be found.

Up to now, very few meta-objective methods exist for many-objective evolutionary optimization. Very recently, a Bi-Goal Evolutionary approach (BiGE) (Li et al., 2015) was proposed for solving MaOPs, in which two meta-objectives are constructed in terms of the convergence and diversity performances of a solution. The meta-objective for convergence is the sum of all objective functions, and the meta-objective for diversity is a sharing function based on a niching technique. The experiments showed that a solution set with well-balanced performance on the original MaOP can be achieved by optimizing the revised bi-objective optimization problem. However, the necessity of optimizing convergence and diversity as two conflicting objectives should be further investigated. In the ideal case where both meta-objectives can be minimized simultaneously, only a single solution would be obtained. This suggests that the original and new optimization problems do not have the same Pareto optimal set. Therefore, we may not obtain the whole Pareto optimal set of the original optimization problem by optimizing the new one.

In this study, we propose a novel meta-objective approach for many-objective optimization, where the meta-objectives are also constructed based on convergence and diversity. Different from Li et al. (2015), convergence and diversity are not regarded as conflicting objectives, but as components of every meta-objective. Each meta-objective has the same convergence component, so the Pareto optimal solutions to the original optimization problem have a great advantage in the meta-objective space. In addition, the diversity components of the meta-objectives differ, which leads to conflicts among the meta-objectives. In this way, the new optimization problem has a diverse Pareto optimal set, which is the same as that of the original optimization problem. We will prove this in Subsection 3.1.

### 2.4 Motivation

As mentioned in the last subsection, a meta-objective can be constructed based on characteristics extracted from the optimization problem. What distinguishes a multi-objective optimization problem from a single-objective one is that an optimizer must simultaneously consider not only the convergence but also the diversity performance of solutions. Therefore, constructing meta-objectives in terms of these two performances is a natural idea. In this study, we propose a meta-objective approach to convert the given MOP formulated by (1) into a new MOP. When constructing this new MOP, we must ensure that the meta-objectives conflict with each other. Otherwise, the new MOP would inherently be a single-objective optimization problem, and it would be difficult to obtain diverse solutions by solving it.

Recently, there has been growing interest in balancing convergence and diversity in multi- and many-objective optimization (Adra and Fleming, 2011; Li et al., 2012, 2015; Yuan et al., 2016; Cheng et al., 2015). Some researchers argue that convergence and diversity are two conflicting requirements. In particular, these requirements are formulated as a bi-objective optimization problem in Li et al. (2015). In fact, the need to balance the convergence and diversity performances of solutions arises because the selection criteria involve both performances. For instance, the PBI (Penalty-Boundary Intersection) function in MOEA/D is composed of $d_1$ and $d_2$, which measure the distance of a solution to the ideal point along the reference vector (i.e., convergence measure) and its perpendicular distance to the reference vector (i.e., diversity measure), respectively. Consequently, a solution with a smaller value of $d_1$ but a larger value of $d_2$ might be preferred. Selecting such a solution does not help maintain diversity in a high-dimensional objective space. Therefore, MOEA/D may not perform well on some MaOPs, since it lacks a strategy for balancing $d_1$ and $d_2$. To overcome this shortcoming of MOEA/D, EFR-RR (Yuan et al., 2016) only considers solutions with smaller values of $d_2$. As mentioned in the last subsection, the meta-objective for convergence in BiGE (Li et al., 2015) is the sum of all the objectives. Since the contour curves of this meta-objective usually differ from the geometric shape of the Pareto optimal front, optimizing it alone drives solutions toward specific regions, which means that it inherently carries diversity information. This is why the meta-objective for convergence can be traded off against the other meta-objective for diversity. What actually conflicts is the diversity information in the two meta-objectives, not convergence and diversity.
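To make the PBI trade-off concrete, here is a short sketch of the standard MOEA/D PBI function, written independently of MeO. It assumes the usual definitions: $d_1 = (f(x)-z^*)^T\lambda/\|\lambda\|$, $d_2$ is the perpendicular distance of $f(x)-z^*$ from the line spanned by the reference vector $\lambda$, and the aggregated value is $d_1 + \theta d_2$. The penalty $\theta = 5$ is a common default, not a value taken from this article, and the two test points are illustrative.

```python
import math
from typing import Sequence, Tuple


def pbi_components(f: Sequence[float], z: Sequence[float],
                   lam: Sequence[float]) -> Tuple[float, float]:
    """(d1, d2) of the PBI function for objective vector f, ideal point z
    and reference vector lam."""
    diff = [fi - zi for fi, zi in zip(f, z)]
    norm = math.sqrt(sum(l * l for l in lam))
    unit = [l / norm for l in lam]
    # d1: length of the projection of f - z onto the reference direction
    d1 = sum(d * u for d, u in zip(diff, unit))
    # d2: perpendicular distance from f - z to the reference line
    d2 = math.sqrt(sum((d - d1 * u) ** 2 for d, u in zip(diff, unit)))
    return d1, d2


def pbi(f, z, lam, theta: float = 5.0) -> float:
    """PBI value d1 + theta * d2; theta is the penalty parameter."""
    d1, d2 = pbi_components(f, z, lam)
    return d1 + theta * d2


z_star, lam = (0.0, 0.0), (1.0, 1.0)
off_vector = (1.0, 0.0)  # smaller d1 (about 0.707) but large d2 (about 0.707)
on_vector = (0.6, 0.6)   # larger d1 (about 0.849) but d2 = 0
for theta in (0.1, 5.0):
    print(theta, pbi(off_vector, z_star, lam, theta), pbi(on_vector, z_star, lam, theta))
```

With the small penalty the off-vector solution wins despite its large $d_2$; with $\theta = 5$ the preference reverses, which is why the balance between $d_1$ and $d_2$ hinges on this parameter.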

From the above discussion, we notice that whether convergence and diversity conflict with each other depends on how they are defined. If solutions located exactly on the Pareto optimal front and uniformly distributed are regarded as having good convergence and diversity performances, then these two performances do not conflict. For example, in Figure 1, when searching for an ideal solution set, improving the convergence or diversity performance of a solution set does not imply that the other performance must be degraded. In Figure 1a, the solutions remain uniformly distributed while approaching the true Pareto optimal front. Similarly, in Figure 1b, diversifying the archived solution set does not degrade its convergence performance. Therefore, convergence and diversity need not be balanced against each other; they can be optimized simultaneously.

There may exist two issues with the above method. The first is that $d(x)$ cannot be calculated, since the true Pareto optimal front is unknown. To address this issue, in Subsection 3.2, we give three alternative measures based on the Pareto rank, the $L_p$ scalarizing function, and the double-rank method for estimating the convergence performance of a solution. The first two are feasible but ineffective. Thus we propose the third method, which takes the advantages and avoids the disadvantages of the first two. The other issue is that the new optimization problem has the same number of objectives as the given one, which means that the new problem is also an MaOP. However, this issue is actually minor. In Subsection 3.4, we explain why the new optimization problem can be easily solved by a traditional Pareto-based MOEA, even though the number of objectives could be large. Additionally, our experimental results in Sections 4 and 5 demonstrate the effectiveness of the proposed method.

## 3 The Proposed Method

### 3.1 The Meta-Objective Method (MeO) for MOPs


For an intuitive understanding, Figure 2b shows how a solution in Figure 2a is mapped from the original objective space to the meta-objective space for a bi-objective problem, where $d(x)$ is short for $d(r, f(x))$ and $\theta_m$ ($m = 1, 2$) for $\theta(v_m, f(x) - z^*)$.

According to the definitions in Subsection 2.1, we have the following two theorems.

**Theorem 5.** Any Pareto optimal solution to the original optimization problem defined in (1) is a Pareto optimal solution to the new optimization problem defined in (4).

*Proof of Theorem 5:*

Since the convergence component $f^C(x^*)$ of a Pareto optimal solution to the original optimization problem must be zero, the theorem is proven if we can show that any $x^*$ with $f^C(x^*) = 0$ is a Pareto optimal solution to the new optimization problem. Suppose this assertion does not hold; then there exists another solution $x'$ that dominates $x^*$ in the meta-objective space. According to Definition 1, $\sum_{m=1}^{M} f'_m(x') < \sum_{m=1}^{M} f'_m(x^*)$, i.e., $\eta M f^C(x') + \sum_{m=1}^{M} f^D_m(x') < \eta M f^C(x^*) + \sum_{m=1}^{M} f^D_m(x^*)$. Since $\sum_{m=1}^{M} f^D_m(x) = \sum_{m=1}^{M} \cos^2\theta(v_m, f(x) - z^*) = \sum_{m=1}^{M} \frac{|f_m(x) - z_m^*|^2}{\|f(x) - z^*\|^2} = 1$ for every $x$, and $\eta > 0$, it follows that $f^C(x') < f^C(x^*) = 0$. This contradicts the fact that $f^C(x') \ge 0$. Consequently, no other solution dominates $x^*$, which means that $x^*$ is a Pareto optimal solution to the new optimization problem. This proves the theorem. $\square$
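The proof relies on the identity $\sum_{m} f^D_m(x) = 1$. The sketch below checks it numerically. It assumes the meta-objective form $f'_m(x) = \eta f^C(x) + f^D_m(x)$, which is how the sums in the proof decompose, with $f^D_m$ taken as the squared direction cosine of $f(x) - z^*$ along the $m$-th axis $v_m$. The convergence value is passed in as a plain number because its estimation is the subject of Subsection 3.2; the function names and inputs are illustrative.

```python
import math
from typing import List, Sequence


def diversity_components(f: Sequence[float], z: Sequence[float]) -> List[float]:
    """f^D_m = cos^2 of the angle between the m-th coordinate axis and f(x) - z*."""
    diff = [fi - zi for fi, zi in zip(f, z)]
    sq_norm = sum(d * d for d in diff)
    if sq_norm == 0.0:
        raise ValueError("f(x) coincides with z*; the angles are undefined")
    return [d * d / sq_norm for d in diff]


def meta_objectives(f: Sequence[float], z: Sequence[float],
                    f_conv: float, eta: float = 1.0) -> List[float]:
    """Assumed form f'_m = eta * f^C(x) + f^D_m(x)."""
    return [eta * f_conv + fd for fd in diversity_components(f, z)]


f_x, z_star = (1.0, 3.0, 2.0), (0.0, 0.0, 0.0)
fd = diversity_components(f_x, z_star)          # [1/14, 9/14, 4/14]
meta = meta_objectives(f_x, z_star, f_conv=0.25, eta=1.0)
print(sum(fd))    # 1, up to rounding
print(sum(meta))  # eta * M * f^C + 1 = 1.75, up to rounding
```

With `f_conv=0.0` the meta-objective values sum to exactly 1 (up to rounding), which is the hyperplane property stated in the next theorem.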

**Theorem 6.** The Pareto optimal front of the new optimization problem is located on the hyperplane $\sum_{m=1}^{M} f'_m = 1$.

*Proof of Theorem 6:*

According to Theorem 5, for any Pareto optimal solution $x^*$, $f^C(x^*) = 0$, so $\sum_{m=1}^{M} f'_m(x^*) = \eta M f^C(x^*) + \sum_{m=1}^{M} f^D_m(x^*) = \sum_{m=1}^{M} f^D_m(x^*) = 1$. This means that $x^*$ is located on the hyperplane $\sum_{m=1}^{M} f'_m = 1$ in the meta-objective space. $\square$

Theorem 5 implies that all Pareto optimal solutions to the original optimization problem can be obtained by successfully solving the new optimization problem. However, there is a critical issue when using this method. Since the true Pareto optimal front is not available a priori, it is impractical to obtain $d(r, f(x))$ and $z^*$. In multi-objective evolutionary optimization, $z_m^*$ can be replaced with the minimum value of $f_m(x)$ found so far during the evolution. Therefore, the major issue is how to estimate $d(r, f(x))$ as the convergence component. In view of this, we present three alternatives to the convergence component, which are described in detail in the following subsection.
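Replacing $z_m^*$ by the best value observed so far can be sketched as a running minimum that is updated once per generation. The function name and the two toy generations are illustrative only.

```python
from typing import List, Optional, Sequence


def update_ideal_point(z: Optional[List[float]],
                       population_objs: Sequence[Sequence[float]]) -> List[float]:
    """Running estimate of z*: the smallest value of each objective seen so far."""
    current = [min(column) for column in zip(*population_objs)]
    if z is None:                       # first generation: nothing to compare with
        return current
    return [min(old, new) for old, new in zip(z, current)]


z_est = None
for population in ([(0.9, 0.4), (0.5, 0.7)],    # generation 1
                   [(0.3, 0.8), (0.6, 0.5)]):   # generation 2
    z_est = update_ideal_point(z_est, population)
print(z_est)  # [0.3, 0.4]
```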

### 3.2 Alternatives to the Convergence Component

#### 3.2.1 The Pareto Rank Method

#### 3.2.2 The $L_p$ Scalarizing Method

The main advantage of the $L_p$ scalarizing method is that it provides greater selection pressure than the Pareto rank method. However, this method has the following three deficiencies. The first is that the process of converting objectives is not strictly consistent with the Pareto dominance criterion. In other words, a solution that is dominated in the original optimization problem may become nondominated in the new optimization problem and be preferentially selected. The second is that this method is very sensitive to the setting of the parameter $\eta$ defined in (5). Obtaining a proper setting of $\eta$ is non-trivial, if not impossible, especially when the problem has a complex Pareto optimal front. Last but not least, although Wang, Zhang et al. (2016) presented a strategy for adaptively selecting an appropriate value of $p$ for MOEA/D, in this study a single scalarizing function cannot properly measure the solutions over the whole Pareto front. The bias induced by its contour curves may drive solutions into specific regions. An illustrative example is provided in Figure 4a. Solutions $A$–$J$ are candidates for a bi-objective optimization problem with a concave Pareto optimal front. Since the geometric shape of the Pareto optimal front is unknown, the $L_p$ scalarizing method with $p = 1$ is simply chosen. Then solutions $A$, $B$, $I$, and $J$ will be preferred, since they have smaller $L_p$ scalarizing function values than the rest. Consequently, solutions in the final solution set may be overly crowded in the edge regions. The double-rank method, presented in the next subsection, overcomes this issue (as shown in Figure 4b).
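The contour bias of a single scalarizing function can be reproduced in a few lines. This sketch assumes an unweighted $L_p$ norm of $f(x) - z^*$ as the scalarizing function, which may differ from the exact form used in the article, and ten illustrative candidates spaced evenly by angle on the concave front $f_1^2 + f_2^2 = 1$; these points are not the coordinates of Figure 4a.

```python
import math
from typing import Sequence


def lp_value(f: Sequence[float], z: Sequence[float], p: float) -> float:
    """Unweighted L_p distance of f(x) from z* (assumed form; p may be math.inf)."""
    diff = [abs(fi - zi) for fi, zi in zip(f, z)]
    if math.isinf(p):
        return max(diff)
    return sum(d ** p for d in diff) ** (1.0 / p)


# Ten candidates at 0, 10, ..., 90 degrees on the concave front f1^2 + f2^2 = 1
front = [(math.cos(k * math.pi / 18), math.sin(k * math.pi / 18)) for k in range(10)]
z_star = (0.0, 0.0)

by_l1 = sorted(range(10), key=lambda i: lp_value(front[i], z_star, 1.0))
by_linf = sorted(range(10), key=lambda i: lp_value(front[i], z_star, math.inf))
print(sorted(by_l1[:4]))    # [0, 1, 8, 9]: p = 1 favours both ends of the front
print(sorted(by_linf[:2]))  # [4, 5]: p = inf favours the middle instead
```

With $p = 1$ the four smallest values sit at the two ends of the front, mirroring the preference for $A$, $B$, $I$, and $J$; with $p = \infty$ the preference swings to the middle, so neither setting is neutral.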

#### 3.2.3 The Double-Rank Method

When employing this method, $\eta$ can simply be set to 1. Only nondominated solutions with the smallest $L_p$ scalarizing function value among their neighbors have a zero value of $f^C$ and are regarded as optimal solutions. This method has the following four advantages: (1) owing to $rank(x)$, the process of converting objectives is strictly consistent with the Pareto dominance criterion; (2) owing to $rank_l(x)$, it offers large selection pressure in the high-dimensional objective space; (3) it is insensitive to parameter settings: although $\theta_l$ is an adjustable parameter, it does not need to be tuned in most situations; and (4) the bias induced by the contour curves of the $L_p$ scalarizing function can be neglected. See Figure 4b for an example. Compared to the method in Figure 4a, solutions in non-edge regions will also be preferred, since some solutions (e.g., $D$, $F$, and $H$) have the smallest $L_p$ scalarizing function values in their neighborhoods. Additionally, it can be seen from Figure 4b that if $p$ is set to $\infty$, $H$ will no longer be regarded as the best solution, since its neighbor $G$ has a smaller $L_p$ scalarizing function value. However, if the population is infinite and the neighborhood size is infinitesimal, this discrepancy resulting from different settings of $p$ becomes negligibly small.
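The behavior described above can be sketched as follows. This is a hypothetical reconstruction, not the article's exact procedure: it assumes $f^C(x) = rank(x) + rank_l(x)$, where $rank(x)$ is the 0-based Pareto rank and $rank_l(x)$ counts the neighbors (solutions whose vectors $f(\cdot) - z^*$ lie within angle $\theta_l$ of $f(x) - z^*$) that have a smaller $L_p$ value. Under these assumptions a solution gets $f^C = 0$ exactly when it is nondominated and has the smallest $L_p$ value in its neighborhood, matching the text. The population and $\theta_l = 10^\circ$ are illustrative.

```python
import math
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(objs: Sequence[Sequence[float]]) -> List[int]:
    """0-based Pareto rank, obtained by repeatedly peeling off nondominated fronts."""
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    level = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = level
        remaining -= front
        level += 1
    return ranks


def angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two nonzero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_uv = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_uv)))


def double_rank_convergence(objs, z, theta_l: float, p: float = 1.0) -> List[int]:
    """Hypothetical f^C(x) = rank(x) + rank_l(x); the combination rule is assumed."""
    shifted = [[fi - zi for fi, zi in zip(o, z)] for o in objs]
    lp = [sum(abs(d) ** p for d in s) ** (1.0 / p) for s in shifted]
    ranks = pareto_ranks(objs)
    fc = []
    for i, si in enumerate(shifted):
        # rank_l(x): neighbours within angle theta_l that have a smaller L_p value
        rank_l = sum(1 for j, sj in enumerate(shifted)
                     if j != i and angle(si, sj) <= theta_l and lp[j] < lp[i])
        fc.append(ranks[i] + rank_l)
    return fc


objs = [(0.2, 1.0), (0.25, 1.1), (0.7, 0.7), (0.75, 0.6), (1.0, 0.1)]
print(pareto_ranks(objs))                                            # [0, 1, 0, 0, 0]
print(double_rank_convergence(objs, (0.0, 0.0), math.radians(10)))  # [0, 2, 1, 0, 0]
```

In this toy population the second solution is both dominated and beaten locally, and the third loses to its neighbor with a smaller $L_1$ value. The fourth keeps $f^C = 0$ even though the fifth has a smaller $L_1$ value, because the two are not neighbors.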

The proposed double-rank method inherently utilizes the idea of niching (Liu et al., 2018a; Liu, Yen et al., 2018) and shares some common ideas with constrained decomposition-based methods, e.g., localized weighted sum (Wang et al., 2018) and localized PBI methods (Wang, Ishibuchi et al., 2016), where they all evaluate solutions according to their scalarizing function values in their local areas. The main difference between them is that the double-rank method does not require any reference vector. Another difference is that the double-rank method allows solutions in the same rank to compete on equal terms, which further promotes diversity in the solution set.

### 3.3 Incorporating MeO into MOEAs

MeO can be integrated into any existing MOEA. Since generational MOEAs are the most common, Figure 5 gives a flow chart of incorporating MeO into them. We denote an MOEA combined with MeO as MeO+MOEA in this study, for example, MeO+NSGA-II. Specifically, the versions with the Pareto rank method, the $L_p$ scalarizing method, and the double-rank method are denoted as MeO$_{\mathrm{I}}$+MOEA, MeO$_{\mathrm{II}}$+MOEA, and MeO$_{\mathrm{III}}$+MOEA, respectively. The only difference between MeO+MOEA and the original MOEA is that individuals are evaluated in terms of their meta-objectives, as shown in the shaded parts of the flow chart. Note that in this study, when a normalization procedure is employed to tackle a scaled problem, the original objective values are normalized according to the maximum and minimum original objective values in the current population before the meta-objective values are calculated.
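The shaded evaluation step of Figure 5 can be sketched as a function that a host MOEA calls in place of its usual fitness assignment. Everything here is an illustrative assumption: the names, the min-max normalization helper, the estimate of $z^*$ as the per-objective minimum of the (normalized) population, the meta-objective form $\eta f^C + f^D_m$ inferred from the proofs in Subsection 3.1, and the placeholder convergence estimator, which any of the three alternatives could replace.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def normalize(objs: Sequence[Sequence[float]]) -> Matrix:
    """Min-max normalise every objective with the current population's extremes."""
    cols = list(zip(*objs))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # A constant objective (hi == lo) is mapped to 0 to avoid dividing by zero
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in objs]


def meo_evaluate(objs: Sequence[Sequence[float]],
                 convergence: Callable[[Matrix, List[float]], List[float]],
                 eta: float = 1.0, use_normalization: bool = True) -> Matrix:
    """Replace each individual's objectives by its meta-objectives."""
    f = normalize(objs) if use_normalization else [list(r) for r in objs]
    z = [min(c) for c in zip(*f)]      # z* estimated by the best value per objective
    fc = convergence(f, z)              # any convergence estimator plugs in here
    meta = []
    for row, c in zip(f, fc):
        diff = [a - b for a, b in zip(row, z)]
        sq = sum(d * d for d in diff)
        # Hypothetical fallback: a point equal to z* gets an even split of the cos^2 mass
        fd = [d * d / sq for d in diff] if sq > 0 else [1.0 / len(diff)] * len(diff)
        meta.append([eta * c + d for d in fd])
    return meta


def zero_convergence(f: Matrix, z: List[float]) -> List[float]:
    """Placeholder estimator (hypothetical): treats every individual as converged."""
    return [0.0] * len(f)


population = [(2.0, 10.0), (4.0, 6.0), (6.0, 2.0)]
print(meo_evaluate(population, zero_convergence))
# [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
```

Because the host algorithm only sees the returned rows, NSGA-II, SPEA2, or another MOEA can run its own selection on them unchanged.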

In addition, the computational complexity analysis of MeO+MOEA is given in Section I in the Supplementary Material available at https://www.mitpressjournals.org/doi/suppl/10.1162/evco_a_00243.

### 3.4 Why Does MeO Work for MaOPs?

Having described MeO, one may wonder why MeO+MOEA can be effective for MaOPs when the MOEA is a traditional Pareto-based one and the new problem still has a large number of objectives. We now explain the reasons in the following two situations: (1) selecting among solutions with the same value of the convergence component; and (2) selecting among solutions with different values of the convergence component.

#### 3.4.1 Situation 1

This usually happens when the Pareto rank method is employed as an alternative to the convergence component. In the original Pareto-based MOEAs, when solutions have the same Pareto rank, diversity-based selection criteria, for example, the crowding distance (Deb et al., 2002) or the $k$-nearest neighbor (Zitzler et al., 2001), may prefer solutions far from the Pareto optimal front, since they are located in sparse regions. In other words, the poor convergence performance of a solution is mistaken for good diversity performance. This is one of the reasons why Pareto-based MOEAs fail in solving MaOPs. In NSGA-III, this issue is addressed by a new diversity-based selection criterion: the solutions are mapped onto a hyperplane and evaluated by their distances to predefined reference points. In the method proposed in this study, solutions with the same value of the convergence component are likewise located on a hyperplane, but in the meta-objective space. In this situation, only their diversity performances are traded off. Therefore, the aforementioned diversity-based selection criteria (Deb et al., 2002; Zitzler et al., 2001) can also be effective. Figure 6 provides an illustrative example, where solutions $A$–$E$ are nondominated. Solution $A$ will be estimated to have a low density value in the original objective space. In contrast, when the Pareto rank method is employed as an alternative to the convergence component, all these solutions are mapped onto a hyperplane in the meta-objective space, and solution $A$ will be estimated to have a high density value. Compared to NSGA-III, the main advantage of the proposed method is that any density-based selection criterion can be adopted and no reference points need to be predefined.
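To illustrate Situation 1, the sketch maps four mutually nondominated solutions into the meta-objective space using a common convergence value (as the Pareto rank method would assign to same-rank solutions) and then applies NSGA-II's crowding distance there unchanged. The data, $\eta = 1$, and the $\cos^2$ form of the diversity components are illustrative assumptions; the point is only that the mapped solutions share one hyperplane, so an off-the-shelf density estimator compares them by direction alone.

```python
import math
from typing import List, Sequence


def cos2_components(f: Sequence[float], z: Sequence[float]) -> List[float]:
    """Squared direction cosines of f - z (assumed diversity components)."""
    diff = [a - b for a, b in zip(f, z)]
    sq = sum(d * d for d in diff)
    return [d * d / sq for d in diff]


def crowding_distance(points: Sequence[Sequence[float]]) -> List[float]:
    """NSGA-II crowding distance of a set of points (boundary points get inf)."""
    n, m = len(points), len(points[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: points[i][k])
        lo, hi = points[order[0]][k], points[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            i = order[pos]
            dist[i] += (points[order[pos + 1]][k] - points[order[pos - 1]][k]) / (hi - lo)
    return dist


# Four mutually nondominated solutions, so they share one Pareto rank
objs = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 1.5)]
z_star = [min(c) for c in zip(*objs)]          # (1.0, 1.0)
eta, rank = 1.0, 0                              # same rank -> same convergence component
meta = [[eta * rank + d for d in cos2_components(f, z_star)] for f in objs]
print([sum(row) for row in meta])  # all 1 up to rounding: one hyperplane
print(crowding_distance(meta))     # density judged from direction information only
```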

#### 3.4.2 Situation 2

This situation often arises when the $L_p$ scalarizing method or the double-rank method is employed as an alternative to the convergence component. Here, solutions with large values of the convergence component are penalized in the meta-objective space, whilst those with small values have more chances to become dominators. This effect is somewhat similar to that of $\epsilon$-dominance, which has been shown to be effective for solving MaOPs (Hadka and Reed, 2012; Yang et al., 2013). When comparing two solutions that do not dominate each other in the objective space, if the difference between their convergence components is larger than a positive constant $\epsilon$, the one with the smaller value will dominate the other in the meta-objective space. Figure 7 provides an illustrative example. In Figure 7a, solutions $A$ and $B$ do not dominate each other in the original objective space, and $f_C(A)$ and $f_C(B)$ are their distances to the Pareto optimal front, respectively. If we consider only the diversity components of solutions $A$ and $B$, we obtain points $A'$ and $B'$ in Figure 7b. Assuming that $\eta = 1$ in this case, after adding the convergence components $f_C(A)$ and $f_C(B)$, the points are mapped to $A$ and $B$ in the meta-objective space. In this mapping, if the difference between $f_C(A)$ and $f_C(B)$ is larger than the vertical or horizontal distance between $A'$ and $B'$ (the two distances are equal), $B$ will be located in the region dominated by $A$ in the meta-objective space. Hence, $\epsilon$ is exactly the vertical or horizontal distance between $A'$ and $B'$. From Figure 7, we can make the following observation: for any two solutions $A$ and $B$, if $f_C(A) < f_C(B)$ and $\eta \cdot |f_C(A) - f_C(B)| > \epsilon$, solution $A$ will dominate $B$ in the meta-objective space. Consequently, Pareto dominance becomes much more effective in MaOPs after incorporating MeO.
It is worth noting that $A'$ and $B'$ lie in the range $[0,1]$, so $\epsilon$ lies in the range $(0,1]$. This is why $\eta$ can be set to 1 in the Pareto rank method and the double-rank method. Under these circumstances, if solution $A$ dominates solution $B$ in the original objective space, $\eta \cdot |f_C(A) - f_C(B)|$ will definitely be larger than $\epsilon$. As a result, solution $A$ will necessarily dominate $B$ in the meta-objective space.
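
The $\epsilon$-like effect can be checked numerically. The sketch below assumes, as in Figure 7b, that the diversity components of each solution sum to one, so $A'$ and $B'$ lie on a line in the two-objective case. It adds $\eta f_C$ to every diversity component and tests Pareto dominance in the resulting meta-objective space. The threshold `eps` is the largest component-wise excess of `fd_a` over `fd_b`, which in two dimensions equals the horizontal (and vertical) distance between $A'$ and $B'$. All names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def to_meta(fc: float, fd: Sequence[float], eta: float = 1.0) -> List[float]:
    """Add the shared convergence component to every diversity component."""
    return [eta * fc + d for d in fd]


def epsilon(fd_a: Sequence[float], fd_b: Sequence[float]) -> float:
    """Gap eta * (fc_b - fc_a) needed for A to be no worse than B in every meta-objective."""
    return max(x - y for x, y in zip(fd_a, fd_b))


fd_a, fd_b = [0.3, 0.7], [0.6, 0.4]  # A' and B' on the line fd1 + fd2 = 1
eps = epsilon(fd_a, fd_b)            # about 0.3, the horizontal (= vertical) gap

# Large convergence gap (0.4 > eps): A dominates B in the meta-objective space.
far = dominates(to_meta(0.1, fd_a), to_meta(0.5, fd_b))
# Small convergence gap (0.2 < eps): A and B remain mutually nondominated.
near = dominates(to_meta(0.1, fd_a), to_meta(0.3, fd_b))
print(round(eps, 3), far, near)
```

With Pareto ranks as the convergence component and $\eta = 1$, a dominated solution's rank is larger by at least one, which is why the bound $\epsilon \le 1$ matters in the argument above.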

In addition, SDE (Li et al., 2014) is another algorithm that penalizes solutions with poor convergence performance, where the penalization is built into the density-based selection criterion. In SDE, solutions with poor convergence performance are shifted (mapped) to new locations that are close to other solutions, so they have larger density values. SDE has been integrated with NSGA-II and SPEA2, respectively. However, it works well only with the latter, and that combination is computationally very expensive. In contrast, both NSGA-II and SPEA2 perform very well after integration with MeO.
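
For comparison, the shift-based density estimation can be sketched as follows. This follows our reading of Li et al. (2014): when estimating the density of a solution $p$, each other solution $q$ is shifted so that any objective in which $q$ is better than $p$ is raised to $p$'s value. The helper names are hypothetical, and a nearest-neighbor distance stands in for SDE's full density estimator.

```python
import math
from typing import List, Sequence


def shifted_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Distance from p to shifted q: only objectives where q is worse than p count."""
    return math.sqrt(sum(max(0.0, qi - pi) ** 2 for pi, qi in zip(p, q)))


def nearest_distance(idx: int, objs: List[Sequence[float]], shifted: bool) -> float:
    """Distance from objs[idx] to its nearest neighbor; small values mean high density."""
    p = objs[idx]
    dist = shifted_distance if shifted else math.dist
    return min(dist(p, q) for j, q in enumerate(objs) if j != idx)


# The last solution is nondominated (best f2) but far from the others.
objs = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [5.0, 0.01]]
plain = nearest_distance(3, objs, shifted=False)  # about 4.0: looks isolated
sde = nearest_distance(3, objs, shifted=True)     # 0.0: looks maximally crowded
print(round(plain, 3), sde)
```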

## 4 Investigations on the Effect of MeO on Two Pareto-Based MOEAs

In this section, we validate the performance of MeO by integrating it into two Pareto-based MOEAs, that is, NSGA-II and SPEA2, which results in two new algorithms, denoted as MeO+NSGA-II and MeO+SPEA2, respectively. We choose these two MOEAs for two reasons. First, both are classic MOEAs that have been successfully applied to a wide range of optimization problems. Second, both are generally believed to be incapable of solving MaOPs. The experiments in this section are divided into two parts. First, we compare MeO$I$+NSGA-II and MeO$I$+SPEA2 with their original versions on high-dimensional DTLZ benchmark problems. Then, we compare three different alternatives to the convergence component in the design of MeO+NSGA-II.

### 4.1 Test Problems, Performance Indicators, and Parameter Settings

We choose two widely used test suites, that is, DTLZ (Deb et al., 2005) and WFG (Huband et al., 2006), for empirical comparisons in this study. We consider these scalable problems with 4, 6, 8, and 10 objectives in this article. The number of variables is set to $M+4$ for DTLZ1, $M+19$ for DTLZ7, and $M+9$ for the other DTLZ problems. The distance- and position-related parameters are set to 24 and $M-1$, respectively, for the WFG problems. Please refer to Deb et al. (2005) and Huband et al. (2006) for detailed descriptions of the DTLZ and WFG suites.
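
The decision-vector lengths above can be summarized in a small helper; for WFG, the length is the sum of the position-related parameter ($M-1$) and the distance-related parameter (24). The function name and the problem-name strings are illustrative.

```python
def num_variables(problem: str, m: int) -> int:
    """Number of decision variables for an m-objective instance, per Section 4.1."""
    if problem == "DTLZ1":
        return m + 4
    if problem == "DTLZ7":
        return m + 19
    if problem.startswith("DTLZ"):
        return m + 9
    if problem.startswith("WFG"):
        position_related = m - 1  # k in the WFG toolkit
        distance_related = 24     # l in the WFG toolkit
        return position_related + distance_related
    raise ValueError(f"unknown problem: {problem}")


for m in (4, 6, 8, 10):
    print(m, num_variables("DTLZ1", m), num_variables("DTLZ2", m),
          num_variables("DTLZ7", m), num_variables("WFG1", m))
```

These WFG settings also satisfy the toolkit's requirements that the position-related parameter be a multiple of $M-1$ and that the distance-related parameter be even for WFG2 and WFG3.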

To compare different algorithms on these test problems, we adopt the Inverted Generational Distance Plus (IGD$+$) (Ishibuchi et al., 2015) as the performance metric, which quantifies both the convergence and the diversity of a solution set. It is weakly Pareto compliant and thus more reliable than its original version, IGD. The smaller the IGD$+$ value of an algorithm, the better its performance. IGD$+$ requires a reference set, whose points are typically uniformly distributed on the true Pareto optimal front of a test problem. In this study, over 5,000 reference points are evenly sampled from the true Pareto optimal front of each test problem. Note that, when calculating IGD$+$, the solutions and the reference points are normalized based on the true Pareto optimal front.
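
As a reference for how IGD$+$ is computed, here is a minimal sketch for minimization problems. For each reference point $z$, the modified distance to a solution $a$ counts only objectives in which $a$ is worse than $z$, i.e., $d^+(z,a)=\sqrt{\sum_i \max(a_i - z_i, 0)^2}$, and IGD$+$ averages the smallest such distance over the reference set. The normalization step described above is omitted for brevity, and the function names are illustrative.

```python
import math
from typing import Sequence


def d_plus(z: Sequence[float], a: Sequence[float]) -> float:
    """Modified distance: only objectives where solution a is worse than reference z count."""
    return math.sqrt(sum(max(ai - zi, 0.0) ** 2 for zi, ai in zip(z, a)))


def igd_plus(solutions: Sequence[Sequence[float]],
             reference: Sequence[Sequence[float]]) -> float:
    """Mean over reference points of the d+ distance to the closest solution (smaller is better)."""
    return sum(min(d_plus(z, a) for a in solutions) for z in reference) / len(reference)


reference = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
print(igd_plus(reference, reference))     # the reference set itself
print(igd_plus([[0.5, 0.5]], reference))  # a single central solution
print(igd_plus([[0.0, 0.0]], reference))  # a point dominating the whole reference set
```

Because objectives in which a solution is better than the reference point contribute nothing, a set that dominates the reference set scores zero, which reflects the weak Pareto compliance mentioned above.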

The following parameter settings are adopted by all the compared algorithms. Simulated binary crossover and polynomial mutation are applied as the crossover and mutation operators, respectively, with both distribution indexes set to 20. The crossover and mutation probabilities are 1.0 and $1/n$, respectively, where $n$ is the number of decision variables. The algorithms terminate after a predefined maximum number of generations: 1,000 for DTLZ1, DTLZ3, DTLZ6, and WFG1, and 300 for the remaining test problems. This termination criterion is the same as in recent studies (Li et al., 2015; Yang et al., 2013). The population size, $N$, is set to 120, 132, 156, and 275 when $M$ is 4, 6, 8, and 10, respectively. For a fair comparison, all these parameter settings are the same as those in Section 5. All the experimental results in this article are obtained from 50 independent runs of each algorithm on each optimization problem. Wilcoxon's rank sum test is employed to determine whether one algorithm differs significantly from another, with the null hypothesis rejected at a significance level of 0.05.
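
The significance test can be reproduced with a two-sided Wilcoxon rank sum test. Below is a minimal sketch using the large-sample normal approximation (reasonable for 50 runs per algorithm), with average ranks for ties. The article does not state which implementation was used; in Python, `scipy.stats.ranksums` provides an equivalent normal-approximation test. The sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                   # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical IGD+ values from 50 runs of two algorithms.
a = [0.10 + 0.001 * r for r in range(50)]
b = [0.20 + 0.001 * r for r in range(50)]
z, p = rank_sum_test(a, b)
print(f"z = {z:.2f}, p = {p:.2e}, significant at 0.05: {p < 0.05}")
```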

### 4.2 Comparisons among NSGA-II, SPEA2, MeO$I$+NSGA-II, and MeO$I$+SPEA2

In this subsection, we apply NSGA-II, SPEA2, and their new versions (i.e., MeO$I$+NSGA-II and MeO$I$+SPEA2, which use the Pareto rank method as the convergence measure) to the DTLZ problems. Table 1 shows the mean IGD$+$ values obtained by all competing algorithms on the DTLZ problems. Results in boldface indicate that MeO$I$+NSGA-II (MeO$I$+SPEA2) performs significantly better than its original version. In Table 1, the performance score (Bader and Zitzler, 2011) of each algorithm on each test problem is shown in gray scale after the mean IGD$+$ value; a darker tone corresponds to a larger performance score. For a test problem, the performance score of an algorithm is the number of competing algorithms that perform significantly worse than it according to IGD$+$. Moreover, the average performance score (APS) of each algorithm over all the test problems is given at the bottom of Table 1. Note that in this article, the notation DTLZ$i$-$M$ refers to DTLZ$i$ ($i \in \{1,2,3,4,5,6,7\}$) with $M$ objectives ($M \in \{4,6,8,10\}$), and WFG$i$-$M$ is defined similarly.
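
Given the outcomes of the pairwise significance tests, the performance score and APS reduce to simple counting. In the hypothetical sketch below, `beats[alg]` is the set of algorithms that `alg` significantly outperforms on one problem (for example, as decided by the rank sum test); the algorithm names and outcomes are made up.

```python
from typing import Dict, List, Set


def performance_scores(beats: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score on one problem: number of competitors significantly worse than the algorithm."""
    return {alg: len(losers - {alg}) for alg, losers in beats.items()}


def average_performance_scores(problems: List[Dict[str, Set[str]]]) -> Dict[str, float]:
    """APS: mean performance score of each algorithm over all test problems."""
    totals: Dict[str, int] = {}
    for beats in problems:
        for alg, score in performance_scores(beats).items():
            totals[alg] = totals.get(alg, 0) + score
    return {alg: total / len(problems) for alg, total in totals.items()}


# Two hypothetical test problems with three algorithms X, Y, Z.
problems = [
    {"X": {"Y", "Z"}, "Y": {"Z"}, "Z": set()},
    {"X": set(), "Y": {"X"}, "Z": {"X"}},
]
print(average_performance_scores(problems))
```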

As can be seen from Table 1, the performances of both NSGA-II and SPEA2 are significantly boosted on most problems after integrating MeO. These results show the effectiveness of MeO in solving MaOPs. However, MeO$I$+SPEA2 does not perform significantly better than SPEA2 on some 4- and 6-objective test problems, for example, DTLZ1-4 and -6, DTLZ5-4, DTLZ6-4, and DTLZ7-4 and -6. In addition, neither MeO$I$+NSGA-II nor MeO$I$+SPEA2 performs better than its original version on some DTLZ5 and DTLZ6 test problems. The most likely reason is that the Pareto rank method does not provide sufficient selection pressure. The Pareto fronts of DTLZ5 and DTLZ6 have only two dimensions. Therefore, when solving DTLZ5 and DTLZ6 with any number of objectives, and some other test problems with a relatively small number of objectives (e.g., 4 or 6), MeO$I$+NSGA-II and MeO$I$+SPEA2 do not show a clear advantage over their original versions.

Since MeO$I$+SPEA2 not only performs worse than MeO$I$+NSGA-II according to APS but is also highly time-consuming (see Table 1 in the Supplementary Material), in the next subsection we further investigate the performance of MeO+NSGA-II with different convergence components.

### 4.3 Comparisons among MeO$I$+NSGA-II, MeO$II$+NSGA-II, MeO$III$+NSGA-II, and MeO$*$+NSGA-II

In this subsection, we investigate the performance of MeO+NSGA-II with three different alternatives to the convergence component. We employ two settings of $p$ (1 and 2) in MeO$II$+NSGA-II and MeO$III$+NSGA-II to observe their different behaviors. We also include the results obtained by the method that calculates the convergence component precisely (denoted as MeO$*$+NSGA-II), since the true Pareto fronts of DTLZ problems are known. Note that $\eta $ is set to 1 in all the designs.

Table 1 lists the mean IGD$+$ values obtained by different algorithms. From Table 1, we make the following observations. (1) MeO$III$+NSGA-II achieves satisfactory results on most problems. There is hardly any significant difference between the results obtained by MeO$III$+NSGA-II with $p=1$ and $p=2$, which implies that the performance of MeO$III$ is not sensitive to the parameter $p$. (2) MeO$II$+NSGA-II with $p=1$ performs significantly better on DTLZ1 than with $p=2$. This is attributed to the fact that the geometrical shape of the Pareto optimal front of DTLZ1 perfectly fits the contour curves of the $L_p$ scalarizing function with $p=1$. Overall, MeO$II$+NSGA-II with $p=2$ performs better than with $p=1$ according to APS. (3) MeO$II$+NSGA-II does not work as well as MeO$III$+NSGA-II on most problems, because $\eta$ is not tuned for each problem. However, MeO$II$+NSGA-II with $p=2$ performs very well on DTLZ4 and DTLZ5. (4) MeO$*$+NSGA-II achieves the best APS over all the problems. However, it is not as good as MeO$III$+NSGA-II on some test problems. One reason is that it behaves like MeO$II$+NSGA-II to some degree and is thus relatively sensitive to the setting of the parameter $\eta$. Another reason is that the population could be far away from the true Pareto front at the beginning of evolution; calculating the convergence component precisely based on the true Pareto front may then cause the population to converge into a small region, which degrades the diversity performance.
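
Observation (2) can be illustrated numerically. The Pareto optimal front of DTLZ1 satisfies $\sum_{i=1}^{M} f_i = 0.5$ (Deb et al., 2005), so it coincides with a contour of $(\sum_i f_i^p)^{1/p}$ for $p=1$ but not for $p=2$. The sketch below assumes, for illustration only, that the $L_p$ scalarizing function is this plain $L_p$ norm of the objective vector; any normalization or weighting used in MeO$II$ is not reproduced here.

```python
from typing import Sequence


def lp_value(f: Sequence[float], p: float) -> float:
    """Plain L_p norm of a nonnegative objective vector (an assumed scalarizing form)."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


# Three points on the 4-objective DTLZ1 front (components sum to 0.5):
# a corner, the midpoint of an edge, and the centroid.
front = [[0.5, 0.0, 0.0, 0.0], [0.25, 0.25, 0.0, 0.0], [0.125] * 4]
l1 = [lp_value(f, 1) for f in front]
l2 = [lp_value(f, 2) for f in front]
print("p=1:", [round(v, 4) for v in l1])  # identical values along the front
print("p=2:", [round(v, 4) for v in l2])  # values differ along the front
```

One way to read this: with $p=1$, every point of the DTLZ1 front receives the same convergence value, whereas $p=2$ assigns smaller values near the centre of the front, biasing selection toward that region.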

In addition, we show the Pareto fronts achieved by NSGA-II and MeO$III$+NSGA-II on some 3-objective DTLZ test problems in Figure 1 in the Supplementary Material for a visual understanding of MeO$III$+NSGA-II's performance.

## 5 Comparisons with the State-of-the-Art Algorithms

In this section, we compare MeO$III$+NSGA-II with $p=2$ against seven state-of-the-art MaOEAs, that is, BiGE (Li et al., 2015), SDE (Li et al., 2014), NSGA-III (Deb and Jain, 2013), EFR-RR (Yuan et al., 2016), MOMBI-II (Hernández Gómez and Coello Coello, 2015), GrEA (Yang et al., 2013), and KnEA (Zhang et al., 2015), on the DTLZ and WFG suites with 4, 6, 8, and 10 objectives.

### 5.1 Competing Algorithms

In this subsection, we briefly introduce the seven chosen algorithms, which cover all the main categories of many-objective optimization methods mentioned in Section 1. All the competing algorithms, as well as MeO, are implemented in PlatEMO (Tian et al., 2017). The source code of MeO is available at https://github.com/yiping0liu.

(1) BiGE is a well-known meta-objective method designed for many-objective optimization. It simultaneously considers the convergence and diversity performances of a solution. Different from the MeO proposed in this article, BiGE transforms an MaOP into a bi-objective optimization problem, and a solution set with good performance is obtained by optimizing this bi-objective problem.

(2) SDE is a recently proposed method that modifies the density estimation strategy of traditional Pareto-based MOEAs. By shifting the positions of individuals in sparse regions, individuals with poor convergence are penalized with a high density value and then discarded during the evolution. In this study, the version that integrates SDE into SPEA2 (denoted as SPEA2+SDE) is employed for comparisons, owing to its better performance than the other variants.

(3) NSGA-III extends NSGA-II, a popular Pareto dominance-based MOEA, to handle MaOPs. NSGA-III uses a reference-point-based selection criterion instead of the density-based one (i.e., the crowding distance) in NSGA-II. When the Pareto rank method is employed in MeO, both NSGA-III and MeO+MOEA map nondominated solutions onto a hyperplane, but the latter does not require any reference points.

(4) EFR-RR is based on the framework of NSGA-II, but it is inherently a decomposition-based method like MOEA/D. EFR-RR also focuses on balancing convergence and diversity for MaOPs. It first considers the diversity performance of a solution: only when the perpendicular distance from the solution to a reference vector is small is its scalarizing function value with respect to that vector calculated. This idea has also been applied to MOEA/D, termed MOEA/D-DU. Since EFR-RR performs slightly better than MOEA/D-DU according to the original study, we choose the former for comparisons.
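
The diversity-first idea can be sketched as follows: the perpendicular distance from a solution to each reference vector decides which vectors the solution is associated with, and a Tchebycheff value is computed only for those (here the $K=2$ nearest, matching the setting given in Subsection 5.2). This is a simplified illustration with hypothetical function names, not EFR-RR's complete ranking procedure.

```python
import math
from typing import Dict, List, Sequence


def perpendicular_distance(f: Sequence[float], w: Sequence[float]) -> float:
    """Distance from objective vector f to the line spanned by reference vector w."""
    scale = sum(fi * wi for fi, wi in zip(f, w)) / sum(wi * wi for wi in w)
    return math.sqrt(sum((fi - scale * wi) ** 2 for fi, wi in zip(f, w)))


def tchebycheff(f: Sequence[float], w: Sequence[float], ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff scalarizing value (smaller is better)."""
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, ideal))


def scalarize_near_vectors(f: Sequence[float], weights: List[Sequence[float]],
                           ideal: Sequence[float], k: int = 2) -> Dict[int, float]:
    """Compute scalarizing values only for the k reference vectors closest to f."""
    nearest = sorted(range(len(weights)),
                     key=lambda j: perpendicular_distance(f, weights[j]))[:k]
    return {j: tchebycheff(f, weights[j], ideal) for j in nearest}


weights = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
result = scalarize_near_vectors([0.2, 0.8], weights, ideal=[0.0, 0.0])
print(result)
```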

(5) MOMBI-II is an indicator-based algorithm that uses the R2 indicator to guide the search. The R2 indicator is an attractive alternative to the hypervolume owing to its low computational cost and weak Pareto compliance. MOMBI-II selects individuals by taking two key aspects into account: a scalarizing function and statistical information about the population's convergence.
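
For reference, the standard unary R2 indicator with the weighted Tchebycheff utility averages, over a set of weight vectors, the best utility achieved by the solution set. MOMBI-II's own utility function and normalization may differ, so the sketch below illustrates the indicator in general rather than MOMBI-II's exact formulation; the names and data are hypothetical.

```python
from typing import Sequence


def r2_indicator(solutions: Sequence[Sequence[float]],
                 weights: Sequence[Sequence[float]],
                 ideal: Sequence[float]) -> float:
    """Unary R2 with weighted Tchebycheff utility; smaller values are better."""
    total = 0.0
    for w in weights:
        total += min(max(wi * abs(zi - ai) for wi, zi, ai in zip(w, ideal, a))
                     for a in solutions)
    return total / len(weights)


weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ideal = [0.0, 0.0]
extremes = [[0.0, 1.0], [1.0, 0.0]]
with_middle = extremes + [[0.5, 0.5]]
print(r2_indicator(extremes, weights, ideal), r2_indicator(with_middle, weights, ideal))
```

Adding the central solution improves (lowers) R2, because the middle weight vector is then served by a closer point; only one utility evaluation per solution and weight vector is needed, which is the source of R2's low cost.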

(6) GrEA exploits the potential of the grid-based approach to deal with MaOPs. The grid dominance and the grid difference are introduced to determine the relationship between individuals in a grid environment. Three grid-based criteria are incorporated into calculating the fitness of an individual to distinguish individuals in both mating and environmental selection processes. Moreover, a fitness adjustment strategy is developed to avoid partial overcrowding as well as to guide the search towards various directions in the archive.
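
The grid environment can be sketched as follows, based on our reading of Yang et al. (2013): along each objective, the range of the current population is padded by half a cell on both sides and split into $div$ cells; grid dominance is Pareto dominance between grid coordinates, and the grid difference is the sum of absolute coordinate differences. Function names are hypothetical, and fitness assignment and adjustment are omitted.

```python
import math
from typing import List, Sequence


def grid_coordinates(objs: List[Sequence[float]], div: int) -> List[List[int]]:
    """Map each objective vector to integer grid coordinates (GrEA-style grid)."""
    m = len(objs[0])
    coords = [[0] * m for _ in objs]
    for k in range(m):
        lo = min(f[k] for f in objs)
        hi = max(f[k] for f in objs)
        pad = (hi - lo) / (2 * div)        # half a cell on each side
        width = (hi - lo + 2 * pad) / div  # cell width along objective k
        for i, f in enumerate(objs):
            coords[i][k] = 0 if width == 0 else math.floor((f[k] - (lo - pad)) / width)
    return coords


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization (applied to objectives or grid coordinates)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def grid_difference(ga: Sequence[int], gb: Sequence[int]) -> int:
    """Sum of absolute coordinate differences between two grid locations."""
    return sum(abs(x - y) for x, y in zip(ga, gb))


objs = [[0.0, 1.0], [1.0, 0.0], [0.05, 0.62], [0.40, 0.60]]
g = grid_coordinates(objs, div=4)
print(g)
print(dominates(objs[2], objs[3]), dominates(g[2], g[3]), grid_difference(g[2], g[3]))
```

Solutions 2 and 3 are mutually nondominated in the objective space, yet solution 2 grid-dominates solution 3, which is how the grid strengthens selection pressure.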

(7) KnEA is established under the framework of NSGA-II, and it gives priority to the knee points among nondominated solutions. To identify local knee points, a niche method based on hyperboxes is employed, in which the niche size is adaptively tuned using information from the population in previous generations.

### 5.2 Extra Parameter Settings

In this subsection, we give the settings of the extra parameters used in the above algorithms. The common parameters are set as in Subsection 4.1.

To avoid the situation in which all reference points are located on the boundary of the Pareto optimal front for problems with a large number of objectives, the strategy of two-layered reference points (vectors) is used for NSGA-III, EFR-RR, and MOMBI-II. As a result, the population size of NSGA-III and EFR-RR cannot be arbitrarily specified. We set the population size (i.e., $N$) of all algorithms to be the same value for a fair comparison. The settings of parameters for controlling the number of reference points are listed in Table 2.

$M$ | $p_1$ | $p_2$ | $N$ |
---|---|---|---|
4 | 7 | 0 | 120 |
6 | 4 | 1 | 132 |
8 | 3 | 2 | 156 |
10 | 3 | 2 | 275 |
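
The population sizes in Table 2 follow from counting the reference points of a two-layer design: a simplex lattice with $p$ divisions in $M$ dimensions contains $\binom{p+M-1}{M-1}$ points, and $N$ is the sum over the two layers, with $p_2 = 0$ read as no inner layer. The sketch below generates both layers with the systematic simplex-lattice construction commonly attributed to Das and Dennis and checks the counts. The inner-layer shrinking factor of 0.5 toward the centroid is an assumption; it does not affect the count. Function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List


def simplex_lattice(m: int, p: int) -> List[List[float]]:
    """All m-dimensional vectors with entries in {0, 1/p, ..., 1} summing to 1."""
    if p == 0:
        return []  # treated as "no layer"
    points = []
    # Stars and bars: choose positions of m-1 bars among p+m-1 slots.
    for bars in combinations(range(p + m - 1), m - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(p + m - 2 - prev)
        points.append([x / p for x in parts])
    return points


def two_layer(m: int, p1: int, p2: int, shrink: float = 0.5) -> List[List[float]]:
    """Outer lattice on the unit simplex plus an inner lattice shrunk toward its centroid."""
    outer = simplex_lattice(m, p1)
    inner = [[shrink * x + (1 - shrink) / m for x in w] for w in simplex_lattice(m, p2)]
    return outer + inner


for m, p1, p2, n in [(4, 7, 0, 120), (6, 4, 1, 132), (8, 3, 2, 156), (10, 3, 2, 275)]:
    refs = two_layer(m, p1, p2)
    expected = comb(p1 + m - 1, m - 1) + (comb(p2 + m - 1, m - 1) if p2 else 0)
    print(m, len(refs), expected, n)
```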


As suggested in the original study of EFR-RR, the Chebyshev approach is chosen as the scalarizing method, the neighborhood size is set to $0.1N$, and $K$ is set to 2. Since the numbers of objectives and the population sizes differ from those in the original KnEA and GrEA studies, we adjust the threshold $T$ in KnEA and the grid division $div$ in GrEA according to the guidelines in the original studies, to achieve the best performance of these algorithms. The settings of $T$ and $div$ for the DTLZ and WFG problems are listed in Table 3 in the Supplementary Material.

### 5.3 Results and Discussions

Table 3 shows the mean values of IGD$+$ and the performance scores obtained by different algorithms on the DTLZ and WFG problems. A darker tone corresponds to a larger performance score. The average performance score (APS) of each algorithm on all the test problems is at the bottom of Table 3.

From Table 3 we can draw the following conclusions.

MeO$III$+NSGA-II shows competitive performance on most problems. It achieves the best APS among the eight competitors and significantly outperforms the other algorithms on more test problems than any of them. However, it performs relatively poorly on some test problems, for example, WFG2 and WFG3, since neither the setting of $p$ nor that of $\theta_l$ suits these problems. This suggests that the performance of MeO+MOEA can be further enhanced by a method that better estimates the convergence component. One promising way is to adaptively tune $p$ and $\theta_l$ for a given optimization problem, which is left for future study.

BiGE shows appealing results on some test problems, such as DTLZ5, WFG2, and WFG3, although it does not achieve the best result on any problem. BiGE works poorly on DTLZ1, DTLZ3, and DTLZ7, because its normalization procedure results in an improper niche size for these problems.

SPEA2+SDE shows competitive performance on DTLZ1, DTLZ5, DTLZ6, DTLZ7, and WFG1. However, SPEA2+SDE is very time-consuming, with a runtime no less than that of the original SPEA2; see Table 1 in the Supplementary Material for a detailed comparison. Although SDE can also be combined with the computationally efficient NSGA-II, the resulting NSGA-II+SDE performs much worse. Please refer to Li et al. (2014) for more details.

NSGA-III works quite well on most test problems. However, its performance scales poorly with the increasing number of objectives on DTLZ5, DTLZ6, WFG1, and WFG3. This is attributed to the fact that the reference vectors employed in NSGA-III are uniformly distributed in the whole objective space, whereas the Pareto optimal solutions (or nondominated solutions during the evolution) of these problems are not.

EFR-RR can effectively deal with most problems. Although it has the same issue as NSGA-III, it generally performs better than NSGA-III on DTLZ5, DTLZ6, WFG1, and WFG3. The performance of both EFR-RR and NSGA-III could be further improved by integrating a strategy for adaptively generating the reference vectors (Qi et al., 2014).

MOMBI-II performs poorly on some badly scaled test problems, for example, DTLZ7 and WFG4 to WFG9, since it does not employ a normalization operator. In addition, it adopts the R2 indicator based on reference vectors, which results in low time consumption but shares the disadvantages of the decomposition-based algorithms.

GrEA achieves encouraging results on most problems. However, it is very sensitive to the choice of the parameter $div$. In our experiments, we tested a number of settings of $div$ for each instance to ensure that GrEA obtains satisfactory performance. If a method for adaptively adjusting $div$ were developed, the performance of GrEA could be considerably enhanced on various problems.

KnEA has the advantage of approximating the true Pareto optimal front by giving preference to the knee points among nondominated solutions. It performs very well on DTLZ7, WFG1, and WFG2. Similar to GrEA, KnEA needs a proper setting of its parameter $T$; a smaller value of $T$ can lead to more solutions with good convergence.

To sum up, MeO$III$+NSGA-II is very competitive with the state-of-the-art MOEAs. Unlike KnEA and GrEA, MeO$III$+NSGA-II is relatively insensitive to its design parameter $\eta$. Additionally, unlike NSGA-III, EFR-RR, and MOMBI-II, it does not require any reference vectors. Finally, it is computationally much cheaper than SPEA2+SDE.

In addition, the hypervolume values obtained by these algorithms on the DTLZ and WFG test problems are reported in Table 2 in the Supplementary Material.

## 6 Conclusion

In this article, we have proposed a novel meta-objective method for many-objective evolutionary optimization, which converts a given MOP into a new MOP. Each meta-objective in the new MOP contains a unique diversity component that represents the position of a solution along a coordinate axis in the original objective space, which creates conflict among the meta-objectives. Furthermore, all the meta-objectives share the same convergence component, which measures the distance between the solution and the Pareto optimal front. A well-converged and well-distributed solution set can be obtained by optimizing the new MOP, even if the number of objectives is large. This method provides a new way to balance the requirements of convergence and diversity in many-objective optimization.

Since the Pareto optimal front is unknown in practice, we have proposed three alternatives to the convergence component, based on the Pareto rank method, the $L_p$ scalarizing method, and the double-rank method, respectively. The first is effective in solving MaOPs but occasionally has low efficiency on some optimization problems. The second provides the needed selection pressure but is very sensitive to its parameter settings. The third is recommended, since it combines the advantages of the other two while avoiding their disadvantages.

In this study, we combined MeO with two classic Pareto-based MOEAs, NSGA-II and SPEA2, resulting in two new algorithms, denoted as MeO+NSGA-II and MeO+SPEA2. The two new algorithms perform significantly better than their original counterparts when solving MaOPs. Additionally, MeO+NSGA-II performs slightly better than MeO+SPEA2 and is computationally much cheaper.

To demonstrate the effectiveness of MeO, we test MeO+NSGA-II with the double-rank method on the DTLZ and WFG problems in comparison with seven state-of-the-art designs, namely, BiGE, SDE, NSGA-III, EFR-RR, MOMBI-II, GrEA, and KnEA. The experimental results demonstrate that MeO+NSGA-II is very competitive among the chosen algorithms.

To further improve the performance of MeO+MOEA, a better estimate of the convergence component is essential. Therefore, our future work includes developing a strategy for adaptively tuning the parameters $\theta_l$ and $p$ in each neighborhood for the double-rank method. In addition, combining MeO with other MOEAs, for example, indicator- and decomposition-based algorithms, and investigating the behavior of the resulting algorithms in many-objective optimization are of great interest.

## Acknowledgments

This work was jointly supported by National Natural Science Foundation of China with grant No. 61803192, 61876075, 61773384, 61763026, 61673404, 61573361, and 61503220, National Basic Research Program of China (973 Program) with grant No. 2014CB046306-2, National Key R&D Program of China with grant No. 2018YFB1003802-01, and China Scholarship Council with grant No. 201606420005.