## Abstract

The quality of solution sets generated by decomposition-based evolutionary multi-objective optimisation (EMO) algorithms depends heavily on the consistency between the shape of a given problem's Pareto front and the distribution of the specified weights. A set of weights distributed uniformly in a simplex often leads to a set of well-distributed solutions on a Pareto front with a simplex-like shape, but may fail on other Pareto front shapes. How to specify a set of appropriate weights without prior knowledge of the problem's Pareto front remains an open problem. In this article, we propose an approach to adapt the weights during the evolutionary process (called AdaW). AdaW progressively seeks a suitable distribution of weights for the given problem by elaborating several key parts in weight adaptation: weight generation, weight addition, weight deletion, and weight update frequency. Experimental results have shown the effectiveness of the proposed approach. AdaW works well for Pareto fronts with very different shapes: 1) the simplex-like, 2) the inverted simplex-like, 3) the highly nonlinear, 4) the disconnected, 5) the degenerate, 6) the scaled, and 7) the high-dimensional.

## 1 Introduction

Decomposition-based evolutionary multiobjective optimisation (EMO), crystallised in Zhang and Li (2007), is a general-purpose algorithm framework (termed MOEA/D). It decomposes a multiobjective optimisation problem (MOP) into a number of single-objective (or multiobjective; Liu et al., 2014) optimisation subproblems on the basis of a set of weights (also called weight vectors) and then uses a search heuristic to optimise these subproblems simultaneously and cooperatively. Compared with conventional Pareto-based EMO, decomposition-based EMO has clear strengths, for example, providing high selection pressure toward the Pareto front (Hughes, 2005; Li, Zhang et al., 2014), being easy to combine with local search operators (Ishibuchi et al., 2003; Martínez and Coello, 2012; Derbel et al., 2016), owning high search ability for combinatorial MOPs (Ishibuchi and Murata, 1998; Mei et al., 2011; Shim et al., 2012; Almeida et al., 2012; Cai et al., 2015), and being capable of dealing with MOPs with many objectives (Asafuddoula, Singh et al., 2015; Li, Deb et al., 2015; Yuan et al., 2016) and MOPs with a complicated Pareto set (Li and Zhang, 2009; Liu et al., 2014; Zhou and Zhang, 2016).

A key feature in MOEA/D is that the diversity of the evolutionary population is controlled explicitly by a set of weights (or a set of reference directions/points determined by this weight set). Each weight corresponds to one subproblem, ideally associated with one solution in the population; thus, diverse weights may lead to different Pareto optimal solutions. In general, a well-distributed solution set can be obtained if the set of weights and the Pareto front of a given problem share the same/similar distribution shape. In many existing studies, the weights are predefined and distributed uniformly in a unit simplex. This specification can make decomposition-based algorithms well-suited to MOPs with a “regular” (i.e., simplex-like) Pareto front, e.g., a triangle plane or a sphere. Figure 1a shows such an example, where a set of uniformly distributed weights corresponds to a set of uniformly distributed Pareto optimal solutions.

However, when the shape of an MOP's Pareto front is far from the standard simplex, a set of uniformly distributed weights may not result in a uniform distribution of Pareto optimal solutions. On MOPs with an “irregular” Pareto front (e.g., disconnected, degenerate, inverted simplex-like, or scaled), decomposition-based algorithms appear to struggle (Qi et al., 2014; Li et al., 2016; Ishibuchi, Setoguchi et al., 2017; Li et al., 2018). In such MOPs, some weights may have no intersection with the Pareto front, which can lead to several weights corresponding to one Pareto optimal solution. In addition, the distance between adjacent Pareto optimal solutions (obtained by adjacent weights) may differ greatly across different parts of the Pareto front. This is due to the inconsistency between the shape of the Pareto front and the shape of the weight distribution. Overall, the lack of intersection between some weights and the Pareto front may cause the number of obtained Pareto optimal solutions to be smaller than the number of weights, while the distance difference between adjacent solutions in different parts of the Pareto front can result in a non-uniform distribution of solutions.

Figure 1b gives an example in which a set of Pareto optimal solutions is obtained by a set of uniformly distributed weights on an “irregular” Pareto front. As can be seen, weights $w_3$ and $w_4$ have no intersection with the Pareto front, and weights $w_4$ and $w_5$ correspond to only one Pareto optimal solution ($s_5$). In addition, the obtained Pareto optimal solutions are far from uniformly distributed; for example, the distance between $s_1$ and $s_2$ is considerably greater than that between other adjacent solutions.

The above example illustrates the difficulties of predefining weights in MOEA/D. It could be very challenging (or even impossible) to find a set of optimal weights beforehand for any MOP, especially in real-world scenarios where the information of a problem's Pareto front is often unknown.

A potential solution to this problem is to seek adaptation approaches that can progressively modify the weights during the evolutionary process. Several interesting attempts have been made along this line (Trivedi et al., 2017). A detailed review of these adaptation approaches will be presented in the next section.

Despite the potential advantages of these adaptation approaches for “irregular” Pareto fronts, the problem is far from being fully resolved. On one hand, varying the weights which are pre-set and ideal for problems with “regular” Pareto fronts may compromise the performance of an algorithm on these problems themselves. On the other hand, varying the weights materially changes the subproblems over the course of the optimisation, which could significantly deteriorate the convergence of the algorithm (Giagkiozis et al., 2013b). Overall, as pointed out in Li, Ding et al. (2015); Ishibuchi, Setoguchi et al. (2017), how to set the weights is still an open question; the need for effective methods is pressing.

In this article, we present an adaptation method (called AdaW) to progressively adjust the weights during the evolutionary process. AdaW updates the weights periodically based on the information produced by the evolving population itself, and then in turn guides the population by these weights which are of a suitable distribution for the given problem. AdaW focuses on several key parts in the weight adaptation process, namely, weight generation, weight addition, weight deletion, and weight update frequency. The main contributions of this work can be summarised as follows:

An approach to find out potential undeveloped, but promising weights is presented, with the aid of a well-maintained archive set.

An approach to delete unpromising weights is presented, taking into account both the number of solutions associated with the weight and the crowding degree in the space.

A design to make several parts in the weight adaptation process cohere as a whole is developed, enabling the algorithm to strike a balance between convergence and diversity on various problems.

## 2 Related Work

A basic assumption in MOEA/D is that the diversity of the weights will result in the diversity of the Pareto optimal solutions. This motivates several studies on how to generate a set of uniformly distributed weights (see Trivedi et al., 2017), such as the simplex-lattice design (Das and Dennis, 1998), two-layer simplex-lattice design (Deb and Jain, 2014), multilayer simplex-lattice design (Jiang and Yang, 2017), uniform design (Tan et al., 2013), and a combination of the simplex-lattice design and uniform design (Ma et al., 2014). A weakness of such systematic weight generators is that the number of generated weights is not flexible. This contrasts with the uniform random sampling method (Murata et al., 2001; Jaszkiewicz, 2002), which can generate an arbitrary number of weights for any dimension. In addition, some work has shown that if the geometry of the problem is known a priori, then the optimal distribution of the weights for a specific scalarising function can be readily identified (Giagkiozis et al., 2013a; Wang, Zhang, and Zhang, 2016).
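For illustration, the simplex-lattice design of Das and Dennis (1998) can be sketched as below; it enumerates (via stars and bars) all weight vectors whose components are multiples of $1/H$ and sum to one, producing $C(H+m-1, m-1)$ vectors, e.g., 105 weights for $m=3$ and $H=13$. This is a minimal sketch, not a reference implementation.

```python
from itertools import combinations

def simplex_lattice(m, H):
    """All m-dimensional weight vectors whose components are
    multiples of 1/H and sum to 1 (stars-and-bars enumeration)."""
    weights = []
    for dividers in combinations(range(H + m - 1), m - 1):
        w, prev = [], -1
        for d in dividers:
            w.append((d - prev - 1) / H)  # stars in this gap, scaled by 1/H
            prev = d
        w.append((H + m - 2 - prev) / H)  # stars after the last divider
        weights.append(w)
    return weights
```

Note how the number of weights is tied to $H$ and $m$, illustrating the inflexibility mentioned above: for three objectives, the nearest achievable sizes around 100 are 91 ($H=12$) and 105 ($H=13$).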

The variety of weight generators gives us ample options for initialising the weights, each of which provides an explicit way of specifying a set of particular search directions in decomposition-based optimisation. However, the precondition for these weight generators to work well is that the Pareto front of the problem has a regular, simplex-like shape. An “irregular” Pareto front (e.g., disconnected, degenerate, inverted simplex-like, or scaled) may make these weight generators struggle, in which case multiple weights may correspond to one single point. This leads to a waste of computational resources and, more importantly, renders the algorithm's performance inferior.

An intuitive solution to this problem is to adaptively update the weights during the optimisation process. Several interesting attempts have been made along this line. Table 1, on the basis of some previous studies (Asafuddoula et al., 2018), summarises existing work on weight adaptation in decomposition-based EMO. These works represent significant progress in weight adaptation, with various features incorporated (see Table 1) and clear advantages exhibited (over the weight pre-setting method) on many irregular Pareto fronts. In addition, it is worth mentioning that some researchers have also introduced weights into non-decomposition-based EMO, conducting weight adaptation for Pareto-based search (Wang et al., 2013, 2015) and indicator-based search (Tian et al., 2018). Other researchers adaptively adjust the search directions according to the distribution of the evolutionary population, which in a sense can also be seen as weight adaptation (Xiang, Zhou et al., 2017; Xiang, Peng et al., 2017).

Table 1: Existing weight adaptation methods in decomposition-based EMO.

| Method | Features/characteristics, strengths | Weaknesses/limitations |
|---|---|---|
| Random weights (Ishibuchi and Murata, 1998; Jin et al., 2001; Jaszkiewicz, 2002) | More computational cost allocated around the nondominated solutions; helpful for irregular Pareto fronts (Li, Ding et al., 2015). | Unable to maintain solutions' uniformity; poor convergence. |
| EMOSA (Li and Landa-Silva, 2011) | Adapting the weights to diversify the population towards the unexplored parts of the Pareto front. | Systematic weight generation for adaptation, thus likely to struggle on irregular Pareto fronts. |
| $pa\lambda $-MOEA/D (Jiang et al., 2011) | Sampling the regression curve of the weights on the basis of an external archive. | Assuming symmetry and continuity of the Pareto front (Asafuddoula et al., 2018). |
| DMOEA/D (Gu et al., 2012) | Equidistant interpolation to update the weights; working well on bi-objective problems. | Hard to maintain uniformity of solutions on problems with more than two objectives. |
| A-NSGA-III (Jain and Deb, 2014) | An ($m-1$)-simplex of reference points centered around a crowded reference point being added. | Systematic weight generation for adaptation; struggling on disconnected and degenerate Pareto fronts. |
| MOEA/D-AWA (Qi et al., 2014) | Generating new weights by particular solutions in the archive; able to tackle problems with “sharp peak” and “low tail.” | Updating weights in the ending stage of evolution; not performing very well in uniformity maintenance. |
| RVEA (Cheng et al., 2015, 2016) | Two weight adaptations being conducted to deal with scaled Pareto fronts and irregular fronts, respectively. | Failing to maintain diversity of solutions on highly nonlinear Pareto fronts. |
| MOEA/D-AM2M (Liu et al., 2016, 2018) | An adaptive weight update for MOEA/D-M2M (Liu et al., 2014) for degenerate Pareto fronts according to the angle among solutions (as a similarity measure). | Less global competition in finding multiple regions on the Pareto front (Liu et al., 2018). |
| SOM for weight adaptation (Gu and Cheung, 2018) | Updating weights via training a self-organising map (SOM); effective for degenerate Pareto fronts. | Still difficult to maintain a good uniformity of solutions through limited training vectors. |
| MOEA/D-ABD (Zhang et al., 2018) | Linear interpolation to update weights for discontinuous Pareto fronts. | Restricted to bi-objective problems. |
| MOEA/D-MR (Wang, Zhang et al., 2017) | Updating weights via considering both ideal and nadir points. | Potentially struggling on problems with irregular Pareto fronts, for example, degenerate and discontinuous. |
| MaOEA/D-2ADV (Cai et al., 2018) | Two types of weight adjustments: one for the number of weights and the other for the position of weights; achieving a good balance between convergence and diversity. | Designed for many-objective problems; only being evaluated on the DTLZ test problems, most of which have regular Pareto fronts. |
| g-DBEA (Asafuddoula et al., 2018) | An adaptive weight update for DBEA (Asafuddoula, Ray et al., 2015) via a “learning period”; storing the removed weights for future use; capable of obtaining diversified solutions over the front. | Not performing very well in maintaining uniformity of solutions. |


However, weight adaptation is still far from being a mature method, in the sense that it cannot yet deal with a wide variety of MOPs as effectively as the weight pre-setting method deals with simplex-like Pareto fronts. Some challenges and limitations are outlined as follows:

Difficulties in obtaining and generating new weights; that is, where to find promising weights. Many adaptation methods introduce new weights from a set of systematically generated weights, such as EMOSA (Li and Landa-Silva, 2011), A-NSGA-III (Jain and Deb, 2014), and RVEA (Cheng et al., 2016). However, such weights may still have no intersection with the problem's Pareto front, thus failing to guarantee the uniformity of the final solution set. Determining new weights by promising solutions produced previously during the evolutionary search may alleviate this issue. Yet this requires an additional archive to store these solutions, and how to maintain the archive is of high importance as it essentially determines the search directions of decomposition-based evolution.

Difficulties of deleting and adding weights. It is not easy to identify currently unpromising weights. Even when each weight is associated with exactly one individual, the population may still not have good uniformity, for example, on a highly convex Pareto front. In addition, where to add weights is also a tricky question, as we do not know whether sparsely distributed weights represent undeveloped regions that need to be explored or unpromising regions that have no intersection with the Pareto front.

Difficulties of setting the weight update frequency. Varying the weights essentially changes the subproblems. If the update is not frequent enough, a lot of computational resources are wasted on unpromising search directions (Asafuddoula et al., 2018). If the update is too frequent, individuals are likely to *wander* around the search space, which significantly affects the convergence of the algorithm (Giagkiozis et al., 2013b; Qi et al., 2014). Especially when the algorithm approaches the end of the search process, a change of weights may lead to the algorithm returning a well-diversified but poorly converged population.

Challenges of adapting weights for different Pareto fronts. Many adaptation methods are designed for, or suited to, certain types of Pareto fronts; for example, $pa\lambda $-MOEA/D (Jiang et al., 2011) for connected Pareto fronts, DMOEA/D (Gu et al., 2012) and MOEA/D-ABD (Zhang et al., 2018) for bi-objective MOPs, and MOEA/D-AM2M (Liu et al., 2018) and SOM-based weight adaptation (Gu and Cheung, 2018) for degenerate Pareto fronts. In addition, some methods designed for many-objective problems may not perform very well on problems with two or three objectives, especially in maintaining uniformity of solutions (Cai et al., 2018; Asafuddoula et al., 2018).

The above discussions motivate our work. In this article, we propose a weight adaptation method via elaborating several key parts in weight adaptation: weight generation, weight deletion, weight addition, and weight update frequency. Our goal is to present a decomposition-based algorithm that is able to handle various Pareto fronts, regular and irregular alike.

## 3 The Proposed Algorithm

### 3.1 Basic Idea

When optimising an MOP, the current nondominated solutions (i.e., the best solutions found so far) during the evolutionary process can indicate the evolutionary status (Li, Yang et al., 2014; Liu et al., 2016, 2018). The nondominated solution set, with the progress of the evolution, gradually approximates the Pareto front, thus being likely to reflect the shape of the Pareto front when it is well maintained. Although such a set probably evolves slowly in comparison with the evolutionary population which is driven by the scalarising function in decomposition-based evolution, the set may be able to provide new search directions that are unexplored by the scalarising function-driven population.

Figure 2 gives an illustration of updating the search directions (weights) of the population with the aid of a well-maintained archive set of nondominated solutions. As can be seen, before the update a set of uniformly distributed weights corresponds to a poorly distributed population along the Pareto front. After the update, the two solutions from the archive ($a_3$ and $a_7$) whose areas are not explored well by the population are added (Figure 2c), and their corresponding weights are considered as new search directions to guide the evolution ($w_7$ and $w_8$). In contrast, the weights that are associated with crowded solutions ($s_3$ and $s_4$) in the population are deleted. Then, a new population is formed with unevenly distributed weights but well-distributed solutions.

The above is the basic idea of the weight adaptation in our proposed work. However, materialising it requires a proper handling of several important issues. They are as follows:

How to maintain the archive?

Which solutions from the archive should enter the evolutionary population to generate new weights?

How to generate weights on the basis of these newly entered solutions?

Which old weights in the population should be deleted?

What is the frequency of updating the weights? That is, how long should we allow the population to evolve by the current weights?

In the next several subsections, we will describe in sequence how we handle these issues, followed by the main framework of the algorithm.

### 3.2 Archive Maintenance

It is worth mentioning that there are two slight differences between the settings here and those in Li et al. (2016). First, the parameter $k$ for the $k$th nearest neighbour was set to 3 in Li et al. (2016), while here $k$ is set to the number of objectives of the problem. There are two reasons for this change. The first is that, as shown in Li et al. (2016), the performance of the maintenance operation is not very sensitive to $k$; it works fairly well within a wide range such as $[2,10]$. The second reason is that a larger $k$ may be more suitable for many-objective optimisation, as it places more emphasis on boundary solutions, which are important points for revealing problem characteristics. The second difference is that the median, instead of the average used in Li et al. (2016), of the distances from all the solutions to their $k$th nearest neighbour is considered. This could alleviate the effect of dominance resistant solutions (DRSs), that is, solutions with a quite poor value in some objectives but with (near-)optimal values in some others (Ikeda et al., 2001).
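As a small sketch of the quantity described above (assuming Euclidean distance in the objective space and the archive given as a list of objective vectors), the niche radius, i.e., the median of the $k$th-nearest-neighbour distances, might be computed as:

```python
import statistics

def niche_radius(archive, k):
    """Median, over all archive members, of the Euclidean distance
    (in objective space) to their k-th nearest neighbour."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    kth_distances = []
    for i, p in enumerate(archive):
        # distances from p to every other archive member, sorted ascending
        ds = sorted(dist(p, q) for j, q in enumerate(archive) if j != i)
        kth_distances.append(ds[k - 1])
    return statistics.median(kth_distances)
```

Using the median rather than the mean keeps a single dominance resistant solution, which is very far from the rest of the archive, from inflating the radius.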

### 3.3 Weight Addition

In AdaW, we aim to add weights (into the evolutionary population) whose search directions/areas are *undeveloped* and *promising*. Both criteria are measured by contrasting the evolutionary population with the archive set. For the former, if the niche in which a solution of the archive is located contains no solution of the evolutionary population, that niche is likely undeveloped. For the latter, if a solution of the archive performs better on its search direction (weight) than any solution of the evolutionary population, the niche of that solution is likely promising.

To check the latter, we first find the neighbouring weights of the candidate solution's weight in the evolutionary population, and further determine the solutions associated with those neighbouring weights. Finally, we compare these solutions with the candidate solution on the basis of the candidate solution's weight. Formally, let $q$ be a candidate solution in the archive and $w^q$ be its corresponding weight. Let $w^p$ be one of the neighbouring weights of $w^q$ in the evolutionary population, and $p$ be the solution associated with $w^p$ in the evolutionary population. We define that $q$ outperforms $p$ on the basis of $w^q$ if
$$g^{tch}(q\,|\,w^q,z^*) < g^{tch}(p\,|\,w^q,z^*),$$
where $g^{tch}$ denotes the Tchebycheff scalarising function and $z^*$ the reference point.
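A sketch of this comparison is given below, assuming the modified Tchebycheff function (objective differences divided by the weight) and treating a candidate as promising only if it outperforms the solution of every neighbouring weight; both of these are assumptions of the sketch, not guaranteed details of AdaW.

```python
def tchebycheff(f, w, z, eps=1e-6):
    """Modified Tchebycheff value of objective vector f on weight w with
    reference point z (dividing by the weight; smaller is better)."""
    return max((fi - zi) / max(wi, eps) for fi, wi, zi in zip(f, w, z))

def is_promising(q_f, w_q, neighbour_fs, z):
    """q is judged promising if, evaluated on its own weight w_q, it beats
    every solution currently associated with w_q's neighbouring weights."""
    g_q = tchebycheff(q_f, w_q, z)
    return all(g_q < tchebycheff(p_f, w_q, z) for p_f in neighbour_fs)
```

Note that all solutions are evaluated on the candidate's weight $w^q$, not on their own weights, since the question is whether the population already covers that search direction well.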

### 3.4 Weight Generation

Given a reference point, the optimal weight to a solution (e.g., $w_7$ to $a_3$ in Figure 2c) with respect to the Tchebycheff scalarising function can be easily generated. This is already a frequently used approach in weight adaptation (Gu et al., 2012; Qi et al., 2014).

Let $z^*=(z_1^*,z_2^*,\ldots,z_m^*)$ be the reference point and $w=(\lambda_1,\lambda_2,\ldots,\lambda_m)$ be the optimal weight to a solution $q$ in the Tchebycheff scalarising function. Then it holds that
$$\lambda_i=\frac{f_i(q)-z_i^*}{\sum_{j=1}^{m}\big(f_j(q)-z_j^*\big)},\quad i=1,2,\ldots,m.$$
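As a sketch, under the modified Tchebycheff function (with the objective differences divided by the weight), the optimal weight for a solution is obtained by normalising its objective distances to the reference point so that they sum to one:

```python
def optimal_weight(f, z):
    """Optimal weight for objective vector f under the modified Tchebycheff
    function with reference point z: each component is proportional to the
    distance f_i - z_i, normalised to sum to one."""
    diffs = [fi - zi for fi, zi in zip(f, z)]
    total = sum(diffs)
    return [d / total for d in diffs]
```

For instance, a solution at $(1,3)$ with reference point $(0,0)$ yields the weight $(0.25, 0.75)$: the weight leans towards the objective in which the solution is worse, so that the search direction through $z^*$ passes through the solution.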

### 3.5 Weight Deletion

The above deletion operation is repeated until the number of weights is restored to $N$. However, there may exist a situation in which, even when every solution in the population corresponds to only one weight, the number of weights still exceeds $N$. In this situation, we use the same diversity maintenance method as in Section 3.2 to iteratively delete the most crowded solution (along with its weight) in the population until the number of weights reduces to $N$.
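The first deletion phase can be sketched as follows. This is a hypothetical simplification: the associations are given as `(weight, solution_id)` pairs, and when several weights share a solution the sketch simply keeps the first one encountered, whereas a real implementation would keep the best-performing weight; deletion also stops as soon as the count is back to $N$.

```python
def prune_shared_weights(pairs, N):
    """pairs: list of (weight, solution_id) associations. Drop weights whose
    solution is already claimed by an earlier weight, but stop deleting once
    the count is back to N (surplus duplicates are reinstated if needed)."""
    seen, kept, duplicates = set(), [], []
    for w, s in pairs:
        if s in seen:
            duplicates.append((w, s))  # this weight shares a solution
        else:
            seen.add(s)
            kept.append((w, s))
    # reinstate duplicates only while we remain below the target size N
    while len(kept) < N and duplicates:
        kept.append(duplicates.pop())
    return kept
```

If one-weight-per-solution still leaves more than $N$ weights, the crowding-based maintenance of Section 3.2 handles the remainder.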

### 3.6 Weight Update Frequency

The timing and frequency of updating the weights of the evolutionary population play an important role in weight adaptation methods. Since varying the weights essentially changes the subproblems to be optimised, a frequent change can significantly affect the convergence of the algorithm (Giagkiozis et al., 2013b). In AdaW, the weight update operation is conducted every $5%$ of the total generations/evaluations. In addition, when the algorithm approaches the end of the optimisation process, a change of the weights may lead to the solutions evolving insufficiently along those specified search directions (weights). Therefore, AdaW does not change the weights during the last $10%$ generations/evaluations.
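The schedule above can be condensed into a single predicate; the sketch below assumes a generation-based budget and hypothetical parameter names `interval` and `freeze` for the $5\%$ update period and the final $10\%$ frozen phase.

```python
def should_update_weights(gen, max_gen, interval=0.05, freeze=0.10):
    """True if the weights should be updated at generation `gen`: every
    `interval` fraction of the run, except in the last `freeze` fraction."""
    step = max(1, int(max_gen * interval))          # e.g., every 5 of 100 gens
    return gen % step == 0 and gen < max_gen * (1 - freeze)
```

Freezing the weights near the end lets the population converge fully along the final set of search directions instead of chasing freshly changed subproblems.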

### 3.7 Algorithm Framework

Algorithm 1 gives the main procedure of AdaW. As can be seen, apart from the weight update (Steps 21–25) and archive operations (Steps 5, 13–16, and 18–20), the remaining steps are the common steps in a generic decomposition-based algorithm. Here, we implemented them by a widely used MOEA/D version in Li and Zhang (2009), and the Tchebycheff scalarising function was used despite the fact that AdaW can be implemented by other scalarising functions with respect to its weight addition and deletion. That is, the steps of the initialisation (Steps 1–4), mating selection (Step 9), variation operation (Step 10), reference point update (Step 11), and population update (Step 12) follow the practice in Li and Zhang (2009).

Additional computational costs of AdaW (in comparison with the basic MOEA/D) come from the archiving operations and the weight update. In one generation of AdaW, updating the archive (Steps 13–16) requires $O(mNN_A)$ comparisons, where $m$ is the number of the problem's objectives, $N$ is the population size, and $N_A$ is the archive size. Maintaining the archive (Steps 18–20) requires $O(mN_A^2)$ comparisons (Li et al., 2016). The computational cost of the weight update is governed by three operations: weight addition (Step 22), weight deletion (Step 23), and neighbouring weight update (Step 24). In the weight addition, undeveloped solutions are first determined. This includes calculating the niche radius and finding the undeveloped solutions, which require $O(mN_A^2)$ and $O(mNN_A)$ comparisons, respectively. After $L$ undeveloped solutions are found, we check whether they are promising by comparing them with the solutions that their neighbouring weights correspond to. The computational complexity of finding the neighbours of the $L$ weights is bounded by $O(mLN)$ or $O(TLN)$ ($T$ denotes the neighbourhood size), whichever is greater. Then, checking whether these $L$ solutions are promising requires $O(mTL)$ comparisons. In the weight deletion, handling the situation in which one solution is shared by multiple weights requires $O(LN)$ comparisons, and removing the weights associated with crowded solutions requires $O(m(L+N)^2)$ comparisons (Li et al., 2016). Finally, after the weight deletion completes, updating the neighbours of each weight in the population requires $O(mN^2)$ or $O(TN^2)$ comparisons, whichever is greater.

To sum up, since $O(N)=O(N_A)$ and $0 \leqslant L \leqslant N_A$, the additional computational cost of AdaW is bounded by $O(mN^2)$ or $O(TN^2)$, whichever is greater, where $m$ is the number of objectives and $T$ is the neighbourhood size. This dominates the overall complexity of the proposed algorithm, given the lower time complexity ($O(mTN)$) required in the basic MOEA/D (Zhang and Li, 2007).

## 4 Results

Three state-of-the-art weight adaptation approaches, A-NSGA-III (Jain and Deb, 2014), RVEA (Cheng et al., 2016), and MOEA/D-AWA (Qi et al., 2014), along with the baseline MOEA/D (Li and Zhang, 2009), were considered as peer algorithms to evaluate the proposed AdaW. These adaptive approaches have been demonstrated to be competitive on MOPs with various Pareto fronts. In MOEA/D, the Tchebycheff scalarising function was used, in which “multiplying the weight” was replaced with “dividing the weight” in order to obtain more uniform solutions (Qi et al., 2014; Deb and Jain, 2014).

In view of the goal of the proposed work, we selected 17 test problems with a variety of representative Pareto fronts from the existing problem suites (Van Veldhuizen, 1999; Zitzler et al., 2000; Deb et al., 2005; Deb and Saxena, 2005; Deb and Jain, 2014; Jain and Deb, 2014; Cheng et al., 2017). According to the properties of their Pareto fronts, we categorised the problems into seven groups to challenge the algorithms in balancing the convergence and diversity of solutions. They are as follows:

problems with a simplex-like Pareto front: DTLZ1, DTLZ2, and convex DTLZ2 (CDTLZ2).

problems with an inverted simplex-like Pareto front: inverted DTLZ1 (IDTLZ1) and inverted DTLZ2 (IDTLZ2).

problems with a highly nonlinear Pareto front: SCH1 and FON.

problems with a disconnected Pareto front: ZDT3 and DTLZ7.

problems with a degenerate Pareto front: DTLZ5 and VNT2.

problems with a scaled Pareto front: scaled DTLZ1 (SDTLZ1), scaled DTLZ2 (SDTLZ2), and SCH2.

problems with a high-dimensional Pareto front: 10-objective DTLZ2 (DTLZ2-10), 10-objective inverted DTLZ1 (IDTLZ1-10), and DTLZ5(2,10).

All the problems were configured as described in their original papers (Van Veldhuizen, 1999; Zitzler et al., 2000; Deb et al., 2005; Deb and Saxena, 2005; Deb and Jain, 2014; Jain and Deb, 2014; Cheng et al., 2017).

To compare the performance of the algorithms, the inverted generational distance (IGD) (Coello and Sierra, 2004) and hypervolume (Zitzler and Thiele, 1999) were used. IGD, which measures the average distance from uniformly distributed points along the Pareto front to their closest solution in a set, can provide a comprehensive assessment of the convergence and diversity of the set. To calculate IGD, we need a reference set that well represents the problem's Pareto front, and the assessment result may heavily depend on the specification of the reference set (Ishibuchi et al., 2018b). For most of the test problems used in our study, the Pareto fronts are known (e.g., the ZDT and DTLZ suites and the variants of the DTLZ problems). For them, we considered around 10,000 evenly distributed points along the Pareto front as the reference set. For the remaining test problems, their reference sets were obtained from http://delta.cs.cinvestav.mx/∼ccoello/EMOO/.
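As an illustration of the indicator just described (assuming Euclidean distance in the objective space), IGD can be computed as:

```python
def igd(reference_set, solution_set):
    """Average Euclidean distance from each reference point (sampled along
    the Pareto front) to its nearest solution in the obtained set."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    total = sum(min(dist(r, s) for s in solution_set) for r in reference_set)
    return total / len(reference_set)
```

Because the average runs over the reference points, a set that misses a whole region of the Pareto front is penalised even if its members are individually well converged, which is why IGD captures both convergence and diversity.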

However, IGD is not Pareto-compliant in the sense that it does not necessarily prefer a Pareto-dominating set to a Pareto-dominated set. So we also used the Pareto-compliant indicator hypervolume. Hypervolume measures the volume of the objective space enclosed by a solution set and a reference point. Following the practice in Li, Yang, Liu et al. (2014), the reference point of DTLZ1, DTLZ2, SCH1, FON, ZDT3, DTLZ7, DTLZ5, VNT2, and SCH2 was set to $(1,1,1)$, $(2,2,2)$, $(5,5)$, $(2,2)$, $(2,2,7)$, $(2,2,2)$, $(5,16,12)$, and $(2,17)$, respectively. For the remaining problems, we considered common settings, that is, $(2,2,2)$ for CDTLZ2 and IDTLZ2, $(1,1,1)$ for IDTLZ1, $(2,2,\ldots,2)$ for DTLZ2-10 and DTLZ5(2,10), $(1,1,\ldots,1)$ for IDTLZ1-10, $(0.55,5.5,55)$ for SDTLZ1, and $(1.1,11,110)$ for SDTLZ2. Note that it is not necessary to normalise the solution set when measuring the hypervolume value for scaled problems, provided that the range of the Pareto front is taken into account in setting the reference point (Li and Yao, 2019).
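For intuition, a minimal sketch of exact hypervolume computation in the bi-objective (minimisation) case follows; it sweeps the solutions by the first objective and accumulates the dominated rectangles. Higher-dimensional hypervolume requires more elaborate algorithms and is not covered by this sketch.

```python
def hypervolume_2d(solution_set, ref):
    """Exact hypervolume for bi-objective minimisation: area dominated by
    the set and bounded above by the reference point `ref`."""
    # keep only points that dominate the reference point, sweep left to right
    pts = sorted(p for p in solution_set if p[0] < ref[0] and p[1] < ref[1])
    volume, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:  # nondominated w.r.t. points already swept
            volume += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return volume
```

For example, the set $\{(1,2),(2,1)\}$ with reference point $(3,3)$ dominates an area of $3$: two $2\times 1$ rectangles overlapping in a unit square.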

In addition, for a visual understanding of the search behaviour of the five algorithms, we also plotted the final solution set of each algorithm in a single run on all the test problems. The run shown is the one whose solution set achieved the median IGD value over all the runs.

All the algorithms operated on real-valued variables. Simulated binary crossover (SBX) (Agrawal et al., 1995) and polynomial mutation (PM) (Deb, 2001), both with a distribution index of 20, were used to perform the variation. The crossover probability was set to $p_c = 1.0$ and the mutation probability to $p_m = 1/d$, where $d$ is the number of variables in the decision space.
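A simplified sketch of these two operators follows (our illustration; full SBX implementations additionally handle variable bounds and per-variable application, which are omitted here). The seed and parent values are for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sbx(parent1, parent2, eta=20.0, pc=1.0):
    """Simulated binary crossover (simplified): children are spread
    around the parents with a spread factor beta controlled by eta."""
    p1, p2 = np.asarray(parent1, float), np.asarray(parent2, float)
    if rng.random() > pc:
        return p1.copy(), p2.copy()
    u = rng.random(p1.shape)
    beta = np.where(u <= 0.5,
                    (2 * u) ** (1 / (eta + 1)),
                    (1 / (2 * (1 - u))) ** (1 / (eta + 1)))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return c1, c2

def polynomial_mutation(x, low, high, eta=20.0):
    """Polynomial mutation with per-variable probability pm = 1/d."""
    x = np.asarray(x, float).copy()
    d = len(x)
    for i in range(d):
        if rng.random() < 1.0 / d:
            u = rng.random()
            if u < 0.5:
                delta = (2 * u) ** (1 / (eta + 1)) - 1
            else:
                delta = 1 - (2 * (1 - u)) ** (1 / (eta + 1))
            x[i] = np.clip(x[i] + delta * (high - low), low, high)
    return x

c1, c2 = sbx([0.2, 0.8], [0.6, 0.4])
print(c1 + c2)  # SBX preserves the parents' sum: [0.8, 1.2]
```

Note that a large distribution index (such as 20) keeps children close to their parents, which is why it is the common default.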

In decomposition-based EMO, the population size, which corresponds to the number of weights, cannot be set arbitrarily. Using a set of uniformly distributed weights in a simplex, we set the population size to 100, 105, and 220 for the 2-, 3-, and 10-objective problems, respectively. As in many existing studies, the number of function evaluations was set to 25,000, 30,000, and 100,000 for the 2-, 3-, and 10-objective problems, respectively. Each algorithm was executed in 30 independent runs on each problem.
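The constraint on population size comes from the simplex-lattice design typically used to generate the weights: with $h$ divisions per objective there are exactly $\binom{h+m-1}{m-1}$ weights, which yields, for example, 100 for $m=2$, $h=99$ and 105 for $m=3$, $h=13$. A sketch of the generation (ours, not the paper's code) via stars-and-bars:

```python
from itertools import combinations
from math import comb

def simplex_lattice_weights(m, h):
    """All weights (k1/h, ..., km/h) with nonnegative integers ki summing
    to h: the simplex-lattice design used to initialise the weight set.
    Enumerated via stars-and-bars: choose m-1 'bar' positions."""
    weights = []
    for dividers in combinations(range(h + m - 1), m - 1):
        prev, ks = -1, []
        for d in dividers:
            ks.append(d - prev - 1)  # stars between consecutive bars
            prev = d
        ks.append(h + m - 2 - prev)  # stars after the last bar
        weights.append([k / h for k in ks])
    return weights

# Population sizes follow C(h + m - 1, m - 1):
print(len(simplex_lattice_weights(2, 99)), comb(100, 1))  # 100 100
print(len(simplex_lattice_weights(3, 13)), comb(15, 2))   # 105 105
```

Because the count jumps in coarse steps as $h$ grows, arbitrary population sizes such as 101 simply cannot be realised with a uniform lattice.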

Parameters of the peer algorithms were set as specified or recommended in their original papers. In MOEA/D, the neighbourhood size, the probability of selecting parent solutions from the neighbours, and the maximum number of replaced solutions were set to $10\%$ of the population size, 0.9, and $1\%$ of the population size, respectively. In RVEA, the rate of change of the penalty function and the frequency of the reference vector adaptation were set to 2 and 0.1, respectively. In MOEA/D-AWA, the maximal number of adjusted subproblems and the computational resources allocated to the weight adaptation were set to $0.05N$ and $20\%$, respectively. In addition, the size of the external population in MOEA/D-AWA was set to $1.5N$.

Several specific parameters are required in the proposed AdaW. As stated in Section 3.6, the weights were updated every $5\%$ of the total generations, and no update was allowed during the last $10\%$ of the generations. Finally, the maximum capacity of the archive was set to $2N$.

Tables 3 and 4 give the IGD and hypervolume results (mean and standard deviation) of the five algorithms on all 17 problems, respectively. The better mean for each problem is highlighted in boldface. To draw statistically sound conclusions, Wilcoxon's rank-sum test (Zitzler et al., 2008) at a 0.05 significance level was used to test the significance of the differences between the results obtained by AdaW and those of the four peer algorithms.
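The significance test can be reproduced with a standard rank-sum implementation; the sketch below uses the normal approximation (adequate for 30 runs per algorithm and assuming no tied values) on hypothetical IGD samples:

```python
import math

def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 'x') for v in x] + [(v, 'y') for v in y])
    # Rank sum of the first sample (ranks start at 1).
    w = sum(rank for rank, (v, label) in enumerate(pooled, start=1)
            if label == 'x')
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical IGD values from 30 runs of AdaW and of a peer algorithm.
igd_adaw = [0.019 + 0.00001 * i for i in range(30)]
igd_peer = [0.025 + 0.00001 * i for i in range(30)]
print(rank_sum_p_value(igd_adaw, igd_peer) < 0.05)  # prints True
```

When the p-value falls below 0.05, the peer's result is marked with a dagger in the tables.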

The 17 test problems used in the study ($m$: number of objectives; $d$: number of decision variables).

| Problem | $m$ | $d$ | Properties | Problem | $m$ | $d$ | Properties |
|---|---|---|---|---|---|---|---|
| DTLZ1 | 3 | 7 | Simplex-like, Linear, Multimodal | DTLZ5 | 3 | 12 | Degenerate, Concave |
| DTLZ2 | 3 | 12 | Simplex-like, Concave | VNT2 | 3 | 2 | Degenerate, Convex |
| CDTLZ2 | 3 | 12 | Simplex-like, Convex | SDTLZ1 | 3 | 7 | Scaled, Simplex-like, Linear, Multimodal |
| IDTLZ1 | 3 | 7 | Inverted simplex-like, Linear, Multimodal | SDTLZ2 | 3 | 12 | Scaled, Simplex-like, Concave |
| IDTLZ2 | 3 | 12 | Inverted simplex-like, Concave | SCH2 | 2 | 1 | Scaled, Discontinuous, Convex |
| SCH1 | 2 | 1 | Highly nonlinear, Convex | DTLZ2-10 | 10 | 19 | Many-objective, Simplex-like, Concave |
| FON | 2 | 2 | Highly nonlinear, Concave | IDTLZ1-10 | 10 | 19 | Many-objective, Inverted simplex-like, Linear, Multimodal |
| ZDT3 | 2 | 30 | Disconnected, Mixed | DTLZ5(2,10) | 10 | 19 | Many-objective, Degenerate, Concave |
| DTLZ7 | 3 | 22 | Disconnected, Mixed, Multimodal | | | | |


Table 3: IGD results (mean and standard deviation) of the five algorithms.

| Property | Problem | MOEA/D | A-NSGA-III | RVEA | MOEA/D-AWA | AdaW |
|---|---|---|---|---|---|---|
| Simplex-like | DTLZ1 | 1.909E–02(3.1E–04)† | 2.463E–02(8.0E–03)† | 1.974E–02(2.2E–03) | 1.941E–02(6.1E–04) | 1.944E–02(3.1E–04) |
| | DTLZ2 | 5.124E–02(4.6E–04) | 5.222E–02(1.4E–03)† | 5.020E–02(7.3E–05)† | 5.070E–02(3.8E–04)† | 5.126E–02(6.0E–04) |
| | CDTLZ2 | 4.388E–02(1.0E–04)† | 8.766E–02(2.8E–02)† | 4.198E–02(1.4E–03)† | 3.879E–02(3.2E–03)† | 2.852E–02(5.9E–04) |
| Inverted simplex-like | IDTLZ1 | 3.175E–02(7.9E–04)† | 2.091E–02(1.5E–03)† | 6.404E–02(4.6E–02)† | 2.698E–02(6.2E–04)† | 1.961E–02(4.8E–04) |
| | IDTLZ2 | 9.010E–02(1.5E–04)† | 7.200E–02(6.7E–03)† | 7.736E–02(1.7E–03)† | 7.166E–02(5.2E–03)† | 5.037E–02(6.2E–04) |
| Highly nonlinear | SCH1 | 4.835E–02(1.7E–03)† | 5.411E–02(9.7E–03)† | 4.643E–02(4.1E–03)† | 2.604E–02(3.6E–03)† | 1.703E–02(1.5E–04) |
| | FON | 4.596E–03(1.6E–05)† | 5.333E–03(4.5E–04)† | 5.161E–03(1.8E–04)† | 4.739E–03(5.2E–05)† | 4.632E–03(8.3E–05) |
| Disconnect | ZDT3 | 1.107E–02(5.1E–04)† | 3.735E–02(4.1E–02)† | 9.128E–02(4.2E–02)† | 3.125E–02(5.1E–02)† | 4.840E–03(5.6E–04) |
| | DTLZ7 | 1.297E–01(1.1E–03)† | 7.079E–02(2.3E–03)† | 1.012E–01(4.6E–03)† | 1.318E–01(9.0E–02)† | 5.275E–02(6.0E–04) |
| Degenerate | DTLZ5 | 1.811E–02(1.0E–05)† | 9.759E–03(1.2E–03)† | 6.816E–02(5.3E–03)† | 9.584E–03(2.9E–04)† | 3.976E–03(2.4E–04) |
| | VNT2 | 4.651E–02(2.7E–04)† | 2.143E–02(3.2E–03)† | 3.492E–02(4.6E–03)† | 1.961E–02(7.4E–04)† | 1.155E–02(2.3E–04) |
| Scaled | SDTLZ1 | 5.584E+00(2.0E+00)† | 7.426E–01(4.2E–02)† | 1.522E+00(2.0E+00)† | 2.988E+00(5.0E–01)† | 6.571E–01(6.0E–02) |
| | SDTLZ2 | 6.071E+00(2.0E+00)† | 1.357E+00(4.7E–02)† | 1.295E+00(1.7E–02)† | 4.176E+00(5.7E–01)† | 1.244E+00(5.2E–02) |
| | SCH2 | 1.049E–01(2.6E–04)† | 5.109E–02(4.4E–02)† | 4.488E–02(3.6E–04)† | 5.538E–02(3.0E–03)† | 2.097E–02(3.1E–04) |
| Many objectives | DTLZ2-10 | 5.172E–01(1.4E–02) | 5.314E–01(6.2E–02)† | 4.924E–01(2.6E–05)† | 5.234E–01(3.1E–02) | 5.202E–01(1.4E–02) |
| | IDTLZ1-10 | 2.721E–01(7.7E–03)† | 1.507E–01(6.5E–03)† | 2.461E–01(9.0E–03)† | 2.421E–01(9.0E–03)† | 1.071E–01(3.3E–03) |
| | DTLZ5(2,10) | 1.708E–01(1.6E–03)† | 4.431E–01(1.0E–01)† | 1.520E–01(2.3E–02)† | 3.830E–02(1.3E–02)† | 2.150E–03(1.8E–05) |


“†” indicates that the result of the peer algorithm is significantly different from that of AdaW at the 0.05 level by Wilcoxon's rank-sum test.

Table 4: Hypervolume results (mean and standard deviation) of the five algorithms.

| Property | Problem | MOEA/D | A-NSGA-III | RVEA | MOEA/D-AWA | AdaW |
|---|---|---|---|---|---|---|
| Simplex-like | DTLZ1 | 9.738E-01(1.8E-04)† | 9.710E-01(3.5E-03)† | 9.733E-01(1.1E-03) | 9.734E-01(7.2E-04) | 9.735E-01(2.7E-04) |
| | DTLZ2 | 7.418E+00(1.2E-04)† | 7.412E+00(4.6E-03) | 7.418E+00(5.1E-04)† | 7.420E+00(1.1E-03)† | 7.412E+00(6.7E-03) |
| | CDTLZ2 | 7.947E+00(7.9E-05)† | 7.937E+00(7.0E-03)† | 7.944E+00(1.6E-03)† | 7.949E+00(4.5E-04)† | 7.952E+00(1.7E-04) |
| Inverted simplex-like | IDTLZ1 | 6.808E-01(1.4E-03)† | 6.646E-01(4.1E-03)† | 6.159E-01(5.7E-02)† | 6.844E-01(1.1E-03) | 6.839E-01(2.1E-03) |
| | IDTLZ2 | 6.557E+00(2.1E-03)† | 6.617E+00(3.4E-02)† | 6.615E+00(7.0E-03)† | 6.687E+00(1.4E-02)† | 6.728E+00(3.7E-03) |
| Highly nonlinear | SCH1 | 2.224E+01(2.7E-03)† | 2.223E+01(1.9E-02)† | 2.225E+01(1.4E-02)† | 2.227E+01(3.5E-03) | 2.227E+01(8.2E-04) |
| | FON | 3.062E+00(2.1E-04)† | 3.057E+00(7.3E-03)† | 3.060E+00(3.4E-04)† | 3.062E+00(8.4E-04) | 3.061E+00(3.2E-03) |
| Disconnect | ZDT3 | 4.808E+00(4.4E-03)† | 4.517E+00(3.2E-01)† | 4.299E+00(3.3E-01)† | 4.632E+00(3.3E-01)† | 4.812E+00(5.0E-03) |
| | DTLZ7 | 1.341E+01(1.4E-03)† | 1.328E+01(6.3E-02)† | 1.310E+01(5.5E-02)† | 1.303E+01(1.1E+00)† | 1.347E+01(2.7E-02) |
| Degenerate | DTLZ5 | 6.076E+00(1.8E-05)† | 6.056E+00(3.4E-02)† | 5.936E+00(2.4E-02)† | 6.091E+00(8.9E-04)† | 6.102E+00(6.0E-03) |
| | VNT2 | 1.878E+00(2.0E-04)† | 1.905E+00(3.0E-03)† | 1.882E+00(5.9E-03)† | 1.912E+00(6.7E-04)† | 1.916E+00(2.8E-04) |
| Scaled | SDTLZ1 | 1.084E+02(8.2E+00)† | 1.392E+02(6.5E-01) | 1.283E+02(2.4E+01)† | 1.216E+02(3.0E+00)† | 1.393E+02(1.1E+00) |
| | SDTLZ2 | 5.736E+02(1.6E+01)† | 7.400E+02(1.8E+00)† | 7.437E+02(2.6E+00)† | 6.353E+02(2.6E+01)† | 7.483E+02(1.2E+00) |
| | SCH2 | 3.794E+01(1.1E-03)† | 3.815E+01(1.8E-01)† | 3.814E+01(6.9E-03)† | 3.813E+01(1.1E-02)† | 3.825E+01(4.2E-03) |
| Many objectives | DTLZ2-10 | 1.024E+03(6.5E-02) | 1.023E+03(8.9E-01)† | 1.024E+03(1.2E-02)† | 1.023E+03(7.4E-01)† | 1.024E+03(2.1E-02) |
| | IDTLZ1-10 | 6.351E-03(1.8E-04)† | 1.159E-02(7.9E-04)† | 4.505E-03(3.6E-04)† | 9.189E-03(5.6E-04)† | 2.631E-02(1.2E-03) |
| | DTLZ5(2,10) | 6.365E+02(5.9E+00)† | 5.444E+02(7.8E+01)† | 6.362E+02(5.9E+01)† | 6.800E+02(1.1E+01)† | 7.067E+02(4.6E-01) |


“†” indicates that the result of the peer algorithm is significantly different from that of AdaW at the 0.05 level by Wilcoxon's rank-sum test.

### 4.1 On Simplex-Like Pareto Fronts

On MOPs with a simplex-like Pareto front, decomposition-based algorithms are expected to perform well. Figures 3–5 plot the final solution sets of the five algorithms on DTLZ1, DTLZ2, and CDTLZ2, respectively. As can be seen, MOEA/D, RVEA, MOEA/D-AWA, and AdaW all obtain a well-distributed solution set, although the set of AdaW is not as "regular" as those of the other three algorithms. An interesting observation is that A-NSGA-III (which adapts the weights in NSGA-III) appears to struggle to maintain the uniformity of the solutions, especially on DTLZ1 and CDTLZ2. Since NSGA-III has been demonstrated to work very well on these three MOPs (Deb and Jain, 2014), this indicates that adapting the weights may compromise the performance of the decomposition-based approach on simplex-like Pareto fronts. In addition, it is worth mentioning that on the convex CDTLZ2 there is a gap between the outer and inner solutions in the solution sets of MOEA/D, RVEA, and MOEA/D-AWA. In contrast, the proposed AdaW leaves no such gap, thereby returning better IGD and hypervolume results, as shown in Tables 3 and 4. Another note concerns the preservation of the extreme solutions (e.g., $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$ for DTLZ2) in AdaW. Preserving the extreme solutions is not a trivial task in a weight adaptation process, as their corresponding extreme weights can easily be discarded, particularly after the normalisation of the weights (Ishibuchi, Doi et al., 2017). Interestingly, as shown in the three figures, AdaW does well in preserving the extreme solutions. This is because the extreme weights tend to be located in relatively sparse regions (few weights lie around them), and thus they are unlikely to be eliminated during the weight deletion process. That said, their preservation is not guaranteed; they can be lost in some situations.

### 4.2 On Inverted Simplex-Like Pareto Fronts

The proposed AdaW shows a clear advantage over its competitors on this group. Figures 6 and 7 plot the final solution sets of the five algorithms on IDTLZ1 and IDTLZ2, respectively. As shown, many solutions of MOEA/D and MOEA/D-AWA concentrate on the boundary of the Pareto front. The solutions of A-NSGA-III have good coverage but are not distributed very uniformly, while the solutions of RVEA are distributed uniformly but are noticeably fewer than the population size. For AdaW, an inverted simplex-like Pareto front has no effect on the algorithm's performance, and the obtained solution set has good coverage and uniformity over the whole front. An interesting observation, however, is that on the hypervolume results for IDTLZ1 in Table 4, MOEA/D-AWA is preferred to AdaW. This is because the optimal distribution of solutions for hypervolume maximisation may not be even, as shown in Ishibuchi et al. (2018a) and Li, Yang et al. (2015).

### 4.3 On Highly Nonlinear Pareto Fronts

The algorithms perform differently on the two instances of this group. On the problem with a concave Pareto front (i.e., FON), all the algorithms work well (see Figure 9), although A-NSGA-III and RVEA perform slightly worse than the other three. In contrast, on the problem with a convex Pareto front (i.e., SCH1), only the proposed AdaW obtains a well-distributed solution set; the others fail to extend their solutions to the boundary of the Pareto front (see Figure 8). This indicates that a convex Pareto front still poses a challenge to the decomposition-based approach even when weight adaptation is introduced.

### 4.4 On Disconnected Pareto Fronts

Figures 10 and 11 plot the final solution sets of the five algorithms on ZDT3 and DTLZ7, respectively. On ZDT3, only AdaW and A-NSGA-III maintain a good distribution of the solution set. MOEA/D and MOEA/D-AWA show a similar pattern, with their solutions distributed sparsely on the upper-left part of the Pareto front. The set obtained by RVEA contains many dominated solutions. On DTLZ7, only the proposed algorithm works well; the peer algorithms either fail to lead their solutions to cover the Pareto front (MOEA/D and MOEA/D-AWA), struggle to maintain uniformity (A-NSGA-III), or produce some dominated solutions (RVEA).

### 4.5 On Degenerate Pareto Fronts

Problems with a degenerate Pareto front pose a big challenge to decomposition-based approaches because the ideal weight set is located in a lower-dimensional manifold than its initial setting (Li et al., 2018). On this group of problems, the proposed algorithm has shown a significant advantage over its competitors (see Figures 12 and 13). It is worth noting that VNT2 has a mixed Pareto front, with both ends degenerating into two curves and the middle part being a triangle-like plane. As can be seen from Figure 13, the solution set of AdaW has a good distribution over the whole Pareto front.

### 4.6 On Scaled Pareto Fronts

Figures 14–16 plot the final solution sets of the five algorithms on SDTLZ1, SDTLZ2, and SCH2, respectively. For the first two problems, AdaW, A-NSGA-III, and RVEA work fairly well, although the solutions obtained by RVEA on SDTLZ1 are not as uniform as those of the other two algorithms. For SCH2, which also has a disconnected Pareto front, AdaW significantly outperforms its competitors, with its solution set uniformly distributed over the two parts of the Pareto front. In fact, all the competitors except the original MOEA/D use a normalisation operation in their calculations. However, as pointed out in Ishibuchi, Doi et al. (2017), normalisation in decomposition-based algorithms may degrade the diversity of solutions in the population, so normalisation alone may not work on all scaled problems. Interestingly, our algorithm performs well on all the scaled problems. One probable explanation is that AdaW considers not only the normalisation of the current population but also that of the archive, which stores a set of well-distributed nondominated solutions, and then uses the archive to guide the weight update (via a comparison between the population and the archive). This could avoid the diversity loss of solutions (and their associated weights) during the normalisation process.
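The normalisation idea described above can be sketched as follows (our illustration of the idea, not the authors' code): the ideal and nadir points are estimated from the population and the archive together, so that a well-spread archive keeps the estimated objective ranges from collapsing.

```python
import numpy as np

def normalise_objectives(population_objs, archive_objs, eps=1e-12):
    """Normalise the population's objective vectors to [0, 1] using the
    ideal and nadir points estimated from population and archive jointly
    (a sketch of the idea described in the text)."""
    combined = np.vstack([population_objs, archive_objs])
    ideal = combined.min(axis=0)
    nadir = combined.max(axis=0)
    scale = np.maximum(nadir - ideal, eps)  # guard against a zero range
    return (np.asarray(population_objs, dtype=float) - ideal) / scale

# Scaled problem: f2 spans a range 10x wider than f1 (cf. SDTLZ1).
pop = np.array([[0.2, 8.0], [0.4, 4.0]])
arc = np.array([[0.0, 10.0], [0.5, 0.0]])
print(normalise_objectives(pop, arc))  # rows [0.4, 0.8] and [0.8, 0.4]
```

Because the archive extends the observed ranges (here to $[0, 0.5]$ and $[0, 10]$), the rescaled objectives become commensurable before weights are compared or updated.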

### 4.7 On Many-Objective Problems

This section evaluates the performance of the proposed AdaW on many-objective problems by considering three instances, the 10-objective DTLZ2, 10-objective IDTLZ1, and DTLZ5(2,10) where the number of objectives is 10 and the true Pareto front's dimensionality is 2.

For the 10-objective DTLZ2, which has a simplex-like Pareto front, all five algorithms appear to work well (see Figure 17), although several solutions of AdaW have not fully converged to the Pareto front. While parallel-coordinates plots may not allow firm conclusions about the distribution differences among the algorithms (Li et al., 2017), all the algorithms perform similarly according to the IGD and hypervolume results in Tables 3 and 4.

For the many-objective problems whose Pareto front is far from the standard simplex, a clear advantage of AdaW over its competitors can be seen (see Figures 18 and 19). The peer algorithms either fail to cover the whole Pareto front (MOEA/D, A-NSGA-III, and MOEA/D-AWA on the 10-objective IDTLZ1, and MOEA/D and MOEA/D-AWA on DTLZ5(2,10)) or struggle to converge to the front (RVEA on the 10-objective IDTLZ1, and A-NSGA-III and RVEA on DTLZ5(2,10)). In contrast, the proposed AdaW demonstrates its ability to deal with irregular Pareto fronts in high-dimensional spaces, obtaining a spread of solutions over the whole Pareto front.

### 4.8 Discussions

Methods involving weight update need a parameter to control the update frequency, except those that change the weights every generation, such as A-NSGA-III (Jain and Deb, 2014). However, a frequent weight change may lead to solutions wandering around the search space (Giagkiozis et al., 2013b). As in MOEA/D-AWA (Qi et al., 2014), we used a percentage ($5\%$) of the maximum generations/evaluations as the update interval, and the algorithm does not allow the weights to change in the last $10\%$ of the generations/evaluations. In situations where the maximum generation number is not applicable (i.e., the total number of generations is not the termination condition of the algorithm, as in some real-world scenarios), we recommend updating the weights every $5\times m$ generations, where $m$ denotes the number of objectives.
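Assuming generations are the termination criterion, the schedule above can be sketched as follows (the rounding of the interval is our assumption; the paper only states the percentages):

```python
def should_update_weights(gen, max_gen, rate=0.05, freeze=0.10):
    """Weight-update schedule: update every `rate` (5%) of the total
    generations, but never within the last `freeze` (10%) of the run."""
    interval = max(1, round(rate * max_gen))
    return gen % interval == 0 and gen <= (1 - freeze) * max_gen

# With 200 generations: updates at 10, 20, ..., 180, but not at 190.
assert should_update_weights(20, 200)
assert not should_update_weights(25, 200)     # between update points
assert not should_update_weights(190, 200)    # inside the frozen tail
```

Freezing the weights near the end gives the subproblems a stable final phase in which to converge.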

The test problems considered in our experimental studies are all unconstrained. However, constrained MOPs are widely seen in real-world scenarios. Several constraint-handling techniques have been used in decomposition-based EMO, such as the feasibility-first scheme (Fonseca and Fleming, 1998) used in NSGA-III (Deb and Jain, 2014) and RVEA (Cheng et al., 2016), and the epsilon level comparison (Asafuddoula et al., 2012) used in DBEA (Asafuddoula et al., 2015) and g-DBEA (Asafuddoula et al., 2018). The proposed AdaW can easily incorporate these constraint-handling techniques by slightly modifying the selection operation of the algorithm.

Finally, it is worth pointing out that because AdaW adopts a Pareto-nondominated archive set for weight generation, the algorithm may struggle if the archive cannot represent the whole Pareto front well. The problems proposed in Liu et al. (2014) challenge EMO algorithms in exactly this way: their nondominated solutions lie in a very small area of the search space, and once the extremities of the Pareto front are reached, all other solutions become dominated. This makes it very difficult to obtain the central part of the front. This difficulty applies to all algorithms that use Pareto dominance as the primary selection criterion (e.g., Pareto-based algorithms) or a criterion providing even higher selection pressure (e.g., most indicator-based and decomposition-based algorithms). We leave addressing this issue as an important topic for future study.

## 5 Conclusions

Adaptation of the weights during the optimisation process provides a viable approach to enhance existing decomposition-based EMO. This article proposed an adaptation method to periodically update the weights by contrasting the current evolutionary population with a well-maintained archive set. From experimental studies on seven categories of problems with various properties, the proposed algorithm has shown its high performance over a wide variety of different Pareto fronts.

However, it is worth noting that the proposed algorithm needs more computational resources than the basic MOEA/D. The time complexity of AdaW is bounded by $O(mN^2)$ or $O(TN^2)$, whichever is greater (where $m$ is the number of objectives and $T$ is the neighbourhood size), in contrast to $O(mTN)$ for MOEA/D. In addition, AdaW introduces several parameters, such as the maximum capacity of the archive and the frequency of updating the weights. Although these parameters were fixed across all test problems in our study, customised settings for specific problems may lead to better performance. For example, a longer period of evolving the population under fixed weights can be expected to achieve better convergence on problems with many objectives. Another potential improvement concerns the weight deletion operation, where one may consider deleting the weight that has the largest angle to its solution (instead of the one with the worst scalarising function value), as ideally solutions should lie exactly on the search directions determined by the weights.

## Acknowledgments

The authors would like to thank Dr. Liangli Zhen for his help in the experimental study. This work has been supported by the Science and Technology Innovation Committee Foundation of Shenzhen (ZDSYS201703031748284), Shenzhen Peacock Plan (KQTD2016112514355531), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), and EPSRC (EP/J017515/1 and EP/P005578/1).

## Notes

^{1}

The definition of neighbouring weights is based on that in MOEA/D (i.e., the Euclidean distance between weights).

^{2}

The reference point in decomposition-based algorithms is often set equal to, or slightly smaller than, the best value found so far (Wang, Zhang, Zhou et al., 2016; Qi et al., 2014); here we set it to $10^{-4}$ smaller than the best value found, following the suggestion in Wang, Xiong et al. (2017). This setting is also adopted in the calculation of the scalarising function.

^{3}

The codes of all the peer algorithms were from http://bimk.ahu.edu.cn/index.php?s=/Index/Software/index.html (Tian, Cheng, Zhang, and Jin, 2017).