## Abstract

In multiobjective optimization, set-based performance indicators are commonly used to assess the quality of a Pareto front approximation. Based on the scalarization obtained by these indicators, a performance comparison of multiobjective optimization algorithms becomes possible. The and the hypervolume (HV) indicator represent two recommended approaches which have shown a correlated behavior in recent empirical studies. Whereas the HV indicator has been comprehensively analyzed in the last years, almost no studies on the indicator exist. In this extended version of our previous conference paper, we thus perform a comprehensive investigation of the properties of the indicator in a theoretical and empirical way. The influence of the number and distribution of the weight vectors on the optimal distribution of solutions is analyzed. Based on a comparative analysis, specific characteristics and differences of the and HV indicator are presented. Furthermore, the indicator is integrated into an indicator-based steady-state evolutionary multiobjective optimization algorithm (EMOA). It is shown that the so-called -EMOA can accurately approximate the optimal distribution of solutions regarding .

## 1 Introduction

Multiobjective optimization comprises the optimization of vectors of *m* objectives. We assume minimization tasks in the following, as maximization problems can easily be transferred to those by negation. A solution in the objective space is said to dominate a solution (), iff is better than or equal to in all objectives () and better than in at least one (). Only the first condition is necessary for to weakly dominate (). The aim of multiobjective optimization is to detect the Pareto optimal front .

Evolutionary multiobjective algorithms (EMOAs) are a specific class of solvers for multiobjective problems. They basically aim at approximating the true Pareto front of the problem at hand by minimizing the distance to the true Pareto front (convergence) and simultaneously covering all its parts (spread) in a well-defined way (distribution; Zitzler, 1999). Several performance indicators were introduced (Zitzler et al., 2008, 2003) to assess one or all of these quality aspects of Pareto front approximations.

The hypervolume (HV; Zitzler and Thiele, 1998) and the indicator (Hansen and Jaszkiewicz, 1998) are two recommended approaches (Knowles et al., 2006) which simultaneously evaluate all desired aspects of a Pareto front approximation. The HV fulfills the property of strict monotonicity, that is, the indicator value of a set *A* that dominates the set *B* has to be higher than the indicator value for the set *B*, assuming that the indicator is to be maximized. In contrast, the indicator is only weakly monotonic, that is, in the case where *A* weakly dominates *B*. In practice, this typically makes no difference for problems with continuous variables as it is unlikely that two solutions with the exact same objective values exist. Hence, the indicator has been considered as an alternative for the HV for two reasons. On the one hand, the runtime of the HV is expected to increase exponentially with respect to the number of objectives *m* as its computation is known to be NP-hard (Bringmann and Friedrich, 2008). The best known algorithms to exactly compute the hypervolume indicator are all exponential in the number of objectives *m*, although there was some important recent progress: an time algorithm (Bringmann, 2012) was beaten for 6, 5, and 4 objective functions by an bound (Yildiz and Suri, 2012) and, most recently, a proof of an upper bound was given by Chan (2013) with *n* being the number of solutions. On the other hand, the distributions obtained using the HV are biased toward the knee regions of the Pareto front (Auger et al., 2009b). Note that in practice, the first argument against the hypervolume indicator is nowadays less convincing as, for example, in the 2- and 3-objective case algorithms running in time are known (Beume et al., 2009) and the hypervolume indicator (and single solutions’ contributions to it) can be efficiently approximated by Monte Carlo sampling (Bader and Zitzler, 2011; Bringmann and Friedrich, 2009). Despite this point, we believe that there is strong interest in an in-depth investigation of the indicator and its relation to the HV.

As the monotonicity of both the HV and the indicator is guaranteed (Zitzler et al., 2003), similar behavior of the indicators can be expected. Ishibuchi et al. even proposed a scalarization-based approach equal to the indicator for approximating the hypervolume of an approximation set (Ishibuchi et al., 2009, 2010; Tsukamoto et al., 2011). However, as the preferred distributions seem to be different, the degree of similarity might vary in different settings. In Wessing (2009) and Wessing and Naujoks (2010), optimal parameterizations of the multiobjective optimizer SMS-EMOA (Beume et al., 2007) were investigated. Correlation structures between the evaluations of Pareto fronts based on different quality indicators were analyzed in order to select the most suitable indicator for the used sequential parameter optimization (SPO) approach. It was shown that the HV and indicator are highly correlated on a huge number of randomly generated populations in different dimensions. Pearson’s correlation coefficient (Snedecor and Cochran, 1989) takes a statistically significant value of . However, the indicators are different enough that they led to statistically distinguishable results when used individually in the SPO framework.

In this paper, which is a revised and extended version of a former conference paper (Brockhoff et al., 2012), we aim at a more detailed understanding of the properties of the indicator. We perform an analysis about *how* the indicator differs from the HV indicator for which several theoretical properties are already known (Auger et al., 2009b, 2012). More specifically, we show that in the bi-objective case the indicator tends to place the points more concentrated in the central region of the Pareto front than the HV indicator and that the optimal placement of a point according to the indicator only depends on its two neighbors and a subset of the weight vectors. Furthermore, the finite sets of solutions maximizing the indicator among all feasible sets of solutions—the so-called *optimal -distributions of the indicator*—are approximated for standard test functions with different Pareto front shapes. We show that those have different characteristics compared to the optimal -distributions of the HV indicator (Auger et al., 2009b). They turn out to be not unique in general and to depend on the number and distribution of the weight vectors used. In addition to the results presented in Brockhoff et al. (2012), this paper also considers test problems with disconnected Pareto fronts within the experiments. The effect of the distribution of the weight vectors is analyzed by alternatively investigating a uniform coverage of the angle space in addition to the standard uniform coverage of the weight space. The influence of the weight vector distribution and the placement of the ideal point is investigated in detail in Wagner et al. (2013), where it is shown that the optimal -distributions of the indicator heavily depend on the weight vector distribution, the domain of the weight space, and the position of the ideal point.

Furthermore, we integrated the indicator into the general framework of indicator-based EMOAs. The contribution to the indicator is used as a secondary selection criterion within a respective steady state version, denoted as -EMOA in the following. In particular, we investigate whether the approximated optimal -distributions of the indicator can be accurately reproduced by the greedy -EMOA. Though preliminary experiments with the -EMOA have been already presented in Trautmann et al. (2013) on a few test functions, we here analyze in more detail whether the -EMOA is able to converge to the optimal -distributions for the indicator in particular on problems with disconnected Pareto fronts. The results are also compared with the standard SMS-EMOA of Beume et al. (2007).

To be precise, the paper was enhanced and improved w.r.t. Brockhoff et al. (2012) and Trautmann et al. (2013) regarding the following aspects: (a) additional investigations on test problems with disconnected Pareto fronts, (b) an increased accuracy of the approximation of the optimal -distributions w.r.t. the indicator, (c) improvements of the comparison plots of weight and angle space, (d) a more detailed analysis whether the optimal -distributions can actually be sufficiently approximated by a specifically designed EMOA based on -indicator based selection (-EMOA), (e) a revised notation, and (f) a few technical corrections (see Appendix).

Details of the indicators are given in Section 2. Properties of the indicator are discussed in Section 3, followed by an analysis and comparison of the optimal -distributions for both, the and the HV indicator, in Section 4. The -EMOA and related experiments are presented in Section 5. Finally, conclusions are drawn and an outlook is given in Section 6.

### 1.1 Notation

In this paper, we focus on optimal distributions in the objective space. The parameter vectors in the decision space are therefore omitted. Objective vectors are referred to using lowercase letters from the beginning of the alphabet. In cases where the positions on the bi-objective Pareto front are directly optimized, we refer to the respective coordinates of the first objective using for the points of the discrete approximation set. The corresponding values of the second objective with respect to the analytical equation of the Pareto front are denoted as .

## 2 *R*2 and HV Indicators

The *R* indicator family (Hansen and Jaszkiewicz, 1998) is based on utility functions which map a vector to a scalar utility value for assessing the relative quality of two Pareto front approximation sets.

Since the first summand is a constant if we assume *R* to be constant, we delete the first summand and call the resulting unary indicator also for simplicity.

Note that we assume *R* to be constant throughout the paper and will only refer to Definition 3 when we use the term indicator.

Different choices of the required utility function exist, for example, based on weighted sum and weighted Chebyshev functions or a combination of both. As suggested by Hansen and Jaszkiewicz (1998), we use the standard weighted Chebyshev function within the indicator in the following where is a given weight vector and is a utopian point.^{1}

*k*uniformly distributed weights in the space . Alternatively, a uniform distribution over the angle space can be considered for objectives. In this case, rays from the utopian point with different slopes for are constructed and then mapped to weight vectors by normalization.

^{2}A weight vector corresponding to a ray which directly crosses a point is denoted as .

In addition to an overall quality assessment of an approximation set *A*, the contribution of an individual solution to the value can be of interest.

The HV indicator (Zitzler and Thiele, 1998; Zitzler et al., 2008) describes the hypervolume of a bounded space dominated by an approximation set.

Analogous to Definition 5, the HV contribution of an individual solution reflects the influence of a single point on the quality of the approximation set.

As mentioned in Section 1, the HV indicator is a performance indicator which is strictly compliant with the Pareto dominance relation. Implementations and a comprehensive overview of the state of the art research w.r.t. the HV indicator are provided (for example, by Beume et al., 2012 and Bringmann, 2012).

## 3 Properties of the *R*2 Indicator

In order to interpret the values of a quality indicator during the performance assessment of multiobjective optimizers, it is crucial to understand the inherent preferences expressed by the choice of this indicator. To be more precise, it is fundamental information to know which solution sets of size achieve the maximum indicator values among all possible sets of a given size (Auger et al., 2009b). Only with this knowledge is it then possible to interpret the resulting indicator values of different multiobjective optimizer outcomes absolutely instead of only relatively as the *achieved* indicator values can be compared with the *maximum achievable* values. For the optimum sets of size , the term *optimal -distribution* was introduced and such sets have already been characterized theoretically for the standard, as well as for the weighted, HV indicator (Auger et al., 2009b, 2012). Before we investigate the concrete optimal -distributions on specific test functions, additionally for the indicator in Section 4, we first prove some general theoretical statements about the indicator.

### 3.1 Locality of the *R*2 Indicator Contribution

At first, we prove that for bi-objective problems the optimal placement of a point according to the indicator only depends on its two neighbors and only on a subset of the weight vectors. Note that for the HV indicator, such a locality property has also been proven (Auger et al., 2009b). Before formalizing the general locality property of the indicator, let us prove two preliminary statements.

Given a specific weight vector and a specific solution where *A* only contains nondominated solutions. Then all points to the left of (to the right of ) are resulting in a worse Chebyshev utility than if (if ).

*f*

_{1}value that is smaller than the one of () and their

*f*

_{2}value is larger than the one of (). These points lie in the filled box in the left plot of Figure 1—otherwise, would not be nondominated with respect to and lies to the left. In the case that , this means that for all other points and thus for the Chebyshev utility function, it holds that , that is, is better since we minimize the maximum weighted component. The proof of the other case follows a symmetric argument and is visualized in the right plot of Figure 1.

As a special case, we can investigate the specific weight vector that corresponds to a ray starting in the utopian point and going through (cf. Sec. 2).

For each solution , there exists a specific weight with and such that all other solutions which do not dominate the point have a worse weighted Chebyshev value compared to .

The proof follows the same ideas as the previous one. For and , it follows that . But then, for all solutions that do not dominate and are left of , and holds, such that which is equivalent to the fact that the in the weighted Chebyshev for is also larger and thus worse than the weighted Chebyshev function for . With a symmetric argument, we can prove that all solutions that do not dominate and are to the right of it are also worse.

As a result of the above lemmas, it follows that the optimal placement of a point according to the indicator, given that all other points are already placed optimally, only depends on its neighbor(s) as well as on a subset of weight vectors.

The optimal placement of a point with that maximizes the indicator, given that all other points are already known, only depends on and and the weight vectors targeting the areas of the Pareto front between these two points. Likewise, the optimal placements of *x*_{1} and only depend on their single neighbor.

According to the above lemmas, for all weight vectors to the left of , is better than *x _{i}* and, analogously, for all weight vectors to the right of , is better than

*x*. Consequently, the optimal placement of

_{i}*x*only depends on these vectors.

_{i}### 3.2 Optimal -Distributions for the *R*2 Indicator

With the above lemmas, we can now prove a few general statements about the optimal distribution of points with respect to the indicator. In particular for the special case of more points than weight vectors (), the optimal -distributions turn out to contain the intersection points of the rays corresponding to the weight vectors with the Pareto front.

In the case that , optimal -distributions for the indicator with given weight vectors contain all intersection points between the rays defined by the weights and the actual Pareto front.

As we have seen in Lemma 2, no nondominated solution is better with respect to a given weight vector than the solution actually lying on the corresponding ray. Together with the weak monotonicity of the indicator, this means that for a given weight vector, no other (feasible) solution can have a better indicator contribution for this weight than the intersection between the corresponding ray and the Pareto front. Assuming , the optimal -distribution consequently will include all those intersection points.

Note here that the optimal -distribution is unique in the case and given that all rays have to intersect with the Pareto front. In the case where is larger than the number *N* of intersection points between the rays and the Pareto front, the indicator has no influence on points. For the opposite case, we can show that the optimal -distributions in the case are also not always unique.

In some cases where , the optimal -distribution is not unique, that is, more than one optimal -distribution might exist.

*x*value of the third point while the indicator contributions with respect to and are fixed to their optimal value by the other two points lying on the extremes. With simple calculus, we obtain for the indicator value depending on the

*x*value of the third point with . The previous equation can be rewritten as with the minimum value of reached for , see also the right-hand plot of Figure 2. Hence, the optimal 3-distribution is not unique in this case and not even a finite number of distinct optimal 3-distributions exist, but infinitely many solution sets are optimal.

Let us briefly comment on the leftmost and rightmost solutions and the above assumption to place them on the boundaries of the Pareto front. Due to symmetry reasons, we only consider the placement of the leftmost point. With in the optimal interval , we know that the placement of the leftmost point is only influenced by and . Hence, we can compute the contribution of which results, again with simple calculus, in and which is minimized for .

Note that in the previous example, both the Pareto front and the weight vectors are symmetric with respect to the angle bisection of the first coordinate system quadrant and it is furthermore *not* proven that, in general, more than one optimal -distribution can occur for any choice of . Even more fundamentally, the question is open under which assumptions at least one optimal -distribution *exists*—for any choice of and . One simple example for the nonexistence of optimal -distributions is given below.

*EXAMPLE 1:*

*f*. The other weight vector shall define a weighted Chebyshev problem, the direction of which goes through the discontinuity, see Figure 3 for an illustration. Then, no optimal -distribution for the indicator exists: Placing points at the intersections of the weighted Chebyshev problems with the Pareto front results in the minimal possible term for of the summands in the indicator of Definition 4—leaving room for only one point to be placed optimally. In order to also minimize the last summand, the last point can approach the discontinuity of

*f*from the right, resulting in smaller and smaller indicator values. But the minimal value cannot be achieved because of the upper semi-continuity of

*f*. Approaching the discontinuity from the other side will always result in even larger indicator values.

## 4 Approximations of Optimal -Distributions for the *R*2 Indicator

The above theoretical results already gave some insights into the optimal distribution of points regarding the indicator. In this section, these insights are enhanced with empirical observations with respect to the optimal -distributions of this indicator and an analysis of the difference to the optimal -distributions of the HV indicator. To accomplish this, we consider the Pareto fronts of the three established test problems ZDT1 (Zitzler et al., 2000), DTLZ1, and DTLZ2 (Deb et al., 2002) because these fronts feature different shapes—convex, linear (as a special case of convexity), and concave. Their exact Pareto front definitions are (ZDT1), (DTLZ1), and (DTLZ2). On each of these test problems, we approximated the optimal -distributions for and the true ideal point as utopian using the CMA-ES (Hansen and Ostermeier, 2001) on the 10 respective *x* values. Other reference points were not considered as the effect of moving has been discussed in a recent paper (Wagner et al., 2013). Other points than the true ideal point would either result in a focus on particular areas of the front or in a reduction of intersection points between target vectors and the front which is not desired in this experiment.

To also analyze the effect of the number of weight vectors and to approximate the integral in Definition 1 which resembles an infinite number of weight vectors, the different multiples were considered. The same holds for different weight distributions by using uniform distributions in weight and angle space. On each combination of test problem, number of weight vectors, and weight distribution, 10 independent runs of the CMA-ES are performed in order to have a more accurate estimate of the optimum distribution and to also analyze the variability in the results.

For a smart initialization, we used the theoretical result of Theorem 1. Based on the given distribution, weight vectors were generated and the corresponding intersections between the rays and the Pareto front were computed using standard calculus. The intersections’ *x* coordinates were then used as initial mean vector for the CMA-ES runs.

In addition, two test problems with disconnected Pareto fronts were considered, ZDT3 (Zitzler et al., 2000) and DTLZ7 (Deb et al., 2002). The respective experimental setup in principle coincides with the previously described experiments. The ideal point for ZDT3 is set to , and the levels of considered weight vector numbers are extended by as the number of rays intersecting with the Pareto front is smaller than due to the gaps between the disconnected parts of the Pareto front.

Due to the disconnected Pareto fronts, the search for the optimal -distributions is restricted to several disconnected intervals, which imposes additional difficulties on the optimization process. In order to circumvent this situation, the intervals are artificially connected within the optimization process. The sum of the individual interval lengths is taken and each interval represents its proportion of in the connected space. The CMA-ES operates on the so-transformed decision space, that is, . Within the fitness function calculation, the transformed solutions are mapped back to the original intervals, as is of course the case for the final solutions of the CMA-ES optimization. The initial solutions are determined analogously on the transformed space. Due to the complex shape of the fronts and the additional transformation, an analytical solution is no longer possible. Therefore, binary search is used to calculate the intersections of the weight vectors with the objective space boundary.

### 4.1 Effect of the Number of Weight Vectors

The results of the different runs of the CMA-ES are shown in Figures 4 and 5. The figures on the left result from a uniform distribution in weight space while the respective ones on the right are based on uniformly distributing weight vectors in angle space. The positions of the initial solutions, that is, the optimum distribution for , is highlighted using black vertical lines with the exception of ZDT3 and DTLZ7. As in this case the initial solution sets do not represent the optimal 10-distribution regarding for 10 weight vectors, no vertical lines are added in the corresponding plots. The final positions after each run of the CMA-ES are depicted using gray dots. The best result for each setting is printed in black.

As a general observation, it can be stated that for the test problems with continuous Pareto fronts the variance of the results is higher for a lower number of weight vectors. In particular for , it can hardly be distinguished between the result distributions of the different positions. For , in contrast, the results of the CMA-ES hardly show any variance. It seems as if each intersection point is a local optimum position, making the optimization hard for fewer weight vectors and higher distances between these local optima. With an increasing number of weights, the local basins become so small that the CMA-ES can easily jump from one to another. In contrast, for ZDT3 and DTLZ7 the variability of the result distributions unfortunately does not decrease significantly for , although the variability of the resulting distributions is much higher for .

With respect to the approximated optimal positions, the increase of shows no significant effect for the convex Pareto front of ZDT1 and—given a uniform distribution in angle space—for the linear front of DTLZ1. The final points all lie close to the black lines indicating the optimal position for . For uniform weights on DTLZ1, the points tend to become denser in the center of the Pareto front, as they all move toward the center from their initial position. Interestingly, for DTLZ2 it is shown that the optimal positions of the extreme points change with increasing *N*, that is, for more than 100 weight vectors the distribution of the points is highly biased toward the right part of the front. No points but the extreme point at are placed within the interval . This may be caused by the possibility to strongly improve *f*_{2} without deteriorating *f*_{1} too much (cf. Figure 8).

Due to the fact that for ZDT3 and DTLZ7, the optimization has to deal with discontinuities we continue to see a large variance within the 10 CMA-ES runs for increasing numbers of weight functions. Especially in the two leftmost front parts, we can see this behavior in Figure 5. When the CMA-ES has, for example, three solutions within the first front part, it seems to be good to place one of them in the middle of it while the other two should lie on the borders. However, in the more unlikely case (due to the initialization) that only two solutions map to the first front part, the indicator is better (black dots). What can also be observed from Figure 5 is that, apart from the case of where more solutions exist than weight vectors intersect with the front, the number of weight vectors seems to have much less effect on the optimal -distribution than for the problems with connected fronts.

In all problems, and even for scenarios in which the optimum distribution changes with increasing , the distribution stabilizes after a specific number of weight vectors is exceeded. In our bi-objective experiments, this threshold lies at about . For higher , the actual integral seems to be well approximated by the discretization by means of the sum over a finite set of weight vectors.

### 4.2 Comparing the Hypervolume and the *R*2 Indicators

Despite the empirically shown correlated behavior (Wessing, 2009; Wessing and Naujoks, 2010), the indicators show specific preferences regarding the optimal distribution of the solutions on the Pareto front. For this purpose, it is analyzed how the indicator would rearrange the points of the optimal -distribution w.r.t. the HV indicator and vice versa. In Figures 6 and 7, the results for on the considered test problems are shown. Based on the optimal points on the *x* axis (black dots), the contributions to the other indicator are shown on the *y* axis. The latter are derived by analyzing the situation where the least-contributing point is shifted to the respective *x*; the contributions along the shift direction are considered. Recall that the contributions of a solution to the specific indicators are provided in Definitions 5 and 7. By this means, structural differences of the two indicators in the optimal placement of solutions can be investigated. Moreover, the approach allows for the assessment of the idea of Ishibuchi and colleagues (Ishibuchi et al., 2010; and Tsukamoto et al., 2011) to approximate the HV by means of the R2 indicator.

For the optimal -distributions of the HV indicator, it becomes obvious that the contributions tend to shift the points to the center of the front in order to reach a denser distribution in this region. The opposite tendency can be seen for the contributions to the HV indicator starting from the optimal -distribution of the indicator. For the linear front of DTLZ1, for which the contributions are symmetric, the size of this shift decreases toward the center. This result is consistent for both kinds of weight distributions in the indicator. For the disconnected fronts, the contributions mainly indicate a replacement of solutions from one region to another. The shifts within a region are rather small.

*x*values of corresponding points, that is, , from left to right (second to last row of Figure 8). In order to have resulting symmetric plots if the front itself is symmetric and to be able to compare results between different problems, we also plot the differences between the distributions’ points in the coordinate system along the front instead of their projection on the

*f*

_{1}axis (last row in Figure 8). The arc length along the front between the point and a point is thereby computed as where describes the front shape and is its derivative. For the three test problems of Figure 8, we can give analytical expressions for

*l*: All values are normalized by the total length of the front.

_{f}What can be seen from Figure 8 is that in all cases, the optimal -distributions of the indicators have a tendency toward the middle of the front, that is, the distances between neighboring points compared to the HV indicator are larger at the extremes and smaller in the middle of the front. These distances turn out to be higher in angle space than in weight space. Sine-shaped curves of different values result in the bottom of Figure 8. It is caused by the higher influence of balanced weight vectors, that is, vectors having almost similar weight components for each objective.

To understand the increase of influence for these weight vectors, consider the linear front of DTLZ1, weight vectors, and a significantly lower number of points. For a uniform distribution in weight space, we know (cf. initial distributions of Figure 4) that the intersection points of the vectors with the linear front are also uniformly distributed. Due to the selection of the best solution for each weight vector of the indicator, each point covers a subset of weight vectors in its direct neighborhood. Thereby, the optimum point for the extreme weight vector has a value of . Even for the next weight vector , this point obtains a value of compared to the optimum value of . In contrast, the optimum point for the balanced weight vector has a value of . For the next weight vector , the corresponding value of this point is compared to the optimum value of . Whereas we lose at the extremes, we lose in the center—the loss is more than five times greater in this region. As a consequence, the density of points becomes higher to compensate for this effect.

Note also that the bottom row of Figure 8 suggests that the absolute differences of the optimal -distributions between a uniform distribution of the weights and a uniform distribution in angle space seem to be independent of the shape of the Pareto front. On the other hand, we can observe that, for both weight distributions, the differences to the optimal -distribution of the HV indicator become smaller “the more concave” the Pareto front is.

The points at the right border dominate all intersection points between the Pareto boundary and the target vectors in the gap region. Even for the points on the upper left border of the next front segment, they provide almost the same value of the Chebyshev scalarization with respect to the corresponding target vectors.

The curvature of each segment monotonically decreases from left to right.

The latter is particularly interesting, as a higher density of points with slopes close to one has already been shown for the HV (Auger et al., 2009b). Indeed, differences in the point distributions between the HV and the indicator with equally distributed weight vectors in angle and weight space only occur if single points switch from one interval to an adjacent one while the positions of the remaining points stay the same. For DTLZ7, even the two different kinds of weight vector distributions lead to a very similar optimization result. It can already be observed in Figure 5 that although there is variability in the optimization results even for large numbers of weight vectors, the best solution found stays almost the same for . Note that the coordinate system change to plot the differences along the front as in Figure 8 is more difficult for the problems with disconnected fronts. It might also not bring additional insights as none of the fronts is symmetric such that we therefore do not show a similar plot here.

The disconnected fronts also allow the effect of the relative position of the reference point to the Pareto front to be assessed. As the central three Pareto front segments of ZDT3 are of comparable length and shape, they can be considered as results of different experiments with changing relative positions. A significant effect of the relative position can be observed. A third point is put to the central segment. In turn, the outer segment is only covered by a single point. This reflects the higher density due to the balanced weight vectors already observed for the linear front.

## 5 The *R*2-EMOA: An EMOA with *R*2 Indicator-Based Selection

When investigating the optimal -distributions for the indicator, the immediate question arises whether they can be reached by an actual optimization algorithm. For the (weighted) HV indicator, Auger et al. (2009a) already showed that the algorithm W-HypE can approximate the optimal -distributions. Here, we will investigate how the basic indicator-based selection scheme that is used in many state of the art EMOAs, such as in the SMS-EMOA (Beume et al., 2007) or in the MO-CMA-ES (Igel et al., 2007), can be translated to the indicator. In Trautmann et al. (2013), we already presented preliminary experiments on this topic but with the slightly different focus on integrating preferences of a decision maker into the search algorithm and also only for three test functions with connected Pareto fronts. In the following, we will present the idea behind the indicator-based selection in the -EMOA, discuss some implementation details, and show an experimental comparison between the -EMOA, the SMS-EMOA, and the optimal -distributions for the indicator on several well-known test problems with connected and disconnected Pareto fronts.

### 5.1 *R*2 Indicator-Based Selection

Algorithm 1 shows the pseudocode of the -EMOA with a simple indicator-based steady state selection. After a random initialization of the population and the generation of one new offspring per iteration, the -EMOA uses standard non-dominated sorting and deletes the solution with the worst rank that has the smallest indicator loss, or, in other words, the deletion of which results in the smallest indicator value of the remaining population. The utopian point and the set of weight vectors are direct parameters of the algorithm which makes it possible to integrate various preferences of a decision maker into the search as we have shown in a preliminary study (Trautmann et al., 2013). Here, we present results of a MATLAB implementation which is further optimized for speed and therefore runs similarly fast as the MATLAB implementation of the SMS-EMOA (Beume et al., 2007) for the tested instances and parameter settings.

### 5.2 Implementation Details of the *R*2-EMOA

The above described -EMOA has been implemented in MATLAB on the basis of the SMS-EMOA implementation of Kretschmar and Wagner, available via Beume et al. (2012) which also served as a reference algorithm.^{3} To optimize for speed, the weighted Chebyshev value for each solution and weight vector is only calculated once and stored throughout the algorithm. Furthermore, instead of calculating the value for all the solution sets , as in the pseudocode of Algorithm 1, we determine the worst solution within the set *R _{h}* directly. For each solution , we thereby calculate the sum of differences in Chebyshev values between solution and the second best solution for each weight vector over all the weight vectors, for which itself results in the best Chebyshev function value, see also Definition 5. The overall worst solution in

*R*is then the one with the smallest sum.

_{h}For the following experiments, we use a population size of in order to be able to compare the -EMOA results with the approximated optimal -distributions of the previous experiments. As variation operators, polynomial mutation and SBX crossover are used with the standard parameter setting from Beume et al. (2012); variable crossover probability: 0.9, distribution index for crossover: 15, variable mutation probability: 1/(number of variables), distribution index for mutation: 20, variable swap probability: 0.5). Online convergence detection (Wagner and Trautmann, 2010; Wagner et al., 2011) is turned off and we run each algorithm 10 times with independently chosen random seeds. Each run is stopped after 150,000 function evaluations. According to whether the weights are chosen uniformly in weight space or in angle space, we distinguish between the two algorithms -EMOA and -EMOA. In all cases, the number of weights is 500. As test problems, we again use the above DTLZ1, DTLZ2, DTLZ7, ZDT1, and ZDT3 functions with 6, 11, 21, 30, and 30 decision variables as suggested in Zitzler et al. (2000), Deb et al. (2002), and Deb et al. (2001; for DTLZ7). For the DTLZ7 problem, and different from the original publication, we restrict the variables to the interval instead of the interval in order to have a Pareto front with four disconnected front parts as in the results on the optimal -distributions above. The initial population is drawn by sampling within the feasible search space uniformly at random. The ideal point is except for the ZDT3 problem where it is chosen as . The SMS-EMOA chooses the reference point of the HV indicator adaptively as in Beume et al. (2007).

### 5.3 Comparison Baseline: Approximations of Optimal -Distributions

In order to compare the two R2-EMOA variants with the SMS-EMOA, we generated the best known approximation of the optimal -distribution for each test problem and 500 weight vectors in the following way. First, 10 independent CMA-ES runs with maximally three restarts each were started similar to the ones in Section 4. Second, a simple local search in terms of MATLAB’s fmincon was run for each test problem. And last, the CMA-ES approximations as well as the final R2-EMOA populations after 150,000 function evaluations were further refined by running the same fmincon local search with the CMA-ES and R2-EMOA outputs as starting solutions. The best of all those approximations of the optimal -distribution was taken as a baseline in the following experimental comparison.

### 5.4 Experimental Comparison

When looking at the differences between the indicator values obtained by the optimization algorithms and the indicator value of the best approximation of the optimal -distribution for in Figure 10, we can make two main observations.

First, the algorithm optimizing the indicator also used for the performance assessment (with weights uniformly distributed in either weight or angle space) results most of the time in the smallest differences to the approximation of the optimal one. This is actually expected as the algorithm optimizes the same indicator that is used for the comparison. The only exceptions are the runs for the discontinuous problems ZDT3 and DTLZ7 and uniform weights in angle space. Here, several algorithms also show a large variance, both in the indicator differences (Figure 10) and in objective space (Figure 11), especially for the DTLZ7 problem. The main reason for this seems to be the fact that not all algorithm runs cover all disconnected front parts. In particular for the DTLZ7 problem, it seems to be more difficult to produce a solution on a not yet covered front part than to (locally) optimize the indicator by moving solutions within an already covered part of the Pareto front.

The second observation is that the -EMOA is able to find solution sets that are close to the best known approximation of the optimum. In other words, the local search of fmincon only allowed for an additional gain in R2 indicator value between around and . Note that except for the case of DTLZ1 and uniform weight (where the stand-alone fmincon run resulted in the best R2 value), it was always the refinements of the R2-EMOA populations which defined the best known approximations.

## 6 Conclusions

In addition to the HV indicator, the indicator has been recommended as an indicator for performance assessment of multiobjective optimizers (Zitzler et al., 2003, 2008). Although the indicator is only weakly monotonic with respect to the dominance relation, while the HV indicator is strictly monotonic, it has been reported that both indicators show a correlated behavior in practice (Wessing and Naujoks, 2010).

In this paper, which is a revised and extended version of a former conference paper (Brockhoff et al., 2012; see the Appendix of this paper for the corrections to the conference paper), we introduced a unary, completely equivalent, version of the indicator and investigated its properties and its differences from the HV indicator. Such kind of study was performed for the first time at this level of detail. It assisted in obtaining a deeper understanding of those popular indicators. In particular, it was shown that in the bi-objective case, the indicator has a bias toward the center of the Pareto front when compared to the HV indicator, which we demonstrated to be caused by the maximum operation of the Chebyshev function (see the discussion in Section 4.2). This finding is in contrast with the common assumption of a more uniform coverage obtained by the indicator. We proved that the optimal placement of a point according to the indicator only depends on its two neighbors and a subset of the weight vectors. Furthermore, approximations of the optimal -distributions for the indicator based on uniform weight distributions in weight as well as in angle space were generated for standard test functions with different Pareto front shapes. They showed different characteristics compared to the optimal -distributions of the HV indicator (Auger et al., 2009b), whereby the bias toward the center is even more pronounced for the uniform distribution in angle space.

A steady state EMOA based on the indicator contributions as secondary selection criterion, denoted as -EMOA, was also introduced. It was experimentally shown that the -EMOA successfully approximates the optimal 10-distributions regarding for the considered test functions. It turned out that disconnected Pareto fronts impose a much higher challenge on the -EMOA optimization process than continuous fronts. However, for all test problems, in most cases, similar approximations could be evolved compared to the approximations of the optimal 10-distributions conducted by exploiting the knowledge of the true Pareto front shape. In future studies, the potential of other multiobjective optimization approaches beyond the steady state concept will be investigated and additionally compared to other EMOA optimizing the R2-indicator such as MOEA/D (Zhang and Li, 2007) and MSOPS (Hughes, 2007).

Although the comparisons between the two indicators obtained important insight into their specific characteristics, several open questions need to be addressed in future studies beyond the previously mentioned aspects. One important issue is the scaling of the indicators with the number of objectives. It is expected that the differences between the indicators become larger with three and more objectives, as it is known that the HV indicator tends toward the extremes of the Pareto front if the reference point is far enough away. In contrast, a uniform distribution of the indicator’s weight vectors should still have the effect of focusing on the center of the front, as observed in the bi-objective case. Also, additional theoretical results on the optimal -distributions for the indicator would be valuable, for example, characterizing the exact optimal placements of solutions depending on the number and distribution of the weight vectors in terms of a density as in Auger et al. (2009b, 2012). These results may further improve the design of multi-objective optimization algorithms, as well as their performance assessment.

## Acknowledgments

This paper is based on investigations of the project D5 “Synthesis and multi-objective model-based optimization of process chains for manufacturing parts with functionally graded properties” as part of the collaborative research center SFB/TR TRR 30 which is kindly supported by the Deutsche Forschungsgemeinschaft (DFG). Heike Trautmann acknowledges support by the European Center of Information Systems (ERCIS). In addition, the authors acknowledge support by the French national research agency (ANR) within the Modèles Numérique project “NumBBO—Analysis, Improvement and Evaluation of Numerical Blackbox Optimizers” (ANR-12-MONU-0009-03).

## References

## Appendix: Corrections with Respect to the GECCO Version

Note that with respect to the previous version of the paper (Brockhoff et al., 2012), the following two errors have been corrected.

Definition 4 now reads and not

- The conversion from slopes into weights now reads “rays from the utopian point with different slopes for are constructed and then mapped to weight vectors by normalization” and the corresponding footnote has been corrected to “The normalized weight vector to the slope is given by as long as , and by if or if . In our case,
*x*= 1 and ”.

## Notes

^{1}

An objective vector that is not dominated by any feasible search point.

^{2}

The normalized weight vector to the ray is given by for , and by if or if .

^{3}

The source code of the -EMOA is available for download at http://inriadortmund.gforge.inria.fr/r2emoa/. It contains also a greedy version of the above steady state selection as a generalization to the case of offspring per iteration.