Many optimization problems arising in applications have to consider several objective functions at the same time. Evolutionary algorithms seem to be a very natural choice for dealing with multi-objective problems as the population of such an algorithm can be used to represent the trade-offs with respect to the given objective functions. In this paper, we contribute to the theoretical understanding of evolutionary algorithms for multi-objective problems. We consider indicator-based algorithms whose goal is to maximize the hypervolume for a given problem by distributing points on the Pareto front. To gain new theoretical insights into the behavior of hypervolume-based algorithms, we compare their optimization goal to the goal of achieving an optimal multiplicative approximation ratio. Our studies are carried out for different Pareto front shapes of bi-objective problems. For the class of linear fronts and a class of convex fronts, we prove that maximizing the hypervolume gives the best possible approximation ratio when assuming that the extreme points have to be included in both distributions of the points on the Pareto front. Furthermore, we investigate the choice of the reference point on the approximation behavior of hypervolume-based approaches and examine Pareto fronts of different shapes by numerical calculations.
1 Introduction
Multi-objective optimization (Ehrgott, 2005) deals with the task of optimizing several objective functions at the same time. Here, several attributes of a given problem are employed as objective functions and are used to define a partial order, called preference order, on the solutions, for which the set of minimal (maximal) elements is sought. Usually, the objective functions are conflicting, which means that improvements with respect to one function can only be achieved when impairing the solution quality with respect to another objective function. Due to this, such problems usually do not have a single optimal function value. Instead, there is a set of optimal objective vectors which represents the different trade-offs of the different objective functions. Solutions that cannot be improved with respect to any function without impairing another one are called Pareto-optimal solutions. The objective vectors associated with these solutions are called Pareto-optimal objective vectors and the set of all these objective vectors constitutes the Pareto front.
In contrast to single-objective optimization, the task in multi-objective optimization is not to compute a single optimal solution but a set of solutions representing the different trade-offs with respect to the given objective functions. Most of the best-known polynomially solvable single-objective problems, such as shortest path or minimum spanning tree, become NP-hard when at least two weight functions have to be optimized at the same time. In this sense, multi-objective optimization is generally considered more difficult than single-objective optimization.
A promising approach to deal with multi-objective optimization problems is to apply general stochastic search algorithms that evolve a set of candidate solutions into a set of solutions representing the trade-offs with respect to the objective functions. Well-known approaches in this field are evolutionary algorithms (Bäck et al., 1997) and ant colony optimization (Dorigo and Stützle, 2004). In particular, multi-objective evolutionary algorithms (MOEAs) have been shown to be very successful when dealing with multi-objective problems (Coello Coello et al., 2002; Deb, 2001). Evolutionary algorithms work with a set of solutions, called the population, which is evolved over time by applying crossover and mutation operators to produce new candidate solutions for the underlying multi-objective problem. Because of this population-based approach, they are naturally well suited to multi-objective optimization problems.
A major problem when dealing with multi-objective optimization problems is that the number of different trade-offs may be too large. This implies that not all trade-offs can be computed efficiently, that is, in polynomial time. In the discrete case, the size of the Pareto front may grow exponentially with the problem size, and in the continuous case the front may even be infinite. In such a case, it is not possible to compute the whole Pareto front efficiently, and the goal is to compute a good approximation consisting of a not too large set of Pareto-optimal solutions. It has been observed empirically that MOEAs are able to obtain good approximations for a wide range of multi-objective optimization problems.
The aim of this paper is to contribute to the theoretical understanding of MOEAs in particular with respect to their approximation behavior. Many researchers have worked on how to use evolutionary algorithms for multi-objective optimization problems and how to find solutions being close to the Pareto front and covering all parts of the Pareto front. However, often the optimization goal remains rather unclear as it is not stated explicitly how to measure the quality of an approximation that a proposed algorithm should achieve.
One popular approach to achieve the mentioned objectives is to use the hypervolume indicator (Zitzler and Thiele, 1999) for measuring the quality of a population. This approach has gained increasing interest in recent years (see, e.g., Beume et al., 2007; Igel et al., 2007; Knowles and Corne, 2003; Zitzler et al., 2007). The hypervolume indicator implicitly defines an optimization goal for the population of an evolutionary algorithm. Unfortunately, this optimization goal is not well understood from a theoretical point of view. Auger and colleagues (2009) have shown that the slope of the front determines which objective vectors maximize the value of the hypervolume when dealing with continuous Pareto fronts. Comparing hypervolume-optimal sets and best-approximation sets in the worst-case scenario, Bringmann and Friedrich (2013b) observed that maximizing the hypervolume aligns well with an additive approximation, while a good multiplicative approximation is achieved when the hypervolume of logarithmized axes is maximized (Friedrich et al., 2009). Whether additive or multiplicative approximation is appropriate depends on the meaning of an objective: additive approximation is invariant to shifting the objective function, whereas multiplicative approximation is invariant to scaling the objective function. While Bringmann and Friedrich (2013b) studied the worst-case behavior, we examine the properties of common specific Pareto fronts. The aim of this paper is to further increase the theoretical understanding of the hypervolume indicator and examine its multiplicative approximation behavior.
As multi-objective optimization problems often involve a vast number of Pareto-optimal objective vectors, multi-objective evolutionary algorithms use a population of fixed size and try to evolve the population into a good approximation of the Pareto front. However, often it is not stated explicitly what a good approximation for a given problem is. One approach that allows a rigorous evaluation of the approximation quality is to measure the quality of a solution set with respect to its approximation ratio (Papadimitriou and Yannakakis, 2000). We follow this approach and examine the approximation ratio of a population with respect to all objective vectors of the Pareto front.
The advantage of the approximation ratio is that it gives a meaningful scalar value which allows us to compare the quality of solutions between different functions, different population sizes, and even different dimensions. This is not the case for the hypervolume indicator. A specific dominated volume does not, a priori, give any information about how well a front is approximated. Also, the hypervolume measures the space relative to an arbitrary reference point (cf. Section 2.1). This (often unwanted) freedom of choice not only changes the distribution of the points, but also makes the hypervolumes of different solutions measured relative to a (typically dynamically changing) reference point very hard to compare.
Our aim is to examine whether a given solution set of search points maximizing the hypervolume (called the optimal hypervolume distribution) gives a good approximation measured with respect to the approximation ratio. We do this by investigating two classes of objective functions with two objectives each and analyzing the optimal distribution for the hypervolume indicator and the one achieving the optimal approximation ratio.
In a first step, we assume that both sets of points have to include both optimal points regarding the given two single objective functions. We point out situations where maximizing the hypervolume provably leads to the best approximation ratio achievable by choosing Pareto-optimal solutions. After these theoretical investigations, we carry out numerical investigations to see how the shape of the Pareto front influences the approximation behavior of the hypervolume indicator and point out where the approximation given by the hypervolume differs from the best one achievable by a solution set of points. These initial theoretical and experimental results investigating the correlation between the hypervolume indicator and multiplicative approximations have been published as a conference version in Friedrich et al. (2009).
This paper extends its conference version in Section 4 to the case where the optimal hypervolume distribution depends on the chosen reference point. The reference point is a crucial parameter when applying hypervolume-based algorithms. It determines the area in the objective space where the algorithm focuses its search. As with the hypervolume indicator itself, the impact of the choice of the reference point is hard to understand. Different studies have been carried out on this topic, and initial results on the optimal hypervolume distribution depending on the reference point have been obtained in Auger et al. (2009) and Brockhoff (2010). We provide new insights into how the choice of the reference point may affect the approximation behavior of hypervolume-based algorithms. In our studies, we relate the optimal hypervolume distribution with respect to a given reference point to the optimal approximation ratio obtainable when having the freedom to choose the points arbitrarily.
The rest of the paper is structured as follows. In Section 2, we introduce the hypervolume indicator and our notion of approximation. Section 3 gives analytic results for the approximation achievable by the hypervolume indicator under the assumption that both extreme points have to be included in the two distributions and reports on our numerical investigations into Pareto fronts having different shapes. In Section 4, we generalize our results and study the impact of the reference point on the optimal hypervolume distribution and relate this choice to the best possible overall approximation ratio when choosing points. Finally, we finish with some concluding remarks.
2 The Hypervolume Indicator and Multiplicative Approximations
In this paper, we consider bi-objective maximization problems $P\colon S \to \mathbb{R}^2$ for an arbitrary decision space $S$. We are interested in the so-called Pareto front of $P$, which consists of all maximal elements of $P(S)$ with respect to the weak Pareto dominance relation. We restrict ourselves to problems with a Pareto front that can be written as $\{(x, f(x)) \mid x \in [x_{\min}, x_{\max}]\}$, where $f\colon [x_{\min}, x_{\max}] \to \mathbb{R}$ is a continuous, differentiable, and strictly monotonically decreasing function. This allows us to denote by $f$ not only the actual function $f\colon [x_{\min}, x_{\max}] \to \mathbb{R}$, but also the front itself. We assume further that $x_{\min} > 0$ and $f(x_{\max}) > 0$ hold.
We intend to find a solution set of Pareto-optimal search points that constitutes a good approximation of the front f.
2.1 Hypervolume Indicator
The hypervolume (HYP) measures the volume of the dominated portion of the objective space. It was first introduced for performance assessment in multi-objective optimization by Zitzler and Thiele (1999). Later on it was used to guide the search in various hypervolume-based evolutionary optimizers (Beume et al., 2007; Emmerich et al., 2005; Igel et al., 2007; Knowles et al., 2003; Zitzler and Künzli, 2004; Zitzler et al., 2007).
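For the bi-objective maximization setting considered here, the dominated area can be computed with a simple sweep over the points. The following is a minimal sketch and not one of the cited algorithms; the function name `hypervolume_2d` and the representation of objective vectors as tuples are assumptions made purely for illustration.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (maximization)."""
    # Only points that strictly dominate the reference point contribute.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective down to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]  # highest second objective seen so far
    for x, y in pts:
        if y > best_y:
            # New horizontal strip of width x - ref[0] between best_y and y.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


front_points = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front_points, (0.0, 0.0)))  # 6.0
```

Dominated points add no strip and therefore leave the value unchanged, while moving the reference point changes which points contribute at all.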
The hypervolume indicator is a popular second-level sorting criterion in many recent multi-objective evolutionary algorithms for several reasons. Apart from having a very intuitive interpretation, it is also the only common indicator that is strictly Pareto-compliant (Zitzler et al., 2003). Strictly Pareto-compliant means that, given two solution sets A and B, the indicator values A higher than B if the solution set A dominates the solution set B. It has further been shown by Bringmann and Friedrich (2013b) that the worst-case approximation factor over all possible Pareto fronts obtained by any hypervolume-optimal set of fixed size $n$ is asymptotically equal to the best worst-case approximation factor achievable by any set of size $n$.
In recent years, the hypervolume has become very popular and several algorithms have been developed to calculate it. The first one was the hypervolume by slicing objectives (HSO) algorithm, which was suggested independently by Zitzler (2001) and Knowles (2002). For it can be solved in (asymptotically optimal) time (Fonseca et al., 2006). The currently best asymptotic runtime for is (Yıldız and Suri, 2012). The best known bound for large dimensions is (Bringmann, 2012).
On the other hand, Bringmann and Friedrich (2012) proved that all hypervolume algorithms must have a superpolynomial runtime in the number of objectives (unless $\mathrm{P} = \mathrm{NP}$). Assuming the widely accepted exponential time hypothesis, the runtime must even be at least $n^{\Omega(d)}$ (Bringmann and Friedrich, 2013a). As this dashes the hope for fast and exact hypervolume algorithms, there are several estimation algorithms (Bader and Zitzler, 2011; Bringmann and Friedrich, 2010, 2012) for approximating the hypervolume based on Monte Carlo sampling.
In the following, we define our notion of approximation in a formal way. Let $X$ be a solution set and $f$ a function that describes the Pareto front. We call a Pareto front convex if the function defining the Pareto front is a convex function. Otherwise, we call the Pareto front concave. Note that this differs from the notation used in Friedrich et al. (2009).
The approximation ratio of a solution set X with respect to f is defined according to Papadimitriou and Yannakakis (2000) as follows.
Figure 1(b) shows the area of the objective space that a certain solution set X-approximates for . Note that this area covers the entire Pareto front f. Since the objective vector is not -approximated for all , the approximation ratio of X is .
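The approximation ratio can be estimated numerically by sampling the front. The sketch below assumes the usual multiplicative reading of the definition of Papadimitriou and Yannakakis (2000): a front point $(z, f(z))$ is approximated within a factor $\delta$ by a chosen point $x$ if $z \le \delta x$ and $f(z) \le \delta f(x)$, and the ratio of a set is the smallest $\delta$ that works for every front point. The function name, the parameters, and the grid-based sampling are assumptions for illustration, and the sampled value can only underestimate the true ratio.

```python
from typing import Callable, Sequence


def approximation_ratio(xs: Sequence[float], f: Callable[[float], float],
                        x_min: float, x_max: float,
                        samples: int = 10001) -> float:
    """Estimate the multiplicative approximation ratio of `xs` on the front f."""
    worst = 1.0
    for k in range(samples):
        z = x_min + (x_max - x_min) * k / (samples - 1)
        fz = f(z)
        # Smallest factor by which some chosen point approximates (z, f(z)).
        best = min(max(1.0, z / x, fz / f(x)) for x in xs)
        worst = max(worst, best)
    return worst


# Hypothetical example front f(x) = 4 / x on [1, 4] with three chosen points.
front = lambda x: 4.0 / x
print(round(approximation_ratio([1.0, 2.0, 4.0], front, 1.0, 4.0), 3))  # 1.414
```

Both objectives must be strictly positive for the ratios to be meaningful, which matches the positivity assumptions made for the fronts in this paper.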
Our definition of approximation is similar to the definition of multiplicative $\varepsilon$-dominance given in Laumanns et al. (2002). In that paper, an algorithmic framework for discrete multi-objective optimization is proposed which converges to a $(1+\varepsilon)$-approximation of the Pareto front.
3 Results Independent of the Reference Point
The goal of this paper is to relate the above definition of approximation to the optimization goal implicitly defined by the hypervolume indicator. Using the hypervolume, the choice of the reference point decides which parts of the front are covered. In this section we avoid the additional influence of the reference point by considering only solutions where both extreme points have to be included. The influence of the reference point is studied in Section 4.
All the functions that we consider in this paper have positive and bounded domains and codomains. Furthermore, the functions that are under consideration do not have infinite or zero derivative at the extremes. Hence, choosing the reference point for appropriate ensures that the points and are contained in an optimal hypervolume distribution. A detailed calculation on how to choose the reference point such that and are contained in an optimal hypervolume distribution is given in Auger et al. (2009). Assuming that and have to be included in the optimal hypervolume distribution, the value of the volume is in this section independent of the choice of the reference point. Therefore, we write instead of .
Note that "optimal hypervolume distributions" are also called "optimal $\mu$-distributions" (Auger et al., 2009; Brockhoff, 2010) or "maximum hypervolume set" (Bringmann and Friedrich, 2013b) in the literature.
We want to investigate the approximation ratio obtained by a solution set maximizing the hypervolume indicator in comparison to an optimal one. For this, we first examine conditions for an optimal approximation distribution . Later on, we consider two classes of functions f on which the optimal hypervolume distribution is equivalent to the optimal approximation distribution and therefore provably leads to the best achievable approximation ratio.
3.1 Optimal Approximations
We now consider the optimal approximation ratio that can be achieved by placing $n$ points on the Pareto front given by the function $f$. The following lemma states a condition that allows us to check whether a given set of $n$ points achieves an optimal approximation ratio for a given function $f$.
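Lemma 1. Let $f\colon [x_{\min}, x_{\max}] \to \mathbb{R}$ be a Pareto front and $X = \{x_1, \ldots, x_n\}$ be an arbitrary solution set with $x_1 = x_{\min}$, $x_n = x_{\max}$, and $x_i \le x_{i+1}$ for all $1 \le i < n$. If there is a constant $\delta > 1$ and a set $Z = \{z_1, \ldots, z_{n-1}\}$ with $x_i \le z_i \le x_{i+1}$ and $\delta = z_i / x_i = f(z_i) / f(x_{i+1})$ for all $1 \le i < n$, then $X$ is the optimal approximation distribution with approximation ratio $\delta$.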
We assume that a better approximation ratio than can be achieved by choosing a different set of solutions with , , and , , and show a contradiction.
The points zi, , are the points that are worst approximated by the set X. Each zi is approximated by a factor of . Hence, in order to obtain a better approximation than the one achieved by the set X, the points zi have to be approximated within a ratio of less than . We now assume that there is a point zi for which a better approximation is achieved by the set . Getting a better approximation of zi than means that there is at least one point with , as otherwise zi is approximated within a ratio of at least .
We assume w.l.o.g that and show that there is at least one point z with that is not approximated by a factor of or that holds. To approximate all points z with by a factor of , the inequality has to hold, as otherwise is approximated within a ratio of more than by . We iterate the arguments. In order to approximate all points in , has to hold, as otherwise is not approximated within a ratio of by . Considering , either one of the points z, , is not approximated within a ratio of by , or holds, which contradicts the assumption that includes and constitutes an approximation better than .
The case can be handled symmetrically, by showing that either , or there is a point that is not approximated within a ratio of by . This completes the proof.
We will use this lemma in the rest of the paper to check whether an approximation obtained by the hypervolume indicator is optimal as well as use these ideas to identify sets of points that achieve an optimal approximation ratio.
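A numerical check in the spirit of the lemma can be sketched as follows. It assumes that the worst-approximated point $z_i$ between two neighboring points $x_i < x_{i+1}$ is the one where the two relevant ratios balance, that is, $z_i / x_i = f(z_i) / f(x_{i+1})$; this reading, the function name `gap_ratios`, and the use of bisection are assumptions for illustration. If all returned ratios coincide, the set satisfies the balancing condition.

```python
from typing import Callable, List, Sequence


def gap_ratios(xs: Sequence[float], f: Callable[[float], float],
               iters: int = 100) -> List[float]:
    """For each pair of neighbors, return the ratio at the balanced point z_i."""
    ratios = []
    for a, b in zip(xs, xs[1:]):
        lo, hi = a, b
        # z / a increases and f(z) / f(b) decreases in z, so bisection applies.
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mid / a < f(mid) / f(b):
                lo = mid
            else:
                hi = mid
        ratios.append(0.5 * (lo + hi) / a)
    return ratios


front = lambda x: 4.0 / x  # hypothetical example front on [1, 4]
print([round(r, 4) for r in gap_ratios([1.0, 2.0, 4.0], front)])  # [1.4142, 1.4142]
print([round(r, 4) for r in gap_ratios([1.0, 3.0, 4.0], front)])  # [1.7321, 1.1547]
```

In the second call the ratios differ, so moving the middle point toward the larger gap would lower the overall approximation ratio.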
3.2 Analytic Results for Linear Fronts
The following theorem shows that the optimal approximation distribution coincides with the optimal hypervolume distribution.
Figure 2(a) shows the optimal distribution for and .
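Why equal spacing maximizes the hypervolume on a linear front can be seen from a short first-order calculation. This is a sketch under assumptions that are introduced here only for illustration: the front is written as $f(x) = a - bx$ with $b > 0$, the points are sorted as $x_1 < \dots < x_n$ with $x_1$ and $x_n$ fixed at the extremes, and $(r_1, r_2)$ is a reference point dominated by all of them.

```latex
\begin{align*}
\mathrm{HYP}(X) &= \sum_{i=1}^{n} (x_i - x_{i-1})\,\bigl(f(x_i) - r_2\bigr),
  \qquad x_0 := r_1,\\
\frac{\partial\,\mathrm{HYP}(X)}{\partial x_i}
  &= f(x_i) - f(x_{i+1}) + (x_i - x_{i-1})\,f'(x_i) = 0
  \qquad (1 < i < n),\\
f(x) = a - bx \;&\Longrightarrow\;
  b\,(x_{i+1} - x_i) - b\,(x_i - x_{i-1}) = 0
  \;\Longrightarrow\; x_{i+1} - x_i = x_i - x_{i-1}.
\end{align*}
```

Under these assumptions every interior point sits midway between its neighbors, which is the equally spaced distribution, and the reference point drops out of the condition for the interior points.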
3.3 Analytic Results for a Class of Convex Fronts
We now consider the distribution of points on a convex front maximizing the hypervolume. In contrast to the class of linear functions where an optimal approximation can be achieved by distributing the points in an equally spaced manner along the front, the class of functions considered in this section requires that the points are distributed exponentially to obtain an optimal approximation.
As already argued, we want to make sure that optimal hypervolume distribution includes and . For the class of convex fronts that we consider, this can be achieved by choosing the reference point .
The following theorem shows that the optimal approximation distribution coincides with the optimal hypervolume distribution.
We have seen that the requirements of Lemma 1 are fulfilled. Hence, an application of Lemma 1 shows that the hypervolume indicator achieves an optimal approximation ratio when the Pareto front is given by with where is any constant.
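The exponential spacing can be checked numerically on a concrete instance. The sketch below assumes a front of the form $f(x) = c/x$ on $[1, c]$, which is an assumption because the formula did not survive in the text above; the helper names and the reference point $(0, 0)$ are likewise chosen only for illustration. It verifies that moving any interior point of the geometrically spaced set lowers the hypervolume and reports the resulting ratio between neighbors.

```python
import math
from typing import Callable, List, Sequence


def geometric_points(c: float, n: int) -> List[float]:
    """n points on [1, c] with a constant ratio between neighbors."""
    return [c ** (i / (n - 1)) for i in range(n)]


def hypervolume_sorted(xs: Sequence[float], f: Callable[[float], float]) -> float:
    """Hypervolume w.r.t. (0, 0) for increasing xs on a decreasing front."""
    area, prev = 0.0, 0.0
    for x in xs:
        area += (x - prev) * f(x)
        prev = x
    return area


c, n = 4.0, 5
front = lambda x: c / x  # assumed example of a convex front
xs = geometric_points(c, n)
base = hypervolume_sorted(xs, front)

# Moving any interior point slightly must not increase the hypervolume.
for i in range(1, n - 1):
    for eps in (-1e-3, 1e-3):
        moved = list(xs)
        moved[i] += eps
        assert hypervolume_sorted(moved, front) < base

# Balanced ratio between neighbors: sqrt(x_{i+1} / x_i) = c ** (1 / (2 (n - 1))).
delta = math.sqrt(xs[1] / xs[0])
print(round(delta, 4))  # 1.1892
```

For this assumed front the stationarity condition of the hypervolume reduces to $x_i^2 = x_{i-1} x_{i+1}$, which is exactly geometric spacing.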
3.4 Numerical Evaluation for Fronts of Different Shapes
The analysis of the distribution of an optimal set of search points tends to be hard or is impossible for more complex functions. Hence, resorting to numerical analysis methods constitutes a possible escape from this dilemma. This section is dedicated to the numerical analysis of a larger class of functions.
Our goal is to study the optimal hypervolume distribution for different shapes of Pareto fronts and investigate how the shape of such a front influences the approximation behavior of the hypervolume indicator. We examine a family of fronts of the shape $x^p$, where $p$ is a parameter that determines the degree of the polynomial describing the Pareto front. Furthermore, we allow scaling in both dimensions.
the symmetric front and
the asymmetric front .
Note that choosing $p = 1$ corresponds to the well-known test function DTLZ1 (Deb et al., 2002). For $p = 2$, the shape of the front corresponds to functions DTLZ2, DTLZ3, and DTLZ4.
Our goal is to study the optimal hypervolume distribution for our parameterized family of Pareto fronts and relate it to an optimal multiplicative approximation. Therefore, we calculate for different functions fp and
the set of points which maximizes the dominated hypervolume, and
the set of points which minimizes the multiplicative approximation ratio.
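One simple way to compute a hypervolume-maximizing set numerically, with both extreme points fixed, is coordinate ascent: each interior point is repeatedly moved to the position that maximizes its own contribution between its neighbors. The sketch below is not necessarily the numerical method used for the figures; the function name, the ternary search, and the assumption that each one-dimensional contribution is unimodal are all introduced here for illustration.

```python
from typing import Callable, List


def maximize_hypervolume(f: Callable[[float], float], x_min: float,
                         x_max: float, n: int, sweeps: int = 200) -> List[float]:
    """Coordinate ascent for n points on the front f with fixed extremes."""
    xs = [x_min + (x_max - x_min) * i / (n - 1) for i in range(n)]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            left, right = xs[i - 1], xs[i + 1]

            def gain(x: float) -> float:
                # Area contributed by a point at x between its two neighbors.
                return (x - left) * (f(x) - f(right))

            lo, hi = left, right
            for _ in range(80):  # ternary search, assumes gain is unimodal
                m1 = lo + (hi - lo) / 3
                m2 = hi - (hi - lo) / 3
                if gain(m1) < gain(m2):
                    lo = m1
                else:
                    hi = m2
            xs[i] = 0.5 * (lo + hi)
    return xs


# Hypothetical example front f(x) = 4 / x on [1, 4] with five points.
print([round(x, 3) for x in maximize_hypervolume(lambda x: 4.0 / x, 1.0, 4.0, 5)])
```

Because the extreme points are fixed, the reference point does not enter the computation, in line with the setting of this section.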
It can be observed that the relative positions of the hypervolume points stay the same in Figures 3(a) and 4(a) while the relative positions achieving an optimal approximation change with scaling (cf. Figures 3(b) and 4(b)). Hence, the relative position of the points maximizing the hypervolume is robust with respect to scaling. But as the optimal point distribution for a multiplicative approximation is dependent on the scaling, the hypervolume cannot achieve the best possible approximation quality.
In the example of Figures 3 and 4, the optimal multiplicative approximation factor for the symmetric and asymmetric case is 1.021 (Figure 3(b)) and 1.030 (Figure 4(b)), respectively, while the hypervolume only achieves an approximation of 1.025 (Figure 3(a)) and 1.038 (Figure 4(a)), respectively. Therefore, in both the symmetric and the asymmetric case of $f_2$, the set of points maximizing the hypervolume is not the set of points with the optimal multiplicative approximation ratio.
We have already seen that scaling the function has a high impact on the optimal approximation distribution but not on the optimal hypervolume distribution. We want to investigate this effect in greater detail. The influence of scaling the parameter of different functions is depicted in Figure 5 for . For fixed it shows the achieved approximation ratio. As expected, the larger the asymmetry () the larger the approximation ratios. For concave fronts () the approximation ratios seem to converge quickly for large enough . The approximation of f2 tends toward the golden ratio for the optimal approximation and for the optimal hypervolume. For f3 they tend toward 1.164 and 1.253, respectively. Hence, for f2 and f3, the hypervolume is never more than 8% worse than the optimal approximation. This is different for the convex fronts (). There, the ratio between the hypervolume and the optimal approximation appears divergent.
Another important question is how the choice of the population size influences the relation between an optimal approximation and the approximation achieved by an optimal hypervolume distribution. We investigate the influence of the choice of on the approximation behavior in greater detail. Figure 6 shows the achieved approximation ratios depending on the number of points . For symmetric fp’s with and , the hypervolume achieves an optimal approximation distribution for all . The same holds for the linear function f1, independent of the scaling implied by and .
For larger populations, the approximation ratios of both the hypervolume distribution and the optimal distribution decrease. However, the performance of the hypervolume measure is especially poor even for larger for convex asymmetric fronts, that is, with (e.g., Figures 6(f) and 6(g)). Our investigations show that the approximation of an optimal hypervolume distribution may differ significantly from an optimal one depending on the choice of $p$. An important issue is whether the front is convex or concave (Lizarraga-Lizarraga et al., 2008). The hypervolume was thought to prefer convex regions to concave regions (Zitzler and Thiele, 1998), while Auger et al. (2009) showed that the density of points only depends on the slope of the front and not on convexity or concavity. To further illuminate the impact of convexity versus concavity, Figure 7 shows the approximation ratios depending on $p$. As expected, for $p = 1$, the hypervolume calculates the optimal approximation. However, the influence of $p$ is very different for the symmetric and the asymmetric test function. For , the convex () fronts are much better approximated by the hypervolume than the concave () fronts (cf. Figure 7(a)–(d)). For , this is surprisingly the other way around (cf. Figure 7(e)–(h)).
4 Influence of the Reference Point
In all previous investigations, we have not considered the impact of the reference point. To allow a fair comparison, we assumed that the optimal approximation distribution and the optimal hypervolume distribution have to include both extreme points. This is clearly not optimal when considering the optimal approximation distribution. Therefore, we relax our assumption and allow any set consisting of points and raise the question how the optimal approximation distribution looks in this case. Considering the hypervolume indicator, the question arises whether this optimal approximation distribution can be achieved by choosing a certain reference point. Therefore, the goal of this section is to examine the impact of the reference point for determining optimal approximation distributions.
For this we have to redefine parts of the notation. We mark all variables with a hat (i.e., ) to make clear that we no longer require the extreme points to be included.
4.1 Optimal Approximations
Similar to Lemma 1, the following lemma states conditions for an optimal approximation distribution which does not have to contain the extreme points.
Let be a Pareto front and a solution set with for all . If there is a ratio and a set with for all such that
for all (where ) and
for all (where )
Assume there is a different solution set with for all and approximation ratio at most .
Since there is an index i with . Consider the smallest such index. We distinguish the two cases and .
Assume . Consider the point . Since , we derive as otherwise would contradict our assumption that achieves an approximation ratio of at most . Repeating the argument times leads to , which gives . This implies that the approximation of by is which contradicts the assumption that achieves an approximation ratio of at most .
Assume . Then all points within are not -approximated. The interval is not empty since due to and f strictly monotonically decreasing. We have another contradiction.
Altogether, we get that is the unique set achieving an approximation ratio of at most and therefore an optimal approximation distribution.
The previous lemma can be used to compute the overall optimal approximation distribution of for a given function describing the Pareto front. In the following, we will use this to compare it to the optimal hypervolume distribution depending on the chosen reference point. Again we consider the class of linear fronts and the class of convex fronts given in Section 3.
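When the extreme points need not be included, the best achievable ratio for a given number of points can also be found numerically. The sketch below does not implement the lemma itself; it uses a greedy covering argument that follows directly from the multiplicative definition of approximation, combined with bisection on the ratio. All function names and parameters are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def covers(f: Callable[[float], float], x_min: float, x_max: float,
           n: int, delta: float, iters: int = 60) -> Tuple[bool, List[float]]:
    """Greedy check: can n points approximate the whole front within delta?"""
    z = x_min  # leftmost front point that is not yet approximated
    xs: List[float] = []
    for _ in range(n):
        target = f(z) / delta  # a new point x must satisfy f(x) >= target
        if f(x_max) >= target:
            xs.append(x_max)
            return True, xs
        lo, hi = z, x_max  # invariant: f(lo) >= target > f(hi)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if f(mid) >= target:
                lo = mid
            else:
                hi = mid
        xs.append(lo)
        z = delta * lo  # everything up to delta * x is now approximated
        if z >= x_max:
            return True, xs
    return False, xs


def optimal_ratio(f: Callable[[float], float], x_min: float, x_max: float,
                  n: int, iters: int = 60) -> Tuple[float, List[float]]:
    """Bisection on delta; the upper bound is feasible with the single point x_max."""
    lo, hi = 1.0, f(x_min) / f(x_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if covers(f, x_min, x_max, n, mid)[0]:
            hi = mid
        else:
            lo = mid
    return hi, covers(f, x_min, x_max, n, hi)[1]


# Hypothetical example front f(x) = 4 / x on [1, 4] with two free points.
ratio, points = optimal_ratio(lambda x: 4.0 / x, 1.0, 4.0, 2)
print(round(ratio, 4), [round(x, 4) for x in points])  # 1.4142 [1.4142, 2.8284]
```

In this example neither returned point is an extreme point of the front, which illustrates why forcing the extremes into the set is not optimal for the approximation ratio.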
4.2 Analytic Results for Linear Fronts
We first consider linear fronts. The optimal multiplicative approximation factor can be easily determined with Lemma 4 as shown in the following theorem.
The approximation factor achieved by an optimal hypervolume distribution remains to be analyzed. The impact of the reference point for the class of linear functions has been investigated by Brockhoff (2010). Using his results, we can conclude the following theorem.
Theorem 2 follows immediately from Theorem 3 of Brockhoff (2010) by translating his minimization setting into our maximization setting. Knowing the set of points that maximize the hypervolume, we can now determine the achieved approximation depending on the chosen reference point.
4.3 Analytic Results for a Class of Convex Fronts
We now consider convex fronts and first investigate the overall optimal multiplicative approximation, which does not have to include the extreme points. The following theorem shows what such an optimal approximation looks like and will serve later for the comparison with an optimal hypervolume distribution depending on the chosen reference point.
Now, we consider the optimal hypervolume distribution depending on the choice of the reference point and compare it to the optimal multiplicative approximation.
The first case and corresponds to the previous situation where we required that both extreme points be included. The statement of Theorem 9 for this case follows immediately from Equations (3) and (4) in Section 3.3. The second case and is more involved. First note that we consider only points that have a positive contribution with respect to the given reference point. Therefore, we assume that and holds.
4.4 Numerical Evaluation for Two Specific Fronts
We now use the theoretical results of this section on the approximation factor depending on the reference point and study two specific fronts as examples.
5 Conclusions
Evolutionary algorithms have been shown to be very successful for dealing with multi-objective optimization problems. This is mainly due to the fact that such problems are hard to solve by traditional optimization methods. The use of the population of an evolutionary algorithm to approximate the Pareto front seems to be a natural choice for dealing with these problems. The use of the hypervolume indicator to measure the quality of a population in an evolutionary multi-objective algorithm has become very popular in recent years. Understanding the optimal distribution of a population consisting of individuals is a hard task and the optimization goal when using the hypervolume indicator is rather unclear. Therefore, it is a challenging task to understand the optimization goal by using the hypervolume indicator as a quality measure for a population.
We have examined how the hypervolume indicator approximates Pareto fronts of different shapes and related it to the best possible approximation ratio. We started by considering the case where we assumed that the extreme points with respect to the given objective functions have to be included in both distributions. Considering linear fronts and a class of convex fronts, we have pointed out that the hypervolume indicator provably gives the best multiplicative approximation ratio that is achievable. To gain further insights into the optimal hypervolume distribution and its relation to multiplicative approximations, we carried out numerical investigations. These investigations point out that the shape as well as the scaling of the objectives heavily influence the approximation behavior of the hypervolume indicator. Examining fronts with different shapes, we have shown that the approximation achieved by an optimal set of points with respect to the hypervolume may differ from that of the set of points achieving the best approximation ratio.
After having obtained these results, we analyzed the impact of the reference points on the hypervolume distribution and compared the multiplicative approximation ratio obtained by this indicator to the overall optimal approximation that does not have to contain the extreme points. In general, the choice of the reference point determines the approximation ratio that a hypervolume-based algorithm can achieve. However, it is hard to determine the reference point that optimizes the approximation ratio as it depends on the multi-objective problem under consideration. Our investigations show that also in this case the hypervolume distribution can lead to an overall optimal approximation when the reference point is chosen in the right way for the class of linear and convex functions under investigation. Furthermore, our results point out the impact of the choice of the reference point with respect to the approximation ratio that is achieved as shown in Figures 9 and 10.
Our results provide insights into the connection of the optimal hypervolume distribution and approximation ratio for special classes of functions describing the Pareto fronts of multi-objective problems having two objectives. For future work, it would be interesting to obtain results for broader classes of functions as well as problems having more than two objectives.
Acknowledgments
The authors thank the anonymous reviewers for their constructive comments, which helped to improve the presentation of the paper. This work was supported by grants DP130104395 and DP140103400 of the Australian Research Council (ARC) and by the European Commission's SAGE project (618091).