Abstract
Multiobjective evolutionary algorithms are successfully applied in many real-world multiobjective optimization problems. As for many other AI methods, the theoretical understanding of these algorithms is lagging far behind their success in practice. In particular, previous theory work considers mostly easy problems that are composed of unimodal objectives.
As a first step towards a deeper understanding of how evolutionary algorithms solve multimodal multiobjective problems, we propose the OneJumpZeroJump problem, a bi-objective problem composed of two objectives isomorphic to the classic jump function benchmark. We prove that the simple evolutionary multiobjective optimizer (SEMO) with probability one does not compute the full Pareto front, regardless of the runtime. In contrast, for all problem sizes n and all jump sizes k ∈ [4..n/2 − 1], the global SEMO (GSEMO) covers the Pareto front in an expected number of Θ((n − 2k)n^k) iterations. For k = o(n), we also show the tighter bound (3/2)e n^{k+1} ± o(n^{k+1}), which might be the first runtime bound for an MOEA that is tight apart from lower-order terms. We also combine the GSEMO with two approaches that showed advantages in single-objective multimodal problems. When using the GSEMO with a heavy-tailed mutation operator, the expected runtime improves by a factor of at least k^{Ω(k)}. When adapting the recent stagnation-detection strategy of Rajabi and Witt (2022) to the GSEMO, the expected runtime also improves by a factor of at least k^{Ω(k)} and surpasses the heavy-tailed GSEMO by a small polynomial factor in k. Via an experimental analysis, we show that these asymptotic differences are visible already for small problem sizes: A factor-5 speed-up from heavy-tailed mutation and a factor-10 speed-up from stagnation detection can be observed already for jump size 4 and problem sizes between 10 and 50. Overall, our results show that the ideas recently developed to help single-objective evolutionary algorithms cope with local optima can be effectively employed also in multiobjective optimization.
1 Introduction
Real-world problems often contain multiple conflicting objectives. For such problems, a single best solution cannot be determined without additional information. One solution concept for such problems is to compute a set of solutions, each of which cannot be improved without worsening in at least one objective (Pareto optima), and then let a decision maker choose one of these solutions. With their population-based nature, multiobjective evolutionary algorithms (MOEAs) are a natural choice for this approach and have indeed been very successfully applied here (Zhou et al., 2011).
Similar to the situation for single-objective evolutionary algorithms, the rigorous theoretical understanding of MOEAs falls far behind the success of these algorithms in practical applications. The classic works in this area have defined multiobjective, especially bi-objective, counterparts of well-analyzed single-objective benchmark functions used in evolutionary computation theory and have analyzed the performance of mostly very simple MOEAs on these benchmarks.
For example, in the problems COCZ (Laumanns et al., 2002) and OneMinMax (Giel and Lehre, 2010), the two objectives are both (conflicting) variants of the classic OneMax benchmark. The classic benchmark LeadingOnes was used to construct the LOTZ (Laumanns et al., 2004b) and WLPTNO (Qian et al., 2013) problems. These multiobjective benchmark problems are among the most intensively studied (Giel, 2003; Brockhoff et al., 2008; Doerr et al., 2013, 2016; Bian et al., 2018; Huang et al., 2019; Huang and Zhou, 2020; Covantes Osuna et al., 2020; Zheng et al., 2022; Bian and Qian, 2022; Zheng and Doerr, 2022). We note that these problems are all very easy. They are composed of unimodal objectives and they have the further property that from each set S of solutions a set witnessing the Pareto front can be computed by repeatedly selecting a solution from S, flipping a single bit in it, adding the result to S, and removing dominated solutions from S. They are thus relatively easy to solve, as witnessed by typical runtimes such as O(n² log n) on the OneMax-type problems or O(n³) on the LeadingOnes-inspired benchmarks. These runtimes naturally are higher than for the single-objective counterparts due to the fact that the Pareto front of the multiobjective versions has size n + 1, hence roughly n times more solutions have to be computed compared to the single-objective setting.
We defer a detailed discussion of the state of the art of rigorous analyses of MOEAs to Section 3 and state here only that, to the best of our knowledge, there is not a single runtime analysis work discussing in detail how MOEAs cope with problems composed of multimodal objectives.
Our contributions. This paper aims at a deeper understanding of how MOEAs cope with multiobjective problems consisting of natural, well-analyzed, multimodal objectives. In the theory of single-objective evolutionary computation, the class of Jump functions proposed by Droste et al. (2002) is a natural and intensively used multimodal function class that has inspired many ground-breaking results. Hence, in this paper, we design a bi-objective counterpart of the Jump function with problem size n and jump size k, called OneJumpZeroJump_{n,k}. It consists of one Jump function with respect to the number of ones and one Jump function with respect to the number of zeros; hence both objectives are isomorphic to classic jump functions. We compute its Pareto front in Theorem 7 and, as intended, observe that unlike the easy multimodal benchmarks, the Pareto front is not connected; that is, one cannot transform any solution on the Pareto front into any other solution on the Pareto front via one-bit flips. From this observation, we easily show that for all n and all k ∈ [2..n/2], the simple evolutionary multiobjective optimizer (SEMO) cannot find the full Pareto front of the OneJumpZeroJump benchmark (Theorem 9).
At the heart of this work are mathematical runtime analyses for the global SEMO (GSEMO). We first show that the classic version of this algorithm, for all n and k ∈ [2..n/2], finds the Pareto front of OneJumpZeroJump_{n,k} in O((n − 2k + 3)n^k) iterations (and fitness evaluations) in expectation (Theorem 12). We show a matching lower bound of Ω((n − 2k)n^k) for k ∈ [4..n/2 − 1] (Theorem 13). Here and in the remainder, the asymptotic notation only hides constants independent of n and k. We note that our actual bounds are more precise. In particular, for k = o(n), the expected runtime is (3/2)e n^{k+1} apart from lower-order terms. This result might be the first runtime analysis of an MOEA that is tight apart from lower-order terms.
We then try to reuse two ideas that led to performance gains for multimodal problems in single-objective optimization. Doerr et al. (2017) proposed the fast mutation operator in which the number of flipped bits follows a power-law distribution with (negative) exponent β. They showed that the (1+1) EA with this operator optimizes jump functions with jump size k by a factor of k^{Θ(k)} faster than the standard (1+1) EA. We show that the GSEMO with fast mutation also witnesses such an improvement over the standard GSEMO. More precisely, we prove an upper bound of O((n − 2k + 3)k^{β−0.5}(en/k)^k) for this algorithm (Theorem 20). We also provide a result with explicit leading constant, for the first time for an algorithm using fast mutation. We are optimistic that our non-asymptotic estimates for the number of bits flipped by this operator will be useful also in other works.
The stagnation-detection strategy of Rajabi and Witt (2022) is a second way to cope with multimodality (in single-objective optimization). So far used only in conjunction with the (1+λ) EA (and there mostly with the (1+1) EA), it consists of counting for how long no improvement has been found and using this information to set the mutation rate (higher mutation rates when no improvement was made for a longer time). With suitable choices of the parameters of this strategy, the (1+1) EA can optimize jump functions with jump size k faster than the standard (1+1) EA by a factor of k^{Θ(k)}. This is the same rough estimate as for the (1+1) EA with fast mutation. A more detailed calculation, however, shows that the stagnation-detection (1+1) EA is faster than the fast (1+1) EA by a factor of Θ(k^{β−0.5}). We note that this difference usually is not extremely large since β is usually chosen small (β = 1.5 was recommended by Doerr et al. (2017)) and k has to be small to admit reasonable runtimes (recall that the runtime dependence on k is of order (en/k)^k). Nevertheless, in experiments conducted by Rajabi and Witt (2022) the stagnation-detection (1+1) EA was faster than the fast (1+1) EA by around a factor of three. Different from fast mutation, it is not immediately clear how to incorporate stagnation detection into the GSEMO. Being an algorithm with a non-trivial parent population, one question is whether one should count unsuccessful iterations globally or individually for each parent individual. Also, clearly the parameters of the algorithm need to be adjusted to the longer waiting times for an improvement. We succeeded in defining a stagnation-detection version of the GSEMO that effectively solves the OneJumpZeroJump problem, more precisely, that computes the full Pareto front in an expected runtime of O((n − 2k + 3)(en/k)^k), again a factor-k^{Ω(k)} improvement over the classic GSEMO and reducing the runtime guarantee for the heavy-tailed GSEMO by a small factor of order k^{β−0.5}; see Theorem 25.
Via a small set of experiments, we demonstrate that these are not only asymptotic differences. Compared to the standard GSEMO, we observe roughly a factor-5 speed-up with heavy-tailed mutation and a factor-10 speed-up with our stagnation-detection GSEMO, and this already for jump size k = 4 and problem sizes n between 10 and 50.
We note that this work is an extended version of the 7-page (excluding acknowledgments and references) preliminary paper (Doerr and Zheng, 2021) and differs from it in the following ways. We estimate the leading constant of the asymptotic upper bound on the runtime of the GSEMO with heavy-tailed mutation for not too large jump sizes, which was not shown in our preliminary version and had not been shown before in theoretical works on heavy-tailed mutation. Besides, this version adds a formal discussion of multimodality in multiobjective problems in Section 3, and it contains all mathematical proofs that had to be omitted in Doerr and Zheng (2021) for reasons of space.
The remainder of this paper is organized as follows. Section 2 introduces the basic definitions that will be used throughout this work. In Section 3, we discuss the relevant previous works. The OneJumpZeroJump function class is defined in Section 4. Our mathematical runtime analyses are presented in Sections 5 to 8. Section 9 shows our experimental results. A conclusion is given in Section 10.
2 Basic Definitions
A multiobjective optimization problem consists of optimizing multiple objectives simultaneously. In this paper, we restrict ourselves to the maximization of bi-objective pseudo-Boolean problems f = (f_1, f_2): {0,1}^n → ℝ². We note that only sporadic theoretical works exist that discuss the optimization of more than two objectives (Laumanns et al., 2004b; Bian et al., 2018; Huang et al., 2021).
We briefly review the standard notation in multiobjective optimization. For any two search points x, y ∈ {0,1}^n, we say that
x weakly dominates y, denoted by x ⪰ y, if and only if f_1(x) ≥ f_1(y) and f_2(x) ≥ f_2(y);
x dominates y, denoted by x ≻ y, if and only if f_1(x) ≥ f_1(y) and f_2(x) ≥ f_2(y) and at least one of the inequalities is strict.
We call x Pareto optimal if and only if there is no y ∈ {0,1}^n such that y ≻ x. All Pareto optimal solutions form the Pareto set of the problem. The set of the function values of the Pareto set is called the Pareto front.
For most multiobjective problems, the objectives are at least partially conflicting and thus there is usually not a single Pareto optimum. Since a priori it is not clear which of several incomparable Pareto optima to prefer, the most common target is to compute the Pareto front, that is, to compute a set P of solutions such that f(P) := {f(x) | x ∈ P} equals the Pareto front. This is our objective in this work as well. We note that if the Pareto front is excessively large, then one has to resort to approximating it in a suitable manner, but this will not be our problem here.
We will use |x|_1 and |x|_0 to denote the number of ones and the number of zeros of the search point x ∈ {0,1}^n. We use [a..b] to denote the set of integers {a, a + 1, …, b} for a, b ∈ ℤ with a ≤ b.
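To make these relations concrete, the following minimal Python sketch (our own illustration, not code from the paper) implements the two dominance predicates for bi-objective maximization; it is reused in the later sketches.

```python
from typing import Tuple

Value = Tuple[int, int]  # a bi-objective function value (f_1(x), f_2(x))

def weakly_dominates(u: Value, v: Value) -> bool:
    """u weakly dominates v iff u is at least as good in both objectives."""
    return u[0] >= v[0] and u[1] >= v[1]

def dominates(u: Value, v: Value) -> bool:
    """u dominates v iff u weakly dominates v and is strictly better somewhere."""
    return weakly_dominates(u, v) and u != v
```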
3 Previous Works and a Discussion of Multimodality in Multiobjective Problems
Since this work conducts a mathematical runtime analysis of an evolutionary algorithm for discrete search spaces, we primarily discuss the state of the art inside this subarea of evolutionary computation. The mathematical runtime analysis as a rigorous means to understand evolutionary algorithms was started in the 1990s with works like Mühlenbein (1992), Bäck (1993), and Rudolph (1997). The first analyses of MOEAs were conducted by Laumanns et al. (2002, 2004b), Giel (2003), and Thierens (2003). The theory of MOEAs here often followed the successful examples given by single-objective works. For example, the multiobjective benchmarks COCZ (Laumanns et al., 2004b) and OneMinMax (Giel and Lehre, 2010) are composed of two contradicting copies of the classic OneMax problem, and the problems LOTZ (Laumanns et al., 2004b) and WLPTNO (Qian et al., 2013) follow the main idea of the LeadingOnes problem (Rudolph, 1997).
Due to the higher complexity of analyzing MOEAs, many topics with interesting results on single-objective EAs received almost no attention for MOEAs. For example, the first runtime analysis of a single-objective algorithm using crossover was conducted by Jansen and Wegener (2001) and was followed up by a long sequence of important works. In contrast, only sporadic works discuss crossover in MOEAs (Neumann and Theile, 2010; Qian et al., 2013; Huang et al., 2019; Bian and Qian, 2022). Dynamic parameter settings for single-objective EAs were first discussed by Jansen and Wegener (2005) and then received a huge amount of attention (see, e.g., Doerr and Doerr, 2020), whereas the only such result for MOEAs appears to be the one by Doerr et al. (2022). How single-objective evolutionary algorithms cope with noise was discussed already by Droste (2004), but despite numerous follow-up works in single-objective optimization, there is only a single mathematical runtime analysis of an MOEA in the presence of noise (Gutjahr, 2012).
In this work, we regard another topic that is fairly well understood in the evolutionary theory community on the single-objective side, but where essentially nothing is known on the multiobjective side. It is well known in general that local optima are challenging for evolutionary algorithms. Already one of the first runtime analysis works, the ground-breaking paper by Droste et al. (2002), proposes a multimodal (that is, having local optima different from the global optimum) benchmark of scalable difficulty. The jump function with difficulty parameter k, later called gap size, resembles the easy OneMax function except that it has a valley of very low fitness around the optimum. This valley consists of the first k − 1 Hamming levels around the optimum; the search points in distance k from the optimum form a local optimum that is easy to reach, but hard to leave. Mutation-based algorithms need to flip the k right bits to go from any point on the local optimum to a search point of better fitness (which here is the global optimum). While other multimodal benchmarks have been proposed and have received some attention, most notably trap functions (Jägersküpper and Storch, 2007) and the DeceptiveLeadingBlocks problem (Lehre and Nguyen, 2019), it is safe to say that the Jump benchmark is by far the most studied one and that these studies have led to fundamental insights on how evolutionary algorithms cope with local optima.
For reasons of space, we only discuss some of the most significant works on jump functions. The first work (Droste et al., 2002) determined the runtime of the (1+1) EA on Jump functions with gap size k to be Θ(n^k), showing that local optima can lead to substantial performance losses. This bound was tightened and extended to arbitrary mutation rates by Doerr et al. (2017). This analysis led to the discovery of a heavy-tailed mutation operator and, more generally, established the use of power-law distributed parameters, an idea that led to many interesting results subsequently (e.g., Antipov et al., 2021; Corus et al., 2021; Antipov, Buzdalov, et al., 2022; Dang et al., 2022). Again, an analysis of how single-trajectory heuristics optimize jump functions led to the definition of a stagnation-detection mechanism, which in a very natural manner leads to a reduction of the time an EA is stuck on a local optimum (Rajabi and Witt, 2022). Jump functions are among the first and most convincing examples that show that crossover can lead to significant speed-ups over only using mutation as variation operator (Jansen and Wegener, 2002; Dang et al., 2018; Antipov, Doerr, et al., 2022). That estimation-of-distribution algorithms and ant-colony optimizers may find it easier to cope with local optima was also shown via jump functions (Hasenöhrl and Sutton, 2018; Doerr, 2021; Benbaki et al., 2021; Witt, 2023).
In contrast to this intensive discussion of multimodal functions in single-objective evolutionary optimization, almost no such results exist in multiobjective optimization. Before discussing this aspect in detail, let us agree (for this work) on the following terminology. We say that a function f: {0,1}^n → ℝ is unimodal if it has a unique maximum and if all other search points have a Hamming neighbor with strictly larger fitness. We say that f is multimodal if it has local optima different from the global optimum. Here, we call a local optimum a set L ⊆ {0,1}^n of search points such that f(x) = f(y) for all x, y ∈ L and such that all neighbors of L, that is, all x ∉ L having a Hamming neighbor in L, have a smaller f-value. Note that a global optimum is also a local optimum according to this definition. With these definitions, unimodal functions are those where a hillclimber can reach the optimum via one-bit flips regardless of the initial solution. In contrast, on multimodal functions a single-trajectory heuristic definitely needs to flip more than one bit (when started in a suitable solution) or to accept truly inferior solutions. With our definition, plateaus of constant fitness render a function not unimodal, but they do not imply multimodality. This is convenient for our purposes as we shall not deal with such plateaus, for the simple reason that they behave again differently from unimodal functions and true local optima. We note that the role of plateaus in multiobjective evolutionary optimization has been extensively studied, among others, in Brockhoff et al. (2007), Friedrich et al. (2010, 2011), Qian et al. (2016), and Li et al. (2016).
We extend the definitions of unimodality and multimodality to multiobjective problems in the natural way: A multiobjective problem is unimodal if all its objectives are unimodal functions. Many such problems are easy, and even simple MOEAs employing one-bit flips as only variation operator—for example, the SEMO—can find the full Pareto front. Examples of such problems include the well-studied benchmarks inspired by the unimodal single-objective benchmarks OneMax and LeadingOnes. However, somewhat surprisingly, it is not true that the SEMO can cover the full Pareto front of all unimodal multiobjective problems as we now show.
There exists a unimodal multiobjective problem such that, with positive probability, the SEMO does not cover its full Pareto front within any finite number of iterations.
Table 1: A possible run of the SEMO (each individual shown together with its objective value).

| Generation | Population |
|---|---|
| 0 | (100, (4,0)) |
| 1 | (100, (4,0)), (000, (0,1)) |
| 2 | (100, (4,0)), (010, (0,2)) |
| 3 | (101, (5,0)), (010, (0,2)) |
| 4 | (101, (5,0)), (011, (1,5)) |
We now show that once the optimization process has reached the state described in the last line of Table 1, the Pareto front point (3, 3) can no longer be reached. Note that 110 is the only Pareto optimum with value (3, 3), and its Hamming neighbors are 010, 100, and 111. Since f(011) = (1, 5) strictly dominates f(010) = (0, 2) as well as the value of 111, and since f(101) = (5, 0) strictly dominates f(100) = (4, 0), we know that 010, 111, and 100, even if generated, cannot survive to the next generation when the current population is {101, 011}. Hence a parent differing from 110 in one bit can never be selected, and therefore 110 cannot be generated (and (3, 3) cannot be reached) via the SEMO with the one-bit mutation.
We say that a multiobjective problem is multimodal if at least one objective is multimodal. We note that this does not automatically imply that the SEMO cannot find the full Pareto front; in fact, as the following Lemma 2 and its proof show, there are multiobjective problems consisting only of multimodal objectives such that the SEMO, regardless of the initial solution, computes the full Pareto front very efficiently. Clearly, our interest in this work lies in multimodal multiobjective problems that are harder to optimize. To avoid misunderstandings, we note that a second notion of multimodality exists. In multimodal optimization, the target is to find all or a diverse set of solutions having some optimality property (Preuss, 2015; Liang et al., 2016; Tanabe and Ishibuchi, 2020). This notion is unrelated to our notion of multimodality.
There is a bi-objective problem f = (f_1, f_2) such that both objectives f_1 and f_2 have several local optima, but the SEMO algorithm, regardless of its initial solution, finds the full Pareto front in expected time O(n² log n).
We note that any solution is a Pareto optimum of f. Indeed, f_1(x) + f_2(x) takes the same value for all x ∈ {0,1}^n. Consequently, if f_1(y) > f_1(x) for some y, then necessarily f_2(y) < f_2(x), and thus y does not dominate x.
Since all solutions are Pareto optima and since f induces the same classes of search points of equal fitness as OneMinMax, the optimization process of the SEMO on f is identical to the one on OneMinMax. With Giel and Lehre's (2010) Theorem 3 and its proof, our claim is proven.
Having made precise what we understand as multimodal multiobjective problem, we review the known results on such problems.
To the best of our knowledge, only very few runtime analysis works on multimodal multiobjective problems exist, and none of them are focused on the possible difficulties arising from the multimodality. Roughly speaking, these can be divided into works that construct artificial problems to demonstrate, often in an extreme way, a particular strength or weakness of an algorithm, and works on classic combinatorial optimization problems that happen to be multimodal. The first set of works gave rise to the multimodal problems ZPLG, SPG, and Dec-obj-MOP; the first two of these were proposed by Qian et al. (2016).
In order to investigate the effect of mixing low-level heuristics, Qian et al. (2016) showed that mixing fair selection with respect to the decision space and the objective space is beneficial for ZPLG, and that mixing 1-bit and 2-bit mutation is efficient for SPG.
In the first theoretical study of decomposition-based MOEAs, Li et al. (2016) analyzed how the MOEA/D solves several multiobjective problems, among them the multimodal Dec-obj-MOP.
We note that the three problems ZPLG, SPG, and Dec-obj-MOP all appear slightly artificial. Also, each of them has only one multimodal objective, and this objective has only a small number of local optima on a few specific points or subspaces.
The second type of works dealing with multimodal multiobjective problems are those which regard combinatorial optimization problems (e.g., Laumanns et al., 2004a; Kumar and Banerjee, 2006; Neumann, 2007; Neumann and Reichel, 2008; Greiner, 2009; Horoba, 2009; Qian et al., 2013, 2015, 2018; Feng et al., 2019; Qian et al., 2020; Roostapour et al., 2020, 2022). Combinatorial optimization problems almost always are multimodal, simply because a simple cardinality constraint suffices to render a problem multimodal. We note that for some such problems the multimodality is not very strong; for example, for the minimum spanning tree problem, flipping two bits suffices to leave a local optimum. Overall, all of these works are relatively problem-specific, and we could not distill from them any general insights on how MOEAs cope with local optima.
4 The OneJumpZeroJump Problem
To study via mathematical means how MOEAs cope with multimodality, we now define and analyze a class of bi-objective functions of scalable difficulty. As mentioned above, this is strongly influenced by the single-objective Jump function class proposed in Droste et al. (2002), which is intensively used in the theory of single-objective evolutionary computation and which gave rise to many interesting results, including that larger mutation rates help in the optimization of multimodal functions (Doerr et al., 2017), that crossover can help to cope with multimodality (Jansen and Wegener, 2002; Dang et al., 2018), and that estimation-of-distribution algorithms and the (1 + (λ,λ)) GA can significantly outperform classic evolutionary algorithms on multimodal problems (Hasenöhrl and Sutton, 2018; Doerr, 2021; Antipov and Doerr, 2020; Antipov, Doerr, et al., 2022).
We recall that for all n ∈ ℕ and k ∈ [1..n], the jump function Jump_{n,k} with problem size n and gap size k is defined by Jump_{n,k}(x) = k + |x|_1 if |x|_1 ∈ [0..n−k] ∪ {n}, and Jump_{n,k}(x) = n − |x|_1 otherwise. Hence, for k ≥ 2, this function has a valley of low fitness around its optimum 1^n = (1, …, 1), which can be crossed only by flipping the k missing bits in one step (or by accepting solutions with very low fitness). We define the OneJumpZeroJump_{n,k} function f = (f_1, f_2) as a bi-objective counterpart of the Jump_{n,k} function: the first objective f_1(x) = Jump_{n,k}(x) is the jump function with respect to the number of ones, and the second objective is defined analogously with respect to the number of zeros, that is, f_2(x) = k + |x|_0 if |x|_0 ∈ [0..n−k] ∪ {n}, and f_2(x) = n − |x|_0 otherwise.
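The following Python sketch (ours, for illustration; search points are tuples over {0, 1}) evaluates the jump value and the resulting OneJumpZeroJump value according to the definition above.

```python
def jump_value(m: int, n: int, k: int) -> int:
    """Jump_{n,k} as a function of the number m of 'good' bits."""
    return k + m if (m <= n - k or m == n) else n - m

def one_jump_zero_jump(x, k: int):
    """f(x) = (jump w.r.t. the number of ones, jump w.r.t. the number of zeros)."""
    n, ones = len(x), sum(x)
    return (jump_value(ones, n, k), jump_value(n - ones, n, k))
```

For example, for n = 6 and k = 2, the all-ones string evaluates to (8, 2) and any string with three ones to (5, 5).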
The Pareto set of the OneJumpZeroJump_{n,k} function is S* = {x ∈ {0,1}^n : |x|_1 ∈ [k..n−k] ∪ {0, n}}, and the Pareto front is F* = {(a, 2k + n − a) : a ∈ [2k..n] ∪ {k, n + k}}.
We firstly prove that for any given x ∈ S*, there is no y ∈ {0,1}^n with y ≻ x. Suppose there were such a y. Then f_1(y) ≥ f_1(x) and f_2(y) ≥ f_2(x), and at least one of the two inequalities is strict. Without loss of generality, let f_1(y) > f_1(x). From x ∈ S*, we have f_1(x) ≥ k, hence f_1(y) > k. Since f_1(y) = n − |y|_1 ≤ k − 1 for all y with |y|_1 ∈ [n−k+1..n−1], we know that |y|_1 ∈ [0..n−k] ∪ {n}.

If |y|_1 ∈ [k..n−k] ∪ {0, n}, then y ∈ S*, and f_1(z) + f_2(z) = n + 2k for all z ∈ S*. Hence f_1(y) > f_1(x) implies f_2(y) < f_2(x).

If |y|_1 ∈ [1..k−1], then f_2(y) = n − |y|_0 = |y|_1 ≤ k − 1 < k ≤ f_2(x), since f_2 takes a value of at least k on S*.

That is, in both cases we obtain f_2(y) < f_2(x), which is contradictory to f_2(y) ≥ f_2(x). Moreover, any x ∉ S* with |x|_1 ∈ [1..k−1] satisfies f(x) = (k + |x|_1, |x|_1) and is hence strictly dominated by 1^n with f(1^n) = (n + k, k); symmetrically, any x with |x|_1 ∈ [n−k+1..n−1] is strictly dominated by 0^n. Hence, S* is the Pareto set.

We secondly prove that for any given value v = (a, 2k + n − a) with a ∈ [2k..n] ∪ {k, n + k}, there is an x ∈ S* such that f(x) = v.

If a ∈ [2k..n], we could take x with |x|_1 = a − k ∈ [k..n−k], and have f_1(x) = k + |x|_1 = a and f_2(x) = k + |x|_0 = 2k + n − a. Then f(x) = v.

If a = k, we could take x = 0^n and have f(x) = (k, n + k) = v; if a = n + k, we could take x = 1^n and have f(x) = (n + k, k) = v.

Hence, F* is the Pareto front and this theorem is proven.
Theorem 7 implies that for k > n/2, the Pareto front consists only of the two extremal points, stemming from the two solutions 0^n and 1^n. This appears to be a relatively special situation that is not very representative for multimodal multiobjective problems. For this reason, in the remainder, we shall only regard the case that k ≤ n/2. In this case, again from Theorem 7, we easily obtain in the following corollary a general upper bound on the size of any set of solutions without pair-wise weak domination, and thus also on the size of the population in the algorithms discussed in this work.
Let k ≤ n/2. Consider any set P ⊆ {0,1}^n of solutions such that x does not weakly dominate y with respect to OneJumpZeroJump_{n,k} for all x, y ∈ P with x ≠ y. Then |P| ≤ n − 2k + 3.
We first note that any non-extremal solution that lies on the Pareto front, that is, any solution with number of ones in [k..n−k], dominates any solution not on the Pareto front. Therefore, if P contains such a solution, then P is a subset of the Pareto set, and Theorem 7 immediately shows our claim. Hence, assume that P contains no non-extremal point of the Pareto front. Let x, y ∈ P with x ≠ y and |x|_1, |y|_1 ∈ [1..k−1]. Assume without loss of generality that |x|_1 < |y|_1. Then f(x) = (k + |x|_1, |x|_1) and f(y) = (k + |y|_1, |y|_1), so y ≻ x, a contradiction. Hence P contains at most one solution with number of ones in [1..k−1]. A symmetric argument shows that P contains at most one solution with number of ones in [n−k+1..n−1]. Note that 1^n dominates any x with |x|_1 ∈ [1..k−1] and that 0^n dominates any x with |x|_1 ∈ [n−k+1..n−1], so an extremal Pareto optimum in P excludes the corresponding type of solution. Hence in this case |P| ≤ 2 ≤ n − 2k + 3.
5 The SEMO Cannot Optimize OneJumpZeroJump Functions
The simple evolutionary multiobjective optimizer (SEMO), proposed by Laumanns et al. (2002), is a well-studied basic benchmark algorithm in multiobjective evolutionary theory (Qian et al., 2013; Li et al., 2016). It is a multiobjective analogue of the randomized local search (RLS) algorithm, which starts with a random individual and tries to improve it by repeatedly flipping a single random bit and accepting the better of parent and offspring. As a multiobjective algorithm trying to compute the full Pareto front, the SEMO naturally has to work with a non-trivial population. This is initialized with a single random individual. In each iteration, a random parent is chosen from the population and generates an offspring by flipping a single random bit. The offspring enters the population if it is not weakly dominated by some individual already in the population. In this case, any individual dominated by it is removed from the population. The details of the SEMO are shown in Algorithm 1. We note that more recent works such as Friedrich et al. (2010) define the (G)SEMO minimally differently, namely so that in the case of equal objective values the offspring enters the population and replaces the existing individual with this objective value. Preferring the offspring in the case of equal objective values allows EAs to traverse plateaus of equal fitness, and this is generally preferred over keeping the first solution with a certain objective value. For our purposes, this difference is not important since all points with equal fitness behave identically.
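For concreteness, here is a compact Python sketch of this main loop (our rendering of the variant of Algorithm 1 that rejects indifferent offspring; it reuses the dominance predicates sketched in Section 2).

```python
import random

def semo(f, n: int, iterations: int):
    """SEMO sketch: population stored as a map from objective value to search point."""
    x = tuple(random.randint(0, 1) for _ in range(n))
    pop = {f(x): x}
    for _ in range(iterations):
        parent = random.choice(list(pop.values()))
        i = random.randrange(n)  # flip a single uniformly chosen bit
        y = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        fy = f(y)
        if not any(weakly_dominates(fv, fy) for fv in pop):
            # offspring not weakly dominated: remove dominated individuals, insert it
            pop = {fv: z for fv, z in pop.items() if not dominates(fy, fv)}
            pop[fy] = y
    return pop
```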
In the following Theorem 9, we show that the SEMO cannot cope with the multimodality of the OneJumpZeroJump function. Even with infinite time, it does not find the full Pareto front of any OneJumpZeroJump function. We note that this result is not very surprising and we prove it rather for reasons of completeness. It is well known in single-objective optimization that the randomized local search heuristic with positive (and often very high) probability fails on multimodal problems (the only published reference we found is Doerr et al., 2008, Theorem 9, but surely this was known before). It is for the same reason that the SEMO algorithm cannot optimize the OneJumpZeroJump function.
For all n and all k ∈ [2..n/2], the SEMO cannot optimize the OneJumpZeroJump_{n,k} function, that is, with probability one it never covers the full Pareto front.
Note that any individual x with |x|_1 ∈ [k..n−k] ones has the function value f(x) = (k + |x|_1, k + |x|_0) ≥ (2k, 2k). Since a search point x with |x|_1 ∈ [1..k−1] ones has the function value (k + |x|_1, |x|_1) and a search point with |x|_0 ∈ [1..k−1] zeros has the function value (|x|_0, k + |x|_0), we know that any newly generated search point with number of ones in [k..n−k] will lead to the removal from the current population of all individuals with number of zeros or ones in [1..k−1] and will also prevent any such individual from entering the population in the future.
Without loss of generality suppose that the initial individual has at most n/2 ones. We show that 1^n can never enter the population. Assume first that never in this run of the SEMO is a search point with exactly n − k ones generated. Since each offspring has a Hamming distance of one from its parent, the statement "Never is an individual with n − k or more ones generated" holds in each generation. In particular, the search point 1^n is never generated.
Now, assume that at some time for the first time an individual x with exactly n − k ones is generated. As in the previous paragraph, up to this point no search point with more than n − k ones has been generated. In particular, the point 1^n has not been generated so far. Since f(x) = (n, 2k) lies on the Pareto front, x enters the population. From this point on, by our initial considerations, no individual with n − k + 1 to n − 1 ones can enter the population. In particular, at no time can a parent with n − 1 ones be selected, which is a necessary precondition for generating the point 1^n. Consequently, the point 1^n will never be generated.
6 Runtime Analysis for the GSEMO
As we have shown in the previous section, to deal with the OneJumpZeroJump benchmark, a mutation-based algorithm needs to be able to flip more than one bit. The global SEMO (GSEMO) proposed by Giel (2003) is a well-analyzed MOEA that has this ability. Generalizing the (1+1) EA, it uses standard bit-wise mutation; that is, each bit is flipped independently with the same probability of, usually, 1/n. The details are shown in Algorithm 2.
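The only difference to the SEMO sketch above is the mutation step; a sketch of standard bit-wise mutation (again our illustration):

```python
def standard_bit_mutation(x, rate=None):
    """Flip each bit independently; the default rate is 1/n."""
    n = len(x)
    p = 1.0 / n if rate is None else rate
    return tuple(b ^ (random.random() < p) for b in x)
```

Replacing the one-bit flip in the SEMO loop by this operator yields the GSEMO; in particular, a prescribed set of k bits is flipped with probability n^{−k}(1 − 1/n)^{n−k} > 0, which is exactly what the OneJumpZeroJump benchmark requires.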
Theorem 12 will show that the GSEMO can find the Pareto front. To make the main proof clearer, some propositions are extracted as the following lemmas. Lemma 10 shows that once a solution in the inner part of the Pareto front is part of the population, it takes O(n(n − 2k + 3)²) iterations in expectation until the whole inner part of the Pareto front is covered.
Consider optimizing the OneJumpZeroJump_{n,k} function via the GSEMO. Let k ≤ n/2. Assume that at some time the population contains an individual x with |x|_1 ∈ [k..n−k]. Then the expected number of iterations until all (a, 2k + n − a), a ∈ [2k..n], are covered by the population is at most O(n(n − 2k + 3)²).
From a population covering the whole inner part of the Pareto front, the two missing extremal points of the front can be found in roughly (3/2)e(n − 2k + 3)n^k iterations, as the following Lemma 11 shows.
Consider the GSEMO optimizing the OneJumpZeroJump_{n,k} function. Assume that at some time, all (a, 2k + n − a), a ∈ [2k..n], are covered by the current population. Then, the two missing points (k, n + k) and (n + k, k) of the Pareto front will be found in an expected number of at most (3/2 + o(1))e(n − 2k + 3)n^k additional generations.
Now, we establish our main result for the GSEMO.
Then from Lemma 10, we know that all (a, 2k + n − a), a ∈ [2k..n], in the Pareto front will be found in an expected number of at most O(n(n − 2k + 3)²) iterations. After that, from Lemma 11, the points (k, n + k) and (n + k, k) in the Pareto front will be found in an expected number of at most (3/2 + o(1))e(n − 2k + 3)n^k additional iterations. Hence, the expected number of iterations to find the whole Pareto front is at most O((n − 2k + 3)n^k).
Finally, we show that this bound is very tight. For convenience, we consider only the case k ∈ [4..n/2 − 1]. In this case, it is sufficiently hard to generate an extremal point of the Pareto front from the inner part, so that we do not need a careful analysis of the population dynamics in the short initial phase in which the inner part of the Pareto front is detected, which is not the main theme of this work. That said, we are very optimistic that it would pose no greater difficulties to show that with high probability, at the first time when a solution with fewer than k or more than n − k ones enters the population, the population already contains Θ(n − 2k) individuals. Therefore, from this point on, only every Θ(n − 2k) iterations an outer-most point will be chosen as parent, and this should suffice to prove an Ω((n − 2k)n^k) lower bound also for k = 2 and k = 3.
We first prove that with high probability the inner region of the Pareto front will be fully covered before 0^n or 1^n is generated and then use this to bound the time until the two extremal points of the Pareto front are found.
7 GSEMO with Heavy-Tailed Mutation
In the previous section, we have shown that the GSEMO can optimize our multimodal optimization problem, but similar to the single-objective world (say, the optimization of Jump functions via simple evolutionary algorithms, Droste et al., 2002), the runtime increases significantly with the distance that a solution on the Pareto front can have from all other solutions on the front. As has been discussed in Doerr et al. (2017), increasing the mutation rate can improve the time it takes to jump over such gaps. However, this work also showed that a deviation from the optimal mutation rate can be costly: A small constant-factor deviation from the optimal rate leads to a performance loss exponential in k. For this reason, a heavy-tailed mutation operator was proposed. Compared to using the optimal (usually unknown) rate, it loses only a small polynomial factor (in k) in performance.
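To recall the underlying calculation (a standard estimate, reproduced here for the reader's convenience): if the parent is at Hamming distance k from the target and standard bit-wise mutation with rate r/n is used, then

```latex
\Pr[\text{jump}]
  = \left(\frac{r}{n}\right)^{k}\left(1-\frac{r}{n}\right)^{n-k}
  \approx \left(\frac{r}{n}\right)^{k} e^{-r},
```

which is maximized at r = k; already a rate of r = (1 ± ε)k loses a factor of e^{Θ(ε²k)}, that is, a factor exponential in k.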
We now equip the GSEMO with the heavy-tailed mutation operator from Doerr et al. (2017) and observe similar advantages. We start by introducing the heavy-tailed mutation operator and giving a first non-asymptotic estimate for the probabilities to flip a certain number of bits.
7.1 Heavy-Tailed Mutation Operator
The following capped power-law distribution will be used in the definition of the heavy-tailed mutation operator.
The heavy-tailed mutation operator proposed by Doerr et al. (2017), in the remainder denoted by mut_β, in each application independently samples a number α ∈ [1..n/2] from the power-law distribution with exponent β and then uses standard bit-wise mutation with mutation rate α/n, that is, flips each bit independently with probability α/n. Doerr et al. (2017) show the two following properties of mut_β.
In order to obtain an understanding of the leading constants when generating a solution at a not too large Hamming distance, we prove the following estimates, which have not been shown before and which can also be applied to previous theoretical analyses of heavy-tailed mutation. We suspect that tighter estimates of the leading constants exist, but leave this as interesting future work.
The proof of Lemma 16 will use an elementary estimate for powers of the type (1 − r/m)^m. Since we have not found a good reference for it and since we believe that it might be useful for other runtime analyses, we formulate it as a separate lemma and give a formal proof.
Let m ∈ ℕ and r ∈ [0, m]. Then (1 − r/m)^m ≥ e^{−r}(1 − r²/m).
Now we prove Lemma 16.
Equipping the standard GSEMO with this mutation operator gives Algorithm 3, which we call GSEMO-HTM.
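Since Algorithm 3 is not reproduced here, the following Python sketch shows the operator as we understand it from Doerr et al. (2017) (the truncation at n/2 and the normalization are as described above; function names are ours):

```python
def sample_power_law(upper: int, beta: float) -> int:
    """Sample alpha in [1..upper] with P[alpha = i] proportional to i**(-beta)."""
    weights = [i ** (-beta) for i in range(1, upper + 1)]
    return random.choices(range(1, upper + 1), weights=weights)[0]

def heavy_tailed_mutation(x, beta: float = 1.5):
    """mut_beta: sample a strength alpha, then mutate with rate alpha/n."""
    n = len(x)
    alpha = sample_power_law(n // 2, beta)
    return standard_bit_mutation(x, rate=alpha / n)
```

Plugging this operator into the GSEMO loop sketched in Section 5 yields the GSEMO-HTM.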
7.2 GSEMO-HTM on OneJumpZeroJump
We now analyze the runtime of the GSEMO-HTM on OneJumpZeroJump. We start with an estimate for the time it takes to find the inner part of the Pareto front.
Consider the GSEMO-HTM optimizing the OneJumpZeroJump_{n,k} function. Let k ≤ n/2. If in a certain iteration, some point (a, 2k + n − a) with a ∈ [2k..n] is covered by the current population, then all (a, 2k + n − a), a ∈ [2k..n], in the Pareto front will be found in an expected number of at most O(n(n − 2k + 3)²) iterations, where the implicit constant depends on β.
We now estimate the time to find the two extreme Pareto front points after the coverage of the inner part of the Pareto front.
Consider the GSEMO-HTM optimizing the OneJumpZeroJump_{n,k} function. Assume that at some time, all (a, 2k + n − a), a ∈ [2k..n], are covered by the current population. Then, the two missing points (k, n + k) and (n + k, k) of the Pareto front will be found in an expected number of at most O((n − 2k + 3)k^{β−0.5}(en/k)^k) additional generations.
Now, we are ready to show the runtime of the GSEMO-HTM on OneJumpZeroJump.
The expected runtime of the GSEMO-HTM optimizing the OneJumpZeroJump_{n,k} function is at most O((n − 2k + 3)k^{β−0.5}(en/k)^k).
Then from Lemma 18, we know that all (a, 2k + n − a) for a ∈ [2k..n] in the Pareto front will be found in at most O(n(n − 2k + 3)²) iterations in expectation. After that, from Lemma 19, the points (k, n + k) and (n + k, k) enter the population in at most O((n − 2k + 3)k^{β−0.5}(en/k)^k) iterations in expectation.
Let β > 1 and k = o(√n). Then the expected runtime of the GSEMO-HTM optimizing the OneJumpZeroJump_{n,k} function is at most (3/2 + o(1)) C_β k^{β−0.5}(n − 2k + 3)(en/k)^k, where C_β is an explicit constant depending only on β.
8 GSEMO with Stagnation Detection
In this section, we discuss the non-trivial question of how to adapt to multiobjective optimization the stagnation-detection strategy proposed by Rajabi and Witt (2022), which increases the mutation rate with the time for which no improvement has been found. We define a stagnation-detection variant of the GSEMO that has a slightly better asymptotic performance on OneJumpZeroJump than the GSEMO with heavy-tailed mutation. In contrast to this positive result on OneJumpZeroJump, we speculate that this algorithm may have difficulties with plateaus of constant fitness.
8.1 The Stagnation-Detection Strategy of Rajabi and Witt
Rajabi and Witt (2022) proposed the following strategy to adjust the mutation rate during the run of the (1+1) EA. We recall that the (1+1) EA is a very elementary EA working with a parent and offspring population size of one; that is, it generates in each iteration one offspring from the unique parent via mutation and accepts it if it is at least as good as the parent. The classic mutation operator for this algorithm is standard bit-wise mutation with mutation rate 1/n, that is, the offspring is generated by flipping each bit of the parent independently with probability 1/n.
The main idea of the stagnation-detection approach is as follows. Assume that the (1+1) EA for a longer time, say at least en ln n iterations, has not accepted any new solution. Then, with high probability, it has generated (and rejected) all Hamming neighbors of the parent. Consequently, there is no use in generating these solutions again, and the algorithm should better concentrate on solutions further away from the parent. This can be achieved by increasing the mutation rate. For example, with a mutation rate of 2/n, the rate at which Hamming neighbors are produced is already significantly reduced; however, Hamming neighbors can still be generated, which is important in case we were unlucky so far and missed one of them.
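The following sketch captures this schedule in the single-objective setting (our paraphrase; the concrete threshold formula follows the spirit of Rajabi and Witt (2022) but is not claimed to be their exact parameterization): a failure counter is kept, and once it exceeds the threshold for the current strength r, the strength is increased.

```python
import math

def sd_threshold(n: int, r: int, R: float) -> float:
    """Number of iterations after which, with high probability, all points in
    Hamming distance r have been sampled at rate r/n; R controls the error
    probability. Illustrative choice in the spirit of Rajabi and Witt (2022)."""
    return (math.e * n / r) ** r * math.log(n * R)

# Inside the (1+1) EA loop (sketch):
#   on failure:      counter += 1
#   if counter > sd_threshold(n, r, R):  r, counter = min(r + 1, n // 2), 0
#   on improvement:  r, counter = 1, 0
```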
Rajabi and Witt (2022) showed that the (1+1) EA with this strategy optimizes Jump_{n,k} with k = o(n) in time O((en/k)^k), together with a corresponding lower bound. In particular, a bound of Θ((en/k)^k), tight apart from constant factors independent of n and k, is obtained. This is faster than the runtime of Θ(k^{β−0.5}(en/k)^k) proven by Doerr et al. (2017) for the (1+1) EA with heavy-tailed mutation with power-law exponent β by a factor of Θ(k^{β−0.5}). For the recommended choice β = 1.5, this factor is Θ(k).
8.2 Adaptation of the Stagnation-Detection Strategy to Multiobjective Optimization
The (1+1) EA being an algorithm without a real population, it is clear that certain adaptations are necessary to use the stagnation-detection strategy in multiobjective optimization.
8.2.1 Global or Individual Failure Counters
The first question is how to count the number of unsuccessful iterations. The following two obvious alternatives exist.
Individual counters: From the basic idea of the stagnation-detection strategy, the most natural solution is to equip each individual with its own counter. Whenever an individual is chosen as parent in the GSEMO, its counter is increased by one. New solutions (but see the following subsection for an important technicality of what “new” shall mean) entering the population (as well as the random initial individual) start with a counter value of zero.
A global counter: Algorithmically simpler is the approach to use only one global counter. This counter is increased in each iteration. When a new solution enters the population, the global counter is reset to zero.
We suspect that for many problems, both ways of counting give similar results. The global counter appears to be wasteful in the sense that when a new individual enters the population, parents that have been contained in the population for a long time also re-start generating offspring with mutation rate 1/n despite the fact that they have, with very high probability, already generated as offspring all solutions close by. On the other hand, often these "old individuals" do not generate solutions that enter the population anyway, so that an optimized choice of the mutation rate is less important.
For the OneJumpZeroJump problem, it is quite clear that this second effect is dominant. A typical run starts with some individual in the inner region of the Pareto front. In a relatively short time, the whole middle region is covered, and for this it suffices that relatively recent solutions generate a suitable Hamming neighbor as offspring. The runtime is dominated by the time to find the two extremal solutions and this will almost always happen from the closest parent in the middle region of the front (regardless of whether individual counters or a global counter is used). For this reason, we analyze in the following the simpler approach using a global counter.
8.2.2 Dealing with Indifferent Solutions
One question that becomes critical when using stagnation detection is how to deal with indifferent solutions, that is, which solution to put or keep in the population in the case that an offspring y has the same (multiobjective) fitness as an individual x already in the population. Since f(x) = f(y), both solutions do an equally good job in dominating others and thus in approximating the Pareto front. In early works (e.g., Laumanns et al., 2002) proposing the SEMO algorithm, such later-generated indifferent solutions do not enter the population. This is partially justified by the fact that in many of the problems regarded in these works, search points with equal fitness are fully equivalent for the future run of the algorithm. We note that our problem also has this property, so all results presented so far are valid regardless of how indifferent solutions are treated.
When non-equivalent search points with equal fitness exist, it is less obvious how to deal with indifferent solutions. In particular, it is clear that larger plateaus of constant fitness can be traversed much easier when a new indifferent solution is accepted, as this allows imitation of a random walk behavior on the plateau (Brockhoff et al., 2007). For that reason, and in analogy to single-objective optimization (Jansen and Wegener, 2001), it seems generally more appropriate to let a new indifferent solution enter the population, and this is also what most of the later works on the SEMO and GSEMO algorithm do (Friedrich et al., 2010, 2011; Qian et al., 2016; Li et al., 2016; Bian et al., 2018; Covantes Osuna et al., 2020).
Unfortunately, as mentioned in Section 1, it is not so clear how to handle indifferent solutions together with stagnation detection. In principle, when a new solution enters the population, the failure counter has to be reset to zero to reset the mutation rate to its initial value of 1/n. Otherwise, the continued use of a high mutation rate would prohibit finding good solutions in the direct neighborhood of the new solution. However, the acceptance of indifferent solutions can also lead to unwanted resets. For the OneJumpZeroJump problem, for example, it is easy to see by mathematical means that in a typical run, it will happen very frequently that an indifferent solution is generated. If this solution enters the population with a reset of a global failure counter (or an individual counter), then the regular resets will prevent the counters from reaching interesting values. In a quick experiment with a global counter, the largest counter value ever reached in a run of over 500,000,000 iterations was 5. Consequently, this SD-GSEMO was far from ever increasing the mutation rate and just imitated the classic GSEMO.
For this reason, in this work we regard the GSEMO with stagnation detection only in the variant that does not accept indifferent solutions, and we take note of the fact that thus our positive results on the stagnation-detection mechanism will not take over to problems with non-trivial plateaus of constant fitness.
8.2.3 Adjusting the Self-Adjustment
8.2.4 The GSEMO with Stagnation Detection: SD-GSEMO
Putting the design choices discussed so far together, we obtain the following variant of the GSEMO, called SD-GSEMO. Its pseudocode is shown in Algorithm 4.
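Since Algorithm 4 is not reproduced here, the following Python sketch shows one plausible realization of these design choices (global counter, rejection of indifferent solutions, strength capped at n/2); the details, in particular the threshold, are our assumptions, and the sketch reuses the helpers defined earlier.

```python
def sd_gsemo(f, n: int, iterations: int, R: float):
    """Sketch of a stagnation-detection GSEMO with a single global failure counter."""
    x = tuple(random.randint(0, 1) for _ in range(n))
    pop, r, counter = {f(x): x}, 1, 0
    for _ in range(iterations):
        parent = random.choice(list(pop.values()))
        y = standard_bit_mutation(parent, rate=r / n)
        fy = f(y)
        counter += 1
        if not any(weakly_dominates(fv, fy) for fv in pop):
            pop = {fv: z for fv, z in pop.items() if not dominates(fy, fv)}
            pop[fy] = y
            r, counter = 1, 0  # success: reset rate and global counter
        elif counter > sd_threshold(n, r, R):
            r, counter = min(r + 1, n // 2), 0  # stagnation: increase the rate
    return pop
```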
8.3 Runtime Analysis of the SD-GSEMO on OneJumpZeroJump
We now analyze the runtime of the SD-GSEMO on the OneJumpZeroJump function class. This will show that its expected runtime is smaller than the one of the heavy-tailed GSEMO by a factor of order at least k^{β−0.5} (the latter runtime being a factor of k^{Ω(k)} smaller than the one of the standard GSEMO).
We extract one frequently used calculation into Lemma 23 to make our main proof clearer. The proof of Lemma 23 uses the following lemma.
Let n ∈ ℕ and r ∈ [1..n/2]. Then ∑_{i=1}^{r} (en/i)^i ≤ 2(en/r)^r.
As for all algorithms discussed in this work, the most costly part of the optimization process is finding the two extremal points of the Pareto front. Borrowing many ideas from the proof of Rajabi and Witt (2020, Theorem 3.2), we show the following estimate for the time to find these extremal points when the inner part of the Pareto front is already covered.
Consider the SD-GSEMO optimizing the OneJumpZeroJump_{n,k} function. Consider the first time that all (a, 2k + n − a), a ∈ [2k..n], are covered by the current population. Then the two possibly missing elements (k, n + k) and (n + k, k) of the Pareto front will be found in an expected number of at most O((n − 2k + 3)(en/k)^k) additional generations.
Now, we are ready to show the runtime of the SD-GSEMO on the OneJumpZeroJump function.
We first consider the expected time until all (a, 2k + n − a), a ∈ [2k..n], are covered by the current population. Our main argument is the following adaptation of the analysis of this period for the standard GSEMO in the proof of Theorem 12. The main argument there was that the time spent in this period can be estimated by a sum of at most n − 2k waiting times for the generation of a particular Hamming neighbor of a particular member of the population. The same argument, naturally, is valid for the SD-GSEMO, and fortunately, we can also reuse the computation of these waiting times. Consider the time such a subperiod now takes with the SD-GSEMO. If we condition on the subperiod ending before the rate is increased above the initial value of 1/n, then the expected time is at most the expected time of this subperiod for the standard GSEMO (this is the same argument as used in the proof of Lemma 24). Consequently, the total time of this period for the SD-GSEMO is at most the corresponding time for the standard GSEMO (O(n(n − 2k + 3)²), as determined in the proof of Theorem 12) plus the additional time caused by subperiods using rates r/n with r > 1.
Together with the time to find the remaining two elements of the Pareto front shown in Lemma 24, this theorem is proven.
9 Experiments
To understand the performance of the algorithms discussed in this work for concrete problem sizes (for which an asymptotic mathematical analysis cannot give definite answers), we now conduct a simple experimental analysis. Since the SEMO cannot find the Pareto front, we did not include it in this investigation. We did include the variant of the SD-GSEMO, denoted by SD-GSEMO-Ind, in which each individual has its own failure counter (see the discussion in Section 8). Our experimental settings are the same for all algorithms.
OneJumpZeroJump: jump size k = 4 and problem sizes n between 10 and 50.
A large cap on the number of iterations as termination criterion. This cap was reached in none of the experiments; that is, the results reported are the true runtimes until the full Pareto front was found.
β = 1.5, as suggested in Doerr et al. (2017), for the power-law distribution in the GSEMO-HTM.
The parameter R of the stagnation detection for the SD-GSEMO and SD-GSEMO-Ind set as suggested in Rajabi and Witt (2020).
20 independent runs for each setting.
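For readers who want to reproduce the qualitative picture, the following minimal driver (our illustration only, not the original experimental framework) measures the number of iterations of a GSEMO-style loop until the full Pareto front of Theorem 7 is covered; it reuses the sketches from the earlier sections.

```python
def pareto_front(n: int, k: int):
    """Pareto front of OneJumpZeroJump_{n,k} (Theorem 7)."""
    return {(a, 2 * k + n - a) for a in list(range(2 * k, n + 1)) + [k, n + k]}

def time_to_cover(mutate, f, n: int, target, cap: int = 10**8):
    """Iterations until the population covers all objective values in `target`."""
    x = tuple(random.randint(0, 1) for _ in range(n))
    pop = {f(x): x}
    for t in range(cap):
        if target.issubset(pop.keys()):
            return t
        parent = random.choice(list(pop.values()))
        y = mutate(parent)
        fy = f(y)
        if not any(weakly_dominates(fv, fy) for fv in pop):
            pop = {fv: z for fv, z in pop.items() if not dominates(fy, fv)}
            pop[fy] = y
    return cap

n, k = 20, 4
f = lambda x: one_jump_zero_jump(x, k)
print(time_to_cover(standard_bit_mutation, f, n, pareto_front(n, k)))  # GSEMO
print(time_to_cover(heavy_tailed_mutation, f, n, pareto_front(n, k)))  # GSEMO-HTM
```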
We clearly see that these curves, in terms of shape and, where known, in terms of leading constants, match well the estimates of our theoretical runtime results. We also see, as predicted by informal considerations, the similarity of the performance of the SD-GSEMO and the SD-GSEMO-Ind. Finally, our experiments show that the different runtime behaviors are already visible for moderate (and thus realistic) problem sizes and not only in the asymptotic sense in which they were proven. In particular, we observe a performance improvement by a factor of (roughly) 5 through the use of heavy-tailed mutation and by a factor of (roughly) 10 with the stagnation-detection strategy.
10 Conclusion and Outlook
Noting that, in contrast to single-objective evolutionary computation, no broadly accepted multimodal benchmarks exist in the theory of MOEAs, we proposed the OneJumpZeroJump benchmark, which is a multiobjective analogue of the single-objective jump problem.
We use this benchmark to show that several insights from single-objective EA theory extend to the multiobjective world. Naturally, algorithms using 1-bit mutation such as the SEMO cannot optimize our benchmark, that is, cannot compute a population that covers the full Pareto front. In contrast, for all problem sizes n and jump sizes k ∈ [4..n/2 − 1], the GSEMO covers the Pareto front in Θ((n − 2k)n^k) iterations in expectation. Heavy-tailed mutation and the stagnation-detection mechanism of Rajabi and Witt (2022) give a runtime improvement over the standard GSEMO by a factor of k^{Ω(k)}, with the stagnation-detection GSEMO slightly ahead by a small polynomial factor in k. These are the same gains as observed for the single-objective Jump benchmark with gap size k. Our experiments confirm this performance ranking already for moderate problem sizes, with the stagnation-detection GSEMO slightly further ahead than what the asymptotically small advantage suggests. On the downside, adapting the stagnation-detection mechanism to MOEAs requires making several design choices, among which the question of how to treat indifferent solutions could be difficult for problems having larger plateaus of constant fitness.
Overall, this work suggests that the recently developed ideas to cope with multimodality in single-objective evolutionary optimization can be effective in multiobjective optimization as well. In this first work in this direction, we only concentrated on mutation-based algorithms. The theory of evolutionary computation has also observed that crossover and estimation-of-distribution algorithms can be helpful in multimodal optimization. Investigating to what degree these results extend into multiobjective optimization is clearly an interesting direction for future research.
Also, we only covered very simple MOEAs in this work. Analyzing more complex MOEAs amenable to mathematical runtime analyses, such as the decomposition-based MOEA/D (Zhang and Li, 2007; Li et al., 2016) or the NSGA-II (Deb et al., 2002; Zheng et al., 2022), would be highly interesting. For the MOEA/D, this would most likely require an adaptation of our benchmark problem. Since the difficult-to-find extremal points of the front are just the optima of the two single objectives, and thus of two subproblems that naturally are part of the set of subproblems regarded by the MOEA/D, this algorithm might have an unfair advantage on the OneJumpZeroJump problem.
Work conducted after this research: After the publication of Doerr and Zheng (2021), the following works have used the OneJumpZeroJump benchmark. In Doerr and Qu (2022), the performance of the NSGA-II, both with standard bit mutation and with heavy-tailed mutation, on the OneJumpZeroJump benchmark was analyzed. When the population size N is at least four times the size of the Pareto front, the standard NSGA-II finds the Pareto front in time O(Nn^k). Hence the NSGA-II and the GSEMO satisfy the same asymptotic runtime guarantees when N is chosen asymptotically optimal (that is, N = Θ(n − 2k)). With heavy-tailed mutation, again a k^{Ω(k)} improvement is obtained. In this work, an NSGA-II was regarded that does not use crossover, but it is clear that the same upper bounds could have been shown when crossover with rate less than one was used. Doerr and Qu (2023b) show that the upper bound of O(Nn^k) for the NSGA-II with standard bit mutation is asymptotically tight in many cases: for a large range of population sizes N, the runtime is Θ(Nn^k). Consequently, in this regime the NSGA-II does not profit from larger population sizes (not even when regarding the parallel runtime, that is, the number of iterations). Doerr and Qu (2023a) prove that uniform crossover can give super-constant speed-ups for the NSGA-II optimizing OneJumpZeroJump functions. This result is not totally surprising given that Dang et al. (2018) have shown that crossover speeds up the (μ+1) GA on single-objective jump functions, but the reason for the speed-ups shown in Doerr and Qu (2023a) is different (and, in fact, can be translated to different, and sometimes stronger, speed-ups for the (μ+1) GA on jump functions). These recent results show that a jump-like benchmark, as proposed in this work, is useful in multiobjective evolutionary computation.
Acknowledgments
This work was supported by a public grant as part of the Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH.
This work was also supported by the Science, Technology and Innovation Commission of Shenzhen Municipality (Grant No. GXWD20220818191018001), Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110177), Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), and Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531).
References
Author notes
*A preliminary short version was published in the Proceedings of AAAI 2021 (Doerr and Zheng, 2021).
†This work was partially done while Weijie Zheng was with Southern University of Science and Technology, Shenzhen, China.