Multiobjective evolutionary algorithms are successfully applied in many real-world multiobjective optimization problems. As for many other AI methods, the theoretical understanding of these algorithms is lagging far behind their success in practice. In particular, previous theory work considers mostly easy problems that are composed of unimodal objectives.

As a first step towards a deeper understanding of how evolutionary algorithms solve multimodal multiobjective problems, we propose the OneJumpZeroJump problem, a bi-objective problem composed of two objectives isomorphic to the classic jump function benchmark. We prove that the simple evolutionary multiobjective optimizer (SEMO) with probability one does not compute the full Pareto front, regardless of the runtime. In contrast, for all problem sizes n and all jump sizes k[4..n2-1], the global SEMO (GSEMO) covers the Pareto front in an expected number of Θ((n-2k)nk) iterations. For k=o(n), we also show the tighter bound 32enk+1±o(nk+1), which might be the first runtime bound for an MOEA that is tight apart from lower-order terms. We also combine the GSEMO with two approaches that showed advantages in single-objective multimodal problems. When using the GSEMO with a heavy-tailed mutation operator, the expected runtime improves by a factor of at least kΩ(k). When adapting the recent stagnation-detection strategy of Rajabi and Witt (2022) to the GSEMO, the expected runtime also improves by a factor of at least kΩ(k) and surpasses the heavy-tailed GSEMO by a small polynomial factor in k. Via an experimental analysis, we show that these asymptotic differences are visible already for small problem sizes: A factor-5 speed-up from heavy-tailed mutation and a factor-10 speed-up from stagnation detection can be observed already for jump size 4 and problem sizes between 10 and 50. Overall, our results show that the ideas recently developed to aid single-objective evolutionary algorithms to cope with local optima can be effectively employed also in multiobjective optimization.

Real-world problems often contain multiple conflicting objectives. For such problems, a single best solution cannot be determined without additional information. One solution concept for such problems is to compute a set of solutions, each of which cannot be improved without worsening in at least one objective (Pareto optima), and then let a decision maker choose one of these solutions. With their population-based nature, multiobjective evolutionary algorithms (MOEAs) are a natural choice for this approach and have indeed been very successfully applied here (Zhou et al., 2011).

Similar to the situation for single-objective evolutionary algorithms, the rigorous theoretical understanding of MOEAs falls far behind the success of these algorithms in practical applications. The classic works in this area have defined multiobjective, especially bi-objective, counterparts of well-analyzed single-objective benchmark functions used in evolutionary computation theory and have analyzed the performance of mostly very simple MOEAs on these benchmarks.

For example, in the problems COCZ (Laumanns et al., 2002) and OneMinMax (Giel and Lehre, 2010), the two objectives are both (conflicting) variants of the classic OneMax benchmark. The classic benchmark LeadingOnes was used to construct the LOTZ (Laumanns et al., 2004b) and WLPTNO (Qian et al., 2013) problems. These multiobjective benchmark problems are among the most intensively studied (Giel, 2003; Brockhoff et al., 2008; Doerr et al., 2013, 2016; Bian et al., 2018; Huang et al., 2019; Huang and Zhou, 2020; Covantes Osuna et al., 2020; Zheng et al., 2022; Bian and Qian, 2022; Zheng and Doerr, 2022). We note that these problems are all very easy. They are composed of unimodal objectives and they have the further property that from each set of solutions P a set P' witnessing the Pareto front can be computed by repeatedly selecting a solution from P, flipping a single bit in it, adding it to P, and removing dominated solutions from P. They are thus relatively easy to solve, as witnessed by typical runtimes such as O(n2logn) on the OneMax-type problems or O(n3) on the LeadingOnes-inspired benchmarks. These runtimes naturally are higher than for the single-objective counterparts due to the fact the Pareto front of the multiobjective versions has size Θ(n), hence Θ(n) times more solutions have to be computed compared to the single-objective setting.

We defer a detailed discussion of the state of the art of rigorous analyses of MOEAs to Section 3 and state here only that, to the best of our knowledge, there is not a single runtime analysis work discussing in detail how MOEAs cope with problems composed of multimodal objectives.

Our contributions. This paper aims at a deeper understanding of how MOEAs cope with multiobjective problems consisting of natural, well-analyzed, multimodal objectives. In the theory of single-objective evolutionary computation, the class of Jump functions proposed by Droste et al. (2002) is a natural and intensively used multimodal function class that has inspired many ground-breaking results. Hence, in this paper, we design a bi-objective counterpart of the Jump function with problem size n and jump size k, called OneJumpZeroJumpn,k. It consists of one Jump function with respect to the number of ones and one Jump function with respect to the number of zeros; hence both objectives are isomorphic to classic jump functions. We compute its Pareto front in Theorem 7 and, as intended, observe that unlike the easy multimodal benchmarks, the Pareto front is not connected; that is, one cannot transform any solution on the Pareto front into any other solution on the Pareto front via one-bit flips. From this observation, we easily show that for all nN and all k[2..n2], the simple evolutionary multiobjective optimizer (SEMO) cannot find the full Pareto front of the OneJumpZeroJump benchmark (Theorem 9).

At the heart of this work are mathematical runtime analyses for the global SEMO (GSEMO). We first show that the classic version of this algorithm for all n and k[2..n2] finds the Pareto front of OneJumpZeroJumpn,k in O((n-2k+3)nk) iterations (and fitness evaluations) in expectation (Theorem 12). We show a matching lower bound of Ω((n-2k)nk) for k[4..n2-1] (Theorem 13). Here and in the remainder, the asymptotic notation only hides constants independent of n and k. We note that our actual bounds are more precise. In particular, for 4k=o(n), the expected runtime is 32enk+1 apart from lower-order terms. This result might be the first runtime analysis of an MOEA that is tight apart from lower order terms.

We then try to reuse two ideas that led to performance gains for multimodal problems in single-objective optimization. Doerr et al. (2017) proposed the fast mutation operator in which the number of flipped bits follows a power-law distribution with (negative) exponent β>1. They showed that the (1+1) EA with this operator optimizes jump functions with jump size k by a factor of kΩ(k) faster than the standard (1+1) EA. We show that the GSEMO with fast mutation also witnesses such an improvement over the standard GSEMO. More precisely, we prove an upper bound of O((n-2k)(en)k/kk+0.5-β) for this algorithm (Theorem 20). We also provide a result with explicit leading constant, for the first time for an algorithm using fast mutation. We are optimistic that our non-asymptotic estimates for the number of flipped bits by this operator will be useful also in other works.

The stagnation-detection strategy of Rajabi and Witt (2022) is a second way to cope with multimodality (in single-objective optimization). So far used only in conjunction with the (1+,λ) EA (and there mostly with the (1+1) EA), it consists of counting for how long no improvement has been found and using this information to set the mutation rate (higher mutation rates when no improvement was made for longer time). With suitable choices of the parameters of this strategy, the (1+1) EA can optimize jump functions with jump size k faster than the standard (1+1) EA by a factor of Ω(kΩ(k)). This is the same rough estimate as for the (1+1) EA with fast mutation. A more detailed calculation, however, shows that the stagnation-detection (1+1) EA is faster than the fast (1+1) EA by a factor of Θ(kβ-0.5). We note that this difference usually is not extremely large since β is usually chosen small (β=1.5 was recommended by Doerr et al. (2017)) and k has to be small to admit reasonable runtimes (recall that the runtime dependence on n is Ω(nk)). Nevertheless, in experiments conducted by Rajabi and Witt (2022) the stagnation-detection (1+1) EA was faster than the fast (1+1) EA by around a factor of three. Different from fast mutation, it is not immediately clear how to incorporate stagnation detection into the GSEMO. Being an algorithm with non-trivial parent population, one question is whether one should count unsuccessful iterations globally or individually for each parent individual. Also, clearly the parameters of the algorithm need to be adjusted to the longer waiting times for an improvement. We succeeded in defining a stagnation-detection version of the GSEMO that effectively solves the OneJumpZeroJump problem, more precisely, that computes the full Pareto front in an expected runtime of O((n-2k)(en)k/kk), again a kΩ(k) factor improvement over the classic GSEMO and reducing the runtime guarantee for the heavy-tailed GSEMO by a small factor of Ω(kβ-0.5); see Theorem 25.

Via a small set of experiments, we demonstrate that these are not only asymptotic differences. Compared to the standard GSEMO, we observe roughly a factor-5 speed-up with heavy-tailed mutation and a factor-10 speed-up with our stagnation-detection GSEMO, and this already for jump size k=4 and problem sizes n between 10 and 50.

We note that this work is an extended version of the 7-page (excluding acknowledgments and references) preliminary paper (Doerr and Zheng, 2021) and differs in the following ways. We estimate the leading constant of the asymptotic upper bound of the runtime of the GSEMO with heavy-tailed mutation for not too large jump size, which has not been shown in our preliminary version, and also not been shown before for the theoretical works with the heavy-tailed mutation. Besides, this version adds a formal discussion on the multimodality in multiobjective problems in Section 3, and contains all mathematical proofs that had to be omitted in Doerr and Zheng (2021) for reasons of space.

The remainder of this paper is organized as follows. Section 2 introduces the basic definitions that will be used throughout this work. In Section 3, we discuss the relevant previous works. The OneJumpZeroJumpn,k function class is defined in Section 4. Our mathematical runtime analyses are presented in Sections 5 to 8. Section 9 shows our experimental results. A conclusion is given in Section 10.

A multiobjective optimization problem consists of optimizing multiple objectives simultaneously. In this paper, we restrict ourselves to the maximization of bi-objective pseudo-Boolean problems f=(f1,f2):{0,1}nR2. We note that only sporadic theoretical works exist that discuss the optimization of more than two objectives (Laumanns et al., 2004b; Bian et al., 2018; Huang et al., 2021).

We briefly review the standard notation in multiobjective optimization. For any two search points x,y{0,1}n, we say that

  • xweakly dominatesy, denoted by xy, if and only if f1(x)f1(y) and f2(x)f2(y);

  • xdominatesy, denoted by xy, if and only if f1(x)f1(y) and f2(x)f2(y) and at least one of the inequalities is strict.

We call x{0,1}nPareto optimal if and only if there is no y{0,1}n such that yx. All Pareto optimal solutions form the Pareto setS* of the problem. The set F*=f(S*)={(f1(x),f2(x))xS*} of the function values of the Pareto set is called the Pareto front.

For most multiobjective problems, the objectives are at least partially conflicting and thus there is usually not a single Pareto optimum. Since a priori it is not clear which of several incomparable Pareto optima to prefer, the most common target is to compute the Pareto front, that is, compute a set P of solutions such that f(P):={(f1(x),f2(x))xP} equals the Pareto front F*. This is our objective in this work as well. We note that if the Pareto front is excessively large, then one has to resort to approximating it in a suitable manner, but this will not be our problem here.

We will use |x|1 and |x|0 to denote the number of ones and the number of zeros of the search point x{0,1}n. We use [a..b] to denote the set {a,a+1,,b} for a,bZ and ab.

Since this work conducts a mathematical runtime analysis of an evolutionary algorithm for discrete search spaces, we primarily discuss the state of the art inside this subarea of evolutionary computation. The mathematical runtime analysis as a rigorous means to understand evolutionary algorithms was started in the 1990s with works like Mühlenbein (1992), Bäck (1993), and Rudolph (1997). The first analysis of MOEAs were conducted by Laumanns et al. (2002, 2004b), Giel (2003), and Thierens (2003). The theory of MOEAs here often followed the successful examples given by single-objective works. For example, the multiobjective benchmarks COCZ (Laumanns et al., 2004b) and OneMinMax (Giel and Lehre, 2010) are composed of two contradicting copies of the classic OneMax problem, the problems LOTZ (Laumanns et al., 2004b) and WLPTNO (Qian et al., 2013) follow the main idea of the LeadingOnes problem (Rudolph, 1997).

Due to the higher complexity of analyzing MOEAs, many topics with interesting results on single-objective EAs received almost no attention for MOEAs. For example, the first runtime analysis of a single-objective algorithm using crossover was conducted by Jansen and Wegener (2001) and was followed up by a long sequence of important works. In contrast, only sporadic works discuss crossover in MOEAs (Neumann and Theile, 2010; Qian et al., 2013; Huang et al., 2019; Bian and Qian, 2022). Dynamic parameter settings for single-objective EAs were first discussed by Jansen and Wegener (2005) and then received a huge amount of attention (see, e.g., Doerr and Doerr, 2020), whereas the only such result for MOEAs appears to be one result by Doerr et al. (2022). How single-objective evolutionary algorithms cope with noise was discussed already by Droste (2004), but despite numerous follow-up works in single-objective optimization, there is only a single mathematical runtime analysis of an MOEA in the presence of noise (Gutjahr, 2012).

In this work, we regard another topic that is fairly well-understood in the evolutionary theory community on the single-objective side, but where essentially nothing is known on the multiobjective side. It is well-known in general that local optima are challenging for evolutionary algorithms. Already one of the first runtime analysis works, the ground-breaking paper by Droste et al. (2002), proposes a multimodal (that is, having local optima different from the global optimum) benchmark of scalable difficulty. The jump function with difficulty parameter k, later called gap size, resembles the easy OneMax function except that it has a valley of very low fitness around the optimum. This valley consists of the k-1 first Hamming levels around the optimum; the search points in distance k form a local optimum that is easy to reach, but hard to leave. Mutation-based algorithms need to flip the right k bits to go from any point on the local optimum to a search point of better fitness (which here is the global optimum). While other multimodal benchmarks have been proposed and have received some attention, most notably trap functions (Jägersküpper and Storch, 2007) and the DeceptiveLeadingBlocks problem (Lehre and Nguyen, 2019), it is safe to say that the Jump benchmark is the by far most studied one and that these studies have led to fundamental insights on how evolutionary algorithms cope with local optima.

For reasons of space, we only discuss some of the most significant works on jump functions. The first work (Droste et al., 2002) determined the runtime of the (1+1) EA on Jump functions with gap size k2 to be Θ(nk), showing that local optima can lead to substantial performance losses. This bound was tightened and extended to arbitrary mutation rates by Doerr et al. (2017). This analysis led to the discovery of a heavy-tailed mutation operator and more generally, established the use of power-law distributed parameters, an idea that led to many interesting results subsequently (e.g., Antipov et al., 2021; Corus et al., 2021; Antipov, Buzdalov, et al., 2022; Dang et al., 2022). Again, an analysis of how single-trajectory heuristics optimize jump functions led to the definition of a stagnation-detection mechanism, which in a very natural manner leads to a reduction of the time an EA is stuck on a local optimum (Rajabi and Witt, 2022). Jump functions are among the first and most convincing examples that show that crossover can lead to significant speed-ups over only using mutation as variation operator (Jansen and Wegener, 2002; Dang et al., 2018; Antipov, Doerr, et al., 2022). That estimation-of-distribution algorithms and ant-colony optimizers may find it easier to cope with local optima was also shown via jump functions (Hasenöhrl and Sutton, 2018; Doerr, 2021; Benbaki et al., 2021; Witt, 2023).

In contrast to this intensive discussion of multimodal functions in single-objective evolutionary optimization, almost no such results exist in multiobjective optimization. Before discussing this aspect in detail, let us agree (for this work) on the following terminology. We say that a function f:{0,1}nR is unimodal if it has a unique maximum and if all other search points have a Hamming neighbor with strictly larger fitness. We say that f is multimodal if it has local optima different from the global optimum. Here, we call local optimum a set S of search points such that f(x)=f(y) for all x,yS and such that all neighbors of S, that is, all zS having a Hamming neighbor in S, have a smaller f-value. Note that a global optimum is also a local optimum according to this definition. With these definitions, unimodal functions are those where a hillclimber can reach the optimum via one-bit flips regardless of the initial solution. In contrast, multimodal functions definitely need a single-trajectory heuristic to flip more than one bit (when started in a suitable solution) or need the acceptance of truly inferior solutions. With our definition, plateaus of constant fitness render a function not unimodal, but they do not imply multimodality. This is convenient for our purposes as we shall not deal with such plateaus, for the simple reason that they behave again differently from unimodal functions and true local optima. We note that the role of plateaus in multiobjective evolutionary optimization has been extensively studied, among others, in Brockhoff et al. (2007), Friedrich et al. (2010, 2011), Qian et al. (2016), and Li et al. (2016).

We extend the definitions of unimodality and multimodality to multiobjective problems in the natural way: A multiobjective problem is unimodal if all its objectives are unimodal functions. Many such problems are easy, and even simple MOEAs employing one-bit flips as only variation operator—for example, the SEMO—can find the full Pareto front. Examples of such problems include the well-studied benchmarks inspired by the unimodal single-objective benchmarks OneMax and LeadingOnes. However, somewhat surprisingly, it is not true that the SEMO can cover the full Pareto front of all unimodal multiobjective problems as we now show.

Lemma 1:

There exists a unimodal multiobjective problem whose full Pareto front cannot be covered by the SEMO in an arbitrarily long time with a positive probability.

Proof:
We consider the following bi-objective problem defined on {0,1}3.
f(000)=(0,1),f(001)=(2,0),f(010)=(0,2),f(011)=(1,5)f(100)=(4,0),f(101)=(5,0),f(110)=(3,3),f(111)=(0,4).
Then its Pareto front is {(1,5),(5,0),(3,3)}. We know that f1 is unimodal as
f1(000)<f1(001),f1(001)<f1(101),f1(010)<f1(011),f1(011)<f1(001),f1(100)<f1(101),f1(110)<f1(100),f1(111)<f1(110),
and f1(101) is the unique maximal value. We also know that f2 is unimodal as
f2(000)<f2(010),f2(001)<f2(000),f2(010)<f2(011),f2(100)<f2(000),f2(101)<f2(111),f2(110)<f3(111),f2(111)<f2(011),
and f2(011) is the unique maximal value.
Consider the run of the SEMO described in Table 1. It is easy to see that this process happens with probability of
18·13·1213·1213·1213=15184>0.
Table 1:

An example process of the SEMO. Given is the population at the end of each iteration (which for iteration 0 is the initial population). To ease understanding the process, with each individual we also state its objective value.

GenerationPopulation
(100, (4,0)) 
(100, (4,0)), (000, (0,1)) 
(100, (4,0)), (010, (0,2)) 
(101, (5,0)), (010, (0,2)) 
(101, (5,0)), (011, (1,5)) 
GenerationPopulation
(100, (4,0)) 
(100, (4,0)), (000, (0,1)) 
(100, (4,0)), (010, (0,2)) 
(101, (5,0)), (010, (0,2)) 
(101, (5,0)), (011, (1,5)) 

We now show that once the optimization process has reached the state described in the last line of Table 1, the Pareto front point (3, 3) can no longer be reached. Note that 110 is the only Pareto optimum for (3, 3), and its Hamming neighbors are 010, 100, and 111. Since f(010)=(0,2),f(100)=(4,0), and f(111)=(0,4), we know that 010 and 111 are strictly dominated by 011, and that 100 is strictly dominated by 101. Hence, even if any of them is generated, it cannot survive to the next generation when the current population is {101,011}. Therefore, 110 cannot be generated (and (3, 3) cannot be reached) via the SEMO with the one-bit mutation.

We say that a multiobjective problem is multimodal if at least one objective is multimodal. We note that this does not automatically imply that the SEMO cannot find the full Pareto front, in fact, as the following proof of Lemma 2 shows, there are multi-objective problems consisting only of multimodal objectives such that the SEMO regardless of the initial solution computes the full Pareto front very efficiently. Clearly, our interest in this work are multimodal multiobjective problems that are harder to optimize. To avoid misunderstandings, we note that a second notion of multimodality exists. In multimodal optimization, the target is to find all or a diverse set of solutions having some optimality property (Preuss, 2015; Liang et al., 2016; Tanabe and Ishibuchi, 2020). This notion is unrelated to our notion of multimodality.

Lemma 2:

There is a bi-objective problem f=(f1,f2):{0,1}nR2 such that both objectives f1 and f2 have several local optima, but the SEMO algorithm, regardless of its initial solution, finds the full Pareto front in time O(n2logn).

Proof:
Let k,N2 and n=k. Define f1 via
f1(x)=|x|1, if |x|10(mod),|x|1/+(-i), if |x|1i(mod),
for all x{0,1}n. Hence, f1 agrees with the classic OneMax benchmark when |x|1 is a multiple of . Solutions x with |x|11(mod) are local optima different from the global optimum, which is x*=1n. The function f1 is deceptive in each interval [j+1..j+-1], j=0,,k-1.
We define f2 analogously, but with the roles of zeroes and ones interchanged, that is,
f2(x)=|x|0, if |x|00(mod),|x|0/+(-i), if |x|0i(mod),
for all x{0,1}n. Clearly, f2 has the same characteristics as f1; in particular, it is highly multimodal with its k local optima.
Figure 1 plots these two functions with (n,)=(50,10). We clearly see the n/+1=6 local optima for both objectives, namely the points with |x|11(mod) and x=1n for f1 and the points with |x|01(mod) and x=0n for f2.
Figure 1:

The values of f1 and f2 with respect to |x|1, the number of ones in the search point x.

Figure 1:

The values of f1 and f2 with respect to |x|1, the number of ones in the search point x.

Close modal

We note that any solution is a Pareto optimum of f=(f1,f2). Indeed, we have f1(x)+f2(x)=n for all x{0,1}n. Consequently, if f1(x)>f1(y) for some x,y{0,1}n, then necessarily f1(x)<f2(y) and thus x does not dominate y.

Since all solutions are Pareto optima and since f has the same classes of search points of equal fitness, the optimization process of the SEMO on f is identical to the one on OneMax. With Giel and Lehre's (2010) Theorem 3 and its proof, our claim is proven.

Having made precise what we understand as multimodal multiobjective problem, we review the known results on such problems.

To the best of our knowledge, only very few runtime analysis works on multimodal multiobjective problems exist, and none of them are focused on the possible difficulties arising from the multimodality. Roughly speaking, these can be divided into works that construct artificial problems to demonstrate, often in an extreme way, a particular strength or weakness of an algorithm, and works on classic combinatorial optimization problems that happen to be multimodal. The first set of works gave rise to the multimodal problems ZPLG, SPG, and Dec-obj-MOP. The first two of these were proposed by Qian et al. (2016) and are defined as follows.

Definition 3 (ZPLG; Qian et al., 2016):
The function ZPLG:{0,1}nR×{0,1,2} is defined by
ZPLG(x)=(n+1,1), if x=1i0n-i,i[1..34n-1];(n+2+i,0), if x=134n+2i014n-2i,i[0..18n];(|x|0,2), else .
Definition 4 (SPG; Qian et al., 2016):
The function SPG:{0,1}nR×{0,1} is defined by
SPG(x)=(-1,0), if x=1i0n-i,i mod 3=1,i[1..n];(in,0), if x=1i0n-i,i mod 3{0,2},i[1..n];(|x|0,1), else .

In order to investigate the effect of mixing low-level heuristics, Qian et al. (2016) showed that mixing fair selection with respect to the decision space and the objective space is beneficial for ZPLG, and that mixing 1-bit and 2-bit mutation is efficient for SPG.

In the first theoretical study of decomposition-based MOEAs, Li et al. (2016) analyzed how the MOEA/D solves several multiobjective problems, among them the following multimodal one.

Definition 5 (Dec-obj-MOP; Li et al., 2016):
The function Dec-obj-MOP:{0,1}nR2 is defined by
Dec-obj-MOP(x)=(n+1-|x|0 mod n+1,n+|x|0 mod n+1).

We note that the three problems ZPLG, SPG, and Dec-obj-MOP all appear slightly artificial. Also, they have only one multimodal objective. The first objective of ZPLG has local optima on x=134n+2i014n-2i,i[0..18n], the first objective of SPG has local optima on x=0n and x=1i0n-i,i mod 3=0,i[1..n], and the second objective of Dec-obj-MOP has two local optima on x{0n,1n}.

The second type of works dealing with multimodal multiobjective problems are those which regard combinatorial optimization problems (e.g., Laumanns et al., 2004a; Kumar and Banerjee, 2006; Neumann, 2007; Neumann and Reichel, 2008; Greiner, 2009; Horoba, 2009; Qian et al., 2013, 2015, 2018; Feng et al., 2019; Qian et al., 2020; Roostapour et al., 2020, 2022). Combinatorial optimization problems almost always are multimodal, simply because a simple cardinality constraint suffices to render a problem multimodal. We note for some such problems the multimodality is not very strong; for example, for minimum spanning tree problems, flipping two bits suffices to leave a local optimum. Overall, all of these works are relatively problem-specific and we could not distill from these any general insights on how MOEAs cope with local optima.

To study via mathematical means how MOEAs cope with multimodality, we now define and analyze a class of bi-objective functions of scalable difficulty. As mentioned above, this is strongly influenced by the single-objective Jump function class proposed in Droste et al. (2002), which is intensively used in the theory of single-objective evolutionary computation and which gave rise to many interesting results including that larger mutation rates help in the optimization of multimodal functions (Doerr et al., 2017), that crossover can help to cope with multimodality (Jansen and Wegener, 2002; Dang et al., 2018), and that estimation-of-distribution algorithms and the (1+(λ,λ)) GA can significantly outperform classic evolutionary algorithms on multimodal problems (Hasenöhrl and Sutton, 2018; Doerr, 2021; Antipov and Doerr, 2020; Antipov, Doerr, et al., 2022).

We recall that for all nN and k[1..n], the jump function Jumpn,k:{0,1}nR with problem size n and gap size k is defined by Jumpn,k(x)=k+|x|1, if |x|1[0..n-k]{n} and Jumpn,k(x)=n-|x|1 otherwise. Hence, for k2, this function has a valley of low fitness around its optimum x*=1n, which can be crossed only by flipping k specific bits (or accepting solutions with very low fitness). We define the OneJumpZeroJumpn,k function as a bi-objective counterpart of the function Jumpn,k.

Definition 6 (OneJumpZeroJumpn,k):
Let nN and k=[1..n]. The function OneJumpZeroJumpn,k=(f1,f2):{0,1}nR2 is defined by
f1(x)=k+|x|1, if |x|1n-k or x=1n,n-|x|1, else ;f2(x)=k+|x|0, if |x|0n-k or x=0n,n-|x|0, else .
So the first objective of OneJumpZeroJumpn,k is just the classic Jumpn,k function. The second objective has a fitness landscape isomorphic to this function, but the roles of zeros and ones are interchanged, that is, f2(x)=Jumpn,k(x¯), where x¯ is the bitwise complement of x. Figure 2 displays these two functions and in particular the two local optima on |x|1=n-k and x=1n for the first objective and two local optima on |x|1=k and x=0n for the second objective.
Figure 2:

The values of two objectives in OneJumpZeroJumpn,k with respect to |x|1, the number of ones in the search point x.

Figure 2:

The values of two objectives in OneJumpZeroJumpn,k with respect to |x|1, the number of ones in the search point x.

Close modal
In the following theorem, we determine the Pareto set and front of the OneJumpZeroJumpn,k function. As Figure 2 suggests, for k2 the Pareto set consists of an inner region of all search points x with |x|1[k..n-k] and the two extremal points 1n and 0n, as visualized in Figure 3. We shall call the Pareto front values {(a,2k+n-a)a[2k..n]} the inner part/region of the Pareto front in this paper.
Figure 3:

The Pareto front for the OneJumpZeroJumpn,k function with (n,k)=(50,10).

Figure 3:

The Pareto front for the OneJumpZeroJumpn,k function with (n,k)=(50,10).

Close modal
Theorem 7:

The Pareto set of the OneJumpZeroJumpn,k function is S*={x|x|1[k..n-k]{0,n}}, and the Pareto front is F*={(a,2k+n-a)a[2k..n]{k,n+k}}.

Proof:

We firstly prove that for any given xS*, yx holds for all y{0,1}n. If yx, then f1(y)f1(x) and f2(y)f2(x), and at least one of two inequalities is strict. Without loss of generality, let f1(y)>f1(x). From xS*, we have f1(x)k, hence y0n. Since f1(y)n+k, we know x1n. Then we have |x|1[k..n-k]{0} since xS*.

  • If |x|1[k..n-k], then k|x|1<|y|1n-k or |y|1=n. For k|y|1n-k, we have k|y|0<|x|0n-k. Then f2(y)=k+|y|0<k+|x|0=f2(x). For |y|1=n, we have f2(y)=k<k+kf2(x) since |x|0[k..n-k] from |x|1[k..n-k].

  • If |x|1=0, then f2(x)=k+n>f2(y) since y0n.

That is, we obtain f2(y)<f2(x) which is contradictory to yx. Hence, S* is the Pareto set.

We secondly prove that for any given y{0,1}nS*, there is xS* such that xy. Since y{0,1}nS*, we have |y|1[1..k-1][n-k+1..n-1].

  • If |y|1[1..k-1], we have f1(y)=k+|y|1 and f2(y)=|y|1. Then we could take x with |x|1=k, and have f1(x)=k+k>f1(y) and f2(x)=k+n-k>f2(y). Then xy.

  • If |y|1[n-k+1..n-1], we have f1(y)=n-|y|1 and f2(y)=k+|y|02k-1. Then we could take x with |x|1=n-k, and have f1(x)=n>f1(y) and f2(x)=k+k>f2(y). Then xy.

Hence, F* is the Pareto front and this theorem is proven.

Theorem 7 implies that for k>n/2, the Pareto front consists only of the two extremal solutions 0n and 1n. This appears to be a relatively special situation that is not very representative for multimodal multiobjective problems. For this reason, in the remainder, we shall only regard the case that kn/2. In this case, again from Theorem 7, we easily obtain in the following corollary a general upper bound on the size of any set of solutions without pair-wise weak domination, and thus also on the size of the population in the algorithms discussed in this work.

Corollary 8:

Let kn/2. Consider any set of solutions P such that x¬y with respect to OneJumpZeroJumpn,k for all x,yP with xy. Then |P|n-2k+3.

Proof:

We first note that any non-extremal solution that lies on the Pareto front dominates any solution not on the Pareto front. Let P'=P{0n,1n}. Therefore, either P' is a subset of the Pareto front or it contains no point on the Pareto front. In the first case, Theorem 7 immediately shows our claim. Assume that P' contains no point on the Pareto front. Let x,yP' with |x|1,|y|1[1..k-1]. Assume without loss of generality that |x|1|y|1. Then xy. Hence P' contains at most one solution x with |x|1[1..k-1]. A symmetric argument shows that P' contains at most one solution x with |x|1[n-k+1..n-1]. Note that 1n dominates any x with |x|1[1..k-1] and 0n dominates any x with |x|1[n-k+1..n-1]. Hence |P|2n-2k+3.

The simple evolutionary multiobjective optimizer (SEMO), proposed by Laumanns et al. (2002), is a well-studied basic benchmark algorithm in multiobjective evolutionary theory (Qian et al., 2013; Li et al., 2016). It is a multiobjective analogue of the randomized local search (RLS) algorithm, which starts with a random individual and tries to improve it by repeatedly flipping a single random bit and accepting the better of parent and this offspring. As a multiobjective algorithm trying to compute the full Pareto front, the SEMO naturally has to work with a non-trivial population. This is initialized with a single random individual. In each iteration, a random parent is chosen from the population. It generates an offspring by flipping a random bit. The offspring enters the population if it is not weakly dominated by some individual already in the population. In this case, any individual dominated by it is removed from the population. The details of SEMO are shown in Algorithm 1. We note that more recent works such as Friedrich et al. (2010) define the (G)SEMO minimally different, namely so that in case of equal objective values the offspring enters the population and replaces the existing individual with this objective value. Preferring offspring in case of equal objective values allows EAs to traverse plateaus of equal fitness and this is generally preferred over keeping the first solution with a certain objective value. For our purposes, this difference is not important since all points with equal fitness behave identically.

graphic

In the following Theorem 9, we show that the SEMO cannot cope with the multimodality of the OneJumpZeroJumpn,k function. Even with infinite time, it does not find the full Pareto front of any OneJumpZeroJump function. We note that this result is not very surprising and we prove it rather for reasons of completeness. It is well-known in single-objective optimization that the randomized local search heuristic with positive (and often very high) probability fails on multimodal problems (the only published reference we found is Doerr et al., 2008, Theorem 9, but surely this was known before). It is for the same reason that the SEMO algorithm cannot optimize the OneJumpZeroJump function.

Theorem 9:

For all n,kN with k[2..n2], the SEMO cannot optimize the OneJumpZeroJumpn,k function.

Proof:

Note that any individual with m'[k..n-k] ones has the function value (m'+k,n-m'+k)(2k,2k). Since a search point with m[1..k-1] ones has the function value (m+k,m)(2k-1,k-1) and a search point with m[1..k-1] zeros has the function value (m,m+k)(k-1,2k-1), we know that any newly generated search point with number of ones in [k..n-k] will lead to the removal from the current population of all individuals with number of zeros or ones in [1..k-1] and will also prevent any such individual from entering the population in the future.

Without loss of generality suppose that the initial individual has at most n2 ones. We show that 1n can never enter the population. Assume first that never in this run of the SEMO is a search point with n-k ones generated. Since each offspring has a Hamming distance of one from its parent, the statement “Never is an individual with n-k or more ones generated” holds in each generation. In particular, the search point 1n is never generated.

Now, assume that at some time t for the first time an individual x with n-k ones is generated. As in the previous paragraph, up to this point no search point with more than n-k ones is generated. In particular, the point 1n is not generated so far. Since x lies on the Pareto front, it enters the population. From this point on, by our initial considerations, no individual with n-k+1 to n-1 ones can enter the population. In particular, at no time can a parent with n-1 ones be selected, which is a necessary precondition for generating the point 1n. Consequently, the point 1n will never be generated.

As we have shown in the previous section, to deal with the OneJumpZeroJump benchmark, a mutation-based algorithm needs to be able to flip more than one bit. The global SEMO (GSEMO) proposed by Giel (2003) is a well-analyzed MOEA that has this ability. Generalized from the (1+1) EA algorithm, it uses the standard bit-wise mutation; that is, each bit is flipped independently with the same probability of, usually, 1/n. The details are shown in Algorithm 2.

graphic

Theorem 12 will show that GSEMO can find the Pareto front. To make the main proof clearer, some propositions are extracted as the following lemmas. Lemma 10 shows that once a solution in the inner part of the Pareto front is part of the population, it takes O(n2logn) iterations until the whole inner part of the Pareto front is covered.

Lemma 10:

Consider optimizing the function f:=OneJumpZeroJumpn,k via the GSEMO. Let a0[2k..n]. Assume that at some time the population contains an individual x with f(x)=(a0,2k+n-a0). Then the expected number of interations until all (a,2k+n-a), a[2k..n], are covered by the population is at most 2en(n-2k+3)(lnn2+1) iterations.

Proof:
Note that any solution corresponding to (a,2k+n-a) in the Pareto front contains exactly a-k ones. If the population contains such an individual, then the probability to find (a+1,2k+n-a-1) in the Pareto front is at least
1n-2k+3n-a+kn1-1nn-1n-a+ken(n-2k+3),
where the population size is at most n-2k+3 (see Corollary 8). Then, by a simple Markov chain argument (similar to Wegener's (2001) fitness level method), the expected time to find all (a,2k+n-a), a[a0+1..n], is at most
a=a0n-1en(n-2k+3)n-a+k.
(1)
Similarly, the probability to find (a-1,2k+n-a+1) in the Pareto front once the population contains a solution with function value (a,2k+n-a) is at least
1n-2k+3a-kn1-1nn-1a-ken(n-2k+3),
and the expected time to find all (a,2k+n-a), a[2k..a0-1], is at most
a=a02k+1en(n-2k+3)a-k.
(2)
By comparing the individual summands, the sum of (1) and (2) is at most
2a=n/2+kn-1en(n-2k+3)n-a+k2en(n-2k+3)lnn2+1.

From a population covering the whole inner part of the Pareto front, the two missing extremal points of the front can be found in roughly O(nk+1) iterations, as the following Lemma 11 shows.

Lemma 11:

Consider the GSEMO optimizing the OneJumpZeroJumpn,k function. Assume that at some time, all (a,2k+n-a), a[2k..n], are covered by the current population. Then, the two missing points (a,2k+n-a), a{k,n+k} of the Pareto front will be found in an expected number of at most 32enk(n-2k+3) additional generations.

Proof:
We pessimistically calculate the time to generate two elements (a,2k+n-a), a{k,n+k}, in the Pareto front ignoring the fact that the current population may contain the corresponding solutions 0n and 1n already. Similar to the proof of Lemma 10, we note that the probability to find (k,k+n) in the Pareto front from a solution with function value (2k,n) is at least
1n-2k+31nk1-1nn-k1enk(n-2k+3).
The same estimate holds for the probability to find (k+n,k) in the Pareto front from one solution with function value (n,2k). Hence, the expected time to find one of (k,n+k) and (k+n,k) is at most e2nk(n-2k+3), and the expected time to find the remaining element of the Pareto front is at most enk(n-2k+3) additional iterations.

Now, we establish our main result for the GSEMO.

Theorem 12:
Let nN2 and k[1..n2]. The expected runtime of the GSEMO optimizing the OneJumpZeroJumpn,k function is at most
e(n-2k+3)32nk+2nlnn2+3.
For 2k=o(n), this bound is 32enk+1±o(nk+1).
Proof:
We first consider the time until the population P contains an x such that (f1(x),f2(x))=(a,2k+n-a) for some a[2k..n]. If the initial search point has a number of ones in [k..n-k], then such an x is already found. Otherwise, the initial search point has at most k-1 ones or at most k-1 zeros. Without loss of generality, we consider an initial point with mk-1 zeros. If m=0, that is, the initial point is 1n, then its function value is (n+k,k). Since any point x˜ with m˜[1..k-1] zeros has function value (m˜,k+m˜), such a point is not dominated by 1n, and thus will be kept in the population. After that, since (m˜,k+m˜) increases with respect to m˜, one individual with larger m˜[1..k-1] will replace the one with smaller m˜[1..k-1]. Finally, any individual with m'[k..n-k] zeros has function value (n-m'+k,m'+k)>(k-1,2k-1) and thus dominates all search points with m˜[1..k-1] zeros. We pessimistically add up the expected waiting times for increasing the number of zeros one by one until one individual with k zeros is found, which needs an expected number of iterations of at most
m=0k-11n-2k+3n-mn1-1nn-1-1m=0k-1en(n-2k+3)n-men(n-2k+3)kn-k+1en(n-2k+3).
(3)

Then from Lemma 10, we know that all (a,2k+n-a), a[2k..n], in the Pareto front will be found in an expected number of at most 2en(n-2k+3)(lnn2+1) iterations. After that, from Lemma 11, the points (k,n+k) and (n+k,k) in the Pareto front will be found in an expected number of at most 32enk(n-2k+3) additional iterations. Hence, the expected iterations to find the Pareto front is at most en(n-2k+3)+2en(n-2k+3)(lnn2+1)+32enk(n-2k+3)=e(n-2k+3)(32nk+2nlnn2+3).

Finally, we show that this bound is very tight. For convenience, we consider only the case k4. In this case, it is sufficiently hard to generate an extremal point of the Pareto front from the inner part that we do not need a careful analysis of the population dynamics in the short initial phase in which the inner part of the Pareto front is detected, which is not the main theme of this work. That said, we are very optimistic that it would pose no greater difficulties to show that with high probability at the first time when a solution with less than 14n or more than 34n ones enters the population, the population already contains Θ(n) individuals. Therefore, from this point on, only every Θ(n) iterations an outer-most point will be chosen as parent, and this should suffice to prove an Ω(nk+1) lower bound also for k=2 and k=3.

Theorem 13:
Let nN8 and k[4..n2-1]. The expected runtime of the GSEMO optimizing the OneJumpZeroJumpn,k function is at least
32e(n-2k+1)nknk(n-1)k1-1nn/4-2-2nn-2k-e+3n-4e(lnn+1)nk-3.
If k=o(n), this bound becomes 32enk+1-o(nk+1) and matches the upper bound of Theorem 12 apart from lower-order terms.
Proof:

We first prove that with high probability the inner region of the Pareto front will be fully covered before 0n or 1n is generated and then use this to bound the time until the two extremal points of the Pareto front are found.

For the initial individual x, we have E[|x|1]=12n. With the additive Chernoff bound (see, e.g., Doerr, 2020, Theorem 1.10.7), we know that with probability at least
1-2exp-2nn42=1-2exp-18n,
the initial individual has 14n<|x|1<34n. Then |x|1<k or |x|1>n-k only if k>14n. If k>14n, without loss of generality, let |x|1<k, then we will show that with probability at least 1-n-n/4+2-2n-n+2k-en-1, an individual x' with k|x'|1n-k is generated earlier than 1n or 0n. Note that before such an x' appears, the population consists of one individual with 14n<|x|1<k except if 1n, 0n, or a y with |y|1>n-k is generated. We note that the probability to generate one of the exceptions from one individual with 14n<|x|1<k is at most
1nn-|x|1+1n|x|1+1nn-k+1-|x|1<1nn-k+1+1n14n+1nn-2k+21n14n+2nn-2k+2,
where the last inequality uses k1. Then with probability at least
1-n-n/4-2n-(n-2k+2)n21-n-n/4+2-2n-n+2k,
the population consists of only one individual with 14n<|x|1<k in n2 iterations. Then from the similar calculation in (3) except changing n-2k+3 to 1, we know that it takes an expected time of at most en iterations to generate such an x'. Via Markov's inequality, we know that with probability at least 1-enn2=1-en, such an x' will be generated in n2 iterations. Hence, with probability at least 1-n-n/4+2-2n-n+2k-en-1, an individual x' with k|x'|1n-k is generated earlier than 1n or 0n.
From this point on, we will first show that with probability at least 1-8e(lnn+1)nk-3, all (a,2k+n-a), a[2k..n], in the Pareto front are found before 0n or 1n is generated. It is not difficult to see that any search point y in either of the two gaps, that is, with |y|1[1..k-1][n-k+1..n-1], has both objective values less than any search point z with |z|1[k..n-k], and thus cannot enter into the population. Therefore, the probability to generate 0n or 1n from some parent x with |x|1[k..n-k] is at most
1n|x|11-1nn-|x|1+1nn-|x|11-1n|x|12nk.
(4)
From Lemma 10, we know that all (a,2k+n-a), a[2k..n], in the Pareto front will be found in an expected number of at most 2en(n-2k+3)(lnn2+1) iterations. Via Markov's inequality, we know that with probability at least 1-2en(n-2k+3)(lnn2+1)2en3(lnn+1)1-1n, all (a,2k+n-a), a[2k..n], in the Pareto front will be found in 2en3(lnn+1) iterations. Thus, with (4), we know that the probability that, during all 2en3(lnn+1) iterations, 0n or 1n is not generated is at least
1-2nk2en3(lnn+1)1-4en3(lnn+1)nk=1-4e(lnn+1)nk-3.
Now all (a,2k+n-a), a[2k..n], in the Pareto front are found and the current population has size n-2k+1. From this time on, we compute the probability to generate the remaining Pareto optimal solution 0n and 1n as
i=kn-k1n-2k+11ni1-1nn-i+1nn-i1-1ni=2(n-2k+1)nk1-1nn-k1+j=1n-2k1nj1-1n-j2(n-2k+1)nk1-1nn-k1+1n-11-1n-1=1+1n-22(n-2k+1)nk1-1nn-k=n-1n-22(n-2k+1)nk1-1nn-k.
Then the expected time to find 0n or 1n is at least 12(n-2k+1)nk1-1nk-nn-2n-1. Without loss of generality, 0n is found. Now the current population size is n-2k+2, and the probability to find the remaining 1n is
1n-2k+2·1nn+i=kn-k1n-2k+2·1ni1-1nn-i1(n-2k+2)nk1-1nn-k1nn-k1-1n-n+k+1+1n-21(n-2k+2)nk1-1nn-k1+2n-2=1(n-2k+2)nk1-1nn-knn-2
Then we need at least (n-2k+2)nk1-1nk-nn-2n additional iterations in expectation to cover the whole Pareto front.
In summary, together with the above discussed probability for the initial individual and the probability that all (a,2k+n-a), a[2k..n], in the Pareto front are found before 0n or 1n is generated, we obtain that the lower bound to cover the Pareto front is
1-1nn/4-2-2nn-2k-en-1n-4e(lnn+1)nk-3·32(n-2k+1)nk1-1nk-nn-2n32(n-2k+1)nk1-1nk-n1-1nn/4-2-2nn-2k-e+3n-4e(lnn+1)nk-332e(n-2k+1)nknk(n-1)k1-1nn/4-2-2nn-2k-e+3n-4e(lnn+1)nk-3.

In the previous section, we have shown that the GSEMO can optimize our multimodal optimization problem, but similar to the single-objective world (say, the optimization of Jump functions via simple evolutionary algorithms, Droste et al., 2002), the runtime increases significantly with the distance k that a solution on the Pareto front can have from all other solutions on the front. As has been discussed in Doerr et al. (2017), increasing the mutation rate can improve the time it takes to jump over such gaps. However, this work also showed that a deviation from the optimal mutation rate can be costly: A small constant-factor deviation from the optimal rate k/n leads to a performance loss exponential in k. For this reason, a heavy-tailed mutation operator was proposed. Compared to using the optimal (usually unknown) rate, it loses only a small polynomial factor (in k) in performance.

We now equip the GSEMO with the heavy-tailed mutation operator from Doerr et al. (2017) and observe similar advantages. We start by introducing the heavy-tailed mutation operator and giving a first non-asymptotic estimate for the probabilities to flip a certain number of bits.

7.1 Heavy-Tailed Mutation Operator

The following capped power-law distribution will be used in the definition of the heavy-tailed mutation operator.

Definition 14 (Power-law distribution Dn/2β):
Let nN2 and β>1. Let Cn/2β:=i=1n/2i-β. We say a random variable ξ follows the power-low Dn/2β, written as ξDn/2β, if for all α[1..n/2], we have
Pr[ξ=α]=Cn/2β-1α-β.

The heavy-tailed mutation operator proposed by Doerr et al. (2017), in the remainder denoted by mutβ(·), in each application independently samples a number α from the power-law distribution Dn/2β and then uses standard bit-wise mutation with mutation rate α/n, that is, flips each bit independently with probability α/n. Doerr et al. (2017) shows the two following properties of mutβ.

Lemma 15 (Doerr et al., 2017):
Let x{0,1}n and ymutβ(x). Let H(x,y) denote the Hamming distance between x and y. Then we have
Piβ:=Pr[H(x,y)=i]=Cn/2β-1Θ(1), for i=1;Cn/2β-1Ω(i-β), for i=[2..n/2].

In order to obtain an understanding of the leading constants when generating a solution with a not too large Hamming distance, we prove the following estimations, which have not been shown before, and can also be applied to other previous works about the theoretical analysis of the heavy-tailed mutation. We suspect tighter estimates of the leading constants exist but leave it as interesting future work.

Lemma 16:
Let x{0,1}n and ymutβ(x). Let H(x,y) denote the Hamming distance between x and y. Then we have
Piβ:=Pr[H(x,y)=i]β-1eβ, for i=1;β-122πe82+13βi-β, for i=[2..n].

The proof of Lemma 16 will use an elementary estimate for nk when kn. Since we have not found a good reference for it and since we believe that it might be useful for other runtime analyses, we formulate it as a separate lemma and give a formal proof.

Lemma 17:

Let nN and i=[0..n]. Then nini2i!.

Proof:
Our claim holds trivially for i=0 and 1. Thus, for any n[1..3], this lemma holds. We now discuss n4 and i[2..n]. By Weierstrass product inequality (see, e.g., Doerr, 2020, Lemma 1.4.8(b)), we know
1-1n1-2n1-i-1n1-j=1i-1jn=1-(i-1)(i-1+1)2n1-i22n12.
Then we have
ni=n!(n-i)!i!=n(n-1)(n-i+1)i!=nii!n(n-1)(n-i+1)ni=nii!1-1n1-2n1-i-1nni2i!.

Now we prove Lemma 16.

Proof (Proof of Lemma 16):
We calculate
P1β=Pr[H(x,y)=1]Cn/2β-111βn11n1-1nn-11eCn/2β.
Since β>1, we know
Cn/2β=i=1n/2i-β1+1n/2x-βdx=1+(n/2)-β+1-1-β+11+1β-1=ββ-1.
(5)
Hence, P1ββ-1eβ.
For i[2..n], we have
nini2i!ni22πi(i/e)ie1/(12i)=122πiniiei-1/(12i),
where the second inequality uses Stirling's approximation. Using that jji(1-jn)n-i is unimodal and has its maximum at j=i, we compute
Piβ=Cn/2β-1j=1n/21jβnijni1-jnn-iei-1/(12i)22πiCn/2βj=1n/21jβjii1-jnn-iei-1/(12i)22πiCn/2βj=i-ii1jβjii1-jnn-iei-1/(12i)22πiCn/2β·iβj=i-iijii1-jnn-iei-1/(12i)(i+1)22πiCn/2β·iβi-iii1-i-inn-i122πCn/2βiβexpi-112iexp-ii-1exp-n-ini-i-1=122πCn/2βiβexpi-112i-ii-1-n-ini-i-1.
Now, we calculate
i-112i-ii-1-n-ini-i-1=-112i-ii-1+nin-i+i=-112i-ni-i2+ii(n-i+i)(i-1)-124-ni+ii(n-i)i/4=-124-42nn-i-1-124-42nn-n-1=-124-42n-1+1-124-422-1+1-82-13,
where the first inequality uses that ni-i2+ii0 for i<n and i-1i/4 for i2, the second inequality uses i<n, and the third inequality uses n2. Then we know
PiβCn/2β-122πiβe-82-13β-122πe82+13βi-β,
where the last inequality uses (5). Hence, this lemma is proven.

Equipping the standard GSEMO with this mutation operator mutβ gives Algorithm 3, which we call GSEMO-HTM.

graphic

7.2 GSEMO-HTM on OneJumpZeroJumpn,k

We now analyze the runtime of the GSEMO-HTM on OneJumpZeroJumpn,k. We start with an estimate for the time it takes to find the inner part of the Pareto front.

Lemma 18:

Consider the GSEMO-HTM optimizing the OneJumpZeroJumpn,k function. Let a0[2k..n]. If in a certain iteration, the point (a0,2k+n-a0) is covered by the current population, then all (a,2k+n-a), a[2k..n], in the Pareto front will be found in an expected number of at most 2P1βn(n-2k+3)(lnn2+1) iterations.

Proof:
The proof is very similar to the one of Lemma 10. As in the proof of Lemma 10, we ignore any positive effect of mutation flipping more than one bit. Note that the corresponding solution for (a,2k+n-a) in the Pareto front has a-k ones; the probability to find (a+1,2k+n-a-1) in the Pareto front, conditional on the population having one solution with function value (a,2k+n-a), is at least
1n-2k+3n-a+knP1β=(n-a+k)P1βn(n-2k+3).
Hence, the expected time to find all (a,2k+n-a) for a[a0..n] is at most
a=a0n-1n(n-2k+3)(n-a+k)P1β.
(6)
Similarly, the probability to find (a-1,2k+n-a+1) in the Pareto front conditional on the population containing a solution with function value (a,2k+n-a) is at least
1n-2k+3a-knP1β=(a-k)P1βn(n-2k+3),
and the expected time to find all (a,2k+n-a) for a[2k..a0] is at most
a=2k+1a0n(n-2k+3)(a-k)P1β.
(7)
By comparing the individual summands, the sum of (6) and (7) is at most
2a=n/2+kn-1n(n-2k+3)(n-a+k)P1β2n(n-2k+3)(lnn2+1)P1β.

We now estimate the time to find the two extreme Pareto front points after the coverage of the inner part of the Pareto front.

Lemma 19:

Consider the GSEMO-HTM optimizing the OneJumpZeroJumpn,k function. Assume that at some time, all (a,2k+n-a), a[2k..n], are covered by the current population. Then, the two missing (a,2k+n-a), a{k,n+k}, in the Pareto front will be found in an expected number of at most 32Pkβnk(n-2k+3) additional generations.

Proof:
We note that the probability to find (k,k+n) in the Pareto front from one solution with function value (2k,n), or to find (k+n,k) in the Pareto front from one solution with function value (n,2k) is at least
2n-2k+3nk-1Pkβ.
Hence, the expected time to find (k,n+k) or (k+n,k) is at most nk(n-2k+3)/(2Pkβ).
Without loss of generality, (k,n+k) is found first. Then, the probability to find (k+n,k) in the Pareto front from one solution with function value (n,2k) is at least
1n-2k+3nk-1Pkβ.
Hence, the expected additional time to find the last missing element in the Pareto front is at most nk(n-2k+3)/Pkβ. Therefore, this lemma is proven.

Now, we are ready to show the runtime for the GSEMO-HTM on OneJumpZeroJumpn,k.

Theorem 20:

The expected runtime of the GSEMO-HTM optimizing the OneJumpZeroJumpn,k function is at most (n-2k+3)O(kβ-0.5)Cn/2βnnkk(n-k)n-k.

Proof:
We first consider the time to find one (a,2k+n-a) for some a[2k..n] in the Pareto front. If the initial search point has a number of ones in [k,n-k], then (a,2k+n-a) for some a[2k..n] in the Pareto front is already found. Otherwise, we have the initial search point with at most k-1 ones or zeros. Without loss of generality, we consider the initial point with mk-1 zeros. Similar to the discussion in the proof of Theorem 12, we pessimistically add the waiting time to increase the number of zeros one by one until one individual with k zeros is found. This gives the expected number of iterations of at most
m=0k-11n-2k+3n-mnP1β-1m=0k-1n(n-2k+3)(n-m)P1βn(n-2k+3)(lnn+1)P1β.

Then from Lemma 18, we know that all (a,2k+n-a) for a[2k..n] in the Pareto front will be found in at most 2P1βn(n-2k+3)(ln(n-k)+1) iterations in expectation. After that, from Lemma 19, the points (k,n+k) and (n+k,k) enter the Pareto front in at most 2Pkβnk(n-2k+3) iterations in expectation.

Hence, the expected time to cover the Pareto front is at most
(n-2k+3)2n(lnn2+1)P1β+2Pkβnk+n(lnn+1)P1β=(n-2k+3)O(1)Cn/2βn(lnn+1)+O(kβ-0.5)Cn/2βnnkk(n-k)n-k=(n-2k+3)O(kβ-0.5)Cn/2βnnkk(n-k)n-k,
(8)
where the first equality uses Lemma 15.
To give a non-asymptotic runtime bound, by Lemma 16, for k[2..n], we know that the first line of (8) can be calculated as
(n-2k+3)2n(lnn2+1)P1β+2Pkβnk+n(lnn+1)P1β(n-2k+3)eββ-13n(lnn+1)+42πe82+13ββ-1kβn!k!(n-k)!.
Via the Stirling's approximation 2πnn+0.5e-nn!enn+0.5e-n and due to the fact that n/(n-k)2, we know that
n!k!(n-k)!enn+0.5e-n2πkk+0.5e-k2π(n-k)n-k+0.5e-(n-k)enn2πkk+0.5(n-k)n-k.
Hence, we easily obtain the following runtime for k[2..n].
Theorem 21:

Let nN and k[2..n]. Then the expected runtime of the GSEMO-HTM optimizing the OneJumpZeroJumpn,k function is at most (n-2k+3)(4e82+14β(β-1)πkβ-0.5nnkk(n-k)n-k+3eββ-1n(lnn+1)).

Comparing Theorem 12 and Theorem 20 (Theorem 21), we see that the asymptotic expected runtime of the GSEMO-HTM on OneJumpZeroJumpn,k is smaller than that of the GSEMO by a factor of around kk+0.5-β/ek.

In this section, we discuss the non-trivial question of how to adapt the stagnation-detection strategy proposed by Rajabi and Witt (2022) to multiobjective optimization, which increases the mutation rate with the time that no improvement has been found. We define a stagnation-detection variant of the GSEMO that has a slightly better asymptotic performance on OneJumpZeroJump than the GSEMO with heavy-tailed mutation. In contrast to this positive result on OneJumpZeroJump, we speculate that this algorithm may have difficulties with plateaus of constant fitness.

8.1 The Stagnation-Detection Strategy of Rajabi and Witt

Rajabi and Witt (2022) proposed the following strategy to adjust the mutation rate during the run of the (1+1) EA. We recall that the (1+1) EA is a very elementary EA working with a parent and offspring population size of one; that is, it generates in each iteration one offspring from the unique parent via mutation and accepts it if it is at least as good as the parent. The classic mutation operator for this algorithm is standard bit-wise mutation with mutation rate 1/n, that is, the offspring is generated by flipping each bit of the parent independently with probability 1/n.

The main idea of the stagnation-detection approach is as follows. Assume that the (1+1) EA for a longer time, say at least 10nlnn iterations, has not accepted any new solution. Then, with high probability, it has generated (and rejected) all Hamming neighbors of the parent. Consequently, there is no use to generate these solutions again and the algorithm should better concentrate on solutions further away from the parent. This can be achieved by increasing the mutation rate. For example, with a mutation rate of 2n the rate of Hamming neighbors produced is already significantly reduced; however, Hamming neighbors can still be generated, which is important in case we were unlucky so far and missed one of them.

More generally and more precisely, to implement this stagnation-detection approach the (1+1) EA maintains a counter (“failure counter”) that keeps track of how long the parent individual has not given rise to a better offspring. This counter determines the current mutation rate. This dependency is governed by a safety parameter R which is recommended to be at least n. Then for r=1,2, in this order the mutation rate r/n is used for
Tr:=2enrrln(nR)
(9)
iterations; the rate n/2 is used until an improvement is found, that is, the mutation rate is never increased above 1/2. When a strictly improving solution is found, the counter is reset to zero, and consequently, the mutation rate starts again at 1n.

Rajabi and Witt (2022) showed that the (1+1) EA with this strategy optimizes Jumpn,k with k=o(n) in time Ω((enk)k(1-k2n-k)) and O((enk)k). In particular, for k=o(n), a tight (apart from constant factors independent of k and n) bound of Θ((enk)k) is obtained. This is faster than the runtime of Θ(kβ-0.5(enk)k) proven by Doerr et al. (2017) for the (1+1) EA with heavy-tailed mutation with power-law exponent β>1 by a factor of kβ-0.5. For the recommended choice β=1.5, this factor is Θ(k).

8.2 Adaptation of the Stagnation-Detection Strategy to Multiobjective Optimization

The (1+1) EA being an algorithm without a real population, it is clear that certain adaptations are necessary to use the stagnation-detection strategy in multiobjective optimization.

8.2.1 Global or Individual Failure Counters

The first question is how to count the number of unsuccessful iterations. The following two obvious alternatives exist.

Individual counters: From the basic idea of the stagnation-detection strategy, the most natural solution is to equip each individual with its own counter. Whenever an individual is chosen as parent in the GSEMO, its counter is increased by one. New solutions (but see the following subsection for an important technicality of what “new” shall mean) entering the population (as well as the random initial individual) start with a counter value of zero.

A global counter: Algorithmically simpler is the approach to use only one global counter. This counter is increased in each iteration. When a new solution enters the population, the global counter is reset to zero.

We suspect that for many problems, both ways of counting give similar results. The global counter appears to be wasteful in the sense that when a new individual enters the population, parents that are contained in the population for a long time also re-start generating offspring with mutation rate 1n despite the fact that they have, with very high probability, already generated as offspring all solutions close by. On the other hand, often these “old individuals” do not generate solutions that enter the population anyway, so that an optimized choice of the mutation rate is less important.

For the OneJumpZeroJump problem, it is quite clear that this second effect is dominant. A typical run starts with some individual in the inner region of the Pareto front. In a relatively short time, the whole middle region is covered, and for this it suffices that relatively recent solutions generate a suitable Hamming neighbor as offspring. The runtime is dominated by the time to find the two extremal solutions and this will almost always happen from the closest parent in the middle region of the front (regardless of whether individual counters or a global counter is used). For this reason, we analyze in the following the simpler approach using a global counter.

8.2.2 Dealing with Indifferent Solutions

One question that becomes critical when using stagnation detection is how to deal with indifferent solutions, that is, which solution to put or keep in the population in the case that an offspring y has the same (multiobjective) fitness as an individual x already in the population. Since f(x)=f(y), we have xy and yx, that is, both solutions do an equally good job in dominating others and thus in approximating the Pareto front. In early works (e.g., Laumanns et al., 2002) proposing the SEMO algorithm, such later generated indifferent solutions do not enter the population. This is partially justified by the fact that in many of the problems regarded in these works, search points with equal fitness are fully equivalent for the future run of the algorithm. We note that our OneJumpZeroJump problem also has this property, so all results presented so far are valid regardless of how indifferent solutions are treated.

When non-equivalent search points with equal fitness exist, it is less obvious how to deal with indifferent solutions. In particular, it is clear that larger plateaus of constant fitness can be traversed much easier when a new indifferent solution is accepted, as this allows imitation of a random walk behavior on the plateau (Brockhoff et al., 2007). For that reason, and in analogy to single-objective optimization (Jansen and Wegener, 2001), it seems generally more appropriate to let a new indifferent solution enter the population, and this is also what most of the later works on the SEMO and GSEMO algorithm do (Friedrich et al., 2010, 2011; Qian et al., 2016; Li et al., 2016; Bian et al., 2018; Covantes Osuna et al., 2020).

Unfortunately, as mentioned in Section 1, it is not so clear how to handle indifferent solutions together with stagnation detection. In principle, when a new solution enters the population, the failure counter has to be reset to zero to reset the mutation rate to 1/n. Otherwise, the continued use of a high mutation rate would prohibit finding good solutions in the direct neighborhood of the new solution. However, the acceptance of indifferent solutions can also lead to unwanted resets. For the OneJumpZeroJump problem, for example, it is easy to see by mathematical means that in a typical run, it will happen very frequently that an indifferent solution is generated. If this enters the population with a reset of a global failure counter (or an individual counter), then the regular resets will prevent the counters from reaching interesting values. In a quick experiment for n=50, k=4, and a global counter, the largest counter value ever reached in this run of over 500,000,000 iterations was 5. Consequently, this SD-GSEMO was far from ever increasing the mutation rate and just imitated the classic GSEMO.

For this reason, in this work we regard the GSEMO with stagnation detection only in the variant that does not accept indifferent solutions, and we take note of the fact that thus our positive results on the stagnation-detection mechanism will not take over to problems with non-trivial plateaus of constant fitness.

8.2.3 Adjusting the Self-Adjustment

In the (1+1) EA with stagnation detection, Rajabi and Witt (2022) increased the mutation rate from rn to r+1n once the rate rn has been used for Tr iterations with Tr as defined in (9). This choice ensured that any particular target solution in Hamming distance r is found in this phase with probability at least 1-(nR)-2 (see the proof of Lemma 2 in Rajabi and Witt, 2022). Since in a run of the GSEMO with current population size |P|, each member of the population is chosen as parent only an expected number of Tr/|P| times in a time interval of length Tr, we need to adjust the parameter Tr. Not surprisingly, by taking
T˜r=2|P|enrrln(nR),
(10)
that is, roughly |P|Tr, the probability to generate any particular solution in Hamming distance r in phase r is at least
1-1-1|P|rnr1-rnn-r2|P|(en)rln(nR)/rr1-1(nR)2,
which is sufficient for this purpose. Note that the population size |P| changes only if a new solution enters the population and in this case the mutation rate is reset to 1n. Hence, the definition of T˜r, with the convention that we suppress |P| in the notation to ease reading, is unambiguous.

8.2.4 The GSEMO with Stagnation Detection: SD-GSEMO

Putting the design choices discussed so far together, we obtain the following variant of the GSEMO, called SD-GSEMO. Its pseudocode is shown in Algorithm 4.

graphic

8.3 Runtime Analysis of the SD-GSEMO on OneJumpZeroJumpn,k

We now analyze the runtime of the SD-GSEMO on the OneJumpZeroJump function class. This will show that its expected runtime is by at least a factor of Ω(k) smaller than the one of the heavy-tailed GSEMO (which was a factor of kΩ(k) smaller than the one of the standard GSEMO).

We extract one frequently used calculation into Lemma 23 to make our main proof clearer. The proof of Lemma 23 uses the following lemma.

Lemma 22 (Lemma 2.1; Rajabi and Witt, 2020):

Let m,nN and m<n. Then i=1m(eni)i<nn-m(enm)m.

Lemma 23:
Let c0, a,nN, and an2. Then
r=a+1n/21nc(r-a)i=1renii2enaa1nc-1a+12n2c-3a2.
Proof:
We compute
r=a+1n/21nc(r-a)i=1reniir=a+1n/21nc(r-a)nn-renrr2r=a+1n/2errnrnc(r-a)2ea+1a+1na+1nc+2n2-aea+2a+2na+2n2c2enaaenc-1a1-1a+1a+1+e22n2c-3a21-2a+2a+22enaa1nc-1a+12n2c-3a2,
where the first inequality uses Lemma 22.

As for all algorithms discussed in this work, the most costly part of the optimization process is finding the two extremal points of the Pareto front. Borrowing many ideas from the proof of Rajabi and Witt (2020, Theorem 3.2), we show the following estimate for the time to find these extremal points when the inner part of the Pareto front is already covered.

Lemma 24:

Consider the SD-GSEMO optimizing the OneJumpZeroJumpn,k function. Consider the first time T0 that all (a,2k+n-a), a[2k..n], are covered by the current population. Then the two possibly missing elements (a,2k+n-a), a{k,n+k}, of the Pareto front will be found in an expected number of at most (n-2k+3)(enk)k(32+(4kn+12nk)ln(nR))+2k additional generations.

Proof:
Let T' denote the first time (additional number of iterations) that one of 0n and 1n is found. Let phase r be the period when the SD-GSEMO uses the bit-wise mutation rate rn, and let Ar denote the event that either 0n or 1n is found in phase r. With the pessimistic assumption that neither 0n nor 1n is found during phase r for all r<k, we have
E[T']=r=1n/2E[T'Ar]Pr[Ar]r=kn/2E[T'Ar]Pr[Arrk].
Conditional on the event Ar, the time T' to find the first extremal point includes all time spent in phases 1 to r-1, that is, at most T˜1++T˜r-1 with T˜i defined as in (10) for |P|=n-2k+3. In phase r, each iteration has a probability of at least pr=2n-2k+3rnk1-rnn-k to find an extremal point (via choosing one of the boundary points of the inner region as parent and then flipping exactly the right k bits in it). Consequently, the time spent in phase r follows a geometric law with success probability p conditional on being at most T˜r. Since for any random variable X and any number T such that Pr[XT]>0 we have E[XXT]E[X], the expected number of iterations spent in phase r conditional on Ar is at most 1pr.
From this, and recalling from Corollary 8 that the population size is at most n-2k+3, we compute
E[T'Ak]T˜1++T˜k-1+1pkr=1k-12(n-2k+3)enrrln(nR)+2n-2k+3knk1-knn-k-12(n-2k+3)ln(nR)nn-k+1enk-1k-1+k-1+n-2k+32enkkn-2k+32enkk4kln(nR)en1+1k-1k-1+1+k-1n-2k+32enkk4kln(nR)n+1+k-1,
(11)
where the second inequality uses Lemma 22.
We observe that for rk, the probability that during phase r neither 0n nor 1n is found is at most
1-2|P|rnk1-rnn-k2|P|(en)rln(nR)/rr1(nR)4.
(12)
Hence, we have Pr[Ar]i=kr-11(nR)4(nR)-4(r-k)n-4(r-k) for r>k. Noting that the estimate E[T'Ar]i=1rT˜i holds trivially for r<n2, and that it also holds for r=n2 due to 1pr2(n-2k+3)(2e)n/2ln(nR)=T˜n2, we compute
r=k+1n/2E[T'Ar]Pr[Ar]r=k+1n/21n4(r-k)i=1r2(n-2k+3)eniiln(nR)4(n-2k+3)ln(nR)enkk1n3k+12n5k2+r=k+1n/2rn4(r-k)4(n-2k+3)ln(nR)enkk1n3k+12n5k2+r=k+1n/2n2n44(n-2k+3)ln(nR)enkk1n3k+12n5k2+1,
(13)
where we use Lemma 23 for the second inequality.
Noting Pr[Ak]1, together with (11) and (13), we have
E[T']n-2k+32enkk4kln(nR)n+1+1n3k+12n5k28ln(nR)+k.
We omit a precise statement of the very similar calculation for the time to generate the second extremal point. The only differences are that the expression 2n-2k+3 in the definition of p has to be replaced by 1n-2k+3 and that the expression 2|P| in (12) has to be replaced by 1|P|, both due to the fact that now only one parent individual can generate the desired solution via a k-bit flip. With this, we obtain that the expected additional number of iterations to generate the second extremal point is at most
(n-2k+3)enkk2kln(nR)n+1+1nk+12nk24ln(nR)+k.
With 4(1nk+12nk2)+8(12n3k+14n5k2)12nk, this lemma is proven.

Now, we are ready to show the runtime for the SD-GSEMO on the OneJumpZeroJumpn,k function.

Theorem 25:
The expected runtime of the SD-GSEMO optimizing the OneJumpZeroJumpn,k function is at most
(n-2k+3)enkk32+4kn+12nkln(nR)+3e(n-2k+3)(nlnn+2(n-2)ln(nR)).
For k=o(n/ln(nR)), this bound is at most
(n-2k+3)32+o(1)enkk+3en(lnn+2ln(nR)).
Proof:

We first consider the expected time until we have all (a,2k+n-a), a[2k..n], in the current Pareto front. Our main argument is the following adaptation of the analysis of this period for the standard GSEMO in the proof of Theorem 12. The main argument there was that the time spent in this period can be estimated by a sum of at most n-2 waiting times for the generation of a particular Hamming neighbor of a particular member of the population. The same argument, naturally, is valid for the SD-GSEMO, and fortunately, we can also reuse the computation of these waiting times. Consider the time such a subperiod now takes with the SD-GSEMO. If we condition on the subperiod ending before the rate is increased above the initial value of 1/n, then the expected time is at most the expected time of this subperiod for the standard GSEMO (this is the same argument as used in the proof of Lemma 24). Consequently, the total time of this period for the SD-GSEMO is at most the corresponding time for the standard GSEMO (en(n-2k+3)(lnn+1)+2en(n-2k+3)(ln(n-k)+1) as determined in the proof of Theorem 12) plus the additional time caused by subperiods using rates r/n with r2.

We prove an upper bound for this time valid uniformly for all subperiods. Let phase r be the time interval of this subperiod when the SD-GSEMO uses the bit-wise mutation rate rn. We recall that at all times the population contains an individual x such that there is a search point y that can dominate at least one individual in the current population and that y is a Hamming neighbor of x, that is, can be generated from x by flipping a single bit. Hence for all r1, the probability that this y is not found in phase r is at most
1-1|P|1-rnn-1rn2|P|(en)rln(nR)/rr1(nR)2.
Thus with the probability at least 1-1(nR)2, the desired search point y will be found in phase r. Hence, if y is not found in phase r=1, then analogous to (13) in the proof of Lemma 24, also noting that the estimate E[T'Ar]i=1rT˜i holds trivially for r<n2 and it also holds for r=n2 since |P|2n2|P|(2e)n/2ln(nR)=T˜n2, the expected time spent in phases from 2 to n/2 is at most
r=2n/21(nR)2(r-1)i=1r2|P|eniiln(nR)4en|P|ln(nR)32n=6e|P|ln(nR),
where we use Lemma 23 for the first inequality.
With this, the total additional runtime caused by the stagnation-detection strategy compared with the GSEMO to find all (a,2k+n-a), a[2k..n] in the Pareto front is at most (n-2)6e|P|ln(nR). Therefore, together with the runtime for GSEMO to find all (a,2k+n-a), a[2k..n], in the Pareto front in the proof of Theorem 12, we know the expected time when all (a,2k+n-a), a[2k..n], are contained in the current Pareto front is at most
en(n-2k+3)(lnn+1)+2en(n-2k+3)(ln(n-k)+1)+(n-2)6e(n-2k+3)ln(nR))=3e(n-2k+3)(nlnn+2(n-2)ln(nR)).

Together with the time to find the remaining two elements in the Pareto front in Lemma 24, this theorem is proven.

Assume that, as suggested in Rajabi and Witt (2020), the control parameter R is set to n. Then the dominating element of the upper bound in Theorem 25 becomes (n-2k+3)(enk)k(32+8(kn+3nk)lnn). Hence, if k=O(nlnn), then the runtime of SD-GSEMO on OneJumpZeroJumpn,k is O((n-2k)(enk)k).

To understand the performance of the algorithms discussed in this work for concrete problem sizes (for which an asymptotic mathematical analysis cannot give definite answers), we now conduct a simple experimental analysis. Since the SEMO cannot find the Pareto front, we did not include it in this investigation. We did include the variant of the SD-GSEMO, denoted by SD-GSEMO-Ind, in which each individual has its own failure counter (see the discussion in Section 8). Our experimental settings are the same for all algorithms.

  • OneJumpZeroJumpn,k: jump size k=4 and problem size n=10,14,,50.

  • Maximal iterations for the termination: nk+3. This number of iterations was reached in none of the experiments, i.e., the results reported are the true runtimes until the full Pareto front was found.

  • β=1.5 as suggested in Doerr et al. (2017) for the power-law distribution in GSEMO-HTM.

  • R=n for SD-GSEMO and SD-GSEMO-Ind as suggested in Rajabi and Witt (2020).

  • 20 independent runs for each setting.

Figure 4 shows the median number of function evaluations of GSEMO, GSEMO-HTM, SD-GSEMO, and SD-GSEMO-Ind on the OneJumpZeroJumpn,k function. To see how the experimental results compare with our bounds, we also plot (i) the curve 1.5e(n-2k)nk corresponding to the bounds for the GSEMO in Theorems 12 and 13, and (ii) the curve (n-2k)(en)k/kk-1 for the GSEMO-HTM with β=1.5; since the leading constant in Theorem 20 is implicit, we chose a constant such that the curve matches the experimental data. We observe that for n18, Theorem 21 have given an upper bound of 4e82+14β(β-1)π6×1011 for the leading constant, which is far larger than 1 and indicates that further improvements of the leading constant estimate are possible, and (iii) the curve 1.5(n-2k)(en)k/kk corresponding to the upper bound of SD-GSEMO with R=n in Theorem 25.
Figure 4:

The median number of function evaluations (with the first and third quartiles) of GSEMO, GSEMO-HTM, SD-GSEMO, and SD-GSEMO-Ind on OneJumpZeroJumpn,k with k=4 and n=10:4:50 in 20 independent runs.

Figure 4:

The median number of function evaluations (with the first and third quartiles) of GSEMO, GSEMO-HTM, SD-GSEMO, and SD-GSEMO-Ind on OneJumpZeroJumpn,k with k=4 and n=10:4:50 in 20 independent runs.

Close modal

We clearly see that these curves, in terms of shape and, where known, in terms of leading constants, match well the estimates of our theoretical runtime results. We also see, as predicted by informal considerations, the similarity of the performance of the SD-GSEMO and the SD-GSEMO-Ind. Finally, our experiments show that the different runtime behaviors are already visible for moderate (and thus realistic) problem sizes and not only in the asymptotic sense in which they were proven. In particular, we observe a performance improvement by a factor of (roughly) 5 through the use of heavy-tailed mutation and by a factor of (roughly) 10 with the stagnation-detection strategy.

Noting that, in contrast to single-objective evolutionary computation, no broadly accepted multimodal benchmarks exist in the theory of MOEAs, we proposed the OneJumpZeroJump benchmark, which is a multiobjective analogue of the single-objective jump problem.

We use this benchmark to show that several insights from single-objective EA theory extend to the multiobjective world. Naturally, algorithms using 1-bit mutation such as the SEMO cannot optimize our benchmark, that is, cannot compute a population that covers the full Pareto front. In contrast, for all problem sizes n and jump sizes k[4..n2-1], the GSEMO covers the Pareto front in Θ((n-2k)nk) iterations in expectation. Heavy-tailed mutation rates and the stagnation-detection mechanism of Rajabi and Witt (2022) give a runtime improvement over the standard GSEMO by a factor of kΩ(k), with the stagnation-detection GSEMO slightly ahead by a small polynomial factor in k. These are the same gains as observed for the single-objective Jump benchmark with gap size k. Our experiments confirm this performance ranking already for moderate problem sizes, with the stagnation-detection GSEMO slightly further ahead than what the asymptotically small advantage suggests. On the downside, adapting the stagnation-detection mechanism to MOEAs requires making several design choices, among which the question of how to treat indifferent solutions could be difficult for problems having larger plateaus of constant fitness.

Overall, this work suggests that the recently developed ideas to cope with multimodality in single-objective evolutionary optimization can be effective in multiobjective optimization as well. In this first work in this direction, we only concentrated on mutation-based algorithms. The theory of evolutionary computation has also observed that crossover and estimation-of-distribution algorithms can be helpful in multimodal optimization. Investigating to what degree these results extend into multiobjective optimization is clearly an interesting direction for future research.

Also, we only covered very simple MOEAs in this work. Analyzing more complex MOEAs amenable to mathematical runtime anlyses, such as the decomposition-based MOEA/D (Zhang and Li, 2007; Li et al., 2016) or the NSGA-II (Deb et al., 2002; Zheng et al., 2022) would be highly interesting. For the MOEA/D, this would most likely require an adaptation of our benchmark problem. Since the difficult-to-find extremal points of the front are just the solutions of the single-objective sub-problems, and thus the two problems that naturally are part of the set of subproblems regarded by the MOEA/D, this algorithm might have an unfair advantage on the OneJumpZeroJump problem.

Work conducted after this research: After the publication of Doerr and Zheng (2021), the following works have used the OneJumpZeroJump benchmark. In Doerr and Qu (2022), the performance of the NSGA-II, both with standard-bit mutation and with heavy-tailed mutation, on the OneJumpZeroJump benchmark was analyzed. When the population size N is at least four times the size n-2k+3 of the Pareto front and 2<kn/4, then the standard NSGA-II finds the Pareto front in time O(Nnk). Hence the NSGA-II and the GSEMO satisfy the same asymptotic runtime guarantees when N is chosen asymptotically optimal (that is, N=Θ(n-2k)). With heavy-tailed mutation, again a kΩ(k) improvement is obtained. In this work, an NSGA-II was regarded that does not use crossover, but it is clear that the same upper bounds could have been shown when crossover with rate less than one was used. Doerr and Qu (2023b) show that the upper bound of O(Nnk) for the NSGA-II with standard-bit mutation is asymptotically tight in many cases: When N=o(n2/k2), the runtime is Ω(Nnk). Consequently, in this regime the NSGA-II does not profit from larger population sizes (not even when regarding the parallel runtime, that is, the number of iterations). Doerr and Qu (2023a) prove that uniform crossover can give super-constant speed-ups for the NSGA-II optimizing OneJumpZeroJump functions. This result is not totally surprising given that Dang et al. (2018) have shown that crossover speeds up the (μ+1) EA on single-objective jump functions, but the reason for the speed-ups shown in Doerr and Qu (2023a) is different (and, in fact, can be translated to different, and sometimes stronger, speed-ups for the (μ+1) EA on jump functions). These recent results show that a jump-like benchmark, as proposed in this work, is useful in multiobjective evolutionary computation.

This work was supported by a public grant as part of the Investissement d'avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH.

This work was also supported by the Science, Technology and Innovation Commission of Shenzhen Municipality (Grant No. GXWD20220818191018001), Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110177), Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Enterpreneurial Teams (Grant No. 2017ZT07X386), and Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531).

Antipov
,
D.
,
Buzdalov
,
M.
, and
Doerr
,
B.
(
2021
).
Lazy parameter tuning and control: choosing all parameters randomly from a power-law distribution
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
1115
1123
.
Antipov
,
D.
,
Buzdalov
,
M.
, and
Doerr
,
B.
(
2022
).
Fast mutation in crossover-based algorithms
.
Algorithmica
,
84
:
1724
1761
.
Antipov
,
D.
, and
Doerr
,
B.
(
2020
).
Runtime analysis of a heavy-tailed (1+(λ,λ)) genetic algorithm on jump functions
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature, Part II
, pp.
545
559
.
Antipov
,
D.
,
Doerr
,
B.
, and
Karavaev
,
V.
(
2022
).
A rigorous runtime analysis of the (1+(λ,λ)) GA on jump functions
.
Algorithmica
,
84
:
1573
1602
.
Bäck
,
T.
(
1993
).
Optimal mutation rates in genetic search
. In
Proceedings of the International Conference on Genetic Algorithms
, pp.
2
8
.
Benbaki
,
R.
,
Benomar
,
Z.
, and
Doerr
,
B.
(
2021
).
A rigorous runtime analysis of the 2-MMAS ib on jump functions: Ant colony optimizers can cope well with local optima
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
4
13
.
Bian
,
C.
, and
Qian
,
C.
(
2022
).
Better running time of the non-dominated sorting genetic algorithm II (NSGA-II) by using stochastic tournament selection
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature
, pp.
428
441
.
Bian
,
C.
,
Qian
,
C.
, and
Tang
,
K.
(
2018
).
A general approach to running time analysis of multi-objective evolutionary algorithms
. In
Proceedings of the International Joint Conference on Artificial Intelligence
, pp.
1405
1411
.
Brockhoff
,
D.
,
Friedrich
,
T.
,
Hebbinghaus
,
N.
,
Klein
,
C.
,
Neumann
,
F.
, and
Zitzler
,
E.
(
2007
).
Do additional objectives make a problem harder?
In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
765
772
.
Brockhoff
,
D.
,
Friedrich
,
T.
, and
Neumann
,
F.
(
2008
).
Analyzing hypervolume indicator based algorithms
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature
, pp.
651
660
.
Corus
,
D.
,
Oliveto
,
P. S.
, and
Yazdani
,
D.
(
2021
).
Automatic adaptation of hypermutation rates for multimodal optimisation
. In
Proceedings of Foundations of Genetic Algorithms
, pp.
4:1
4:12
.
Covantes Osuna
,
E.
,
Gao
,
W.
,
Neumann
,
F.
, and
Sudholt
,
D.
(
2020
).
Design and analysis of diversity-based parent selection schemes for speeding up evolutionary multi-objective optimisation
.
Theoretical Computer Science
,
832
:
123
142
.
Dang
,
D.
,
Eremeev
,
A. V.
,
Lehre
,
P. K.
, and
Qin
,
X.
(
2022
).
Fast non-elitist evolutionary algorithms with power-law ranking selection
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
1372
1380
.
Dang
,
D.-C.
,
Friedrich
,
T.
,
Kötzing
,
T.
,
Krejca
,
M. S.
,
Lehre
,
P. K.
,
Oliveto
,
P. S.
,
Sudholt
,
D.
, and
Sutton
,
A. M.
(
2018
).
Escaping local optima using crossover with emergent diversity
.
IEEE Transactions on Evolutionary Computation
,
22
:
484
497
.
Deb
,
K.
,
Pratap
,
A.
,
Agarwal
,
S.
, and
Meyarivan
,
T.
(
2002
).
A fast and elitist multiobjective genetic algorithm: NSGA-II
.
IEEE Transactions on Evolutionary Computation
,
6
:
182
197
.
Doerr
,
B.
(
2020
).
Probabilistic tools for the analysis of randomized optimization heuristics
. In
B.
Doerr
and
F.
Neumann
(Eds.)
,
Theory of evolutionary computation: Recent developments in discrete optimization
, pp.
1
87
.
Springer
. https://arxiv.org/abs/1801.06733
Doerr
,
B.
(
2021
).
The runtime of the compact genetic algorithm on Jump functions
.
Algorithmica
,
83
:
3059
3107
.
Doerr
,
B.
, and
Doerr
,
C.
(
2020
).
Theory of parameter control for discrete black-box optimization: Provable performance gains through dynamic parameter choices
. In
B.
Doerr
and
F.
Neumann
(Eds.)
,
Theory of evolutionary computation: Recent developments in discrete optimization
, pp.
271
321
.
Springer
. https://arxiv.org/abs/1804.05650
Doerr
,
B.
,
Gao
,
W.
, and
Neumann
,
F.
(
2016
).
Runtime analysis of evolutionary diversity maximization for OneMinMax
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
557
564
.
Doerr
,
B.
,
Hadri
,
O. E.
, and
Pinard
,
A.
(
2022
).
The (1+(λ,λ)) global SEMO algorithm
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
520
528
.
Doerr
,
B.
,
Jansen
,
T.
, and
Klein
,
C.
(
2008
).
Comparing global and local mutations on bit strings
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
929
936
.
Doerr
,
B.
,
Kodric
,
B.
, and
Voigt
,
M.
(
2013
).
Lower bounds for the runtime of a global multi-objective evolutionary algorithm
. In
Proceedings of the Congress on Evolutionary Computation
, pp.
432
439
.
Doerr
,
B.
,
Le
,
H. P.
,
Makhmara
,
R.
, and
Nguyen
,
T. D.
(
2017
).
Fast genetic algorithms
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
777
784
.
Doerr
,
B.
, and
Qu
,
Z.
(
2022
).
A first runtime analysis of the NSGA-II on a multimodal problem
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature
, pp.
399
412
.
Doerr
,
B.
, and
Qu
,
Z.
(
2023a
).
The first mathematical proof that crossover gives super-constant performance gains for the NSGA-II
. In
Proceedings of the Conference on Artificial Intelligence
(forthcoming).
Doerr
,
B.
, and
Qu
,
Z.
(
2023b
).
From understanding the population dynamics of the NSGA-II to the first proven lower bounds
. In
Proceedings of the Conference on Artificial Intelligence
(forthcoming).
Doerr
,
B.
, and
Zheng
,
W.
(
2021
).
Theoretical analyses of multi-objective evolutionary algorithms on multi-modal objectives
. In
Proceedings of the Conference on Artificial Intelligence
, pp.
12293
12301
.
Droste
,
S.
(
2004
).
Analysis of the (1+1) EA for a noisy OneMax
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
1088
1099
.
Droste
,
S.
,
Jansen
,
T.
, and
Wegener
,
I.
(
2002
).
On the analysis of the (1+1) evolutionary algorithm
.
Theoretical Computer Science
,
276
:
51
81
.
Feng
,
C.
,
Qian
,
C.
, and
Tang
,
K.
(
2019
).
Unsupervised feature selection by Pareto optimization
. In
Proceedings of the AAAI Conference on Artificial Intelligence
, pp.
3534
3541
.
Friedrich
,
T.
,
Hebbinghaus
,
N.
, and
Neumann
,
F.
(
2010
).
Plateaus can be harder in multi-objective optimization
.
Theoretical Computer Science
,
411
:
854
864
.
Friedrich
,
T.
,
Horoba
,
C.
, and
Neumann
,
F.
(
2011
).
Illustration of fairness in evolutionary multi-objective optimization
.
Theoretical Computer Science
,
412
:
1546
1556
.
Giel
,
O.
(
2003
).
Expected runtimes of a simple multi-objective evolutionary algorithm
. In
Proceedings of the Congress on Evolutionary Computation
, pp.
1918
1925
.
Giel
,
O.
, and
Lehre
,
P. K.
(
2010
).
On the effect of populations in evolutionary multi-objective optimisation
.
Evolutionary Computation
,
18
:
335
356
.
Greiner
,
G.
(
2009
).
Single- and multi-objective evolutionary algorithms for graph bisectioning
. In
Proceedings of Foundations of Genetic Algorithms
, pp.
29
38
.
Gutjahr
,
W. J.
(
2012
).
Runtime analysis of an evolutionary algorithm for stochastic multi-objective combinatorial optimization
.
Evolutionary Computation
,
20
:
395
421
.
Hasenöhrl
,
V.
, and
Sutton
,
A. M.
(
2018
).
On the runtime dynamics of the compact genetic algorithm on jump functions
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
967
974
.
Horoba
,
C.
(
2009
).
Analysis of a simple evolutionary algorithm for the multiobjective shortest path problem
. In
Proceedings of the International Workshop on Foundations of Genetic Algorithms
, pp.
113
120
.
Huang
,
Z.
, and
Zhou
,
Y.
(
2020
).
Runtime analysis of somatic contiguous hypermutation operators in MOEA/D framework
. In
Proceedings of the Conference on Artificial Intelligence
, pp.
2359
2366
.
Huang
,
Z.
,
Zhou
,
Y.
,
Chen
,
Z.
, and
He
,
X.
(
2019
).
Running time analysis of MOEA/D with crossover on discrete optimization problem
. In
Proceedings of the Conference on Artificial Intelligence
, pp.
2296
2303
.
Huang
,
Z.
,
Zhou
,
Y.
,
Luo
,
C.
, and
Lin
,
Q.
(
2021
).
A runtime analysis of typical decomposition approaches in MOEA/D framework for many-objective optimization problems
. In
Proceedings of the International Joint Conference on Artificial Intelligence
, pp.
1682
1688
.
Jägersküpper
,
J.
, and
Storch
,
T.
(
2007
).
When the plus strategy outperforms the comma strategy and when not
. In
Proceedings of Foundations of Computational Intelligence
, pp.
25
32
.
Jansen
,
T.
, and
Wegener
,
I.
(
2001
).
Evolutionary algorithms—How to cope with plateaus of constant fitness and when to reject strings of the same fitness
.
IEEE Transactions on Evolutionary Computation
,
5
:
589
599
.
Jansen
,
T.
, and
Wegener
,
I.
(
2002
).
The analysis of evolutionary algorithms—A proof that crossover really can help
.
Algorithmica
,
34
:
47
66
.
Jansen
,
T.
, and
Wegener
,
I.
(
2005
).
Real royal road functions—Where crossover provably is essential
.
Discrete Applied Mathematics
,
149
:
111
125
.
Kumar
,
R.
, and
Banerjee
,
N.
(
2006
).
Analysis of a multiobjective evolutionary algorithm on the 0–1 knapsack problem
.
Theoretical Computer Science
,
358
:
104
120
.
Laumanns
,
M.
,
Thiele
,
L.
, and
Zitzler
,
E.
(
2004a
).
Running time analysis of evolutionary algorithms on a simplified multiobjective knapsack problem
.
Natural Computing
,
3
:
37
51
.
Laumanns
,
M.
,
Thiele
,
L.
, and
Zitzler
,
E.
(
2004b
).
Running time analysis of multiobjective evolutionary algorithms on pseudo-Boolean functions
.
IEEE Transactions on Evolutionary Computation
,
8
:
170
182
.
Laumanns
,
M.
,
Thiele
,
L.
,
Zitzler
,
E.
,
Welzl
,
E.
, and
Deb
,
K.
(
2002
).
Running time analysis of multi-objective evolutionary algorithms on a simple discrete optimization problem
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature
, pp.
44
53
.
Lehre
,
P. K.
, and
Nguyen
,
P.T.H.
(
2019
).
On the limitations of the univariate marginal distribution algorithm to deception and where bivariate EDAs might help
. In
Proceedings of the International Workshop on Foundations of Genetic Algorithms
, pp.
154
168
.
Li
,
Y.-L.
,
Zhou
,
Y.-R.
,
Zhan
,
Z.-H.
, and
Zhang
,
J.
(
2016
).
A primary theoretical study on decomposition-based multiobjective evolutionary algorithms
.
IEEE Transactions on Evolutionary Computation
,
20
:
563
576
.
Liang
,
J. J.
,
Yue
,
C.
, and
Qu
,
B.
(
2016
).
Multimodal multi-objective optimization: A preliminary study
. In
Proceedings of the Congress on Evolutionary Computation
, pp.
2454
2461
.
Mühlenbein
,
H.
(
1992
).
How genetic algorithms really work: mutation and hillclimbing
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature
, pp.
15
26
.
Neumann
,
F.
(
2007
).
Expected runtimes of a simple evolutionary algorithm for the multi-objective minimum spanning tree problem
.
European Journal of Operational Research
,
181
:
1620
1629
.
Neumann
,
F.
, and
Reichel
,
J.
(
2008
).
Approximating minimum multicuts by evolutionary multi-objective algorithms
. In
Proceedings of Parallel Problem Solving from Nature
, pp.
72
81
.
Neumann
,
F.
, and
Theile
,
M.
(
2010
).
How crossover speeds up evolutionary algorithms for the multi-criteria all-pairs-shortest-path problem
. In
Proceedings of Parallel Problem Solving from Nature, Part I
, pp.
667
676
.
Preuss
,
M.
(
2015
).
Multimodal optimization by means of evolutionary algorithms
.
Springer
.
Qian
,
C.
,
Bian
,
C.
, and
Feng
,
C.
(
2020
).
Subset selection by Pareto optimization with recombination
. In
Proceedings of the AAAI Conference on Artificial Intelligence
, pp.
2408
2415
.
Qian
,
C.
,
Tang
,
K.
, and
Zhou
,
Z.-H.
(
2016
).
Selection hyper-heuristics can provably be helpful in evolutionary multi-objective optimization
. In
Proceedings of the International Conference on Parallel Problem Solving from Nature
, pp.
835
846
.
Qian
,
C.
,
Yu
,
Y.
, and
Zhou
,
Z.
(
2013
).
An analysis on recombination in multi-objective evolutionary optimization
.
Artificial Intelligence
,
204
:
99
119
.
Qian
,
C.
,
Yu
,
Y.
, and
Zhou
,
Z.
(
2015
).
On constrained Boolean Pareto optimization
. In
Proceedings of the International Joint Conference on Artificial Intelligence
, pp.
389
395
.
Qian
,
C.
,
Zhang
,
Y.
,
Tang
,
K.
, and
Yao
,
X.
(
2018
).
On multiset selection with size constraints
. In
Proceedings of the AAAI Conference on Artificial Intelligence
, pp.
1395
1402
.
Rajabi
,
A.
, and
Witt
,
C.
(
2020
).
Self-adjusting evolutionary algorithms for multimodal optimization
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
1314
1322
.
Rajabi
,
A.
, and
Witt
,
C.
(
2022
).
Self-adjusting evolutionary algorithms for multimodal optimization
.
Algorithmica
,
84
:
1694
1723
.
Roostapour
,
V.
,
Bossek
,
J.
, and
Neumann
,
F.
(
2020
).
Runtime analysis of evolutionary algorithms with biased mutation for the multi-objective minimum spanning tree problem
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
551
559
.
Roostapour
,
V.
,
Neumann
,
A.
,
Neumann
,
F.
, and
Friedrich
,
T.
(
2022
).
Pareto optimization for subset selection with dynamic cost constraints
.
Artificial Intelligence
,
302
:103597.
Rudolph
,
G.
(
1997
).
Convergence properties of evolutionary algorithms
.
Verlag Dr. Kovac
.
Tanabe
,
R.
, and
Ishibuchi
,
H.
(
2020
).
A review of evolutionary multimodal multiobjective optimization
.
IEEE Transactions on Evolutionary Computation
,
24
:
193
200
.
Thierens
,
D.
(
2003
).
Convergence time analysis for the multi-objective counting ones problem
. In
Proceedings of the International Conference on Evolutionary Multi-Criterion Optimization
, pp.
355
364
.
Wegener
,
I.
(
2001
).
Theoretical aspects of evolutionary algorithms
. In
Proceedings of the International Colloquium on Automata, Languages and Programming
, pp.
64
78
.
Witt
,
C.
(
2023
).
How majority-vote crossover and estimation-of-distribution algorithms cope with fitness valleys
.
Theoretical Computer Science
,
940
:
18
42
.
Zhang
,
Q.
, and
Li
,
H.
(
2007
).
MOEA/D: A multiobjective evolutionary algorithm based on decomposition
.
IEEE Transactions on Evolutionary Computation
,
11
:
712
731
.
Zheng
,
W.
, and
Doerr
,
B.
(
2022
).
Better approximation guarantees for the NSGA-II by using the current crowding distance
. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)
, pp.
611
619
.
Zheng
,
W.
,
Liu
,
Y.
, and
Doerr
,
B.
(
2022
).
A first mathematical runtime analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II)
. In
Proceedings of the Conference on Artificial Intelligence
, pp.
10408
10416
.
Zhou
,
A.
,
Qu
,
B.-Y.
,
Li
,
H.
,
Zhao
,
S.-Z.
,
Suganthan
,
P. N.
, and
Zhang
,
Q.
(
2011
).
Multiobjective evolutionary algorithms: A survey of the state of the art
.
Swarm and Evolutionary Computation
,
1
:
32
49
.

Author notes

*A preliminary short version was published in the Proceedings of AAAI 2021 (Doerr and Zheng, 2021).

This work was partially done while Weijie Zheng was with Southern University of Science and Technology, Shenzhen, China.

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode