## Abstract

Many optimization tasks must be handled in noisy environments, where the exact evaluation of a solution cannot be obtained, only a noisy one. Evolutionary algorithms (EAs), a type of stochastic metaheuristic search algorithm, have been widely and successfully applied to the optimization of noisy tasks. Previous work mainly focuses on the empirical study and design of EAs for noisy optimization, while the theoretical understanding is largely insufficient. In this study, we first investigate how noisy fitness can affect the running time of EAs. Two kinds of noise-helpful problems are identified, on which EAs run faster in the presence of noise, and thus the noise should not be handled. Second, on a representative noise-harmful problem, in which the noise has a strong negative effect, we examine two commonly employed mechanisms for dealing with noise in EAs: *reevaluation* and *threshold selection*. The analysis discloses that using these two strategies together is effective under one-bit noise but ineffective under asymmetric one-bit noise. We then propose *smooth threshold selection*, which is proved to be an effective strategy that further improves the noise tolerance on this problem. Finally, we complement the theoretical analysis with experiments on both synthetic problems and two combinatorial problems, the minimum spanning tree and the maximum matching. The experimental results agree with the theoretical findings and also show that the proposed smooth threshold selection handles the noise better.

## 1 Introduction

Optimization tasks often encounter noisy environments. For example, in industrial design such as VLSI design (Guo et al., 2014), every prototype is evaluated by simulations; therefore, the result of the evaluation may not be perfect due to the simulation error. Also, with machine learning, a prediction model is evaluated only on a limited amount of data (Qian et al., 2015a); therefore, the estimated performance is shifted from the true performance. It is possible that noisy environments change the properties of an optimization problem; thus traditional optimization techniques may have low efficacy. Meanwhile, evolutionary algorithms (EAs) (Bäck, 1996) have been widely and successfully adopted for noisy optimization tasks (Freitas, 2003; Ma et al., 2006; Chang and Chen, 2006).

EAs are a type of randomized metaheuristic optimization algorithm, inspired by natural phenomena including the evolution of species, swarm cooperation, and immune systems. EAs typically involve a cycle of three stages: a reproduction stage that produces new solutions based on the currently maintained solutions; an evaluation stage that evaluates the newly generated solutions; and a selection stage that weeds out bad solutions. The rationale for applying EAs to noisy optimization is that the natural processes they imitate have coped successfully with noisy natural environments, and hence their algorithmic counterparts are also likely to be able to handle noise.

On one hand, it is believed that noise makes optimization harder, and thus noise-handling mechanisms have been proposed to reduce the negative effect of noise (Fitzpatrick and Grefenstette, 1988; Beyer, 2000; Arnold and Beyer, 2003). Two representative strategies are *reevaluation* and *threshold selection*. Under the reevaluation strategy (Jin and Branke, 2005; Goh and Tan, 2007; Doerr et al., 2012a), whenever the fitness (also called the cost or objective value) of a solution is required, EAs make an independent evaluation of the solution regardless of whether it has been evaluated before, such that the noise is smoothed. Under the threshold selection strategy (Markon et al., 2001; Bartz-Beielstein and Markon, 2002; Bartz-Beielstein, 2005a), in the selection stage EAs accept a newly generated solution only if its fitness is larger than the fitness of the old solution by at least a threshold value, such that the risk of accepting a bad solution due to noise is reduced.
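To make the two strategies concrete, the following minimal Python sketch (our own illustration; the function names, parameters, and the use of one-bit noise are assumptions, not code from the works cited above) shows how reevaluation and threshold selection plug into the selection step of a simple EA:

```python
import random

def noisy_eval(f, x, p_noise):
    """One-bit-style noisy evaluation: with probability p_noise, return the
    fitness of a copy of x with one uniformly chosen bit flipped."""
    if random.random() < p_noise:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def select(parent, parent_fitness, offspring, f, p_noise, reevaluate, threshold):
    """One noisy selection step combining the two strategies.

    reevaluate: if True, the parent's noisy fitness is re-sampled for this
    comparison instead of reusing the stored value parent_fitness.
    threshold: the offspring is accepted only if its (noisy) fitness beats
    the parent's by at least this value.
    Returns the surviving solution and its fitness value.
    """
    if reevaluate:
        parent_fitness = noisy_eval(f, parent, p_noise)
    off_fitness = noisy_eval(f, offspring, p_noise)
    if off_fitness >= parent_fitness + threshold:
        return offspring, off_fitness
    return parent, parent_fitness
```

Setting `reevaluate=False` and `threshold=0` recovers plain noisy selection; the two flags can be toggled independently, which is exactly the design space compared later in the article.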

On the other hand, several empirical observations have shown cases where noise can have a positive impact on the performance of local search (Selman et al., 1994; Hoos and Stützle, 2000; 2005), which indicates that noise does not always have a negative impact.

As these previous studies are mainly empirical, theoretical analysis is needed for a better understanding of evolutionary optimization in noisy environments.

### 1.1 Related Work

Despite EAs’ wide and successful application, the theoretical analysis of EAs on noisy optimization is rare. Some theoretical results on EAs have emerged (e.g., Neumann and Witt, 2010; Auger and Doerr, 2011), but most of them focus on clean environments. In noisy environments, the optimization is more complex and more randomized; thus the theoretical analysis is difficult.

Only a few theoretical analyses of EAs for noisy optimization have been published. Gutjahr (2003; 2004) first analyzed the ant colony optimization (ACO) algorithm for stochastic combinatorial optimization and proved convergence under mild conditions. Droste (2004) gave the first running time analysis of EAs in discrete noisy optimization. He analyzed the (1+1)-EA on the OneMax problem under one-bit noise and showed the maximal noise strength allowing a polynomial running time, where the noise strength is characterized by the noise probability *p_n* and *n* is the problem size. Sudholt and Thyssen (2012) analyzed the running time of a simple ACO for stochastic shortest path problems where edge weights are subject to noise, and showed the ability and limitation of the ACO under various noise models. For the difficulty faced by an ACO under a specific noise model, Doerr et al. (2012a) further showed that the reevaluation strategy can overcome it, that is, avoid being misled by an exceptionally optimistic evaluation due to noise. Qian et al. (2014) investigated the effectiveness of sampling, a common strategy for reducing the effect of noise. They proved a sufficient condition under which sampling is useless (i.e., sampling increases the running time), and applied it to show that sampling is useless for the (1+1)-EA optimizing the OneMax and Trap problems under additive Gaussian noise.

### 1.2 Our Contribution

In this article, we study the effect of noise on EAs and investigate the noise-handling mechanisms when noise needs to be accounted for.

Table 1: Polynomial noise tolerance (PNT) of the analyzed noise-handling strategies.

| Noise-Handling Strategies | PNT |
|---|---|
| single evaluation | |
| single evaluation and threshold selection | |
| reevaluation | (Droste, 2004) |
| reevaluation and threshold selection | |
| reevaluation and threshold selection | |
| reevaluation and smooth threshold selection | |


The effect of noise on the expected running time of EAs is investigated in Section 3. On *deceptive* and *flat* problems, we prove that noise can simplify the optimization (i.e., decrease the expected running time) for EAs. The analysis results support that for some difficult problems, handling the noise is not necessary.

In Section 4.1 the OneMax problem is proved to be negatively affected by noise, and in Section 4.2 two commonly employed noise-handling mechanisms are examined: the reevaluation and the threshold selection strategies. For the (1+1)-EA under one-bit noise, the noise-handling mechanisms are evaluated by the *polynomial noise tolerance* (PNT), which is the range of the noise strength in which the expected running time of the algorithm is polynomial; the wider the PNT, the better the noise-handling mechanism. For one-bit noise, the noise strength (and thus the PNT) is characterized by the noise probability *p_n*. The configurations of the (1+1)-EA that we analyzed include no noise-handling strategy (single evaluation), single evaluation with threshold selection, and reevaluation with threshold selection (with different threshold values). Their PNTs are presented in Table 1, where the PNT of the (1+1)-EA with reevaluation (but no threshold selection) is directly derived from Droste (2004). The comparison shows the following:

- Reevaluation alone makes the PNT much worse than single evaluation.

- Threshold selection must be combined with reevaluation; otherwise, the EA cannot tolerate any noise strength larger than 0. Meanwhile, reevaluation also performs better when used with threshold selection.

- Reevaluation with a suitable threshold can improve upon the PNT of single evaluation.

In Section 4.3 we disclose a weakness of these noise-handling mechanisms: when used with the (1+1)-EA solving the OneMax problem under *asymmetric* one-bit noise, all of them are ineffective (i.e., need exponential running time) when the noise probability reaches 1. The reason for the ineffectiveness of reevaluation with threshold selection is that it has too large a probability of accepting false progress caused by the noise when the threshold is 1, and too small a probability of accepting true progress when the threshold is 2. Setting the threshold strictly between 1 and 2 is useless because the minimum fitness gap is 1 (i.e., such a threshold is equivalent to a threshold of 2). We then introduce a modification of the threshold selection strategy that turns the original hard threshold into a *smooth threshold*, which allows a fractional threshold to be effective. We prove that with the smooth threshold selection strategy the PNT covers every noise probability, that is, the (1+1)-EA is always a polynomial-time algorithm on the problem regardless of the noise probability.
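Since hard thresholds compare integer fitness gaps, a sketch helps show how a fractional threshold can act "smoothly". The rule below is one natural randomized reading of smooth threshold selection, assumed here for illustration and not necessarily the article's exact definition: a gap of at least 2 is always accepted, and a gap of exactly 1 is accepted with probability 2 − τ.

```python
import random

def smooth_threshold_accept(gap, tau):
    """Smoothed acceptance for a fractional threshold tau in [1, 2]:
    a (noisy) fitness gap >= 2 is always accepted, a gap of exactly 1 is
    accepted with probability 2 - tau, and anything smaller is rejected.
    This interpolates between the hard thresholds tau = 1 and tau = 2; it is
    an illustrative reading, not necessarily the article's exact rule."""
    if gap >= 2:
        return True
    if gap == 1:
        return random.random() < 2 - tau
    return False
```

At τ = 1 a gap of 1 is always accepted and at τ = 2 it is never accepted, so the randomized rule makes every intermediate value of τ behaviorally distinct, which is what a hard integer comparison cannot do.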

Finally, in Section 5, we describe experiments that verify and complement the theoretical results. Using two problem classes, the synthetic *Jump* problem and the *minimum spanning tree* problem, a common combinatorial problem, we show that the harm of the noise is negatively correlated with the hardness of the problem, which was previously unnoticed. Therefore, when the problem is quite hard, the noise can be helpful, and thus handling the noise is not necessary. We then verify by experiments on the *maximum matching* problem that smooth threshold selection handles the noise better. Section 6 concludes the article.

## 2 Preliminaries

### 2.1 Noisy Optimization

A general optimization problem can be represented as arg max_x f(x), where the objective *f* is also called the fitness in the context of evolutionary computation. In real-world optimization tasks, the fitness evaluation of a solution is usually disturbed by noise, and consequently we cannot obtain the exact fitness value but only a noisy one. Let F(x) and f(x) denote the noisy and the true fitness of a solution *x*, respectively. In this study, we use the following three widely investigated noise models:

- *Additive*: F(x) = f(x) + δ, where δ is uniformly selected from an interval [δ1, δ2] at random.
- *Multiplicative*: F(x) = f(x) · δ, where δ is uniformly selected from an interval [δ1, δ2] at random.
- *One-bit*: F(x) = f(x) with probability 1 − p_n; otherwise, F(x) = f(x′), where x′ is generated by flipping a uniformly randomly chosen bit of x. This noise model is for problems whose solutions are represented as binary strings.

Additive and multiplicative noise have often been used to analyze the effect of noise (Beyer, 2000; Jin and Branke, 2005). One-bit noise is specifically used for optimizing pseudo-Boolean problems over and has been investigated in the first work for analyzing the running time of EAs in noisy optimization (Droste, 2004) and used to understand the role of noise in stochastic local search (Selman et al., 1994; Hoos and Stützle, 1999; Mengshoel, 2008).
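A runnable sketch of the noise models above may be helpful (our own code; the default interval bounds and the tie-breaking details of the asymmetric model defined below are illustrative assumptions):

```python
import random

def additive(f, x, delta_lo=-1.0, delta_hi=1.0):
    """Additive noise: f(x) + delta, delta uniform on [delta_lo, delta_hi]."""
    return f(x) + random.uniform(delta_lo, delta_hi)

def multiplicative(f, x, delta_lo=0.5, delta_hi=1.5):
    """Multiplicative noise: f(x) * delta, delta uniform on [delta_lo, delta_hi]."""
    return f(x) * random.uniform(delta_lo, delta_hi)

def one_bit(f, x, p_n):
    """One-bit noise: with probability p_n, evaluate a copy of x with one
    uniformly chosen bit flipped; otherwise evaluate x itself."""
    if random.random() < p_n:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def asymmetric_one_bit(f, x, p_n):
    """Asymmetric one-bit noise: with probability p_n, choose the bit value
    (0 or 1) to disturb with probability 1/2 each, then flip a uniformly
    chosen bit of that value, so a specific 0 bit is flipped with probability
    1/(2 * #zeros) and a specific 1 bit with 1/(2 * #ones). The handling of
    the all-0s/all-1s boundary (flip a uniform bit) is our assumption."""
    if random.random() < p_n:
        y = list(x)
        zeros = [i for i, b in enumerate(y) if b == 0]
        ones = [i for i, b in enumerate(y) if b == 1]
        if not zeros or not ones:
            i = random.randrange(len(y))
        elif random.random() < 0.5:
            i = random.choice(zeros)
        else:
            i = random.choice(ones)
        y[i] = 1 - y[i]
        return f(y)
    return f(x)
```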

Besides these kinds of noise, we also consider a variant of one-bit noise called asymmetric one-bit noise (see Definition 1). Inspired by the asymmetric mutation operator (Jansen and Sudholt, 2010), asymmetric one-bit noise flips a specific bit position with a probability that depends on the number of bit positions taking the same value. When asymmetric one-bit noise flips a solution *x*, the probability of flipping a specific 0 bit is 1/(2|x|₀), and the probability of flipping a specific 1 bit is 1/(2(n − |x|₀)), where |x|₀ is the number of 0 bits of *x*. Note that for one-bit noise, the probability of flipping any specific bit is 1/n. For both one-bit and asymmetric one-bit noise, *p_n* controls the noise strength. In this article, we assume that the parameters of the environment (*p_n*, δ1, and δ2) do not change over time.

It is possible that large noise could make an optimization problem extremely hard for particular algorithms. We are thus interested in the noise strength that an algorithm can tolerate while keeping a polynomial running time. The noise strength can be measured by adjustable parameters, for instance, δ1 and δ2 for additive and multiplicative noise, and *p_n* for one-bit noise. We denote by F_θ a type of noisy fitness that disturbs the original fitness function *f* by noise with parameter θ (where θ can be a tuple, e.g., (δ1, δ2) for additive noise), and define the PNT in Definition 2, which characterizes the maximum range of the noise parameter allowing a polynomial expected running time. Note that the PNT is the empty set if the algorithm never has a polynomial expected running time for any noise strength. We study the PNT of EAs in order to analyze the effectiveness of noise-handling strategies.

**Definition 2** (Polynomial Noise Tolerance). For an algorithm A optimizing a problem *f* with a type of noise, let E[T_θ] be the expected running time of A on *f* with noise strength represented by the parameter θ. Then, the polynomial noise tolerance of A on *f* with this type of noise is the range of the noise strength in which the expected running time is polynomial in the problem size *n*, that is,

PNT = {θ | E[T_θ] is polynomial in *n*}.

### 2.2 Evolutionary Algorithms

Evolutionary algorithms (Bäck, 1996) are a type of population-based metaheuristic optimization algorithm. Although many variants exist, the common procedure for EAs can be described as follows:

1. Generate an initial set of solutions (called a population).

2. Reproduce new solutions from the current population.

3. Evaluate the newly generated solutions.

4. Update the population by removing bad solutions.

5. Repeat steps 2–4 until a specific criterion is met.

The (1+1)-EA, as in Algorithm 1, is a simple EA for maximizing pseudo-Boolean problems over {0, 1}^n, and it reflects the common structure of EAs. It maintains only one solution and repeatedly improves the current solution by bitwise mutation (i.e., step 3 of Algorithm 1). It has been widely used in the running time analysis of EAs (e.g., by He and Yao, 2001; Droste et al., 2002).

The (1+λ)-EA, as in Algorithm 2, applies an offspring population size λ. In each iteration, it first generates λ offspring solutions by independently mutating the current solution λ times, and then selects the best of the current solution and the offspring solutions as the next solution. It has been used to disclose the effect of the offspring population size by running time analysis (Jansen et al., 2005; Neumann and Wegener, 2007). Note that the (1+1)-EA is the special case of the (1+λ)-EA with λ = 1.
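The two algorithms can be sketched in a few lines of Python (our own illustration of Algorithms 1 and 2; the iteration budget and the `target` stopping value are added so the sketch terminates and are not part of the original algorithms):

```python
import random

def bitwise_mutation(x, p_m):
    """Flip each bit of x independently with probability p_m."""
    return [1 - b if random.random() < p_m else b for b in x]

def one_plus_lambda_ea(f, n, target, lam=1, p_m=None, max_iters=100_000):
    """A minimal (1+lambda)-EA sketch for maximizing a pseudo-Boolean f over
    {0,1}^n (lam=1 gives the (1+1)-EA). Runs until f reaches `target` or the
    iteration budget is spent; returns the final solution and a rough count
    of fitness evaluations. p_m defaults to the common choice 1/n."""
    p_m = 1.0 / n if p_m is None else p_m
    x = [random.randint(0, 1) for _ in range(n)]   # uniform initial solution
    fx = f(x)
    for t in range(1, max_iters + 1):
        offspring = [bitwise_mutation(x, p_m) for _ in range(lam)]
        best = max(offspring, key=f)               # best of the lam offspring
        fb = f(best)
        if fb >= fx:                               # keep the no-worse solution
            x, fx = best, fb
        if fx >= target:
            break
    return x, t * lam
```

For example, `one_plus_lambda_ea(sum, 10, target=10)` runs the (1+1)-EA on OneMax with n = 10 until the all-1s string is found.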

### 2.3 Markov Chain Modeling

We analyze EAs by modeling them as Markov chains. Here, we give some preliminaries.

EAs often generate solutions based only on their currently maintained solutions; thus they can be modeled and analyzed as Markov chains (e.g., He and Yao, 2001; Yu and Zhou, 2008; Yu et al., 2015). A Markov chain modeling an EA is constructed by taking the EA's population space as the chain's state space X. Let X* ⊆ X denote the set of all optimal populations, that is, populations containing at least one optimal solution. The goal of the EA is to reach X* from an initial population. Thus, the process of an EA seeking X* can be analyzed by studying the corresponding Markov chain with the optimal state space X*. Note that we consider a discrete state space (i.e., X is discrete) in this article.

Given a Markov chain {ξ_t} (t ≥ 0) with state space X and ξ_0 = x₀, we define the first hitting time (FHT) of the chain as a random variable τ such that τ = min{t ≥ 0 | ξ_t ∈ X*}. That is, τ is the number of steps needed to reach the optimal state space X* for the first time starting from x₀. The mathematical expectation of τ, E[τ | ξ_0 = x₀], is called the expected first hitting time (EFHT) of the chain starting from x₀. If ξ_0 is drawn from a distribution π₀, then E[τ | ξ_0 ∼ π₀] = Σ_{x₀ ∈ X} π₀(x₀) · E[τ | ξ_0 = x₀] is called the expected first hitting time of the Markov chain over the initial distribution π₀.
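As a toy illustration of the FHT and EFHT (our own example, not a chain from the article), consider a chain on {0, 1, 2, …} that moves one step toward the optimal state 0 with probability 1/2 and stays put otherwise; each down-step takes a geometric(1/2) number of attempts, so the exact EFHT from state i is 2i, which a Monte Carlo estimate reproduces:

```python
import random

def efht_estimate(start, trials=20000):
    """Monte Carlo estimate of E[tau | xi_0 = start] for a toy chain on
    {0, 1, 2, ...}: from state i > 0 the chain moves to i - 1 with
    probability 1/2 and stays put otherwise; X* = {0}. The exact EFHT
    from state i is 2i."""
    total = 0
    for _ in range(trials):
        state, steps = start, 0
        while state > 0:              # tau: first time the chain hits X* = {0}
            if random.random() < 0.5:
                state -= 1
            steps += 1
        total += steps
    return total / trials
```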

The *expected running time* starting from ξ_0 = x₀ and that starting from ξ_0 ∼ π₀ are respectively equal to

N₁ + N₂ · E[τ | ξ_0 = x₀]  and  N₁ + N₂ · E[τ | ξ_0 ∼ π₀],  (2)

where *N*_{1} and *N*_{2} are the number of fitness evaluations for the initial population and for each iteration, respectively. For example, for the (1+1)-EA, N₁ = 1 and N₂ = 1; for the (1+λ)-EA, N₁ = 1 and N₂ = λ. Note that when discussing the expected running time of an EA on a problem in this article, if the initial population is not specified, it is the expected running time starting from the uniform initial distribution π_u, that is, N₁ + N₂ · E[τ | ξ_0 ∼ π_u].

The following two lemmas on the EFHT of Markov chains (Freĭdlin, 1996) are used in the article.

Drift analysis is a commonly used tool for analyzing the EFHT of Markov chains. It was first introduced to the running time analysis of EAs by He and Yao (2001; 2004). Later, it became a popular tool in this field, and advanced variants have been proposed (e.g., Doerr et al., 2012b; Doerr and Goldberg, 2013). In this article, we use the additive version (Lemma 5). To use it, a distance function V(x) has to be constructed to measure the distance of a state *x* to the optimal state space X*; it satisfies V(x) = 0 for x ∈ X* and V(x) > 0 for x ∉ X*. Then, we need to investigate the progress on the distance to X* in each step, E[V(ξ_t) − V(ξ_{t+1}) | ξ_t]. An upper (lower) bound on the EFHT can be derived by dividing the initial distance by a lower (upper) bound on the progress.
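To make the additive drift argument concrete, here is a standard textbook application (our own illustration, not a derivation from this article): bounding the EFHT of the (1+1)-EA on OneMax with the distance function V(x) = |x|₀, the number of 0 bits.

```latex
% From a state with V(x) = j >= 1, flipping exactly one 0 bit and nothing
% else is accepted and decreases V by 1, so the per-step progress satisfies
\mathbb{E}\bigl[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x\bigr]
  \;\ge\; j \cdot \frac{1}{n}\Bigl(1 - \frac{1}{n}\Bigr)^{n-1}
  \;\ge\; \frac{1}{e n}.
% Since the initial distance is at most V(\xi_0) \le n, the additive drift
% theorem (Lemma 5) yields
\mathbb{E}[\tau \mid \xi_0] \;\le\; \frac{n}{1/(en)} \;=\; e\,n^2 .
```

The bound is looser than the true Θ(n log n) because the uniform progress lower bound 1/(en) ignores that states far from the optimum drift faster; multiplicative drift variants exploit exactly that.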

The simplified drift theorem (Oliveto and Witt, 2011; 2012), as presented in Lemma 6, was proposed to prove exponential lower bounds on the FHT of Markov chains, where *X _{t}* is usually a mapping of ξ_t. It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping toward or away from the goal state. To relax the requirement of a constant negative drift, advanced variants have been proposed, for instance, the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) and the simplified drift theorem with scaling (Oliveto and Witt, 2014; 2015). In this article, we use the original version (Lemma 6).

**Lemma 6** (Simplified Drift Theorem). Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Suppose there exists an interval [a, b] ⊆ ℝ, two constants δ, ε > 0, and (possibly depending on l := b − a) a function r(l) satisfying 1 ≤ r(l) = o(l/log(l)), such that for all t ≥ 0 the following two conditions hold: (1) E[X_{t+1} − X_t | a < X_t < b] ≥ ε; (2) P(|X_{t+1} − X_t| ≥ j | X_t > a) ≤ r(l)/(1+δ)^j for all j ∈ ℕ₀. Then there is a constant c > 0 such that for T := min{t ≥ 0 : X_t ≤ a | X_0 ≥ b} it holds P(T ≤ 2^{cl/r(l)}) = 2^{−Ω(l/r(l))}.

### 2.4 Pseudo-Boolean Functions

The pseudo-Boolean function class in Definition 7 is a large function class that only requires the solution space to be {0, 1}^n and the objective space to be ℝ. Many well-known NP-hard problems (e.g., the vertex cover problem and the 0-1 knapsack problem) belong to this class. Diverse pseudo-Boolean problems with different structures and difficulties have been used to disclose properties of EAs (e.g., Droste et al., 1998; 2002; He and Yao, 2001). We consider only maximization problems in this article. In the following, let x_i denote the *i*th bit of a solution x ∈ {0, 1}^n.

A function in the pseudo-Boolean function class is any mapping of the form f: {0, 1}^n → ℝ.

The Trap problem in Definition 8 is a special instance in this class, in which the aim is to maximize the number of 0 bits of a solution, except that the global optimum is the all-1s string (briefly denoted as 1^n). Its optimal function value is positive, while the function value of any nonoptimal solution is not larger than 0. It has been used in theoretical studies of EAs, and the expected running time of the (1+1)-EA with mutation probability 1/n has been proved to be Θ(n^n) (Droste et al., 2002). It has also been recognized as the hardest instance in the pseudo-Boolean function class with a unique global optimum for the (1+1)-EA (Qian et al., 2012); that is, the expected running time of the (1+1)-EA on the Trap problem is the largest in this class.

The Peak problem in Definition 9 assigns the same fitness to all solutions except the global optimum 1^n. It has been shown that for solving this problem, the (1+1)-EA with mutation probability 1/n needs exponential running time with an overwhelming probability (Oliveto and Witt, 2011).

The OneMax problem in Definition 10 aims to maximize the number of 1 bits of a solution. Its optimal solution is 1^n, with function value *n*. The running time of EAs on the OneMax problem has been well studied (He and Yao, 2001; Droste et al., 2002; Sudholt, 2013); in particular, the expected running time of the (1+1)-EA with mutation probability 1/n is Θ(n log n) (Droste et al., 2002). It has also been recognized as the easiest instance in the pseudo-Boolean function class with a unique global optimum for the (1+1)-EA (Qian et al., 2012).

## 3 On the Effect of Noisy Fitness

In this section, we provide two types of problems in which the noise can make the optimization easier for EAs. By easier, we mean that the EA with noise needs less expected running time than that without noise to find the optimal solution.

We analyze EAs by modeling them as Markov chains. Here, we first give some properties of Markov chains that are used in the following analysis. We define a partition of the state space of a homogeneous Markov chain based on the EFHT in Definition 11, and then define the probability of a chain jumping from one state into a state space in Definition 12. It is easy to see that the first subspace of the partition in Definition 11 is just X*, since the EFHT is 0 exactly for the optimal states.

Note that the EFHT partition is different from the fitness partition used in the fitness-level method (Wegener, 2002; Sudholt, 2013) for running time analysis of EAs, since solutions with the same fitness can have different EFHTs, and the EFHT order can be either consistent (e.g., for the (1+λ)-EA on the Trap problem, as in Lemma 14) or inconsistent (e.g., for the (1+λ)-EA on the OneMax problem, as in Lemma 23) with the fitness order.

For a Markov chain , is the probability of jumping from state *x* to state space in one step at time *t*.

Lemma 13 compares the EFHT of two Markov chains. It intuitively means that if one chain always has a larger probability of jumping into good states (i.e., states with small *j* values), it needs less time to reach the optimal state space.

To prove Lemma 13, we need the following lemma, which is proved by using the properties of majorization and Schur concavity.

Let . Because of condition 1 that *E_i* is increasing, *f* is Schur-concave (Marshall et al., 2011, Theorem A.3). Conditions 2 and 3 imply that the vector majorizes . Thus, we have , which proves the lemma.

*Proof of Lemma 13*: We prove one direction of the inequality, and the other can be proved similarly. We use Lemma 5 to derive a bound on , based on which this lemma holds.

To use Lemma 5 to analyze , we first construct a distance function as , which satisfies that and by Lemma 3.

Consider any state *x* with . Since , increases with *j* and Eq. (3) holds, by Lemma 14 we have . Thus, we have, for all , all ,

By Lemma 5, we get for all ,

### 3.1 On Deceptive Problems

Most practical EAs employ time-invariant operators; thus we can model an EA without noise as a homogeneous Markov chain, while an EA with noise, since the noise may change over time, can only be modeled as a general Markov chain. In the following analysis, we always denote them by and , respectively, and denote the EFHT partition of by .

An evolutionary process can be characterized by variation (i.e., producing new solutions) and selection (i.e., weeding out bad solutions). Denote the state spaces before and after variation by and respectively, and then the variation process is a mapping and the selection process is a mapping (e.g., for the ()-EA on any pseudo-Boolean problem, and ). Note that is just the state space of the Markov chain. Let be the state transition probability by the variation process. Let denote the optimal solution set. The considered solution set (e.g., population) may be a multiset. For two multisets , we mean that .

The theorem intuitively means that if an evolutionary process is deceptive and the optimal solution is always accepted once generated in the noisy evolutionary process, then noise will be helpful.

The two EAs with and without noise differ only in whether the fitness evaluation is disturbed by noise; thus they must have the same values of *N*_{1} and *N*_{2} in their running time Eq. (2). Then, comparing their expected running times is equivalent to comparing the EFHTs of their corresponding Markov chains.

Then, we give a concrete deceptive evolutionary process, the (1+λ)-EA optimizing the Trap problem. The Trap problem given in Definition 8 is to maximize the number of 0 bits except for the optimal solution 1^n. It is not hard to see that the EFHT only depends on the number of 0 bits of the solution. We denote the EFHT as E_j for a state whose solution has *j* 0 bits. The order of E_j is shown in Lemma 14.

For any mutation probability , it holds that

To prove Lemma 14, we need the following two lemmas. Lemma 18 (Witt, 2013) says that the offspring generated by mutating a parent solution with fewer 0 bits is more likely to have a smaller number of 0 bits. Note that we consider instead of as in their original lemma; it still holds because of symmetry. We have also restricted instead of , which leads to the strict inequality in the conclusion. Lemma 19 is very similar to Lemma 14, except that the inequalities in condition 3 and in the conclusion hold strictly.

*Proof of Lemma 14*: First, trivially holds, because and . Then, we prove inductively on *j*.

(1) *Initialization* is to prove . For , because the next solution can only be or , we have . For , because the next solution can be , or a solution with number of 0 bits, we have , where *P* denotes the probability that the next solution is . Then, . Thus, we have , where the inequality is by .

(2) *Inductive hypothesis* assumes that . Then, we consider . Let *x* and *y* be a solution with number of 0 bits and one with *K* number of 0 bits, respectively. Let *a* and *b* denote the number of 0 bits of the offspring solutions and , respectively. That is, and . For the independent mutations on *x* and *y*, we use and , respectively. Note that are independently and identically distributed (i.i.d.), and are also i.i.d. Let and . Then, from Lemma 18, we have .

Let *P*_{0} and *P*_{i} be the probabilities that, for the offspring solutions, the least number of 0 bits is 0 (i.e., ), and that the largest number of 0 bits is *i* while the least number of 0 bits is larger than 0 (i.e., ), respectively. By considering the mutation and selection behavior of the (1+λ)-EA on the Trap problem, we have . For , let and . Then, we have . To compare with , we need to show that . For , we have . For , we similarly have . Thus, , where the last equality is by letting .

Since , it is easy to verify that is increasing. Then, we have by . Thus, Eq. (9) holds.

The inequality is by applying Lemma 19 to the formula in . The three conditions of Lemma 19 can be easily verified, because by the inductive hypothesis; ; and Eq. (9) holds. Because , we have .

(3) *Conclusion*. According to steps (1) and (2), the lemma holds.

Either additive noise with or multiplicative noise with makes the Trap problem easier for the (1+λ)-EA with mutation probability less than 0.5.

First, we show that the (1+λ)-EA optimizing the Trap problem can be modeled by a deceptive Markov chain. By Lemma 14, the EFHT partition of is , and *m* in Definition 11 is equal to *n* here.

Let and be the probabilities that, for the offspring solutions of *x*, (i.e., the least number of 0 bits is 0), and (i.e., the largest number of 0 bits is *j* while the least number of 0 bits is larger than 0), respectively. For , because only the optimal solution or the solution with the largest number of 0 bits among the parent solution and offspring solutions will be accepted, we have . This implies that Eq. (5) holds.

Second, we show that condition Eq. (6) of Theorem 16 holds. For additive noise, since , we have . For multiplicative noise, since , then and . Thus, for these two kinds of noise, we have , which implies that if the optimal solution is generated, it will always be accepted. Thus, we have , . This implies that Eq. (6) holds.

Thus, by Theorem 16, we get that the Trap problem becomes easier for the (1+λ)-EA under these two kinds of noise.

### 3.2 On Flat Problems

Besides deceptive problems, we show that noise can also make flat problems easier for EAs. We take the Peak problem given in Definition 9 as the representative problem; it has the same fitness for all solutions except the optimal solution 1^n. It provides no information about the search direction, and thus it is hard for EAs. We analyze a strict-selection variant of the (1+1)-EA optimizing the Peak problem; it is the same as the (1+1)-EA except that it employs the strict selection strategy, that is, step 4 of Algorithm 1 changes to accept the offspring only if its fitness is strictly larger. The expected running time of this variant with mutation probability 1/n on the Peak problem has been proved to have an exponential lower bound (Droste et al., 2002).

One-bit noise with *p_n* being a constant makes the Peak problem easier for the strict-selection (1+1)-EA with mutation probability 1/n, when starting from an initial solution *x* with a sufficiently large number of 0 bits.

Let and model the strict-selection (1+1)-EA with one-bit noise and without noise for maximizing the Peak problem, respectively. It is not hard to see that both the EFHT and only depend on the number of 0 bits of the solution. We denote and as and with , respectively.

For a solution *x* with , in one step, any nonoptimal offspring solution has the same fitness as the parent and will be rejected because of the strict selection strategy; only the optimal solution can be accepted, which happens with probability . Thus, we have , which leads to .

For a solution *x* with , if the generated offspring is the optimal solution , it will be accepted with probability , because only no bit flip for the noise on and no 0-bit flip for the noise on *x* can make ; otherwise, *x* will be kept, because for any with . Thus, we have , which leads to .

For a solution *x* with , if the offspring is , it will be accepted with probability , because only no bit flip for the noise on can make ; if , it will be accepted with probability , because only flipping the unique 0 bit for the noise on can make ; otherwise, *x* is kept, because for any with . Let be the probability of mutating to by bitwise mutation with . Then, we have, for , , which leads to .

For any *i* with , , where the first inequality is because and for large enough *n* and *p_n* being a constant, and the last inequality is by .

This is equivalent to , which implies that noise is helpful when starting from an initial solution *x* with a large number of 0 bits.

This theorem implies that the Peak problem becomes easier under noise when starting from an initial solution *x* with a large number of 0 bits. From the analysis, we can see that the reason for requiring a large number of 0 bits is to make much larger than , which means that the negative effect of rejecting the optimal solution due to noise can be compensated by the positive effect of accepting the solution *x* with .

For the (1+1)-EA solving the Peak problem, any offspring solution will be accepted because its fitness is never less than the fitness of the parent solution; thus the solution *x* in the evolutionary process almost performs a random walk over . In this case, we can intuitively find a similar effect of one-bit noise to that found for the strict-selection (1+1)-EA solving the Peak problem. Here, we assume that the single-evaluation strategy is used. Under one-bit noise, for any nonoptimal parent solution *x*, if , then and any offspring will be accepted; if and , then any offspring will be accepted; if and , any offspring with will be rejected because , and the optimal solution with will be rejected with probability *p_n*. Compared with the transition behavior without noise, noise only has an effect when and : the negative effect of rejecting the optimal solution, which has probability , and the positive effect of rejecting , which has probability at least . Obviously, the negative effect can be compensated by the positive effect, which implies that one-bit noise is helpful. Thus, we have the following conjecture. A rigorous analysis is not easy, and we leave it to future work; we instead verify the conjecture in the experiment section.

One-bit noise makes the Peak problem easier for the ()-EA with mutation probability .
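The random-walk intuition above can be checked in simulation. Below is a minimal sketch, not the paper's exact setup: it assumes a (1+1)-EA with bitwise mutation probability 1/n, the single-evaluation strategy, and the usual Peak definition (fitness 1 only at the all-1s string); the noise model is one-bit noise, where with probability p_n the fitness is evaluated on a copy of the solution with one uniformly chosen bit flipped.

```python
import random

def peak(x):
    # Peak: fitness 1 only at the all-1s string, 0 elsewhere
    # (an assumed standard definition; the exact formula is elided above).
    return 1.0 if all(x) else 0.0

def noisy_fitness(f, x, p_n):
    # One-bit noise: with probability p_n, evaluate f on a copy of x
    # with one uniformly chosen bit flipped.
    if random.random() < p_n:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def run_ea_on_peak(n, p_n, max_steps=100000):
    # Minimal (1+1)-EA sketch with bitwise mutation (probability 1/n) and
    # single-evaluation: the parent's noisy fitness is stored and reused.
    x = [random.randint(0, 1) for _ in range(n)]
    fx = noisy_fitness(peak, x, p_n)
    for t in range(max_steps):
        if all(x):
            return t  # optimum found
        y = [1 - b if random.random() < 1.0 / n else b for b in x]
        fy = noisy_fitness(peak, y, p_n)
        if fy >= fx:  # acceptance as in step 4 of Algorithm 1
            x, fx = y, fy
    return max_steps
```

Averaging `run_ea_on_peak` over many runs for p_n = 0 versus a constant p_n > 0 gives an empirical check of the conjecture for small n.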

## 4 On the Effect of Noise-Handling Strategies

In the previous section, we found that noise can make optimization easier for EAs when the problem presents some deceptiveness and flatness. Meanwhile, on other problems, noisy fitness evaluation can make optimization harder for EAs. For example, Droste (2004) proved that the running time of the ()-EA on the OneMax problem can increase from polynomial to exponential in the presence of noise. Thus, in this section, we investigate how well different noise-handling strategies perform when the noise is indeed harmful.

### 4.1 A Noise-Harmful Case

We consider the case that the ()-EA is used for optimizing the OneMax problem. Let and model the ()-EA with and without noise for maximizing OneMax, respectively. It is not hard to see that the EFHT only depends on . We denote as with . The order of is shown in Lemma 23.

For any mutation probability , it holds that

We prove by induction on *j*.

(1) *Initialization* is to prove , which holds, since .

(2) *Inductive hypothesis* assumes that . Then, we consider . We use a similar analysis method as in the proof of Lemma 14 to compare with .

Consider *i* (i.e., ). By considering the mutation and selection behavior of the ()-EA on the OneMax problem, we have . For , let . We have . By subtracting from , we get , where the inequality is by applying Lemma 19 to the formula in . The three conditions of Lemma 19 can be easily verified because: by the inductive hypothesis; ; and the following inequality holds. Because , we have .

(3) *Conclusion*. According to steps (1) and (2), the lemma holds.

Any noise makes the OneMax problem harder for the ()-EA with mutation probability less than 0.5.

We use Lemma 13 to prove it. By Lemma 23, the EFHT partition of is .

*x* is for *j*. For , because the solution with the least number of 0 bits among the parent solution and offspring solutions will be accepted, we have . For , because the fitness evaluation is disturbed by noise, the solution with the least number of 0 bits among the parent and offspring solutions may be rejected. Thus, we have . Then, we get , , which implies that condition Eq. (3) of Lemma 13 holds. Thus, we get , , namely, noise makes the OneMax problem harder for the ()-EA.

In the following sections, we analyze the effect of different noise-handling strategies for the ()-EA (a specific case of the ()-EA) optimizing the OneMax problem, to investigate their usefulness.

### 4.2 On Reevaluation and Threshold Selection Strategies

#### 4.2.1 Reevaluation

There are naturally two fitness evaluation options for EAs (Arnold and Beyer, 2002; Jin and Branke, 2005; Goh and Tan, 2007):

- *Single-evaluation.* We evaluate a solution once, and use the evaluated fitness for this solution in the future.
- *Reevaluation.* We access the fitness of a solution by evaluation every time.

For example, for the ()-EA in Algorithm 1, if using reevaluation, both and will be calculated and recalculated in each iteration; if using single-evaluation, only will be calculated and the previously obtained fitness will be reused. Note that the analysis in the previous section, which did not explicitly indicate the employed evaluation strategy, assumes single-evaluation.
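The difference between the two options is only whether the parent is evaluated afresh in each iteration. Below is a minimal sketch of one selection step under each option, assuming one-bit noise (with probability p_n the fitness is computed on a copy of the solution with one uniformly chosen bit flipped) and the `>=` acceptance rule of Algorithm 1; the function names are illustrative, not from the paper.

```python
import random

def noisy_eval(f, x, p_n):
    # One-bit noise: with probability p_n the fitness is computed on a copy
    # of x with one uniformly chosen bit flipped.
    if random.random() < p_n:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def step_single_evaluation(f, x, fx, y, p_n):
    # Single-evaluation: the parent's stored fitness fx is reused; only the
    # offspring y is (noisily) evaluated in this iteration.
    fy = noisy_eval(f, y, p_n)
    return (y, fy) if fy >= fx else (x, fx)

def step_reevaluation(f, x, y, p_n):
    # Reevaluation: both parent and offspring are evaluated afresh each
    # iteration, doubling the evaluation cost per step.
    fx = noisy_eval(f, x, p_n)
    fy = noisy_eval(f, y, p_n)
    return y if fy >= fx else x
```

Note the trade-off visible in the code: reevaluation spends two fitness evaluations per iteration but never lets a lucky overestimate of the parent persist, while single-evaluation stores one possibly noise-inflated value forever.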

Sudholt and Thyssen (2012), for an ACO with single-evaluation solving stochastic shortest path problems, constructed an example graph to show that exponential running time is required for approximating real shortest paths. The difficulty is because once a path is luckily evaluated to have a relatively small length due to noise, it will always be preferred and make the ACO get stuck in an inferior solution. By using reevaluation instead of single-evaluation when evaluating the best-so-far path, the ACO can easily solve the example graph (Doerr et al., 2012a). Reevaluation has also been employed for EAs solving noisy multiobjective optimization problems (e.g., Buche et al., 2002; Park and Ryu, 2011; Fieldsend and Everson, 2015).

Intuitively, reevaluation can smooth out noise and thus could be better for noisy optimization, but it also increases the fitness evaluation cost and thus the running time. Its overall usefulness has remained unclear.

In this section we compare these two options for the ()-EA solving the OneMax problem under one-bit noise to show whether reevaluation is useful. Note that for one-bit noise, *p _{n}* controls the noise strength, that is, noise becomes stronger as *p _{n}* gets larger, and it is also the parameter of the PNT. In the following analysis, let indicate any polynomial of *n*.

For the ()-EA with mutation probability solving the OneMax problem under one-bit noise, if using single-evaluation, the PNT is .

The theorem is straightforwardly derived from the following two lemmas. Lemma 26 tells us the expected running time upper bound , which implies that the expected running time is polynomial if , i.e., . Lemma 27 tells us the lower bound , which implies that the running time is superpolynomial if , i.e., . By combining these results, we get that the maximum noise strength allowing polynomial expected running time is , i.e., the PNT is .

For the ()-EA using single-evaluation with mutation probability on the OneMax problem under one-bit noise, the expected running time is upper-bounded by .

Let *L* denote the noisy fitness value of the current solution *x*. Because the ()-EA does not accept a solution with a smaller fitness (step 4 of Algorithm 1) and does not reevaluate the fitness of the current solution *x*, *L* will never decrease. By applying the fitness level technique (Wegener, 2002; Sudholt, 2013), we first analyze the expected number of steps until *L* increases when starting from (denoted by ), and then sum them up to get an upper bound on the expected number of steps until *L* reaches the maximum value *n*. For , we analyze the probability *P* that *L* increases in two steps when ; then . Note that one-bit noise can make *L* be , or , where is the number of 1 bits. When analyzing the noisy fitness of the offspring in each step, we need to first consider bitwise mutation on *x* and then one random bit flip for noise.

When , , *L* or .

(1) For , , since it is sufficient to flip one 0 bit for mutation and one 0 bit for noise in the first step, or flip one 0 bit for mutation and no bit for noise in the first step and flip one 0 bit for mutation and no bit for noise in the second step.

(2) For , since it is sufficient to flip no bit for mutation and one 0 bit for noise, or flip one 0 bit for mutation and no bit for noise in the first step.

(3) For , since it is sufficient to flip no bit for mutation and no bit or one 0 bit for noise in the first step.

When , or 1. By considering cases 2 and 3, we get the same lower bound for *P*.

When and the optimal solution has not been found, or . By considering cases 1 and 2, we get .

When , or *n*. The equality means that the optimal solution has been found. Because we are to derive an upper bound on the expected running time of finding , we can pessimistically assume that . Starting from and (i.e., the current solution has one bits and the fitness is *n*), the process will always remain in this situation before finding , and the optimal solution can be generated and accepted in one step only through flipping the unique 0 bit for mutation and no bit for noise, which happens with probability . This implies that the expected number of steps for finding the optimal solution is at most .

Thus, the total expected running time is upper-bounded by .
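As a worked example of the last step, assume bitwise mutation probability 1/n (an assumption; the concrete probability is elided above). Generating and accepting the optimal solution then requires flipping exactly the unique 0 bit during mutation and having no noise flip, which happens with probability (1/n)(1 - 1/n)^{n-1}(1 - p_n), at least (1 - p_n)/(en):

```python
import math

def one_step_success_prob(n, p_n):
    # Probability of flipping exactly the unique 0 bit under bitwise mutation
    # with probability 1/n per bit, times the probability 1 - p_n that the
    # one-bit noise does not occur. Illustrative constants; the paper's exact
    # expression is elided in the text above.
    return (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1) * (1.0 - p_n)
```

Since (1 - 1/n)^{n-1} >= 1/e, the expected waiting time of this final phase is at most en/(1 - p_n) by the geometric distribution.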

For the ()-EA using single-evaluation with mutation probability on the OneMax problem under one-bit noise, the expected running time is lower-bounded by .

Suppose that the number of 0 bits of the initial solution *x* is less than , that is, . Let *T* denote the running time of finding the optimal solution when starting from *x*. Denote by *A* the event that, in the evolutionary process, any solution with is never found. By the law of total expectation, we have

Consider a path *l* from *x* to the optimal solution , which satisfies that . Then, is the sum of the probabilities of all possible such *l*. For any such *l*, there must exist a corresponding set of paths , in which the first solutions of any path are the same as those of *l* and the *m*th solution has number of 1 bits. Let *q _{m}* denote the probability of the subpath , and let . Then, . The probability of mutating from to *y _{m}* is at least , and the acceptance probability of *y _{m}* is at least , which is reached when and . Thus, we have . Moreover, for any two different paths , it must hold that . Thus, . Because , we get .

We then divide *T* into two parts: the running time until finding a solution with for the first time (denoted by *T _{1}*), and the remaining running time for finding the optimal solution (denoted by *T _{2}*). Thus, we have . For , when finding a solution with for the first time, we consider the case that the fitness is evaluated as *n*, which happens with probability . If this happens, because of the single-evaluation strategy, the solution will always have number of 1 bits and its fitness will always be *n*. From the upper-bound analysis in Lemma 26, we know that the probability of generating and accepting the optimal solution in one step in such a situation is . Thus, , which implies that , and thus .

Because the initial solution is uniformly distributed over , we have . Thus, the expected running time of the whole process is lower-bounded by , i.e., .

Note that when , the derived lower bound would be quite loose. Thus, to fill this gap, we derive another lower bound that does not depend on *p _{n}*. From Droste et al. (2002, Lemma 23), we know that the expected running time of the ()-EA to optimize linear functions with positive weights is . Their proof idea is to analyze the expected running time until all the 0 bits of the initial solution have been flipped at least once, which is obviously a lower bound on the expected running time of finding the optimal solution . Because noise does not affect this analysis, we can directly apply their result to our setting and obtain the lower bound .

By combining the derived two lower bounds, we get that the expected running time of the whole process is lower-bounded by .
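The quantity behind the argument of Droste et al., the number of iterations until every 0 bit of the initial solution has been flipped by mutation at least once, can be estimated empirically. The sketch below assumes bitwise mutation with probability 1/n per bit; note that noise plays no role in it, which is exactly why the result transfers to the noisy setting.

```python
import random

def time_until_all_zero_bits_flipped(n, trials=200):
    # Empirically estimate the expected number of iterations until every 0 bit
    # of a uniformly random initial solution has been flipped by bitwise
    # mutation (probability 1/n per bit) at least once. This coupon-collector
    # style quantity lower-bounds the time to reach the all-1s optimum.
    total = 0
    for _ in range(trials):
        # indices of the 0 bits of a random initial solution
        unflipped = {i for i in range(n) if random.random() < 0.5}
        t = 0
        while unflipped:
            t += 1
            # each still-unflipped bit is flipped this iteration w.p. 1/n
            unflipped = {i for i in unflipped if random.random() >= 1.0 / n}
        total += t
    return total / trials
```

For growing n the estimate grows on the order of n log n, consistent with the lower bound cited above.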

We then show the PNT using reevaluation in the following theorem, which can be straightforwardly derived from Lemma 29.

For the ()-EA with mutation probability solving the OneMax problem under one-bit noise, if using reevaluation, the PNT is .

For the ()-EA using reevaluation with mutation probability on the OneMax problem under one-bit noise, the expected running time is polynomial when , and superpolynomial when .

#### 4.2.2 Threshold Selection

During the process of evolutionary optimization, most of the improvements in one generation are small. When using reevaluation, because of noisy fitness evaluation, a considerable portion of these improvements are not real: a worse solution appears to have a better fitness and then survives to replace the truly better solution, which appears to have a worse fitness. This may mislead the search direction of EAs, slow down their progress, or make them get trapped in a local optimum (see Section 4.2.1). To deal with this problem, a selection strategy for EAs handling noise was proposed (Markon et al., 2001; Bartz-Beielstein, 2005a), namely threshold selection, where an offspring solution is accepted only if its fitness is larger than that of the parent solution by at least a predefined threshold .

For example, for the ()-EA with threshold selection as in Algorithm 3, step 4 changes to be “if ” rather than “if ” in Algorithm 1. Such a strategy can reduce the risk of accepting a bad solution due to noise. Although the good local performance (i.e., the progress of one step) of EAs with threshold selection has been shown on some problems (Markon et al., 2001; Bartz-Beielstein and Markon, 2002; Bartz-Beielstein, 2005b), its usefulness for the global performance (i.e., the running time until finding the optimal solution) of EAs under noise is not yet clear.
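A one-iteration sketch of threshold selection combined with reevaluation (the setting analyzed next), assuming one-bit noise; the names and the `>= fx + tau` comparison follow the description above rather than Algorithm 3 verbatim.

```python
import random

def noisy_eval(f, x, p_n):
    # One-bit noise: with probability p_n, evaluate f on a copy of x
    # with one uniformly chosen bit flipped.
    if random.random() < p_n:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def step_threshold_selection(f, x, y, p_n, tau):
    # One iteration with reevaluation and threshold selection: both solutions
    # are reevaluated, and the offspring y replaces the parent x only if its
    # noisy fitness exceeds the parent's by at least the threshold tau.
    fx = noisy_eval(f, x, p_n)
    fy = noisy_eval(f, y, p_n)
    return y if fy >= fx + tau else x
```

A larger tau lowers the risk of accepting a noise-inflated worse solution, but also rejects genuine small improvements, which is the trade-off the running time analysis quantifies.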

In this section we analyze the running time of the ()-EA with threshold selection solving OneMax under one-bit noise to see whether threshold selection is useful. Note that the analysis here assumes reevaluation. This is because using single-evaluation and threshold selection simultaneously will lead to infinite expected running time for any noise strength , as shown in the following theorem.

For the ()-EA with mutation probability on the OneMax problem under one-b