Many optimization tasks must be handled in noisy environments, where the exact evaluation of a solution cannot be obtained, only a noisy one. For optimization of noisy tasks, evolutionary algorithms (EAs), a type of stochastic metaheuristic search algorithm, have been widely and successfully applied. Previous work mainly focuses on the empirical study and design of EAs for optimization under noisy conditions, while theoretical understanding is largely insufficient. In this study, we first investigate how noisy fitness can affect the running time of EAs. Two kinds of noise-helpful problems are identified, on which EAs run faster in the presence of noise, and thus the noise should not be handled. Second, on a representative noise-harmful problem, in which the noise has a strong negative effect, we examine two commonly employed mechanisms for dealing with noise in EAs: reevaluation and threshold selection. The analysis discloses that using these two strategies simultaneously is effective for one-bit noise but ineffective for asymmetric one-bit noise. Smooth threshold selection is then proposed, which we prove to be an effective strategy for further improving the noise tolerance on this problem. We then complement the theoretical analysis with experiments on both synthetic problems and two combinatorial problems, the minimum spanning tree and the maximum matching. The experimental results agree with the theoretical findings and also show that the proposed smooth threshold selection deals with the noise better.
Optimization tasks often encounter noisy environments. For example, in industrial design such as VLSI design (Guo et al., 2014), every prototype is evaluated by simulations, so the evaluation result may be imperfect due to simulation error. Likewise, in machine learning, a prediction model is evaluated only on a limited amount of data (Qian et al., 2015a), so the estimated performance deviates from the true performance. Noisy environments may change the properties of an optimization problem, and thus traditional optimization techniques may have low efficacy. Meanwhile, evolutionary algorithms (EAs) (Bäck, 1996) have been widely and successfully adopted for noisy optimization tasks (Freitas, 2003; Ma et al., 2006; Chang and Chen, 2006).
EAs are a type of randomized metaheuristic optimization algorithm inspired by natural phenomena, including the evolution of species, swarm cooperation, and immune systems. EAs typically involve a cycle of three stages: a reproduction stage that produces new solutions based on the currently maintained solutions; an evaluation stage that evaluates the newly generated solutions; and a selection stage that weeds out bad solutions. The rationale for using EAs in noisy optimization is that the natural phenomena they simulate have coped successfully with noisy natural environments, so the algorithmic counterparts are also likely to be able to handle noise.
On one hand, it is believed that noise makes optimization harder, and thus handling mechanisms have been proposed to reduce the negative effect of noise (Fitzpatrick and Grefenstette, 1988; Beyer, 2000; Arnold and Beyer, 2003). Two representative strategies are reevaluation and threshold selection. Under the reevaluation strategy (Jin and Branke, 2005; Goh and Tan, 2007; Doerr et al., 2012a), whenever the fitness (also called the cost or objective value) of a solution is required, the EA makes an independent evaluation of the solution, regardless of whether the solution has been evaluated before, so that the noise is smoothed out. Under the threshold selection strategy (Markon et al., 2001; Bartz-Beielstein and Markon, 2002; Bartz-Beielstein, 2005a), in the selection stage the EA accepts a newly generated solution only if its fitness is larger than that of the old solution by at least a threshold value $\tau$, so that the risk of accepting a bad solution due to noise is reduced.
On the other hand, several empirical observations have shown cases where noise can have a positive impact on the performance of local search (Selman et al., 1994; Hoos and Stützle, 2000; 2005), which indicates that noise does not always have a negative impact.
As these previous studies are mainly empirical, theoretical analysis is needed for a better understanding of evolutionary optimization in noisy environments.
1.1 Related Work
Despite EAs’ wide and successful application, the theoretical analysis of EAs on noisy optimization is rare. Some theoretical results on EAs have emerged (e.g., Neumann and Witt, 2010; Auger and Doerr, 2011), but most of them focus on clean environments. In noisy environments, the optimization is more complex and more randomized; thus the theoretical analysis is difficult.
Only a few theoretical analyses of EAs for noisy optimization have been published. Gutjahr (2003; 2004) first analyzed the ant colony optimization (ACO) algorithm for stochastic combinatorial optimization and proved convergence under mild conditions. Droste (2004) gave the first running time analysis of EAs in discrete noisy optimization. Droste analyzed the (1+1)-EA on the OneMax problem under one-bit noise and showed the maximal noise strength allowing a polynomial running time, where the noise strength is characterized by the noise probability $p_n \in [0,1]$ and n is the problem size. Sudholt and Thyssen (2012) analyzed the running time of a simple ACO for stochastic shortest path problems where edge weights are subject to noise, and showed the ability and limitation of the ACO under various noise models. Doerr et al. (2012a) further showed that the reevaluation strategy can overcome the difficulty faced by an ACO under a specific noise model, that is, it avoids being misled by an exceptionally optimistic evaluation due to noise. Qian et al. (2014) investigated the effectiveness of sampling, a common strategy to reduce the effect of noise. They proved a sufficient condition under which sampling is useless (i.e., sampling increases the running time) and applied it to show that sampling is useless for the (1+1)-EA optimizing the OneMax and Trap problems under additive Gaussian noise.
1.2 Our Contribution
In this article, we study the effect of noise on EAs and investigate the noise-handling mechanisms when noise needs to be accounted for.
Table 1: Noise-handling strategies and their PNTs.
The effect of noise on the expected running time of EAs is investigated in Section 3. On deceptive and flat problems, we prove that noise can simplify the optimization (i.e., decrease the expected running time) for EAs. The analysis results support that for some difficult problems, handling the noise is not necessary.
In Section 4.1 the OneMax problem is proved to be negatively affected by noise, and in Section 4.2 two commonly employed noise-handling mechanisms are examined: the reevaluation and the threshold selection strategies. With the (1+1)-EA under one-bit noise, the noise-handling mechanisms are evaluated by the polynomial noise tolerance (PNT), which is the range of the noise strength such that the expected running time of the algorithm is polynomial. The wider the PNT, the better the noise-handling mechanism. For one-bit noise, the noise strength (and thus the PNT) is characterized by the noise probability $p_n$. The configurations of the (1+1)-EA that we analyze include no noise-handling strategy at all (single-evaluation), single-evaluation combined with threshold selection, and reevaluation combined with threshold selection. Their PNTs are presented in Table 1, where the PNT of the (1+1)-EA with reevaluation (but no threshold selection) is directly derived from Droste (2004). The comparison shows the following:
Reevaluation alone makes the PNT much worse than single-evaluation.
Threshold selection must be combined with reevaluation; otherwise, the EA cannot tolerate any noise strength larger than 0. Meanwhile, reevaluation also becomes more effective when used with threshold selection.
Reevaluation with threshold selection (with a suitable threshold $\tau$) can improve upon the PNT of single-evaluation.
In Section 4.3 we disclose a weakness of these noise-handling mechanisms: when used with the (1+1)-EA solving the OneMax problem under asymmetric one-bit noise, all of them are ineffective (i.e., need exponential running time) when the noise probability $p_n$ reaches 1. The reason for the ineffectiveness of reevaluation with threshold selection is that it has too large a probability of accepting false progress caused by the noise when the threshold $\tau = 1$, and too small a probability of accepting true progress when $\tau = 2$. Setting $\tau$ between 1 and 2 is useless because the minimum fitness gap is 1 (i.e., any $\tau \in (1, 2)$ is equivalent to $\tau = 2$). We then introduce a modification into the threshold selection strategy that turns the original hard threshold into a smooth one, allowing a fractional threshold to be effective. We prove that with the smooth threshold selection strategy the PNT can be $[0, 1]$; that is, the (1+1)-EA is always a polynomial-time algorithm on the problem regardless of the noise probability.
Finally, in Section 5, we describe our experiments to verify and complement the theoretical results. Using two problem classes, the synthetic Jump problem and the combinatorial minimum spanning tree problem, we show that the negative effect of the noise is negatively correlated with the hardness of the problem, which had not previously been noticed. Therefore, when the problem is quite hard, the noise can be helpful, and thus handling the noise is not necessary. Then we verify by experiments on the maximum matching problem that smooth threshold selection handles the noise better. Section 6 concludes the article.
2.1 Noisy Optimization
A general optimization problem can be represented as $\arg\max_{x \in \mathcal{X}} f(x)$, where the objective f is also called the fitness in the context of evolutionary computation. In real-world optimization tasks, the fitness evaluation of a solution is usually disturbed by noise, and consequently we cannot obtain the exact fitness value, only a noisy one. Let $f^n(x)$ and $f(x)$ denote the noisy and the true fitness of a solution x, respectively. In this study, we use the following three widely investigated noise models:
Additive. $f^n(x) = f(x) + \delta$, where $\delta$ is selected from $[\delta_1, \delta_2]$ uniformly at random.
Multiplicative. $f^n(x) = f(x) \cdot \delta$, where $\delta$ is selected from $[\delta_1, \delta_2]$ uniformly at random.
One-bit. $f^n(x) = f(x)$ with probability $1 - p_n$; otherwise, $f^n(x) = f(x')$, where $x'$ is generated by flipping a uniformly randomly chosen bit of x. This noise model is for problems whose solutions are represented as binary strings.
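To make the three models concrete, here is a minimal Python sketch written directly from the definitions above; the helper names and the particular interval bounds used for the additive and multiplicative cases are our own illustrative choices, the true parameters being the interval $[\delta_1, \delta_2]$ and the probability $p_n$:

```python
import random

def one_max(x):
    # example true fitness: number of 1 bits
    return sum(x)

def additive_noise(f, x, lo=-1.0, hi=1.0):
    # f^n(x) = f(x) + delta, with delta uniform over [lo, hi]
    return f(x) + random.uniform(lo, hi)

def multiplicative_noise(f, x, lo=0.5, hi=1.5):
    # f^n(x) = f(x) * delta, with delta uniform over [lo, hi]
    return f(x) * random.uniform(lo, hi)

def one_bit_noise(f, x, p_n):
    # with probability p_n, evaluate x with one uniformly chosen bit flipped;
    # otherwise return the true fitness
    if random.random() < p_n:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)
```

For instance, under one-bit noise with $p_n = 1$ the noisy value of OneMax always differs from the true value by exactly 1.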
Additive and multiplicative noise have often been used to analyze the effect of noise (Beyer, 2000; Jin and Branke, 2005). One-bit noise is specific to pseudo-Boolean problems over $\{0,1\}^n$; it was investigated in the first running time analysis of EAs in noisy optimization (Droste, 2004) and has been used to understand the role of noise in stochastic local search (Selman et al., 1994; Hoos and Stützle, 1999; Mengshoel, 2008).
Besides these kinds of noise, we also consider a variant of one-bit noise called asymmetric one-bit noise (see Definition 1). Inspired by the asymmetric mutation operator (Jansen and Sudholt, 2010), asymmetric one-bit noise flips a specific bit position with a probability that depends on the number of bit positions taking the same value. When asymmetric one-bit noise flips a solution x, the probability of flipping a specific 0 bit is $p_n/(2|x|_0)$, and the probability of flipping a specific 1 bit is $p_n/(2(n - |x|_0))$, where $|x|_0$ is the number of 0 bits of x. Note that for one-bit noise, the probability of flipping any specific bit is $p_n/n$. For both one-bit and asymmetric one-bit noise, $p_n$ controls the noise strength. In this article, we assume that the parameters of the environment ($p_n$, $\delta_1$, and $\delta_2$) do not change over time.
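A sketch of asymmetric one-bit noise under the same conventions: noise fires with probability $p_n$; given that it fires, a value class (0 bits or 1 bits) is picked with probability 1/2 each and a uniform position inside that class is flipped, so a specific 0 bit is flipped with probability $p_n/(2|x|_0)$. The fallback for all-0 and all-1 strings is our assumption:

```python
import random

def asymmetric_one_bit_noise(f, x, p_n):
    # with probability 1 - p_n, return the true fitness
    if random.random() >= p_n:
        return f(x)
    zeros = [i for i, b in enumerate(x) if b == 0]
    ones = [i for i, b in enumerate(x) if b == 1]
    # pick the value class with probability 1/2 each; if one class is empty,
    # fall back to the other (an assumption for the boundary strings)
    if not zeros:
        pool = ones
    elif not ones:
        pool = zeros
    else:
        pool = zeros if random.random() < 0.5 else ones
    y = list(x)
    i = random.choice(pool)       # uniform position inside the chosen class
    y[i] = 1 - y[i]
    return f(y)
```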
It is possible that large noise makes an optimization problem extremely hard for particular algorithms. We are thus interested in the noise strength that an algorithm can tolerate while still having a polynomial running time. The noise strength can be measured by adjustable parameters, for instance, $\delta_1$ and $\delta_2$ for additive and multiplicative noise, and $p_n$ for one-bit noise. We denote by $f^n_\theta$ a type of noisy fitness that disturbs the original fitness function f by noise with parameter $\theta$ (where $\theta$ can be a tuple, e.g., $(\delta_1, \delta_2)$ for additive noise), and define the PNT in Definition 2, which characterizes the maximum range of the noise parameter allowing a polynomial expected running time. Note that the PNT is empty if the algorithm never has a polynomial expected running time for any noise strength. We study the PNT of EAs in order to analyze the effectiveness of noise-handling strategies.
2.2 Evolutionary Algorithms
Evolutionary algorithms (Bäck, 1996) are a type of population-based metaheuristic optimization algorithm. Although many variants exist, the common procedure for EAs can be described as follows:
Generate an initial set of solutions (called a population).
Reproduce new solutions from the current population.
Evaluate the newly generated solutions.
Update the population by removing the bad solutions.
Repeat steps 2–4 until a specific criterion is met.
The (1+1)-EA, as in Algorithm 1, is a simple EA for maximizing pseudo-Boolean problems over $\{0,1\}^n$ that reflects the common structure of EAs. It maintains only one solution and repeatedly improves the current solution by bitwise mutation (i.e., step 3 of Algorithm 1). It has been widely used for the running time analysis of EAs (e.g., by He and Yao, 2001; Droste et al., 2002).
The (1+$\lambda$)-EA, as in Algorithm 2, applies an offspring population size $\lambda$. In each iteration, it first generates $\lambda$ offspring solutions by independently mutating the current solution $\lambda$ times, and then selects the best of the current solution and the offspring solutions as the next solution. It has been used to disclose the effect of the offspring population size by running time analysis (Jansen et al., 2005; Neumann and Wegener, 2007). Note that the (1+1)-EA is a special case of the (1+$\lambda$)-EA with $\lambda = 1$.
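In the spirit of Algorithms 1 and 2 (which are not reproduced here), the following Python mock-up sketches the (1+$\lambda$)-EA with bitwise mutation; all names are ours, the default mutation probability $1/n$ matches the setting analyzed later, and the early-exit test assumes a OneMax-style optimum of value n:

```python
import random

def one_plus_lambda_ea(f, n, lam=1, max_iters=100000, rng=None):
    # maximize f over {0,1}^n; lam=1 recovers the (1+1)-EA
    rng = rng or random.Random()
    p_mut = 1.0 / n                                # standard mutation probability
    x = [rng.randrange(2) for _ in range(n)]       # uniform initial solution
    fx = f(x)
    for _ in range(max_iters):
        # reproduction: lam offspring by independent bitwise mutation
        best_y, best_fy = None, None
        for _ in range(lam):
            y = [b ^ (rng.random() < p_mut) for b in x]
            fy = f(y)
            if best_fy is None or fy > best_fy:
                best_y, best_fy = y, fy
        # selection: keep the better of parent and best offspring
        if best_fy >= fx:
            x, fx = best_y, best_fy
        if fx == n:                                # OneMax-style stopping test
            break
    return x, fx
```

On OneMax with small n this typically reaches the optimum within a few hundred generations.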
2.3 Markov Chain Modeling
We analyze EAs by modeling them as Markov chains. Here, we give some preliminaries.
EAs often generate solutions based only on their currently maintained solutions; thus they can be modeled and analyzed as Markov chains (e.g., He and Yao, 2001; Yu and Zhou, 2008; Yu et al., 2015). A Markov chain modeling an EA is constructed by taking the EA's population space $\mathcal{X}$ as the chain's state space. Let $\mathcal{X}^* \subseteq \mathcal{X}$ denote the set of all optimal populations, each of which contains at least one optimal solution. The goal of the EA is to reach $\mathcal{X}^*$ from an initial population. Thus, the process of an EA seeking $\mathcal{X}^*$ can be analyzed by studying the corresponding Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ with the optimal state space $\mathcal{X}^*$. Note that we consider a discrete state space (i.e., $\mathcal{X}$ is discrete) in this article.
Given a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ and $\xi_0 = x$, we define the first hitting time (FHT) of the chain as a random variable $\tau$ such that $\tau = \min\{t \mid \xi_t \in \mathcal{X}^*\}$. That is, $\tau$ is the number of steps needed to reach the optimal state space for the first time when starting from x. The mathematical expectation of $\tau$, $\mathbb{E}[\tau \mid \xi_0 = x]$, is called the expected first hitting time (EFHT) of the chain starting from x. If $\xi_0$ is drawn from a distribution $\pi_0$, then $\mathbb{E}[\tau] = \sum_{x \in \mathcal{X}} \pi_0(x)\, \mathbb{E}[\tau \mid \xi_0 = x]$ is called the EFHT of the Markov chain over the initial distribution $\pi_0$.
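The FHT and EFHT can also be estimated empirically for small chains; below is a minimal Monte-Carlo sketch (all names ours). As a hypothetical example chain we use randomized local search on OneMax, which flips one uniformly chosen bit and keeps the offspring if it is no worse:

```python
import random

def estimate_efht(step, is_optimal, init, runs=200, cap=10**6):
    # average, over independent runs, of the first t with the state optimal
    total = 0
    for seed in range(runs):
        rng = random.Random(seed)
        state, t = init(rng), 0
        while not is_optimal(state) and t < cap:
            state, t = step(state, rng), t + 1
        total += t
    return total / runs

def rls_step(x, rng):
    # randomized local search on OneMax: flip one uniform bit, keep if no worse
    y = list(x)
    i = rng.randrange(len(y))
    y[i] = 1 - y[i]
    return y if sum(y) >= sum(x) else x
```

Calling `estimate_efht(rls_step, lambda x: sum(x) == 5, lambda rng: [rng.randrange(2) for _ in range(5)])` then approximates the EFHT over the uniform initial distribution.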
The following two lemmas on the EFHT of Markov chains (Freĭdlin, 1996) are used in the article.
Drift analysis is a commonly used tool for analyzing the EFHT of Markov chains. It was first introduced to the running time analysis of EAs by He and Yao (2001; 2004). Later, it became a popular tool in this field, and advanced variants have been proposed (e.g., Doerr et al., 2012b; Doerr and Goldberg, 2013). In this article, we use the additive version (Lemma 5). To use it, a distance function $V(x)$ has to be constructed to measure the distance of a state x to the optimal state space $\mathcal{X}^*$; it satisfies $V(x) = 0$ for $x \in \mathcal{X}^*$ and $V(x) > 0$ otherwise. Then, we need to investigate the progress on the distance to $\mathcal{X}^*$ in each step, $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x]$. An upper (lower) bound of the EFHT can be derived by dividing the initial distance by a lower (upper) bound of the progress.
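For orientation, the additive drift bound described above has the following standard shape (using a distance function V and per-step progress bounds $c_l$ and $c_u$; the precise statement of Lemma 5 in this article may carry additional technical conditions):

```latex
% if for every non-optimal state x the one-step progress satisfies
%   c_l <= E[ V(xi_t) - V(xi_{t+1}) | xi_t = x ] <= c_u,  with c_l, c_u > 0,
% then the EFHT starting from xi_0 is bounded by
\frac{V(\xi_0)}{c_u} \;\le\; \mathbb{E}[\tau \mid \xi_0] \;\le\; \frac{V(\xi_0)}{c_l}.
```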
The simplified drift theorem (Oliveto and Witt, 2011; 2012), as presented in Lemma 6, was proposed to prove exponential lower bounds on the FHT of Markov chains, where $X_t$ is usually represented by a mapping of the chain's state. It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping toward or away from the goal state. To relax the requirement of a constant negative drift, advanced variants have been proposed, for instance, the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) and the simplified drift theorem with scaling (Oliveto and Witt, 2014; 2015). In this article, we use the original version (Lemma 6).
2.4 Pseudo-Boolean Functions
The pseudo-Boolean function class in Definition 7 is a large function class that only requires the solution space to be $\{0,1\}^n$ and the objective space to be $\mathbb{R}$. Many well-known NP-hard problems (e.g., the vertex cover problem and the 0-1 knapsack problem) belong to this class. Diverse pseudo-Boolean problems with different structures and difficulties have been used to disclose properties of EAs (e.g., Droste et al., 1998; 2002; He and Yao, 2001). We consider only maximization problems in this article. In the following, let $x_i$ denote the ith bit of a solution $x \in \{0,1\}^n$.
A function in the pseudo-Boolean function class can be written in the polynomial form $f(x) = \sum_{I \subseteq \{1, \ldots, n\}} c_I \prod_{i \in I} x_i$ with real weights $c_I$.
The Trap problem in Definition 8 is a special instance of this class, in which the aim is to maximize the number of 0 bits of a solution, except that the global optimum is the all-1s string (briefly denoted as $1^n$). Its optimal function value is positive, while the function value of any nonoptimal solution is not larger than 0. It has been used in theoretical studies of EAs, and the expected running time of the (1+1)-EA with mutation probability $1/n$ on it has been proved to be $\Theta(n^n)$ (Droste et al., 2002). It has also been recognized as the hardest instance in the pseudo-Boolean function class with a unique global optimum for the (1+1)-EA (Qian et al., 2012); that is, the expected running time of the (1+1)-EA on the Trap problem is the largest within the class.
The Peak problem in Definition 9 assigns the same fitness to all solutions except the global optimum $1^n$. It has been shown that for solving this problem, the (1+1)-EA with mutation probability $1/n$ needs exponential running time with an overwhelming probability (Oliveto and Witt, 2011).
The OneMax problem in Definition 10 aims to maximize the number of 1 bits of a solution. Its optimal solution is $1^n$ with function value n. The running time of EAs on the OneMax problem has been well studied (He and Yao, 2001; Droste et al., 2002; Sudholt, 2013); in particular, the expected running time of the (1+1)-EA with mutation probability $1/n$ is $\Theta(n \log n)$ (Droste et al., 2002). It has also been recognized as the easiest instance in the pseudo-Boolean function class with a unique global optimum for the (1+1)-EA (Qian et al., 2012).
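The three benchmark problems can be sketched as follows. OneMax and Peak follow directly from the definitions; the concrete Trap expression is one common formulation chosen to match the stated properties (a positive optimum at $1^n$, and nonoptimal values at most 0 that grow with the number of 0 bits), and may differ from the article's exact constants:

```python
def one_max(x):
    # number of 1 bits; optimum 1^n with value n
    return sum(x)

def peak(x):
    # same value for every solution except the optimum 1^n
    return 1 if all(b == 1 for b in x) else 0

def trap(x):
    # optimum 1^n gets a positive value; every other solution is scored by
    # the number of its 0 bits minus n, which is at most 0 and deceptive
    return 1 if all(b == 1 for b in x) else -sum(x)
```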
3 On the Effect of Noisy Fitness
In this section, we provide two types of problems in which the noise can make the optimization easier for EAs. By easier, we mean that the EA with noise needs less expected running time than that without noise to find the optimal solution.
We analyze EAs by modeling them as Markov chains. Here, we first give some properties of Markov chains that are used in the following analysis. We define a partition of the state space of a homogeneous Markov chain based on the EFHT in Definition 11, and then define the probability of a chain jumping from one state to a state space in Definition 12. It is easy to see that $\mathcal{X}_0$ in Definition 11 is just $\mathcal{X}^*$, since $\mathbb{E}[\tau \mid \xi_0 = x] = 0$ if and only if $x \in \mathcal{X}^*$.
Note that the EFHT partition is different from the fitness partition used in the fitness-level method (Wegener, 2002; Sudholt, 2013) for EAs’ running time analysis, since the solutions with the same fitness can have different EFHTs, and the EFHT order can be either consistent (e.g., the ()-EA on the Trap problem, as in Lemma 14) or inconsistent (e.g., the ()-EA on the OneMax problem, as in Lemma 23) with the fitness order.
For a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$, let $P_t(x, \mathcal{X}')$ denote the probability of jumping from state x to state space $\mathcal{X}'$ in one step at time t.
Lemma 13 compares the EFHT of two Markov chains. It intuitively means that if one chain always has a larger probability of jumping into good states (i.e., with small j values), it needs less time for reaching the optimal state space.
To prove Lemma 13, we need the following lemma, which is proved by using the property of majorization and Schur concavity.
Because of condition 1, that $E_i$ is increasing, the constructed function is Schur-concave (Marshall et al., 2011, Theorem A.3). Conditions 2 and 3 imply that the one vector majorizes the other. Thus, the claimed inequality holds, which proves the lemma.
We prove one direction of the inequality; the other direction can be proved similarly. We use Lemma 5 to derive a bound on the EFHT, from which this lemma follows.
By Lemma 5, we get for all ,
3.1 On Deceptive Problems
Most practical EAs employ time-invariant operators; thus an EA without noise can be modeled by a homogeneous Markov chain, while an EA with noise, since the noise may change over time, can only be modeled by a general Markov chain. In the following analysis, we always denote the noiseless chain by $\{\xi_t\}$ and the noisy chain by $\{\xi'_t\}$, and denote the EFHT partition of $\{\xi_t\}$ by $\{\mathcal{X}_j\}$.
An evolutionary process can be characterized by variation (i.e., producing new solutions) and selection (i.e., weeding out bad solutions). Denote the state spaces before and after variation by $\mathcal{X}$ and $\mathcal{X}_v$, respectively; the variation process is then a mapping from $\mathcal{X}$ to $\mathcal{X}_v$, and the selection process is a mapping from $\mathcal{X}_v$ to $\mathcal{X}$ (e.g., for the (1+1)-EA on any pseudo-Boolean problem, $\mathcal{X} = \{0,1\}^n$ and $\mathcal{X}_v = \{0,1\}^n \times \{0,1\}^n$). Note that $\mathcal{X}$ is just the state space of the Markov chain. Let $P_v(\cdot, \cdot)$ be the state transition probability of the variation process, and let $\mathcal{S}^*$ denote the optimal solution set. The considered solution set (e.g., a population) may be a multiset; for two multisets A and B, $A \subseteq B$ means that every element of A appears in B with at least the same multiplicity.
The theorem intuitively means that if an evolutionary process is deceptive and the optimal solution is always accepted once generated in the noisy evolutionary process, then noise will be helpful.
The two EAs with and without noise differ only in whether the fitness evaluation is disturbed by noise; thus they must have the same values of $N_1$ and $N_2$ in their running time Eq. (2). Comparing their expected running times is then equivalent to comparing the EFHTs of their corresponding Markov chains.
Then, we give a concrete deceptive evolutionary process, that is, the (1+1)-EA optimizing the Trap problem. The Trap problem given in Definition 8 is to maximize the number of 0 bits except for the optimal solution $1^n$. It is not hard to see that the EFHT depends only on the number of 0 bits $|x|_0$. We denote $\mathbb{E}[\tau \mid \xi_0 = x]$ as $\mathbb{E}_j$ with $j = |x|_0$. The order of $\mathbb{E}_j$ is shown in Lemma 14.
For any mutation probability $p \in (0, 0.5)$, it holds that $\mathbb{E}_0 < \mathbb{E}_1 < \cdots < \mathbb{E}_n$.
To prove Lemma 14, we need the following two lemmas. Lemma 18 (Witt, 2013) says that an offspring generated by mutating a parent solution with fewer 0 bits is more likely to have a small number of 0 bits. Note that we state it in terms of $|x|_0$ instead of $|x|_1$ as in the original lemma; it still holds by symmetry. We have also restricted the relation between the parents to be strict, which leads to the strict inequality in the conclusion. Lemma 19 is very similar to Lemma 13, except that the inequalities in condition 3 and in the conclusion hold strictly.
First, the base case trivially holds by the definitions of the corresponding quantities. Then, we prove the remaining inequalities inductively on j.
It is easy to verify that the relevant quantity is increasing, from which Eq. (9) holds.
(3) Conclusion. According to steps (1) and (2), the lemma holds.
Either additive noise with or multiplicative noise with makes the Trap problem easier for the ()-EA with mutation probability less than 0.5.
Thus, by Theorem 16, we get that the Trap problem becomes easier for the ()-EA under these two types of noise.
3.2 On Flat Problems
Besides deceptive problems, we show that noise can also make flat problems easier for EAs. We take the Peak problem given in Definition 9 as the representative problem; it assigns the same fitness to all solutions except the optimal solution $1^n$, thus provides EAs no information about the search direction, and is therefore hard for them. We analyze a strict-selection variant of the (1+1)-EA optimizing the Peak problem: it is the same as the (1+1)-EA except that it employs the strict selection strategy, that is, step 4 of Algorithm 1 changes to "if $f(x') > f(x)$." The expected running time of this variant with mutation probability $1/n$ on the Peak problem has been proved to be exponentially large (Droste et al., 2002).
One-bit noise with $p_n$ being a constant makes the Peak problem easier for the strict-selection (1+1)-EA with mutation probability $1/n$, when starting from an initial solution x with a sufficiently large number of 0 bits $|x|_0$.
Let $\{\xi'_t\}$ and $\{\xi_t\}$ model the strict-selection (1+1)-EA with one-bit noise and without noise for maximizing the Peak problem, respectively. It is not hard to see that both EFHTs depend only on $|x|_0$; we denote them as $\mathbb{E}'_j$ and $\mathbb{E}_j$ with $j = |x|_0$, respectively.
This is equivalent to the desired inequality, which implies that noise is helpful when starting from an initial solution x with a large number of 0 bits.
This theorem implies that the Peak problem becomes easier under noise when starting from an initial solution with a large number of 0 bits. From the analysis, we can see that the reason for requiring a large $|x|_0$ is to make one EFHT much larger than the other, which means that the negative effect of rejecting the optimal solution because of noise can be compensated by the positive effect the noise has on accepting intermediate solutions.
For the (1+1)-EA solving the Peak problem, any offspring solution will be accepted because its fitness is never less than the fitness of the parent solution; thus the solution x in the evolutionary process performs almost a random walk over $\{0,1\}^n$. In this case, we can intuitively expect a similar effect of one-bit noise as that found for the strict-selection (1+1)-EA solving the Peak problem. Here, we assume that the single-evaluation strategy is used. Under one-bit noise, for any nonoptimal parent solution x: if $|x|_0 \ge 2$, then $f^n(x)$ takes the nonoptimal value and any offspring will be accepted; if $|x|_0 = 1$ and $f^n(x)$ takes the nonoptimal value, then any offspring will be accepted; if $|x|_0 = 1$ and $f^n(x)$ takes the optimal value (because the noise flipped the unique 0 bit), then any offspring whose noisy fitness is nonoptimal will be rejected, and the optimal offspring $1^n$ will be rejected with probability $p_n$. Compared with the transition behavior without noise, the noise thus only has an effect when $|x|_0 = 1$ and $f^n(x)$ is optimal: a negative effect of rejecting the optimal solution, which happens with probability $p_n$ once $1^n$ is generated, and a positive effect of rejecting nonoptimal offspring, each of which is rejected with probability at least $1 - p_n$. Intuitively, the negative effect can be compensated by the positive effect, which suggests that one-bit noise is helpful. Thus, we have the following conjecture. A rigorous analysis is not easy, and we leave it for future work; we instead verify the conjecture in the experiment section.
One-bit noise makes the Peak problem easier for the (1+1)-EA with mutation probability $1/n$.
4 On the Effect of Noise-Handling Strategies
In the previous section, we found that noise can make optimization easier for EAs when the problem presents some deceptiveness or flatness. Meanwhile, on other problems, noisy fitness evaluation can make optimization harder for EAs. For example, Droste (2004) proved that the running time of the (1+1)-EA on the OneMax problem can increase from polynomial to exponential in the presence of noise. Thus, in this section, we investigate how well different noise-handling strategies perform when the noise is indeed harmful.
4.1 A Noise-Harmful Case
We consider the case where the (1+$\lambda$)-EA is used for optimizing the OneMax problem. Let $\{\xi'_t\}$ and $\{\xi_t\}$ model the (1+$\lambda$)-EA with and without noise for maximizing OneMax, respectively. It is not hard to see that the EFHT depends only on $|x|_0$; we denote it as $\mathbb{E}_j$ with $j = |x|_0$. The order of $\mathbb{E}_j$ is shown in Lemma 23.
For any mutation probability $p \in (0, 0.5)$, it holds that $\mathbb{E}_0 < \mathbb{E}_1 < \cdots < \mathbb{E}_n$.
We prove inductively on j.
(1) Initialization is to prove the base-case inequality, which holds by the definitions of the corresponding EFHTs.
(3) Conclusion. According to steps (1) and (2), the lemma holds.
Any noise makes the OneMax problem harder for the (1+$\lambda$)-EA with mutation probability less than 0.5.
In the following sections, we analyze the effect of different noise-handling strategies for the (1+1)-EA (a specific case of the (1+$\lambda$)-EA) optimizing the OneMax problem, to investigate their usefulness.
4.2 On Reevaluation and Threshold Selection Strategies
Single-evaluation. We evaluate a solution once, and use the evaluated fitness for this solution in the future.
Reevaluation. We access the fitness of a solution by evaluating it anew every time it is needed.
For example, for the (1+1)-EA in Algorithm 1, if reevaluation is used, both $f^n(x)$ and $f^n(x')$ are calculated and recalculated in each iteration; if single-evaluation is used, only $f^n(x')$ is calculated, and the previously obtained fitness of x is reused. Note that the analysis in the previous section, which did not explicitly indicate the employed evaluation strategy, assumes single-evaluation.
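The two evaluation options can be contrasted in a single sketch of the noisy (1+1)-EA loop (names ours; noisy_f stands for any noisy objective that consumes a random source). The only difference is whether the stored noisy fitness of the current solution is refreshed before selection:

```python
import random

def one_plus_one_ea_noisy(noisy_f, n, reevaluate, max_iters=10000, rng=None):
    rng = rng or random.Random()
    x = [rng.randrange(2) for _ in range(n)]
    fx = noisy_f(x, rng)                    # evaluated once at creation
    for _ in range(max_iters):
        y = [b ^ (rng.random() < 1.0 / n) for b in x]   # bitwise mutation
        fy = noisy_f(y, rng)
        if reevaluate:
            fx = noisy_f(x, rng)            # fresh, independent evaluation of x
        if fy >= fx:                        # selection as in step 4 of Algorithm 1
            x, fx = y, fy
    return x
```

With a noiseless objective the two options behave identically; under noise, single-evaluation can freeze an overestimated stored fitness forever, which is exactly the stuck-on-a-lucky-evaluation behavior discussed above.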
Sudholt and Thyssen (2012), for an ACO with single-evaluation solving stochastic shortest path problems, constructed an example graph to show that exponential running time is required for approximating the real shortest paths. The difficulty arises because once a path is luckily evaluated to have a relatively small length due to noise, it will always be preferred, making the ACO get stuck in an inferior solution. By using reevaluation instead of single-evaluation when evaluating the best-so-far path, the ACO can easily solve the example graph (Doerr et al., 2012a). Reevaluation has also been employed in EAs solving noisy multiobjective optimization problems (e.g., Buche et al., 2002; Park and Ryu, 2011; Fieldsend and Everson, 2015).
Intuitively, reevaluation can smooth out noise and thus could be better for noisy optimization, but it also increases the fitness evaluation cost and thus the running time. Whether it is beneficial overall has been unclear.
In this section, we compare these two options for the (1+1)-EA solving the OneMax problem under one-bit noise, to show whether reevaluation is useful. Note that for one-bit noise, $p_n$ controls the noise strength, that is, the noise becomes stronger as $p_n$ grows, and $p_n$ is also the parameter of the PNT. In the following analysis, let $\mathrm{poly}(n)$ indicate any polynomial of n.
For the (1+1)-EA with mutation probability $1/n$ solving the OneMax problem under one-bit noise, if using single-evaluation, the PNT is $[0, 1 - 1/\mathrm{poly}(n)]$.
The theorem is straightforwardly derived from the following two lemmas. Lemma 26 gives an expected running time upper bound that is polynomial whenever $1/(1-p_n)$ is polynomial, i.e., whenever $p_n \le 1 - 1/\mathrm{poly}(n)$. Lemma 27 gives a lower bound that grows with $1/(1-p_n)$, which implies that the running time is superpolynomial if $1/(1-p_n)$ is superpolynomial. By combining these results, we get that the maximum noise strength allowing a polynomial expected running time is $1 - 1/\mathrm{poly}(n)$, i.e., the PNT is $[0, 1 - 1/\mathrm{poly}(n)]$.
For the (1+1)-EA using single-evaluation with mutation probability $1/n$ on the OneMax problem under one-bit noise, the expected running time is upper-bounded by $O(n^2 + \frac{n}{1-p_n})$.
Let L denote the noisy fitness value of the current solution x. Because the (1+1)-EA does not accept a solution with smaller fitness (step 4 of Algorithm 1) and does not reevaluate the fitness of the current solution x, L never decreases. By applying the fitness-level technique (Wegener, 2002; Sudholt, 2013), we first analyze the expected number of steps until L increases when starting from a given level, and then sum these up to obtain an upper bound on the expected number of steps until L reaches the maximum value n. For each level, we analyze the probability P that L increases within two steps; the expected number of steps on the level is then at most $2/P$. Note that one-bit noise can make L equal $|x|_1 - 1$, $|x|_1$, or $|x|_1 + 1$, where $|x|_1$ is the number of 1 bits of x. When analyzing the noisy fitness of the offspring in each step, we need to first consider the bitwise mutation of x and then one random bit flip for the noise.
When $1 \le L \le n-2$, we have $|x|_1 = L-1$, L, or $L+1$.
(1) For $|x|_1 = L-1$: it is sufficient to flip one 0 bit by mutation and one 0 bit by noise in the first step, or to flip one 0 bit by mutation and no bit by noise in each of the first and second steps.
(2) For $|x|_1 = L$: it is sufficient to flip no bit by mutation and one 0 bit by noise, or one 0 bit by mutation and no bit by noise, in the first step.
(3) For j = L+1, a lower bound on P follows, since it is sufficient to flip no bit for mutation and no bit or one 0 bit for noise in the first step.
When L = 0, j = 0 or 1. By considering cases 2 and 3, we get the same lower bound for P.
When L = n−1 and the optimal solution has not been found, j = n−2 or n−1. By considering cases 1 and 2, we get a lower bound for P in the same way.
When L = n, j = n−1 or n. The equality j = n means that the optimal solution has been found. Because we are to get an upper bound for the expected running time of finding the optimal solution 1^n, we can pessimistically assume that j = n−1. Starting from j = n−1 and L = n (i.e., the current solution has n−1 one bits and the recorded fitness is n), the process will always stay in such a situation before finding 1^n, and the optimal solution can be generated and accepted in one step only through flipping the unique 0 bit for mutation and no bit for noise, which happens with probability (1/n)(1−1/n)^(n−1)(1−pn) ≥ (1−pn)/(en). This implies that the expected number of steps for finding the optimal solution is at most en/(1−pn).
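The probability and waiting-time claims in this last phase follow from a standard geometric-distribution argument; spelled out, using the usual estimate (1−1/n)^(n−1) ≥ 1/e:

```latex
\Pr[\text{generate and accept } 1^n \text{ in one step}]
  \;\ge\; \frac{1}{n}\Bigl(1-\frac{1}{n}\Bigr)^{n-1}(1-p_n)
  \;\ge\; \frac{1-p_n}{e\,n},
\qquad\text{hence}\qquad
\mathbb{E}[\text{steps}] \;\le\; \frac{e\,n}{1-p_n}.
```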
Thus, the total expected running time is upper-bounded by O(n^2 + n/(1−pn)).
For the (1+1)-EA using single-evaluation with mutation probability 1/n on the OneMax problem under one-bit noise, the expected running time is lower-bounded by Ω(n log n + n/(1−pn)).
Because the initial solution is uniformly distributed over {0,1}^n, it is not the optimal solution 1^n with probability 1 − 2^(−n). Thus, the expected running time of the whole process is lower-bounded by (1 − 2^(−n)) · Ω(n/(1−pn)), i.e., Ω(n/(1−pn)).
Note that when pn is small, the derived lower bound Ω(n/(1−pn)) would be quite loose. Thus, to fill this gap, we derive another lower bound that does not depend on pn. From Droste et al. (2002, Lemma 23), we know that the expected running time of the (1+1)-EA to optimize linear functions with positive weights is Ω(n log n). Their proof idea is to analyze the expected running time until all the 0 bits of the initial solution have been flipped at least once, which is obviously a lower bound on the expected running time of finding the optimal solution 1^n. Because noise does not affect this analysis, we can directly apply their result to our setting and obtain the lower bound Ω(n log n).
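The calculation behind this noise-independent bound is a coupon-collector argument; roughly (our paraphrase of the idea, not the original proof): if the initial solution has k 0 bits and each bit is flipped by mutation with probability 1/n per iteration, then the expected time until every one of the k bits has been flipped at least once behaves like

```latex
\mathbb{E}[T] \;\approx\; n \cdot H_k \;=\; n \sum_{i=1}^{k} \frac{1}{i} \;=\; \Omega(n \log k),
```

and since a uniformly random initial solution has k = Ω(n) zero bits with high probability, this yields Ω(n log n).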
By combining the two derived lower bounds, we get that the expected running time of the whole process is lower-bounded by Ω(n log n + n/(1−pn)).
We then show the PNT using reevaluation in the following theorem, which can be straightforwardly derived from Lemma 29.
For the (1+1)-EA with mutation probability 1/n solving the OneMax problem under one-bit noise, if using reevaluation, the PNT is Θ(log n / n).
For the (1+1)-EA using reevaluation with mutation probability 1/n on the OneMax problem under one-bit noise, the expected running time is polynomial when pn = O(log n / n), and superpolynomial when pn = ω(log n / n).
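Reevaluation changes only where the parent's fitness comes from: it is redrawn in every generation instead of being stored once, so a lucky noisy overestimate does not persist. A minimal self-contained sketch (the function names and the 1/n mutation probability are our assumptions, not the paper's code):

```python
import random

def onemax(x):
    """True fitness: the number of 1 bits."""
    return sum(x)

def noisy_eval(x, p_n, rng):
    """One-bit noise: with probability p_n, evaluate a copy of x
    with one uniformly chosen bit flipped."""
    if rng.random() < p_n:
        y = list(x)
        y[rng.randrange(len(y))] ^= 1
        return onemax(y)
    return onemax(x)

def one_plus_one_ea_reeval(n, p_n, max_iters, rng):
    """(1+1)-EA with reevaluation: both the parent and the offspring are
    (re)evaluated under noise in every iteration."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        if onemax(x) == n:  # optimum found (true fitness checked only to stop the experiment)
            return x
        x_new = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        # reevaluating the parent here is the only difference from single-evaluation
        if noisy_eval(x_new, p_n, rng) >= noisy_eval(x, p_n, rng):
            x = x_new
    return x
```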
4.2.2 Threshold Selection
During the process of evolutionary optimization, most of the improvements in one generation are small. When using reevaluation, because of noisy fitness evaluation, a considerable portion of these improvements are not real: a worse solution may appear to have a better fitness and then survive, replacing the truly better solution that appears to have a worse fitness. This may mislead the search direction of EAs, reduce their efficiency, or make them get trapped in a local optimum (see Section 4.2.1). To deal with this problem, a selection strategy for EAs handling noise was proposed (Markon et al., 2001; Bartz-Beielstein, 2005a), namely, threshold selection, where an offspring solution is accepted only if its fitness is larger than that of the parent solution by at least a predefined threshold τ.
For example, for the (1+1)-EA with threshold selection as in Algorithm 3, step 4 changes to be "if f(x′) ≥ f(x) + τ" rather than "if f(x′) ≥ f(x)" in Algorithm 1. Such a strategy can reduce the risk of accepting a bad solution due to noise. Although the good local performance (i.e., the progress of one step) of EAs with threshold selection has been shown on some problems (Markon et al., 2001; Bartz-Beielstein and Markon, 2002; Bartz-Beielstein, 2005b), its usefulness for the global performance (i.e., the running time until finding the optimal solution) of EAs under noise is not yet clear.
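The acceptance rule of threshold selection is a one-line change to the algorithm; a minimal sketch of the comparison used in the modified step 4 (the function name is ours, and tau denotes the predefined threshold τ):

```python
def accept_with_threshold(f_offspring, f_parent, tau):
    """Threshold selection: accept the offspring only if its (noisy)
    fitness is larger than the parent's by at least tau."""
    return f_offspring >= f_parent + tau
```

With tau = 0 this reduces to the standard acceptance rule; a positive tau makes it harder for a noise-inflated offspring fitness to displace the current solution.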
In this section we analyze the running time of the (1+1)-EA with threshold selection solving OneMax under one-bit noise to see whether threshold selection is useful. Note that the analysis here assumes reevaluation. This is because using single-evaluation and threshold selection simultaneously leads to an infinite expected running time for any noise strength pn > 0, as shown in the following theorem.
For the (1+1)-EA with mutation probability