Abstract

Many optimization tasks must be handled in noisy environments, where the exact fitness of a solution cannot be obtained; only a noisy one is available. For the optimization of noisy tasks, evolutionary algorithms (EAs), a type of stochastic metaheuristic search algorithm, have been widely and successfully applied. Previous work mainly focuses on the empirical study and design of EAs for noisy optimization, while the theoretical understanding is largely insufficient. In this study, we first investigate how noisy fitness can affect the running time of EAs. Two kinds of noise-helpful problems are identified, on which EAs run faster in the presence of noise, and thus the noise should not be handled. Second, on a representative noise-harmful problem, in which the noise has a strong negative effect, we examine two commonly employed mechanisms for dealing with noise in EAs: reevaluation and threshold selection. The analysis discloses that using these two strategies simultaneously is effective for one-bit noise but ineffective for asymmetric one-bit noise. Smooth threshold selection is then proposed, which we prove to be an effective strategy for further improving the noise tolerance ability on this problem. We then complement the theoretical analysis with experiments on both synthetic problems and two combinatorial problems, the minimum spanning tree and the maximum matching. The experimental results agree with the theoretical findings and also show that the proposed smooth threshold selection can deal with the noise better.

1  Introduction

Optimization tasks often encounter noisy environments. For example, in industrial design such as VLSI design (Guo et al., 2014), every prototype is evaluated by simulations; therefore, the result of the evaluation may not be perfect due to simulation error. Also, in machine learning, a prediction model is evaluated only on a limited amount of data (Qian et al., 2015a); therefore, the estimated performance deviates from the true performance. Noisy environments may change the properties of an optimization problem, so traditional optimization techniques may have low efficacy. Meanwhile, evolutionary algorithms (EAs) (Bäck, 1996) have been widely and successfully adopted for noisy optimization tasks (Freitas, 2003; Ma et al., 2006; Chang and Chen, 2006).

EAs are a type of randomized metaheuristic optimization algorithm, inspired by natural phenomena including the evolution of species, swarm cooperation, immune systems, and others. EAs typically involve a cycle of three stages: a reproduction stage that produces new solutions based on the currently maintained solutions; an evaluation stage that evaluates the newly generated solutions; and a selection stage that wipes out bad solutions. The rationale for using EAs in noisy optimization is that the natural processes they simulate have succeeded in noisy natural environments, and hence their algorithmic counterparts are also likely to be able to handle noise.

On one hand, it is believed that noise makes the optimization harder, and thus handling mechanisms have been proposed to reduce the negative effect of the noise (Fitzpatrick and Grefenstette, 1988; Beyer, 2000; Arnold and Beyer, 2003). Two representative strategies are reevaluation and threshold selection. According to the reevaluation strategy (Jin and Branke, 2005; Goh and Tan, 2007; Doerr et al., 2012a), whenever the fitness (also called the cost or objective value) of a solution is required, EAs make an independent evaluation of the solution regardless of whether the solution has been evaluated before, such that the fitness is smoothed. According to the threshold selection strategy (Markon et al., 2001; Bartz-Beielstein and Markon, 2002; Bartz-Beielstein, 2005a), in the selection stage EAs accept a newly generated solution only if its fitness is larger than the fitness of the old solution by at least a threshold value τ, such that the risk of accepting a bad solution due to noise is reduced.
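As an illustration, the acceptance step combining these two strategies can be sketched as follows (Python; the one-bit noise model, the function names, and the default parameters are our own illustrative assumptions, not the article's):

```python
import random

def noisy_eval(f, x, p_noise=0.1):
    """One-bit-noise evaluation: with probability p_noise, f is evaluated
    on a copy of x with one uniformly chosen bit flipped."""
    if random.random() < p_noise:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def accept(f, parent, offspring, tau=1.0, reevaluate=True, cached_fitness=None):
    """Threshold selection: accept the offspring only if its noisy fitness
    exceeds the parent's by at least tau.  With reevaluation, the parent's
    noisy fitness is drawn afresh each time; otherwise a cached (noisy)
    value from an earlier evaluation is reused."""
    fp = noisy_eval(f, parent) if reevaluate else cached_fitness
    fo = noisy_eval(f, offspring)
    return fo >= fp + tau

onemax = lambda x: sum(x)
decision = accept(onemax, [0, 0, 0, 0], [1, 1, 1, 1], tau=1.0)
```

With the all-0s parent and all-1s offspring above, the true fitness gap dominates the bounded noise, so the offspring is always accepted for τ = 1.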

On the other hand, several empirical observations have shown cases where noise can have a positive impact on the performance of local search (Selman et al., 1994; Hoos and Stützle, 2000; 2005), which indicates that noise does not always have a negative impact.

As these previous studies are mainly empirical, theoretical analysis is needed for a better understanding of evolutionary optimization in noisy environments.

1.1  Related Work

Despite EAs’ wide and successful application, the theoretical analysis of EAs on noisy optimization is rare. Some theoretical results on EAs have emerged (e.g., Neumann and Witt, 2010; Auger and Doerr, 2011), but most of them focus on clean environments. In noisy environments, the optimization is more complex and more randomized; thus the theoretical analysis is difficult.

Only a few theoretical analyses for EAs on noisy optimization have been published. Gutjahr (2003; 2004) first analyzed the ant colony optimization (ACO) algorithm for stochastic combinatorial optimization and proved convergence under mild conditions. Droste (2004) gave a running time analysis of EAs in discrete noisy optimization for the first time. Droste analyzed the (1+1)-EA on the OneMax problem under one-bit noise and showed the maximal noise strength allowing a polynomial running time, where the noise strength is characterized by the noise probability p_n and n is the problem size. Sudholt and Thyssen (2012) analyzed the running time of a simple ACO for stochastic shortest path problems where edge weights are subject to noise, and showed the ability and limitation of the ACO under various noise models. For the difficulty faced by an ACO under a specific noise model, Doerr et al. (2012a) further showed that the reevaluation strategy can overcome it, that is, avoid being misled by an exceptionally optimistic evaluation due to noise. Qian et al. (2014) investigated the effectiveness of sampling, a common strategy to reduce the effect of noise. They proved a sufficient condition under which sampling is useless (i.e., sampling increases the running time), and applied it to show that sampling is useless for the (1+1)-EA optimizing the OneMax and the Trap problems under additive Gaussian noise.

1.2  Our Contribution

In this article, we study the effect of noise on EAs and investigate the noise-handling mechanisms when noise needs to be accounted for.

Table 1:
The PNT with respect to one-bit noise of the (1+1)-EA using different noise-handling strategies on the OneMax problem.
Noise-Handling Strategies                      PNT
single evaluation
single evaluation and threshold selection (τ)
reevaluation (Droste, 2004)
reevaluation and threshold selection (τ = 1)
reevaluation and threshold selection (τ = 2)
reevaluation and smooth threshold selection

The effect of noise on the expected running time of EAs is investigated in Section 3. On deceptive and flat problems, we prove that noise can simplify the optimization (i.e., decrease the expected running time) for EAs. The analysis results support that for some difficult problems, handling the noise is not necessary.

In Section 4.1 the OneMax problem is proved to be negatively affected by noise, and in Section 4.2 two commonly employed noise-handling mechanisms are examined: the reevaluation and the threshold selection strategies. For the (1+1)-EA under one-bit noise, the noise-handling mechanisms are evaluated by the polynomial noise tolerance (PNT), which is the range of the noise strength in which the expected running time of the algorithm is polynomial. The wider the PNT is, the better the noise-handling mechanism. The one-bit noise strength (and thus the PNT) is characterized by the noise probability p_n. The configurations of the (1+1)-EA that we analyzed include the EA without any noise-handling strategy (single evaluation), single evaluation with threshold selection (i.e., with a threshold value τ), and reevaluation with threshold selection (i.e., with a threshold value τ). Their PNTs are presented in Table 1, where the PNT of the (1+1)-EA with reevaluation (but no threshold selection) is directly derived from Droste (2004). The comparison shows the following:

  • Reevaluation alone makes the PNT much worse than single-evaluation.

  • Threshold selection must be combined with reevaluation; otherwise, the EA cannot tolerate any noise strength larger than 0. Meanwhile, reevaluation also becomes better when used with threshold selection.

  • Reevaluation with threshold selection (with a proper threshold value τ) can improve upon the PNT of single evaluation.

In Section 4.3 we disclose a weakness of these noise-handling mechanisms: when used with the (1+1)-EA solving the OneMax problem under asymmetric one-bit noise, all of them are ineffective (i.e., they need exponential running time) when the noise probability reaches 1. The reason for the ineffectiveness of reevaluation with threshold selection is that it has too large a probability of accepting false progress caused by the noise when the threshold τ = 1, and too small a probability of accepting true progress when τ = 2. Setting τ between 1 and 2 is useless because the minimum fitness gap is 1 (i.e., any value of τ in (1, 2) is equivalent to τ = 2). We then introduce a modification of the threshold selection strategy that turns the original hard threshold into a smooth one, which allows a fractional threshold to be effective. We prove that with the smooth threshold selection strategy the PNT covers the whole range of the noise probability; that is, the (1+1)-EA is always a polynomial-time algorithm on the problem regardless of the noise probability.
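A minimal sketch of how a fractional threshold can be made meaningful is given below (Python). This randomized-rounding reading is our own illustrative assumption, not necessarily the article's exact definition of smooth threshold selection:

```python
import math
import random

def smooth_threshold_accept(f_parent_noisy, f_offspring_noisy, tau):
    """Hypothetical smoothing of threshold selection via randomized
    rounding.  For tau = k + q with integer k and q in [0, 1): a noisy
    fitness gap of at least k + 1 is always accepted, a gap in [k, k + 1)
    is accepted with probability 1 - q, and smaller gaps are rejected.
    At q = 0 this is the hard threshold k; as q -> 1 it approaches the
    hard threshold k + 1, so fractional tau interpolates between the
    two integer thresholds."""
    k = math.floor(tau)
    q = tau - k
    gap = f_offspring_noisy - f_parent_noisy
    if gap >= k + 1:
        return True
    if gap >= k:
        return random.random() < 1 - q
    return False
```

For τ = 1.5, a noisy gap of 2 is always accepted and a gap of exactly 1 is accepted half of the time, interpolating between the hard thresholds τ = 1 and τ = 2.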

Finally, in Section 5, we describe experiments that verify and complement the theoretical results. Using two problem classes, the Jump problem (a synthetic problem) and the minimum spanning tree problem (a common combinatorial problem), we show that the badness of the noise is negatively correlated with the hardness of the problem, which had not been noticed before. Therefore, when the problem is quite hard, the noise can be helpful, and handling it is unnecessary. We then verify that smooth threshold selection can better handle the noise through experiments on the maximum matching problem. Section 6 concludes the article.

2  Preliminaries

2.1  Noisy Optimization

A general optimization problem can be represented as argmax_x f(x), where the objective f is also called the fitness function in the context of evolutionary computation. In real-world optimization tasks, the fitness evaluation of a solution is usually disturbed by noise; consequently, we cannot obtain the exact fitness value but only a noisy one. Let f^n(x) and f(x) denote the noisy and the true fitness of a solution x, respectively. In this study, we use the following three widely investigated noise models:

  • Additive. f^n(x) = f(x) + δ, where δ is selected from [δ₁, δ₂] uniformly at random.

  • Multiplicative. f^n(x) = f(x) · δ, where δ is selected from [δ₁, δ₂] uniformly at random.

  • One-bit. f^n(x) = f(x) with probability 1 − p_n; otherwise, f^n(x) = f(x′), where x′ is generated by flipping a uniformly randomly chosen bit of x. This noise is for problems whose solutions are represented as binary strings.

Additive and multiplicative noise have often been used to analyze the effect of noise (Beyer, 2000; Jin and Branke, 2005). One-bit noise is specifically used for optimizing pseudo-Boolean problems over and has been investigated in the first work for analyzing the running time of EAs in noisy optimization (Droste, 2004) and used to understand the role of noise in stochastic local search (Selman et al., 1994; Hoos and Stützle, 1999; Mengshoel, 2008).
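These three noise models can be sketched as follows (Python; the parameter defaults are arbitrary illustrations, not values from the article):

```python
import random

def additive_noise(f, x, d1=-1.0, d2=1.0):
    # f^n(x) = f(x) + delta, with delta drawn uniformly from [d1, d2]
    return f(x) + random.uniform(d1, d2)

def multiplicative_noise(f, x, d1=0.9, d2=1.1):
    # f^n(x) = f(x) * delta, with delta drawn uniformly from [d1, d2]
    return f(x) * random.uniform(d1, d2)

def one_bit_noise(f, x, p_n=0.5):
    # With probability p_n, f is evaluated on a copy of x with one
    # uniformly chosen bit flipped; otherwise the true fitness is returned.
    if random.random() >= p_n:
        return f(x)
    y = list(x)
    i = random.randrange(len(y))
    y[i] = 1 - y[i]
    return f(y)
```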

Besides these kinds of noise, we also consider a variant of one-bit noise called asymmetric one-bit noise (see Definition 1). Inspired by the asymmetric mutation operator (Jansen and Sudholt, 2010), asymmetric one-bit noise flips a specific bit position with a probability that depends on the number of bit positions taking the same value. When asymmetric one-bit noise flips a bit of a solution x ∈ {0,1}ⁿ, the probability of flipping a specific 0 bit is 1/(2|x|₀) and the probability of flipping a specific 1 bit is 1/(2(n − |x|₀)), where |x|₀ is the number of 0 bits of x. Note that for one-bit noise, the probability of flipping any specific bit is 1/n. For both one-bit and asymmetric one-bit noise, p_n controls the noise strength. In this article, we assume that the parameters of the environment (p_n, δ₁, and δ₂) do not change over time.

Definition 1 (Asymmetric One-Bit Noise):
Given a fitness function f and a solution x ∈ {0,1}ⁿ, asymmetric one-bit noise with a parameter p_n ∈ [0, 1] leads to a noisy fitness value f^n(x) = f(x) with probability 1 − p_n; otherwise f^n(x) = f(x′), where x′ is generated by flipping the jth bit of x, and j is a position chosen uniformly at random among the 0 bits of x with probability 1/2 and among the 1 bits of x with probability 1/2.
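Asymmetric one-bit noise can be sketched as follows (Python; the uniform fallback for all-0s or all-1s strings is our own assumption for illustration):

```python
import random

def asymmetric_one_bit_noise(f, x, p_n=0.5):
    """With probability 1 - p_n return the true fitness; otherwise flip
    one bit of a copy of x before evaluating, choosing the flipped
    position among the 0 bits with probability 1/2 (uniformly within
    them) and among the 1 bits with probability 1/2.  Hence a specific
    0 bit is flipped with probability 1/(2*#zeros), a specific 1 bit
    with probability 1/(2*#ones)."""
    if random.random() >= p_n:
        return f(x)
    zeros = [i for i, b in enumerate(x) if b == 0]
    ones = [i for i, b in enumerate(x) if b == 1]
    if not zeros or not ones:
        i = random.randrange(len(x))      # illustrative fallback
    elif random.random() < 0.5:
        i = random.choice(zeros)
    else:
        i = random.choice(ones)
    y = list(x)
    y[i] = 1 - y[i]
    return f(y)
```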

It is possible that a large noise makes an optimization problem extremely hard for particular algorithms. We are thus interested in the noise strength under which an algorithm is tolerant enough to have a polynomial running time. The noise strength can be measured by adjustable parameters, for instance, δ₁ and δ₂ for additive and multiplicative noise, and p_n for one-bit noise. We denote by f^n_θ a noisy version of the fitness function f disturbed by noise with parameter θ (where θ can be a tuple, e.g., (δ₁, δ₂) for additive noise), and define the PNT in Definition 2, which characterizes the maximum range of the noise parameter allowing a polynomial expected running time. Note that the PNT is the empty set if the algorithm never has a polynomial expected running time for any noise strength. We study the PNT of EAs in order to analyze the effectiveness of noise-handling strategies.

Definition 2 (Polynomial Noise Tolerance):
For an algorithm A running on a problem f with a type of noise, let E[T(A, f^n_θ)] denote the expected running time of A on f with the noise strength represented by the parameter θ. Then, the polynomial noise tolerance of A on f with this type of noise is the range of the noise strength in which the expected running time is polynomial in the problem size n, that is,
PNT = { θ | E[T(A, f^n_θ)] is polynomial in n }.

2.2  Evolutionary Algorithms

Evolutionary algorithms (Bäck, 1996) are a type of population-based metaheuristic optimization algorithm. Although many variants exist, the common procedure for EAs can be described as follows:

  1. Generate an initial set of solutions (called a population).

  2. Reproduce new solutions from the current population.

  3. Evaluate the newly generated solutions.

  4. Update the population by removing the bad solutions.

  5. Repeat steps 2–4 until a specific criterion is met.

The (1+1)-EA, as in Algorithm 1, is a simple EA for maximizing pseudo-Boolean problems over {0,1}ⁿ, which reflects the common structure of EAs. It maintains only one solution and repeatedly tries to improve the current solution by bitwise mutation (i.e., step 3 of Algorithm 1). It has been widely used for the running time analysis of EAs (e.g., by He and Yao, 2001; Droste et al., 2002).

Algorithm 1 ((1+1)-EA). Given a pseudo-Boolean function f over {0,1}ⁿ (evaluated as f^n under noise) and a mutation probability p:
1. x := a solution selected from {0,1}ⁿ uniformly at random.
2. Repeat until some stopping criterion is met:
3.   x′ := flip each bit of x independently with probability p.
4.   if f^n(x′) ≥ f^n(x) then x := x′.

Algorithm 2 ((1+λ)-EA). Given a pseudo-Boolean function f over {0,1}ⁿ, an offspring population size λ, and a mutation probability p:
1. x := a solution selected from {0,1}ⁿ uniformly at random.
2. Repeat until some stopping criterion is met:
3.   generate λ offspring solutions by independently flipping each bit of x with probability p, λ times.
4.   x′ := an offspring solution with the largest (noisy) fitness.
5.   if f^n(x′) ≥ f^n(x) then x := x′.

The (1+λ)-EA, as in Algorithm 2, applies an offspring population size λ. In each iteration, it first generates λ offspring solutions by independently mutating the current solution λ times, and then selects the best of the current solution and the offspring solutions as the next solution. It has been used to disclose the effect of the offspring population size through running time analysis (Jansen et al., 2005; Neumann and Wegener, 2007). Note that the (1+1)-EA is a special case of the (1+λ)-EA with λ = 1.
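The common structure of the two algorithms can be sketched as follows (Python; the noise-free fitness, the default mutation probability 1/n, and the OneMax-style stopping test are illustrative assumptions):

```python
import random

def one_plus_lambda_ea(f, n, lam=1, p=None, max_iters=10_000):
    """Minimal (1+lambda)-EA sketch for maximizing f over {0,1}^n;
    with lam = 1 it reduces to the (1+1)-EA.  The mutation probability p
    defaults to the common choice 1/n."""
    p = p if p is not None else 1.0 / n
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(max_iters):
        # lam independent bitwise mutations of the current solution
        offspring = [[b ^ 1 if random.random() < p else b for b in x]
                     for _ in range(lam)]
        fy, y = max((f(y), y) for y in offspring)
        if fy >= fx:      # keep the best of parent and offspring
            x, fx = y, fy
        if fx == n:       # stopping test for this OneMax demo only
            break
    return x, fx

random.seed(1)
x, fx = one_plus_lambda_ea(lambda s: sum(s), n=20, lam=5)
```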

The running time of EAs is usually defined as the number of fitness evaluations (i.e., computations of f or f^n) until an optimal solution is found for the first time, since fitness evaluation is often the most costly computational process of the algorithm (He and Yao, 2001; Yu and Zhou, 2008).

2.3  Markov Chain Modeling

We analyze EAs by modeling them as Markov chains. Here, we give some preliminaries.

EAs often generate solutions based only on their currently maintained solutions; thus they can be modeled and analyzed as Markov chains (e.g., He and Yao, 2001; Yu and Zhou, 2008; Yu et al., 2015). A Markov chain modeling an EA is constructed by taking the EA's population space as the chain's state space X. Let X* ⊆ X denote the set of all optimal populations, that is, populations containing at least one optimal solution. The goal of the EA is to reach X* from an initial population. Thus, the process of an EA seeking X* can be analyzed by studying the corresponding Markov chain with the optimal state space X*. Note that we consider a discrete state space (i.e., X is discrete) in this article.

A Markov chain {ξ_t} (t = 0, 1, …) is a random process where ξ_{t+1} depends only on ξ_t. A Markov chain is said to be homogeneous if, for all t ≥ 0 and all states x, y,
P(ξ_{t+1} = y | ξ_t = x) = P(ξ_1 = y | ξ_0 = x).   (1)
In this article, we always denote X and X* as the state space and the optimal state space of a Markov chain, respectively.

Given a Markov chain {ξ_t} and a starting state ξ_0 = x_0, we define the first hitting time (FHT) of the chain as a random variable τ such that τ = min{ t ≥ 0 | ξ_t ∈ X* }. That is, τ is the number of steps needed to reach the optimal state space for the first time when starting from x_0. The mathematical expectation of τ, E[τ | ξ_0 = x_0], is called the expected first hitting time (EFHT) of this chain starting from x_0. If ξ_0 is drawn from a distribution π_0, the quantity Σ_x π_0(x) E[τ | ξ_0 = x] is called the expected first hitting time of the Markov chain over the initial distribution π_0.

For the corresponding EA, the running time is the number of calls to the fitness function until an optimal solution is met for the first time. Thus, the expected running time starting from ξ_0 and that starting from ξ_0 ∼ π_0 are respectively equal to
N_1 + N_2 · E[τ | ξ_0]   and   N_1 + N_2 · Σ_x π_0(x) E[τ | ξ_0 = x],   (2)
where N_1 and N_2 are the numbers of fitness evaluations for the initial population and for each iteration, respectively. For example, for the (1+1)-EA, N_1 = 1 and N_2 = 1; for the (1+λ)-EA, N_1 = 1 and N_2 = λ. Note that when the expected running time of an EA on a problem is mentioned in this article without specifying the initial population, it means the expected running time starting from the uniform initial distribution π_u.
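Under this convention, counting evaluations for the (1+1)-EA can be sketched as follows (Python, noise-free, with N1 = N2 = 1; the function names are our own):

```python
import random

def run_one_plus_one_ea(f, n, optimum_value):
    """Run a noise-free (1+1)-EA on f over {0,1}^n and return the number
    of fitness evaluations until a solution of value optimum_value is
    found, i.e., the running time N1 + N2 * (hitting time) with
    N1 = N2 = 1 (one evaluation initially, one per iteration)."""
    evals = 0
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x); evals += 1          # N1 = 1
    while fx < optimum_value:
        y = [b ^ 1 if random.random() < 1.0 / n else b for b in x]
        fy = f(y); evals += 1      # N2 = 1 per iteration
        if fy >= fx:
            x, fx = y, fy
    return evals

random.seed(0)
t = run_one_plus_one_ea(lambda s: sum(s), n=10, optimum_value=10)
```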

The following two lemmas on the EFHT of Markov chains (Freĭdlin, 1996) are used in the article.

Lemma 1:
Given a Markov chain , we have
formula
Lemma 2:
Given a homogeneous Markov chain , it holds
formula

Drift analysis is a commonly used tool for analyzing the EFHT of Markov chains. It was first introduced to the running time analysis of EAs by He and Yao (2001; 2004). Later, it became a popular tool in this field, and advanced variants have been proposed (e.g., Doerr et al., 2012b; Doerr and Goldberg, 2013). In this article, we use the additive version (Lemma 3). To use it, a distance function V(x) has to be constructed to measure the distance of a state x to the optimal state space X*; it satisfies V(x) = 0 for x ∈ X* and V(x) > 0 for x ∉ X*. Then, we need to investigate the progress on the distance to X* in each step, E[V(ξ_t) − V(ξ_{t+1}) | ξ_t]. An upper (lower) bound on the EFHT can be derived by dividing the initial distance by a lower (upper) bound on the progress.
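As a classical worked instance of additive drift (standard in the literature, not taken from this article), consider the noise-free (1+1)-EA with mutation probability 1/n on OneMax, with the distance function V(x) = |x|₀:

```latex
% For |x|_0 = i \ge 1, selection never increases V, and V decreases by at
% least 1 when exactly one 0 bit flips and no other bit does:
\[
\mathrm{E}\big[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x\big]
\;\ge\; i \cdot \frac{1}{n}\Big(1 - \frac{1}{n}\Big)^{n-1}
\;\ge\; \frac{1}{en} \;=:\; c_l .
\]
% Additive drift analysis then bounds the EFHT by the initial distance
% divided by the progress lower bound:
\[
\mathrm{E}[\tau \mid \xi_0] \;\le\; \frac{V(\xi_0)}{c_l} \;\le\; e\,n^2 .
\]
```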

Lemma 3 (Additive Drift Analysis) (He and Yao, 2001; 2004):
Given a Markov chain and a distance function , if it satisfies that for any and any with ,
formula
then the EFHT of this chain satisfies that
formula
where the lower and upper bounds on the progress do not depend on ξ_t and t.

The simplified drift theorem (Oliveto and Witt, 2011; 2012), as presented in Lemma 4, was proposed to prove exponential lower bounds on the FHT of Markov chains, where X_t is usually represented by a mapping of ξ_t. It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping toward or away from the goal state. To relax the requirement of a constant negative drift, advanced variants have been proposed, for instance, the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) and the simplified drift theorem with scaling (Oliveto and Witt, 2014; 2015). In this article, we use the original version (Lemma 4).

Lemma 4 (Simplified Drift Theorem) (Oliveto and Witt, 2011; 2012):
Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Suppose there exist an interval [a, b] ⊆ ℝ, two constants δ, ε > 0, and, possibly depending on l := b − a, a function r(l) satisfying 1 ≤ r(l) = o(l/log(l)), such that for all t ≥ 0 the following two conditions hold:
formula
Then there is a constant c > 0 such that for T := min{t ≥ 0 : X_t ≤ a | X_0 ≥ b} it holds that Pr(T ≤ 2^{cl}) = 2^{−Ω(l)}.

2.4  Pseudo-Boolean Functions

The pseudo-Boolean function class in Definition 3 is a large function class that only requires the solution space to be {0,1}ⁿ and the objective space to be ℝ. Many well-known NP-hard problems (e.g., the vertex cover problem and the 0-1 knapsack problem) belong to this class. Diverse pseudo-Boolean problems with different structures and difficulties have been used to disclose properties of EAs (e.g., Droste et al., 1998; 2002; He and Yao, 2001). We consider only maximization problems in this article. In the following, let x_i denote the ith bit of a solution x ∈ {0,1}ⁿ.

Definition 3 (Pseudo-Boolean Function):

A function in the pseudo-Boolean function class has the form f: {0,1}ⁿ → ℝ.

The Trap problem in Definition 4 is a special instance in this class, in which the aim is to maximize the number of 0 bits of a solution, except that the global optimum is the all-1s solution (briefly denoted as 1ⁿ). Its optimal function value is larger than 0, and the function value of any nonoptimal solution is not larger than 0. It has been used in theoretical studies of EAs, and the expected running time of the (1+1)-EA with mutation probability 1/n on it has been proved to be Θ(nⁿ) (Droste et al., 2002). It has also been recognized as the hardest instance in the pseudo-Boolean function class with a unique global optimum for the (1+1)-EA (Qian et al., 2012); that is, the expected running time of the (1+1)-EA on the Trap problem is the largest among this class.

Definition 4 (Trap Problem):
The Trap problem of size n is to solve
argmax_{x ∈ {0,1}ⁿ}  ( c · ∏_{i=1}^n x_i − Σ_{i=1}^n x_i ),
where c > n (so that the optimal value c − n is positive while any nonoptimal value is at most 0).

The Peak problem in Definition 5 has the same fitness for all solutions except the global optimum 1ⁿ. It has been shown that for solving this problem, the (1+1)-EA with mutation probability 1/n needs exponential running time with an overwhelming probability (Oliveto and Witt, 2011).

Definition 5 (Peak Problem):
The Peak problem of size n is to solve
argmax_{x ∈ {0,1}ⁿ}  ∏_{i=1}^n x_i.

The OneMax problem in Definition 6 aims to maximize the number of 1 bits of a solution. Its optimal solution is 1ⁿ with function value n. The running time of EAs on the OneMax problem has been well studied (He and Yao, 2001; Droste et al., 2002; Sudholt, 2013); in particular, the expected running time of the (1+1)-EA with mutation probability 1/n is Θ(n log n) (Droste et al., 2002). It has also been recognized as the easiest instance in the pseudo-Boolean function class with a unique global optimum for the (1+1)-EA (Qian et al., 2012).

Definition 6 (OneMax Problem):
The OneMax problem of size n is to solve
argmax_{x ∈ {0,1}ⁿ}  Σ_{i=1}^n x_i.
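The three benchmark problems can be written down directly (Python; the concrete form of Trap, with a constant c > n, is one common formulation and is assumed here for illustration):

```python
def onemax(x):
    # OneMax: the number of 1 bits; optimum 1^n with value n.
    return sum(x)

def peak(x):
    # Peak: identical fitness everywhere except the optimum 1^n.
    return 1 if all(x) else 0

def trap(x, c=None):
    # Trap (assumed form: c * prod(x) - sum(x) with c > n): nonoptimal
    # solutions are rewarded for 0 bits and have value at most 0, while
    # the optimum 1^n has the strictly largest value c - n > 0.
    n = len(x)
    c = c if c is not None else 2 * n
    return c * (1 if all(x) else 0) - sum(x)
```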

3  On the Effect of Noisy Fitness

In this section, we provide two types of problems in which the noise can make the optimization easier for EAs. By easier, we mean that the EA with noise needs less expected running time than that without noise to find the optimal solution.

We analyze EAs by modeling them as Markov chains. Here, we first give some properties of Markov chains that are used in the following analysis. We define a partition of the state space of a homogeneous Markov chain based on the EFHT in Definition 7, and then define the probability of a chain jumping from one state into a state space in Definition 8. It is easy to see that E_0 in Definition 7 is just 0, since X_0 = X*.

Definition 7 (EFHT Partition):
For a homogeneous Markov chain {ξ_t}, the EFHT partition is a partition of X into nonempty subspaces X_0, X_1, …, X_m such that X_0 = X*, all states in X_j have the same EFHT value E_j, and E_0 < E_1 < ⋯ < E_m.

Note that the EFHT partition is different from the fitness partition used in the fitness-level method (Wegener, 2002; Sudholt, 2013) for EAs' running time analysis, since solutions with the same fitness can have different EFHTs, and the EFHT order can be either consistent (e.g., for the (1+λ)-EA on the Trap problem, as in Lemma 7) or inconsistent (e.g., for the (1+1)-EA on the OneMax problem, as in Lemma 23) with the fitness order.

Definition 8:

For a Markov chain {ξ_t}, P_t(x, Y) = Σ_{y ∈ Y} P(ξ_{t+1} = y | ξ_t = x) is the probability of jumping from state x to state space Y in one step at time t.

Lemma 5 compares the EFHT of two Markov chains. It intuitively means that if one chain always has a larger probability of jumping into good states (i.e., subspaces with small j values), it needs less time to reach the optimal state space.

Lemma 5:
Given a Markov chain and a homogeneous Markov chain with the same state space and the same optimal space , let denote the EFHT partition of . If for all , , and for all integers ,
formula
3
then for all ,  

To prove Lemma 5, we need the following lemma, which is proved by using the properties of majorization and Schur concavity.

Lemma 6:
Let be an integer. If it satisfies that
formula
then it holds that
formula
Proof:

Let . Because of condition 1 that Ei is increasing, f is Schur-concave (Marshall et al., 2011, Theorem A.3). Conditions 2 and 3 imply that the vector majorizes . Thus, we have , which proves the lemma.

Proof of Lemma 5:

We prove one direction of the inequality; the other can be proved similarly. We use Lemma 3 to derive a bound on the EFHT, based on which this lemma holds.

To use Lemma 3 to analyze the EFHT, we first construct a distance function as
formula
4
which satisfies that the distance is 0 for any state in X* and positive for any state outside X*, by Lemma 1.
Then, we investigate the one-step progress on the distance for any nonoptimal state x.
formula
Since E_j increases with j and Eq. (3) holds, by Lemma 6, we have
formula
Thus, we have, for all t ≥ 0 and all x ∉ X*,

By Lemma 3, we get, for all x ∈ X,

3.1  On Deceptive Problems

Most practical EAs employ time-invariant operators; thus we can model an EA without noise by a homogeneous Markov chain, while an EA with noise, whose noise may change over time, can only be modeled by a general Markov chain. In the following analysis, we always denote the noisy chain by {ξ_t} and the noiseless chain by {ξ′_t}, and denote the EFHT partition of {ξ′_t} by {X_0, X_1, …, X_m}.

An evolutionary process can be characterized by variation (i.e., producing new solutions) and selection (i.e., weeding out bad solutions). Denote the state spaces before and after variation by X and Y, respectively; then the variation process is a mapping from X to Y and the selection process is a mapping from Y to X. Note that X is just the state space of the Markov chain. Let P_v denote the state transition probability of the variation process. Let S* denote the optimal solution set. The considered solution set (e.g., a population) may be a multiset, and inclusion between two multisets is understood with multiplicity.
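The decomposition of one step into variation and selection can be sketched as follows (Python, for a (1+λ)-style step; taking the intermediate space Y to be (parent, offspring-list) pairs is our own illustrative choice):

```python
import random

def variation(x, lam, p):
    """Variation maps a state x in X to an intermediate state in Y,
    taken here to be the pair (parent, list of lam mutated offspring),
    with bitwise mutation at probability p."""
    offspring = [[b ^ 1 if random.random() < p else b for b in x]
                 for _ in range(lam)]
    return (x, offspring)

def selection(f, state):
    """Selection maps Y back to X without producing new solutions:
    keep the best of the parent and the offspring under fitness f."""
    x, offspring = state
    best = max(offspring, key=f)
    return best if f(best) >= f(x) else x

random.seed(3)
x = selection(sum, variation([0, 1, 0, 1], lam=3, p=0.25))
```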

Definition 9 (Deceptive Markov Chain):
A homogeneous Markov chain modeling an EA optimizing a problem without noise is deceptive if for any ,
formula
5
Theorem 1:
For an EA optimizing a problem f, which can be modeled by a deceptive Markov chain, if
formula
6
then noise makes f easier for .

The theorem intuitively means that if an evolutionary process is deceptive and the optimal solution, once generated, is always accepted in the noisy evolutionary process, then noise is helpful.

Proof:

The two EAs with and without noise differ only in whether the fitness evaluation is disturbed by noise; thus they must have the same values of N_1 and N_2 in their running time Eq. (2). Then, comparing their expected running times is equivalent to comparing the EFHTs of their corresponding Markov chains.

In one step of the evolutionary process, denote the states before and after variation by and respectively, and denote the state after selection by . Because the selection process does not produce new solutions, it must satisfy that . Assume that . For (i.e., without noise), we have
formula
7
For the chain with noise, the condition Eq. (6) ensures that once an optimal solution is generated, it will always be accepted. Thus, we have
formula
8
By combining Eqs. (5)–(8), we have
formula
Since , this inequality is equivalent to
formula
which implies that the condition Eq. (3) of Lemma 5 holds. Thus, by Lemma 5, we get that the EFHT of the noisy chain is at most that of the noiseless chain; that is, noise makes f easier for the EA.

Next, we give a concrete deceptive evolutionary process: the (1+λ)-EA optimizing the Trap problem. The Trap problem given in Definition 4 is to maximize the number of 0 bits, except that the global optimum is 1ⁿ. It is not hard to see that the EFHT depends only on the number of 0 bits |x|₀ of the solution; we denote the EFHT for solutions with j 0 bits by E_j. The order of E_j is shown in Lemma 7.

Lemma 7:

For any mutation probability p ∈ (0, 0.5), it holds that E_0 < E_n < E_{n−1} < ⋯ < E_1.

For proving Lemma 7, we need the following two lemmas. Lemma 8 (Witt, 2013) says that it is more likely that the offspring generated by mutating a parent solution with fewer 0 bits has a smaller number of 0 bits. Note that we consider the number of 0 bits instead of the number of 1 bits in the original lemma; it still holds because of symmetry. We have also restricted the relation between the two parents to be strict, which leads to the strict inequality in the conclusion. Lemma 9 is very similar to Lemma 6, except that the inequalities in condition 3 and in the conclusion hold strictly.

Lemma 8 (Witt, 2013):
Let x, y ∈ {0,1}ⁿ be two search points satisfying |x|₀ < |y|₀. Denote by x′ and y′ the random strings obtained by flipping each bit of x and y independently with probability p, respectively. If p < 0.5, then for any j,
formula
Lemma 9:
Let be an integer. If it satisfies that
formula
then it holds that
Proof:

Let for , and . Then, it is easy to see that the two vectors and satisfy conditions 2 and 3 of Lemma 6. Furthermore, condition 1 of Lemma 6 that Ei is increasing holds. Thus, by Lemma 6, we have

Then, we compare with .
formula
Thus, we have that is, the lemma holds.
Proof of Lemma 7:

First, E_0 < E_n trivially holds, because E_0 = 0 and E_n > 0. Then, we prove the rest of the order inductively on j.

(1) Initialization is to prove E_n < E_{n−1}. For j = n, because the next solution can only be 0ⁿ or 1ⁿ, we obtain an equation for E_n. For j = n − 1, because the next solution can be 1ⁿ, 0ⁿ, or a solution with n − 1 0 bits, we obtain an equation for E_{n−1}, where P denotes the probability that the next solution is 0ⁿ. Thus, we have
formula
where the inequality is by .
(2) Inductive hypothesis assumes that
formula
Then, we consider . Let x and y be a solution with number of 0 bits and that with K number of 0 bits, respectively. Let a and b denote the number of 0 bits of the offspring solutions and , respectively. That is, and . For the independent mutations on x and y, we use and , respectively. Note that, are independently and identically distributed (i.i.d.), and are also i.i.d. Let and . Then, from Lemma 8, we have .
For , let P0 and Pi be the probability that for the offspring solutions, the least number of 0 bits is 0 (i.e., ), and the largest number of 0 bits is i, while the least number of 0 bits is larger than 0 (i.e., ), respectively. By considering the mutation and selection behavior of the (1+λ)-EA on the Trap problem, we have
formula
For , let and . Then, we have
formula
For comparing with , we need to show that
formula
9
For , we have
formula
For , we similarly have Thus,
formula
where the last equality is by letting .

Since , it is easy to verify that is increasing. Then, we have by . Thus, the Eq. (9) holds.

By subtracting from , we get
formula
where the inequality is by applying Lemma 9 to the formula in . The three conditions of Lemma 9 can be easily verified, because by inductive hypothesis; ; and Eq. (9) holds. Because , we have .

(3) Conclusion. According to steps (1) and (2), the lemma holds.

Theorem 2:

Additive noise or multiplicative noise with suitable parameters δ₁ and δ₂ makes the Trap problem easier for the (1+λ)-EA with any mutation probability less than 0.5.

Proof:

First, we show that the (1+λ)-EA optimizing the Trap problem can be modeled by a deceptive Markov chain. By Lemma 7, the EFHT partition of {ξ′_t} is determined by the number of 0 bits, and m in Definition 7 is equal to n here.

For any nonoptimal state x, we denote P_0 and P_j as the probabilities that, for the offspring solutions generated by bitwise mutation on x, the least number of 0 bits is 0, and the largest number of 0 bits is j while the least number of 0 bits is larger than 0, respectively. Because only the optimal solution or the solution with the largest number of 0 bits among the parent solution and the offspring solutions will be accepted, we have
formula
This implies that Eq. (5) holds.
Then, we show that the condition of Theorem 1 (i.e., Eq. (6)) holds. For additive noise, since the noise is bounded within [δ₁, δ₂], we have
formula
For multiplicative noise, a similar argument applies. Thus, under these two noise models, the optimal solution, once generated, will always be accepted. This implies that Eq. (6) holds.

Thus, by Theorem 16, we get that the Trap problem becomes easier for the ()-EA under these two types of noise.

3.2  On Flat Problems

Besides deceptive problems, we show that noise can also make flat problems easier for EAs. We take the Peak problem given in Definition 9 as the representative problem, which assigns the same fitness to all solutions except the optimal solution . Because the fitness thus provides no information about the search direction, the problem is hard for EAs. We analyze the ()-EA optimizing the Peak problem. The ()-EA is the same as the ()-EA except that it employs the strict selection strategy; that is, step 4 of Algorithm 1 changes to “if .” The expected running time of the ()-EA with mutation probability on the Peak problem has been proved to be lower-bounded by (Droste et al., 2002).
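To make the setting concrete, the following is a minimal Python sketch of a (1+1)-EA-style loop with strict selection on the Peak problem. The function and parameter names are illustrative assumptions, not the authors' implementation.

```python
import random

def peak(x):
    """Peak fitness: 1 for the all-ones string, 0 for every other solution."""
    return 1 if all(x) else 0

def one_plus_one_ea_strict(n, max_iters=500_000, rng=None):
    """Strict selection: an offspring is accepted only if its fitness is
    strictly larger than the parent's, so all plateau moves are rejected."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = peak(x)
    for t in range(1, max_iters + 1):
        # bitwise mutation: flip each bit independently with probability 1/n
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        fy = peak(y)
        if fy > fx:  # strict selection
            x, fx = y, fy
        if fx == 1:
            return t  # iteration at which the optimum became the current solution
    return None  # optimum not found within the budget
```

Under strict selection the current solution never moves on the plateau, so the optimum must be produced directly by mutation from the initial solution, which matches the exponential lower bound cited above.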

Theorem 3:

One-bit noise with being a constant makes the Peak problem easier for the ()-EA with mutation probability , when starting from an initial solution x with .

Proof:

Let and model the ()-EA with one-bit noise and without noise for maximizing the Peak problem, respectively. It is not hard to see that both the EFHT and only depend on . We denote and as and with , respectively.

For (i.e., without noise) starting from a solution x with , in one step, any nonoptimal offspring solution has the same fitness as the parent and then will be rejected because of the strict selection strategy; only the optimal solution can be accepted, which happens with probability . Thus, we have
formula
which leads to
For (i.e., with one-bit noise), we assume that reevaluation is used, that is, both and are evaluated anew in each iteration of Algorithm 1. When starting from x with , if the generated offspring is the optimal solution , it will be accepted with probability , because only no bit flip for noise on and no 0-bit flip for noise on x can make ; otherwise, x keeps , because for any with .
formula
which leads to .
When starting from x with , if the offspring is , it will be accepted with probability because only no bit flip for noise on can make ; if , it will be accepted with probability because only flipping the unique 0 bit for noise on can make ; otherwise, x keeps because for any with . Let be the probability of mutating to by bitwise mutation with . Then, we have, for ,
formula
which leads to .
From Eq. (2), we know that the expected running times without noise and with one-bit noise are and , respectively. To prove that one-bit noise can be helpful, we need to show that there exists such that . Obviously, is impossible because . Then, for larger i with ,
formula
where the first inequality is because and for large enough n and pn being constant, and the last inequality is by .

This is equivalent to , which implies that noise is helpful when starting from an initial solution x with .

This theorem implies that the Peak problem becomes easier under noise when starting from an initial solution x with a large number of 0 bits. From the analysis, we can see that the reason for requiring a large is to make much larger than , which means that the negative effect of rejecting the optimal solution by noise can be compensated by the positive effect of accepting the solution x with .

For the ()-EA solving the Peak problem, any offspring solution will be accepted because its fitness is always at least that of the parent solution; thus the solution x in the evolutionary process essentially performs a random walk over . In this case, we can intuitively expect a similar effect of one-bit noise to that found for the ()-EA solving the Peak problem. Here, we assume that the single-evaluation strategy is used. Under one-bit noise, for any nonoptimal parent solution x, if , then and any offspring will be accepted; if and , then any offspring will be accepted; if and , any offspring with will be rejected because , and the optimal solution with will be rejected with probability pn. Compared with the transition behavior without noise, noise only has an effect when and : the negative effect of rejecting the optimal solution, which has probability , and the positive effect of rejecting , which has probability at least . Obviously, the negative effect can be compensated for by the positive effect, which implies that one-bit noise is helpful. Thus, we have the following conjecture. A rigorous analysis is not easy, and we leave it to future work; instead, we verify the conjecture in the experiment section.

Conjecture 1:

One-bit noise makes the Peak problem easier for the ()-EA with mutation probability .
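Conjecture 1 can be probed empirically. Below is a hedged simulation sketch under assumed settings (non-strict acceptance, single-evaluation, one-bit noise); the names are ours and this is not the authors' experimental setup.

```python
import random

def noisy_peak(x, pn, rng):
    """One-bit noise: with probability pn, one uniformly chosen bit is
    flipped before evaluation; the solution itself is unchanged."""
    y = list(x)
    if rng.random() < pn:
        y[rng.randrange(len(y))] ^= 1
    return 1 if all(y) else 0

def ea_single_eval(n, pn, max_iters=2_000_000, rng=None):
    """Non-strict acceptance (offspring fitness >= parent fitness) with
    single-evaluation: the stored noisy fitness of the parent is reused,
    never recomputed."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = noisy_peak(x, pn, rng)  # evaluated once, then reused
    for t in range(1, max_iters + 1):
        if all(x):
            return t  # true optimum reached
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        fy = noisy_peak(y, pn, rng)
        if fy >= fx:
            x, fx = y, fy
    return max_iters  # budget exhausted
```

Averaging the return value over many independent runs for pn = 0 versus pn > 0 is one way to compare the hitting times with and without noise.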

4  On the Effect of Noise-Handling Strategies

In the previous section, we found that noise can make optimization easier for EAs when the problem presents some deceptiveness or flatness. Meanwhile, on other problems, noisy fitness evaluation can make optimization harder for EAs. For example, Droste (2004) proved that the running time of the ()-EA on the OneMax problem can increase from polynomial to exponential in the presence of noise. Thus, in this section, we investigate how well different noise-handling strategies can perform when the noise is indeed harmful.

4.1  A Noise-Harmful Case

We consider the case that the ()-EA is used for optimizing the OneMax problem. Let and model the ()-EA with and without noise for maximizing OneMax, respectively. It is not hard to see that the EFHT only depends on . We denote as with . The order of is shown in Lemma 23.

Lemma 10:

For any mutation probability , it holds that

Proof:

We prove inductively on j.

(1) Initialization is to prove , which holds, since .

(2) Inductive hypothesis assumes that
formula
Then, we consider . We use an analysis method similar to that in the proof of Lemma 14 to compare with .
For , let be the probability that the least number of 0 bits for the offspring solutions is i (i.e., ). By considering the mutation and selection behavior of the ()-EA on the OneMax problem, we have
formula
For , let . We have
formula
By subtracting from , we get
formula
where the inequality is by applying Lemma 19 to the formula in . The three conditions of Lemma 19 can be easily verified, because by inductive hypothesis; ; and the following inequality holds.
formula
Because , we have .

(3) Conclusion. According to steps (1) and (2), the lemma holds.

Theorem 4:

Any noise makes the OneMax problem harder for the ()-EA with mutation probability less than 0.5.

Proof:

We use Lemma 13 to prove it. By Lemma 23, the EFHT partition of is .

For any nonoptimal solution , we denote as the probability that the least number of 0 bits for the offspring solutions generated by bitwise mutation on x is j. For , because the solution with the least number of 0 bits among the parent solution and offspring solutions will be accepted, we have
formula
For , because of the fitness evaluation disturbed by noise, the solution with the least number of 0 bits among the parent and offspring solutions may be rejected. Thus, we have
formula
Then, we get
formula
which implies that the condition Eq. (3) of Lemma 13 holds. Thus, we get , , namely, noise makes the OneMax problem harder for the ()-EA.

In the following sections, we analyze the effect of different noise-handling strategies for the ()-EA (a specific case of the ()-EA) optimizing the OneMax problem to investigate their usefulness.

4.2  On Reevaluation and Threshold Selection Strategies

4.2.1  Reevaluation

There are naturally two fitness evaluation options for EAs (Arnold and Beyer, 2002; Jin and Branke, 2005; Goh and Tan, 2007):

  • Single-evaluation. We evaluate a solution once and reuse that fitness value for this solution in the future.

  • Reevaluation. We evaluate the fitness of a solution anew every time it is accessed.

For example, for the ()-EA in Algorithm 1, if using reevaluation, both and will be calculated and recalculated in each iteration; if using single-evaluation, only will be calculated and the previously obtained fitness will be reused. Note that the analysis in the previous section, where the employed evaluation strategy was not explicitly indicated, assumes single-evaluation.
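The two options differ only in how the parent's fitness is obtained. A minimal sketch of the loop, assuming a generic noisy fitness callable (names are illustrative):

```python
import random

def run_ea(n, noisy_f, reevaluate, max_iters, rng):
    """A (1+1)-EA-style loop illustrating the two evaluation options.

    noisy_f(x, rng) returns a (possibly noisy) fitness value.
    With reevaluate=True, the parent's fitness is recomputed in every
    iteration; with reevaluate=False, its first evaluation is reused."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = noisy_f(x, rng)
    for _ in range(max_iters):
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        if reevaluate:
            fx = noisy_f(x, rng)  # parent reevaluated: two evaluations per step
        fy = noisy_f(y, rng)      # the offspring is always evaluated
        if fy >= fx:
            x, fx = y, fy
    return x
```

With an exact fitness function the two options behave identically; they diverge only under noise, which is exactly the comparison studied below.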

Sudholt and Thyssen (2012), for an ACO with single-evaluation solving stochastic shortest path problems, constructed an example graph to show that exponential running time is required for approximating real shortest paths. The difficulty arises because once a path is luckily evaluated to have a relatively small length due to noise, it will always be preferred, making the ACO get stuck in an inferior solution. By using reevaluation instead of single-evaluation when evaluating the best-so-far path, the ACO can easily solve the example graph (Doerr et al., 2012a). Reevaluation has also been employed for EAs solving noisy multiobjective optimization problems (e.g., Buche et al., 2002; Park and Ryu, 2011; Fieldsend and Everson, 2015).

Intuitively, reevaluation can smooth out noise and thus could be better for noisy optimization, but it also increases the fitness evaluation cost and thus the running time. Its overall usefulness has therefore been unclear.

In this section, we compare these two options for the ()-EA solving the OneMax problem under one-bit noise to show whether reevaluation is useful. Note that for one-bit noise, pn controls the noise strength, that is, noise becomes stronger as pn gets larger; it is also the parameter of the PNT. In the following analysis, let indicate any polynomial of n.
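One-bit noise can be sketched as a wrapper around an exact fitness function. This is a minimal illustration; the name `one_bit_noise` is ours.

```python
import random

def one_bit_noise(f, pn):
    """Wrap an exact fitness f with one-bit noise: with probability pn,
    one uniformly chosen bit is flipped before f is evaluated; the
    solution itself is left unchanged."""
    def noisy(x, rng):
        x = list(x)  # copy so the stored solution is not modified
        if rng.random() < pn:
            x[rng.randrange(len(x))] ^= 1
        return f(x)
    return noisy

def onemax(x):
    """OneMax fitness: the number of 1 bits."""
    return sum(x)
```

For instance, with pn = 1 the all-ones string of length n is always evaluated as n - 1, while with pn = 0 the evaluation is exact; pn thus directly controls the noise strength.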

Theorem 5:

For the ()-EA with mutation probability solving the OneMax problem under one-bit noise, if using single-evaluation, the PNT is .

The theorem is straightforwardly derived from the following two lemmas. Lemma 26 tells us the expected running time upper bound , which implies that the expected running time is polynomial if , i.e., . Lemma 27 tells us the lower bound , which implies that the running time is superpolynomial if , i.e., . By combining these results, we get that the maximum noise strength allowing polynomial expected running time is , i.e., the PNT is .

Lemma 11:

For the ()-EA using single-evaluation with mutation probability on the OneMax problem under one-bit noise, the expected running time is upper-bounded by .

Proof:

Let L denote the noisy fitness value of the current solution x. Because the ()-EA does not accept a solution with a smaller fitness (step 4 of Algorithm 1) and does not reevaluate the fitness of the current solution x, will never decrease. By applying the fitness level technique (Wegener, 2002; Sudholt, 2013), we first analyze the expected steps until L increases when starting from (denoted by ) and then sum them up to get an upper bound for the expected steps until L reaches the maximum value n. For , we analyze the probability P that L increases in two steps when , then . Note that one-bit noise can make L be , or , where is the number of 1 bits. When analyzing the noisy fitness of the offspring in each step, we need to first consider bitwise mutation on x and then one random bit flip for noise.
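As a reminder, the fitness level technique in its standard form bounds the expected optimization time by summing, over levels, the reciprocals of lower bounds $P_i$ on the per-step probability of leaving level $i$; the proof applies this idea to the stored noisy value $L$ rather than to the true fitness:

```latex
\mathbb{E}[T] \;\le\; \sum_{i=0}^{n-1} \frac{1}{P_i}
```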

When , , L or .

(1) For , , since it is sufficient to flip one 0 bit for mutation and one 0 bit for noise in the first step, or flip one 0 bit for mutation and no bit for noise in the first step and flip one 0 bit for mutation and no bit for noise in the second step.

(2) For , since it is sufficient to flip no bit for mutation and one 0 bit for noise, or flip one 0 bit for mutation and no bit for noise in the first step.

(3) For , since it is sufficient to flip no bit for mutation and no bit or one 0 bit for noise in the first step.

Thus, for these three cases, we have
formula

When , or 1. By considering cases 2 and 3, we get the same lower bound for P.

When and the optimal solution has not been found, or . By considering cases 1 and 2, we get .

Based on this analysis, we get that the expected steps until are at most
formula

When , or n. The equality means that the optimal solution has been found. Because we want an upper bound on the expected running time of finding , we can pessimistically assume that . Starting from and (i.e., the current solution has a unique 0 bit and its fitness is evaluated as n), it will always stay in such a situation before finding , and the optimal solution can be generated and accepted in one step only through flipping the unique 0 bit for mutation and no bit for noise, which happens with probability . This implies that the expected steps for finding the optimal solution are at most .

Thus, the total expected running time is upper-bounded by .

Lemma 12:

For the ()-EA using single-evaluation with mutation probability on the OneMax problem under one-bit noise, the expected running time is lower-bounded by .

Proof:
Assume that the number of 1 bits of the initial solution x is less than , that is, . Let T denote the running time of finding the optimal solution when starting from x. Denote A as the event that in the evolutionary process, any solution with is never found. By the law of total expectation, we have
formula
We first show that . Let denote an evolutionary path from x to the optimal solution , which satisfies . Then, is the sum of the probabilities of all possible such l. For any such l, there must exist a corresponding set of paths , in which the first solutions of any path are the same as those of l and the mth solution has number of 1 bits. Let q denote the probability of the subpath , and let . Then, . The probability of mutating from to ym is at least , and the acceptance probability of ym is at least , which is reached when and . Thus, we have
formula
Moreover, for any two different paths , it must hold that . Thus, . Because , we get . Then,
formula
We then derive a lower bound on . We further divide the running time T into two parts: the running time until finding a solution with for the first time (denoted by T1) and the remaining running time for finding the optimal solution (denoted by T2). Thus, we have
formula
For , when finding a solution with for the first time, we consider the case that the fitness is evaluated as n, which happens with probability . If it happens, because of the single-evaluation strategy, the solution will always have number of 1 bits and its fitness will always be n. From the upper-bound analysis in Lemma 26, we know that the probability of generating and accepting the optimal solution in one step in such a situation is . Thus,
formula
which implies that , and thus .

Because the initial solution is uniformly distributed over , we have . Thus, the expected running time of the whole process is lower-bounded by , i.e., .

Note that when , the derived lower bound would be quite loose. Thus, to fill this gap, we derive another lower bound that does not depend on pn. From Droste et al. (2002, Lemma 23), we know that the expected running time of the ()-EA optimizing linear functions with positive weights is . Their proof idea is to analyze the expected running time until all the 0 bits of the initial solution have been flipped at least once, which is obviously a lower bound on the expected running time of finding the optimal solution . Because noise does not affect this analysis, we can directly apply their result to our setting and obtain the lower bound .

By combining the two derived lower bounds, we get that the expected running time of the whole process is lower-bounded by .

We then give the PNT when using reevaluation in the following theorem, which is straightforwardly derived from Lemma 29.

Theorem 6:

For the ()-EA with mutation probability solving the OneMax problem under one-bit noise, if using reevaluation, the PNT is .

Lemma 13 (Droste, 2004):

For the ()-EA using reevaluation with mutation probability on the OneMax problem under one-bit noise, the expected running time is polynomial when , and superpolynomial when .

4.2.2  Threshold Selection

During the process of evolutionary optimization, most of the improvements in one generation are small. When using reevaluation, because of noisy fitness evaluation, a considerable portion of these improvements are not real: a worse solution appears to have a better fitness and survives, replacing the truly better solution, which appears to have a worse fitness. This may mislead the search direction of EAs, reduce their efficiency, or make them get trapped in a local optimum (see Section 4.2.1). To deal with this problem, a selection strategy for EAs handling noise, namely threshold selection, was proposed (Markon et al., 2001; Bartz-Beielstein, 2005a), in which an offspring solution is accepted only if its fitness is larger than that of the parent solution by at least a predefined threshold .

formula

For example, for the ()-EA with threshold selection as in Algorithm 3, step 4 changes to be “if ” rather than “if ” in Algorithm 1. Such a strategy can reduce the risk of accepting a bad solution due to noise. Although the good local performance (i.e., the progress of one step) of EAs with threshold selection has been shown on some problems (Markon et al., 2001; Bartz-Beielstein and Markon, 2002; Bartz-Beielstein, 2005b), its usefulness for the global performance (i.e., the running time until finding the optimal solution) of EAs under noise is not yet clear.
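Threshold selection changes only the acceptance test in the loop. A minimal sketch, assuming reevaluation and a nonnegative threshold tau (names are illustrative):

```python
import random

def ea_threshold(n, noisy_f, tau, max_iters, rng):
    """A (1+1)-EA-style loop with reevaluation and threshold selection:
    the offspring replaces the parent only if its (noisy) fitness
    exceeds the parent's by at least tau."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(max_iters):
        if all(x):
            break  # optimum reached
        y = [b ^ (rng.random() < 1.0 / n) for b in x]
        fx = noisy_f(x, rng)  # reevaluation: parent is evaluated anew each step
        fy = noisy_f(y, rng)
        if fy >= fx + tau:    # threshold selection replaces plain "fy >= fx"
            x = y
    return x
```

A larger tau makes it harder for a noise-induced fake improvement to be accepted, at the price of also rejecting genuine small improvements; the analysis below quantifies this trade-off.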

In this section we analyze the running time of the ()-EA with threshold selection solving OneMax under one-bit noise to see whether threshold selection is useful. Note that the analysis here assumes reevaluation. This is because using single-evaluation and threshold selection simultaneously will lead to infinite expected running time for any noise strength , as shown in the following theorem.

Theorem 7:

For the ()-EA with mutation probability