Abstract

In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms are often employed in noisy optimization, where reducing the negative effect of noise is a crucial issue. Sampling is a popular strategy for dealing with noise: to estimate the fitness of a solution, it evaluates the fitness multiple (k) times independently and then uses the sample average to approximate the true fitness. Obviously, sampling makes the fitness estimation closer to the true value, but also increases the estimation cost. Previous studies mainly focused on empirical analysis and design of efficient sampling strategies, while the impact of sampling is unclear from a theoretical viewpoint. In this article, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that, under a high noise level, the running time can be reduced from exponential to polynomial by sampling. The analysis also shows that a gap of one on the value of k for sampling can lead to an exponential difference on the expected running time, cautioning for a careful selection of k. We further prove by using two illustrative examples that sampling can be more effective for noise handling than parent populations and threshold selection, two strategies that have been shown to be robust to noise. Finally, we also show that sampling can be ineffective when noise does not bring a negative impact.

1  Introduction

In many real-world optimization tasks, the exact objective (i.e., fitness) evaluation of candidate solutions is almost impossible, and only a noisy one can be obtained. Evolutionary algorithms (EAs) (Bäck, 1996) are general-purpose optimization algorithms inspired by natural phenomena, and have been widely and successfully applied to solve noisy optimization problems (Jin and Branke, 2005; Bianchi et al., 2009; Zeng et al., 2015). During evolutionary optimization, handling noise in fitness evaluation is very important, since noise may mislead the search direction and deteriorate the efficiency of EAs. Many studies thus have focused on reducing the negative effect of noise in evolutionary optimization (Arnold, 2002; Beyer, 2000; Jin and Branke, 2005).

One popular way to cope with noise in fitness evaluation is sampling (Arnold and Beyer, 2006), which, instead of evaluating the fitness of a solution only once, evaluates the fitness k times and then uses the average to approximate the true fitness. Sampling reduces the standard deviation of the noise by a factor of √k, while also increasing the computation cost k times. This makes the fitness estimation closer to the true value, but computationally more expensive. In order to reduce the sampling cost as much as possible, many smart sampling approaches have been proposed, including adaptive (Aizawa and Wah, 1994; Stagge, 1998) and sequential (Branke and Schmidt, 2003; 2004) methods, which dynamically decide the sample size k for each solution in each generation.
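As a quick numerical illustration of this variance reduction (a standalone sketch, not code from the article; the function names are ours), averaging k independent noisy evaluations shrinks the noise's standard deviation by roughly √k:

```python
import random
import statistics

def noisy_eval(true_fitness, sigma, rng):
    # One noisy fitness evaluation under additive Gaussian noise N(0, sigma^2).
    return true_fitness + rng.gauss(0.0, sigma)

def sampled_eval(true_fitness, sigma, k, rng):
    # Sampling: average of k independent noisy evaluations.
    return sum(noisy_eval(true_fitness, sigma, rng) for _ in range(k)) / k

rng = random.Random(0)
k = 25
single = [noisy_eval(10.0, 2.0, rng) for _ in range(20000)]
averaged = [sampled_eval(10.0, 2.0, k, rng) for _ in range(20000)]
ratio = statistics.stdev(single) / statistics.stdev(averaged)
# With k = 25 the standard deviation shrinks by about sqrt(25) = 5.
print(round(ratio))  # -> 5
```

The price is visible in the same sketch: producing each averaged value costs k = 25 evaluations instead of one.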

The impact of sampling on the convergence of EAs in noisy optimization has been empirically and theoretically investigated (Gutjahr, 2003; Arnold and Beyer, 2006; Heidrich-Meisner and Igel, 2009; Rolet and Teytaud, 2010). On the running time, a more practical performance measure of how soon an algorithm can solve a problem, previous experimental studies have reported conflicting conclusions. In Aizawa and Wah (1994), it was shown that sampling can speed up a standard genetic algorithm on two test functions, while in Cantú-Paz (2004), sampling led to a larger computation time for a simple generational genetic algorithm on the OneMax function. However, little work has been done on theoretically analyzing the impact of sampling on the running time. Thus, many fundamental theoretical issues on sampling have not been addressed, for example, whether sampling can reduce the running time of EAs from exponential to polynomial in noisy environments, and whether sampling will increase the running time in some cases.

The running time is usually counted by the number of fitness evaluations needed to find an optimal solution for the first time, because the fitness evaluation is deemed as the most costly computational process (Droste et al., 2002; Yu and Zhou, 2008; Qian et al., 2015b). Rigorous running time analysis has been a leading theoretical aspect for randomized search heuristics (Neumann and Witt, 2010; Auger and Doerr, 2011). Recently, progress has been made on the running time analysis of EAs. Numerous analytical results for EAs solving synthetic problems as well as combinatorial problems have been reported, for example, Neumann and Witt (2010) and Auger and Doerr (2011). Meanwhile, general running time analysis approaches have also been proposed, for example, drift analysis (He and Yao, 2001; Doerr, Johannsen, and Winzen, 2012; Doerr and Goldberg, 2013), fitness-level methods (Wegener, 2002; He and Yao, 2003; Sudholt, 2013; Dang and Lehre, 2015b), and switch analysis (Yu et al., 2015; Yu and Qian, 2015). However, most of them focus on noise-free environments, where the fitness evaluation is exact.

For EAs in noisy environments, few results have been reported on running time analysis. Droste (2004) first analyzed the (1+1)-EA on the OneMax problem in the presence of one-bit noise and showed the maximal noise level allowing a polynomial running time, where the noise level is characterized by the noise probability p_n ∈ [0, 1] and n is the problem size. This result was later extended to the LeadingOnes problem and to many different noise models in Gießen and Kötzing (2016), which also proved that small populations of logarithmic size can make elitist EAs, that is, the (μ+1)-EA and the (1+λ)-EA, perform well under high noise levels. The robustness of populations to noise was also proved in the setting of non-elitist EAs with mutation only (Dang and Lehre, 2015a) or uniform crossover only (Prügel-Bennett et al., 2015). However, Friedrich et al. (2015) showed the limitation of parent populations to cope with noise by proving that the (μ+1)-EA needs super-polynomial time for solving OneMax in the presence of additive Gaussian noise with large variance (σ² ≥ n³). This difficulty can be overcome by the compact genetic algorithm (cGA) (Friedrich et al., 2015) and a simple Ant Colony Optimization (ACO) algorithm (Friedrich et al., 2016), both of which find the optimal solution in polynomial time with a high probability. Recently, Qian et al. (in press) proved that the threshold selection strategy is also robust to noise: the expected running time of the (1+1)-EA using threshold selection on OneMax in the presence of one-bit noise is always polynomial regardless of the noise level. They also showed the limitation of threshold selection under asymmetric one-bit noise and further proposed smooth threshold selection, which can overcome the difficulty. Note that there was also a sequence of papers analyzing the running time of ACO on single destination shortest paths (SDSP) problems with edge weights disturbed by noise (Sudholt and Thyssen, 2012; Doerr, Hota, and Kötzing, 2012; Feldmann and Kötzing, 2013).

In addition to the above results, there exist two other pieces of work on running time analysis in noisy evolutionary optimization that involve sampling. Akimoto et al. (2015) proved that sampling with a large enough sample size can make optimization under additive unbiased noise behave as optimization in a noise-free environment, and thus concluded that the noisy optimization problem can be solved by sampling in O(T log T) running time, where T is the running time in the noise-free case. A similar result was also achieved for an adaptive Pareto sampling (APS) algorithm solving bi-objective optimization problems under additive Gaussian noise (Gutjahr, 2012). These results, however, do not describe any impact of sampling on the running time, because they do not compare with the running time in noisy optimization without sampling.

In this article, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that the running time is exponential when the noise level is high (i.e., Theorems 12, 15, 17, 20), while sampling can reduce the running time to be polynomial (i.e., Theorems 14, 16, Corollaries 18, 19). Particularly, for the (1+1)-EA solving OneMax under one-bit noise with p_n = 1, the analysis also shows that a gap of one on the value of k for sampling can lead to an exponential difference on the expected running time (i.e., Theorems 13, 14), which reveals that a careful selection of k is important for the effectiveness of sampling.

As previous studies (Qian et al., in press; Gießen and Kötzing, 2016) have shown that parent populations and threshold selection can bring about robustness to noise, we also compare sampling with these two strategies. On the OneMax problem under additive Gaussian noise with σ² ≥ n³, the (μ+1)-EA needs super-polynomial time (Friedrich et al., 2015) (i.e., Theorem 23), while the (1+1)-EA using sampling can solve the problem in polynomial time (i.e., Corollary 19). On the OneMax problem under asymmetric one-bit noise with p_n = 1, the (1+1)-EA using threshold selection needs at least exponential time (Qian et al., in press) (i.e., Theorem 24), while the (1+1)-EA using sampling can solve it in polynomial time (i.e., Theorem 25). Therefore, these results show that sampling can be more tolerant of noise than parent populations and threshold selection, respectively.

Finally, for the (1+1)-EA solving the Trap problem under additive Gaussian noise, we prove that noise does not bring a negative impact. Under the assumption that the positive impact of noise increases with the noise level, we conjecture that sampling is ineffective in this case since it will decrease the noise level. The conjecture is verified by experiments. Note that the conjecture is consistent with that in Qian et al. (in press). In that work, it is hypothesized that the impact of noise is correlated with the problem hardness: when the problem is EA-hard (He and Yao, 2004) with respect to a specific EA (e.g., the Trap problem for the (1+1)-EA), noise can be helpful and does not need to be handled, but when the problem is EA-easy (He and Yao, 2004), noise can be harmful and needs to be tackled.

This article extends our preliminary work (Qian et al., 2014) and improves one previous statement. In Qian et al. (2014), we proved a sufficient condition under which sampling is ineffective, and applied it to the cases where the (1+1)-EA solves OneMax and Trap under additive Gaussian noise. The proof assumed the monotonicity of a quantity. By finding that an upper/lower bound of the quantity is monotonic, we hypothesized that the quantity itself is also monotonic. Since this property does not always hold, we have corrected our previous statement on the OneMax problem by proving that sampling with a moderate sample size can exponentially reduce the running time of the (1+1)-EA compared with no sampling (i.e., Theorem 17, Corollary 19). Meanwhile, both analysis and experiments (i.e., Section 6) show that sampling is ineffective on the Trap problem.

The rest of this article is organized as follows. Section 2 introduces some preliminaries. The robustness analysis of sampling to prior and posterior noise is presented in Sections 3 and 4, respectively. Section 5 compares sampling with the other two strategies, parent populations and threshold selection, on the robustness to noise. Section 6 gives a case where sampling is ineffective. Section 7 concludes the article.

2  Preliminaries

In this section, we first introduce the noise models, problems, and evolutionary algorithms studied in this article, then describe the sampling strategy, and finally present the analysis tools that we use throughout this article.

2.1  Noise Models

Noise models can be generally divided into two categories: prior and posterior (Jin and Branke, 2005; Gießen and Kötzing, 2016). For prior noise, the noise comes from the variation on a solution instead of the evaluation process. One-bit noise as presented in Definition 1 is a representative one, which, with probability p_n, flips a random bit of a solution before evaluation. For posterior noise, the noise comes from the variation on the fitness of a solution. A representative model is additive Gaussian noise as presented in Definition 2, which adds a value drawn from a Gaussian distribution to the true fitness. Both one-bit noise and additive Gaussian noise have been widely used in previous empirical and theoretical studies (e.g., Beyer, 2000; Droste, 2004; Jin and Branke, 2005; Gießen and Kötzing, 2016). In this article, we will also use these two kinds of noise models.

Definition 1 (One-Bit Noise):
Given a parameter p_n ∈ [0, 1], let f^n(x) and f(x) denote the noisy and true fitness of a binary solution x ∈ {0, 1}^n, respectively; then
f^n(x) = f(x) with probability 1 − p_n, and f^n(x) = f(x′) with probability p_n,
where x′ is generated by flipping a uniformly randomly chosen bit of x.
Definition 2 (Additive Gaussian Noise):
Given a Gaussian distribution N(θ, σ²), let f^n(x) and f(x) denote the noisy and true fitness of a solution x, respectively; then
f^n(x) = f(x) + δ,
where δ is randomly drawn from N(θ, σ²), denoted by δ ∼ N(θ, σ²).
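The two noise models can be sketched in code as follows (our own illustrative implementations, not code from the article; `one_bit_noise` flips a uniformly chosen bit of a copy of the solution with probability p_n, and `gaussian_noise` adds δ ∼ N(θ, σ²)):

```python
import random

def one_bit_noise(f, x, p_n, rng):
    # With probability p_n, evaluate f on a copy of x with one uniformly
    # chosen bit flipped; otherwise return the true fitness f(x).
    if rng.random() < p_n:
        y = list(x)
        i = rng.randrange(len(y))
        y[i] = 1 - y[i]
        return f(y)
    return f(x)

def gaussian_noise(f, x, theta, sigma, rng):
    # Additive Gaussian noise: f^n(x) = f(x) + delta, delta ~ N(theta, sigma^2).
    return f(x) + rng.gauss(theta, sigma)

onemax = lambda x: sum(x)
rng = random.Random(1)
x = [1, 0, 1, 1]
# Under one-bit noise with p_n = 1, the noisy OneMax value always differs
# from the true value by exactly 1.
print(abs(one_bit_noise(onemax, x, 1.0, rng) - onemax(x)))  # -> 1
```

This last observation (a guaranteed gap of exactly one under p_n = 1) is the structural fact exploited repeatedly in the proofs of Section 3.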

In addition to the above noise models, we also consider a variant of one-bit noise called asymmetric one-bit noise (Qian et al., in press), presented in Definition 3. For the flipping of asymmetric one-bit noise on a solution x: if |x|_0 = 0, a random 1-bit is flipped; if |x|_0 = n, a random 0-bit is flipped; otherwise, the probability of flipping a specific 0-bit is 1/(2|x|_0), and the probability of flipping a specific 1-bit is 1/(2(n − |x|_0)), where |x|_0 is the number of 0-bits of x. Note that for one-bit noise, the probability of flipping any specific bit is 1/n.

Definition 3 (Asymmetric One-Bit Noise):
Given a parameter p_n ∈ [0, 1], let f^n(x) and f(x) denote the noisy and true fitness of a binary solution x ∈ {0, 1}^n, respectively; then f^n(x) = f(x) with probability 1 − p_n, otherwise f^n(x) = f(x′), where x′ is generated by flipping the i-th bit of x, and the position i is chosen as follows: if |x|_0 = 0 or |x|_0 = n, i is a uniformly randomly chosen position of x; otherwise,
i is a uniformly randomly chosen position of the 0-bits of x with probability 1/2, and a uniformly randomly chosen position of the 1-bits of x with probability 1/2.
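The position distribution of Definition 3 can be sampled as follows (an illustrative sketch based on the reconstruction above: with probability 1/2 a uniformly chosen 0-bit is flipped, with probability 1/2 a uniformly chosen 1-bit, falling back to the other kind when the solution is all-0s or all-1s):

```python
import random

def asymmetric_flip(x, rng):
    # Sample the bit position flipped by asymmetric one-bit noise and
    # return the perturbed copy of x. Each specific 0-bit is flipped with
    # probability 1/(2*|x|_0) and each specific 1-bit with probability
    # 1/(2*(n - |x|_0)), matching the description in the text.
    zeros = [i for i, b in enumerate(x) if b == 0]
    ones = [i for i, b in enumerate(x) if b == 1]
    if not zeros:
        pool = ones          # all-1s solution: flip a random 1-bit
    elif not ones:
        pool = zeros         # all-0s solution: flip a random 0-bit
    else:
        pool = zeros if rng.random() < 0.5 else ones
    y = list(x)
    i = rng.choice(pool)
    y[i] = 1 - y[i]
    return y

rng = random.Random(1)
print(sum(asymmetric_flip([1, 1, 1, 1], rng)))  # -> 3 (one 1-bit was flipped)
```

The asymmetry matters on OneMax-like problems: when 0-bits are scarce, each remaining 0-bit is far more likely to be flipped than under symmetric one-bit noise.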

2.2  Optimization Problems

As most theoretical analyses of EAs start from simple synthetic problems, we also use two well-known test functions OneMax and LeadingOnes, which have been widely studied in both noise-free (e.g., He and Yao, 2001; Droste et al., 2002; Sudholt, 2013) and noisy (e.g., Droste, 2004; Dang and Lehre, 2015a; Gießen and Kötzing, 2016) evolutionary optimization.

The OneMax problem as presented in Definition 4 aims to maximize the number of 1-bits of a solution. Its optimal solution is 11...1 (briefly denoted as 1^n) with the function value n. It has been shown that the expected running time of the (1+1)-EA on OneMax is Θ(n log n) (Droste et al., 2002).

Definition 4 (OneMax):
The OneMax problem of size n is to find an n-bit binary string x* such that
x* = argmax_{x ∈ {0,1}^n} Σ_{i=1}^{n} x_i.

The LeadingOnes problem as presented in Definition 5 aims to maximize the number of consecutive 1-bits counting from the left of a solution. Its optimal solution is 1^n with the function value n. It has been proved that the expected running time of the (1+1)-EA on LeadingOnes is Θ(n²) (Droste et al., 2002).

Definition 5 (LeadingOnes):
The LeadingOnes problem of size n is to find an n-bit binary string x* such that
x* = argmax_{x ∈ {0,1}^n} Σ_{i=1}^{n} Π_{j=1}^{i} x_j.

We will also use an EA-hard problem Trap in Definition 6, the aim of which is to maximize the number of 0-bits of a solution except for the optimal solution 1^n. Its optimal function value is positive, and the function value of any non-optimal solution is not larger than 0. The expected running time of the (1+1)-EA on Trap has been proven to be Θ(n^n) (Droste et al., 2002).

Definition 6 (Trap):
The Trap problem of size n is to find an n-bit binary string x* such that, letting c > n be a constant,
x* = argmax_{x ∈ {0,1}^n} ( c · Π_{i=1}^{n} x_i − Σ_{i=1}^{n} x_i ).
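The three benchmark functions can be written down directly (our own sketch; the Trap constant c = n + 1 is an assumption, chosen so that the optimum 1^n has value 1 and every non-optimal solution has value at most 0):

```python
def onemax(x):
    # Number of 1-bits; optimum 1^n with value n.
    return sum(x)

def leading_ones(x):
    # Number of consecutive 1-bits counting from the left; optimum 1^n.
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def trap(x):
    # Trap sketch with assumed c = n + 1: the optimum 1^n gets c - n = 1,
    # every other solution gets -(number of 1-bits) <= 0, so maximizing
    # among non-optimal solutions means maximizing the number of 0-bits.
    n = len(x)
    c = n + 1
    return c * int(all(b == 1 for b in x)) - sum(x)

print(onemax([1, 0, 1, 1]), leading_ones([1, 1, 0, 1]), trap([1, 1, 1]))  # -> 3 2 1
```

Note how Trap deceives a hill-climber: improving the non-optimal fitness moves the search away from 1^n, which is why the problem is EA-hard for the (1+1)-EA.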

2.3  Evolutionary Algorithms

In this article, we consider the (1+1)-EA as described in Algorithm 1, which is a simple EA for maximizing pseudo-Boolean problems over {0, 1}^n. The (1+1)-EA reflects the common structure of EAs. It maintains only one solution (i.e., the population size is 1), and repeatedly improves the current solution by using bit-wise mutation (i.e., step 3) and selection (i.e., steps 4 and 5). The (1+1)-EA has been widely used in the running time analysis of EAs (see Neumann and Witt, 2010; Auger and Doerr, 2011).

Algorithm 1 ((1+1)-EA). Given a function f over {0, 1}^n to be maximized, it consists of the following steps:

  1. x := a solution uniformly randomly selected from {0, 1}^n.

  2. Repeat until the termination condition is met

  3.    x′ := flip each bit of x independently with probability 1/n.

  4.    if f(x′) ≥ f(x)

  5.       x := x′.

For the (1+1)-EA in noisy environments, only a noisy fitness value f^n(x) is available, and thus step 4 of Algorithm 1 changes to be “if f^n(x′) ≥ f^n(x)”. Note that we assume that the reevaluation strategy is used (as in Droste, 2004; Doerr, Hota, and Kötzing, 2012; Gießen and Kötzing, 2016); that is, whenever the fitness of a solution is accessed, it is calculated with fresh randomness (e.g., a newly drawn noise variate or a newly flipped random bit). For example, for the (1+1)-EA, both x and x′ will be evaluated and reevaluated in each iteration. The running time in noisy optimization is usually defined as the number of fitness evaluations needed to find an optimal solution w.r.t. the true fitness function for the first time (Droste, 2004; Akimoto et al., 2015; Gießen and Kötzing, 2016).
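The noisy (1+1)-EA with reevaluation can be sketched as follows (an illustrative implementation with hypothetical helper names, not the article's code; it assumes the true optimum has fitness n, as for OneMax and LeadingOnes):

```python
import random

def noisy_one_plus_one_ea(n, noisy_f, true_f, rng, max_evals=100000):
    # Sketch of the (1+1)-EA under noise with the reevaluation strategy:
    # in every iteration both the offspring x' and the parent x receive a
    # fresh noisy evaluation. Runs until the true optimum (true fitness n)
    # is found (or an evaluation cap is hit), and returns the final
    # solution together with the number of noisy evaluations used.
    x = [rng.randrange(2) for _ in range(n)]
    evals = 0
    while true_f(x) < n and evals < max_evals:
        # bit-wise mutation: flip each bit independently with prob. 1/n
        xp = [1 - b if rng.random() < 1.0 / n else b for b in x]
        evals += 2  # one fresh noisy evaluation for x' and one for x
        if noisy_f(xp, rng) >= noisy_f(x, rng):
            x = xp
    return x, evals

onemax = lambda x: sum(x)
rng = random.Random(2)
# Sanity check in the noise-free special case: the noisy oracle simply
# returns the true fitness, and the EA quickly reaches 1^n.
x, evals = noisy_one_plus_one_ea(8, lambda y, r: onemax(y), onemax, rng)
print(sum(x), evals < 100000)  # -> 8 True
```

The `true_f(x) < n` check mirrors the running-time definition above: the clock stops the first time the true optimum is held, even though the algorithm itself only ever sees noisy values.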

In noisy optimization, a worse solution may appear to have a “better” fitness and then survive to replace the truly better solution, which appears to have a “worse” fitness. This may mislead the search direction of EAs, and then deteriorate their efficiency. To deal with this problem, a selection strategy for EAs handling noise was proposed (Markon et al., 2001; Bartz-Beielstein, 2005).

  • threshold selection: an offspring solution will be accepted only if its fitness is larger than that of the parent solution by at least a predefined threshold τ.

For example, when using threshold selection, the 4th step of the (1+1)-EA in Algorithm 1 changes to be “if f^n(x′) ≥ f^n(x) + τ” rather than “if f^n(x′) ≥ f^n(x)”. Such a strategy can reduce the risk of accepting a bad solution due to noise. In Qian et al. (in press), it has been proved that threshold selection with τ = 1 can make the (1+1)-EA solve the OneMax problem in polynomial time even if one-bit noise occurs with probability 1.
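In code, the rule change is tiny (an illustrative sketch; the numeric fitness values below are arbitrary):

```python
def accept_threshold(fn_offspring, fn_parent, tau):
    # Threshold selection: accept the offspring only if its (noisy)
    # fitness exceeds the parent's by at least the threshold tau.
    return fn_offspring >= fn_parent + tau

# With tau = 1, a one-evaluation noise fluctuation of +/-0.5 can no
# longer promote an offspring on its own.
print(accept_threshold(10.5, 10.0, 1.0), accept_threshold(11.0, 10.0, 1.0))  # -> False True
```

The design trade-off is that the same threshold also makes genuine small improvements harder to accept, which is one source of the limitation under asymmetric one-bit noise discussed in Section 5.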

2.4  Sampling

In noisy evolutionary optimization, sampling as described in Definition 7 has often been used to reduce the negative effect of noise (Aizawa and Wah, 1994; Stagge, 1998; Branke and Schmidt, 2003; 2004). It approximates the true fitness using the average of a number of random evaluations, and thus can estimate the true fitness more accurately. For example, the fitness output by sampling under additive Gaussian noise N(θ, σ²) can be represented by f̂(x) = f(x) + δ with δ ∼ N(θ, σ²/k); that is, sampling reduces the variance of the noise by a factor of k. However, the computation time for the fitness estimation of a solution is also increased by k times.

Definition 7 (Sampling):
Sampling first evaluates the fitness of a solution x k times independently and obtains the noisy fitness values f^n_1(x), f^n_2(x), …, f^n_k(x), and then outputs their average as
f̂(x) = (1/k) Σ_{i=1}^{k} f^n_i(x).

For the (1+1)-EA using sampling, the 4th step of Algorithm 1 changes to be “if f̂(x′) ≥ f̂(x)”. Note that k = 1 is equivalent to sampling not being used.
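To see why a larger k helps, one can estimate the probability that a strictly worse offspring survives the comparison in step 4 under one-bit noise with p_n = 1 (our own Monte-Carlo sketch; the solutions and sizes are arbitrary):

```python
import random

def noisy_onemax(x, rng):
    # OneMax under one-bit noise with p_n = 1: a uniformly chosen bit of a
    # copy of x is flipped before evaluation, so the noisy value always
    # differs from the true value by exactly 1.
    i = rng.randrange(len(x))
    return sum(x) + (1 if x[i] == 0 else -1)

def sampled(x, k, rng):
    # Definition 7: average of k independent noisy evaluations.
    return sum(noisy_onemax(x, rng) for _ in range(k)) / k

def accept_rate(x, xp, k, trials, rng):
    # Estimate the probability that offspring xp is accepted against
    # parent x when comparing sampled fitness values (acceptance on ties).
    return sum(sampled(xp, k, rng) >= sampled(x, k, rng)
               for _ in range(trials)) / trials

rng = random.Random(3)
x = [1] * 6 + [0] * 4    # parent with 4 zero-bits
xp = [1] * 5 + [0] * 5   # strictly worse offspring with 5 zero-bits
rates = {k: accept_rate(x, xp, k, 20000, rng) for k in (1, 3, 9)}
print(rates[1] > rates[3] > rates[9])  # -> True
```

For k = 1 the acceptance probability here is exactly (5/10)·(6/10) = 0.3 (the noise must add +1 to the offspring and −1 to the parent), and it shrinks as k grows, which is the mechanism behind the k = 2 versus k = 3 transition analyzed in Section 3.1.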

2.5  Analysis Tools

To derive running time bounds in this article, we first model EAs as Markov chains, and then use a variety of drift theorems.

The evolution process usually goes forward based only on the current population; thus, an EA can be modeled as a Markov chain {ξ_t}_{t=0}^{∞} (e.g., in He and Yao (2001) and Yu and Zhou (2008)) by taking the EA's population space X as the chain's state space; that is, ξ_t ∈ X. Note that the population space consists of all possible populations. Let X* ⊆ X denote the set of all optimal populations, which contain at least one optimal solution. The goal of the EA is to reach X* from an initial population. Thus, the process of an EA seeking X* can be analyzed by studying the corresponding Markov chain with the optimal state space X*. Note that we consider a discrete state space (i.e., X is discrete) in this article.

Given a Markov chain {ξ_t}_{t=0}^{∞} and ξ_0 = x, we define its first hitting time (FHT) as a random variable τ such that τ = min{t ≥ 0 | ξ_t ∈ X*}. That is, τ is the number of steps needed to reach the optimal space X* for the first time starting from ξ_0 = x. The mathematical expectation of τ, E[τ | ξ_0 = x], is called the expected first hitting time (EFHT) of this chain starting from ξ_0 = x. If ξ_0 is drawn from a distribution π_0, E[τ | ξ_0 ∼ π_0] is called the EFHT of the Markov chain over the initial distribution π_0. Thus, the expected running time of the corresponding EA starting from ξ_0 is equal to N_1 + N_2 · E[τ | ξ_0], where N_1 and N_2 are the number of fitness evaluations for the initial population and for each iteration, respectively. For example, for the (1+1)-EA using sampling, N_1 = k and N_2 = 2k due to the reevaluation strategy. Note that when we refer to the expected running time of an EA on a problem in this article, it is the expected running time starting from a uniform initial distribution π_u.

Thus, in order to analyze the expected running time of EAs, we just need to analyze the EFHT of the corresponding Markov chains. In the following, we introduce the drift theorems which will be used to derive the EFHT of Markov chains in the article.

Drift analysis was first introduced to the running time analysis of EAs by He and Yao (2001). Since then, it has become a popular tool in this field, and many variants have been proposed (e.g., in Doerr, Johannsen, and Winzen (2012) and Doerr and Goldberg (2013)). In this article, we will use its additive (i.e., Lemma 8) as well as multiplicative (i.e., Lemma 9) version. To use them, a distance function V(x) has to be constructed to measure the distance of a state x to the optimal state space X*. The distance function satisfies V(x) = 0 for x ∈ X* and V(x) > 0 for x ∉ X*. Then, we need to investigate the progress on the distance to X* in each step, that is, E[V(ξ_t) − V(ξ_{t+1}) | ξ_t]. For additive drift analysis (i.e., Lemma 8), an upper bound of the EFHT can be derived by dividing the initial distance by a lower bound of the progress. Multiplicative drift analysis (i.e., Lemma 9) is much easier to use when the progress is roughly proportional to the current distance to the optimum.
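As a worked instance of the multiplicative version (a standard noise-free OneMax calculation, consistent with the Θ(n log n) bound quoted in Section 2.2; not a derivation specific to this article):

```latex
% Noise-free OneMax with V(x) = |x|_0, the number of 0-bits: flipping
% exactly one 0-bit and no other bit decreases V by 1, so
\mathrm{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x]
  \;\ge\; \frac{|x|_0}{n}\left(1 - \frac{1}{n}\right)^{n-1}
  \;\ge\; \frac{V(x)}{e n},
% i.e., the condition of multiplicative drift holds with c = 1/(en).
% Since V(\xi_0) \le n and V_{\min} = 1, Lemma 9 yields
\mathrm{E}[\tau \mid \xi_0] \;\le\; e n \,(1 + \ln n) \;=\; O(n \log n).
```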

Lemma 8 (Additive Drift Analysis (He and Yao, 2001)):
Given a Markov chain {ξ_t}_{t=0}^{∞} and a distance function V(x), if for any t ≥ 0 and any ξ_t with V(ξ_t) > 0, there exists a real number c > 0 such that
E[V(ξ_t) − V(ξ_{t+1}) | ξ_t] ≥ c,
then the EFHT satisfies E[τ | ξ_0] ≤ V(ξ_0)/c.
Lemma 9 (Multiplicative Drift Analysis (Doerr, Johannsen, and Winzen, 2012)):
Given a Markov chain {ξ_t}_{t=0}^{∞} and a distance function V(x), if for any t ≥ 0 and any ξ_t with V(ξ_t) > 0, there exists a real number c > 0 such that
E[V(ξ_t) − V(ξ_{t+1}) | ξ_t] ≥ c · V(ξ_t),
then the EFHT satisfies
E[τ | ξ_0] ≤ (1 + ln(V(ξ_0)/V_min))/c,
where V_min = min{V(x) | x ∉ X*}.

The simplified drift theorem (Oliveto and Witt, 2011, 2012) as presented in Lemma 10 was proposed to prove exponential lower bounds on the FHT of Markov chains, where X_t is usually represented by a mapping of ξ_t. It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping towards or away from the goal state. To relax the requirement of a constant negative drift, the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) as presented in Lemma 11 has been proposed, which takes large self-loop probabilities into account.

Lemma 10 (Simplified Drift Theorem (Oliveto and Witt, 2011; 2012)):
Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Suppose there exists an interval [a, b] ⊆ ℝ, two constants δ, ε > 0 and, possibly depending on l = b − a, a function r(l) satisfying 1 ≤ r(l) = o(l/log(l)), such that for all t ≥ 0 the following two conditions hold:
(1) E[X_{t+1} − X_t | X_0, …, X_t; a < X_t < b] ≥ ε;
(2) for all j ∈ ℕ_0, P(|X_{t+1} − X_t| ≥ j | X_0, …, X_t; a < X_t) ≤ r(l)/(1 + δ)^j.
Then there is a constant c* > 0 such that for T* = min{t ≥ 0 : X_t ≤ a | X_0, …, X_t; X_0 ≥ b} it holds that P(T* ≤ 2^{c* l / r(l)}) = 2^{−Ω(l / r(l))}.
Lemma 11 (Simplified Drift Theorem with Self-loops (Rowe and Sudholt, 2014)):
Let X_t, t ≥ 0, be real-valued random variables describing a stochastic process over some state space. Suppose there exists an interval [a, b] ⊆ ℝ, two constants δ, ε > 0 and, possibly depending on l = b − a, a function r(l) satisfying 1 ≤ r(l) = o(l/log(l)), such that for all t ≥ 0 the following two conditions hold:
(1) E[X_{t+1} − X_t | X_0, …, X_t; a < X_t < b] ≥ ε · P(X_{t+1} ≠ X_t | X_0, …, X_t; a < X_t < b);
(2) for all j ∈ ℕ_0, P(|X_{t+1} − X_t| ≥ j | X_0, …, X_t; a < X_t) ≤ r(l)/(1 + δ)^j · P(X_{t+1} ≠ X_t | X_0, …, X_t; a < X_t).
Then there is a constant c* > 0 such that for T* = min{t ≥ 0 : X_t ≤ a | X_0, …, X_t; X_0 ≥ b} it holds that P(T* ≤ 2^{c* l / r(l)}) = 2^{−Ω(l / r(l))}.
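The phenomenon these theorems formalize can be seen in a toy simulation (our own sketch, unrelated to the article's proofs): a random walk with constant drift away from the target needs time exponential in the interval length to cross it.

```python
import random

def hitting_time(n, p_up, rng, cap=10**6):
    # Biased random walk on {0, ..., n} started at n: step +1 with
    # probability p_up (reflecting at n), else -1. Returns the number of
    # steps until state 0 is first reached (capped). With p_up > 1/2 the
    # drift points away from 0, and the hitting time grows exponentially
    # in n, mirroring the negative-drift regime of Lemmas 10 and 11.
    state, steps = n, 0
    while state > 0 and steps < cap:
        state = min(state + 1, n) if rng.random() < p_up else state - 1
        steps += 1
    return steps

rng = random.Random(4)
mean8 = sum(hitting_time(8, 0.7, rng) for _ in range(20)) / 20
mean12 = sum(hitting_time(12, 0.7, rng) for _ in range(20)) / 20
print(mean8 < mean12)  # -> True
```

Increasing the interval length from 8 to 12 already inflates the average hitting time by more than an order of magnitude, whereas a drift toward 0 would give roughly linear times.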

3  Robustness to Prior Noise

In this section, by comparing the expected running time of the (1+1)-EA with or without sampling for solving the OneMax and the LeadingOnes problems under one-bit noise, we show the robustness of sampling to prior noise.

3.1  The OneMax Problem

One-bit noise with p_n = 1 is considered here. We first analyze the case in which sampling is not used. Note that Droste (2004) proved that the expected running time is super-polynomial for p_n = ω(log(n)/n). Gießen and Kötzing (2016) have recently re-proved this super-polynomial lower bound by using the simplified drift theorem (Oliveto and Witt, 2011; 2012). However, their proof does not cover p_n = 1. Here, we use the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) to prove an exponential lower bound on the running time for p_n = 1, as shown in Theorem 12.

Theorem 12:

For the (1+1)-EA solving the OneMax problem under one-bit noise with p_n = 1, the expected running time is exponential.

Proof:

We use Lemma 11 to prove this theorem. Let X_t = |x_t|_0 be the number of 0-bits of the solution x_t maintained by the (1+1)-EA after t iterations. We consider the interval [0, n/10]; that is, the parameters a = 0 (i.e., the global optimum) and b = n/10 in Lemma 11.

Then, we analyze the drift E[X_{t+1} − X_t | X_t = i] for 0 < i < b. Let p_j denote the probability that the next solution after bit-wise mutation and selection has j number of 0-bits (i.e., X_{t+1} = j given X_t = i). We thus have
E[X_{t+1} − X_t | X_t = i] = Σ_{j=0}^{n} (j − i) · p_j.   (1)

We then analyze the probabilities p_j for j ≠ i. Let q_j denote the probability that the offspring solution generated by bit-wise mutation has j number of 0-bits. Note that one-bit noise with p_n = 1 makes the noisy fitness and the true fitness of a solution always have a gap of one; that is, |f^n(x) − f(x)| = 1. For a solution x with |x|_0 = i (where 0 < i < n), f^n(x) = f(x) + 1 with a probability of i/n; otherwise, f^n(x) = f(x) − 1. Let x and x′ denote the current solution and the offspring solution, respectively.

(1) When |x′|_0 ≥ i + 3, f^n(x′) ≤ f(x′) + 1 ≤ n − i − 2 < n − i − 1 ≤ f^n(x). Thus, the offspring will be discarded in this case, which implies that p_j = 0 for j ≥ i + 3.

(2) When |x′|_0 = i + 2, the offspring solution will be accepted if and only if f^n(x′) = f(x′) + 1 and f^n(x) = f(x) − 1, the probability of which is ((i + 2)/n) · ((n − i)/n), since it needs to flip one 0-bit of x′ and flip one 1-bit of x in noise. Thus, p_{i+2} = q_{i+2} · (i + 2)(n − i)/n².

(3) When |x′|_0 = i + 1, x′ will be accepted if and only if f^n(x′) = f(x′) + 1 and f^n(x) = f(x) − 1, the probability of which is ((i + 1)/n) · ((n − i)/n), since it needs to flip one 0-bit of x′ and flip one 1-bit of x in noise. Thus, p_{i+1} = q_{i+1} · (i + 1)(n − i)/n².

(4) When |x′|_0 = i − 1, x′ will be rejected if and only if f^n(x′) = f(x′) − 1 and f^n(x) = f(x) + 1, the probability of which is ((n − i + 1)/n) · (i/n), since it needs to flip one 1-bit of x′ and flip one 0-bit of x in noise. Thus, p_{i−1} = q_{i−1} · (1 − (n − i + 1)i/n²).

(5) When |x′|_0 ≤ i − 2, f^n(x′) ≥ f(x′) − 1 ≥ n − i + 1 ≥ f^n(x). Thus, the offspring will always be accepted in this case, which implies that p_j = q_j for j ≤ i − 2.

We then bound the probabilities q_j. For j > i, q_j ≥ C(n − i, j − i) · (1/n)^{j−i} · (1 − 1/n)^{n−(j−i)}, since it is sufficient to flip j − i 1-bits and keep the other bits unchanged; for j < i, q_j ≤ C(i, i − j) · (1/n)^{i−j}, since it is necessary to flip at least i − j 0-bits. Thus, we can upper bound the contribution of the terms with j < i to Eq. (1) as follows:
formula
For the probability of decreasing the number of 0-bits, we also need a tighter upper bound (see Lemma 9 in Paixão et al. (2015)):
formula
By applying these probabilities to Eq. (1), we have
formula
To investigate condition 1 of Lemma 11, we also need to analyze the probability P(X_{t+1} ≠ X_t | X_t = i) for 0 < i < b. We have
formula
It is easy to verify that E[X_{t+1} − X_t | X_t = i] ≥ ε · P(X_{t+1} ≠ X_t | X_t = i) for some constant ε > 0, which implies that condition 1 of Lemma 11 holds.
For condition 2 of Lemma 11, we need to compare P(|X_{t+1} − X_t| ≥ j | X_t = i) with P(X_{t+1} ≠ X_t | X_t = i) for j ≥ 1. We rewrite P(X_{t+1} ≠ X_t | X_t = i) accordingly, and show that condition 2 holds with δ = 1 and a function r(l) = O(1). For j = 1, it trivially holds, because P(|X_{t+1} − X_t| ≥ 1 | X_t = i) = P(X_{t+1} ≠ X_t | X_t = i). For j ≥ 2, according to the analysis on p_j, we have
formula
where the first inequality is because, for decreasing the number of 0-bits by at least j in mutation, it is necessary to flip at least j 0-bits. Furthermore, we have
formula
where the last inequality holds for sufficiently large n. Thus,
formula
which implies that condition 2 of Lemma 11 holds.

Note that l = b − a = Θ(n). Thus, by Lemma 11, the probability that the running time is 2^{o(n)} when starting from a solution with at least b 0-bits is exponentially small. Due to the uniform initial distribution, the probability that the initial solution has fewer than b 0-bits is exponentially small by Chernoff's inequality. Thus, the expected running time is exponential.

Then, we analyze the case in which sampling with k = 2 is used. The expected running time is still exponential, as shown in Theorem 13. The proof is very similar to that of Theorem 12. The change of the probabilities caused by increasing k from 1 to 2 does not affect the application of the simplified drift theorem with self-loops (i.e., Lemma 11). The detailed proofs are shown in the supplementary material due to space limitations.

Theorem 13:

For the (1+1)-EA solving the OneMax problem under one-bit noise with p_n = 1, if using sampling with k = 2, the expected running time is exponential.

We have shown that sampling with k = 2 is not effective. In the following, we prove that increasing k from 2 to 3 can reduce the expected running time to polynomial, as shown in Theorem 14, the proof of which is accomplished by applying multiplicative drift analysis (Doerr, Johannsen, and Winzen, 2012).

Theorem 14:

For the (1+1)-EA solving the OneMax problem under one-bit noise with p_n = 1, if using sampling with k = 3, the expected running time is polynomial.

Proof:

We use Lemma 9 to prove this theorem. We first construct a distance function as V(x) = |x|_0, where |x|_0 is the number of 0-bits of the solution x. It is easy to verify that V(x) = 0 for x ∈ X* and V(x) > 0 for x ∉ X*.

Then, we investigate E[V(ξ_t) − V(ξ_{t+1}) | ξ_t = x] for any x with V(x) > 0 (i.e., x ≠ 1^n). We denote the number of 0-bits of the current solution x by i (where 1 ≤ i ≤ n). Let p_j be the probability that the next solution after bit-wise mutation and selection has j number of 0-bits (where 0 ≤ j ≤ n). Note that we are referring to the true number of 0-bits of a solution instead of the effective number of 0-bits after noisy evaluation. Thus,
E[V(ξ_t) − V(ξ_{t+1}) | ξ_t = x] = Σ_{j=0}^{n} (i − j) · p_j.   (2)
We then analyze p_j for j ≠ i as in the proof of Theorem 12, where q_j again denotes the probability that the offspring solution x′ generated by bit-wise mutation has j number of 0-bits. Note that for a solution x, the fitness value output by sampling with k = 3 is the average of the noisy fitness values output by three independent fitness evaluations; that is, f̂(x) = (f^n_1(x) + f^n_2(x) + f^n_3(x))/3.

(1) When |x′|_0 ≥ i + 3, f̂(x′) ≤ f(x′) + 1 ≤ n − i − 2 < n − i − 1 ≤ f̂(x). Thus, the offspring will be discarded, and we have p_j = 0 for j ≥ i + 3.

(2) When |x′|_0 = i + 2, x′ will be accepted if and only if f̂(x′) = f(x′) + 1 and f̂(x) = f(x) − 1, the probability of which is ((i + 2)/n)³ · ((n − i)/n)³, since it needs to always flip one 0-bit of x′ and flip one 1-bit of x in the three noisy fitness evaluations. Thus, p_{i+2} = q_{i+2} · ((i + 2)/n)³ · ((n − i)/n)³.

(3) When |x′|_0 = i + 1, there are three possible cases for the acceptance of x′: (f̂(x′) = f(x′) + 1, f̂(x) = f(x) − 1), (f̂(x′) = f(x′) + 1, f̂(x) = f(x) − 1/3), and (f̂(x′) = f(x′) + 1/3, f̂(x) = f(x) − 1). The probability of f̂(x′) = f(x′) + 1 is ((i + 1)/n)³, since it needs to always flip one 0-bit of x′ in the three noisy evaluations. The probability of f̂(x′) = f(x′) + 1/3 is 3((i + 1)/n)²((n − i − 1)/n), since it needs to flip one 0-bit of x′ in two noisy evaluations and flip one 1-bit in the other noisy evaluation. Similarly, we can derive that the probabilities of f̂(x) = f(x) − 1 and f̂(x) = f(x) − 1/3 are ((n − i)/n)³ and 3((n − i)/n)²(i/n), respectively. Thus, p_{i+1} = q_{i+1} · ( ((i + 1)/n)³ · (((n − i)/n)³ + 3((n − i)/n)²(i/n)) + 3((i + 1)/n)²((n − i − 1)/n) · ((n − i)/n)³ ).

(4) When |x′|_0 = i − 1, there are three possible cases for the rejection of x′: (f̂(x′) = f(x′) − 1, f̂(x) = f(x) + 1), (f̂(x′) = f(x′) − 1/3, f̂(x) = f(x) + 1), and (f̂(x′) = f(x′) − 1, f̂(x) = f(x) + 1/3). The probability of f̂(x′) = f(x′) − 1 is ((n − i + 1)/n)³, since it needs to always flip one 1-bit of x′ in the three noisy evaluations. The probability of f̂(x′) = f(x′) − 1/3 is 3((n − i + 1)/n)²((i − 1)/n), since it needs to flip one 1-bit of x′ in two noisy evaluations and flip one 0-bit in the other evaluation. Similarly, we can derive that the probabilities of f̂(x) = f(x) + 1 and f̂(x) = f(x) + 1/3 are (i/n)³ and 3(i/n)²((n − i)/n), respectively. Thus, p_{i−1} = q_{i−1} · ( 1 − (i/n)³ · (((n − i + 1)/n)³ + 3((n − i + 1)/n)²((i − 1)/n)) − 3(i/n)²((n − i)/n) · ((n − i + 1)/n)³ ).

(5) When |x′|_0 ≤ i − 2, f̂(x′) ≥ f(x′) − 1 ≥ n − i + 1 ≥ f̂(x). Thus, x′ will always be accepted, and we have p_j = q_j for j ≤ i − 2.

By applying these probabilities to Eq. (2), we have
formula   (3)
We simplify the above equation by using simple mathematical calculations.
formula
where the inequality is because .
By substituting this bound into the above equation, we get
formula
Thus, Eq. (3) becomes
formula   (4)
We then bound the three mutation probabilities q_{i−1}, q_{i+1}, and q_{i+2}. For decreasing the number of 0-bits by 1 in mutation, it is sufficient to flip one 0-bit and keep the other bits unchanged; thus we have q_{i−1} ≥ (i/n)(1 − 1/n)^{n−1} ≥ i/(en). For increasing the number of 0-bits by 2, it is necessary to flip at least two 1-bits; thus we have q_{i+2} ≤ C(n − i, 2)/n² ≤ (n − i)²/(2n²). For increasing the number of 0-bits by 1, it needs to flip one more 1-bit than the number of 0-bits it flips; thus, we have
formula
By applying these probability bounds to Eq. (4), we have
formula
When , 1 +  and , thus we get
formula
where the first inequality is by using , and the last inequality holds with .
When , using Eq. (3), we get
formula
where the last inequality holds with .
Thus, the condition of Lemma 9 holds. We then get, noting that V(ξ_0) ≤ n and V_min = 1,
formula
that is, the expected running time is upper bounded by a polynomial in n.

Thus, we have shown that sampling is robust to noise for the (1+1)-EA solving the OneMax problem in the presence of one-bit noise. By comparing Theorem 13 with Theorem 14, we also find that a gap of one on the value of k can lead to an exponential difference on the expected running time, which reveals that a careful selection of k is important for the effectiveness of sampling. The complexity transition from k = 2 to k = 3 occurs because sampling with k = 3 can make false progress (i.e., accepting solutions with more 0-bits) dominated by true progress (i.e., accepting solutions with fewer 0-bits), while sampling with k = 2 is not sufficient.

We have also conducted experiments to complement the theoretical results, which give bounds only. For each value of and , we run the (1+1)-EA 1000 times independently. In each run, we record the number of fitness evaluations until an optimal solution w.r.t. the true fitness function is found for the first time. Then the numbers of evaluations over the 1000 runs are averaged as an estimate of the expected running time, referred to as the estimated ERT. We will always compute the estimated ERT in this way for the experiments throughout this article.
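As an illustration of this protocol, the following minimal Python sketch (our own reconstruction, not the authors' code; the function names and the parameter choices for n, k, and the evaluation budget are assumptions made for illustration) simulates the (1+1)-EA with sampling on OneMax under one-bit noise and averages the evaluation counts over independent runs:

```python
import random

def noisy_onemax(x):
    # One-bit noise occurring with probability 1: flip a uniformly random
    # bit, then return the OneMax value of the disturbed solution.
    i = random.randrange(len(x))
    return sum(x) + (1 if x[i] == 0 else -1)

def sampled_fitness(x, k):
    # Sampling strategy: average k independent noisy evaluations.
    return sum(noisy_onemax(x) for _ in range(k)) / k

def run_ea(n, k, max_evals=200_000):
    # (1+1)-EA that re-evaluates parent and offspring in every iteration;
    # returns the evaluation count when the true optimum is reached,
    # or None if the budget is exhausted first.
    x = [random.randrange(2) for _ in range(n)]
    evals = 0
    while sum(x) < n and evals < max_evals:
        y = [b ^ (random.random() < 1.0 / n) for b in x]  # bit-wise mutation
        evals += 2 * k
        if sampled_fitness(y, k) >= sampled_fitness(x, k):
            x = y  # accept the offspring on ties, as in the analyzed EA
    return evals if sum(x) == n else None

# Estimated ERT: average the evaluation counts of the successful runs.
results = [run_ea(n=10, k=5) for _ in range(10)]
successes = [r for r in results if r is not None]
```

With a sufficiently large sample size the comparisons are mostly correct and the runs terminate quickly, mirroring the drop of the ERT curves discussed above.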

We estimate the expected running time of the (1+1)-EA using sampling with from 1 to 30. The results for are plotted in Figure 1. We can observe that the curves are high at and drop suddenly at , which is consistent with our theoretical results in Theorems 12–14. Note that the curves grow linearly since , which is because ERT = (the number of fitness evaluations in each iteration) × EFHT (the number of iterations), and when the noise has been sufficiently reduced by sampling, the number of iterations cannot reduce further as increases, but the sampling cost increases linearly with .

Figure 1:

Estimated ERT for the (11)-EA using sampling on the OneMax problem under one-bit noise with .


3.2  The LeadingOnes Problem

One-bit noise with is considered here. For the case in which sampling is not used, Gießen and Kötzing (2016) have proved an exponential lower bound on the running time, as shown in Theorem 15. We prove in Theorem 16 that sampling can reduce the expected running time to polynomial.

Theorem 15:

(Gießen and Kötzing, 2016) For the (1+1)-EA solving the LeadingOnes problem under one-bit noise with , the expected running time is .

Theorem 16:

For the (1+1)-EA solving the LeadingOnes problem under one-bit noise with , if using sampling with , the expected running time is .

Proof:

We use Lemma 8 to prove this theorem. Let denote the number of leading 1-bits of a solution . We first construct a distance function as . It is easy to verify that and .

Then, we analyze for any with . For the current solution , assume that (where ). Let be the offspring solution produced by mutating . We consider three mutation cases for :

(1) The -th leading 1-bit is flipped and the first leading 1-bits remain unchanged, which leads to . Thus, .

(2) The -th bit (which must be 0) is flipped and the first leading 1-bits remain unchanged, which leads to . Thus, we have .

(3) The first bits remain unchanged, which leads to . Thus, .
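Writing $i=\mathrm{LO}(x)$ and assuming the standard bit-wise mutation probability $1/n$ (notation supplied by us for illustration, since the inline symbols are implicit above), the three cases occur with probabilities

```latex
P(\text{case 1}) = \frac{1}{n}\left(1-\frac{1}{n}\right)^{i-1},\qquad
P(\text{case 2}) = \frac{1}{n}\left(1-\frac{1}{n}\right)^{i},\qquad
P(\text{case 3}) = \left(1-\frac{1}{n}\right)^{i+1}.
```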

Assume that . We then analyze the acceptance probability of , i.e., . Note that , where is the fitness output by one independent noisy evaluation. By one-bit noise with , the value can be calculated as follows:

(1) The noise does not occur, the probability of which is . Thus, .

(2) The noise occurs, the probability of which is .

(2.1) It flips the -th leading 1-bit, then . Thus, we have .

(2.2) It flips the -th bit, which leads to . Thus, we have . Note that reaches the minimum when has a 0-bit at position , and reaches the maximum when has all 1-bits from position onwards.

(2.3) Otherwise, remains unchanged. Thus, we have .

For each , let be the solution that has all 1-bits except for the -th bit (i.e., ), and let be the solution with leading 1-bits and otherwise only 0-bits (i.e., ). Then we have the stochastic ordering , which implies that . We can similarly get . Thus, it is easy to see that
formula
5
Let be the probability that is generated by mutating . By combining the mutation probability with the acceptance probability, we have
formula
6
We then bound the probabilities and . First, we have
formula
where the random variable is used to represent for convenience. We then calculate the expectation and variance of and . Based on the analysis of , we can easily derive
formula
Note that the last inequalities for and hold with . Thus, we have
formula
Then, we can get the bounds on the probabilities and by Chebyshev’s inequality. Note that is integer-valued.
formula
Similarly, we have
formula
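The use of Chebyshev's inequality here can be checked numerically. The sketch below (our own illustration; a generic ±1 perturbation with variance 1 stands in for the one-bit noise, and all names are ours) compares the empirical deviation probability of a sample average against the bound $\mathrm{Var}(Y)/(k t^2)$:

```python
import random

def deviation_probability(k, t, trials=20_000):
    # Empirical P(|Ybar| >= t) for the average Ybar of k independent
    # +-1 perturbations (mean 0, variance 1).
    hits = 0
    for _ in range(trials):
        ybar = sum(random.choice((-1, 1)) for _ in range(k)) / k
        if abs(ybar) >= t:
            hits += 1
    return hits / trials

k, t = 10, 0.5
empirical = deviation_probability(k, t)
chebyshev_bound = 1.0 / (k * t * t)  # Var(Ybar)/t^2 with Var(Y) = 1
```

The empirical probability comes out well below the Chebyshev bound, as expected: Chebyshev is distribution-free and therefore loose, but loose bounds suffice for the polynomial upper bound in the proof.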
By applying these two probability bounds to Eq. (6), we have
formula
Thus, the condition of Lemma 8 holds with . We can get, noting that ,
formula
that is, the expected number of iterations of the (1+1)-EA for finding the optimal solution is upper bounded by . Because the expected running time is (the number of fitness evaluations in each iteration) × (the expected number of iterations) and , we conclude that the expected running time is .

4  Robustness to Posterior Noise

In the above section, we have shown that sampling can be robust to one-bit noise (a kind of prior noise) for the (1+1)-EA solving the OneMax and the LeadingOnes problems. In this section, by comparing the expected running time of the (1+1)-EA with or without sampling for solving OneMax and LeadingOnes under additive Gaussian noise, we will prove that sampling can also be robust to posterior noise.

4.1  The OneMax Problem

Additive Gaussian noise with is considered here. We first analyze the case in which sampling is not used. By applying the original simplified drift theorem (Oliveto and Witt, 2011; 2012), we prove that the expected running time is exponential, as shown in Theorem 17.

Theorem 17:

For the (1+1)-EA solving the OneMax problem under additive Gaussian noise with , the expected running time is exponential.

Proof:
We use Lemma 10 to prove this theorem. Let be the number of 0-bits of the solution after iterations of the (1+1)-EA. We consider the interval , i.e., the parameters and in Lemma 10. Then, we analyze the drift for . Let denote the probability that the next solution after bit-wise mutation and selection has number of 0-bits (i.e., ), and let denote the probability that the offspring solution generated by bit-wise mutation has number of 0-bits (i.e., ). Then, we have, for ,
formula
where and . We thus have
formula
7
Let . Then, , where the first inequality is by , and the last one is obtained by calculating the CDF of the standard normal distribution. Furthermore, , and . Applying these probability bounds to Eq. (7), we have
formula
Thus, , which implies that condition 1 of Lemma 10 holds. For condition 2, we need to investigate . Because it is necessary to flip at least bits, we have
formula
which implies that condition 2 of Lemma 10 holds with and . Note that . Thus, by Lemma 10, the expected running time is exponential.
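For the reader's convenience, the simplified negative drift theorem applied here as Lemma 10 is, in essence, the following statement (a paraphrase of Oliveto and Witt, 2011; 2012, given here under our own notation):

```latex
\text{Let } (X_t)_{t \ge 0} \text{ be real-valued random variables over a finite state space.}\\
\text{Suppose there exist an interval } [a,b], \text{ constants } \delta, \varepsilon > 0,
\text{ and a function } r(\ell) \text{ with } 1 \le r(\ell) = o(\ell/\log \ell),\\
\text{where } \ell = b - a, \text{ such that for all } t \ge 0:
\begin{align*}
&(1)\quad E\big(X_{t+1} - X_t \mid X_0,\dots,X_t;\ a < X_t < b\big) \;\ge\; \varepsilon,\\
&(2)\quad P\big(|X_{t+1} - X_t| \ge j \mid X_0,\dots,X_t;\ X_t > a\big) \;\le\; \frac{r(\ell)}{(1+\delta)^{j}}
\quad \text{for } j \in \mathbb{N}_0.
\end{align*}
\text{Then, for the first hitting time } T = \min\{t \ge 0 : X_t \le a \mid X_0 \ge b\},
\text{ there is a constant } c > 0\\ \text{such that }
P\big(T \le 2^{c\ell/r(\ell)}\big) = 2^{-\Omega(\ell/r(\ell))}.
```

Condition 1 establishes a constant drift away from the target region, and condition 2 rules out large jumps; together they force an exponential hitting time, which is exactly how the proof above proceeds.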

Note that Friedrich et al. (2015) have proved that for solving OneMax under additive Gaussian noise with , the classical (μ+1)-EA needs super-polynomial expected running time. Our result in Theorem 17 is complementary to their result with , since it covers a constant variance. We then prove in Corollary 19 that using sampling can reduce the expected running time to polynomial. The proof idea is that sampling with a large enough can reduce the noise to , which allows a polynomial running time, as shown in the following lemma. In the following analysis, let indicate any polynomial of .

Lemma 18:

(Gießen and Kötzing, 2016) Suppose posterior noise sampled from some distribution with variance . Then the (1+1)-EA optimizes OneMax in polynomial time if .

Corollary 19:

For the (1+1)-EA solving the OneMax problem under additive Gaussian noise with and , if using sampling with , the expected running time is polynomial.

Proof:

The noisy fitness is , where . The fitness output by sampling is , where . Thus, , where . That is, sampling reduces the variance of the noise to . Because , we have . By Lemma 18, the expected number of iterations of the (1+1)-EA for finding the optimal solution is polynomial. The expected running time is the number of fitness evaluations in each iteration times the expected number of iterations. Since , the expected running time is polynomial.
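The variance-reduction step in this proof is easy to verify empirically. The following sketch (the concrete values of σ and k are arbitrary choices of ours for illustration) averages k independent N(0, σ²) noise terms and checks that the resulting effective noise has variance close to σ²/k:

```python
import random
import statistics

def averaged_noise_variance(k, sigma, trials=5_000):
    # Each sampled fitness adds the mean of k i.i.d. N(0, sigma^2) noise
    # terms, so the effective noise should have variance sigma^2 / k.
    means = [sum(random.gauss(0.0, sigma) for _ in range(k)) / k
             for _ in range(trials)]
    return statistics.variance(means)

sigma, k = 2.0, 16
v = averaged_noise_variance(k, sigma)  # should be close to sigma^2/k = 0.25
```

This is precisely why choosing k as a suitable polynomial in n can push the noise variance below the threshold required by Lemma 18.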

Thus, the comparison between Theorem 17 and Corollary 19 corrects our previous statement in Qian et al. (2014) that sampling is ineffective for the (1+1)-EA solving OneMax under additive Gaussian noise. We have conducted experiments to complement the theoretical results, which give bounds only. For the additive Gaussian noise, we set and . The results for are plotted in Figure 2. Note that the point with in the figure corresponds to the ERT without sampling.

Figure 2:

Estimated ERT for the (11)-EA using sampling on the OneMax problem under additive Gaussian noise with and .


From Figures 2(b) and 2(c), we can observe that the ERT drops quickly at the beginning of the curve, reaches its minimum at a small sample size, and grows consistently after that. The minimum is much smaller than the value at ; thus it is clear that moderate sampling can reduce the running time compared with no sampling, which is consistent with our theoretical result. However, in Figure 2(a), the ERT always increases with , which is similar to what was observed in Figure 1 in Qian et al. (2014). The setting in Qian et al. (2014) is , , and . A too small (e.g., ) makes the decrease in the number of iterations easily dominated by the increase of ; therefore, we did not observe the dropping stage of the curve.

4.2  The LeadingOnes Problem

Additive Gaussian noise with is considered here. We first analyze the case in which sampling is not used. Using the original simplified drift theorem (Oliveto and Witt, 2011; 2012), we prove that the expected running time is exponential, as shown in Theorem 20.

Theorem 20:

For the (1+1)-EA solving the LeadingOnes problem under additive Gaussian noise with , the expected running time is exponential.

Proof:
We use Lemma 10 to prove this theorem. Let be the number of 0-bits of the solution after iterations of the (1+1)-EA. As in the proof of Theorem 17, we have
formula
8
For , we use . For , we consider possible cases such that one 1-bit is flipped and the other bits remain unchanged, the probability of which is . Let and denote the current solution and the offspring solution, respectively. Let denote the number of leading 1-bits of ; that is, . The noisy fitness of is , where . Then, the acceptance probability of in these cases can be calculated as follows:

(1) If the flipped 1-bit is the -th leading 1-bit, , where and . Thus, the acceptance probability is , where .

(2) Otherwise, . Thus, the acceptance probability is .

Applying these probability bounds to Eq. (8), we have
formula
where the second inequality holds because the term in reaches the minimum when .
Let . Using for (see Eq. (D.17) in Mohri et al. (2012)), we have
formula
where the last inequality is by . Thus, we have
formula
where the last inequality is by . Thus, , which implies that condition 1 of Lemma 10 holds. As in the proof of Theorem 17, it is easy to verify that condition 2 of Lemma 10 holds with and . Thus, we can conclude that the expected running time is exponential.

We then prove in Corollary 22 that using sampling can reduce the expected running time to polynomial. The idea is that sampling with a large enough can reduce the noise to , which allows a polynomial running time, as shown in the following lemma.

Lemma 21:

(Gießen and Kötzing, 2016) Suppose posterior noise sampled from some distribution with variance . Then the (1+1)-EA optimizes LeadingOnes in time if .

Corollary 22:

For the (1+1)-EA solving the LeadingOnes problem under additive Gaussian noise with and , if using sampling with , the expected running time is polynomial.

Proof:

As in the proof of Corollary 19, sampling reduces the variance of the noise to . Because