## Abstract

In real-world optimization tasks, the objective (i.e., fitness) function evaluation is often disturbed by noise due to a wide range of uncertainties. Evolutionary algorithms are often employed in noisy optimization, where reducing the negative effect of noise is a crucial issue. Sampling is a popular strategy for dealing with noise: to estimate the fitness of a solution, it evaluates the fitness multiple ($m$) times independently and then uses the sample average to approximate the true fitness. Obviously, sampling can make the fitness estimation closer to the true value, but it also increases the estimation cost. Previous studies mainly focused on empirical analysis and design of efficient sampling strategies, while the impact of sampling is unclear from a theoretical viewpoint. In this article, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that, under a high noise level, the running time can be reduced from exponential to polynomial by sampling. The analysis also shows that a gap of one in the value of $m$ for sampling can lead to an exponential difference in the expected running time, cautioning for a careful selection of $m$. We further prove by using two illustrative examples that sampling can be more effective for noise handling than parent populations and threshold selection, two strategies that have been shown to be robust to noise. Finally, we also show that sampling can be ineffective when noise does not bring a negative impact.

## 1 Introduction

In many real-world optimization tasks, the exact objective (i.e., fitness) evaluation of candidate solutions is almost impossible, while we can obtain only a noisy one. Evolutionary algorithms (EAs) (Bäck, 1996) are general-purpose optimization algorithms inspired by natural phenomena, and have been widely and successfully applied to solve noisy optimization problems (Jin and Branke, 2005; Bianchi et al., 2009; Zeng et al., 2015). During evolutionary optimization, handling noise in fitness evaluation is very important, since noise may mislead the search direction and then deteriorate the efficiency of EAs. Many studies thus have focused on reducing the negative effect of noise in evolutionary optimization (Arnold, 2002; Beyer, 2000; Jin and Branke, 2005).

One popular way to cope with noise in fitness evaluation is sampling (Arnold and Beyer, 2006), which, instead of evaluating the fitness of a solution only once, evaluates the fitness $m$ times and then uses the average to approximate the true fitness. Sampling obviously can reduce the standard deviation of the noise by a factor of $\sqrt{m}$, while also increasing the computation cost by a factor of $m$. This makes the fitness estimation closer to the true value, but computationally more expensive. In order to reduce the sampling cost as much as possible, many smart sampling approaches have been proposed, including adaptive (Aizawa and Wah, 1994; Stagge, 1998) and sequential (Branke and Schmidt, 2003; 2004) methods, which dynamically decide the value of $m$ for each solution in each generation.

The impact of sampling on the convergence of EAs in noisy optimization has been empirically and theoretically investigated (Gutjahr, 2003; Arnold and Beyer, 2006; Heidrich-Meisner and Igel, 2009; Rolet and Teytaud, 2010). Regarding the running time, a more practical performance measure of how soon an algorithm can solve a problem, previous experimental studies have reported conflicting conclusions. In Aizawa and Wah (1994), it was shown that sampling can speed up a standard genetic algorithm on two test functions, while in Cantú-Paz (2004), sampling led to a larger computation time for a simple generational genetic algorithm on the OneMax function. However, little work has been done on theoretically analyzing the impact of sampling on the running time. Thus, many fundamental theoretical issues on sampling have not been addressed, for example, whether sampling can reduce the running time of EAs from exponential to polynomial in noisy environments, and whether sampling can increase the running time in some cases.

The running time is usually measured by the number of fitness evaluations needed to find an optimal solution for the first time, because the fitness evaluation is deemed the most costly computational process (Droste et al., 2002; Yu and Zhou, 2008; Qian et al., 2015b). Rigorous running time analysis has been a leading theoretical aspect of randomized search heuristics (Neumann and Witt, 2010; Auger and Doerr, 2011). Recently, progress has been made on the running time analysis of EAs. Numerous analytical results for EAs solving synthetic problems as well as combinatorial problems have been reported, for example, Neumann and Witt (2010) and Auger and Doerr (2011). Meanwhile, general running time analysis approaches have also been proposed, for example, drift analysis (He and Yao, 2001; Doerr, Johannsen, and Winzen, 2012; Doerr and Goldberg, 2013), fitness-level methods (Wegener, 2002; He and Yao, 2003; Sudholt, 2013; Dang and Lehre, 2015b), and switch analysis (Yu et al., 2015; Yu and Qian, 2015). However, most of them focus on noise-free environments, where the fitness evaluation is exact.

For EAs in noisy environments, few results have been reported on running time analysis. Droste (2004) first analyzed the (1+1)-EA on the OneMax problem in the presence of one-bit noise and showed the maximal noise level $p_n = O(\log n / n)$ allowing a polynomial running time, where the noise level is characterized by the noise probability $p_n$ and $n$ is the problem size. This result was later extended to the LeadingOnes problem and to many different noise models in Gießen and Kötzing (2016), which also proved that small populations of logarithmic size can make elitist EAs, that is, the ($\mu$+1)-EA and the (1+$\lambda$)-EA, perform well under high noise levels. The robustness of populations to noise was also proved in the setting of non-elitist EAs with mutation only (Dang and Lehre, 2015a) or uniform crossover only (Prugel-Bennett et al., 2015). However, Friedrich et al. (2015) showed the limitation of parent populations to cope with noise by proving that the ($\mu$+1)-EA needs super-polynomial time for solving OneMax in the presence of additive Gaussian noise with variance $\sigma^2 \ge n^3$. This difficulty can be overcome by the compact genetic algorithm (cGA) (Friedrich et al., 2015) and a simple Ant Colony Optimization (ACO) algorithm (Friedrich et al., 2016), both of which find the optimal solution in polynomial time with a high probability. Recently, Qian et al. (in press) proved that the threshold selection strategy is also robust to noise: the expected running time of the (1+1)-EA using threshold selection on OneMax in the presence of one-bit noise is always polynomial regardless of the noise level. They also showed the limitation of threshold selection under asymmetric one-bit noise and further proposed smooth threshold selection, which can overcome the difficulty. Note that there was also a sequence of papers analyzing the running time of ACO on single destination shortest paths (SDSP) problems with edge weights disturbed by noise (Sudholt and Thyssen, 2012; Doerr, Hota, and Kötzing, 2012; Feldmann and Kötzing, 2013).

In addition to the above results, there exist two other pieces of work on running time analysis in noisy evolutionary optimization that involve sampling. Akimoto et al. (2015) proved that sampling with a large enough $m$ can make optimization under additive unbiased noise behave as optimization in a noise-free environment, and thus concluded that noisy optimization using sampling can be solved in nearly the same running time as the noise-free case (up to a logarithmic factor of the noise-free running time). A similar result was also achieved for an adaptive Pareto sampling (APS) algorithm solving bi-objective optimization problems under additive Gaussian noise (Gutjahr, 2012). These results, however, do not describe any impact of sampling on the running time, because they do not compare with the running time of noisy optimization without sampling.

In this article, we show that sampling can speed up noisy evolutionary optimization exponentially via rigorous running time analysis. For the (1+1)-EA solving the OneMax and the LeadingOnes problems under prior (e.g., one-bit) or posterior (e.g., additive Gaussian) noise, we prove that the running time is exponential when the noise level is high (i.e., Theorems 12, 15, 17, and 20), while sampling can reduce the running time to polynomial (i.e., Theorems 14 and 16, Corollaries 1 and 2). Particularly, for the (1+1)-EA solving OneMax under one-bit noise with $p_n = 1$, the analysis also shows that a gap of one in the value of $m$ for sampling can lead to an exponential difference in the expected running time (i.e., Theorems 13 and 14), which reveals that a careful selection of $m$ is important for the effectiveness of sampling.

As previous studies (Qian et al., in press; Gießen and Kötzing, 2016) have shown that parent populations and threshold selection can bring about robustness to noise, we also compare sampling with these two strategies. On the OneMax problem under additive Gaussian noise with $\sigma^2 \ge n^3$, the ($\mu$+1)-EA needs super-polynomial time (Friedrich et al., 2015) (i.e., Theorem 23), while the (1+1)-EA using sampling can solve the problem in polynomial time (i.e., Corollary 19). On the OneMax problem under asymmetric one-bit noise with $p_n = 1$, the (1+1)-EA using threshold selection needs at least exponential time (Qian et al., in press) (i.e., Theorem 24), while the (1+1)-EA using sampling can solve it in polynomial time (i.e., Theorem 25). Therefore, these results show that sampling can be more tolerant of noise than parent populations and threshold selection, respectively.

Finally, for the (1+1)-EA solving the Trap problem under additive Gaussian noise, we prove that noise does not bring a negative impact. Under the assumption that the positive impact of noise increases with the noise level, we conjecture that sampling is ineffective in this case since it decreases the noise level. The conjecture is verified by experiments. Note that the conjecture is consistent with that in Qian et al. (in press). In that work, it is hypothesized that the impact of noise is correlated with the problem hardness: when the problem is EA-hard (He and Yao, 2004) with respect to a specific EA (e.g., the Trap problem for the (1+1)-EA), noise can be helpful and does not need to be handled, but when the problem is EA-easy (He and Yao, 2004), noise can be harmful and needs to be tackled.

This article extends our preliminary work (Qian et al., 2014) and improves one previous statement. In Qian et al. (2014), we proved a sufficient condition under which sampling is ineffective, and applied it to the cases where the (1+1)-EA solves OneMax and Trap under additive Gaussian noise. The proof assumed the monotonicity of a quantity. After finding that an upper/lower bound of the quantity is monotonic, we hypothesized that the quantity itself is also monotonic. Considering that this property does not always hold, we have corrected our previous statement on the OneMax problem by proving that sampling with a moderate sample size can exponentially reduce the running time of the (1+1)-EA compared with using no sampling (i.e., Theorem 17, Corollary 19). Meanwhile, both analysis and experiments (i.e., Section 6) show that sampling is ineffective on the Trap problem.

The rest of this article is organized as follows. Section 2 introduces some preliminaries. The robustness analysis of sampling to prior and posterior noise is presented in Sections 3 and 4, respectively. Section 5 compares sampling with the other two strategies, parent populations and threshold selection, on the robustness to noise. Section 6 gives a case where sampling is ineffective. Section 7 concludes the article.

## 2 Preliminaries

In this section, we first introduce the noise models, problems, and evolutionary algorithms studied in this article, then describe the sampling strategy, and finally present the analysis tools that we use throughout this article.

### 2.1 Noise Models

Noise models can be generally divided into two categories: prior and posterior (Jin and Branke, 2005; Gießen and Kötzing, 2016). For prior noise, the noise comes from the variation on a solution instead of the evaluation process. One-bit noise as presented in Definition 1 is a representative one, which flips a random bit of a solution before evaluation with probability $p_n$. For posterior noise, the noise comes from the variation on the fitness of a solution. A representative model is additive Gaussian noise as presented in Definition 2, which adds a value drawn from a Gaussian distribution $\mathcal{N}(\theta, \sigma^2)$. Both one-bit noise and additive Gaussian noise have been widely used in previous empirical and theoretical studies (e.g., Beyer, 2000; Droste, 2004; Jin and Branke, 2005; Gießen and Kötzing, 2016). In this article, we will also use these two kinds of noise models.

In addition to the above noises, we also consider a variant of one-bit noise called asymmetric one-bit noise (Qian et al., in press), in Definition 3. For the flipping of asymmetric one-bit noise on a solution $x$: if $x$ has no 0-bits, a random 1-bit is flipped; if $x$ has no 1-bits, a random 0-bit is flipped; otherwise, the probability of flipping a specific 0-bit is $\frac{1}{2|x|_0}$, and the probability of flipping a specific 1-bit is $\frac{1}{2(n-|x|_0)}$, where $|x|_0$ is the number of 0-bits of $x$. Note that for one-bit noise, the probability of flipping any specific bit is $\frac{1}{n}$.
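To make the noise models concrete, the following Python sketch implements all three of them under the assumptions above (the function names are ours, not from the article; solutions are lists of 0/1 bits):

```python
import random

def one_bit_noise(x, p):
    """One-bit noise: with probability p, flip one uniformly chosen bit
    of x before evaluation."""
    x = list(x)
    if random.random() < p:
        i = random.randrange(len(x))
        x[i] = 1 - x[i]
    return x

def asymmetric_one_bit_noise(x, p):
    """Asymmetric one-bit noise: with probability p, flip a bit whose
    position is chosen among the 0-bits with probability 1/2 and among
    the 1-bits with probability 1/2 (a random bit of the only available
    kind if x is all-0s or all-1s)."""
    x = list(x)
    if random.random() < p:
        zeros = [i for i, b in enumerate(x) if b == 0]
        ones = [i for i, b in enumerate(x) if b == 1]
        if not zeros:
            i = random.choice(ones)
        elif not ones:
            i = random.choice(zeros)
        else:
            i = random.choice(zeros) if random.random() < 0.5 else random.choice(ones)
        x[i] = 1 - x[i]
    return x

def gaussian_noise(fitness_value, theta, sigma2):
    """Additive Gaussian noise: returns f(x) + N(theta, sigma2)."""
    return fitness_value + random.gauss(theta, sigma2 ** 0.5)
```

Note how the prior models perturb the solution itself, while the Gaussian model perturbs only the already-computed fitness value.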

### 2.2 Optimization Problems

As most theoretical analyses of EAs start from simple synthetic problems, we also use two well-known test functions OneMax and LeadingOnes, which have been widely studied in both noise-free (e.g., He and Yao, 2001; Droste et al., 2002; Sudholt, 2013) and noisy (e.g., Droste, 2004; Dang and Lehre, 2015a; Gießen and Kötzing, 2016) evolutionary optimization.

The OneMax problem as presented in Definition 4 aims to maximize the number of 1-bits of a solution. Its optimal solution is $11\ldots1$ (briefly denoted as $1^n$) with the function value $n$. It has been shown that the expected running time of the (1+1)-EA on OneMax is $\Theta(n \log n)$ (Droste et al., 2002).

The LeadingOnes problem as presented in Definition 5 aims to maximize the number of consecutive 1-bits counting from the left of a solution. Its optimal solution is $1^n$ with the function value $n$. It has been proved that the expected running time of the (1+1)-EA on LeadingOnes is $\Theta(n^2)$ (Droste et al., 2002).

We will also use an EA-hard problem, Trap, in Definition 6, the aim of which is to maximize the number of 0-bits of a solution, except that the optimal solution is $1^n$. Its optimal function value is positive, while the function value of any non-optimal solution is at most 0. The expected running time of the (1+1)-EA on Trap has been proven to be $\Theta(n^n)$ (Droste et al., 2002).
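The three test functions can be sketched in Python as follows (our own formulation; the article's Definitions 4–6 may differ in constants, e.g., in the exact positive value Trap assigns to the optimum):

```python
def one_max(x):
    """OneMax: the number of 1-bits of x."""
    return sum(x)

def leading_ones(x):
    """LeadingOnes: the number of consecutive 1-bits counting from the left."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def trap(x):
    """A Trap-style deceptive function: the all-1s string is the unique
    optimum with a positive value; every other solution is rewarded for
    0-bits, shifted so that its value is at most 0 (here |x|_0 - n)."""
    n = len(x)
    if sum(x) == n:
        return 1               # assumed positive optimal value
    return (n - sum(x)) - n    # = -sum(x) <= 0; larger for more 0-bits
```

On Trap, hill-climbing on non-optimal solutions drives the search toward $0^n$, away from the optimum $1^n$, which is what makes the problem EA-hard for the (1+1)-EA.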

### 2.3 Evolutionary Algorithms

In this article, we consider the (1+1)-EA as described in Algorithm 1, which is a simple EA for maximizing pseudo-Boolean problems over $\{0,1\}^n$. The (1+1)-EA reflects the common structure of EAs. It maintains only one solution (i.e., the population size is 1), and repeatedly improves the current solution by using bit-wise mutation (i.e., step 3) and selection (i.e., steps 4 and 5). The (1+1)-EA has been widely used in the running time analysis of EAs (see Neumann and Witt, 2010; Auger and Doerr, 2011).

**Algorithm 1** ((1+1)-EA). *Given a function $f$ over $\{0,1\}^n$ to be maximized, it consists of the following steps:*

1. *$x :=$ a solution uniformly randomly selected from $\{0,1\}^n$.*
2. *Repeat until the termination condition is met:*
3. *$\quad x' :=$ flip each bit of $x$ independently with probability $\frac{1}{n}$.*
4. *$\quad$ if $f(x') \ge f(x)$*
5. *$\quad\quad x := x'$.*

For the (1+1)-EA in noisy environments, only a noisy fitness value $f^n$ is available, and thus step 4 of Algorithm 1 changes to “if $f^n(x') \ge f^n(x)$”. Note that we assume that the reevaluation strategy is used (as in Droste, 2004; Doerr, Hota, and Kötzing, 2012; Gießen and Kötzing, 2016), that is, when accessing the fitness of a solution, it is always recalculated by sampling a new random noise variate, or drawing a new random single-bit mask. For example, for the (1+1)-EA, both the parent solution $x$ and the offspring solution $x'$ will be reevaluated in each iteration. The running time in noisy optimization is usually defined as the number of fitness evaluations needed to find an optimal solution w.r.t. the true fitness function for the first time (Droste, 2004; Akimoto et al., 2015; Gießen and Kötzing, 2016).
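The noisy (1+1)-EA with the reevaluation strategy can be sketched as follows (a minimal Python sketch under our own naming; `noisy_eval` and `is_optimal` are assumed callbacks, not names from the article):

```python
import random

def noisy_one_plus_one_ea(n, noisy_eval, is_optimal, max_evals=100000):
    """(1+1)-EA under noise with the reevaluation strategy: in every
    iteration, both parent and offspring are evaluated with fresh noise.
    Returns the number of fitness evaluations until the true optimum is
    found, or None if the budget is exhausted."""
    x = [random.randint(0, 1) for _ in range(n)]
    evals = 0
    while evals < max_evals:
        if is_optimal(x):  # optimality w.r.t. the true fitness function
            return evals
        # bit-wise mutation: flip each bit independently with probability 1/n
        xp = [1 - b if random.random() < 1.0 / n else b for b in x]
        # fresh noisy evaluations of both solutions (reevaluation strategy)
        fx, fxp = noisy_eval(x), noisy_eval(xp)
        evals += 2
        if fxp >= fx:
            x = xp
    return None
```

With a noise-free `noisy_eval` this reduces to Algorithm 1; plugging in a noisy evaluator gives the algorithm analyzed in the following sections.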

In noisy optimization, a worse solution may appear to have a “better” fitness and then survive to replace the truly better solution, which has a “worse” fitness. This may mislead the search direction of EAs, and then deteriorate the efficiency of EAs. To deal with this problem, a selection strategy for EAs handling noise was proposed (Markon et al., 2001; Bartz-Beielstein, 2005).

**Threshold selection:** an offspring solution will be accepted only if its fitness is larger than that of the parent solution by at least a predefined threshold $\tau$.

For example, when using threshold selection, the 4th step of the (1+1)-EA in Algorithm 1 changes to “if $f(x') \ge f(x) + \tau$” rather than “if $f(x') \ge f(x)$”. Such a strategy can reduce the risk of accepting a bad solution due to noise. In Qian et al. (in press), it has been proved that threshold selection with $\tau = 1$ can make the (1+1)-EA solve the OneMax problem in polynomial time even if one-bit noise occurs with probability 1.
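The modified acceptance test of step 4 can be sketched as a one-line predicate (names are ours):

```python
def accept_with_threshold(parent_noisy_fitness, offspring_noisy_fitness, tau=1):
    """Threshold selection: the offspring survives only if its (noisy)
    fitness exceeds the parent's by at least tau, reducing the chance
    that a truly worse offspring is accepted due to noise."""
    return offspring_noisy_fitness >= parent_noisy_fitness + tau
```

With `tau=0` this is exactly the plain noisy acceptance rule, so the threshold trades away some true improvements in exchange for rejecting more noise-induced false improvements.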

### 2.4 Sampling

In noisy evolutionary optimization, sampling as described in Definition 7 has often been used to reduce the negative effect of noise (Aizawa and Wah, 1994; Stagge, 1998; Branke and Schmidt, 2003; 2004). It approximates the true fitness of a solution by the average of $m$ independent noisy evaluations, and thus estimates the true fitness more accurately. For example, under additive Gaussian noise, the fitness output by sampling can be represented by $f(x) + \delta$ with $\delta \sim \mathcal{N}(\theta, \sigma^2/m)$; that is, sampling reduces the variance of the noise by a factor of $m$. However, the computation time for the fitness estimation of a solution is also increased by a factor of $m$.

For the (1+1)-EA using sampling, the 4th step of Algorithm 1 changes to “if $\hat{f}(x') \ge \hat{f}(x)$”, where $\hat{f}$ denotes the fitness output by sampling. Note that $m = 1$ is equivalent to not using sampling.
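The sampling estimator of Definition 7 can be sketched as follows (names are ours; `noisy_eval` is an assumed callback that returns one noisy fitness value):

```python
def sampled_fitness(x, noisy_eval, m):
    """Estimate the true fitness of x by averaging m independent noisy
    evaluations; m = 1 is equivalent to not using sampling."""
    return sum(noisy_eval(x) for _ in range(m)) / m
```

In step 4 of Algorithm 1, the (1+1)-EA then compares `sampled_fitness(xp, noisy_eval, m)` with `sampled_fitness(x, noisy_eval, m)`, at a cost of $2m$ evaluations per iteration under the reevaluation strategy.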

### 2.5 Analysis Tools

To derive running time bounds in this article, we first model EAs as Markov chains, and then use a variety of drift theorems.

The evolution process usually goes forward only based on the current population; thus, an EA can be modeled as a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ (e.g., in He and Yao (2001) and Yu and Zhou (2008)) by taking the EA’s population space $\mathcal{X}$ as the chain’s state space; that is, $\xi_t \in \mathcal{X}$. Note that the population space consists of all possible populations. Let $\mathcal{X}^* \subseteq \mathcal{X}$ denote the set of all optimal populations, which contain at least one optimal solution. The goal of the EA is to reach $\mathcal{X}^*$ from an initial population. Thus, the process of an EA seeking $\mathcal{X}^*$ can be analyzed by studying the corresponding Markov chain with the optimal state space $\mathcal{X}^*$. Note that we consider a discrete state space (i.e., $\mathcal{X}$ is discrete) in this article.

Given a Markov chain $\{\xi_t\}_{t=0}^{+\infty}$ with $\xi_0 = x$, we define its *first hitting time* (FHT) as a random variable $\tau$ such that $\tau = \min\{t \ge 0 \mid \xi_t \in \mathcal{X}^*\}$. That is, $\tau$ is the number of steps needed to reach the optimal space $\mathcal{X}^*$ for the first time starting from $\xi_0 = x$. The mathematical expectation of $\tau$, $\mathbb{E}[\tau \mid \xi_0 = x]$, is called the *expected first hitting time* (EFHT) of this chain starting from $\xi_0 = x$. If $\xi_0$ is drawn from a distribution $\pi_0$, $\mathbb{E}[\tau \mid \xi_0 \sim \pi_0]$ is called the EFHT of the Markov chain over the initial distribution $\pi_0$. Thus, the expected running time of the corresponding EA starting from $\xi_0 \sim \pi_0$ is equal to $N_1 + N_2 \cdot \mathbb{E}[\tau \mid \xi_0 \sim \pi_0]$, where $N_1$ and $N_2$ are the number of fitness evaluations for the initial population and in each iteration, respectively. For example, for the (1+1)-EA using sampling, $N_1 = m$ and $N_2 = 2m$ due to the reevaluation strategy. Note that when referring to the expected running time of an EA on a problem in this article, it is the expected running time starting from a uniform initial distribution $\pi_u$; that is, $N_1 + N_2 \cdot \mathbb{E}[\tau \mid \xi_0 \sim \pi_u]$.
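The running time accounting above can be sketched directly, with $N_1$ and $N_2$ instantiated for the (1+1)-EA with sampling (a minimal sketch; the function name is ours):

```python
def expected_running_time(expected_iterations, m=1):
    """Expected running time in fitness evaluations: N1 evaluations for
    the initial solution plus N2 per iteration, where for the (1+1)-EA
    with sampling N1 = m and N2 = 2m (parent and offspring are both
    reevaluated with m samples in each iteration)."""
    n1, n2 = m, 2 * m
    return n1 + n2 * expected_iterations
```

For example, an EFHT of 10 iterations with $m = 3$ costs $3 + 6 \cdot 10 = 63$ fitness evaluations.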

Thus, in order to analyze the expected running time of EAs, we just need to analyze the EFHT of the corresponding Markov chains. In the following, we introduce the drift theorems which will be used to derive the EFHT of Markov chains in the article.

Drift analysis was first introduced to the running time analysis of EAs by He and Yao (2001). Since then, it has become a popular tool in this field, and many variants have been proposed (e.g., in Doerr, Johannsen, and Winzen (2012) and Doerr and Goldberg (2013)). In this article, we will use its additive (i.e., Lemma 8) as well as multiplicative (i.e., Lemma 9) version. To use them, a function $V(x)$ has to be constructed to measure the distance of a state $x$ to the optimal state space $\mathcal{X}^*$. The distance function satisfies $V(x) = 0$ for $x \in \mathcal{X}^*$ and $V(x) > 0$ for $x \notin \mathcal{X}^*$. Then, we need to investigate the progress on the distance to $\mathcal{X}^*$ in each step; that is, $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$. For additive drift analysis (i.e., Lemma 8), an upper bound of the EFHT can be derived by dividing the initial distance by a lower bound of the progress. Multiplicative drift analysis (i.e., Lemma 9) is much easier to use when the progress is roughly proportional to the current distance to the optimum.
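In the notation above, the two drift bounds can be sketched as follows (standard forms; the article's Lemmas 8 and 9 may state them with additional technical conditions):

```latex
% Additive drift: if, for every non-optimal state x,
%   E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] \ge c > 0,
% then
\mathbb{E}[\tau \mid \xi_0] \;\le\; \frac{V(\xi_0)}{c}.

% Multiplicative drift: if, for every non-optimal state x,
%   E[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x] \ge \delta \cdot V(x)
% for some \delta > 0, and V_{\min} denotes the minimum distance
% over non-optimal states, then
\mathbb{E}[\tau \mid \xi_0] \;\le\; \frac{1 + \ln\!\big(V(\xi_0)/V_{\min}\big)}{\delta}.
```

The additive bound divides the initial distance by the worst-case progress per step, while the multiplicative bound exploits progress proportional to the remaining distance.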

The simplified drift theorem (Oliveto and Witt, 2011, 2012) as presented in Lemma 10 was proposed to prove exponential lower bounds on the FHT of Markov chains, where the analyzed process $x_t$ is usually a mapping of $\xi_t$. It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping towards or away from the goal state. To relax the requirement of a constant negative drift, the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) as presented in Lemma 11 has been proposed, which takes large self-loop probabilities into account.

## 3 Robustness to Prior Noise

In this section, by comparing the expected running time of the (1+1)-EA with and without sampling for solving the OneMax and the LeadingOnes problems under one-bit noise, we show the robustness of sampling to prior noise.

### 3.1 The OneMax Problem

One-bit noise with $p_n = 1$ is considered here. We first analyze the case in which sampling is not used. Note that Droste (2004) proved that the expected running time is super-polynomial for $p_n = \omega(\log n / n)$. Gießen and Kötzing (2016) have recently re-proved the super-polynomial lower bound by using the simplified drift theorem (Oliveto and Witt, 2011; 2012). However, their proof does not cover $p_n = 1$. Here, we use the simplified drift theorem with self-loops (Rowe and Sudholt, 2014) to prove an exponential lower bound on the running time for $p_n = 1$, as shown in Theorem 12.

For the (1+1)-EA solving the OneMax problem under one-bit noise with $p_n = 1$, the expected running time is exponential.

We use Lemma 11 to prove this theorem. Let $x_t$ be the number of 0-bits of the solution after $t$ iterations of the (1+1)-EA. We consider an interval $[a, b]$ of the number of 0-bits in Lemma 11, where $a = 0$ (i.e., the global optimum) and $b$ is linear in $n$ with $b < n/2$.

We then analyze the transition probabilities $P(x_{t+1} = j \mid x_t = i)$ for $0 < i < b$. Let $p_j$ denote the probability that the offspring solution generated by bit-wise mutation has $j$ 0-bits. Note that one-bit noise with $p_n = 1$ makes the noisy fitness and the true fitness of a solution differ by exactly one. For a solution $x$ with $i$ 0-bits, $f^n(x) = n - i + 1$ with probability $\frac{i}{n}$ (i.e., the noise flips a 0-bit); otherwise, $f^n(x) = n - i - 1$. Let $x$ and $x'$ denote the current solution and the offspring solution, with $i$ and $j$ 0-bits, respectively.

(1) When $j \ge i + 3$, $f^n(x') \le n - j + 1 \le n - i - 2 < n - i - 1 \le f^n(x)$. Thus, the offspring will be discarded in this case, which implies that $P(x_{t+1} \ge i + 3 \mid x_t = i) = 0$.

(2) When $j = i + 2$, the offspring solution will be accepted if and only if $f^n(x') = n - i - 1 = f^n(x)$, the probability of which is $\frac{i+2}{n} \cdot \frac{n-i}{n}$, since the noise needs to flip one 0-bit of $x'$ and flip one 1-bit of $x$. Thus, $P(x_{t+1} = i + 2 \mid x_t = i) = p_{i+2} \cdot \frac{(i+2)(n-i)}{n^2}$.

(3) When $j = i + 1$, $x'$ will be accepted if and only if $f^n(x') = n - i \ge n - i - 1 = f^n(x)$, the probability of which is $\frac{i+1}{n} \cdot \frac{n-i}{n}$, since the noise needs to flip one 0-bit of $x'$ and flip one 1-bit of $x$. Thus, $P(x_{t+1} = i + 1 \mid x_t = i) = p_{i+1} \cdot \frac{(i+1)(n-i)}{n^2}$.

(4) When $j = i - 1$, $x'$ will be rejected if and only if $f^n(x') = n - i < n - i + 1 = f^n(x)$, the probability of which is $\frac{n-i+1}{n} \cdot \frac{i}{n}$, since the noise needs to flip one 1-bit of $x'$ and flip one 0-bit of $x$. Thus, $P(x_{t+1} = i - 1 \mid x_t = i) = p_{i-1} \cdot \left(1 - \frac{(n-i+1)\,i}{n^2}\right)$.

(5) When $j \le i - 2$, $f^n(x') \ge n - j - 1 \ge n - i + 1 \ge f^n(x)$. Thus, the offspring will always be accepted in this case, which implies that $P(x_{t+1} = j \mid x_t = i) = p_j$ for $j \le i - 2$.

The mutation probabilities $p_j$ can be bounded by standard arguments (e.g., Lemma 9 in Paixão et al. (2015)).

To apply Lemma 11, we also need to analyze the self-loop probability $P(x_{t+1} = i \mid x_t = i)$. Combining the probabilities above, one can verify that, after accounting for the self-loop probability, the expected one-step change of the number of 0-bits points away from the optimum, that is, the drift is negative; this implies that condition 1 of Lemma 11 holds.

For condition 2 of Lemma 11, we need to compare $P(x_{t+1} \le i - j \mid x_t = i)$ with an exponentially decaying bound for $j \ge 1$. For $j = 1$, it holds trivially, because the probability is at most 1. For $j \ge 2$, since decreasing the number of 0-bits by at least $j$ in mutation requires flipping at least $j$ 0-bits simultaneously, the probability $P(x_{t+1} \le i - j \mid x_t = i)$ decays exponentially with $j$. Thus, condition 2 of Lemma 11 holds with appropriate parameters.

Note that $b - a = \Theta(n)$. Thus, by Lemma 11, the probability that the running time is $2^{o(n)}$ when starting from a solution with at least $b$ 0-bits is exponentially small. Due to the uniform initial distribution and $b < n/2$, the probability that the initial solution has fewer than $b$ 0-bits is exponentially small by Chernoff’s inequality. Thus, the expected running time is exponential.

Then, we analyze the case in which sampling with $m = 2$ is used. The expected running time is still exponential, as shown in Theorem 13. The proof is very similar to that of Theorem 12. The change of the probabilities caused by increasing $m$ from 1 to 2 does not affect the application of the simplified drift theorem with self-loops (i.e., Lemma 11). The detailed proofs are shown in the supplementary material due to space limitations.

For the (1+1)-EA solving the OneMax problem under one-bit noise with $p_n = 1$, if using sampling with $m = 2$, the expected running time is exponential.

We have shown that sampling with $m = 2$ is not effective. In the following, we prove that increasing $m$ from 2 to 3 can reduce the expected running time to polynomial, as shown in Theorem 14, the proof of which is accomplished by applying multiplicative drift analysis (Doerr, Johannsen, and Winzen, 2012).

For the (1+1)-EA solving the OneMax problem under one-bit noise with $p_n = 1$, if using sampling with $m = 3$, the expected running time is polynomial.

We use Lemma 9 to prove this theorem. We first construct a distance function as $V(x) = |x|_0$, where $|x|_0$ is the number of 0-bits of the solution $x$. It is easy to verify that $V(x) = 0$ for $x \in \mathcal{X}^*$ and $V(x) > 0$ for $x \notin \mathcal{X}^*$.

Then, we analyze the drift by considering the same five cases as in the proof of Theorem 12. Note that for a solution $x$, the fitness value output by sampling with $m = 3$ is the average of the noisy fitness values output by three independent fitness evaluations; that is, $\hat{f}(x) = \frac{1}{3}\big(f^n_1(x) + f^n_2(x) + f^n_3(x)\big)$.

(1) When $j \ge i + 3$, $\hat{f}(x') \le n - j + 1 \le n - i - 2 < n - i - 1 \le \hat{f}(x)$. Thus, the offspring will be discarded, and the number of 0-bits cannot increase by 3 or more.

(2) When $j = i + 2$, $x'$ will be accepted if and only if $\hat{f}(x') = n - i - 1 = \hat{f}(x)$, the probability of which is $\left(\frac{i+2}{n}\right)^3 \left(\frac{n-i}{n}\right)^3$, since the noise needs to flip one 0-bit of $x'$ and flip one 1-bit of $x$ in each of the three noisy fitness evaluations.

(3) When $j = i + 1$, there are three possible cases for the acceptance of $x'$. Let $k$ and $l$ denote the number of the three noisy evaluations of $x'$ and of $x$, respectively, in which the noise flips a 0-bit; then $\hat{f}(x') = n - i - 2 + \frac{2k}{3}$ and $\hat{f}(x) = n - i - 1 + \frac{2l}{3}$, and $x'$ is accepted if and only if $k - l \ge 2$, that is, $(k, l) \in \{(3, 0), (3, 1), (2, 0)\}$. The probability of $k = 3$ is $\left(\frac{i+1}{n}\right)^3$, since the noise needs to flip one 0-bit of $x'$ in all three noisy evaluations. The probability of $k = 2$ is $3\left(\frac{i+1}{n}\right)^2 \frac{n-i-1}{n}$, since the noise needs to flip one 0-bit of $x'$ in two noisy evaluations and flip one 1-bit in the other noisy evaluation. Similarly, we can derive that the probabilities of $l = 0$ and $l = 1$ are $\left(\frac{n-i}{n}\right)^3$ and $3 \cdot \frac{i}{n}\left(\frac{n-i}{n}\right)^2$, respectively. Summing up the three cases gives the acceptance probability.

(4) When $j = i - 1$, there are three possible cases for the rejection of $x'$. With $k$ and $l$ defined as above (now $\hat{f}(x') = n - i + \frac{2k}{3}$), $x'$ is rejected if and only if $l - k \ge 2$, that is, $(k, l) \in \{(0, 2), (0, 3), (1, 3)\}$. The probability of $k = 0$ is $\left(\frac{n-i+1}{n}\right)^3$, since the noise needs to flip one 1-bit of $x'$ in all three noisy evaluations. The probability of $k = 1$ is $3 \cdot \frac{i-1}{n}\left(\frac{n-i+1}{n}\right)^2$, since the noise needs to flip one 1-bit of $x'$ in two noisy evaluations and flip one 0-bit in the other evaluation. Similarly, we can derive that the probabilities of $l = 2$ and $l = 3$ are $3\left(\frac{i}{n}\right)^2 \frac{n-i}{n}$ and $\left(\frac{i}{n}\right)^3$, respectively. Summing up the three cases gives the rejection probability.

(5) When $j \le i - 2$, $\hat{f}(x') \ge n - j - 1 \ge n - i + 1 \ge \hat{f}(x)$. Thus, $x'$ will always be accepted. Combining the above cases shows that the true progress (from cases (4) and (5)) dominates the false progress (from cases (2) and (3)), and the total drift is roughly proportional to the current distance $V(x) = i$; applying Lemma 9 then yields a polynomial upper bound on the EFHT.

Thus, we have shown that sampling is robust to noise for the (1+1)-EA solving the OneMax problem in the presence of one-bit noise. By comparing Theorem 13 with Theorem 14, we also find that a gap of one in the value of $m$ can lead to an exponential difference in the expected running time, which reveals that a careful selection of $m$ is important for the effectiveness of sampling. The complexity transition from exponential to polynomial is because sampling with $m = 3$ can make the false progress (i.e., accepting solutions with more 0-bits) dominated by the true progress (i.e., accepting solutions with fewer 0-bits), while sampling with $m = 2$ is not sufficient.

We have also conducted experiments to complement the theoretical results, which give only asymptotic bounds. For each value of $m$ and $n$, we run the (1+1)-EA 1000 times independently. In each run, we record the number of fitness evaluations until an optimal solution w.r.t. the true fitness function is found for the first time. Then the numbers of evaluations of the 1000 runs are averaged as an estimate of the expected running time, called the estimated ERT. We will always compute the estimated ERT in this way for the experiments throughout this article.

We estimate the expected running time of the (1+1)-EA using sampling with $m$ from 1 to 30. The results are plotted in Figure 1. We can observe that the curves are high for $m \le 2$ and drop suddenly at $m = 3$, which is consistent with our theoretical results in Theorems 12–14. Note that the curves grow linearly for larger $m$, because ERT = (the number of fitness evaluations in each iteration) $\times$ (the number of iterations): once the noise has been sufficiently reduced by sampling, the number of iterations cannot decrease further as $m$ increases, but the sampling cost increases linearly with $m$.
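The estimation protocol above can be sketched as a generic helper (names are ours; `run_once` is assumed to execute one independent run and return its number of fitness evaluations until the true optimum is first found):

```python
def estimated_ert(run_once, num_runs=1000):
    """Estimated expected running time: the average, over independent
    runs, of the number of fitness evaluations until the true optimum
    is first found."""
    return sum(run_once() for _ in range(num_runs)) / num_runs
```

Plugging in a closure over the noisy (1+1)-EA with a given $n$ and $m$ reproduces one point of the curves in Figure 1.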

### 3.2 The LeadingOnes Problem

One-bit noise with a constant probability $p_n > 0$ is considered here. For the case in which sampling is not used, Gießen and Kötzing (2016) have proved an exponential running time lower bound, as shown in Theorem 15. We prove in Theorem 16 that sampling can reduce the expected running time to polynomial.

(Gießen and Kötzing, 2016) For the (1+1)-EA solving the LeadingOnes problem under one-bit noise with a constant probability $p_n$, the expected running time is exponential.

For the (1+1)-EA solving the LeadingOnes problem under one-bit noise with a constant probability $p_n$, if using sampling with a sufficiently large sample size $m \in \mathrm{poly}(n)$, the expected running time is polynomial.

We use Lemma 8 to prove this theorem. Let $\mathrm{LO}(x)$ denote the number of leading 1-bits of a solution $x$. We first construct a distance function as $V(x) = n - \mathrm{LO}(x)$. It is easy to verify that $V(x) = 0$ for $x \in \mathcal{X}^*$ and $V(x) > 0$ for $x \notin \mathcal{X}^*$.

Then, we analyze $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x]$ for any $x$ with $V(x) > 0$. For the current solution $x$, assume that $\mathrm{LO}(x) = i$ (where $0 \le i < n$). Let $x'$ be the offspring solution produced by mutating $x$. We consider three mutation cases for $x'$:

(1) One of the leading 1-bits, say the $j$-th ($j \le i$), is flipped and the first $j - 1$ leading 1-bits remain unchanged, which leads to $\mathrm{LO}(x') = j - 1 < i$. Thus, $V(x') > V(x)$.

(2) The $(i+1)$-th bit (which must be 0) is flipped and the first $i$ leading 1-bits remain unchanged, which leads to $\mathrm{LO}(x') \ge i + 1$. Thus, we have $V(x') < V(x)$.

(3) The first $i + 1$ bits remain unchanged, which leads to $\mathrm{LO}(x') = i$. Thus, $V(x') = V(x)$.

Assume that $\mathrm{LO}(x') = k$. We then analyze the acceptance probability of $x'$, i.e., $P\big(\hat{f}(x') \ge \hat{f}(x)\big)$. Note that $\hat{f}(x') = \frac{1}{m}\sum_{s=1}^{m} f^n_s(x')$, where $f^n_s(x')$ is the fitness output by the $s$-th independent noisy evaluation. Under one-bit noise, the value $f^n(x')$ can be calculated as follows:

(1) The noise does not occur, whose probability is $1 - p_n$. Thus, $f^n(x') = k$.

(2) The noise occurs, the probability of which is $p_n$.

(2.1) It flips one of the leading 1-bits, say the $j$-th ($j \le k$), and then the number of leading 1-bits becomes $j - 1$. Thus, we have $f^n(x') < k$.

(2.2) It flips the $(k+1)$-th bit, which leads to $f^n(x') \ge k + 1$. Thus, we have $f^n(x') > k$. Note that $f^n(x')$ reaches the minimum $k + 1$ when $x'$ has a 0-bit at position $k + 2$, and reaches the maximum $n$ when $x'$ has all 1-bits from position $k + 2$ onward.

(2.3) Otherwise, the number of leading 1-bits remains unchanged. Thus, we have $f^n(x') = k$.

Combining the above cases, one can show that the expected progress on the distance in each iteration is lower bounded by an inverse polynomial in $n$, so that the condition of Lemma 8 holds. Noting that $V(x) \le n$, we can conclude that the expected number of iterations of the (1+1)-EA for finding the optimal solution is upper bounded by a polynomial in $n$. Because the expected running time is (the number of fitness evaluations in each iteration, i.e., $2m$) times the expected number of iterations and $m \in \mathrm{poly}(n)$, we conclude that the expected running time is polynomial.

## 4 Robustness to Posterior Noise

In the above section, we have shown that sampling can be robust to one-bit noise (a kind of prior noise) for the (1+1)-EA solving the OneMax and the LeadingOnes problems. In this section, by comparing the expected running time of the (1+1)-EA with and without sampling for solving OneMax and LeadingOnes under additive Gaussian noise, we will prove that sampling can also be robust to posterior noise.

### 4.1 The OneMax Problem

Additive Gaussian noise with $\sigma^2 \ge 1$ is considered here. We first analyze the case in which sampling is not used. By applying the original simplified drift theorem (Oliveto and Witt, 2011; 2012), we prove that the expected running time is exponential, as shown in Theorem 17.

For the (1+1)-EA solving the OneMax problem under additive Gaussian noise with $\sigma^2 \ge 1$, the expected running time is exponential.

We use Lemma 10 to prove this theorem. Let be the number of 0-bits of the solution after iterations of the (1+1)-EA. We consider the interval , i.e., the parameters and in Lemma 10. Then, we analyze the drift for . Let denote the probability that the next solution after bit-wise mutation and selection has number of 0-bits (i.e., ), and let denote the probability that the offspring solution generated by bit-wise mutation has number of 0-bits (i.e., ). Then, we have, for , where and . We thus have Let . Then, , where the first inequality is by , and the last one is obtained by calculating the CDF of the standard normal distribution. Furthermore, , and . Applying these probability bounds to Eq. (7), we have Thus, , which implies that condition 1 of Lemma 10 holds. For condition 2, we need to investigate . Because it is necessary to flip at least bits, we have , which implies that condition 2 of Lemma 10 holds with and . Note that . Thus, by Lemma 10, the expected running time is exponential.

Note that Friedrich et al. (2015) have proved that for solving OneMax under additive Gaussian noise with , the classical (μ+1)-EA needs super-polynomial expected running time. Our result in Theorem 17 is complementary to their result with , since it covers a constant variance. We then prove in Corollary 19 that using sampling can reduce the expected running time to polynomial. The proof idea is that sampling with a large enough can reduce the noise to , which allows a polynomial running time, as shown in the following lemma. In the following analysis, let indicate any polynomial of .

(Gießen and Kötzing, 2016) Suppose posterior noise, sampling from some distribution with variance . Then we have that the (1+1)-EA optimizes OneMax in polynomial time if .

For the (1+1)-EA solving the OneMax problem under additive Gaussian noise with and , if using sampling with , the expected running time is polynomial.

The noisy fitness is , where . The fitness output by sampling is , where . Thus, , where . That is, sampling reduces the variance of the noise to . Because , we have . By Lemma 18, the expected number of iterations of the (1+1)-EA for finding the optimal solution is polynomial. The expected running time equals the number of fitness evaluations in each iteration times the expected number of iterations. Since , the expected running time is polynomial.
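The variance-reduction step in this proof, namely that averaging m i.i.d. N(theta, sigma²) draws yields noise of variance sigma²/m, can be checked numerically. The sketch below is self-contained; the trial counts and seeds are our own choices:

```python
import random

def sample_mean_noise(m, theta, sigma, rng):
    """One realization of the averaged noise over m independent evaluations."""
    return sum(rng.gauss(theta, sigma) for _ in range(m)) / m

def empirical_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(42)
m, theta, sigma = 25, 0.0, 2.0
draws = [sample_mean_noise(m, theta, sigma, rng) for _ in range(20_000)]
print(empirical_variance(draws), sigma**2 / m)  # empirical vs. theoretical sigma^2 / m
```

Here sigma²/m = 4/25 = 0.16, and the empirical variance of the averaged noise matches it up to sampling error.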

Thus, the comparison between Theorem 17 and Corollary 19 corrects our previous statement in Qian et al. (2014) that sampling is ineffective for the (1+1)-EA solving OneMax under additive Gaussian noise. We have conducted experiments to complement the theoretical results, which give only asymptotic bounds. For the additive Gaussian noise, we set and . The results for are plotted in Figure 2. Note that the point with in the figure corresponds to the ERT without sampling.

From Figures 2(b) and 2(c), we can observe that the ERT drops quickly at the beginning of the curve, reaches its minimum at a small sample size, and grows consistently after that. The minimum is much smaller than the value at ; thus, it is clear that a moderate amount of sampling can reduce the running time compared with no sampling, which is consistent with our theoretical result. However, in Figure 2(a), the ERT always increases with , similar to what was observed in Figure 1 of Qian et al. (2014). The setting in Qian et al. (2014) is , , and . A too small (e.g., ) makes the decrease in the number of iterations easily dominated by the increase of ; therefore, the dropping stage of the curve was not observed.

### 4.2 The LeadingOnes Problem

Additive Gaussian noise with is considered here. We first analyze the case in which sampling is not used. Using the original simplified drift theorem (Oliveto and Witt, 2011; 2012), we prove that the expected running time is exponential, as shown in Theorem 20.

For the (1+1)-EA solving the LeadingOnes problem under additive Gaussian noise with , the expected running time is exponential.

We use Lemma 10 to prove this theorem. Let be the number of 0-bits of the solution after iterations of the (1+1)-EA. As in the proof of Theorem 17, we have For , we use . For , we consider possible cases such that one 1-bit is flipped and the other bits remain unchanged, the probability of which is . Let and denote the current solution and the offspring solution, respectively. Let denote the number of leading 1-bits of ; that is, . The noisy fitness of is , where . Then, the acceptance probability of in these cases can be calculated as follows:

(1) If the flipped 1-bit is the -th leading 1-bit, , where and . Thus, the acceptance probability is , where .

(2) Otherwise, . Thus, the acceptance probability is .

Combining the above cases implies that condition 1 of Lemma 10 holds. As in the proof of Theorem 17, it is easy to verify that condition 2 of Lemma 10 holds with and . Thus, we can conclude that the expected running time is exponential.
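The acceptance probabilities in cases (1) and (2) of the proof above have a simple closed form: the offspring x' is accepted iff f(x') + d1 ≥ f(x) + d2 with d1, d2 ~ N(theta, sigma²) independent, i.e., with probability Phi(Delta / (sqrt(2) · sigma)), where Delta = f(x') − f(x), Phi is the standard normal CDF, and the mean theta cancels in the difference. The sketch below (our own helper names) computes this and shows that for large sigma the probability of accepting a strictly worse offspring approaches 1/2, which is the source of the negative drift:

```python
import math

def std_normal_cdf(z):
    """Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def acceptance_prob(delta, sigma):
    """P(f(x') + d1 >= f(x) + d2) with d1, d2 ~ N(theta, sigma^2) i.i.d.;
    d2 - d1 ~ N(0, 2*sigma^2), so the mean theta cancels."""
    return std_normal_cdf(delta / (math.sqrt(2.0) * sigma))

print(acceptance_prob(0.0, 1.0))      # equal fitness: exactly 1/2
print(acceptance_prob(-1.0, 0.1))     # worse offspring, low noise: near 0
print(acceptance_prob(-1.0, 1000.0))  # worse offspring, high noise: near 1/2
```

This also explains why reducing the variance by sampling restores a useful fitness signal: shrinking sigma drives the acceptance probability of a worse offspring from 1/2 toward 0.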

We then prove in Corollary 22 that using sampling can reduce the expected running time to polynomial. The idea is that sampling with a large enough can reduce the noise to , which allows a polynomial running time, as shown in the following lemma.

(Gießen and Kötzing, 2016) Suppose posterior noise, sampling from some distribution with variance . Then we have that the (1+1)-EA optimizes LeadingOnes in time if .

For the (1+1)-EA solving the LeadingOnes problem under additive Gaussian noise with and , if using sampling with , the expected running time is polynomial.

As in the proof of Corollary 19, sampling reduces the variance of noise to be . Because , we have