## Abstract

Recently, ant colony optimization (ACO) algorithms have proven to be efficient in uncertain environments, such as noisy or dynamically changing fitness functions. Most of these analyses have focused on combinatorial problems such as path finding. We rigorously analyze an ACO algorithm optimizing linear pseudo-Boolean functions under additive posterior noise. We study noise distributions whose tails decay exponentially fast, including the classical case of additive Gaussian noise. Without noise, the classical (μ+1) EA outperforms any ACO algorithm, with smaller μ being better; however, in the case of large noise, the (μ+1) EA fails, even for high values of μ (which are known to help against small noise). In this article, we show that ACO is able to deal with *arbitrarily large* noise in a graceful manner; that is, as long as the evaporation factor ρ is small enough, depending on the variance σ² of the noise and the dimension *n* of the search space, optimization will be successful. We also briefly consider the case of *prior noise* and prove that ACO can also efficiently optimize linear functions under this noise model.

## 1 Introduction

Ant colony optimization (ACO) is a metaheuristic for designing randomized general-purpose optimization algorithms inspired by the foraging behavior of ant colonies. ACO has been successfully applied as a heuristic technique for solving combinatorial optimization problems.

In real-world optimization problems, sometimes a large degree of uncertainty is present, due to the complexity of candidate solution generation, noisy measurement processes, and rapidly changing problem environments. Empirically, ACO seems particularly well-suited to uncertain problems, due to its dynamic and distributed nature, and in some cases it can outperform classical state-of-the-art approaches on dynamic network routing problems (Di Caro et al., 2008). We will focus on a version of the *Max–Min Ant System* (MMAS; Stützle and Hoos, 2000) applied to pseudo-Boolean optimization (i.e., optimization where solutions are coded as bit strings).

Jin and Branke (2005) surveyed a number of sources of uncertainty that randomized search heuristics must often deal with in practice: (1) noisy objective functions, (2) dynamically changing problems, (3) approximation errors in the objective function, and (4) a requirement that an optimal solution must be robust to changes in design variables and environmental parameters that occur after optimization is complete. Arguably, the two most important sources of uncertainty are (1) and (2) above, namely, *stochastic* problems and *dynamic* problems (see also Bianchi et al., 2009, for a recent survey). In stochastic problems, the objective function value of a search point follows a random distribution, and that distribution does not change over time. In dynamic problems, the evaluation of fitness is deterministic but changes over time.

In order to address these practical issues, the theoretical analysis of randomized search heuristics under uncertainty has recently gained momentum. For example, a number of recent studies have rigorously analyzed the performance of evolutionary algorithms in stochastic environments (Dang and Lehre, 2014; Gießen and Kötzing, 2014). For ant colony optimization, a series of articles have considered the performance of ACO on single-destination shortest paths (SDSP) problems with stochastic weights. This work was initiated by Sudholt and Thyssen (2012) and later followed up by Doerr et al. (2012a), who showed that by augmenting the ant system with a reevaluation strategy on the best-so-far solution, many of the difficulties with noise discovered by Sudholt and Thyssen could be overcome. Feldmann and Kötzing (2013) showed that an ant system that uses a fitness-proportional update rule (called *MMAS-fp*) can efficiently optimize SDSP on graphs with stochastic weights. MMAS-fp is closer to systems that are used by practitioners (Stützle and Hoos, 2000) and is the ant system variant that we analyze in the present study.

For the optimization of functions over bit strings, analyses of ACO suggest that it often performs worse than evolutionary algorithms (EAs) and simple hill-climbers in a noise-free setting (Kötzing et al., 2011). On the other hand, ACO can outperform EAs on dynamic problems (Kötzing and Molter, 2012; Lissovoi and Witt, in press). So far, the question of how robust ACO is to noisy evaluation on pseudo-Boolean optimization remains unanswered.

The goal of this study is to observe the robustness of ACO to noise on a class of simple objective functions. In particular, we are interested in the *scalability* of the run time of the algorithm as a function of noise intensity (measured by variance). We study the algorithm on linear functions; a linear function maps a bit string *x* to w_1 x_1 + ⋯ + w_n x_n, with a predefined sequence of positive weights *w _{i}*. If we set all weights to 1, we recover the definition of the important OneMax test function. Note that we only allow positive weights, as fitness-proportional pheromone updates rely on nonnegative fitness (and weights of 0 would not contribute to the fitness).

The main result of this article is given in Section 3, where we show that robustness can be achieved for additive posterior noise from a Gaussian distribution; for every variance σ², there is a parameter setting such that MMAS-fp on linear functions is successful in polynomial time. Thus, we say that MMAS-fp handles Gaussian noise *gracefully*. Such a graceful scaling *cannot* be achieved by the (μ+1) EA, as was shown by Friedrich et al. (2015a).

Additionally, we show that a similar statement holds for an ACO algorithm that imposes max–min bounds on the pheromone values. Rather than relying on the standard technique of *pheromone freezing*, employed by many early works on the mathematical analysis of ACO algorithms, we instead achieve this by showing that all pheromones *drift* in the right direction (in particular, they will never be close to any pheromone bound on the wrong side of the spectrum); see Corollary 7.

In Section 4 we extend our findings to other noise models and show that we can also achieve the same robustness in the presence of other additive noise distributions (which fulfill a certain restriction), as well as with the model of prior noise from Droste (2004). We discuss our findings and conclude the report in Section 5.

This article extends an earlier conference version (Friedrich et al., 2015b) by generalizing the results on OneMax to arbitrary linear functions.

## 2 Preliminaries

We consider the optimization of pseudo-Boolean functions, that is, functions f: {0, 1}^n → ℝ, which we call *fitness functions*. In the following, *n* always denotes the dimension of the solution space {0, 1}^n. For any bit string *x*, the function value f(x) is called the *fitness of x*, and we are interested in finding a solution x* such that f(x*) is maximal.

The fitness functions we are going to investigate are linear, following the notation of Droste et al. (2002); that is, given a vector *w* of length *n* of positive weights, f(x) = w_1 x_1 + ⋯ + w_n x_n, where w_i > 0 for all *i*; thus, the all-ones string is the global optimum. In this study, index variables *i* refer to the set {1, …, n} if not stated otherwise.

For a given weight vector *w*, we let w_min denote the minimum weight, w_max the maximum weight, and *W* the sum of all weights. We let |x|_0 denote the number of zeros in *x* and w_0(x) denote the sum of weights w_i with x_i = 0. Analogously, we define |x|_1 and w_1(x). Note that w_0(x) + w_1(x) = W for all *x*.

Throughout the article, we consider a noisy version of *f*. The noise added to the actual fitness will be, at first, a normally distributed random variable *D* with mean 0 and variance σ². More formally, for a given deterministic fitness function *f*, f_D(x) = f(x) + D with D ∼ N(0, σ²).
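As a concrete illustration (our sketch, not from the original article; the function names are ours), the noise-free and the noisy evaluation of a linear function can be written as follows:

```python
import random

def linear_fitness(x, w):
    """f(x) = sum of w[i] * x[i] for a bit string x and positive weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def noisy_fitness(x, w, sigma):
    """f_D(x) = f(x) + D with D ~ N(0, sigma^2): additive posterior Gaussian
    noise, added after the deterministic evaluation of f."""
    return linear_fitness(x, w) + random.gauss(0.0, sigma)
```

With all weights equal to 1, `linear_fitness` is exactly OneMax; with `sigma = 0`, `noisy_fitness` coincides with the noise-free fitness.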

We say that an event *A* occurs *with high probability* if its complement occurs only with a probability smaller than the inverse of any fixed polynomial; that is, for every polynomial *p*, the probability of the complement is at most 1/p(n) for *n* sufficiently large.

### 2.1 Algorithms

Our main algorithm of interest is MMAS-fp (Alg. 1), an ACO algorithm with a so-called *fitness-proportional* update rule.

To find an optimal solution, MMAS-fp maintains a vector τ of *n* components, which are called *pheromones*, where ℓ and *u* are the lower and upper bounds for each pheromone, respectively; we will assume 0 ≤ ℓ < *u* ≤ 1. Further, we let τ^(t) denote the pheromone vector in the *t*-th iteration of the algorithm.

MMAS-fp starts off with every pheromone at 1/2 and iteratively generates solutions until an optimal one is sampled. These solutions are sampled according to τ in the manner that each bit x_i is independently set to 0 with probability τ_i (and to 1 otherwise). Thus, our pheromone vector corresponds to the probabilities of sampling 0s, and we desire the pheromones to go *down*. This is equivalent to a definition using a simulated ant walk on the construction graph shown in Figure 1, where, for each position *i*, choosing one of the two parallel edges means that x_i = 0 and choosing the other means that x_i = 1. Our pheromones correspond to the pheromones on the edges for x_i = 0.

This sampling procedure makes the algorithm *unbiased* in the sense of Lehre and Witt (2010).

The variable ρ is the so-called *evaporation factor*; it is a parameter of the algorithm. Intuitively, ρ regulates the impact of the fitness of a sampled solution on the corresponding pheromone update. To maintain meaningful values of τ and sensible updates, it must hold that ρ · f(x) ≤ 1 for all *x*.

We assume that MMAS-fp knows *W* in its update rule for theoretical purposes. In reality, one either has to have an upper bound on *W* or choose ρ accordingly. We further assume that *W* and σ² are polynomial in *n*, so that a *polynomial* run time is possible with high probability. The proofs given could be extended to arbitrary values of *W*, but this would need case distinctions with respect to the growth of the weights and would make the proofs overly technical.

Note that MMAS-fp does not keep a best-so-far solution but, instead, always updates its pheromone vector regardless of the quality of the solution sampled. The update is proportional to the fitness of the sampled solution, hence the suffix *fp*. If an updated pheromone is not capped due to the pheromone bounds ℓ or *u*, we speak of a *normal* update.
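The precise update rule of Algorithm 1 is not reproduced in this excerpt. As a hedged sketch, one plausible concrete instantiation of a fitness-proportional update toward the sampled solution, capped at the pheromone bounds, is the following (an assumption for illustration; the exact rule used by MMAS-fp may differ, and all names are ours):

```python
import random

def sample(tau):
    """Sample a solution: bit i is 0 with probability tau[i], else 1."""
    return [0 if random.random() < t else 1 for t in tau]

def update(tau, x, fitness, rho, low, up):
    """Hypothetical fitness-proportional update: each pheromone moves toward
    the indicator [x_i == 0] by a step proportional to rho times the sampled
    (possibly noisy) fitness, and is then capped at the bounds [low, up].
    For this to be sensible, rho * f(x) should be at most 1."""
    g = rho * fitness
    new_tau = []
    for t, xi in zip(tau, x):
        t = (1.0 - g) * t + g * (1 if xi == 0 else 0)
        new_tau.append(min(up, max(low, t)))  # cap; otherwise the update is "normal"
    return new_tau
```

With noisy fitness values, `g` can be large or even negative, which is why updates can exceed the pheromone bounds and must be capped, as discussed in Section 3.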

In the following, we will denote a pheromone vector simply as τ if we are not interested in *t*. Likewise, *x* denotes the solution sampled according to τ, and τ′ denotes the pheromone vector after the update according to *x*.

The other algorithm we consider is ACO-fp. It is a special case of MMAS-fp where ℓ = 0 and *u* = 1; that is, only trivial pheromone bounds are enforced.

### 2.2 Tools Used

The proofs in this article rely heavily on *drift theory*, a tool that bounds the hitting times of stochastic processes with a bias (called *drift*).

We first state one drift theorem, as well as two concentration bounds regarding drift, that we will use in our proofs.
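The drift theorem itself (Theorem 1) is not reproduced in this excerpt; for orientation, the standard multiplicative drift theorem, which matches the way Theorem 1 is applied in Section 3, can be stated as follows (our paraphrase, not necessarily the exact form used in the article):

```latex
\textbf{Multiplicative drift.}
Let $(X_t)_{t \ge 0}$ be a stochastic process over a state space
$S \subseteq \{0\} \cup [x_{\min}, \infty)$ with $x_{\min} > 0$, and let
$T = \min\{t \mid X_t = 0\}$ be its hitting time of $0$. If there is a
$\delta > 0$ such that, for all $t < T$,
\[
  \mathbb{E}[X_t - X_{t+1} \mid X_t] \;\ge\; \delta X_t ,
\]
then
\[
  \mathbb{E}[T] \;\le\; \frac{1 + \ln(X_0 / x_{\min})}{\delta} .
\]
```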

The following proposition gives tail bounds for our noise by using standard estimates of the Q-function (Madhow, 2008).
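The proposition itself is omitted from this excerpt; the standard Q-function estimate it presumably builds on gives, for Gaussian noise $D \sim \mathcal{N}(0, \sigma^2)$ and any $t \ge 0$,

```latex
\Pr(D \ge t) \;=\; Q\!\left(\frac{t}{\sigma}\right)
  \;\le\; \frac{1}{2}\, e^{-t^2/(2\sigma^2)} ,
\qquad
\Pr(|D| \ge t) \;\le\; e^{-t^2/(2\sigma^2)} ,
```

so the noise exceeds $c\,\sigma\sqrt{\ln n}$ in absolute value only with probability polynomially small in $n$, for a suitable constant $c$.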

## 3 Gaussian Noise

In this section we are going to bound the run time of MMAS-fp for optimizing linear functions under additive Gaussian noise; the underlying linear function will be denoted as *f*, for convenience, in the proofs but not in the statements of the lemmas and so forth.

First off, we bound the noise, and thus the difference of two consecutive pheromone values, with high probability. Since the noise is unbounded, an update can exceed the pheromone bounds by a large amount. By bounding the noise with high probability, we make sure that such updates happen only very rarely. We then set ρ such that it can compensate for likely noise levels.

In the following we always assume that ρ is chosen such that every single pheromone update is bounded with high probability, and we let *s* denote this bound. Note that *s* is then the greatest difference between two consecutive values of one pheromone, as stated by Lemma 5.

For pheromones in [ℓ, *u*], every update is normal with high probability. However, since the update is done relative to the current pheromone value and *s* is only an upper bound, *s* can be quite pessimistic, because the actual change will be small when the pheromone is close to the lower bound. Thus, we briefly discuss when a *decreasing* update is normal, that is, when the updated pheromone does not fall below ℓ. Because of our previous assumption regarding ρ, a decreasing update can only be nonnormal if the pheromone is already close to ℓ.

From now on, we always implicitly condition on the event that the noise does not exceed the bounds mentioned in the proof of Lemma 5, which holds with high probability in any polynomial number of steps of MMAS-fp for any of the *n* pheromone values via a simple union bound. Other runs are referred to as *fails*.

We now show that, in expectation, the result of a normal update of any pheromone will be farther away from the upper bound *u* than before, thus meaning that the pheromone drifts *away* from *u*.

Consider MMAS-fp optimizing a noisy linear function and let a pheromone vector in some iteration of the algorithm be given. Let an index *i* be given, and suppose that the corresponding pheromone satisfies the stated bounds, so that it is updated normally. Then the drift of that pheromone toward 1 is negative.

From the conditions of the lemma we see that the pheromone is updated normally.

We proceed by showing that, with high probability, no pheromone considered in Lemma 6 reaches a constant value *d* within polynomial time during a nonfailing run of MMAS-fp; the pheromones would have to go in the wrong direction for this to occur. This is due to the negative drift we just proved; that is, the considered pheromones move, in expectation, down, and are thus unlikely to cover large distances going up.

Consider MMAS-fp optimizing a noisy linear function and let *d* < *u* be a constant. If ρ is sufficiently small, then each pheromone, during a nonfailing run of MMAS-fp, reaches values of at least *d* in polynomial time with only superpolynomially low probability.

This means that the probability for a single pheromone starting from its initial value to reach *d* in any polynomial number of steps is superpolynomially small for ρ as given, if that pheromone only takes values in the range where Lemma 6 applies.

Now consider a pheromone that lies below that range. Even the best update could not get that pheromone above the range boundary in a single step. So we can apply the above argument.

Note that the step size of a single update is bounded by *s*, as this is what we need to apply Theorem 3.

Using a union bound, we can now say that none of the *n* pheromones reaches *d* as in Corollary 7 in polynomially many steps with high probability. So we now always assume pheromones to be below *d* if not stated otherwise.

We now consider the sum of all pheromone values and prove that it drifts toward 0; that is, all pheromone values together decrease in expectation. This does, however, not hold when *all* pheromones are very close to the lower bound because, in this case, an increase of any pheromone would have a far higher impact than a decrease. The following lemma thus only holds if there is at least one pheromone with a sufficiently large value.

Consider MMAS-fp optimizing a noisy linear function with ρ sufficiently small, and consider further that *k* pheromones have dropped below the lower threshold. If at least one pheromone is above this threshold, then the drift of the sum of all pheromones toward 0 is positive.

In this proof, for the drift calculations, we pessimistically treat pheromones below the threshold as having reached the lower bound. The drift per pheromone is relative to the pheromone's value, according to Lemma 6, but it is also positive (i.e., toward 0) as long as the pheromone is updated normally. So there can only be a negative drift (i.e., drift away from 0) for pheromone values below the threshold.

In this proof, we ignore constant factors, because we only need the overall drift toward 0 to be positive asymptotically. Note that there cannot be any positive drift toward 0 if all pheromones have reached the lower bound, since none of the pheromones could drop any lower.

We split up the overall drift of the sum of all pheromones into the (positive) drift (toward 0) of the pheromones that are at least at the threshold and into the (negative) drift of the *k* pheromones that are below it.

Let *Y* denote the index set of those former pheromones, *Z* the index set of the latter *k*, and let *K* be the event that exactly *k* pheromones are below the threshold. The string x^y denotes the bit string consisting only of those elements of *x* whose index is in *Y*, and x^z is defined analogously with respect to *Z*. The sum of pheromones with index in *Y* is denoted by τ^y, and the sum of those with index in *Z* by τ^z; the corresponding weight sums are defined analogously.

For pheromones with index in *Y*, both the increase and decrease during an update scale with the strength of the fitness-proportional update, regardless of the corresponding sampled bit x_i. However, if a bit x_i was sampled as 0, the corresponding pheromone additionally gets an increase. We give bounds for the expected values needed. Let *a* be a variable ranging over *Y*, and let *j* range over all pheromone indices; let [·] denote the Iverson bracket (indicator function). We make use of Corollary 7.

Now we can calculate the desired drift. For the negative drift, we have to look at the pheromones with index in *Z*. All of the *k* pheromones whose corresponding bit position was sampled as 1 cannot drop any lower, due to our assumption. The remainder of the *k* pheromones get a normal update in the best case. The remaining estimations follow analogously to the ones beforehand; note that *a* now refers to indices in *Z*. This yields the negative drift.

Combining the two contributions, we can look at the overall drift, which we want to be positive; here *a* again ranges over indices in *Y*, and *j* ranges over all indices. It remains to check that the positive contribution dominates, which holds because we assume ρ to be small enough.

Overall, we finally get a drift of the order claimed in the lemma.

We are now ready to give the main theorem, which shows efficient optimization given additive posterior noise. It follows from an application of the Multiplicative Drift Theorem (Theorem 1) with its respective concentration bound (Theorem 2).

Consider MMAS-fp with suitably chosen parameters ℓ, *u*, and ρ optimizing a linear function under additive Gaussian noise with variance σ². Then the algorithm finds the optimum after polynomially many steps with high probability.

Let *T* be the time until all pheromones have dropped below the threshold; equivalently, the sum of all pheromones has dropped below the corresponding target value.

According to Lemma 8, there is a positive drift of the pheromone sum toward 0.

We can now bound *T* using Theorem 1. Note that we have to scale all values by *n* to do so. This, however, does not affect the drift.

Now assume that all pheromones have dropped below the threshold. The probability to sample the optimum in a single iteration is now sufficiently large that the optimum will be sampled within polynomially many tries in expectation.
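For intuition, if the threshold is of order $1/n$, say every pheromone satisfies $\tau_i \le c/n$ for a constant $c$ (an assumption for illustration; the exact threshold is set in the proofs), then the probability of sampling the all-ones string in one iteration is

```latex
\Pr(x = 1^n) \;=\; \prod_{i=1}^{n} (1 - \tau_i)
  \;\ge\; \left(1 - \frac{c}{n}\right)^{n}
  \;=\; (1 - o(1))\, e^{-c} ,
```

a positive constant, so a constant expected number of tries would suffice in this case.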

The overall expected run time is bounded above by the following worst-case scenario: The algorithm needs some number of steps until all pheromones drop below the threshold. Then the algorithm does not sample the optimum, and all pheromones are set back to their initial values (that does not actually happen, but we are looking at a worst case). As we mentioned before, in this scenario, the algorithm would need polynomially many tries to sample the optimum, once the pheromones are low enough. Hence, only polynomially many restarts are needed in expectation, which results in a polynomial expected run time.

We can easily make a union bound over the run failing via too large noise (as bounded in the proof of Lemma 5) and the run failing because of the drift concentrations (per pheromone or for the sum). Since all of these probabilities are superpolynomially small, the overall failing probability of the algorithm is superpolynomially small—that is, the algorithm succeeds with high probability.

As a direct corollary, we get that we can disregard pheromone bounds (i.e., setting these bounds to 0 and 1) and still get the same result. Intuitively, this holds because we have drift in the right direction in each pheromone at any time (while still not close to the target). Recall that ACO-fp is a special case of MMAS-fp (Algorithm 1) with trivial pheromone bounds.

Consider ACO-fp optimizing a linear function under additive Gaussian noise. If ρ is chosen as in Theorem 9, then the algorithm finds the optimum after polynomially many steps with high probability.

Recall that all previous proofs are actually proofs for ACO-fp as well. We can thus argue analogously as in the proof of Theorem 9. The upper bound *d* is at most 1 because *u* = 1 for ACO-fp. The trivial lower bound of 0 for ACO-fp satisfies the requirements of the proofs as well.

## 4 Other Noise Models

In this section we consider the optimization of linear functions perturbed by other noise than additive posterior Gaussian noise.

### 4.1 Posterior Noise

We start with a generalization of Corollary 10 by taking non-Gaussian noise into account; that is, we optimize *f _{D}*, with f_D(x) = f(x) + D and *D* being a random variable (possibly not Gaussian).

The idea behind why the following noise models still do not harm optimization is that we can, again, bound the noise with high probability as we already did in the proof of Lemma 5.

The drifts of Lemmas 6 and 8 hold here as well, since the noise is additive and posterior, and thus cancels out in expectation.

To be able to use Corollary 7, we need to bound the noise with high probability. Because of Equation (1), we can do so, and it follows that any pheromone reaches values of at least a constant only with superpolynomially low probability, for ρ chosen sufficiently small.

We can now use an argumentation analogous to that in Theorem 9, proving the corollary.

A similar corollary holds for ACO-fp.

We can use the same proof as for Corollary 11 and argue as in the proof of Corollary 10 that we do not need the special pheromone bounds.

### 4.2 Prior Noise

In this section, we have a look at the noise model from Droste (2004): with probability 1 − *p*, f_p(x) = f(x), and with probability *p*, f_p(x) = f(x′), where x′ is obtained from *x* by flipping a single bit chosen uniformly at random. That means that, with probability *p*, a single bit in *x*, chosen uniformly at random, gets flipped before the evaluation.
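This noise model is easy to state in code; a minimal sketch (the function name is ours):

```python
import random

def prior_noisy_fitness(x, f, p):
    """Prior noise (Droste, 2004): with probability p, flip one bit of x,
    chosen uniformly at random, before the deterministic evaluation of f;
    with probability 1 - p, evaluate f on x unchanged."""
    if random.random() < p:
        y = list(x)
        i = random.randrange(len(y))
        y[i] = 1 - y[i]  # flip a single uniformly random bit
        return f(y)
    return f(x)
```

Note that the perturbation happens *before* evaluation, so the returned value is always a true fitness value of *some* search point; this is exactly why the noise cannot be unboundedly large, in contrast to the additive posterior model.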

Note that we can bound the pheromone update in the same way as we did in the proof of Lemma 5, since the value of f_p is always at most *W*. Since the noise is not additive, we do *not* have to bound the update with respect to the noise.

Because we again assume ρ to be small enough, we have normal updates for all pheromones that are sufficiently far from both bounds, this time with probability 1.

The following proofs are all similar to the ones from Section 3, and the order of argumentation is the same. We first prove that individual pheromones, when updated normally, drift away from *u*. This results in an upper bound on each pheromone that holds with high probability. We then, again, look at the sum of all pheromones and prove that it drifts (under sufficient conditions) toward 0. Applying Theorem 1 finally yields the desired run time.

Consider MMAS-fp optimizing *f _{p}* and let a pheromone vector in some iteration of the algorithm be given. Let an index *i* be given, and suppose that the corresponding pheromone satisfies the stated bounds, so that it is updated normally. Then the drift of that pheromone toward 1 is negative.

Let *A* denote the event that x_i was sampled as 1, and note that τ_i is updated normally. Let *P* denote the event that noise occurred during the evaluation of *f* (that is, a bit flipped), and let *j* and *k* range from 1 to *n*.

Consider MMAS-fp optimizing *f _{p}*, and let *d* be a suitable constant. If ρ is sufficiently small, then each pheromone, during a nonfailing run of MMAS-fp, reaches values of at least *d* in polynomial time with only superpolynomially low probability.

We define *T* as the hitting time of one pheromone reaching a value of at least *d*. The probability for a single pheromone starting from its initial value to reach *d* in any polynomial number of steps is superpolynomially small for ρ as chosen.

Again, a union bound argument gives us the guarantees of Corollary 14 for all pheromones.

Consider MMAS-fp optimizing *f _{p}* with ρ sufficiently small, and consider further that *k* pheromones have dropped below the lower threshold. If at least one pheromone is above this threshold, then the drift of the sum of all pheromones toward 0 is positive.

We are going to use the same notation as in the proof of Lemma 8. *P* shall, again, denote the event that the bit flip occurred.

This is basically the same expression as in the proof of Lemma 8, and the drift is again of the same order. So we can conclude analogously.

We can now state that MMAS-fp and ACO-fp are able to optimize linear functions efficiently, given prior noise.

Consider MMAS-fp or ACO-fp with parameters chosen as before optimizing *f _{p}*. Then both algorithms find the optimum after polynomially many steps with high probability.

The argumentation is exactly as in the proof of Theorem 9, but now we use Lemmas 13 and 15, and Corollary 14. The relevant constants have to be changed accordingly using Corollary 14.

## 5 Discussion and Summary

In this work we saw that two simple ACO algorithms on linear pseudo-Boolean functions scale gracefully with noise for Gaussian distributions—that is, the run time depends only linearly on the variance of the noise. We get similar results for many other noise models and for different ACO algorithms, suggesting that ACO algorithms are generally good for dealing with noise (at least in settings where the underlying fitness function is simple enough, as in the case of linear functions). Many of these settings are not solvable by simple hill climbers (Droste, 2004).

The analysis of metaheuristics such as ACO on noisy fitness functions is of particular interest because this is a specific area where only very few tailored approaches exist for designing efficient algorithms. Our proofs give insight into why ACO is robust to various noise models: By choosing the evaporation factor to be small enough, the noise cannot harm an update with high probability. In the end, better solutions turn out to be better in expectation, and thus, optimization succeeds.

One drawback of the ACO algorithms analyzed in this report is that the variance of the noise must be known in order to correctly set the evaporation factor ρ. This problem can be bypassed as follows: Guess a variance of 1 and run the algorithm until it has a constant success probability if the guess was correct. If the optimum was not found so far, double the guess and repeat. This standard doubling scheme leads to a *noise-oblivious* ACO algorithm with an expected run time of at most a constant factor away from the ACO that knows the noise in advance.
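The doubling scheme can be sketched as follows; `run_with_variance_guess` is a hypothetical callback (our name) that configures ρ and the step budget from the guessed variance, runs MMAS-fp, and returns the optimum if found, or `None` on failure:

```python
def noise_oblivious_aco(run_with_variance_guess):
    """Standard doubling scheme: start with a variance guess of 1 and run the
    algorithm with a budget that yields constant success probability if the
    guess is correct; on failure, double the guess and repeat."""
    guess = 1.0
    while True:
        result = run_with_variance_guess(guess)
        if result is not None:
            return result
        guess *= 2.0
```

Because the per-round budget grows geometrically with the guess, the total expected cost stays within a constant factor of the cost of a single run that knows the true variance.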

## Acknowledgments

The research leading to these results received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 618091 (SAGE). We thank the anonymous reviewers of the conference version as well as the anonymous reviewers of the journal version for their helpful comments, which significantly improved the quality of this article.