Black-box complexity theory provides lower bounds for the runtime of black-box optimizers like evolutionary algorithms and other search heuristics and serves as an inspiration for the design of new genetic algorithms. Several black-box models covering different classes of algorithms exist, each highlighting a different aspect of the algorithms under consideration. In this work we add to the existing black-box notions a new elitist black-box model, in which algorithms are required to base all decisions solely on (the relative performance of) a fixed number of the best search points sampled so far. Our elitist model thus combines features of the ranking-based and the memory-restricted black-box models with an enforced usage of truncation selection. We provide several examples for which the elitist black-box complexity is exponentially larger than the respective complexities in all previous black-box models, thus showing that the elitist black-box complexity can be much closer to the runtime of typical evolutionary algorithms. We also introduce the concept of p-Monte Carlo black-box complexity, which measures the time it takes to optimize a problem with failure probability at most p. Even for small p, the p-Monte Carlo black-box complexity of a function class can be smaller by an exponential factor than its typically regarded Las Vegas complexity (which measures the expected time it takes to optimize the problem).
Black-box models are classes of algorithms that are designed to help us understand how efficient commonly used search strategies like evolutionary algorithms (EAs) and other randomized search heuristics (RSHs) are. In simple words, the black-box complexity of a class of functions is the minimal number of function queries that is needed until, for an arbitrary member of the class, an optimal solution is queried for the first time, where the minimum is taken over all algorithms belonging to some specific class of black-box algorithms (formal definitions will be given in Section 2). The black-box models hence differ in the specifications of the algorithms under consideration. Different specifications typically yield different lower bounds for the efficiency of search heuristics. By comparing these lower bounds, we learn how certain algorithmic choices influence the running time of evolutionary algorithms. For example, if we know that for a problem the -black-box complexity is , while its unrestricted black-box complexity (i.e., its black-box complexity with respect to all black-box algorithms) is only , then the restrictions used to define account for the discrepancy in best possible runtime.
Several models exist, each designed to analyze a different aspect of search heuristics. For example, the memory-restricted model (Doerr and Winzen, 2014a; Droste et al., 2006) helps us to understand the influence of the population size on the efficiency of the search strategy, while the ranking-based black-box model (Doerr and Winzen, 2014b; Fournier and Teytaud, 2011; Teytaud and Gelly, 2006) analyzes how much a heuristic loses by not using absolute but merely relative fitness values.
Having been introduced to the evolutionary computation community by Droste et al. (2003, 2006), black-box complexity is a young but highly active area of current research efforts (Anil and Wiegand, 2009; Badkobeh et al., 2014, 2015; Doerr et al., 2014a,b, 2013; Doerr and Winzen, 2014a,b,c; Doerr, Johannsen, et al., 2011; Fournier and Teytaud, 2011; Jansen, 2015; Lehre and Witt, 2012; Rowe and Vose, 2011; Teytaud and Gelly, 2006) (each of these papers will be mentioned below). The insights from black-box complexity studies can be used to design more efficient genetic algorithms, as the recent GA from Doerr et al. (2015) shows.
We contribute to the existing literature a new model, which we call the elitist black-box model. As the name suggests, our model is designed to analyze the effect of elitist behavior on the performance of search heuristics. We do so by enforcing that the algorithms in this model maintain a population that contains only the μ best so-far sampled individuals. Here, μ is a parameter of the model (called the memory size or population size), while quality is measured according to increasing fitness values. Note that for population size μ > 1, the μ best so-far sampled search points may have (but need not have) different fitness values. If more than μ search points of current-best fitness have been sampled, only μ of them can be stored in the population. In other words, we include in this model only those algorithms which maintain a population of size μ at all times, which use truncation selection in the selection for replacement step, and which base all their decisions solely on relative fitness values. While the restrictions on memory size and ranking-basedness have been analyzed in the isolated models mentioned above, the combination of the two has not been studied before (with the exception of our own recent work on the OneMax function (Doerr and Lengler, 2015b)), nor has the effect of an enforced truncation selection, despite the fact that in evolutionary computation (EC) the usage of truncation selection is very common, as it can be seen as a literal interpretation of the “survival of the fittest” principle in an optimization context.
We emphasize that our elitist model combines three different restrictions: one on the size of the memory, one on basing all decisions only on the ranking of the search points with respect to their fitness (and not on absolute fitness values), and one on the selection for replacement step. We further note that the term “elitist” is not used consistently in the EC literature. Some subcommunities in EC call an algorithm elitist if and only if the next generation consists of the μ best so-far search points (e.g., Auger and Doerr (2011, page 145); this is the most commonly applied interpretation in the theory of EAs, and is the one that we base our nomenclature on), while others speak of an elitist algorithm if and only if the next generation contains at least one of the best so-far solutions (Auger and Doerr, 2011, page 22), and yet another group requires an elitist algorithm to keep in the population every best so-far search point (so that the next population must be larger than μ if there are more than μ search points of current-best fitness, cf. Beyer et al. (2002)). Finally, a fourth notion of elitism requires that the new population consists only of the search points of the current-best fitness value; thus the next population must be smaller than μ if there are fewer than μ best so-far search points (e.g., the algorithm in Yang (2007) uses such a selection).
A short version of this work has been presented at the 2015 GECCO conference in Madrid, Spain (Doerr and Lengler, 2015a).
1.1 Previous Work
Among the most important algorithmic choices in the design of evolutionary algorithms are the population size, the sampling strategies (often called variation operators), and the selection rules. Existing black-box models cover these aspects in the following way. While the memory-restricted model (Doerr and Winzen, 2014a; Droste et al., 2006) and the parallel black-box model (Badkobeh et al., 2014, 2015) analyze the influence of the population size, the unbiased model (Doerr et al., 2013; Lehre and Witt, 2012; Rowe and Vose, 2011) considers the efficiency of search strategies using only so-called unbiased variation operators. The influence of the selection rules has been analyzed in the comparison-based and ranking-based black-box models (Doerr and Winzen, 2014b; Fournier and Teytaud, 2011; Teytaud and Gelly, 2006), with a focus on not revealing full fitness information to the algorithm but rather the comparison or the ranking of search points. The idea behind the latter two models is that, in contrast to other search strategies like the physics-inspired simulated annealing, many evolutionary algorithms base their selection solely on relative and not on absolute fitness values. By providing only relative fitness values, the models aim at understanding how much this worsens the performance of the algorithms, and indeed it can be shown that for some function classes the ranking-based and the comparison-based black-box complexities are larger than the unrestricted ones.
While the comparison-based and the ranking-based models provide only relative fitness values, they do not require the algorithms to always select the better ones, a strategy that is in use by many common and widely applied black-box optimization strategies such as EAs or local hill climbers such as Randomized Local Search (RLS). On the other hand, many practical algorithms intentionally keep suboptimal solutions to enhance population diversity, or to better explore the search space (Ursem, 2002; Črepinšek et al., 2013). It has been shown that in some situations, specific elitist algorithms like RLS or the EA are inferior to nonelitist algorithms (Friedrich et al., 2009; Jägersküpper and Storch, 2007; Oliveto and Zarges, 2015). In this article, we go one step further and investigate the performance of all elitist algorithms simultaneously.
1.2 Our Model, New Complexity Measures, and Results
We provide in this work a model to analyze the impact of elitist behavior on the runtime of black-box optimizers. In our elitist black-box model the population of the algorithms may contain only search points of best-so-far fitness values. That is, if the population size is μ, then at any point in time only the μ best-so-far search points (of possibly different fitness values) are allowed to be kept in the population. Ties may be broken arbitrarily. For example, if more than μ search points of current-best fitness have been sampled, only (an arbitrary selection of) μ of them can be stored in the population. All other previously sampled search points are not allowed to influence the behavior of the algorithm any more. Furthermore, we do not reveal absolute fitness values to the algorithm, thus forcing it to base all its decisions on relative fitness values.
We show (Section 3) that already for quite simple function classes there can be an exponential gap between the efficiency of elitist and nonelitist black-box algorithms. As we shall see in Theorem 5, this remains true even if we regard (1 + 1) memory-restricted unary unbiased comparison-based algorithms, which constitutes the most restrictive combination of the existing black-box models. We will see that such algorithms can crucially profit from eventually giving preference to search points of fitness inferior to that of the current best search points. We also show (Section 4) that some shortcomings of previous models can be eliminated when they are combined with an elitist selection requirement. More precisely we show that the elitist unary unbiased black-box complexity of , a classical benchmark function in evolutionary computation, is of order and thus nonpolynomial for . In contrast, the unary unbiased black-box complexity of is known to be polynomial even for extreme values of (Doerr et al., 2014a).
In previous models, the black-box complexity has been defined in a Las Vegas manner; that is, it measures the expected number of function evaluations until the algorithm hits the optimum. On the other hand, many results in the black-box complexity literature are based on algorithms that with high (or constant) probability find the optimum after a certain number of steps, and then random restarts are used to bound the expected runtime. In (the strict version of) the elitist model, algorithms are not allowed to perform random restarts since new search points can be kept in the population only if they are among the best ones sampled so far. Since this is a rather artificial problem (many real-world optimization routines make use of restarts), we introduce in this work the concept of Monte Carlo black-box complexities. Roughly speaking, the p-Monte Carlo black-box runtime of a black-box algorithm A on a function f is the minimal number of queries A needs in order to find the optimum of f with probability at least 1 − p. The complexity class is then derived in the usual way (cf. Section 2.1). We regard in our work both Monte Carlo complexities and standard (i.e., Las Vegas) complexities. For elitist black-box algorithms these two notions can differ substantially, as we shall see in Section 3.1.
In the following we consider only discrete search spaces, and even more restrictively, only pseudo-Boolean functions f: {0,1}^n → ℝ. However, generalizations to nonfinite or continuous search spaces are straightforward.
2 The Elitist Black-Box Model
The (μ + λ) elitist black-box model covers all algorithms that follow the pseudocode in Algorithm 1. To describe it in a more detailed fashion, a (μ + λ) elitist black-box algorithm is initialized by sampling μ search points. We allow these search points to be sampled adaptively; that is, the i-th sample may depend on the ranking of the first i − 1 search points, where by ranking we mean the ranking induced by the fitness function f. To be very precise here, we note that two search points have the same rank if and only if they have the same fitness; that is, the search points with maximal f-values have rank one, the ones with second-largest f-values have rank two, and so on.
The optimization phase proceeds in rounds. In each round a (μ + λ) elitist black-box algorithm samples λ new search points from distributions that depend only on the current population and its ranking. Note that in such an optimization step the λ offspring do not need to be independent of each other. Assume, for example, that we create an offspring by random crossover; that is, we take some parents from the current population and set the entries of the offspring by choosing (in an arbitrary way) some bit values from these parents; then it is allowed to also create another offspring from these parents whose entries in those positions in which the parents do not agree depend on the corresponding entries of the first offspring. These two offspring are obviously not independent of each other. However, we do require that the λ offspring are created before any evaluation of the offspring happens. That is, the i-th offspring may not depend on the ranking or fitness of the first i − 1 offspring. (We have decided for this version as we feel that it best captures the spirit of EAs such as the (μ + λ) EA that can process the offspring in parallel.) When all λ search points have been generated, the algorithm proceeds to the selection for replacement step. In this step, the algorithm sorts the μ + λ search points according to their fitness ranking, where it is free to break ties in any way. Then the new population consists of the μ best search points according to this ordering; that is, truncation selection is applied.
The elitist black-box model covers many common EAs such as (μ + λ) EAs, Randomized Local Search (RLS), and other hill climbers. It does not cover algorithms with nonelitist selection rules like tournament or fitness-proportional selection.
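The round structure described above can be sketched in code. Algorithm 1 itself is not reproduced in this text, so the following is only a minimal illustration under stated assumptions: the names `ranks`, `elitist_step`, `rls_variation`, and `run_elitist` are hypothetical, the tie-breaking rule (prefer offspring) is one arbitrary admissible choice, and the check against `optimum_value` belongs to the test harness that counts queries, not to the algorithm, which only ever sees ranks.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]  # a search point is a tuple of 0/1 values


def ranks(fitnesses: Sequence[float]) -> List[int]:
    """Rank 1 = largest fitness; equal fitness values share a rank."""
    distinct = sorted(set(fitnesses), reverse=True)
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in fitnesses]


def elitist_step(population: List[Point], offspring: List[Point],
                 f: Callable[[Point], float]) -> List[Point]:
    """Truncation selection: keep the mu best of parents plus offspring."""
    mu = len(population)
    pool = list(population) + list(offspring)
    r = ranks([f(x) for x in pool])  # only the ranking is used below
    # Ties are broken in favour of later entries (the offspring); the model
    # allows any tie-breaking rule, this is just one choice.
    order = sorted(range(len(pool)), key=lambda i: (r[i], -i))
    return [pool[i] for i in order[:mu]]


def rls_variation(population: List[Point], rng: random.Random) -> List[Point]:
    """RLS-style variation for mu = lambda = 1: flip one uniformly chosen bit."""
    parent = population[0]
    i = rng.randrange(len(parent))
    return [parent[:i] + (1 - parent[i],) + parent[i + 1:]]


def run_elitist(f: Callable[[Point], float], n: int, optimum_value: float,
                variation: Callable[[List[Point], random.Random], List[Point]],
                mu: int = 1, max_queries: int = 100_000,
                seed: int = 0) -> Optional[int]:
    """Harness: number of queries until an optimum is in the population."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(mu)]
    queries = mu
    while True:
        # Harness-side check; an optimum, once sampled, is never discarded.
        if any(f(x) == optimum_value for x in population):
            return queries
        if queries >= max_queries:
            return None
        offspring = variation(population, rng)  # created before any evaluation
        queries += len(offspring)
        population = elitist_step(population, offspring, f)
```

For instance, `run_elitist(sum, 20, 20, rls_variation)` runs this RLS-style hill climber on the plain OneMax function (number of ones) for n = 20. Because selection only ever compares ranks, replacing `f` by any strictly increasing transformation of it leaves the run unchanged.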
Several extensions and variants of the model are possible, including in particular one in which the first μ search points cannot be sampled adaptively, one in which the selection has to be unbiased among search points of the same rank, one in which only offspring can be selected (comma strategies), or a nonranking-based version in which absolute instead of relative fitness information is provided. Note that the latter would allow for fitness-dependent mutation rates, which are excluded by the variant analyzed here. The lower bounds presented in Sections 3 and 4 actually hold for this nonranking-based model (and are thus even stronger than bounds that apply only to the model described in Algorithm 1). The model can certainly also be extended to an unbiased elitist one, in which the distribution in line 7 of Algorithm 1 has to be unbiased in the sense of Lehre and Witt (2012). See Section 4 for results on the unbiased elitist model.
Note that elitist black-box algorithms covered by Algorithm 1 are memory-restricted in the sense of Doerr and Winzen (2014a) and Droste et al. (2006); that is, they cannot store any other information than the current population and its ranking. All information about previous search points (e.g., their number) has to be discarded. The (1 + 1) version of the elitist model is comparison-based (i.e., a query reveals only if an offspring has worse, equal, or better fitness than its parent), while the (μ + λ) versions are ranking-based in the sense of Doerr and Winzen (2014b). This means that the algorithm has no information about absolute fitness values, but it knows how the fitness values of the search points compare to each other. To stress the difference between the latter two models, we remark the following: for μ > 1 or λ > 1 the ranking-based black-box model provides more information than the comparison-based one, as it gives a full ranking of all current search points, while in the comparison-based model we always have to select two search points which are compared against each other. The ranking-based black-box complexity can thus be smaller by a logarithmic factor than the comparison-based complexity.
2.1 Monte Carlo vs. Las Vegas Black-Box Complexities
As discussed in Section 1.2, usually the black-box complexity of a function class is defined in a Las Vegas manner (measuring the expected number of function evaluations), while in the case of elitist black-box complexity we also introduce a p-Monte Carlo black-box complexity, where we allow some failure probability p (see below for formal definitions). If we make a statement about the Monte Carlo complexity without specifying p, then we mean that for every constant p > 0 the statement holds for the p-Monte Carlo complexity. However, we sometimes also regard p-Monte Carlo complexities for nonconstant p = o(1), thus yielding high probability statements (we say that an event happens with high probability if its probability tends to 1 for n → ∞).
For most black-box complexities, the Las Vegas and the Monte Carlo notions are closely related: every Las Vegas algorithm is also (up to a factor of 1/p in the runtime) a p-Monte Carlo algorithm by Markov’s inequality, and a Monte Carlo algorithm can be turned into a Las Vegas algorithm by restarting the algorithm until the optimum is found. In particular, if restarts are allowed then Las Vegas and Monte Carlo complexities differ by at most a constant factor. This has been made explicit in Doerr et al. (2014b, Remark 2) and is heavily used there as well as in a number of other results on black-box complexity. It is not difficult to see that such a reasoning fails for elitist black-box algorithms, as they are not allowed to do arbitrary restarts: if the sampled solution intended for a restart is not as good as the ones currently in the memory, it has to be discarded (Line 9 of Algorithm 1). Las Vegas and Monte Carlo elitist black-box complexities may therefore differ significantly from each other; see Section 3.1 for an example with an exponentially large gap.
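The two directions of this relation can be written out in a few lines. The symbols below are chosen here for illustration and are not taken from the paper: T denotes the (random) runtime of a Las Vegas algorithm, p the allowed failure probability, and t_p the query budget of a single run of a p-Monte Carlo algorithm that is restarted independently after each failed run.

```latex
% Las Vegas -> Monte Carlo: by Markov's inequality, stopping after
% E[T]/p queries fails with probability at most p.
\[
  \Pr\!\left[\, T \ge \frac{\mathbb{E}[T]}{p} \,\right] \le p .
\]
% Monte Carlo -> Las Vegas: with independent restarts, more than k runs
% are needed with probability at most p^k, so
\[
  \mathbb{E}[\text{total queries}]
  \;\le\; t_p \sum_{k \ge 0} p^{\,k}
  \;=\; \frac{t_p}{1-p} .
\]
```

For constant p both conversions cost only a constant factor. The second step is exactly the one that is unavailable to elitist algorithms, since a restart point of lower fitness cannot replace the current population.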
We come to the formal definition. Let F be a class of pseudo-Boolean functions, and let p ∈ [0, 1). The Las Vegas complexity of an algorithm A for F is the maximum expected number of function evaluations of A before A evaluates an optimal search point for the first time, where the maximum is taken over all f ∈ F. The Las Vegas complexity of F with respect to a class 𝒜 of algorithms is the minimum (“best”) Las Vegas complexity among all A ∈ 𝒜 for F. The p-Monte Carlo complexity of F with respect to 𝒜 is the minimum number T such that there is an algorithm in 𝒜 which has for all f ∈ F a probability of at least 1 − p to find an optimum within the first T function evaluations. The elitist Las Vegas (elitist p-Monte Carlo) black-box complexity of F is the Las Vegas (p-Monte Carlo) complexity of F with respect to the class of all elitist black-box algorithms.
To ease terminology, we will say that an algorithm A spends time T on a function f if it uses at most T function evaluations on f. Moreover, we call the runtime of an algorithm A on a function f the random variable describing the number of function evaluations of A until it evaluates for the first time an optimal search point of f. In this way, the Las Vegas complexity of 𝒜 on F is the worst-case (over all f ∈ F) expected runtime of the best algorithm A ∈ 𝒜.
If we are interested in the asymptotic p-Monte Carlo complexity of an algorithm A on a function class F, then we will frequently make use of the following observation, which follows from Markov’s inequality and the law of total expectation.
Let . Assume that there is an event of probability such that conditioned on the algorithm finds the optimum after expected time at most . Then the -Monte Carlo complexity of on is at most . In particular, if then the -Monte Carlo complexity is .
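The symbols of this observation are lost in the present text, so the following derivation is only one consistent reading, with all names chosen here as assumptions: E is the exceptional event, p_E < p its probability, T_0 the bound on the expected runtime conditioned on the complement of E, and T the runtime of the algorithm.

```latex
% Split on the event E and apply Markov's inequality on the complement:
\[
  \Pr[T > t]
  \;\le\; \Pr[E] + \Pr[\,T > t \mid \overline{E}\,]
  \;\le\; p_E + \frac{T_0}{t} .
\]
% Choosing the budget t = T_0 / (p - p_E) bounds the failure probability by p:
\[
  t = \frac{T_0}{p - p_E}
  \quad\Longrightarrow\quad
  \Pr[T > t] \;\le\; p_E + (p - p_E) = p .
\]
```

Under this reading the p-Monte Carlo runtime is at most T_0 / (p − p_E), which is O(T_0) whenever p − p_E is bounded below by a positive constant.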
2.2 (Non-)Applicability of Yao’s Principle
A convenient tool in black-box complexity theory is Yao’s Principle. In simple words, Yao’s Principle allows us to restrict our attention to bounding the expected runtime of a best-possible deterministic algorithm on a random input instead of regarding the best-possible performance of a randomized algorithm on an arbitrary input. Analyzing the former is often considerably easier than directly bounding the performance of any possible randomized algorithm. Yao’s Principle states that the former is a lower bound for the expected performance of a best-possible randomized algorithm for the regarded problem. In most applications a very easy distribution on the input can be chosen, often the uniform one. Formally, Yao’s Principle is the following.
Why does this example not contradict Yao’s Principle? Reading Lemma 2 carefully, we see that it makes a statement only about such randomized algorithms that are a convex combination of deterministic ones. In other words, the randomized algorithms (on a fixed input size) are given by making one random choice at the beginning, determining which of the finitely many deterministic algorithms we apply. For typical classes of algorithms every randomized algorithm is such a convex combination of deterministic algorithms (and randomized algorithms are, in fact, often defined this way). In this case Yao’s Principle can be summarized in the way we described before Lemma 2; that is, as a statement that links the worst-case expected runtime of randomized algorithms with the best expected runtime of deterministic algorithms on random input. The previous paragraph, however, explains that in the elitist black-box model there are randomized algorithms which cannot be expressed as a convex combination of deterministic ones. For this reason, we can never apply Yao’s Principle directly to the class of elitist black-box algorithms. Similar considerations hold for other classes of memory-restricted black-box algorithms, but have not been mentioned explicitly in the literature. We are not aware of any other class of algorithms where such an anomaly occurs and find the putative nonapplicability of Yao’s Principle quite noteworthy.
Due to the problems outlined above, we will often consider in our lower-bound proofs a superset of algorithms which contains all elitist ones and which has the property that every randomized algorithm in this superset can be expressed as a convex combination of (more precisely, a probability distribution over) deterministic ones. A lower bound shown for this broader class trivially applies to all elitist black-box algorithms. In particular, in a class of black-box algorithms in which every algorithm knows the number of previous queries, we may apply Yao’s Principle since in such models every randomized strategy is a convex combination of deterministic strategies. The reason for that is a well-known (and completely formal) argument, which we summarize below.
The main idea is that for every (fixed) possible outcome of the random decisions of the randomized algorithm A there exists a deterministic algorithm that behaves exactly the same way. For the sake of exposition, we will assume first that A flips only a single coin for each query. Then we claim that for every t, the behavior of A during the first t queries can be obtained as a convex combination of deterministic algorithms; that is, we may randomly choose a deterministic algorithm which behaves exactly like A for the first t queries. In fact, it is very easy to choose such an algorithm: we just do t coin flips in advance, and let A use the i-th coin flip for the i-th query (using that A knows the number of previous queries). Note that after fixing the t coin flips, A becomes a deterministic algorithm for the first t queries. Thus we may regard the coin flips as a way to choose a deterministic instantiation of A. The same argument carries over if we do not limit ourselves to t queries (we use an infinite number of coin flips), and if A flips more than one bit per query (we flip not just one coin for the i-th query, but rather an infinite sequence of coins).
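The coin-flipping argument can be illustrated with a small sketch. All names here (`make_deterministic`, `sample_deterministic_instance`) are hypothetical, and the toy strategy (flip the bit selected by the current coin) is an assumption made purely for illustration. The point is that the strategy needs the number of previous queries to index the pre-drawn tape; an algorithm without that counter, such as an elitist one, cannot do this lookup.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[int, ...]
Strategy = Callable[[int, Point], Point]


def make_deterministic(coin_tape: List[int]) -> Strategy:
    """Build a deterministic strategy that reads coin number t for query t."""
    def next_query(num_previous_queries: int, current: Point) -> Point:
        # Indexing the tape requires knowing how many queries were made.
        coin = coin_tape[num_previous_queries]
        i = coin % len(current)
        return current[:i] + (1 - current[i],) + current[i + 1:]
    return next_query


def sample_deterministic_instance(t: int, n: int, seed: int) -> Strategy:
    """Do all t random choices in advance, then return the fixed strategy."""
    rng = random.Random(seed)
    tape = [rng.randrange(n) for _ in range(t)]
    return make_deterministic(tape)
```

Drawing the seed at random and then running the returned strategy reproduces a randomized algorithm as a probability distribution over deterministic ones, for the first t queries.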
3 Exponential Gaps to Previous Models
We provide some function classes for which the elitist black-box complexity is exponentially larger than their black-box complexities in any of the previously regarded models. In particular, the black-box complexity will still be small in a model in which all algorithms have to be unbiased, memory restricted with size bound one, and purely comparison based. This shows that our model strengthens the existing landscape of black-box models considerably. The example will also show that the Las Vegas complexity of a problem can be exponentially larger than its Monte Carlo complexity.
3.1 Twin Peaks
We first describe a type of landscape for which the elitist black-box complexity is exponentially large. The following theorem captures the intuition that elitist algorithms are very bad if there are several local optima that the algorithm needs to explore in order to determine the best one of them. This remains true if we grant the algorithm access to the absolute (instead of the relative) fitness values, as we will show in Remark 4.
Let . Let be a class of functions from to such that for every set with ,
there is a function such that is the unique global optimum, and is the unique second-best search point of ;
also contains the function that is obtained from by switching the fitness of and . More formally, is defined by , , and for .
Then the (1 + 1) elitist Las Vegas black-box complexity and the -Monte Carlo black-box complexity of is .
To give an intuition, we first give an outline of the proof that is not quite correct. Assume that a black-box algorithm encounters either or . By definition of , it does not know the global optimum before querying either or . It thus needs to query either or first. Assume that it queries first. Then if the algorithm is unlucky (if is not the global optimum; that is, the algorithm optimizes ), the algorithm is stuck in a local optimum which it cannot leave except by sampling the optimum . Due to the memory restriction the algorithm has lost any information about the objective function except possibly that is one of the two best search points. But since for all , the algorithm would then have lost any information about , and would still have test possible optima.
Unfortunately, this intuitive argument fails: after querying the algorithm does have some information about , despite the severely restricted memory. For illustration, consider the following toy case. Let , so the search space consists of only four search points . To keep the size of as small as possible, and still to satisfy the second condition of the theorem, we assume that for all , and that contains no further functions. So contains exactly 12 functions, one for each ordered pair of search points. Moreover, assume that every function in has value set so that the fitness tells us whether we are in the best or second-best search point. Finally assume that has the property , and has the property . Consider the following algorithm . The first query of is always . If is at , then it queries , and with probability and , respectively. If is at or , then the next query is random. We want to understand what the algorithm “knows” when it is at , and is a second-best point. In this case, the only functions to be considered are , and . The algorithm knows that cannot be the optimum since it always queries first. The functions and were a priori equally likely to be chosen. However, for the function the algorithm is very unlikely to visit , since from to algorithm goes directly to the optimum with probability . On the other hand, for the algorithm always visits , since is rejected in this case. Therefore, by Bayes’ theorem the a posteriori probability that the optimum is at is . Hence, although there are three options for the optimum, the option is impossible, and the option is unlikely. (We remark that is not a good algorithm because it may not always terminate; however, this can easily be fixed without affecting the argument.)
The example above shows that it is not true in general that has lost all information about the search space when it enters the second-best search point. Rather it can still draw information from the order in which it typically queries search points. However, informally this information is limited to one bit, namely the (possibly probabilistic) question whether for the function it first visits or first. Therefore, the algorithm cannot gain much, and the intuitive argument outlined at the beginning still works approximately, as we will show below.
To turn the above intuition into a formal proof, we employ Yao’s Principle (Lemma 2). As described in Section 2.2, we need to consider a larger class of algorithms defined as follows. Assume the algorithm has to optimize or . We call the time until the algorithm queries for the first time or the “first phase,” while we call the remaining time the “second phase.” Since we are not particularly interested in the time that the algorithm spends in the first phase, we simply give away to the algorithm the set . That is, we give the algorithm complete information about the functions and , but do not tell the algorithm which one of the two is the one to be optimized. Since the two functions coincide everywhere except for and , during the first phase the algorithm knows everything about the objective function except which of the two points or is the optimum. We also give the algorithm access to unlimited memory throughout this phase. During this phase every randomized algorithm is a convex combination of deterministic ones. So we may use Yao’s Principle, choose a probability distribution on , and restrict ourselves to an algorithm that is deterministic in the first phase. For the probability distribution on , we choose a set of two -bit strings uniformly at random, and then we pick either or , each with probability .
Note that in the first phase the algorithm does not gain any additional information by querying any search point since it can predict the fitness value of without actually querying it. We may thus assume that the first query of is either or . Let be the set of all sets , where and . In the first phase the algorithm essentially assigns to each set either or . Let us denote the corresponding function by . With probability , is the global optimum, and with probability it is not.
With probability the algorithm enters the second phase, in which we no longer allow it to access anything but the current search point and possibly its fitness. For the sake of exposition, we first consider the case that the algorithm may not access the fitness, and describe afterwards how to change the argument otherwise. The algorithm can be randomized in this second phase. Recall that the instance is taken uniformly at random, and that samples whenever . Therefore, conditioned on seeing , the global optimum is uniformly distributed in . Since does not have any additional memory in this phase, every subsequent query has probability at most to be the optimum, independent of any previous queries. Hence, needs in expectation at least additional queries to find , and the probability to find the optimum with additional queries is at most by the union bound.
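The counting step for the memoryless second phase can be made explicit. The symbols are assumptions introduced here, since the originals are lost in this text: N denotes the number of equally likely candidate positions for the optimum, and T the number of additional queries until the optimum is hit. The only property used is that each query succeeds with probability at most 1/N, regardless of the earlier queries.

```latex
% Union bound over the first t additional queries:
\[
  \Pr[T \le t] \;\le\; \sum_{i=1}^{t} \frac{1}{N} \;=\; \frac{t}{N} .
\]
% Each query fails with probability at least 1 - 1/N given all earlier failures, so
\[
  \mathbb{E}[T] \;=\; \sum_{t \ge 0} \Pr[T > t]
  \;\ge\; \sum_{t \ge 0} \left(1 - \frac{1}{N}\right)^{t} \;=\; N .
\]
```

If N is exponentially large in n, both the expected number of additional queries and the budget needed for a constant success probability are exponential.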
It remains to show that is large with high probability. Let . Since the sets form a partition of , and since there are such sets, the average size of the is . Let . Then . Since the random instance is chosen uniformly at random from , the set is also distributed uniformly at random, and with probability at least an instance from is chosen, and thus . Thus for every , conditioned on entering the second phase we have with probability at least . Choosing shows that with probability at least the algorithm needs at least steps. This concludes the proof.
Theorem 3 essentially also holds if we allow the algorithms to access absolute fitness values. More precisely, let be a class of functions as in Theorem 3, and let be the set of all second-best fitness values. If has subexponential size, then the (1 + 1) elitist Las Vegas black-box complexity and the -Monte Carlo black-box complexity of remain exponential even if the algorithms have access to the absolute fitness values.
The same proof as for Theorem 3 still works, only that for every we let . This partitions into subsets, and since , on average these sets are still exponentially large. The theorem now follows in the same way as before, with the sets replaced by .
The Double OneMax Problem
Theorem 3 provides us with landscapes that are very hard for elitist algorithms. We now give a more concrete example, the class of double OneMax functions. This class is of the type described in Theorem 3, but at the same time it is easy for a very simple nonelitist algorithm, namely a variant of RLS using restarts (cf. Algorithm 3). The basis for double OneMax functions is OneMax, one of the best-studied example functions in the theory of evolutionary computation. The original OneMax function simply counts the number of ones in a bit string. Maximizing OneMax thus corresponds to finding the all-ones string.
Search heuristics are typically invariant with respect to problem encoding, and as such they have the same expected runtime for any function from the generalized OneMax function class , where is defined by (2). We call , the unique global optimum of function , the target string of .
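The two functions discussed above can be sketched in a few lines of Python. Since definition (2) is not reproduced in this text, the sketch assumes the standard form of generalized OneMax, in which the fitness of a bit string is the number of positions where it agrees with the target string; the function names are illustrative only.

```python
from typing import Sequence


def onemax(x: Sequence[int]) -> int:
    """Original OneMax: count the ones in the bit string x."""
    return sum(x)


def generalized_onemax(z: Sequence[int], x: Sequence[int]) -> int:
    """Assumed form of generalized OneMax with target string z:
    the number of positions in which x and z agree."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    return sum(1 for xi, zi in zip(x, z) if xi == zi)
```

Under this assumption, `generalized_onemax` with the all-ones target coincides with `onemax`, and the target string itself is the unique maximizer with value equal to the string length.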
OneMax is one of the best-understood problems in the theory of evolutionary computation, and serves as a showcase also in many publications on black-box complexity. Most notably, it is known that the unrestricted black-box complexity of OneMax is (Anil and Wiegand, 2009; Droste et al., 2006; Erdös and Rényi, 1963), and that this bound holds also in the ranking-based (Doerr and Winzen, 2014b) and the (1 + 1) memory-restricted (Doerr and Winzen, 2014a) black-box models. The unary unbiased black-box complexity (cf. Section 4 for a brief explanation of this model) of OneMax is (Lehre and Witt, 2012), its binary unbiased black-box complexity is linear (Doerr, Johannsen, et al., 2011), and its -ary unbiased black-box complexity is (Doerr and Winzen, 2014c). Finally, the (1 + 1) elitist Monte Carlo black-box complexity of OneMax is (Doerr and Lengler, 2015b).
A very simple heuristic optimizing OneMax in steps is Randomized Local Search (RLS). Since a variant of RLS will be used in our subsequent proofs, we give its pseudocode in Algorithm 2. RLS is initialized with a uniform sample . In each iteration one bit position is chosen uniformly at random. The -th bit of is flipped and the fitness of the resulting search point is evaluated. The better of the two search points and is kept for future iterations (favoring the newly created individual in case of ties). As is easily verified, RLS is a unary unbiased (1 + 1) elitist black-box algorithm, where unbiased refers to the notion of unbiasedness defined in Lehre and Witt (2012, Section 3.2).
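The verbal description of RLS above translates directly into code. The following is a minimal sketch rather than a transcription of Algorithm 2; the query budget, the optional `target` stopping value, and the `rng` argument are conveniences added for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple


def rls(
    fitness: Callable[[List[int]], int],
    n: int,
    budget: int,
    target: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], int]:
    """Randomized Local Search on bit strings of length n.

    Returns the current search point and the number of fitness queries used.
    Stops when the budget is exhausted or the fitness equals `target`.
    """
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]  # uniform initial sample
    fx = fitness(x)
    queries = 1
    while queries < budget and fx != target:
        i = rng.randrange(n)  # one position chosen uniformly at random
        y = x.copy()
        y[i] ^= 1  # flip the i-th bit
        fy = fitness(y)
        queries += 1
        if fy >= fx:  # keep the better point, favoring the offspring on ties
            x, fx = y, fy
    return x, queries
```

For example, `rls(sum, 20, 10_000, target=20)` runs RLS on OneMax with 20 bits. Only the comparison `fy >= fx` is used for selection, which is what makes the algorithm comparison-based and elitist.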
Let . The (1 + 1) elitist -Monte Carlo black-box complexity of and its unary unbiased, (1 + 1)-memory restricted, comparison-based black-box complexity is , while the (1 + 1) elitist Las Vegas black-box complexity of and its -Monte Carlo black-box complexity are even if we allow the algorithms to access absolute fitness values.
The class satisfies the conditions from Theorem 3, so the lower bound for the (1 + 1) elitist Las Vegas black-box complexity of follows immediately from Theorem 3 and Remark 4. For the upper bound, consider the random local search algorithm (RLS) with random restarts as given by Algorithm 3. This algorithm is initialized like RLS. The only difference to RLS (Algorithm 2) is that during the optimization process, instead of mutating the current best search point, it may restart completely by drawing a point uniformly at random from and replacing the current best solution by regardless of their fitness values. We show that this algorithm has expected optimization time .
Whenever then the one-bit flip has probability at least to increase the fitness of (this can be proven by an easy case distinction with respect to whether or not ). This is at least as large as the progress probability for OneMax. Since it is well-known that with high probability RLS finds the optimum of OneMax within, say, steps (Auger and Doerr, 2011, Theorem 1.23), with high probability it finds either or in this time. Therefore, if no restarts happen in steps (which is true with constant probability), then with high probability Algorithm 3 also finds either or in this time. Note that the search space is 2-vertex transitive; that is, there is an automorphism of the search space that maps to and vice versa. By definition of , the same automorphism maps to . Hence, since RLS with restarts is an unbiased algorithm, it will reach before with probability , and vice versa. Thus, when the algorithm queries either or , then it finds the global optimum with probability . Summarizing, after each restart, the algorithm has at least a constant probability to find the global optimum in the next steps. This proves both upper bounds in Theorem 5.
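The restart variant used in this argument can be sketched as follows. Algorithm 3 is not reproduced in this text, so the restart probability `restart_prob` is a hypothetical parameter, and `two_peaks` is only an illustrative stand-in for a function with one global and one local optimum (it is not claimed to be the exact definition of the double OneMax class).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def rls_with_restarts(
    fitness: Callable[[List[int]], int],
    n: int,
    budget: int,
    restart_prob: float,  # hypothetical parameter, not taken from the paper
    target: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], int]:
    """RLS that, in each step, may restart from a uniform random point."""
    rng = rng or random.Random()
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    queries = 1
    while queries < budget and fx != target:
        restart = rng.random() < restart_prob
        if restart:
            y = [rng.randint(0, 1) for _ in range(n)]  # fresh uniform sample
        else:
            y = x.copy()
            y[rng.randrange(n)] ^= 1  # ordinary one-bit flip
        fy = fitness(y)
        queries += 1
        # A restart replaces x regardless of fitness (the nonelitist step).
        if restart or fy >= fx:
            x, fx = y, fy
    return x, queries


def two_peaks(z: Sequence[int], z_prime: Sequence[int]) -> Callable[[List[int]], int]:
    """Illustrative stand-in: global optimum z (value n + 1), local optimum z_prime."""
    n = len(z)

    def f(x: List[int]) -> int:
        if list(x) == list(z):
            return n + 1
        agree_z = sum(a == b for a, b in zip(x, z))
        agree_zp = sum(a == b for a, b in zip(x, z_prime))
        return max(agree_z, agree_zp)

    return f
```

Without restarts, a run that reaches the local optimum first can never leave it, because every neighbor has strictly smaller fitness. The restart step is the only place where a worse point may replace the current one.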
3.2 Hidden Paths
We provide another example with an exponential gap between elitist and nonelitist black-box complexities, which gives some more insight into the disadvantage of elitist algorithms. We use essentially the OneMax function, patched with a path of low fitness that leads to the global optimum. In this example, every elitist algorithm fails with high probability to find the optimum in polynomial time, since it is blind to all search points of small fitness value. Both the Monte Carlo and the Las Vegas elitist black-box complexity of the problem are exponential in , so that (unlike the example from Section 3.1) the problem cannot be easily mended by allowing restarts. On the other hand, there are memory-restricted, unary unbiased (but not elitist) algorithms that solve the problem efficiently.
For , let be the bitwise complement of ; that is, for all . Let further To each and each , we associate a path of length as follows. For , let be the search point obtained from by flipping the -th bit. Note that differs from in exactly bits.
The unary unbiased (1 + 1) memory-restricted black-box complexity of is , while its (1 + 1) Monte Carlo (and thus, also Las Vegas) elitist black-box complexity is , also for the nonranking-based version of the elitist model in which full (absolute) fitness information is revealed to the algorithm.
For the upper bound, we need to describe a memory-restricted unary unbiased black-box algorithm that optimizes in quadratic time. The algorithm proceeds as follows. While its current search point has fitness at least , it finds the local optimum using Randomized Local Search (RLS). This takes expected time . From it jumps to the starting point of the path . The algorithm now follows the path by using again RLS but accepting an offspring if and only if it increases the parent’s fitness by exactly 1 or if the offspring’s fitness is . In particular, in this phase the algorithm rejects any search point with fitness between and . Since this algorithm needs time to advance one step on the path, and the path has length , it has expected runtime .
For the lower bound we again extend the class of elitist black-box algorithms to a larger class that allows us to apply Yao’s Principle. After an algorithm in has sampled its first search point, we distinguish two cases. If the search point has fitness at most , then the algorithm may access the position of the global optimum (and thus, terminate in one more step). If the first search point has fitness larger than , then the algorithm may access the position of the local optimum . Moreover, it may access a counter that tells it how many steps it has performed so far. Apart from that, it may only access (one of) the best search point(s) it has found so far, and its fitness. Then is the set of all algorithms that can be implemented with this additional information. In this way, every randomized algorithm in is a convex combination of deterministic ones, so that we can apply Yao’s Principle. So let be a deterministic algorithm, and consider the uniform distribution on .
If the first search point has fitness or at most , then is done after one query or it can terminate in at most one additional step, respectively. However, by the Chernoff bound these two events happen only with probability , so from now on we assume that the first search point has fitness larger than . Observe that by the accessible information the algorithm can determine the OneMax value for all . In particular, for every search point of larger fitness except for the algorithm can predict the fitness value without querying it. On the other hand, if it queries a search point of lower fitness, then it is not allowed to keep its fitness value, and after the query it is in the same state as before. Either way, the algorithm can predict in which state it will be if the query does not hit the optimum, so cannot obtain additional information about except by querying the optimum . Since was chosen uniformly at random, all search points in distance from have the same probability to be the global optimum. Hence, needs in expectation at least queries to find the optimum.
A similar statement as the one in Theorem 6 holds also for ranking-based algorithms if we slightly increase the memory of the algorithms regarded. Indeed, there exists a unary unbiased (2+1) memory-restricted ranking-based algorithm optimizing in expected function evaluations. Regard, e.g., the algorithm that maintains throughout the second phase a search point of fitness and that accepts an offspring of if and only if the fitness of is larger than that of but smaller than that of (in which case ). Then is sampled (but not accepted into the population, see Remark 8) after steps.
On the other hand, the (2+1) elitist black-box complexity is still exponential, since with probability at least (in fact, at least ) the OneMax values of the first two search points are at least .
As indicated in Remark 7, it can make a crucial difference for (nonelitist) black-box algorithms whether we only require them to sample an optimum or whether we also require them to accept it into the population. For example, the algorithm described in Remark 7 does not accept the optimum when finding it.
4 Combining Unbiased and Elitist Black-Box Models
In this section we demonstrate that apart from providing more realistic lower bounds for some function classes, the elitist black-box model is also an interesting counterpart to existing black-box models. Indeed, we show that some of the unrealistically low black-box complexities of the unbiased black-box model proposed in Lehre and Witt (2012) disappear when elitist selection is required.
More specifically, we regard the unary unbiased (1 + 1) elitist black-box complexity of Jump functions, which we define in the following way (this definition is in line with Doerr et al. (2014a), but deviates from Droste et al. (2002); most notably, in our definition the and not the fitness values before are blanked out; see Jansen (2015) for an alternative way to generalize the Jump function class). For a parameter the function assigns to each bit string the function value if and otherwise. Despite the fact that all common mutation-based search heuristics need fitness evaluations to optimize this function, the unary unbiased black-box complexities of these functions are surprisingly low; see Table 1 for a summary of results presented in Doerr et al. (2014a) and Doerr, Kötzing, et al. (2011) for . Interestingly, even for extreme jump functions in which only the fitness value is visible and all other OneMax values are replaced by zero, polynomial-time unary unbiased black-box algorithms exist. It is thus interesting to see that the situation changes dramatically when the algorithms are required to be elitist, as the following theorem shows.
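For concreteness, a jump function of this kind can be sketched as below. The formula itself is not legible in this text, so the sketch is an assumption based on the verbal description: the OneMax value is shown only strictly between `ell` and `n - ell`, the optimum keeps the value `n`, and all other points get fitness 0. The parameter name `ell` is illustrative.

```python
from typing import Sequence


def jump(x: Sequence[int], ell: int) -> int:
    """Assumed symmetric jump function with jump size ell (all-ones target).

    Visible OneMax values lie strictly between ell and n - ell; the optimum
    has value n; every other point is blanked out to 0.
    """
    n = len(x)
    om = sum(x)  # OneMax value of x
    if om == n:
        return n
    if ell < om < n - ell:
        return om
    return 0
```

With `n = 10` and `ell = 4`, only the OneMax value 5 remains visible, which matches the description of extreme jump functions in which a single fitness level besides the optimum is nonzero.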
|Model .||range of .||unary unbiased .||elitist unary unbiased .|
For the (Las Vegas and Monte Carlo) unary unbiased (1 + 1) elitist black-box complexity of the jump function is . For all it is . In particular, for the black-box complexity is superpolynomial in and for it is .
For any constant the upper bound is achieved by the simple EA (Droste et al., 2002). For general , consider the algorithm that produces an offspring as follows. With probability the offspring is a search point uniformly at random from , with probability the algorithm flips exactly one bit (uniformly at random), and with probability it flips exactly bits (also uniformly at random). The offspring is accepted if its fitness is at least the fitness of the current search point. We claim that this algorithm finds a point of positive fitness in expected time . Indeed, it produces random search points with probability , and each such uniform sample has OneMax value with probability by Stirling’s formula. In summary, in each step the algorithm queries a search point of fitness with probability , so the expected time until a search point of positive fitness is queried is at most , as claimed. Once such a search point is found, by a coupon collector argument with high probability it increases the fitness to in at most steps by one-bit flips (and possibly -bit flips). Afterwards, since there are search points in distance , the algorithm needs in expectation at most steps to find the optimum. This proves the upper bound for the Las Vegas complexity, which in turn implies the upper bound for the Monte Carlo complexity.
Since is elitist, it can never accept a point in . Therefore, in every subsequent step before finding the optimum, it will be in some search point with distance from the optimum. If an unbiased mutation has some probability to produce from , then every other search point in distance has also probability to be the offspring. In particular, since there are such points, equals the probability that the offspring has distance of , which is at most 1. Hence, at any point the probability to sample the optimum in the next step is at most . Therefore, needs in expectation at least steps to find the optimum. Moreover, by the union bound the probability that needs at most steps is at most .
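The counting step in this argument can be checked numerically. The sketch below assumes the usual description of a unary unbiased operator on bit strings: draw a radius `r` from some distribution, then flip `r` positions chosen uniformly at random. The function names and the dictionary representation of the radius distribution are illustrative.

```python
from math import comb
from typing import Dict


def hit_probability(radius_dist: Dict[int, float], n: int, d: int) -> float:
    """Probability that the offspring equals one fixed point at Hamming
    distance d from the parent, for an operator that flips r uniformly
    chosen bits with probability radius_dist[r]."""
    # All comb(n, d) points at distance d are equally likely, so the mass
    # placed on radius d is shared evenly among them.
    return radius_dist.get(d, 0.0) / comb(n, d)


def hit_probability_bound(n: int, d: int) -> float:
    """Upper bound 1 / C(n, d) that holds for every unary unbiased operator."""
    return 1.0 / comb(n, d)
```

For instance, with `n = 10` and `d = 3` the bound is 1/120, and it is attained only by the operator that always flips exactly three bits. Any operator that spreads its mass over several radii stays strictly below it.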
We give the proof only in the case that is a power of 2, which is less technical. Consider a unary unbiased (1 + 1) elitist black-box algorithm . Let be the event that the first search point of strictly positive fitness that queries is the optimum. We claim that . Before we prove the claim, we discuss how it implies the theorem. Conditioned on , Theorem 10 tells us that needs at least additional steps in expectation, and at most with probability at most . Thus, the probability to find the optimum in at most steps is at most . Rephrasing this statement, for every (constant) the algorithm needs more than steps with probability at least . This proves the lower bound on the Monte Carlo complexity, which in turn implies the lower bound on the Las Vegas complexity by Markov’s inequality.
So it remains to show that . In fact, we will show that this is true for every (unrestricted) black-box algorithm . Note that such unrestricted algorithms are in particular not memory restricted, so by Yao’s Principle (Lemma 2) it suffices to prove the lower bound for all deterministic black-box algorithms on random input. So let be such a deterministic algorithm. We regard a uniformly chosen function; that is, the target string of the OneMax function underlying the function is chosen from uniformly at random. For ease of terminology, we say that wins if the first search point of positive fitness that queries is the optimum, that loses otherwise, and that terminates with the -th query if either wins or loses with the -th query.
Consider the following sequence of search points. Let be the first query of , and for let be the search point that queries in round if the previous search points all had fitness 0. We may assume that the queries are all different from each other, and in this case the sequence forms a permutation of the search space that determines . (In fact, may be ill-defined for large because it can happen that terminates with probability 1 with the first queries. For example, if , then there are only search points of fitness 0, so terminates with probability 1 with the first queries. In this case, for consistency of notation we fill up the sequence in an arbitrary way, with the queries being irrelevant for the question whether wins or loses.) With this notation, the event can equivalently be phrased as the event that the optimum is not the first search point of positive fitness in the list .
If is a power of 2 then it is well known that one can partition the hypercube into sets of size such that for each the pairwise distance between any two points in is exactly (e.g., the cosets of the Walsh-Hadamard code as described in Section 17.5.1 of Arora and Barak, 2009). Let be the index of the set containing the target string , i.e., . Regardless of the jump size , each search point in has positive fitness. Indeed, for each either we have (in which case the fitness of equals ) or the distance and thus the fitness of to equals . Since is chosen uniformly at random, the probability that is the first search point of set to appear in the sequence equals . On the other hand, if is not the first search point of , then this implies the event . Therefore, , as required.
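The partition used in this proof can be made concrete for small dimensions. The sketch below builds the linear Walsh-Hadamard code of length n = 2^k (n codewords with pairwise Hamming distance n/2, a standard property of this code) and collects its cosets by brute force; it is an illustration for tiny k only, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Bits = Tuple[int, ...]


def hadamard_code(k: int) -> List[Bits]:
    """All 2^k codewords of length n = 2^k; the codeword of a lists the
    inner products <a, y> mod 2 over all y in {0,1}^k."""
    points = list(product((0, 1), repeat=k))
    return [
        tuple(sum(ai * yi for ai, yi in zip(a, y)) % 2 for y in points)
        for a in points
    ]


def hamming(u: Bits, v: Bits) -> int:
    """Hamming distance between two bit strings of equal length."""
    return sum(ui != vi for ui, vi in zip(u, v))


def partition_hypercube(k: int) -> List[List[Bits]]:
    """Partition {0,1}^n, n = 2^k, into cosets of the Hadamard code."""
    code = hadamard_code(k)
    n = 2 ** k
    seen = set()
    blocks = []
    for x in product((0, 1), repeat=n):  # brute force: feasible only for small k
        if x in seen:
            continue
        # Shifting every codeword by x preserves all pairwise distances.
        block = [tuple(c ^ s for c, s in zip(word, x)) for word in code]
        seen.update(block)
        blocks.append(block)
    return blocks
```

For `k = 3` this yields 32 blocks of 8 points each, and any two distinct points of the same block differ in exactly 4 of the 8 positions. Because the code is linear, the cosets are disjoint and cover the whole hypercube.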
We have introduced elitist black-box complexity as a tool to analyze the performance of search heuristics with elitist selection rules. Several examples provide evidence that the elitist black-box complexities can give a much more realistic estimation of the expected runtime of typical search heuristics. We have also seen that some unrealistically low black-box complexities in the unbiased model disappear when elitist selection is enforced.
We have also introduced the concept of Monte Carlo black-box complexities and have brought to the attention of the community the fact that these can be significantly lower than the previously regarded Las Vegas complexities. In addition, it can also be significantly easier to derive bounds for the Monte Carlo black-box complexities (see Doerr and Lengler, 2015b). Both complexity notions correspond to runtime analysis statements often seen in the evolutionary computation literature and should thus co-exist in black-box complexity research.
While we regard only toy problems in this work, it would be interesting to analyze the influence of elitist selection on the performance of algorithms in more challenging optimization problems. Our findings revive the question of for which problems nonelitist selection, such as tournament or so-called fitness-dependent selection, can be beneficial; initial findings can be found in Friedrich et al. (2009) and Oliveto and Zarges (2015). Negative examples are presented in Happ et al. (2008), Neumann et al. (2009), and Oliveto and Witt (2014).
This research benefited from the support of the “FMJH Program Gaspard Monge in optimization and operation research,” and from the support to this program from EDF (Électricité de France).