Abstract

We analyze the unbiased black-box complexities of jump functions with small, medium, and large sizes of the fitness plateau surrounding the optimal solution. Among other results, we show that when the jump size is $(1/2 - \varepsilon)n$, that is, when only a small constant fraction of the fitness values is visible, then the unbiased black-box complexities for arities 3 and higher are of the same order as those for the simple OneMax function. Even for the extreme jump function, in which all but the two fitness values $n/2$ and $n$ are blanked out, polynomial-time mutation-based (i.e., unary unbiased) black-box optimization algorithms exist. This is quite surprising given that for the extreme jump function almost the whole search space (all but a $\Theta(n^{-1/2})$ fraction) is a plateau of constant fitness. To prove these results, we introduce new tools for the analysis of unbiased black-box complexities, for example, selecting the new parent individual not only by comparing the fitnesses of the competing search points but also by taking into account the (empirical) expected fitnesses of their offspring.

1  Introduction

The analysis of black-box complexities in evolutionary computation aims, in several complementary ways, to support the development of superior evolutionary algorithms. Comparing the runtime of currently used randomized search heuristics (RSH) with that of an optimal black-box algorithm allows a fair evaluation of the quality of today’s heuristics. With specific black-box complexity notions, we can understand how algorithm components and parameters such as the population size, the selection rules, or the sampling procedures influence the runtime of RSH. Finally, research in black-box complexity has proved to be a source of inspiration for developing new algorithmic ideas leading to the design of better search heuristics.

In this work, we analyze the unbiased black-box complexities of jump functions, which are considered difficult for evolutionary approaches because of their large plateaus of constant (and low) fitness. Surprisingly, our results show that even extreme jump functions, which reveal only the three fitness values 0, $n/2$, and $n$, can be optimized by a mutation-based unbiased black-box algorithm in polynomial time. We introduce new methods that facilitate our analyses. Perhaps the most interesting one is a routine that creates a number of samples from which it estimates the distance of the current search point to the fitness layer $n/2$. Our algorithm thus benefits significantly from analyzing some nonstandard statistics of the fitness landscape. We believe this idea deserves further investigation, and we hope that it can be used to design new search heuristics.

1.1  Black-Box Complexity

Black-box complexity studies how many function evaluations an optimal black-box algorithm needs in expectation until it queries an optimal solution for the problem at hand for the first time. Randomized search heuristics, like evolutionary algorithms, simulated annealing, and ant colony algorithms, are typical black-box optimizers. They are usually problem-independent, and as such they learn about the problem to be solved only by generating and evaluating search points. The black-box complexity of a problem is thus a lower bound for the number of fitness evaluations needed by any search heuristic to solve it.

Several black-box complexity notions covering different aspects of randomized search heuristics exist, for example, the unrestricted model of Droste et al. (2006), which does not restrict in any way the sampling or selection procedure of the algorithm; the ranking-based model of Teytaud and Gelly (2006) and Doerr and Winzen (2014a), in which the algorithms are required to base their selection only on relative and not on absolute fitness values; the memory-restricted model of Droste et al. (2006) and Doerr and Winzen (2012), in which the algorithm can store only a limited number of search points and their corresponding fitness values; and the unbiased model of Lehre and Witt (2012), which, in addition to allowing the classification of algorithms according to the arity of the variation operators, requires the algorithms to treat the representation of the search points symmetrically. By comparing the respective black-box complexities of a problem, one learns how the runtime of RSH is influenced by certain algorithmic choices, such as the population size, the use of crossover, the selection rules, and so on.

For all existing black-box models, however, the typical optimal black-box algorithm is a highly problem-tailored algorithm that is not necessarily nature-inspired. Still, we can learn from such artificial algorithms about RSH, as shown by Doerr et al. (2015). In that work, a new genetic algorithm is presented that optimizes the OneMax function in runtime , thus showing that the bound for the binary unbiased black-box complexity of OneMax found by Doerr et al. (2011a) is not as unnatural as it might at first seem.

Here we consider unbiased black-box complexities. The unbiased black-box model is one of the standard models for analyzing the influence of the arity on the performance of optimal black-box algorithms. It was originally defined by Lehre and Witt (2012) for bit string representations and was later generalized to domains different from bit strings by Rowe and Vose (2011) and Doerr et al. (2013). For bit string representations, the unbiased model requires the optimizing algorithms to treat the different positions of the bit strings equally, and likewise the two possible bit values (hence the term unbiased). For example, unbiased algorithms are not allowed to explicitly write a 1 or a 0 at a specific position of a bit string to be evaluated; instead, such algorithms can either sample a bit string uniformly at random or generate one from previously evaluated solutions via operators that are unbiased (i.e., treat positions and bit values equally). Section 3 gives a detailed description of the model.

1.2  Jump Functions

In this paper, we are concerned with the optimization of functions mapping bit strings of fixed length (i.e., elements of the hypercube $\{0,1\}^n$) to real numbers; such functions are called pseudo-Boolean. A famous pseudo-Boolean function often considered as a test function for optimization is the OneMax function, mapping any $x \in \{0,1\}^n$ to the number of 1s in x (the Hamming weight of x).

Other popular test functions are the jump functions. For a non-negative integer $\ell$, we define the function $\mathrm{Jump}_\ell$ as derived from OneMax by blanking out any useful information within the strict $\ell$-neighborhood of the optimum (and of the minimum) by giving all these search points a fitness value of 0. In other words, $\mathrm{Jump}_\ell(x) = \mathrm{OneMax}(x)$ if $\ell < \mathrm{OneMax}(x) < n - \ell$ or $\mathrm{OneMax}(x) = n$, and $\mathrm{Jump}_\ell(x) = 0$ otherwise. This definition is similar, but not identical, to the two (mutually differing) definitions used by Droste et al. (2002) and Lehre and Witt (2010); see Section 2.
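For concreteness, here is a minimal Python sketch of OneMax and of the jump function as just defined. Bit strings are represented as Python lists of 0/1 integers, and the names `one_max` and `jump` are our own choices, not taken from the paper.

```python
def one_max(x):
    """OneMax: the number of 1-bits (Hamming weight) of the bit string x."""
    return sum(x)


def jump(x, ell):
    """Jump_ell: OneMax(x) is visible only if ell < OneMax(x) < n - ell, or at the
    optimum OneMax(x) = n; every other search point gets fitness 0."""
    n = len(x)
    om = one_max(x)
    if ell < om < n - ell or om == n:
        return om
    return 0


# With n = 10 and ell = 3, only the OneMax values 4, 5, 6 and 10 remain visible.
print([jump([1] * k + [0] * (10 - k), 3) for k in range(11)])
# [0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 10]
```

Setting `ell = 0` gives back OneMax itself, which matches the view of OneMax as the jump function with parameter 0.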

Jump functions are well-known test functions for randomized search heuristics. Droste et al. (2002) analyzed the optimization time of the (1+1) evolutionary algorithm on jump functions. From their work, it is easy to see that for our definition of jump functions, a runtime of for the (1+1) evolutionary algorithm on follows for all . We are not aware of any natural mutation-based randomized search heuristic with significantly better performance (except for large , where simple random search with its runtime becomes superior). For all , Jansen and Wegener (2002) present a crossover-based algorithm for . With an optimal choice for the parameter involved, which in particular implies a very small crossover rate of , it has an optimization time1 of for constant and an optimization time of for , c a constant.

1.3  Results

We analyze the unbiased black-box complexity of functions for a broad range of jump sizes . We distinguish between short, long, and extreme jump functions for , , and , respectively. Our findings are summarized in Table 1.

Table 1:
Unbiased black-box complexities of $\mathrm{Jump}_\ell$ for different regimes of $\ell$. The lower bound for arity $k = 1$ follows from Lehre and Witt (2012, Theorem 6). All other results are original to the present paper. The binary and ternary upper bounds for long jump functions follow from the corresponding ones for extreme jump functions.
ArityShort Jump Long Jump Extreme Jump
k = 1    Thm. 14  Thm. 20 
k = 2   Thm. 4    Thm. 18 
    Cor. 7  Thm. 16, Lem. 15 

In contrast to the runtime results for classic evolutionary approaches on jump functions, we show that for jump functions with small jump sizes , the k-ary unbiased black-box complexities are of the same order as those of the easy test function OneMax (which can be seen as a jump function with parameter $\ell = 0$). As an intermediate result, we prove (Lemma 3) that a black-box algorithm having access to a jump function with can retrieve (with high probability) the true OneMax value of a search point using only a constant number of queries. This implies that we get the same runtime bounds for short jump functions as are known for OneMax. For this is (Lehre and Witt, 2012); for it is  (Doerr et al., 2011a); and for it is  (Doerr and Winzen, 2014b).

A result like Lemma 3 cannot be expected to hold for larger values of $\ell$. Nevertheless, we show that long jump functions, where $\ell$ can be as large as $(1/2 - \varepsilon)n$, also have unbiased black-box complexities of the same asymptotic order as OneMax for arities $k \ge 3$. For we get a bound of , and for we get , both surprisingly low black-box complexities. Even for the case of extreme jump functions, where $\ell = n/2 - 1$ and n is even (a jump function revealing only the optimum and the fitness $n/2$), we are able to show polynomial unbiased black-box complexities for all arities $k \ge 1$.

Note that already for long jump functions, the fitness plateau that the algorithms have to cross has exponential size. For the extreme jump function, all but a $\Theta(n^{-1/2})$ fraction of the search points form one single fitness plateau. This is why none of the popular randomized search heuristics finds the optimum of long and extreme jump functions in subexponential time.
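The size of the visible part of the extreme jump function can be checked numerically. The sketch below (the helper name is our own) counts the search points with nonzero fitness, namely the layer of Hamming weight n/2 plus the optimum, and compares their fraction with the central-binomial asymptotics $\binom{n}{n/2} 2^{-n} \sim \sqrt{2/(\pi n)}$.

```python
from math import comb, pi, sqrt


def visible_fraction(n):
    """Fraction of {0,1}^n with nonzero fitness under the extreme jump function
    (ell = n/2 - 1, n even): the layer OneMax = n/2 plus the optimum."""
    assert n % 2 == 0, "the extreme jump function needs n even"
    return (comb(n, n // 2) + 1) / 2 ** n


for n in (10, 100, 1000):
    f = visible_fraction(n)
    # sqrt(n) * f approaches sqrt(2/pi) ~ 0.798, i.e., f = Theta(n^(-1/2))
    print(n, f, sqrt(n) * f)
```

Python's exact integers keep `comb(1000, 500)` and `2 ** 1000` precise; only the final division is done in floating point.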

Our results indicate that even without the fitness function revealing useful information for search points close to the optimum, efficient optimization is still possible in the framework of unbiased black-box algorithms (provided there is enough knowledge about the location of the optimum).

The results on short jump functions can be found in Section 4, those on long jump functions in Section 5, and those on extreme jump functions in Section 6. Note that the bound on the binary unbiased black-box complexity of long jump follows from the same bound on extreme jump. The lower bounds partly follow from a more general result of independent interest (Theorem 2), which implies that for all $\ell$ and all k, the k-ary unbiased black-box complexity of OneMax is less than or equal to that of $\mathrm{Jump}_\ell$.

1.4  Methods

In order to show the upper bounds on the black-box complexities, we give efficient algorithms optimizing the different jump functions. For arity , these algorithms are based on iteratively getting closer to the optimum. However, we do not (and in fact cannot) rely on fitness information about these closer search points; the fitness is 0 in almost all cases. Instead, we rely on the empirical expected fitness of offspring. For this we use mutation operators that have a good chance of sampling offspring with nonzero fitness. We show that already a polynomial number of samples suffices to distinguish search points whose fitness differs only minimally. In order to minimize the number of samples required, we choose this number adaptively depending on the estimated number of 1s in the search point to be evaluated; we also allow fairly frequent incorrect decisions, as long as the overall progress toward the optimum is guaranteed.

In one of our proofs we make use of an additive Chernoff bound for negatively correlated variables. This bound is implicit in a paper by Panconesi and Srinivasan (1997) and is of independent interest.

2  Jump Functions

As mentioned, several definitions for jump functions exist. We use here the version that is inspired by the idea of blanking out the full $\ell$-neighborhood of the optimum and of its complement (the latter is needed because otherwise one could optimize the function by simply searching for the unique search point with value zero and complementing it). To be precise, for all $\ell$, $\mathrm{Jump}_\ell$ is the function that assigns to each $x \in \{0,1\}^n$ the fitness
$$\mathrm{Jump}_\ell(x) = \begin{cases} |x|_1 & \text{if } \ell < |x|_1 < n - \ell \text{ or } |x|_1 = n,\\ 0 & \text{otherwise,} \end{cases}$$
where $|x|_1$ denotes the number of 1s in x (also known as the Hamming weight of x).
The jump function analyzed by Droste et al. (2002, def. 24) assigns to x fitness value
formula
Not only can this function be optimized by searching for the complement of the optimum (as described), but it also provides more information for those x with . While for classic runtime analysis of randomized search heuristics this does not pose any problems, these properties are not desirable for black-box complexity studies. Lehre and Witt (2010) therefore designed a different jump function , assigning to each x fitness value
formula
Our version is mostly similar to the latter, with the only difference being the function values for bit strings x with . In our version the sizes of the blanked out areas around the optimum and its complement are equal, while for that area is larger around the complement than around the optimum.
Jansen (2015) introduced yet another version of the jump function, inspired by the idea that the spirit of jump functions is to “[locate] an unknown target string that is hidden in some distance to points a search heuristic can find easily.” Jansen’s definition also has black-box complexity analysis in mind. For some search point with , his jump function assigns to bit string x the fitness value
formula
Since these functions do not reveal information about the optimum other than its -neighborhood, the (unrestricted) black-box complexity of the class is  (Jansen, 2015, Theorem 4). For constant this expression is , very different from the results on the unrestricted black-box complexity of in Buzdalov et al. (2015) or from our results.

3  The Unbiased Black-Box Model

The unbiased black-box model introduced in Lehre and Witt (2012) is by now one of the standard complexity models in evolutionary computation. In particular, the unary unbiased model gives a more realistic complexity estimate for a number of functions than the original unrestricted black-box model of Droste et al. (2006). An important advantage of the unbiased model is that it allows us to analyze the influence of the arity of the sampling operators in use. In addition, new search points can be sampled only either uniformly at random or from distributions that depend on previously generated search points in an unbiased way. In this section, we give a brief definition of the unbiased black-box model, pointing to Lehre and Witt (2012) and Doerr and Winzen (2014b) for a more detailed introduction.

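For all non-negative integers k, a k-ary unbiased distribution $(D(\cdot \mid y^1, \ldots, y^k))_{y^1, \ldots, y^k \in \{0,1\}^n}$ is a family of probability distributions over $\{0,1\}^n$ such that for all inputs $y^1, \ldots, y^k \in \{0,1\}^n$ the following two conditions hold:

  1. [$\oplus$-invariance] for all $x, z \in \{0,1\}^n$: $D(x \mid y^1, \ldots, y^k) = D(x \oplus z \mid y^1 \oplus z, \ldots, y^k \oplus z)$;

  2. [permutation-invariance] for all $x \in \{0,1\}^n$ and all $\sigma \in S_n$: $D(x \mid y^1, \ldots, y^k) = D(\sigma(x) \mid \sigma(y^1), \ldots, \sigma(y^k))$,
    where $\oplus$ is the bitwise exclusive-OR, $S_n$ the set of all permutations of the set $[n]$, and $\sigma(x) := x_{\sigma(1)} \cdots x_{\sigma(n)}$ for $x \in \{0,1\}^n$.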

An operator sampling from a k-ary unbiased distribution is called a k-ary unbiased variation operator.
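The two invariance conditions can be checked exhaustively for small n. The Python sketch below (all function names are ours) encodes a unary operator by its distribution D(x | y) and verifies both conditions by brute force: the operator that flips exactly k uniformly chosen bits passes, while an operator that writes a 1 into the first position, the kind of step unbiased algorithms are not allowed to take, fails.

```python
from itertools import permutations, product
from math import comb


def flip_k_prob(x, y, k):
    """D(x | y) for the unary operator 'flip exactly k uniformly chosen bits of y'."""
    dist = sum(a != b for a, b in zip(x, y))
    return 1 / comb(len(y), k) if dist == k else 0.0


def set_first_bit_prob(x, y, k):
    """D(x | y) for the biased operator 'write a 1 into position 0 of y' (k unused)."""
    return 1.0 if x == (1,) + y[1:] else 0.0


def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))


def is_unbiased(prob, n, k):
    """Brute-force check of XOR-invariance and permutation-invariance on {0,1}^n."""
    cube = list(product((0, 1), repeat=n))
    for x, y in product(cube, repeat=2):
        p = prob(x, y, k)
        if any(prob(xor(x, z), xor(y, z), k) != p for z in cube):
            return False
        for sigma in permutations(range(n)):
            sx = tuple(x[i] for i in sigma)
            sy = tuple(y[i] for i in sigma)
            if prob(sx, sy, k) != p:
                return False
    return True


print(is_unbiased(flip_k_prob, 4, 2))         # True
print(is_unbiased(set_first_bit_prob, 3, 0))  # False
```

The flip-k operator passes because its probability depends only on the Hamming distance between x and y, which both XOR-shifts and permutations preserve.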

A k-ary unbiased black-box algorithm is one that follows the scheme of Algorithm 1 (here and in the following, $[k]$ abbreviates $\{1, \ldots, k\}$). The k-ary unbiased black-box complexity of some class of functions $\mathcal{F}$ is the minimum complexity of $\mathcal{F}$ with respect to all k-ary unbiased black-box algorithms, where, naturally, the complexity of an algorithm A for $\mathcal{F}$ is the maximum expected number of black-box queries that A performs on a function $f \in \mathcal{F}$ until it queries for the first time a search point of maximal fitness. We let the $*$-ary unbiased black-box complexity be based on the model in which operators of arbitrary arity are allowed.

formula

The unbiased black-box model includes most of the commonly studied search heuristics, such as many and evolutionary algorithms (EAs), simulated annealing, the Metropolis algorithm, and the randomized local search algorithm.

We recall a simple remark from Doerr et al. (2014b) that helps us shorten some of the proofs in the subsequent sections.

Remark 1:

Suppose for a problem P there exists a black-box algorithm A that, with constant success probability, solves P in s iterations (that is, queries an optimal solution within s queries). Then the black-box complexity of P is at most $O(s)$.
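One standard way to see why Remark 1 holds (our sketch of the usual restart argument, not quoted from Doerr et al.) is to run A for s iterations, restart it from scratch with fresh randomness if no optimum has been queried, and repeat. Restarting is allowed in the unbiased model, since an algorithm may ignore its history and sample a new uniform search point. If each run succeeds with probability at least a constant $p > 0$ and R denotes the number of runs, then

```latex
\Pr[R > r] \le (1-p)^r, \qquad
\mathbb{E}[\#\text{queries}] \le s \cdot \mathbb{E}[R]
  = s \sum_{r \ge 0} \Pr[R > r]
  \le s \sum_{r \ge 0} (1-p)^r = \frac{s}{p} = O(s).
```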

A useful tool for proving lower bounds is Theorem 2. It formalizes the intuition that the black-box complexity of a function can only increase if we “blank out” some of the fitness values. This is exactly the situation of the jump functions.

Theorem 2:

For all sets of pseudo-Boolean functions C, all , and all such that for all with we have .

Proof:

Let C, k, and f be as in the statement of the theorem. Let A be any k-ary unbiased black-box algorithm for . We derive from this a k-ary unbiased black-box algorithm for C by using queries to and then mapping the resulting objective value with f. Clearly, finds an optimum of after no more expected queries than A for , using the condition on the set of optimal points. Thus, the theorem follows.

From Theorem 2 we immediately obtain a lower bound of $\Omega(n/\log n)$ for the unbiased black-box complexities of jump functions. The theorem implies that the k-ary unbiased black-box complexity of OneMax is a lower bound for that of any jump function. In general, the k-ary unbiased black-box complexity of any pseudo-Boolean function f is at least the unrestricted black-box complexity of the class of functions obtained from f by first applying an automorphism of the hypercube $\{0,1\}^n$. That the latter is $\Omega(n/\log n)$ for OneMax was shown independently by Droste et al. (2006) and Erdős and Rényi (1963). A similar line of argument proves the lower bound for extreme jump functions (see Section 6).
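Jump functions fit Theorem 2 because each of them is a fixed function of the OneMax value. The brute-force check below is a small Python illustration with names of our own: for n = 8 and ℓ = 2 it verifies that Jump_ℓ equals f_ℓ ∘ OneMax, where f_ℓ (here `blank`) maps a OneMax value to its jump value, and that the composition keeps the unique optimum of OneMax, the all-1s string.

```python
from itertools import product


def one_max(x):
    return sum(x)


def blank(v, n, ell):
    """f_ell: map a OneMax value v to the corresponding Jump_ell value."""
    return v if (ell < v < n - ell or v == n) else 0


def jump(x, ell):
    n, om = len(x), sum(x)
    if ell < om < n - ell or om == n:
        return om
    return 0


n, ell = 8, 2
cube = list(product((0, 1), repeat=n))
same = all(jump(x, ell) == blank(one_max(x), n, ell) for x in cube)
optima = [x for x in cube if jump(x, ell) == n]
print(same, optima == [(1,) * n])  # True True
```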

The $\Omega(n \log n)$ lower bound for the unary unbiased black-box complexity of $\mathrm{Jump}_\ell$ follows immediately from the $\Omega(n \log n)$ bound proven by Lehre and Witt (2012, Theorem 6) for all pseudo-Boolean functions with a unique global optimum.

4  Short Jump Functions

The key idea for obtaining the bounds on short jump functions, i.e., jump functions with jump size , is the following lemma. It shows that one can compute, with high probability, the OneMax value of any search point x with few black-box calls to $\mathrm{Jump}_\ell$. With this, we can orient ourselves on the large plateau surrounding the optimum and thus revert to the problem of optimizing OneMax.

We collect these computations in a subroutine, to be called by black-box algorithms.

Lemma 3:

For all constants and all , there is a unary unbiased subroutine s using queries to $\mathrm{Jump}_\ell$ such that, for all bit strings x, $s(x) = \mathrm{OneMax}(x)$ with probability .

Proof:

We assume n to be large enough so that . We use a unary unbiased variation operator that samples uniformly at random an $\ell$-neighbor of its argument (a bit string that differs from it in exactly $\ell$ positions). Next we give the subroutine s, which uses this operator to approximate OneMax as desired (see Algorithm 2). Intuitively, the subroutine samples t bit strings in the $\ell$-neighborhood of x; if $n - \ell \le \mathrm{OneMax}(x) < n$, then it is likely that at least once only 1s of x have been flipped, leading to a value of $\mathrm{OneMax}(x) - \ell$; as no sample will have a lower value, adding $\ell$ to the minimum non-zero fitness of the sampled bit strings gives the desired output. The case of x with $\mathrm{OneMax}(x) \le \ell$ is analogous.

formula
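Clearly, the subroutine is correct with certainty on all x with $\ell < \mathrm{OneMax}(x) < n - \ell$ or $\mathrm{OneMax}(x) = n$. The other two cases are nearly symmetric, so we only analyze x with $n - \ell \le \mathrm{OneMax}(x) < n$. Clearly, the return value of the subroutine is correct if and only if at least one of the t samples flips only 1s in x (note that such a sample has the non-zero fitness $\mathrm{OneMax}(x) - \ell$, since $n - 2\ell > \ell$ for the small values of $\ell$ considered here). We denote the probability of this event by p. We start by bounding the probability that a single sample flips only 1s. We choose the $\ell$ bits to flip one after another, so that after i iterations at least $n - \ell - i$ of the $n - i$ not yet chosen bit positions contain a 1. This gives the bound
$$\Pr[\text{a single sample flips only 1s}] \ge \prod_{i=0}^{\ell-1} \frac{n-\ell-i}{n-i} = \prod_{i=0}^{\ell-1} \left(1 - \frac{\ell}{n-i}\right) \ge \left(1 - \frac{\ell}{n-\ell}\right)^{\ell} \ge 1 - \frac{\ell^2}{n-\ell},$$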
using Bernoulli’s inequality. Let be such that . We have
formula
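The idea behind the subroutine can be illustrated in Python. The sketch below is our reading of the proof, not a transcription of Algorithm 2: `s`, `flip_exactly`, and the threshold `n // 2` used to tell the two blanked regions apart are our own choices, and the sketch assumes n > 4ℓ, so that visible offspring of points near the optimum have fitness above n/2 and visible offspring of points near 0^n have fitness below n/2.

```python
import random


def jump(x, ell):
    n, om = len(x), sum(x)
    return om if (ell < om < n - ell or om == n) else 0


def flip_exactly(x, k, rng):
    """Unary unbiased operator: flip k distinct, uniformly chosen positions."""
    u = list(x)
    for i in rng.sample(range(len(x)), k):
        u[i] ^= 1
    return u


def s(x, ell, t, rng):
    """Recover OneMax(x) from at most t + 1 queries to Jump_ell (assumes n > 4*ell)."""
    n = len(x)
    fx = jump(x, ell)
    if fx != 0:                      # visible point or optimum: the value is exact
        return fx
    seen = [v for v in (jump(flip_exactly(x, ell, rng), ell) for _ in range(t)) if v != 0]
    if not seen:                     # e.g. x = 0^n: no ell-neighbour is visible
        return 0
    if min(seen) > n // 2:           # x near the optimum: only-1s flips give the minimum
        return min(seen) + ell
    return max(seen) - ell           # x near 0^n: only-0s flips give the maximum


rng = random.Random(1)
for k in (0, 2, 30, 58, 60):
    x = [1] * k + [0] * (60 - k)
    rng.shuffle(x)
    print(k, s(x, 3, 30, rng))       # recovers k with high probability
```

The sketch only ever compares fitness values and applies the unbiased flip operator to x, so it stays within the unary unbiased model; the failure probability per call shrinks geometrically in t.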

With Lemma 3 at hand, the results stated in Table 1 follow easily from the respective OneMax bounds proven by Doerr et al. (2011a), Doerr and Winzen (2014b), and Droste et al. (2006).

Theorem 4:

For and , the unbiased black-box complexity of is for unary variation operators, and it is for k-ary variation operators with .

Proof:

First note that the black-box complexities claimed for are shown for OneMax in Droste et al. (2006) for , in Doerr et al. (2011a) for , and in Doerr and Winzen (2014b) for .

We use Lemma 3 with and run the unbiased black-box algorithms of the appropriate arity for OneMax; all sampled bit strings are evaluated using the subroutine s. Thus, this algorithm samples as if working on OneMax and finds the bit string with all 1s after the desired number of iterations. Note that for up to uses of s, we expect no more than incorrect evaluations of s. Therefore, there is a small chance of failing, and the claim follows from Remark 1.
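The reduction in this proof can be tried out on a small instance. In the Python sketch below (all names are hypothetical, and randomized local search stands in for "the unbiased algorithm for OneMax"), randomized local search for OneMax evaluates every search point with a Lemma 3-style subroutine that only queries Jump_ℓ, while a counter records the black-box queries. It assumes n > 4ℓ, so that visible offspring of points near the optimum have fitness above n/2.

```python
import random


class JumpOracle:
    """Black-box access to Jump_ell; counts queries and notes if the optimum was hit."""

    def __init__(self, ell):
        self.ell, self.queries, self.hit_optimum = ell, 0, False

    def __call__(self, x):
        self.queries += 1
        n, om = len(x), sum(x)
        if om == n:
            self.hit_optimum = True
        return om if (self.ell < om < n - self.ell or om == n) else 0


def flip_exactly(x, k, rng):
    u = list(x)
    for i in rng.sample(range(len(x)), k):
        u[i] ^= 1
    return u


def s(x, f, t, rng):
    """OneMax(x) from at most t + 1 queries to f (Lemma 3 idea, assumes n > 4*ell)."""
    n, ell = len(x), f.ell
    fx = f(x)
    if fx != 0:
        return fx
    seen = [v for v in (f(flip_exactly(x, ell, rng)) for _ in range(t)) if v != 0]
    if not seen:
        return 0
    return min(seen) + ell if min(seen) > n // 2 else max(seen) - ell


def rls_on_jump(n, ell, t=20, seed=0, max_iters=100_000):
    """Randomized local search for OneMax, with every evaluation done via s."""
    rng = random.Random(seed)
    f = JumpOracle(ell)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = s(x, f, t, rng)
    for _ in range(max_iters):
        if f.hit_optimum:                 # the run ends once the optimum is queried
            break
        y = flip_exactly(x, 1, rng)       # unary unbiased: flip one random bit
        fy = s(y, f, t, rng)
        if fy >= fx:
            x, fx = y, fy
    return f.hit_optimum, f.queries


found, queries = rls_on_jump(60, 3)
print(found, queries)
```

Each evaluation costs at most t + 1 queries, so the overhead over plain randomized local search on OneMax is a constant factor for constant t.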

5  Long Jump Functions

In this section, we give bounds on long jump functions; we start with a bound on the ternary black-box complexity, followed by a bound on the unary black-box complexity. Note that the bound on the binary unbiased black-box complexity of long jump follows from the same bound on extreme jump.

5.1  Ternary Unbiased Optimization of Long Jump Functions

We show that ternary operators allow for solving the problem independently in different parts of the bit string and then combining the partial solutions. This has the advantage that, as in Section 4, we can revert to optimizing OneMax, and the blanked-out fitness values do not appear in any of the partial problems.

We start with a lemma regarding the possibility of simulating unbiased algorithms for OneMax on subsets of the bits.

Lemma 5:

For all bit strings we let (this set is isomorphic to a hypercube). Let A be a k-ary unbiased black-box algorithm optimizing with constant probability in time at most . Then there is a -ary unbiased black-box subroutine as follows.

  • Inputs to are and the Hamming distance a of x and y; x and y are accessible as search points sampled previous to the call of the subroutine.

  • has access to an oracle returning for all .

  • After at most queries has found the with maximal value with constant probability.

Proof:

Let x and y with Hamming distance a be given as detailed in the statement of the lemma. Note that the set defined in the lemma is isomorphic to $\{0,1\}^a$. Without loss of generality, assume that x and y differ on the first a bits, and let $x'$ be the last $n - a$ bits of x (which equal the last $n - a$ bits of y). Thus, this set consists exactly of the bit strings $w x'$ with $w \in \{0,1\}^a$.

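We employ A optimizing OneMax on $\{0,1\}^a$. Sampling a uniformly random point in the set spanned by x and y is clearly unbiased in x and y. However, the resulting OneMax value is not the value that A requires, unless the last $n - a$ bits $x'$, which x and y have in common, form the all-0s string. In order to correct for this, we need to know the number of 1s in $x'$. This we can compute from $\mathrm{OneMax}(x)$, $\mathrm{OneMax}(y)$, and a as follows. Let $z_x, z_y \in \{0,1\}^a$ be such that $x = z_x x'$ and $y = z_y x'$. Since $z_x$ and $z_y$ differ in all a positions, $\mathrm{OneMax}(z_x) + \mathrm{OneMax}(z_y) = a$, and thus
$$\mathrm{OneMax}(x) + \mathrm{OneMax}(y) = a + 2\,\mathrm{OneMax}(x'), \quad\text{i.e.,}\quad \mathrm{OneMax}(x') = \frac{\mathrm{OneMax}(x) + \mathrm{OneMax}(y) - a}{2}.$$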
Thus, for any bit string sampled by we can pass the OneMax value of z to A. In iteration t, when A uses a k-ary unbiased operator that samples according to the distribution , uses the -ary unbiased operator that samples according to the distribution such that
formula
For any such that is not defined by this equation, we let this distribution be the uniform distribution over . From the additional conditioning on x and y we see that is indeed unbiased. Note that samples only points from . As described, can now use the OneMax value of the resulting to compute the OneMax value of z and pass that on to A as the answer to the query. This shows that the simulation is successful as desired.
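The correction used in this proof only needs the OneMax values of x and y and their Hamming distance a. In the small Python check below, `local_onemax` is a hypothetical helper name, and, as in the proof, x and y differ on their first a positions.

```python
import random


def local_onemax(om_z, om_x, om_y, a):
    """OneMax value of z on the a positions where x and y differ, computed only
    from OneMax(z), OneMax(x), OneMax(y) and the Hamming distance a of x and y.
    Assumes z agrees with x (and y) on all other positions."""
    shared_ones = (om_x + om_y - a) // 2   # 1s that x and y have in common
    return om_z - shared_ones


rng = random.Random(7)
n, a = 20, 6
x = [rng.randint(0, 1) for _ in range(n)]
y = [1 - b for b in x[:a]] + x[a:]        # y differs from x exactly on the first a bits
w = [rng.randint(0, 1) for _ in range(a)]
z = w + x[a:]                             # a point of the subcube spanned by x and y
print(local_onemax(sum(z), sum(x), sum(y), a) == sum(w))  # True
```

The integer division is exact, because $\mathrm{OneMax}(x) + \mathrm{OneMax}(y) - a$ is always twice the number of shared 1s.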
Theorem 6:

Let . For all , the k-ary unbiased black-box complexity of is .

Proof:

We optimize blockwise, where each block is optimized by itself and the correct OneMax value is available as long as only bits within the block are modified. Then the different optimized blocks are merged to obtain the optimum.

Let , and assume for the moment that a divides n. Algorithm 3 gives a formal description of the intuitive idea. This algorithm uses the following unbiased operators:

  • . The operator is a 0-ary operator that samples a bit string uniformly at random.

  • . For two search points x and y and an integer k, the operator generates a search point by randomly flipping k bits in x among those bits where x and y agree. If x and y agree in fewer than k bits, then all bits where x and y agree are flipped.

  • . For three search points x, y, and z, the operator returns a bit string identical to the first argument, except where the second and third differ; there the bits of x are flipped. Note that this operator is deterministic.

  • . For three search points x, y, and z, the operator copies x, except where the second argument differs from the third; there it copies y. This is also a deterministic operator.
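The four operators are simple to state in code. The operator names did not survive in this text, so `uniform`, `flip_where_equal`, `flip_where_differ`, and `copy_where_differ` in the Python sketch below are hypothetical names of our own; the behavior follows the descriptions in the list above.

```python
import random


def uniform(n, rng):
    """0-ary: a uniformly random bit string of length n."""
    return [rng.randint(0, 1) for _ in range(n)]


def flip_where_equal(x, y, k, rng):
    """Binary: flip k random bits of x among the positions where x and y agree;
    if they agree in fewer than k positions, flip all of those."""
    agree = [i for i in range(len(x)) if x[i] == y[i]]
    chosen = agree if len(agree) < k else rng.sample(agree, k)
    u = list(x)
    for i in chosen:
        u[i] ^= 1
    return u


def flip_where_differ(x, y, z):
    """Ternary, deterministic: x with its bits flipped wherever y and z differ."""
    return [xi ^ (yi ^ zi) for xi, yi, zi in zip(x, y, z)]


def copy_where_differ(x, y, z):
    """Ternary, deterministic: x, except that y is copied wherever y and z differ."""
    return [yi if yi != zi else xi for xi, yi, zi in zip(x, y, z)]


x, y, z = [0, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 1]
print(flip_where_differ(x, y, z))                   # [0, 1, 1, 0]
print(copy_where_differ(x, y, z))                   # [0, 0, 1, 0]
print(flip_where_equal(x, y, 5, random.Random(0)))  # [0, 1, 0, 1]
```

None of the operators refers to a position or a bit value directly, only to where their arguments agree or differ, which is what makes them unbiased.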

formula

Furthermore, we use the subroutine from Lemma 5 with a fixed time budget that guarantees a constant success probability at each call, returning the best bit string found (note that if a does not divide n, the last call to has to be with respect to a different Hamming distance).

5.1.1  Expected Number of Queries

A uniformly sampled bit string has exactly $n/2$ 1s with probability $\Theta(n^{-1/2})$, which shows that the first line takes an expected number of $O(\sqrt{n})$ queries. Since , all loops have a constant number of iterations. The body of the second loop takes as long as a single optimization of OneMax with arity k, which is in , so that the initial sampling in line 1 makes no difference in the asymptotic runtime. Thus, the total number of queries is .

5.1.2  Correctness

The algorithm first generates a reference string x with $\mathrm{OneMax}(x) = n/2$. The first loop generates bit strings that each have Hamming distance a to the reference string x and that differ from x in pairwise disjoint sets of positions; in this way, the bit positions are partitioned into sets of at most a positions each. The next loop optimizes (copies of) x on each of the selected sets of a bits independently as if optimizing OneMax. For the bit strings encountered during this optimization, we always observe the correct OneMax value, as their Hamming distance to x is at most a, and x has exactly $n/2$ 1s. The last loop copies the optimized bit positions into b by copying, for each set, the bits in which the optimized copy and x differ (these are exactly the bits that were incorrect in x). By Lemma 5, this selects the correct bits in each segment with constant probability. As the segments fail independently, each with constant probability, and there is only a constant number $n/a$ of segments, the overall success probability is still constant, and Remark 1 concludes the proof. The proof trivially carries over to the case of n not divisible by a.
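Why the block-wise optimization never leaves the visible region can be seen in a small simulation. The Python sketch below only illustrates the idea and is not the unbiased Algorithm 3: blocks are addressed by explicit positions, which an unbiased algorithm cannot do (Algorithm 3 achieves the same effect with the operators listed above), each block is optimized by randomized local search with a fixed budget, and the parameters n = 40, ε = 0.25, a = 8 are our own choice satisfying a < εn.

```python
import random


def jump(x, ell):
    n, om = len(x), sum(x)
    return om if (ell < om < n - ell or om == n) else 0


def blockwise_optimize(n, eps, a, iters_per_block=300, seed=0):
    """Illustration only: optimize blocks of at most a positions separately,
    starting each time from the reference string x, then merge the blocks."""
    rng = random.Random(seed)
    ell = int((0.5 - eps) * n)
    x = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(x)                              # reference string with n/2 ones
    b = list(x)
    for start in range(0, n, a):
        block = range(start, min(start + a, n))
        c, fc = list(x), jump(x, ell)           # a copy of x, optimized on this block only
        for _ in range(iters_per_block):
            d = list(c)
            d[rng.choice(block)] ^= 1           # randomized local search inside the block
            fd = jump(d, ell)
            assert fd != 0, "left the visible region"  # cannot happen if a < eps * n
            if fd >= fc:
                c, fc = d, fd
        for i in block:                         # merge: copy the bits where c differs from x
            if c[i] != x[i]:
                b[i] = c[i]
    return b, jump(b, ell)


b, fb = blockwise_optimize(n=40, eps=0.25, a=8)
print(fb)   # 40, the optimum
```

Every point visited inside a block has OneMax value between n/2 − a and n/2 + a, which lies strictly between ℓ and n − ℓ whenever a < εn.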

Thus, we immediately get the following corollary, using the known runtime bounds for OneMax from Doerr and Winzen (2014b).

Corollary 7:

Let . Then the unbiased black-box complexity of is

  • , for ternary variation operators;

  • , for k-ary variation operators with ;

  • , for k-ary variation operators with or unbounded arity.

Note the upper bound of for the ternary unbiased black-box complexity, which we improve in Section 6 to . For all higher arities, the theorem presented in this section gives the best known bound.

5.2  Unary Unbiased Optimization of Long Jump Functions

When optimizing a function via unary unbiased operators, the only way to estimate the OneMax value of a search point x (equivalently, its Hamming distance from the optimum) is by sampling suitable offspring that have a nonzero fitness. When $\ell$ is small, that is, when many OneMax values can be derived straight from the fitness, we can simply flip $\ell$ bits and hope that the retrieved fitness value is smaller than $\mathrm{OneMax}(x)$ by exactly $\ell$. This was the main idea for dealing with short jump functions (see Section 4).

When $\ell$ is larger, this no longer works, simply because the chance that we flip only 1-bits is too small. Therefore, in this section, we resort to a sampling approach that, via strong concentration results, learns the expected fitness of the sampled offspring of x, and from this the OneMax value of the parent x. This leads to a unary unbiased black-box complexity of for all jump functions with .

5.2.1  Proof Outline and Methods

Since we aim at an asymptotic statement, let us assume that n is sufficiently large and even. Also, since we do not elaborate on the influence of the constant $\varepsilon$, we may assume (by replacing $\varepsilon$ by a minimally smaller value) that $\varepsilon$ is such that is even.

A first idea to optimize with could be to flip each bit of the parent x with probability . Such an offspring u has an expected fitness of . If is constant, then by Chernoff bounds samples u are enough to ensure that the average observed fitness satisfies with probability , c an arbitrary constant. This is enough to build a unary unbiased algorithm using fitness evaluations.

We improve this first approach via two ideas. The more important one is to not flip an expected number of bits independently, but to flip exactly that many bits (randomly chosen). By this, we avoid adding extra variation via the mutation operator. This pays off when x already has many 1s: if , then we observe that only samples suffice to estimate the OneMax value of x precisely (allowing a failure probability of as before).
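The effect of flipping exactly the intended number of bits, rather than flipping each bit independently, shows up clearly in a quick experiment. In the Python sketch below, the function name and parameters are our own (m = 80 = (1 − ε)n/2 for n = 200 and ε = 0.2 is an illustrative choice), and x has only five 0-bits; with exact flips the offspring's OneMax value varies only through those five positions, while independent flips add variance from all n positions.

```python
import random
from statistics import pvariance


def offspring_onemax(x, m, exact, rng):
    """OneMax value of one offspring of x: flip exactly m uniformly chosen bits,
    or flip every bit independently with probability m/n (same expected number)."""
    n = len(x)
    if exact:
        flips = set(rng.sample(range(n), m))
    else:
        flips = {i for i in range(n) if rng.random() < m / n}
    return sum(bit ^ (i in flips) for i, bit in enumerate(x))


rng = random.Random(3)
n, m = 200, 80                  # m = (1 - eps) n / 2 with eps = 0.2 (illustrative)
x = [0] * 5 + [1] * (n - 5)     # only five 0-bits, i.e., close to the optimum
var_exact = pvariance([offspring_onemax(x, m, True, rng) for _ in range(2000)])
var_indep = pvariance([offspring_onemax(x, m, False, rng) for _ in range(2000)])
print(var_exact, var_indep)
```

For these parameters, theory predicts a variance of about 4.7 with exact flips and exactly 48 with independent flips, so far fewer samples are needed to estimate the mean to a given precision.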

The price for not flipping bits independently (but flipping a fixed number of bits) is that we have to deal with hypergeometric distributions and, when sampling repeatedly, with sums of these. A convenient way of handling such sums is to rewrite them as sums of negatively correlated random variables and then to argue that Chernoff bounds also hold for these. This was stated explicitly by Doerr (2011) for multiplicative Chernoff bounds, but not for additive ones. Since for our purposes an additive Chernoff bound is more convenient, we extract such a bound from the original paper of Panconesi and Srinivasan (1997).

The second improvement stems from allowing a larger failure probability. This occasionally leads to wrong estimates of the OneMax value and consequently to wrong decisions on whether to accept x or not, but as long as this does not happen too often, we still expect to make progress toward the optimum. To analyze this, we model the distance to the optimum as a random walk and use the gambler’s ruin theorem to show that the expected number of visits to each state is constant.
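The gambler's-ruin argument can be illustrated by simulation. In the Python sketch below (a simplified model of our own, not the paper's exact process), the distance to the optimum decreases by 1 with probability p = 3/4 and increases by 1 otherwise, and we count how often the walk is at its start state before it reaches 0. For a walk with this drift toward 0, the probability of returning to a state is at most 2(1 − p), so the expected number of visits to any state is at most 1/(2p − 1), which is 2 here regardless of the state.

```python
import random


def visits_to_start(d, p, rng):
    """Walk on the distance to the optimum: -1 with probability p, +1 otherwise,
    stopped at 0. Returns the number of times the walk is at its start state d."""
    pos, visits = d, 1
    while pos > 0:
        pos += -1 if rng.random() < p else 1
        visits += pos == d
    return visits


rng = random.Random(5)
runs = 20000
mean = sum(visits_to_start(5, 0.75, rng) for _ in range(runs)) / runs
print(mean)   # close to 2 = 1 / (2p - 1)
```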

5.2.2  Estimating the Distance to the Optimum

We start with some preliminary considerations that might be helpful for similar problems as well. Let $x \in \{0,1\}^n$, and let $a$ be its Hamming distance from the all-1s string. Fix some enumeration of the zero-bits of x. Let u be an offspring of x obtained from flipping exactly bits. For $i \in [a]$, define a $\{-1,1\}$-valued random variable $X_i$ by $X_i = 1$ if and only if the ith zero of x is flipped in u. We first argue that
formula
1
Suppose first that x has exactly as many 1-bits as there are non-flipped bits, so that the number of flipped bits equals the number of 0-bits. Any bit that was 0 in x increases the OneMax value by 1 when flipped, while not flipping such a bit means that instead some 1-bit is flipped, leading to a decrease in the OneMax value; this gives (1). If x has k more 1-bits than in this first case, then certainly k 1-bits have to flip, and for the 0-bits the same argument as in the previous case holds, leading to (1). If x has k fewer 1-bits than in the first case, then certainly k 0-bits cannot flip; counting them with a value of $-1$ in (1) needs to be offset by adding k, leading again to (1).
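As we read the argument above, writing m for the number of flipped bits (our notation), identity (1) states that $\mathrm{OneMax}(u) = n - m + \sum_{i=1}^{a} X_i$ for every x and every choice of the m flipped positions. The Python sketch below, with a hypothetical helper name, checks this exhaustively for all $x \in \{0,1\}^6$ and all m.

```python
from itertools import combinations, product


def check_identity(n, m):
    """Check OneMax(u) = n - m + sum_i X_i for all x in {0,1}^n and all sets of
    exactly m flipped positions, where X_i = +1 if the i-th 0-bit of x is
    flipped and X_i = -1 otherwise."""
    for x in product((0, 1), repeat=n):
        zeros = [i for i in range(n) if x[i] == 0]
        for flipped in combinations(range(n), m):
            chosen = set(flipped)
            ones_in_u = sum(x[i] ^ (i in chosen) for i in range(n))
            signed = sum(1 if i in chosen else -1 for i in zeros)
            if ones_in_u != n - m + signed:
                return False
    return True


print(all(check_identity(6, m) for m in range(7)))  # True
```

The identity holds for every m because $\mathrm{OneMax}(u) = (n - a) - (m - z) + z$ and $\sum_i X_i = 2z - a$, where z is the number of flipped 0-bits.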

By construction, and . Consequently, , which is in for all a.
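As a worked version of these expectations, again writing m for the number of flipped bits (our notation): the m flipped positions form a uniformly random m-subset of the n positions, so each fixed position, in particular the ith 0-bit of x, is flipped with probability m/n. Together with (1) in the form $\mathrm{OneMax}(u) = n - m + \sum_{i=1}^{a} X_i$, this gives

```latex
\Pr[X_i = 1] = \frac{m}{n}, \qquad
\mathbb{E}[X_i] = \frac{m}{n} - \Bigl(1 - \frac{m}{n}\Bigr) = \frac{2m}{n} - 1,
\\
\mathbb{E}[\mathrm{OneMax}(u)] = n - m + a\Bigl(\frac{2m}{n} - 1\Bigr)
  = n - m - a\Bigl(1 - \frac{2m}{n}\Bigr).
```

For m < n/2 this is strictly decreasing in a, from n − m at a = 0 to m at a = n, so it lies in [m, n − m] and determines a; for m = n/2 it would carry no information about a.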

Let $X := \sum_{i=1}^{a} X_i$. Note that the $X_i$ are not independent. However, they are negatively correlated and for this reason still satisfy the usual Chernoff bounds. This was made precise by Doerr (2011, Theorems 1.16, 1.17), but only for multiplicative Chernoff bounds (Doerr, 2011, Theorem 1.9). Since for our purposes an additive Chernoff bound (sometimes called Hoeffding bound) is more suitable, we look at the paper by Panconesi and Srinivasan (1997). There, Theorem 3.2 applied with correlation parameter and the simply being independent copies of the $X_i$, together with Equation (2), gives the first part of the following lemma. By setting the random variables $Y_i := 1 - X_i$, the second claim of the lemma follows from the first.

Lemma 8:

Let be binary random variables. Let .

  1. Assume that for all , we have . Then .

  2. Assume that for all , we have . Then .

Note that our $\{-1, 1\}$-valued Xi are derived from a hypergeometric distribution (which leads to random variables fulfilling the assumptions of both parts of the lemma) via a simple affine transformation. Consequently, the following corollary, which follows directly from the lemma, applies to these Xi.

Corollary 9:

Let be $\{-1, 1\}$-valued random variables. Assume that for all and both , we have . Let . Then .
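To see an additive (Hoeffding-type) bound at work, the sketch below draws the sum of the negatively correlated $X_i$ directly (exactly s of n positions flipped, with the first a positions standing for the zeros of x) and compares its empirical two-sided tail with the textbook bound $2\exp(-\delta^2/(2a))$ for a sum of a variables with values in $[-1, 1]$. The constants of Lemma 8 and Corollary 9 are not legible in this copy, so this bound is an assumption of the sketch, chosen as the standard Hoeffding form; the parameter values are arbitrary.

```python
import math
import random

def centred_sum(n, a, s, rng):
    """Sum of X_1, ..., X_a minus its mean, when exactly s of n positions are
    flipped and positions 0..a-1 stand for the zeros of x."""
    flipped = set(rng.sample(range(n), s))
    total = sum(1 if i in flipped else -1 for i in range(a))
    return total - a * (2 * s / n - 1)   # E[X_i] = 2s/n - 1

def empirical_tail(n, a, s, delta, trials, seed=1):
    """Fraction of trials in which the sum deviates from its mean by >= delta."""
    rng = random.Random(seed)
    hits = sum(abs(centred_sum(n, a, s, rng)) >= delta for _ in range(trials))
    return hits / trials

def hoeffding_bound(a, delta):
    """Two-sided Hoeffding bound for a sum of a variables with values in [-1, 1]."""
    return 2 * math.exp(-delta ** 2 / (2 * a))

for delta in (10, 15, 20):
    print(delta, empirical_tail(200, 60, 50, delta, 5000), hoeffding_bound(60, delta))
```

The empirical tails come out well below the bound, which is expected: the bound ignores the negative correlation and the actual variance.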

From this, we observe that . In particular,
formula
2
Independent copies of sets of negatively correlated random variables again are negatively correlated. Let Y be the sum of T independent copies of X. Then the corollary again yields
formula

Similarly, . We summarize these findings in the following lemma.

Lemma 10:

Let . Let be obtained independently from x, each by flipping exactly random bits. Let and . The probability that does not equal is at most . The probability that is not in is at most .

Proof:

By construction, has the same distribution as Y above. The probability that deviates from its expectation , or equivalently, that deviates from its expectation a, by at least an additive term of , is at most .
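Since each zero of x is flipped with probability s/n when exactly s of the n bits are flipped, we have $E[X_i] = 2s/n - 1$, so the identity $\mathrm{OneMax}(u) = n - s + \sum_i X_i$ gives $E[\mathrm{OneMax}(u)] = n - s + a(2s/n - 1)$. One natural way to turn this into an estimator is to average $\mathrm{OneMax}(u)$ over T offspring and invert the linear relation. The sketch below does exactly that; it is a method-of-moments illustration under our own choice of s and T, not the estimator of Lemma 10 verbatim, and it reads OneMax values directly, whereas a jump function hides some of them.

```python
import random

def estimate_zeros(x, s, T, rng):
    """Estimate a, the number of 0-bits of x, from T offspring that each flip
    exactly s random bits (requires 2s != n). Inverts
    E[OneMax(u)] = n - s + a * (2s/n - 1).
    Assumes full OneMax access; under a jump function some values are hidden."""
    n = len(x)
    slope = 2 * s / n - 1
    if slope == 0:
        raise ValueError("with s = n/2 the mean does not depend on a")
    total = 0
    for _ in range(T):
        flipped = set(rng.sample(range(n), s))
        total += sum(bit ^ (i in flipped) for i, bit in enumerate(x))
    return round((total / T - (n - s)) / slope)

rng = random.Random(2)
x = [0] * 18 + [1] * 42          # n = 60, a = 18
print(estimate_zeros(x, s=15, T=8000, rng=rng))
```

Averaging over T samples shrinks the standard deviation of the estimate by a factor of $\sqrt{T}$; the additive Chernoff bound quantifies how large T must be for rounding to return the exact value with high probability.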

Building on the previous analysis, we now derive an estimator for a value not revealed by a jump function. It handles the possibility of sampling an offspring with fitness zero, whose OneMax value is hidden, by restarting the procedure via the command .

We analyze the function for the case that is positive, which is the only situation in which we use this function in the following.

Corollary 11:

The function given in Algorithm 4 takes as inputs a search point x and an integer T; if is positive, then the algorithm terminates using an expected number of at most q fitness evaluations, returning an integer .

formula

Assume that and . Then the expected number of fitness evaluations is . Let denote the unknown Hamming distance of x to the optimum. The probabilities for the events and are at most and , respectively. If , these probabilities become and .

Proof:

A run of in which the restart statement is not executed uses exactly T fitness evaluations; a run in which it is executed uses at most T. The probability that one execution of the for-loop leads to the execution of the restart statement is at most by a simple union bound argument and (2). If this probability is less than 1, then the number of times the for-loop is started is geometrically distributed, and its expectation is the reciprocal of one minus this probability; hence the expected total number of fitness evaluations is at most T times this reciprocal.

When and , the probability of a restart is , and the expected number of fitness evaluations becomes . Consequently, conditioning on none of the ui in Lemma 10 having a value outside changes the probabilities computed there by at most an additive term.
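The restart argument can be made concrete. In the sketch below, which is our own model, each of the T samples is hidden (fitness 0) independently with probability p, and a hidden sample restarts the whole for-loop. By Wald's identity the expected number of evaluations is exactly $(1 - q)/(pq)$ with $q = (1 - p)^T$; this never exceeds the coarser $T/(1 - Tp)$ obtained by bounding the restart probability by $Tp$, in the spirit of the union bound used above. The per-sample probability p is a modelling assumption of the sketch.

```python
import random

def evaluations_with_restarts(T, p_hidden, rng):
    """Evaluations used by a T-sample loop that restarts from scratch whenever a
    sample has fitness 0 (independently with probability p_hidden per sample)."""
    evaluations = 0
    while True:
        for _ in range(T):
            evaluations += 1
            if rng.random() < p_hidden:
                break                  # hidden sample: restart the loop
        else:
            return evaluations         # all T samples were visible

def exact_expectation(T, p):
    """(1 - q) / (p q) with q = (1 - p)**T, by Wald's identity."""
    q = (1 - p) ** T
    return (1 - q) / (p * q)

def union_bound_expectation(T, p):
    """T / (1 - T p): restart probability bounded by T p (requires T p < 1)."""
    return T / (1 - T * p)

rng = random.Random(4)
runs = [evaluations_with_restarts(20, 0.01, rng) for _ in range(20000)]
print(sum(runs) / len(runs), exact_expectation(20, 0.01), union_bound_expectation(20, 0.01))
```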

The main argument of how such an estimator for the number of 0s in a bit string can be used to derive a good black-box algorithm is reused later, in Theorem 20. We therefore make the following definition.

Definition 12:

Let f be a pseudo-Boolean function, and let p be a function that maps non-negative integers to non-negative integers. Let g be an algorithm that takes as input a bit string x and a natural number and uses unary unbiased queries to f. We call g a p-estimator using f if, for all bit strings x, , and for all we have

  • ;

  • .

Lemma 13:

Let f be a pseudo-Boolean function such that, for some p, there is a p-estimator using f. Then the unary unbiased black-box complexity of f is .

Proof:

Let us consider a run of Algorithm 5.

formula

We first assume that all calls of the p-estimator g return a value that lies in times the a-value of the first argument.

During a run of the algorithm, performs a biased random walk on the state space . The walk ends when the state 0 is reached. From the definition of a p-estimator we derive the following bounds on the transition probabilities. From state a, with probability at least , we move to state . This is the probability that a 1-bit flip reduces the Hamming distance to the optimum, times a lower bound on the probability that we correctly identify both the resulting a-value and the a-value of our current solution x. With probability at most , we move from state a to state . If we move neither to nor to , we stay in a. Observe that, conditional on not staying in a, we go to with probability at least and to with probability at most (these are coarse estimates).

Our first goal is to show that the expected number of different visits to each particular state a is constant. To this end, we may consider an accelerated version of the random walk that ignores transitions from a state to itself. In other words, we may condition on actually moving to a different state in each step. For any state , let ei denote the expected number of visits to a when starting the walk from i. To be more precise, we count the number of times we leave state a in the walk starting at i. We easily observe the following. For , we have , simply because we know that the walk will at some time reach a (because 0 is the only absorbing state). Hence we can split the walk started in i into two parts: one from the start until the first visit to a (this contributes zero to ei) and the other from the first visit to a until reaching the absorbing state (this contributes ea to ei). For , we have , where qi denotes the probability that a walk started in i visits 0 prior to a. This implies that . Consequently, for the state a itself, we may use the pessimistic estimates of the transition probabilities and derive . To prove that , it thus suffices to show that is bounded from below by an absolute constant greater than zero. This follows easily from the gambler's ruin theorem (see, e.g., Jansen, 2013, Theorem A.4). Consider a game in which the first player starts with dollars and the second with s = 1 dollar. In each round of the game, the first player wins with probability and the second with probability . The winner of the round receives one dollar from the other player. The game ends when one player has no money left. In this game, the probability that the second player runs out of money before the first is exactly
formula
which in our game is . Using the pessimistic estimates of the transition probabilities, we hence see that as desired.
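For reference, the closed form of the classical gambler's ruin probability is easy to state and to check by simulation. The sketch below uses the roles of the paragraph above (first player with r dollars, second with s); the win probability p is a free parameter, because the transition bounds used in the proof are not legible in this copy. With s = 1 the probability equals p when r = 1 and, for p > 1/2, stays at least $1 - (1-p)/p$ for every r, which is the kind of absolute lower bound the proof needs.

```python
import random

def prob_second_ruined(r, s, p):
    """Classical gambler's ruin: the first player starts with r dollars, the second
    with s, and the first wins each round with probability p. Returns the
    probability that the second player runs out of money first."""
    q = 1 - p
    if p == q:
        return r / (r + s)
    rho = q / p
    return (1 - rho ** r) / (1 - rho ** (r + s))

def simulate(r, s, p, trials, seed=3):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        capital = r                      # first player's money
        while 0 < capital < r + s:
            capital += 1 if rng.random() < p else -1
        ruined += capital == r + s
    return ruined / trials

print(prob_second_ruined(5, 1, 0.6), simulate(5, 1, 0.6, 20000))
# Smallest value over r for s = 1 and an example win probability of 2/3.
print(min(prob_second_ruined(r, 1, 2 / 3) for r in range(1, 200)))
```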

We have just shown that the expected number of times the algorithm has to leave a state “” is constant. Since the probability of leaving this state in one iteration of the while-loop is at least , and by the definition of an estimator one iteration takes an expected number of fitness evaluations, we see that the expected total number of fitness evaluations spent in state “” is , with all constants hidden in the O-notation being absolute constants independent of a and n. Consequently, the expected total number of fitness evaluations in one run of the algorithm is at most .

So far we assumed that all g(y,T) calls return a value that is in , and, conditional on this, proved an expected optimization time of . By the definition of an estimator, the probability that we receive a value outside this interval is . Consequently, the probability that this happens within the first fitness evaluations is at most . A simple Markov bound shows that, with probability at least , Algorithm 5 has found the optimum after fitness evaluations (assuming the implicit constant is large enough). Hence Remark 1 proves the claim.
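Algorithm 5 itself is not legible in this copy. Following the transition analysis above, the sketch below flips one uniformly random bit of the current solution and accepts the offspring if a fresh estimate of its distance to the optimum is smaller than a fresh estimate for the parent. The function make_noisy_estimator is a hypothetical stand-in for a p-estimator that errs by ±1 with a fixed probability; it is not the estimator of Corollary 11. The predicate is_optimal models the fact that the optimum has a visible fitness value.

```python
import random

def make_noisy_estimator(z, error_prob, rng):
    """Hypothetical stand-in for a p-estimator: returns the Hamming distance of x to
    the hidden optimum z, shifted by +1 or -1 with probability error_prob."""
    def estimate(x):
        dist = sum(a != b for a, b in zip(x, z))
        if rng.random() < error_prob:
            dist += rng.choice((-1, 1))
        return dist
    return estimate

def estimator_hill_climber(n, estimate, is_optimal, rng, max_iters=100_000):
    """Flip one uniformly random bit and keep the offspring if its estimated
    distance is smaller than a fresh estimate for the parent."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for iteration in range(max_iters):
        if is_optimal(x):
            return x, iteration
        y = list(x)
        y[rng.randrange(n)] ^= 1
        if estimate(y) < estimate(x):
            x = y
    return None, max_iters

rng = random.Random(11)
z = [rng.randint(0, 1) for _ in range(30)]      # hidden optimum
estimate = make_noisy_estimator(z, error_prob=0.1, rng=rng)
solution, iterations = estimator_hill_climber(30, estimate, lambda x: x == z, rng)
print(solution == z, iterations)
```

Occasional wrong estimates let the walk step away from the optimum, but, as in the gambler's ruin argument, the drift toward the optimum keeps the expected number of visits to each distance constant.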

Theorem 14:

Let . The unary unbiased black-box complexity of is .

Proof:

By Corollary 11, is a p-estimator for with a sufficiently large constant. To see this, note that is increasing for all . Consequently, Lemma 13 shows the claim.

6  Extreme Jump Functions

In this section, we consider the most extreme case of jump functions, in which all search points have fitness zero except for the optimum and the search points having exactly $n/2$ 1s. Surprisingly, despite some additional difficulties, we still find polynomial time black-box algorithms.

Throughout this section, let n be even (we comment on the case of odd n in Section 6.5). We call a jump function an extreme jump function if . Consequently, this function is zero except for the optimum (where it has the value n) and for bit strings having $n/2$ 1s (where it has the value $n/2$).
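For concreteness, the following minimal Python sketch implements a jump function with a hidden optimum z and its extreme case. It assumes the convention that the function returns n at z, the OneMax value if $\ell < \mathrm{OneMax}(x) < n - \ell$, and 0 otherwise; under this convention (an assumption of the sketch, since the defining condition is not legible in this copy) $\ell = n/2 - 1$ leaves exactly the fitness values 0, $n/2$, and n.

```python
def one_max(x, z):
    """Number of positions in which x agrees with the optimum z."""
    return sum(a == b for a, b in zip(x, z))

def jump(x, z, ell):
    """Jump function with optimum z, assuming the convention: n at z, the OneMax
    value if ell < OneMax(x) < n - ell, and 0 otherwise."""
    n = len(x)
    om = one_max(x, z)
    if om == n:
        return n
    if ell < om < n - ell:
        return om
    return 0

def extreme_jump(x, z):
    """Extreme case for even n: only n/2 and n remain visible (ell = n/2 - 1)."""
    return jump(x, z, len(x) // 2 - 1)

n = 8
z = [1] * n
values = {extreme_jump([(m >> i) & 1 for i in range(n)], z) for m in range(2 ** n)}
print(sorted(values))  # [0, 4, 8]
```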

The information-theoretic argument of Droste et al. (2006) immediately gives a lower bound of $\Omega(n)$ for the unbiased black-box complexities of extreme jump functions. The intuitive argument is that an unrestricted black-box algorithm needs to learn n bits of information but receives only a constant amount of information per query.

Lemma 15:

For all arities k, the k-ary unbiased black-box complexity of an extreme jump function is $\Omega(n)$.

Proof:

Since an extreme jump function takes only three values, Theorem 2 in Droste et al. (2006) gives a lower bound of $\Omega(n)$ for the unrestricted black-box complexity of the set of all extreme jump functions. The latter is a lower bound for the unbiased black-box complexity of a single extreme jump function (see the end of Section 3).

6.1  Upper Bounds on Extreme Jump Functions

In the following three sections, we derive several upper bounds for the black-box complexities of extreme jump functions.

For an extreme jump function, we cannot distinguish between having OneMax values of $n/2 + k$ and $n/2 - k$ until we have encountered the optimum or its inverse. More precisely, let $x^{(1)}, \dots, x^{(t)}$ be a finite sequence of search points not containing the all-1s and all-0s strings. Define $y^{(i)}$ to be the inverse of $x^{(i)}$ for all i. Then both sequences of search points yield exactly the same fitness values. Hence the only way we could find out on which side of the symmetry point we are would be by querying a search point having 0 or n 1s. However, if we know such a search point, we are done anyway.
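The indistinguishability argument can be checked exhaustively for small n. The sketch below (our illustration, with an arbitrary hidden optimum z) confirms that every x other than z and its complement has the same extreme-jump fitness as its complement, so replacing every query by its complement leaves all observed fitness values unchanged.

```python
def extreme_jump(x, z):
    """Extreme jump function for even n: n at z, n/2 on the middle layer, else 0."""
    n = len(x)
    om = sum(a == b for a, b in zip(x, z))
    if om == n:
        return n
    return n // 2 if om == n // 2 else 0

def complement(x):
    return [1 - b for b in x]

def indistinguishable(z):
    """True if every x other than z and complement(z) has the same fitness as
    complement(x)."""
    n = len(z)
    for m in range(2 ** n):
        x = [(m >> i) & 1 for i in range(n)]
        if x == z or x == complement(z):
            continue
        if extreme_jump(x, z) != extreme_jump(complement(x), z):
            return False
    return True

print(indistinguishable([1, 0, 1, 1, 0, 0, 1, 0]))  # True
```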

Despite these difficulties, we develop a linear time ternary unbiased black-box algorithm. Then we show that restricting ourselves to binary variation operators at most increases the black-box complexity to . For unary operators, the good news is that polynomial time optimization of extreme jump functions is still possible, though the best complexity we find is only .

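To ease the language, let us denote by $d(x) := |\mathrm{OneMax}(x) - n/2|$ a symmetrized version of OneMax that takes this difficulty into account. Also, let us define the sign of x to be $\operatorname{sgn}(x) := 1$ if $\mathrm{OneMax}(x) > n/2$, $\operatorname{sgn}(x) := 0$ if $\mathrm{OneMax}(x) = n/2$, and $\operatorname{sgn}(x) := -1$ if $\mathrm{OneMax}(x) < n/2$. In other words, $\operatorname{sgn}(x)$ is the sign of $\mathrm{OneMax}(x) - n/2$.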
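When the optimum is known, as in a test harness, d and the sign are straightforward to compute. The sketch below assumes $d(x) = |\mathrm{OneMax}(x) - n/2|$ and $\operatorname{sgn}(x)$ the sign of $\mathrm{OneMax}(x) - n/2$, with OneMax measured relative to a hidden optimum z; it shows that complementing x preserves d and flips the sign, which is exactly why d is observable (through the fitness) while the sign is not.

```python
def one_max(x, z):
    return sum(a == b for a, b in zip(x, z))

def d_value(x, z):
    """d(x) = |OneMax(x) - n/2| for even n."""
    return abs(one_max(x, z) - len(x) // 2)

def sgn(x, z):
    """Sign of OneMax(x) - n/2: +1, 0 or -1."""
    diff = 2 * one_max(x, z) - len(x)
    return (diff > 0) - (diff < 0)

z = [1, 0, 1, 1, 0, 0]
x = [1, 0, 1, 1, 1, 1]           # OneMax 4 relative to z
x_bar = [1 - b for b in x]
print(d_value(x, z), sgn(x, z), d_value(x_bar, z), sgn(x_bar, z))  # 1 1 1 -1
```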

6.2  Ternary Unbiased Optimization of Extreme Jump Functions

When ternary operators are allowed, we easily obtain an unbiased black-box complexity of $O(n)$, which is best possible by Lemma 15. The reason for this fast optimization progress is that we may test individual bits. Assume that we have a search point u with OneMax value $n/2 + 1$. If we flip a certain bit in u, then the fitness of this offspring reveals the value of this bit. If the new fitness is $n/2$, then the new OneMax value is $n/2$ as well, and the bit originally had the value 1. If the new fitness is zero, then the new OneMax value is $n/2 + 2$, and the bit originally had the value 0. We can thus learn all bit values and flip those bits that do not have the desired value.

One difficulty to overcome is that we never have a search point of which we know that its OneMax value is $n/2 + 1$. We overcome this by generating a search point with fitness $n/2$ and flipping a single bit. This yields a search point with OneMax value either $n/2 + 1$ or $n/2 - 1$. Implementing this strategy in a sufficiently symmetric way, we end up with a search point having OneMax value either n or 0, and in the latter case we output its complement.

Theorem 16:

For $k \geq 3$, the k-ary unbiased black-box complexity of extreme jump functions is $\Theta(n)$.

Proof:

We show that Algorithm 6 optimizes any extreme jump function with $O(n)$ black-box queries using only operators of arity at most 3. This algorithm uses the three operators , and , introduced in the proof of Theorem 6, as well as the following unbiased operator.

  • : Given a bit string x, flips all bits in x. This is a deterministic operator.

formula

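Note that a uniformly sampled bit string has exactly $n/2$ 1s with probability $\Theta(n^{-1/2})$, so the expected number of queries spent on the initial sampling is $O(\sqrt{n})$. Consequently, the expected total number of queries is $O(n)$.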
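The $\Theta(n^{-1/2})$ probability follows from Stirling's formula, $\binom{n}{n/2} 2^{-n} \approx \sqrt{2/(\pi n)}$. The short check below computes the exact probability and shows $\sqrt{n}$ times it approaching $\sqrt{2/\pi} \approx 0.798$, so the expected number of uniform samples needed to hit the middle layer is $\Theta(\sqrt{n})$.

```python
import math

def prob_middle_layer(n):
    """Probability that a uniformly random n-bit string has exactly n/2 ones."""
    return math.comb(n, n // 2) / 2 ** n

for n in (10, 100, 1000, 10000):
    # sqrt(n) * P approaches sqrt(2/pi), so P = Theta(n^(-1/2))
    print(n, prob_middle_layer(n), math.sqrt(n) * prob_middle_layer(n))
```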

Let us now analyze the correctness of our algorithm. Let x be the initial bit string with fitness different from 0. This is either the optimal bit string, in which case nothing is left to be done, or a bit string with fitness $n/2$. In the latter case, consider the first for-loop. For each , has a Hamming distance of i to x; in fact, the sequence is a path in the hypercube flipping each bit exactly once. Thus, for each i, differs from x in exactly one position, and for each position there is exactly one i such that differs from x in that position. We can thus use the to address the individual bits, and we call the bit where x and differ the ith bit.

We now use the first bit as a baseline and check, as follows, which other bits of x contribute to the OneMax value of x in the same way (both are 0 or both are 1). Flipping the first and the ith bit of x yields a bit string whose fitness is 0 if and only if these two bits contribute to the OneMax value of x in the same way, because flipping two equal bits changes the OneMax value by 2; otherwise the OneMax value, and hence the fitness, stays at $n/2$.

Thus, b is the bit string with either all bits set to 0 or all bits set to 1. This means that we are either done after the last loop or after taking the complement of b.
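The following Python sketch mirrors the strategy of this proof under simplifying assumptions of ours: it addresses bits by index instead of through the path of ternary unbiased operators used in Algorithm 6, and it handles the corner case n = 4, in which flipping two bad bits of a middle-layer string already yields the optimum. It illustrates the bit-comparison idea and is not the paper's pseudocode.

```python
import random

def make_extreme_jump(z):
    """Extreme jump function with hidden optimum z (n even)."""
    n = len(z)
    def f(x):
        om = sum(a == b for a, b in zip(x, z))
        if om == n:
            return n
        return n // 2 if om == n // 2 else 0
    return f

def bit_comparison_optimizer(f, n, rng):
    """Sketch of the strategy of Theorem 16. Returns (optimum, number of queries)."""
    queries = 0

    def query(x):
        nonlocal queries
        queries += 1
        return f(x)

    # Sample uniformly at random until the fitness is n/2 or n.
    while True:
        x = [rng.randint(0, 1) for _ in range(n)]
        fx = query(x)
        if fx != 0:
            break
    if fx == n:
        return x, queries

    # OneMax(x) = n/2. Flipping bits 0 and i keeps the fitness at n/2 exactly when
    # the two bits contribute differently; otherwise OneMax moves by 2.
    b = list(x)
    for i in range(1, n):
        y = list(x)
        y[0] ^= 1
        y[i] ^= 1
        fy = query(y)
        if fy == n:              # only possible for n = 4
            return y, queries
        if fy == n // 2:
            b[i] ^= 1            # align bit i with bit 0
    # All bits of b now contribute like bit 0: b is the optimum or its complement.
    if query(b) == n:
        return b, queries
    return [1 - v for v in b], queries

rng = random.Random(5)
z = [rng.randint(0, 1) for _ in range(20)]
solution, queries = bit_comparison_optimizer(make_extreme_jump(z), 20, rng)
print(solution == z, queries)
```

Apart from the $O(\sqrt{n})$ expected sampling queries, the sketch uses exactly n further queries, matching the linear bound.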

6.3  Binary Unbiased Optimization of Extreme Jump Functions

In this section, we prove that the 2-ary unbiased black-box complexity of extreme jump functions is $O(n \log n)$. With 2-ary operators only, it seems impossible to implement the strategy used in the previous section, which relies on being able to copy particular bit values into the best-so-far solution.

To overcome this difficulty, we follow a hill-climbing approach. We first find a search point m with d-value 0 by repeated sampling. We copy this into our current-best search point x and try to improve x to a new search point by flipping a random bit in which x and m are equal (this needs a 2-ary operation), hoping to gain a search point with d-value equal to . The main difficulty is to estimate the d-value of , which is necessary to decide whether we keep this solution as new current-best or whether we try again.

Using binary operators, we can exploit the fact that . For example, we can flip of the bits in which and m differ. If this yields an individual with fitness , then clearly does not have the targeted d-value of . Unfortunately, we detect this shortcoming only when the bit that marks the difference between x and is not among the bits flipped. This happens only with probability . Consequently, this approach may take iterations to decide between the cases and .

We can reduce this time to logarithmic with the following trick. Recall that the decision procedure is slow mainly because the probability of not flipping the newly created bit is so small. This is because the only way to gain information about is to flip almost all bits in the hope of reaching a fitness of . We overcome this difficulty by keeping, in parallel, a second search point y that has the same d-value as x but lies on the other side of m. To ease the language in this overview, let us assume that . Let and . Then we aim at keeping a y such that , , , and . With this at hand, we can easily evaluate the d-value of . Assume that was created by flipping exactly one of the bits in which x and y agree. Let u be created by flipping in exactly of the bits in which and y differ. If , then surely , and thus . If , then with probability the bit in which and x differ is not flipped, leading to , visible from a fitness equal to $n/2$. Hence, with probability at least , we detect the undesired outcome . Unfortunately, there is no comparably simple certificate for , so we have to repeat the previous test times to be sufficiently sure (if no failure is detected) that . Overall, this leads to an almost linear complexity of $O(n \log n)$.

To make this idea precise, let us call a pair $(x, y)$ of search points opposing if they have opposite signs, that is, if $\operatorname{sgn}(x) = -\operatorname{sgn}(y)$. We call $(x, y)$ an opposing k-pair for some integer k if x and y are opposing, $d(x) = d(y) = k$, and $H(x, y) = 2k$, where H denotes the Hamming distance. Clearly, in this case, one of x and y has a OneMax value of $n/2 + k$, while the other has one of $n/2 - k$. To further ease the language, let us call bits having the value 1 good and those having the value 0 bad. Then the definition of an opposing k-pair implies that in one of x and y all the bit positions in which x and y differ are good, whereas in the other they are all bad. The remaining positions contain the same number of good and bad bits.
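The structural claim about opposing k-pairs can be tested on random instances. The sketch below encodes our reading of the definition (opposite signs, $d(x) = d(y) = k$, Hamming distance $2k$), builds random opposing k-pairs for a hidden optimum z, and checks that one string is good on all differing positions while the agreeing positions are half good. For $k = n/2$ the pair consists of the optimum and its complement.

```python
import random

def one_max(x, z):
    return sum(a == b for a, b in zip(x, z))

def d_value(x, z):
    return abs(one_max(x, z) - len(x) // 2)

def sgn(x, z):
    diff = 2 * one_max(x, z) - len(x)
    return (diff > 0) - (diff < 0)

def is_opposing_k_pair(x, y, z, k):
    """Opposite signs, d(x) = d(y) = k and Hamming distance 2k."""
    hamming = sum(a != b for a, b in zip(x, y))
    return (sgn(x, z) == -sgn(y, z)
            and d_value(x, z) == d_value(y, z) == k
            and hamming == 2 * k)

def random_opposing_pair(z, k, rng):
    """Build an opposing k-pair: x good and y bad on 2k positions, and the same
    balanced mix of good and bad bits on all remaining positions."""
    n = len(z)
    order = list(range(n))
    rng.shuffle(order)
    differ, rest = order[:2 * k], order[2 * k:]
    good_rest = set(rest[: len(rest) // 2])
    x, y = list(z), list(z)
    for i in differ:
        y[i] = 1 - z[i]
    for i in rest:
        if i not in good_rest:
            x[i] = y[i] = 1 - z[i]
    return x, y

def structure_holds(x, y, z):
    """One string is good on every differing position; agreeing positions are half good."""
    differ = [i for i in range(len(z)) if x[i] != y[i]]
    agree = [i for i in range(len(z)) if x[i] == y[i]]
    one_side_good = (all(x[i] == z[i] for i in differ)
                     or all(y[i] == z[i] for i in differ))
    return one_side_good and 2 * sum(x[i] == z[i] for i in agree) == len(agree)

rng = random.Random(6)
z = [rng.randint(0, 1) for _ in range(12)]
for k in range(1, 7):
    x, y = random_opposing_pair(z, k, rng)
    print(k, is_opposing_k_pair(x, y, z, k), structure_holds(x, y, z))
```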

There are some additional technicalities to overcome. For example, since we cannot decide the sign of a search point, it is nontrivial to generate a first opposing pair. Our solution is to generate different x and y at Hamming distance 1 from m and to create offspring of x and y via a mixing crossover that inherits from x exactly one of the two bits in which x and y differ, and the other from y. If x and y are opposing, then this offspring has fitness $n/2$ with probability 1. Otherwise, it has fitness $n/2$ only with probability $1/2$. Consequently, a polylogarithmic number of such tests distinguishes the two cases with sufficiently high probability.

Before giving the precise algorithms, let us define the operators used. We use the operator , introduced in the proof of Theorem 6, as well as the following operators.

  • . For two search points x and y and an integer k, the operator generates a search point by randomly flipping k bits in x among those bits where x and y disagree. If x and y disagree in fewer than k bits, a random bit string is returned.

  • . If x and y disagree in exactly two bits, then a bit string is returned that inherits exactly one of these bits from x and one from y and that is equal to both x and y in all other bit positions. If x and y do not disagree in exactly two bits, a random bit string is returned.
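The two operators can be sketched in Python as follows; the names flip_k_where_differ and mix_two are hypothetical, since the operator names are not legible in this copy. The usage lines reproduce the test described above for n = 10 with the all-1s optimum: for an opposing pair around m every offspring of mix_two has fitness $n/2$, whereas for a non-opposing pair only about half of them do.

```python
import random

def flip_k_where_differ(x, y, k, rng):
    """Flip k random bits of x among the positions where x and y disagree; if they
    disagree in fewer than k positions, return a uniformly random bit string."""
    differ = [i for i in range(len(x)) if x[i] != y[i]]
    if len(differ) < k:
        return [rng.randint(0, 1) for _ in x]
    u = list(x)
    for i in rng.sample(differ, k):
        u[i] ^= 1
    return u

def mix_two(x, y, rng):
    """If x and y disagree in exactly two positions, take one of them (chosen
    uniformly) from y and the other from x; otherwise return a random bit string."""
    differ = [i for i in range(len(x)) if x[i] != y[i]]
    if len(differ) != 2:
        return [rng.randint(0, 1) for _ in x]
    u = list(x)
    i = rng.choice(differ)
    u[i] = y[i]
    return u

def fitness(x, z):
    """Extreme jump function with optimum z (n even)."""
    n = len(x)
    om = sum(a == b for a, b in zip(x, z))
    return n if om == n else (n // 2 if om == n // 2 else 0)

def share_of_middle_fitness(x, y, z, trials, rng):
    """Fraction of mix_two offspring of x and y that have fitness n/2."""
    hits = sum(fitness(mix_two(x, y, rng), z) == len(z) // 2 for _ in range(trials))
    return hits / trials

rng = random.Random(8)
z = [1] * 10
m = [1] * 5 + [0] * 5            # fitness n/2
x = list(m)
x[0] ^= 1                        # OneMax n/2 - 1
y_opp = list(m)
y_opp[9] ^= 1                    # OneMax n/2 + 1: opposing x
y_same = list(m)
y_same[1] ^= 1                   # OneMax n/2 - 1: same side as x
print(share_of_middle_fitness(x, y_opp, z, 4000, rng))    # 1.0
print(share_of_middle_fitness(x, y_same, z, 4000, rng))
```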

We are now in a position to formally state the algorithms. We start with the key routine (Algorithm 7), which, from an opposing k-pair computes a Hamming neighbor of x with . Applying this function to both and , we obtain an opposing -pair in the main algorithm (Algorithm 8).

formula
formula
Lemma 17:

Let . The function is binary unbiased. Assume that it is called with an opposing k-pair . Let X be a geometrically distributed random variable with success probability and denote by T the random variable counting the number of fitness evaluations in one run of . Then the following holds.

  1. T is stochastically dominated by , also if we condition on the output satisfying .

  2. With probability at least , the output satisfies , , and .

Proof:

Since the operators and are 2-ary and unbiased, is a 2-ary unbiased algorithm. Also note that any generated in line 3 necessarily has and , the latter because and is obtained from x by flipping a bit in which x and y agree. Hence the main challenge is to distinguish between the two cases that and .

We first argue that the inner while-loop terminates surely with status = success if , and with probability terminates with status = failure if . Assume first that . Since and y then are opposing and , the bits and y differ in are all good bits in (and thus bad bits in y) or vice versa. Consequently, flipping any of them in surely reduces its d-value to 2, leading to . Hence “status = success” is never changed to “status = failure.” Assume now that . Then and y differ in good bits and one bad bit, or in bad bits and one good bit. Therefore, with probability , all bits flipped in the creation of u are of the same type. In this case, the values of u and differ by , implying . Thus, a single iteration of the while-loop sets the status variable to failure with probability at least , as desired when . The probability that this does not happen in one of the up to iterations is at most .
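The success probability of one test can be made explicit. For the good case to end with d-value exactly 2, as stated above, u must flip $k - 1$ of the $2k + 1$ positions in which the candidate and y differ (our inference; the count is not legible in this copy). In the bad case the test detects the failure when these $k - 1$ bits avoid the single bit of the other type, which happens with probability $\binom{2k}{k-1} / \binom{2k+1}{k-1} = (k+2)/(2k+1) \geq 1/2$. The sketch below checks this formula and compares it with simulation.

```python
import math
import random

def detection_probability(k):
    """Chance that k - 1 positions drawn without replacement from 2k + 1 all miss
    one fixed position; equals (k + 2) / (2k + 1)."""
    return math.comb(2 * k, k - 1) / math.comb(2 * k + 1, k - 1)

def simulated_detection(k, trials, seed=7):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    positions = range(2 * k + 1)            # position 0 is the single odd bit
    return sum(0 not in rng.sample(positions, k - 1) for _ in range(trials)) / trials

for k in (1, 2, 5, 50):
    print(k, detection_probability(k), simulated_detection(k, 10000))
```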

We now analyze the statement in line 3. If $(x, y)$ is an opposing k-pair, then exactly half of the bits in which x and y agree are good and the other half are bad. In either case, flipping one of the agreeing bits in x has a chance of exactly $1/2$ of increasing the d-value (and the same chance of $1/2$ of decreasing it). By what we proved about the inner while-loop, we see that the random variable describing the number of iterations of the outer while-loop is stochastically dominated by a geometric random variable with success probability (it would equal such a geometric random variable if the inner while-loop did not, with small probability, accept a failure as a success). Since each execution of the inner while-loop leads to at most fitness evaluations, this proves the first part of (1). If we condition on the output satisfying , then indeed the inner while-loop does not misclassify an . Consequently, the number of iterations in the outer while-loop has distribution X, and again T is dominated by .

For the failure probability estimate, we use the following blunt estimate. Since the statement with probability produces an that we view as success (and that will become the output of the function finally), with probability at least there will be at most iterations of the outer while-loop each generating a failure-. Each of them has a chance of at most of being misclassified as success. Consequently, the probability that returns a failure- is at most .

Theorem 18:

Algorithm 8 is a 2-ary unbiased black-box algorithm for extreme jump functions. It finds the optimum of an unknown extreme jump function with probability within fitness evaluations.

Proof:

As in the previous section, line 1 of Algorithm 8 finds a search point m having fitness $n/2$ after fitness evaluations with probability .

Lines 2–11 are devoted to generating an opposing 1-pair $(x, y)$. Since m has exactly $n/2$ good and $n/2$ bad bits, x has a OneMax value of $n/2 + 1$ or $n/2 - 1$, each with probability exactly $1/2$. Independent of this outcome, y in line 7 has a chance of of having the opposite value, which means that $(x, y)$ is an opposing 1-pair.

Observe that if x and y are opposing, then the mixing crossover of x and y has fitness $n/2$ with probability 1, simply because both of its possible outcomes have this fitness. If x and y are not opposing, then x and y both have one good bit more than m or both have one bad bit more. Consequently, the one outcome of the mixing crossover that differs from m has a d-value of 2, visible from a fitness different from $n/2$. We conclude that the inner while-loop, using at most fitness evaluations, surely ends with status = success if $(x, y)$ is an opposing 1-pair, and with probability ends with status = failure if not. A coarse estimate thus shows that after fitness evaluations spent in lines 2–11, which involve at least executions of the outer while-loop, with probability we exit the outer while-loop with an opposing 1-pair $(x, y)$.

We now argue that if $(x, y)$ is an opposing k-pair right before line 13 is executed, then with probability at least , the new pair $(x, y)$ created in line 15 is an opposing $(k+1)$-pair. This follows from applying Lemma 17 twice and noting that the condition in the statement of Lemma 17 implies that after executing line 15. Consequently, a simple induction shows that, with probability at least , the pair $(x, y)$ when leaving the for-loop is an opposing $n/2$-pair, that is, it consists of the optimum and its complement, and thus exactly one of x and y is the optimum.

For the runtime statement, let us again assume that all opposing k-pairs are indeed created as discussed. We already argued that, up to line 12, we spend at most fitness evaluations with probability . We then perform calls to the routine of Algorithm 7, each leading to a number of fitness evaluations that is, independently of what happened in the other calls, dominated by times a geometric random variable with success probability . By Lemma 1.20 of Doerr (2011), we may assume in the following runtime estimate that the number of fitness evaluations in each such call is indeed times such an (independent) geometric random variable. By Theorem 1.14 of Doerr (2011), the probability that a sum of such geometric random variables deviates from its expectation by more than a factor of 2 is . Consequently, with high probability the total number of fitness evaluations is .
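Algorithms 7 and 8 are not legible in this copy. The sketch below is our reconstruction from the prose of this section and is meant only to make the control flow concrete: sample a middle-layer point m, find an opposing 1-pair around m with repeated mixing-crossover tests, then grow the pair from k to k + 1 by proposing a neighbour and testing it with offspring that flip k − 1 of the differing bits. The operator names, the exception used to stop as soon as the optimum is queried, and the repetition count $c \lceil \ln n \rceil$ with c = 6 are our own choices, not the paper's.

```python
import math
import random

class Found(Exception):
    """Raised as soon as a query returns the optimal fitness n."""

def make_extreme_jump(z):
    """Extreme jump function with hidden optimum z (n even)."""
    n = len(z)
    def f(x):
        om = sum(a == b for a, b in zip(x, z))
        return n if om == n else (n // 2 if om == n // 2 else 0)
    return f

def binary_style_optimizer(f, n, rng, c=6, max_tries=10_000):
    """Sketch of the opposing-pair strategy. Returns (optimum or None, evaluations)."""
    half = n // 2
    reps = c * max(1, math.ceil(math.log(n)))   # hypothetical repetition count
    evals = 0

    def query(u):
        nonlocal evals
        evals += 1
        if f(u) == n:
            raise Found(u)
        return f(u)

    def differing(x, y):
        return [p for p in range(n) if x[p] != y[p]]

    def flip_where_differ(x, y, k):
        # flip k random bits of x among the positions where x and y disagree
        u = list(x)
        for p in rng.sample(differing(x, y), k):
            u[p] ^= 1
        return u

    def mix_two(x, y):
        # x, y differ in two positions: take one from y and the other from x
        u = list(x)
        p = rng.choice(differing(x, y))
        u[p] = y[p]
        return u

    def improve(x, y, k):
        # neighbour x2 of x with (w.h.p.) d(x2) = k + 1, given an opposing k-pair
        agree = [p for p in range(n) if x[p] == y[p]]
        for _ in range(max_tries):
            x2 = list(x)
            x2[rng.choice(agree)] ^= 1
            # if d(x2) = k - 1, flipping k - 1 differing bits reaches d = 0
            # (fitness n/2) with probability at least 1/2 per test
            if all(query(flip_where_differ(x2, y, k - 1)) != half for _ in range(reps)):
                return x2
        raise RuntimeError("no improvement found")

    try:
        while True:                      # a point m on the middle layer
            m = [rng.randint(0, 1) for _ in range(n)]
            if query(m) == half:
                break
        while True:                      # an opposing 1-pair around m
            i, j = rng.sample(range(n), 2)
            x, y = list(m), list(m)
            x[i] ^= 1
            y[j] ^= 1
            if all(query(mix_two(x, y)) == half for _ in range(reps)):
                break
        for k in range(1, half):         # opposing k-pair -> opposing (k+1)-pair
            x, y = improve(x, y, k), improve(y, x, k)
        query(x)                         # (x, y) is now an opposing n/2-pair,
        query(y)                         # so one of these raises Found
    except Found as hit:
        return hit.args[0], evals
    return None, evals

rng = random.Random(12)
z = [rng.randint(0, 1) for _ in range(16)]
solution, evaluations = binary_style_optimizer(make_extreme_jump(z), 16, rng)
print(solution == z, evaluations)
```

Each of the $n/2 - 1$ growth steps costs $O(\log n)$ expected evaluations in this sketch, which is where the $O(n \log n)$ total comes from.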

Note that by using different constants in the