## Abstract

We extend previous results concerning black box search algorithms, presenting new theoretical tools related to no free lunch (NFL) where functions are restricted to some benchmark (that need not be permutation closed), algorithms are restricted to some collection (that need not be permutation closed) or limited to some number of steps, or the performance measure is given. Minimax distinctions are considered from a geometric perspective, and basic results on performance matching are also presented.

## 1. Introduction

Black box search algorithms are applied to search and optimization problems without exploiting a priori information about the objective function (such algorithms acquire information about the objective function by evaluating it). Evolutionary algorithms, and other nature-inspired search algorithms, are frequently implemented as black box algorithms, and have been regarded as robust optimization techniques (Holland, 1975; Goldberg, 1989; Mitchell, 1998; R. Haupt and S. Haupt, 2004). In this paper, we refer to nonrepeating/nonrevisiting black box search algorithms simply as algorithms.

The original no free lunch (NFL) theorems were expressed within a probabilistic framework and were used to investigate whether black box algorithms could differ in robustness (Wolpert and Macready, 1995). Sir Francis Bacon observed: “Men suppose that their reason has command over their words; still it happens that words in return exercise authority on reason” (Allibone, 1879). That opinion is apropos to NFL; it has been demonstrated that probability is inadequate to confirm unconstrained NFL results in the general case (Auger and Teytaud, 2007, 2010). In that sense, probability is an unfortunate historical artifact. With one exception, this paper abandons probability, preferring a set-theoretic framework which obviates measure-theoretic limitations by dispensing with probability altogether (Schumacher, 2000; Rowe et al., 2009).^{1} That exception is the analysis of randomized algorithms.

The original NFL theorems imply that, if an algorithm's performance were plotted against all objective functions (over a fixed but arbitrary finite domain and codomain), the resulting performance graph would simply be a permutation of the performance graph of any other algorithm. In that sense, no algorithm—including classical genetic algorithms—can be more robust than any other. Results for infinite domains or codomains have a similar nature but are of a more technical character, see Rowe et al. (2009). NFL-like results can be found in diverse areas, including machine learning (Domingos, 1998; Wolpert, 2001), induction and combinatorial problems (Woodward and Neil, 2003), multi-objective optimization (Corne and Knowles, 2003), Boolean functions (Griffiths and Orponen, 2005), discrete Laplace operators (Wardetzky et al., 2007), and others.
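The permutation-of-performance-graphs claim can be made concrete by brute force. The following sketch is not from the paper; it is a toy illustration with an assumed three-point search space `X`, binary codomain `Y`, and two assumed sweep algorithms (`asc`, `desc`). It enumerates every objective function over the small domain and checks that the two algorithms produce the same multiset of performance vectors:

```python
from itertools import product

# Toy setting: a three-point search space and binary codomain (assumed for
# illustration; the paper's domain and codomain are arbitrary finite sets).
X, Y = (0, 1, 2), (0, 1)

def run(g, f):
    """Run the deterministic algorithm with search operator g on f to
    convergence, returning the performance vector (the y components)."""
    T = ()
    while len(T) < len(X):
        x = g(T)                 # g proposes an unvisited point
        T += ((x, f[x]),)
    return tuple(y for _, y in T)

# Two simple nonrepeating algorithms: ascending and descending sweeps of X.
asc  = lambda T: min(x for x in X if x not in [p[0] for p in T])
desc = lambda T: max(x for x in X if x not in [p[0] for p in T])

def perf_vectors(g, functions):
    """Sorted multiset of performance vectors over a collection of functions."""
    return sorted(run(g, f) for f in functions)

# Enumerate ALL functions f: X -> Y and compare performance multisets.
all_f = [dict(zip(X, ys)) for ys in product(Y, repeat=len(X))]
print(perf_vectors(asc, all_f) == perf_vectors(desc, all_f))  # True
```

Over all 8 functions, each algorithm's multiset of performance vectors is the full set of 8 tuples, so one graph is indeed a permutation of the other.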

Strong preconditions in the original NFL theorems constrained their application. Droste et al. (1999) observed that the likelihood of encountering optimization problems is nonuniform; if average performance is to be a meaningful statistic to summarize aggregate optimization behavior, then it should be a weighted average. Therefore, NFL—which assumed a uniform distribution^{2}—was inappropriate for that purpose.

Later NFL variants were established which assumed that the set of functions, the distribution of functions, or the relationship between those functions and algorithms considered had special properties: permutation closure in the case of the Sharpened NFL (Schumacher, 2000), specialized distributions in the case of the nonuniform NFL^{3} (Streeter, 2003; Igel and Toussaint, 2004), and focused sets in the case of focused NFL (Whitley and Rowe, 2008).

It has been observed that permutation closure is an assumption rarely satisfied (Igel and Toussaint, 2001; Streeter, 2003), which constrains application of sharpened NFL and nonuniform NFL. On the one hand, such observations do not invalidate NFL results, but caution against their misinterpretation or misapplication. On the other hand, Droste et al. (2002) provide counterpoint to the idea that robust search flourishes where NFL does not by establishing an almost NFL theorem (the success of search strategies on functions is paid for with bad behavior on many other functions of similar complexity). Whatever implications that NFL-like theorems have for search, the mathematical properties concerning algorithms which such theorems reveal need not be conflated with applications. The position that a mathematical formalization and investigation of NFL-like properties may be conducted unencumbered by applications has precedent (Rowe et al., 2009), and that is the outlook taken by this paper.

The introductory remarks above roughly sketch the context for the contributions made by this paper. Given the widespread use of benchmarks to assess performance, it is natural to consider NFL from a benchmark perspective. Nevertheless, little has yet been said concerning focused sets. Whitley and Rowe (2008) study focused sets—benchmarks compatible with NFL results for given algorithms limited to some number of steps—but do not completely characterize them. In this paper, results concerning focused sets are simplified and extended, and the sharpened NFL emerges as a special case. Rowe and Vose (2011) obtain results concerning the use of arbitrary benchmarks to evaluate randomized algorithms. The underpinnings of their analysis—which their paper used but did not establish—are presented in this paper. We show that a sharpened NFL is only a few steps from their results, and then generalize to NFL results specific to particular performance measures. Performance matching and minimax distinctions (Wolpert and Macready, 1997) are also briefly considered.

## 2. Theoretical Background

Let *f* : *X* → *Y* be an objective function, and let *y_i* denote *f*(*x_i*). The sets *X* and *Y* are arbitrary but fixed, while the function *f* may vary. A trace *T* corresponding to *f* is a sequence of elements ⟨*x_i*, *y_i*⟩ from *f* (a sequence is a function mapping index *i* to element *s_i*), where the *x_i* components are unique. To subscript (the name of) a trace *T* with an integer designates it to have that many elements. A performance vector is a sequence of values from *Y*. We use the following notation: the performance vector associated with *T* is *T_y* (the sequence of *y* components of *T*), and *T_x* denotes the sequence of *x* components. The range of any sequence *S* is the set of its elements. Trace *T* is total if the range of *T_x* is *X* (equivalently, *T* has |*X*| elements). A trace that is not total is said to be partial. Let the set of all partial traces corresponding to *f* be given, and let the set of all partial traces be their union over all *f*.

A search operator is a function *g* mapping partial traces to elements of *X* such that *g*(*T*) is not among the *x* components of *T*. A nonrepeating/nonrevisiting deterministic black box search algorithm *a* corresponds to a search operator *g*, and will be referred to simply as an algorithm. Algorithm *a* applied to function *f* is denoted by *a_f*, and maps traces to traces via *a_f*(*T*) = *T* | ⟨*g*(*T*), *f*(*g*(*T*))⟩, where | denotes concatenation. In procedural terms, algorithm *a* runs on function *f* by beginning with the empty trace, and repeatedly applying *a_f*; we denote by *a*(*f*) the total trace produced (by running *a* on *f* to convergence). Note that *a*(*f*) is total by construction. Algorithms *a* and *b* are regarded as equal if and only if *a*(*f*) = *b*(*f*) for all *f*.
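The trace and search-operator machinery above can be sketched in a few lines of code. This is a toy sketch, not from the paper; the names `X`, `run`, and `asc` are assumed for illustration:

```python
X = (0, 1, 2)   # an assumed toy search space

def run(g, f):
    """Run the algorithm corresponding to search operator g on f, starting
    from the empty trace and concatenating (g(T), f(g(T))) until total."""
    T = ()
    while len(T) < len(X):          # a trace is total once it has |X| elements
        x = g(T)
        assert x not in [p[0] for p in T]   # search operators never revisit
        T += ((x, f[x]),)
    return T

# A search operator: always propose the smallest unvisited point.
asc = lambda T: min(x for x in X if x not in [p[0] for p in T])

f = {0: 1, 1: 0, 2: 1}
T = run(asc, f)
print(T)                       # ((0, 1), (1, 0), (2, 1)) -- the total trace
print(tuple(y for _, y in T))  # (1, 0, 1) -- its performance vector T_y
```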

A trend in NFL theory is the progression from permutations of the search space (Radcliffe and Surry, 1995) to group actions on functions and algorithms. This can be seen at least as far back as Schumacher (2000). We will use the notation Π(*X*) to refer to the set of permutations (bijections) on *X*. Given a permutation σ ∈ Π(*X*), the permutation of *f* by σ is the function σ*f* defined by (σ*f*)(*x*) = *f*(σ⁻¹(*x*)). Thus permutations of *X* may also be considered as acting on functions via *f* ↦ σ*f*; such σ are therefore regarded as permutations of the set of functions.
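The action of a domain permutation on a function simply relocates values without altering them. A minimal sketch (toy names assumed, not from the paper):

```python
# Toy illustration: permuting a function by permuting its domain.
X = (0, 1, 2)
f = {0: 10, 1: 20, 2: 30}

sigma = {0: 1, 1: 2, 2: 0}                    # a permutation of X
sigma_inv = {v: k for k, v in sigma.items()}

# (sigma f)(x) = f(sigma^{-1}(x)): values are relocated, never altered.
sigma_f = {x: f[sigma_inv[x]] for x in X}
print(sigma_f)   # {0: 30, 1: 10, 2: 20}

# The multiset of values is preserved, so sigma permutes the set of functions.
print(sorted(sigma_f.values()) == sorted(f.values()))   # True
```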

A key observation dating back at least as far as Radcliffe and Surry (1995) is the Uniqueness Theorem.

*If a(f) = a(f′), then f = f′; i.e., the map f ↦ a(f) is injective. Hence, the map f ↦ a(f) is invertible for every algorithm a*.

Although the function *f* ↦ *a*(*f*) does not necessarily map onto the set of all traces, the Uniqueness Theorem together with the Completeness Theorem (below) imply *f* ↦ *a*(*f*) is a bijection from the set of functions to the set of total traces (the Completeness Theorem is actually a corollary of Theorem 1, because the finite domain and codomain of the map have the same size).

*The map f ↦ a(f) is surjective onto the set of total traces*.

The map *f* ↦ *a*(*f*) is abbreviated by *a*, its inverse is denoted by *a*⁻¹, and juxtaposition denotes composition of such maps. Thus *a*⁻¹*b* (composing *b* with *a*⁻¹) yields a bijection on the set of functions, and in particular is a member of the group of permutations of the set of functions under composition.

The permutation of a trace *T* by σ, denoted σ*T*, permutes the *x* components of *T* by applying σ to each of them while leaving the *y* components unchanged. The permutation of algorithm *a* by σ is the algorithm σ*a* corresponding to the search operator σ*g* defined by (σ*g*)(*T*) = σ(*g*(σ⁻¹*T*)), where *g* is the search operator of *a*. It follows that (σ*a*)_{σ*f*}(σ*T*) = σ(*a_f*(*T*)) for all partial traces *T*. The relationship between the permutation of an algorithm and the permutation of a function was given by Schumacher (2000) as the Duality Theorem:

(σ*a*)(σ*f*) = σ(*a*(*f*))

Projecting traces in the displayed equality above to their *y* components yields the following Corollary: since σ leaves *y* components unchanged, (σ*a*)(σ*f*)_y = *a*(*f*)_y.

Thus, the set of performance vectors obtained over a benchmark is independent of the algorithm which generated it (iff the benchmark is permutation closed), and any performance measure of that set of performance vectors is likewise independent of the algorithm.

## 3. Focused No Free Lunch

Focused NFL (Whitley and Rowe, 2008) concerns benchmarks. The basic question is: if attention is restricted to a subset of algorithms, then for what benchmarks (sets of functions) must the algorithms have equal performance? Benchmark *F* is focused with respect to a set *A* of algorithms iff the set of performance vectors {*a*(*f*)_y : *f* ∈ *F*} is independent of *a* ∈ *A*. In other words, algorithms in *A* have the same set of performance vectors over the benchmark. In particular, every performance measure—which therefore is a function of the same set of performance vectors—must necessarily yield the same result over the benchmark, for each algorithm in *A*. It is possible, however, that algorithms in *A* may exhibit the same performance on different functions of the benchmark. Note that sharpened NFL can be rephrased as follows:

*Benchmark F is focused wrt the set of all algorithms iff it is permutation closed*.
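The rephrased sharpened NFL can be checked directly in a toy setting. The sketch below (names `X`, `asc`, `desc`, `closure` are assumed, not from the paper) shows that the permutation closure of a single function is focused with respect to two sweep algorithms, while the unclosed singleton benchmark is not:

```python
from itertools import permutations

X = (0, 1, 2)   # assumed toy search space

def run_y(g, f):
    """Performance vector of the algorithm with search operator g on f."""
    T = ()
    while len(T) < len(X):
        x = g(T)
        T += ((x, f[x]),)
    return tuple(y for _, y in T)

asc  = lambda T: min(x for x in X if x not in [p[0] for p in T])
desc = lambda T: max(x for x in X if x not in [p[0] for p in T])

def perf_vectors(g, bench):
    return sorted(run_y(g, f) for f in bench)

def closure(fs):
    """Permutation closure: every relabeling of the domain of each f."""
    out = set()
    for f in fs:
        for s in permutations(X):
            out.add(tuple(f[s[x]] for x in X))
    return [dict(zip(X, t)) for t in sorted(out)]

f0 = {0: 0, 1: 0, 2: 1}
closed = closure([f0])          # permutation-closed benchmark containing f0
print(perf_vectors(asc, closed) == perf_vectors(desc, closed))   # True
print(perf_vectors(asc, [f0]) == perf_vectors(desc, [f0]))       # False
```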

The main result in Whitley and Rowe (2008) is Lemma 1, which asserts:

*Let A be a set of algorithms and let f be a function. The orbit of f under the group generated by {b⁻¹a : a, b ∈ A} is the smallest benchmark containing f which is focused wrt A*.

Theorem 6 connects with the concept of closure under permutation; the generated group is a group of permutations (of the set of functions) with respect to which the orbit is closed. The implication in Whitley and Rowe's result (Theorem 6 above) can be strengthened to an iff as follows.

*Benchmark F is focused wrt a set A of algorithms iff it is closed wrt the group generated by {b⁻¹a : a, b ∈ A}*.

It should be noted that the focused NFL demonstrated above (FNFL) can be obtained as a corollary of Whitley and Rowe's Lemma 1 (remarks following their Lemma 1 intimate how). Their result is strengthened in the sense that whereas they presented an implication, FNFL is stated as an iff. Moreover, a significant contribution of FNFL to NFL theory is clarity and simplicity in both statement and proof.

### 3.1. Algorithms Limited to a Fixed Number of Steps

To further generalize NFL, Whitley and Rowe (2008) consider algorithms restricted to a fixed number of steps. Their results are expressed in terms of a pseudocoded computer algorithm. We simplify, extend, and express those results from a set-theoretic point of view.

Running algorithm *a* on *f* for a given number of steps yields the corresponding initial segment of *a*(*f*). Extend performance to multisets^{4} by collecting the resulting performance vectors with multiplicity. A benchmark *F* is focused wrt a set *A* of algorithms and a step limit iff the multiset of step-limited performance vectors over *F* is independent of the algorithm chosen from *A*. Of particular relevance is an associated set of permutations (defined in terms of *A* and the step limit).

Suppose one were interested in constructing a benchmark containing *f* which is focused wrt a given set of algorithms and step limit. The following corollary provides one answer.

Corollary 2 generalizes the corresponding algorithmic construction presented in Whitley and Rowe (2008); whereas they restrict attention to path-search algorithms—those for which the sequence of points visited is independent of *f*—there is no such restriction in Corollary 2. The step-limited focused NFL theorem has no counterpart in Whitley and Rowe (2008), but it reflects (in a sense clarified by the following Corollary and Theorem) a cyclic aspect of focused benchmarks which, if not formalized, was certainly alluded to in their paper.

Observe that the benchmarks of Corollary 2 are cyclic. In fact, every focused benchmark can be decomposed into a disjoint union of cyclic benchmarks.
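A tiny cyclic benchmark makes the point. In the sketch below (toy names assumed, not the paper's construction), a two-element benchmark—a function together with its copy under exchanging the endpoints of the domain—has identical step-limited performance multisets for two sweep algorithms at every step limit:

```python
X = (0, 1, 2)   # assumed toy search space

def run_k(g, f, k):
    """Performance vector after running for only k steps."""
    T = ()
    for _ in range(k):
        x = g(T)
        T += ((x, f[x]),)
    return tuple(y for _, y in T)

asc  = lambda T: min(x for x in X if x not in [p[0] for p in T])
desc = lambda T: max(x for x in X if x not in [p[0] for p in T])

def step_multiset(g, bench, k):
    return sorted(run_k(g, f, k) for f in bench)

# A two-element benchmark that is cyclic under exchanging the endpoints of X:
f0 = {0: 0, 1: 1, 2: 1}
f1 = {0: 1, 1: 1, 2: 0}        # f0 with the roles of 0 and 2 exchanged
bench = [f0, f1]

# The step-limited performance multisets agree for every step limit,
# although neither singleton {f0} nor {f1} is focused wrt {asc, desc}.
print(all(step_multiset(asc, bench, k) == step_multiset(desc, bench, k)
          for k in (1, 2, 3)))   # True
```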

If *f*₁ = *f*₀, then the cycle is complete. Otherwise, continue to choose functions (by appealing to the invariant) such that each matches the performance of its predecessor, until a previously chosen function recurs. At that point, there exists *h* closing the cycle. Let *C* be the collection of functions on the left-hand sides displayed above. Reindexing according to the order of choice shows *C* to be cyclic. Moreover, the invariant is preserved because removing *C* from the (previous) index set has the effect of removing identical objects (the corresponding performance vectors) from both sides of the invariant.

Combining previous results yields the following decomposition theorem.

## 4. Randomized Algorithms and Benchmarks

A randomized algorithm θ applied to *f* yields *a*(*f*) with probability θ_a. In procedural terms, randomized algorithm θ runs on *f* by choosing algorithm *a* with probability θ_a and then applying *a* to *f*. Note that the collection of randomized algorithms contains the set of (deterministic) algorithms. Randomized algorithms are identified with elements θ of the simplex of probability vectors indexed by algorithms (nonnegative components summing to 1). Given λ ∈ [0, 1] and randomized algorithms θ and θ′, the randomized algorithm λθ + (1 − λ)θ′ is defined componentwise. Procedurally, running λθ + (1 − λ)θ′ on *f* amounts to choosing θ with probability λ (and θ′ otherwise) and then running the chosen randomized algorithm on *f*. Randomized algorithms θ and θ′ are equivalent, denoted by θ ≃ θ′, iff for all *f* and every trace *T* they produce *T* with equal probability.

A performance measure *m* is extended to randomized algorithms in the natural way; the performance of θ on *f* as measured by *m* is the expected value of *m*(*a*(*f*)_y) with *a* chosen according to θ. Note that *m* is polymorphic: measure *m* maps performance vectors to values, whereas the performance of θ on *f* as measured by *m* is the corresponding expected value. Where necessary, dependence on *m* is indicated by subscript.

*The performance of θ on f as measured by m is linear in its first argument θ*.
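Linearity in the first argument can be observed numerically. The sketch below is a toy illustration (names `asc`, `desc`, and the measure `m` are assumed, not from the paper); dyadic mixing weights keep floating-point arithmetic exact:

```python
X = (0, 1, 2)   # assumed toy search space

def run_y(g, f):
    T = ()
    while len(T) < len(X):
        x = g(T)
        T += ((x, f[x]),)
    return tuple(y for _, y in T)

asc  = lambda T: min(x for x in X if x not in [p[0] for p in T])
desc = lambda T: max(x for x in X if x not in [p[0] for p in T])

def m(pv):
    """An illustrative measure: the index at which the best value appears."""
    return pv.index(max(pv))

def performance(theta, f):
    """Expected performance of randomized algorithm theta (a dict mapping
    deterministic algorithms to probabilities) on f, as measured by m."""
    return sum(p * m(run_y(g, f)) for g, p in theta.items())

f = {0: 1, 1: 0, 2: 0}
theta, theta2 = {asc: 1.0}, {desc: 1.0}
mix = {asc: 0.5, desc: 0.5}     # the convex combination of theta and theta2

# Linearity: performance of the mixture is the mixture of performances.
lhs = performance(mix, f)
rhs = 0.5 * performance(theta, f) + 0.5 * performance(theta2, f)
print(lhs == rhs)   # True
```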

### 4.1. Benchmark Symmetries

The expected average performance of θ over a benchmark^{5} is the average, over the functions in the benchmark, of the performance of θ on each (the usual conventions concerning empty sums and averages are employed). An immediate consequence of the Expected Duality Theorem (above) is Theorem 3 of Rowe and Vose (2011):

The above is an FNFL-type result: *Every randomized algorithm has the same expected average performance over a permutation-closed benchmark*.
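This FNFL-type statement can be verified on a toy example. The sketch below (assumed names; not the paper's construction) takes the permutation closure of a single function as the benchmark and checks that two different randomized algorithms have the same expected average performance (dyadic weights keep the arithmetic exact):

```python
from itertools import permutations

X = (0, 1, 2)   # assumed toy search space

def run_y(g, f):
    T = ()
    while len(T) < len(X):
        x = g(T)
        T += ((x, f[x]),)
    return tuple(y for _, y in T)

asc  = lambda T: min(x for x in X if x not in [p[0] for p in T])
desc = lambda T: max(x for x in X if x not in [p[0] for p in T])

def m(pv):
    return pv.index(max(pv))     # illustrative measure: when the best is found

def avg_expected(theta, bench):
    """Expected average performance of randomized algorithm theta over bench."""
    return sum(p * m(run_y(g, f))
               for f in bench for g, p in theta.items()) / len(bench)

# Permutation closure of a single function, used as the benchmark.
base = (0, 0, 1)
bench = [dict(zip(X, t))
         for t in {tuple(base[s[x]] for x in X) for s in permutations(range(3))}]

# Two different randomized algorithms; dyadic weights keep arithmetic exact.
print(avg_expected({asc: 1.0}, bench)
      == avg_expected({asc: 0.25, desc: 0.75}, bench))   # True
```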

The support of randomized algorithm θ is the set of algorithms *a* for which θ_a is nonzero. A deterministic algorithm is a randomized algorithm whose support is a singleton set; such support is called trivial. A nontrivial randomized algorithm has nontrivial support. A deterministic algorithm has a unique nonzero component; to streamline notation, *a* may be used to denote the randomized algorithm whose support is {*a*}. Measure *m* is constant wrt a benchmark and support set iff every algorithm in the support has the same expected average performance over the benchmark.

Suppose *m* is not constant wrt the benchmark and the support of θ. Since θ is an interior point of the face of the simplex determined by its support (with respect to the subspace topology induced from that face), there exists some open set *B* within that face such that θ ∈ *B*. Since the linear performance function is not constant on *B* (it otherwise would be constant on the whole face), it attains values larger and smaller than its value at θ on *B* (Roberts and Varberg, 1973).

Here the bracket [*expression*] is 1 if *expression* is true, and 0 otherwise. Consider a measure defined by means of such an indicator.^{6} Note that this *m* is nonconstant wrt the benchmark and support, since its value depends on the choice of algorithm.

### 4.2. Performance Measures

In this section, the development takes into account the measure *m* as well as the set of test functions, and an FNFL-type result emerges which likewise depends upon *m*. Given a benchmark, the collection of benchmark invariants^{7} is the set of permutations which preserve expected average performance over the benchmark as measured by *m*.

*The collection of benchmark invariants is a group (under composition) for every measure m and benchmark. For every randomized algorithm θ, the randomized algorithms in the orbit of θ under that group all have the same expected average performance over the benchmark*.

Theorem 11 is an FNFL-type result as follows. Let the benchmark be given, for some step limit and some measure *m*. For every randomized algorithm θ, the step-limited expected average performance—as measured by *m*—of the collection of algorithms in the orbit of θ is identical over the benchmark. It should be appreciated that the benchmark invariants are completely independent of θ, no assumptions whatsoever have been made concerning the support of θ, and the choice of θ is arbitrary.

It is interesting to contrast Theorem 12 with the closing remarks of Section 3.1. Even when expected average performance agrees, it does not follow that the benchmark is invariant under the action of its invariants; equality need not hold pointwise. But the message of Theorem 12 is precisely that invariance is regained, at the price of trading equality, =, for similarity, defined by having identical invariants.^{8}

The following is a direct consequence of Theorem 11 and Linearity.

### 4.3. Matching Performance

A natural question is whether the performance of one randomized algorithm can always be matched by another under a given measure *m*; Wolpert and Macready (1997) point out negative examples which they term minimax distinctions. This section analyzes the following predicate, which asserts that the particular measure *m* admits no minimax distinctions. It turns out that this predicate makes an interesting geometric claim. Let **1** denote the vector of ones, and let ∥ be the binary infix predicate asserting that its vector arguments are parallel.

Let ᵀ denote transpose, and let *w* = θ − θ′ for randomized algorithms θ and θ′. Observe that **1**ᵀθ = **1**ᵀθ′ = 1, hence **1**ᵀ*w* = 0. Conversely, given *w* with **1**ᵀ*w* = 0, define randomized algorithms θ and θ′ from the positive and negative parts of *w* (suitably normalized); both belong to the simplex, and therefore represent randomized algorithms, and θ − θ′ is parallel to *w*. It follows that quantifying over θ and θ′ can be replaced with quantifying over *w*, in the sense that predicate Equation (1) is equivalent to a condition on vectors *w* orthogonal to **1**. Define the vector *u* and the set of vectors *S*(*w*) as displayed above. Using that notation—and trading quantification over benchmarks in Equation (4) for quantification over *u*—transforms Equation (4) into Equation (5), which asserts that for every *w*, any such *u* is orthogonal to some member of *S*(*w*). It follows that Equation (5) is equivalent to the assertion that, for every *w*, the orthogonal complement of **1** is contained in a finite union of subspaces, Equation (6). Observe that the co-dimension of the left-hand side of Equation (6) is 1, whereas the co-dimension of each subspace comprising the right-hand side is either 1 or 2. Hence, the only way Equation (6) can be true—since the union is finite—is that the co-dimension is 1 for some member of the union; thus *u* is parallel to the corresponding member of *S*(*w*). In other words, the parallelism condition holds for some member of *S*(*w*).

Consider, in the proof above, that were the co-dimension always 2, the complement of the union on the right-hand side of Equation (6) would be dense and open in the orthogonal complement of **1** (subspaces are closed). Since ℚ is a dense subfield of ℝ, that complement contains rational points. Therefore, the components of randomized algorithms can be restricted to ℚ without altering Theorem 13 or its proof (except to note that the components are rational). If *m* is likewise required to take values in ℚ, the relevant quantities will also be rational (i.e., everything is computable).

Moreover, the domain over which benchmarks are quantified—for the definition of minimax distinctions in Equation (1) and the condition in Equation (2)—is intentionally unspecified. That domain may be chosen arbitrarily without altering Theorem 13 or its proof. In particular, the benchmarks may be restricted to satisfy an arbitrarily chosen predicate (for instance, they may be restricted to singleton sets).

The next result is that, for every measure, the performance of any randomized algorithm on any benchmark can be matched in a nontrivial way.

*For every measure m, randomized algorithm θ, and benchmark, there exist randomized algorithms θ′ and θ″ matching the performance of θ, where θ′ (or θ″) may be chosen arbitrarily*.

For every *m*, θ, and benchmark, there exists θ′ for which performance matches when θ is nontrivial (nontriviality is essential; when *m* is injective and the benchmark is minimal, the Uniqueness Theorem implies that matching is equivalent to equivalence). Appealing to Linearity and Corollary 3 yields the match. However, if the match were provided by some randomized algorithm equivalent to θ, then Proposition 6 implies θ would be trivial. Alternatively, one could appeal to Theorem 11 with suitable choices; however, that may also fail. If the benchmark contains only the identity function, and the performance measure is chosen accordingly, it then follows that the candidate match fails. The example above demonstrates that the following cannot be improved.

*For every measure m, benchmark, and nontrivial randomized algorithm θ, there exist infinitely many θ′ with matching performance*.

In the first case, θ′ ≠ θ because one is trivial whereas the other is not; moreover, infinitely many alternatives are provided as the mixing parameter varies.

That leaves the remaining case. Since θ is nontrivial, there exists an algorithm in its support which may be replaced by a surrogate. Since the surrogate matches it as far as performance is concerned, performance is preserved; moreover, the resulting θ′ differs from θ since they have different support. Infinitely many θ′ are provided as the mixing parameter varies.

Note that the components of randomized algorithms can be restricted to ℚ without altering Proposition 8 or its proof (the mixing parameter is necessarily rational if performance measures are required to take values in ℚ).
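The flavor of performance matching can be seen in a toy example (assumed names, not the paper's construction): on a constant objective every algorithm produces the same performance vector, so a deterministic algorithm is matched by a continuum of nontrivial mixtures, one for each rational mixing weight:

```python
X = (0, 1, 2)   # assumed toy search space

def run_y(g, f):
    T = ()
    while len(T) < len(X):
        x = g(T)
        T += ((x, f[x]),)
    return tuple(y for _, y in T)

asc  = lambda T: min(x for x in X if x not in [p[0] for p in T])
desc = lambda T: max(x for x in X if x not in [p[0] for p in T])

def m(pv):
    return sum(pv)               # illustrative measure: total of observed values

def performance(theta, f):
    return sum(p * m(run_y(g, f)) for g, p in theta.items())

# On a constant objective, every algorithm sees the same performance vector,
# so a deterministic algorithm is matched by many nontrivial mixtures.
f = {0: 5, 1: 5, 2: 5}
theta = {asc: 1.0}                                   # trivial support
matches = [performance({asc: 1 - lam, desc: lam}, f) == performance(theta, f)
           for lam in (0.25, 0.5, 0.75)]
print(all(matches))   # True
```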

## 5. Conclusion

Definitions permit and theorems address algorithms which are not Turing computable. It should be appreciated that general results which hold even for algorithms that need not be Turing computable necessarily specialize to algorithms that are. Readers who embrace transfinite induction will no doubt appreciate the probability-free treatment in Section 3 which is conducted within a framework amenable to generalization (Rowe et al., 2009).

As a concession to readers who desire computability, the domain and codomain were kept finite, and conclusions were stated as if randomized algorithms had components in ℚ (for instance, Propositions 4 and 8 could have claimed uncountably many if components in ℝ were permitted). Moreover, care was taken to point out the fact that Theorem 13 and Proposition 8 in no way require nonrational components.

In keeping with the outlook that the investigation of NFL-like properties need not be conflated with or encumbered by applications, this paper has focused on simplifying and extending theoretical results, and presenting new mathematical tools (Theorems, Corollaries, Lemmas, Propositions). Focused NFL (Theorem 7) simplifies and extends the previous account by Whitley and Rowe (2008), and FNFL (Theorem 8) establishes the analogue for algorithms restricted to a fixed number of steps. Corollary 2 and Theorem 9 formalize cyclic aspects of focused benchmarks noted in Whitley and Rowe (2008).

A trend in NFL theory is the progression from permutations of the search space to group actions on functions and algorithms; our results and methods have for the most part followed that path. A series of propositions sort out how permutations act on randomized algorithms, establish linearity of expected average performance, and lead up to Expected Duality (Theorem 10). The demonstration of NFL for randomized algorithms by Rowe and Vose (2011) is first reviewed, and then sharpened (Proposition 4). By placing attention on the particular measure used, benchmark symmetries are extended to benchmark invariants whose properties admit a FNFL-type interpretation (Theorem 11). The theoretical machinery of Rowe and Vose (2011) is also generalized.

The paper concludes with a geometric characterization for minimax distinctions (Wolpert and Macready, 1997) and basic results on performance matching.

## Acknowledgments

The authors would like to thank Suzanne Sadedin, Marte Ramírez, and anonymous referees for comments on this manuscript. Ideas leading to Theorem 13 were initially formulated while M. D. Vose was visiting The University of Birmingham; he is grateful for the gracious support provided by the School of Computer Science, and for valuable discussions with Jon E. Rowe and Alden H. Wright during that visit. This work was supported in part by NIH grant R01GM056693, and by a Howard Hughes Medical Institute Collaborative Innovation Award.

## References

## Notes

^{1}

In the original context where probabilistic language is adequate (finite domains and codomains), it adds no essential value; probabilistic jargon may be expanded into set-theoretic definitions and simplified away.

^{3}

An interesting—though widely unrecognized—fact is that nonuniform NFL can be obtained from sharpened NFL by a suitable choice of performance measure (Rowe et al., 2009).

^{4}

A multiset—delimited with doubled brackets ⟦ ⟧—extends the set concept by allowing elements to appear more than once.

^{5}

Previous use of *G* is superseded by the notation defined here.

^{6}

Schumacher (2000) introduced this measure, but with different notation.

^{7}

Previous use of *G* is superseded by the notation defined here.

^{8}

Previous use of this symbol is superseded by the notation defined here.