## Abstract

We extend previous results concerning black box search algorithms, presenting new theoretical tools related to no free lunch (NFL) where functions are restricted to some benchmark (that need not be permutation closed), algorithms are restricted to some collection (that need not be permutation closed) or limited to some number of steps, or the performance measure is given. Minimax distinctions are considered from a geometric perspective, and basic results on performance matching are also presented.

## 1.  Introduction

Black box search algorithms are applied to search and optimization problems without exploiting a priori information about the objective function (such algorithms acquire information about the objective function by evaluating it). Evolutionary algorithms, and other nature-inspired search algorithms, are frequently implemented as black box algorithms, and have been regarded as robust optimization techniques (Holland, 1975; Goldberg, 1989; Mitchell, 1998; R. Haupt and S. Haupt, 2004). In this paper, we refer to nonrepeating/nonrevisiting black box search algorithms simply as algorithms.

The original no free lunch (NFL) theorems were expressed within a probabilistic framework and were used to investigate whether black box algorithms could differ in robustness (Wolpert and Macready, 1995). Sir Francis Bacon observed: “Men suppose that their reason has command over their words; still it happens that words in return exercise authority on reason” (Allibone, 1879). That opinion is apropos to NFL; it has been demonstrated that probability is inadequate to confirm unconstrained NFL results in the general case (Auger and Teytaud, 2007, 2010). In that sense, probability is an unfortunate historical artifact. With one exception, this paper abandons probability, preferring a set-theoretic framework which obviates measure-theoretic limitations by dispensing with probability altogether (Schumacher, 2000; Rowe et al., 2009). That exception is the analysis of randomized algorithms.

The original NFL theorems imply that, if an algorithm's performance were plotted against all objective functions (over a fixed but arbitrary finite domain and codomain), the resulting performance graph would simply be a permutation of the performance graph of any other algorithm. In that sense, no algorithm—including classical genetic algorithms—can be more robust than any other. Results for infinite domains or codomains have a similar nature but are of a more technical character, see Rowe et al. (2009). NFL-like results can be found in diverse areas, including machine learning (Domingos, 1998; Wolpert, 2001), induction and combinatorial problems (Woodward and Neil, 2003), multi-objective optimization (Corne and Knowles, 2003), Boolean functions (Griffiths and Orponen, 2005), discrete Laplace operators (Wardetzky et al., 2007), and others.
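On a small domain, this claim can be checked exhaustively. The sketch below is a toy model (the "ascending" and "descending" enumeration algorithms are illustrative choices, not from the paper): over all objective functions on a three-point domain, each algorithm's collection of performance vectors is a permutation of the other's.

```python
from itertools import product

X = range(3)                      # search space (toy size)
Y = (0, 1)                        # codomain

def run(g, f):
    """Run nonrepeating black-box algorithm g on objective f; return its performance vector."""
    trace = []
    while len(trace) < len(X):
        x = g(trace)              # search operator picks an unvisited point
        trace.append((x, f[x]))   # black-box evaluation of f
    return tuple(y for _, y in trace)

ascending  = lambda tr: min(set(X) - {x for x, _ in tr})   # enumerate X upward
descending = lambda tr: max(set(X) - {x for x, _ in tr})   # enumerate X downward

functions = list(product(Y, repeat=len(X)))                # all |Y|^|X| objective functions
# Identical multisets of performance vectors over all functions:
assert sorted(run(ascending, f) for f in functions) == \
       sorted(run(descending, f) for f in functions)
```

Any other pair of nonrepeating deterministic algorithms on this toy space passes the same check.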

Strong preconditions in the original NFL theorems constrained their application. Droste et al. (1999) observed that the likelihood of encountering optimization problems is nonuniform; if average performance is to be a meaningful statistic to summarize aggregate optimization behavior, then it should be a weighted average. Therefore, NFL, which assumed a uniform distribution, was inappropriate for that purpose.

Later NFL variants were established which assumed that the set of functions, the distribution of functions, or the relationship between those functions and algorithms considered had special properties: permutation closure in the case of the Sharpened NFL (Schumacher, 2000), specialized distributions in the case of the nonuniform NFL (Streeter, 2003; Igel and Toussaint, 2004), and focused sets in the case of focused NFL (Whitley and Rowe, 2008).

It has been observed that permutation closure is an assumption rarely satisfied (Igel and Toussaint, 2001; Streeter, 2003), which constrains application of sharpened NFL and nonuniform NFL. On the one hand, such observations do not invalidate NFL results, but caution against their misinterpretation or misapplication. On the other hand, Droste et al. (2002) provide counterpoint to the idea that robust search flourishes where NFL does not by establishing an almost NFL theorem (the success of search strategies on functions is paid for with bad behavior on many other functions of similar complexity). Whatever implications that NFL-like theorems have for search, the mathematical properties concerning algorithms which such theorems reveal need not be conflated with applications. The position that a mathematical formalization and investigation of NFL-like properties may be conducted unencumbered by applications has precedent (Rowe et al., 2009), and that is the outlook taken by this paper.

Previous introductory remarks roughly sketch the context for the contributions made by this paper. Given widespread use of benchmarks to assess performance, it is natural to consider NFL from a benchmark perspective. Nevertheless, little has yet been said concerning focused sets. Whitley and Rowe (2008) study focused sets—benchmarks compatible with NFL results for given algorithms limited to some number of steps—but do not completely characterize them. In this paper, results concerning focused sets are simplified and extended, and the sharpened NFL emerges as a special case. Rowe and Vose (2011) obtain results concerning the use of arbitrary benchmarks to evaluate randomized algorithms. The underpinnings of their analysis—which their paper used but did not establish—are presented in this paper. We show that a sharpened NFL is a few steps from their results, and then generalize to NFL results specific to particular performance measures. Performance matching and minimax distinctions (Wolpert and Macready, 1997) are also briefly considered.

## 2.  Theoretical Background

Following Schumacher (2000), let $f : \mathcal{X} \to \mathcal{Y}$ be a function between finite sets (the set of all such functions is denoted by $\mathcal{Y}^{\mathcal{X}}$) and let $y_i$ denote $f(x_i)$. The sets $\mathcal{X}$ and $\mathcal{Y}$ are arbitrary but fixed, while the function $f$ may vary. A trace $T$ corresponding to $f$ is a sequence of elements from $f$ (a sequence is a function mapping $i$ to $s_i$),
$$T = \big\langle \langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots \big\rangle$$
where the $x$ components are unique. To subscript (the name of) a trace $T$ with an integer $\ell$ designates it to have $\ell$ elements,
$$T_\ell = \big\langle \langle x_1, y_1 \rangle, \ldots, \langle x_\ell, y_\ell \rangle \big\rangle$$
A performance vector is a sequence of values from $\mathcal{Y}$. We use the following notation:
$$T_x = \langle x_1, x_2, \ldots \rangle, \qquad T_y = \langle y_1, y_2, \ldots \rangle$$
The performance vector associated with $T$ is $T_y$. The range of any sequence $S$ is denoted by $[S]$. In particular, $[T] \subseteq f$. Trace $T$ is total if $[T] = f$ (equivalently, $[T_x] = \mathcal{X}$). A trace that is not total is said to be partial. Let $\mathcal{T}_f$ be the set of all partial traces corresponding to $f$, and let the set of all partial traces be
$$\mathcal{T} = \bigcup_{f \in \mathcal{Y}^{\mathcal{X}}} \mathcal{T}_f$$
A search operator is a function $g : \mathcal{T} \to \mathcal{X}$ such that $g(T) \notin [T_x]$. A nonrepeating/nonrevisiting deterministic black box search algorithm $\mathcal{A}$ corresponds to a search operator $g$, and will be referred to simply as an algorithm. Algorithm $\mathcal{A}$ applied to function $f$ is denoted by $\mathcal{A}f$, and maps traces to traces
$$\mathcal{A}f(T) = T \,|\, \big\langle g(T), f(g(T)) \big\rangle$$
where $|$ denotes concatenation. In procedural terms, algorithm $\mathcal{A}$ runs on function $f$ by beginning with the empty trace $\langle\,\rangle$, and repeatedly applying $\mathcal{A}f$; we denote by $\mathcal{A}f$ the total trace produced (by running $\mathcal{A}$ on $f$ to convergence). Note that $[\mathcal{A}f] = f$. Algorithms $\mathcal{A}$ and $\mathcal{B}$ are regarded as equal if and only if $\mathcal{A}f = \mathcal{B}f$ for all $f$.

A trend in NFL theory is the progression from permutations of the search space (Radcliffe and Surry, 1995) to group actions on functions and algorithms. This can be seen at least as far back as Schumacher (2000). We will use the notation $\Pi(\mathcal{X})$ to refer to the set of permutations (bijections) on $\mathcal{X}$. Given a permutation $\sigma \in \Pi(\mathcal{X})$, the permutation of $f$ by $\sigma$ is the function $\sigma f$ defined by $\sigma f = f \circ \sigma^{-1}$. Thus permutations of $\mathcal{X}$ may also be considered as permutations of $\mathcal{Y}^{\mathcal{X}}$ via $f \mapsto \sigma f$. Such $\sigma$ are therefore regarded as elements of $\Pi(\mathcal{Y}^{\mathcal{X}})$.

A key observation dating back at least as far as Radcliffe and Surry (1995) is the Uniqueness Theorem.

Theorem 1 (Uniqueness):

If $\mathcal{A}f = \mathcal{A}h$, then $f = h$, i.e., $f \mapsto \mathcal{A}f$ is injective. Hence, the map $f \mapsto \mathcal{A}f$ is invertible for every algorithm $\mathcal{A}$.

Although the function $f \mapsto \mathcal{A}f$ does not necessarily map onto the set of all traces, the Uniqueness Theorem together with the Completeness Theorem (below) imply $f \mapsto (\mathcal{A}f)_y$ is a bijection from $\mathcal{Y}^{\mathcal{X}}$ to $\mathcal{Y}^{|\mathcal{X}|}$ (the Completeness Theorem is actually a corollary of Theorem 1, because the finite domain and codomain of $f \mapsto (\mathcal{A}f)_y$ have the same size).
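Both properties can be verified exhaustively on a toy instance; the "ascending" enumerator below is a hypothetical algorithm used only for illustration.

```python
from itertools import product

X, Y = range(3), (0, 1)

def run(g, f):
    """Run nonrepeating black-box algorithm g on objective f; return (Af)_y."""
    trace = []
    while len(trace) < len(X):
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)

ascending = lambda tr: min(set(X) - {x for x, _ in tr})

functions = list(product(Y, repeat=len(X)))
vectors = {run(ascending, f) for f in functions}
# Injectivity (Uniqueness) plus equal finite cardinalities gives surjectivity (Completeness):
assert len(vectors) == len(functions)              # f -> (Af)_y is injective
assert vectors == set(product(Y, repeat=len(X)))   # and onto all of Y^{|X|}
```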

Theorem 2 (Completeness):

The map $f \mapsto (\mathcal{A}f)_y$ is surjective.

The map $f \mapsto (\mathcal{A}f)_y$ is abbreviated by $\mathcal{A}$, its inverse is denoted by $\mathcal{A}^{-1}$, and juxtaposition denotes composition of such maps. Thus $\mathcal{B}^{-1}\mathcal{A}$ (composing $\mathcal{A}$ with $\mathcal{B}^{-1}$) yields a bijection on $\mathcal{Y}^{\mathcal{X}}$, and in particular is a member of the group $\Pi(\mathcal{Y}^{\mathcal{X}})$ of permutations of $\mathcal{Y}^{\mathcal{X}}$ under composition.

Given $\sigma \in \Pi(\mathcal{X})$, define the corresponding function $\sigma$ which maps traces to traces by $\sigma T = \big\langle \langle \sigma x_1, y_1 \rangle, \langle \sigma x_2, y_2 \rangle, \ldots \big\rangle$; hence $\sigma$ operates on the $x$ components of $T$ by applying $\sigma$ to each of them while leaving the $y$ components unchanged. The permutation of $\mathcal{A}$ by $\sigma$ is the algorithm $\mathcal{A}^{\sigma}$ corresponding to search operator $g^{\sigma}$ defined by $g^{\sigma}(T) = \sigma(g(\sigma^{-1} T))$, where $g$ is the search operator of $\mathcal{A}$. It follows that
$$(\mathcal{A}^{\sigma})^{\tau} = \mathcal{A}^{\tau \sigma}$$
for all $\sigma, \tau \in \Pi(\mathcal{X})$. The relationship between the permutation of an algorithm and the permutation of a function was given by Schumacher (2000) as the Duality Theorem.
Theorem 3 (Duality):
For any algorithm $\mathcal{A}$, permutation $\sigma \in \Pi(\mathcal{X})$, and function $f$,
$$\mathcal{A}^{\sigma}(\sigma f) = \sigma(\mathcal{A} f)$$

Projecting traces in the displayed equality above to their y components yields the following Corollary:

Corollary 1:
For any algorithm $\mathcal{A}$ and permutation $\sigma$,
$$\mathcal{A}^{\sigma} \sigma = \mathcal{A}$$
where on the left-hand side above, $\sigma$ is regarded as an element of $\Pi(\mathcal{Y}^{\mathcal{X}})$.
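Duality and its corollary can be spot-checked numerically. In the sketch below (a toy model; the particular permutation and the "ascending" enumeration algorithm are arbitrary illustrative choices), the permuted algorithm run on the permuted function reproduces the original performance vector for every objective function.

```python
from itertools import product

X, Y = range(3), (0, 1)

def run(g, f):
    trace = []
    while len(trace) < len(X):
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)   # performance vector (Af)_y

sigma = {0: 1, 1: 2, 2: 0}                        # a permutation of X
sigma_inv = {v: k for k, v in sigma.items()}

def perm_func(f):
    # (sigma f)(x) = f(sigma^{-1} x)
    return tuple(f[sigma_inv[x]] for x in X)

def perm_alg(g):
    # search operator of A^sigma: g_sigma(T) = sigma(g(sigma^{-1} T))
    return lambda tr: sigma[g([(sigma_inv[x], y) for x, y in tr])]

ascending = lambda tr: min(set(X) - {x for x, _ in tr})

for f in product(Y, repeat=len(X)):
    assert run(perm_alg(ascending), perm_func(f)) == run(ascending, f)
```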
One contribution made by sharpened NFL (Schumacher, 2000) was to establish the “only if” of the implication in the extension of NFL to permutation closed sets (Radcliffe and Surry, 1995). A set $F$ of functions is closed with respect to a set $P$ of permutations iff
$$\sigma \in P \implies \sigma F = F$$
where $\sigma F = \{\sigma f : f \in F\}$. The set $F$ is permutation closed iff it is closed with respect to $\Pi(\mathcal{X})$.
Theorem 4 (Sharpened NFL):
Let $F \subseteq \mathcal{Y}^{\mathcal{X}}$, and write $\mathcal{A}F$ for the set of performance vectors $\{(\mathcal{A}f)_y : f \in F\}$.
$\mathcal{A}F = \mathcal{B}F$ for all algorithms $\mathcal{A}$ and $\mathcal{B}$ iff $F$ is permutation closed.

Thus, the set $\{(\mathcal{A}f)_y : f \in F\}$ of performance vectors is independent of the algorithm $\mathcal{A}$ which generated it (iff $F$ is permutation closed), and any performance measure of that set of performance vectors is likewise independent of the algorithm.
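Both directions of the sharpened NFL can be probed exhaustively on a small search space. The sketch below is a toy model (the two enumeration algorithms and the benchmark choices are illustrative assumptions, not from the paper); it contrasts a permutation closed benchmark with one that is not.

```python
from itertools import product, permutations

X, Y = range(3), (0, 1)

def run(g, f):
    trace = []
    while len(trace) < len(X):
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)

ascending  = lambda tr: min(set(X) - {x for x, _ in tr})
descending = lambda tr: max(set(X) - {x for x, _ in tr})

def is_perm_closed(bench):
    # closed under (sigma f)(x) = f(sigma^{-1} x) for every permutation sigma of X
    return all(tuple(f[s.index(x)] for x in X) in bench
               for f in bench for s in permutations(X))

closed = {f for f in product(Y, repeat=len(X)) if sum(f) == 1}   # "exactly one optimum"
assert is_perm_closed(closed)
assert {run(ascending, f) for f in closed} == {run(descending, f) for f in closed}

not_closed = {(1, 0, 0)}                                         # a singleton benchmark
assert not is_perm_closed(not_closed)
assert {run(ascending, f) for f in not_closed} != {run(descending, f) for f in not_closed}
```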

## 3.  Focused No Free Lunch

Focused NFL (Whitley and Rowe, 2008) concerns benchmarks. The basic question is: if attention is restricted to a subset of algorithms, then for what benchmarks (sets of functions) must the algorithms have equal performance? Benchmark $F$ is focused with respect to a set $\mathcal{C}$ of algorithms iff $\{(\mathcal{A}f)_y : f \in F\}$ is independent of $\mathcal{A} \in \mathcal{C}$. In other words, algorithms in $\mathcal{C}$ have the same set of performance vectors over the benchmark. In particular, every performance measure—which therefore is a function of the same set of performance vectors—must necessarily yield the same result over the benchmark, for each algorithm in $\mathcal{C}$. It is possible, however, that algorithms in $\mathcal{C}$ may exhibit the same performance on different functions of the benchmark. Note that sharpened NFL can be rephrased as follows:

Theorem 5:

Benchmark $F$ is focused wrt the set of all algorithms iff it is permutation closed.

Definition 1:
For any set $\mathcal{C}$ of algorithms, let $\Gamma(\mathcal{C})$ be the subgroup of $\Pi(\mathcal{Y}^{\mathcal{X}})$ generated by
$$\{ \mathcal{B}^{-1} \mathcal{A} \;:\; \mathcal{A}, \mathcal{B} \in \mathcal{C} \}$$

The main result in Whitley and Rowe (2008) is Lemma 1, which asserts:

Theorem 6:

Let $\mathcal{C}$ be a set of algorithms and let $f \in \mathcal{Y}^{\mathcal{X}}$. The orbit $\Gamma(\mathcal{C}) f = \{\gamma f : \gamma \in \Gamma(\mathcal{C})\}$ is the smallest benchmark containing $f$ which is focused wrt $\mathcal{C}$.

Theorem 6 connects with the concept of closure under permutation; $\Gamma(\mathcal{C})$ is a group of permutations with respect to which $\Gamma(\mathcal{C}) f$ is closed. The implication in Whitley and Rowe's result (Theorem 6 above) can be strengthened to an iff as follows.

Theorem 7 (Focused NFL):

Benchmark $F$ is focused wrt a set $\mathcal{C}$ of algorithms iff it is closed wrt $\Gamma(\mathcal{C})$.

Proof:
If $F$ is focused wrt $\mathcal{C}$, then for $\mathcal{A}, \mathcal{B} \in \mathcal{C}$,
$$\{ (\mathcal{B} f)_y : f \in F \} \;=\; \{ (\mathcal{A} f)_y : f \in F \} \;=\; \{ (\mathcal{B}\, \mathcal{B}^{-1} \mathcal{A}\, f)_y : f \in F \} \;=\; \{ (\mathcal{B} h)_y : h \in \mathcal{B}^{-1} \mathcal{A}\, F \}$$
By the Uniqueness Theorem, the membership predicates of the first and last displayed sets above must be identical; that is, $F = \mathcal{B}^{-1} \mathcal{A}\, F$. In particular, the generators of $\Gamma(\mathcal{C})$ permute $F$; hence, it is closed wrt $\Gamma(\mathcal{C})$. Conversely, if $F$ is closed wrt $\Gamma(\mathcal{C})$, then the first and last sets displayed above are identical, whereas the other equalities are trivial.

It should be noted that the focused NFL demonstrated above (FNFL) can be obtained as a corollary of Whitley and Rowe's Lemma 1 (remarks following their Lemma 1 intimate how). Their result is strengthened in the sense that whereas they presented an implication, FNFL is stated as an iff. Moreover, a significant contribution of FNFL to NFL theory is clarity and simplicity in both statement and proof.
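A benchmark can be focused without being permutation closed, which the following toy sketch illustrates (the two enumeration algorithms and the two-function benchmark are hypothetical choices used only for demonstration).

```python
from itertools import permutations

X = range(3)

def run(g, f):
    trace = []
    while len(trace) < len(X):
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)

ascending  = lambda tr: min(set(X) - {x for x, _ in tr})
descending = lambda tr: max(set(X) - {x for x, _ in tr})

def is_perm_closed(bench):
    # closed under (sigma f)(x) = f(sigma^{-1} x) for every permutation sigma of X
    return all(tuple(f[s.index(x)] for x in X) in bench
               for f in bench for s in permutations(X))

bench = {(1, 0, 0), (0, 0, 1)}
# Focused wrt {ascending, descending}: both produce the same set of performance vectors...
assert {run(ascending, f) for f in bench} == {run(descending, f) for f in bench}
# ...yet the benchmark is not permutation closed ((0, 1, 0) is missing):
assert not is_perm_closed(bench)
```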

### 3.1.  Algorithms Limited to $\ell$ Steps

To further generalize NFL, Whitley and Rowe (2008) consider algorithms restricted to $\ell$ steps. Their results are expressed in terms of a pseudocoded computer algorithm. We simplify, extend, and express those results from a set-theoretic point of view.

Let $\pi_\ell$ project sequences to their first $\ell$ elements,
$$\pi_\ell \big( \langle s_1, s_2, \ldots \rangle \big) = \langle s_1, \ldots, s_\ell \rangle$$
and to streamline notation, abbreviate $\pi_\ell(\mathcal{A} f)$ by $\mathcal{A}_\ell f$. Thus, the performance vector associated with the trace generated by applying algorithm $\mathcal{A}$ to function $f$ for $\ell$ steps is $(\mathcal{A}_\ell f)_y$. Extend $\mathcal{A}_\ell$ to multisets by
$$\mathcal{A}_\ell F = \{\!\{ (\mathcal{A}_\ell f)_y : f \in F \}\!\}$$
A benchmark $F$ is focused wrt a set $\mathcal{C}$ of algorithms and integer $\ell$ iff the multiset $\mathcal{A}_\ell F$ is independent of $\mathcal{A} \in \mathcal{C}$. Of particular relevance is the set of permutations defined for $\mathcal{A}, \mathcal{B} \in \mathcal{C}$ by
Theorem 8 ($\ell$-step Focused NFL):
The benchmark $F$ is focused wrt a collection $\mathcal{C}$ of algorithms and integer $\ell$ iff for some $\mathcal{A} \in \mathcal{C}$ and all $\mathcal{B} \in \mathcal{C}$,
Moreover, if the above holds for some $\mathcal{A} \in \mathcal{C}$, then it holds for all $\mathcal{A} \in \mathcal{C}$.
Proof:
Suppose is focused wrt and . Then for all , and ,
Conversely, if for some and all , then the first and last sets displayed above are identical (for ), whereas the other equalities are trivial. Hence, is focused wrt and , and the first part of the proof (above) implies that for all and .

Suppose one were interested in constructing a benchmark containing $f$ which is focused wrt $\mathcal{C}$ and $\ell$. The following corollary provides one answer.

Corollary 2:
Let and . Given algorithms , let . The benchmark
is focused wrt and .
Proof:
Note that where is the smallest positive integer for which . In particular,
Therefore, the condition of -step focused NFL is satisfied.

Corollary 2 generalizes the corresponding algorithmic construction presented in Whitley and Rowe (2008); whereas they restrict attention to path-search algorithms—those for which $(\mathcal{A}f)_x$ is independent of $f$—there is no such restriction in Corollary 2. The $\ell$-step focused NFL (Theorem 8) has no counterpart in Whitley and Rowe (2008), but it reflects (in a sense clarified by the following Corollary and Theorem) a cyclic aspect of focused benchmarks which, if not formalized, was certainly alluded to in their paper.
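Step-limited focusedness is a statement about multisets of truncated performance vectors, and it can be exercised on a toy benchmark; the two enumeration algorithms and the two-function benchmark below are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

X = range(3)

def run_steps(g, f, steps):
    """Performance vector of the first `steps` evaluations of f by algorithm g."""
    trace = []
    while len(trace) < steps:
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)

ascending  = lambda tr: min(set(X) - {x for x, _ in tr})
descending = lambda tr: max(set(X) - {x for x, _ in tr})

bench = [(1, 0, 0), (0, 0, 1)]   # a small cyclic benchmark (illustrative choice)
for steps in (1, 2, 3):
    # Counter equality = multiset equality of truncated performance vectors
    assert Counter(run_steps(ascending, f, steps) for f in bench) == \
           Counter(run_steps(descending, f, steps) for f in bench)
```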

Given algorithm $\mathcal{A}$ and integer $\ell$, define the equivalence relation on $\mathcal{Y}^{\mathcal{X}}$ by
Definition 2:
A benchmark is cyclic wrt algorithm , integer , and iff for some positive integer ,

Observe that the benchmarks of Corollary 2 are cyclic. In fact, every focused benchmark can be decomposed into a disjoint union of cyclic benchmarks.

Lemma 1:
Let be focused wrt and . For every and every ,
where the union above is disjoint, and each (which may depend on , , , and ) is cyclic wrt , , and .
Proof:
Let , and inductively construct
such that
as follows. Note that the equality above—the invariant—holds when (by FNFL), and the construction process must terminate provided the are nonempty (since is finite). Choose . The invariant guarantees the existence of such that
if f1=f0, then . Otherwise, continue to choose (by appealing to the invariant) such that
until . At that point, there exists h such that
Let be the collection of functions on the left-hand sides displayed above. Reindexing according to
shows to be cyclic. Moreover, the invariant is preserved because removing from the (previous) index set () has the effect of removing identical objects (the ) from both sides of the invariant.

Combining previous results yields the following decomposition theorem.

Theorem 9:
Benchmark is focused wrt a set of algorithms and iff for some and all ,
where the union above is disjoint, and each (which may depend on , , , and ) is cyclic wrt , , and .
Proof:
Lemma 1 establishes one direction (of the iff). The converse follows by appealing to FNFL; since the are cyclic, it follows that
and hence the equality above holds after dropping the subscript on (because is a disjoint union of the ).
This section concludes with a formal demonstration that sharpened NFL can be obtained as a special case of $\ell$-step FNFL. Observe that when $\ell = |\mathcal{X}|$, the equality condition in $\ell$-step FNFL reduces to
which is equivalent—via the Uniqueness Theorem—to the invariance of $F$ under the action of $\{\mathcal{B}^{-1}\mathcal{A} : \mathcal{B} \in \mathcal{C}\}$ (because the equality condition is quantified over all $\mathcal{B} \in \mathcal{C}$). It follows that $\ell$-step FNFL reduces to focused NFL when $\ell = |\mathcal{X}|$, provided invariance under $\{\mathcal{B}^{-1}\mathcal{A} : \mathcal{B} \in \mathcal{C}\}$ for all $\mathcal{A} \in \mathcal{C}$ is equivalent to invariance under $\Gamma(\mathcal{C})$. That is indeed the case, since the set of generators of $\Gamma(\mathcal{C})$ is the union over all $\mathcal{A} \in \mathcal{C}$ of $\{\mathcal{B}^{-1}\mathcal{A} : \mathcal{B} \in \mathcal{C}\}$. Finally, when $\ell = |\mathcal{X}|$ and $\mathcal{C}$ is the set of all algorithms, focused NFL reduces to sharpened NFL—via Corollary 1—since invariance under $\Gamma(\mathcal{C})$ implies invariance under the generator
for all (i.e., is permutation closed). A technicality remains: is it possible that is permutation closed, yet not closed under ? Evidently not, since
for some (the last implication is Lemma 2 in Rowe et al., 2009).

## 4.  Randomized Algorithms and Benchmarks

A randomized algorithm is identified with a probability vector $\theta$ indexed over the set of algorithms; component $\theta_{\mathcal{A}}$ is the probability the randomized algorithm (described by $\theta$) behaves like $\mathcal{A}$; the total trace resulting from applying $\theta$ to $f$ is $\mathcal{A}f$ with probability $\theta_{\mathcal{A}}$. In procedural terms, randomized algorithm $\theta$ runs on $f$ by choosing $\mathcal{A}$ with probability $\theta_{\mathcal{A}}$ and then applying algorithm $\mathcal{A}$ to $f$. Note that the collection of randomized algorithms contains the set of (deterministic) algorithms. Randomized algorithms are identified with elements of the simplex $\Lambda$ defined by
$$\Lambda \;=\; \Big\{ \theta \;:\; \theta_{\mathcal{A}} \ge 0, \;\sum_{\mathcal{A}} \theta_{\mathcal{A}} = 1 \Big\}$$
Given $\sigma \in \Pi(\mathcal{X})$ and $\theta \in \Lambda$, the randomized algorithm $\theta^{\sigma}$ is defined by
$$(\theta^{\sigma})_{\mathcal{A}^{\sigma}} = \theta_{\mathcal{A}}$$
Procedurally, running $\theta^{\sigma}$ on $f$ amounts to choosing $\mathcal{A}$ with probability $\theta_{\mathcal{A}}$ and then running $\mathcal{A}^{\sigma}$ on $f$. Random algorithms $\theta$ and $\theta'$ are equivalent, denoted by $\theta \equiv \theta'$, iff for all $f$ and every trace $T$,
$$\sum_{\mathcal{A} \,:\, \mathcal{A}f = T} \theta_{\mathcal{A}} \;=\; \sum_{\mathcal{A} \,:\, \mathcal{A}f = T} \theta'_{\mathcal{A}}$$
Proposition 1:
For all permutations $\sigma$, $\tau$, and randomized algorithms $\theta$,
$$(\theta^{\sigma})^{\tau} = \theta^{\tau \sigma}$$
Moreover, $\theta^{\iota} = \theta$ where $\iota$ is the identity permutation.
Proof:
Note that $(g^{\sigma})^{\tau} = g^{\tau \sigma}$ for every search operator $g$. Hence $(\mathcal{A}^{\sigma})^{\tau} = \mathcal{A}^{\tau \sigma}$ and therefore $(\theta^{\sigma})^{\tau} = \theta^{\tau \sigma}$.
A performance measure is a function $m$ from performance vectors to $\mathbb{R}$. So as to streamline exposition, a performance measure will simply be called a measure. Measure $m$ is extended to randomized algorithms in the natural way; the performance of $\theta$ on $f$ as measured by $m$ is
$$m(\theta, f) \;=\; \sum_{\mathcal{A}} \theta_{\mathcal{A}} \, m\big( (\mathcal{A}f)_y \big)$$
Note that $m$ is polymorphic: measure $m$ maps performance vectors to values, whereas the performance of $\theta$ on $f$ as measured by $m$ is the corresponding expected value.
Proposition 2:
Given randomized algorithm $\theta$, function $f$, measure $m$, and permutation $\sigma$,
$$m(\theta^{\sigma}, \sigma f) = m(\theta, f)$$
Proof:
Expanding definitions, appealing to Proposition 1 to justify reindexing below (the special case implies is invertible), and using Corollary 1,
The expected performance of randomized algorithm $\theta$ over benchmark $F$ is
$$\frac{1}{|F|} \sum_{f \in F} m(\theta, f)$$
and is referred to as expected average performance (one might argue it is more natural to call it average expected performance, but the average and expectation commute). Explicit dependence on the measure $m$ is indicated by subscript,
$$\mathcal{E}_m(\theta, F) \;=\; \frac{1}{|F|} \sum_{f \in F} m(\theta, f)$$
Proposition 3 (Linearity):

Expected average performance is linear in its first argument.

Proof:
Theorem 10 (Expected Duality):
For every measure $m$, randomized algorithm $\theta$, benchmark $F$, and permutation $\sigma$,
$$\mathcal{E}_m(\theta^{\sigma}, \sigma F) = \mathcal{E}_m(\theta, F)$$
Proof:
Expanding definitions and appealing to Proposition 2,

The results in Section 4 up to this point are theoretical foundations of the analysis presented in Rowe and Vose (2011) which their paper used but did not establish.
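On a toy instance, expected average performance and its linearity (Proposition 3) can be exercised numerically. The measure below (number of evaluations before the first optimum) and the mixing weights are hypothetical choices for illustration, not from the paper.

```python
X, Y = range(3), (0, 1)

def run(g, f):
    trace = []
    while len(trace) < len(X):
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)

ascending  = lambda tr: min(set(X) - {x for x, _ in tr})
descending = lambda tr: max(set(X) - {x for x, _ in tr})

def measure(pv):
    # hypothetical measure: index of the first optimum (value 1) in the performance vector
    return pv.index(1) if 1 in pv else len(pv)

def expected_avg(theta, bench):
    # E_m(theta, F): expectation over the algorithm choice, averaged over the benchmark
    return sum(p * measure(run(g, f)) for g, p in theta for f in bench) / len(bench)

bench = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]           # a permutation closed benchmark
a = 0.25
mix = [(ascending, a), (descending, 1 - a)]          # a randomized algorithm
lhs = expected_avg(mix, bench)
rhs = a * expected_avg([(ascending, 1.0)], bench) + \
      (1 - a) * expected_avg([(descending, 1.0)], bench)
assert abs(lhs - rhs) < 1e-12                        # linearity in the first argument
```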

### 4.1.  Benchmark Symmetries

Given benchmark $F$, the group of benchmark symmetries is
$$\mathrm{Sym}(F) = \{ \sigma \in \Pi(\mathcal{X}) : \sigma F = F \}$$
(the usual conventions are employed: $\sigma F = \{\sigma f : f \in F\}$ and $\sigma f = f \circ \sigma^{-1}$). An immediate consequence of the Expected Duality Theorem (above) is Theorem 3 of Rowe and Vose (2011):
For every measure $m$, randomized algorithm $\theta$, and benchmark $F$,
$$\mathcal{E}_m(\theta^{\sigma}, F) = \mathcal{E}_m(\theta, F)$$
for all $\sigma \in \mathrm{Sym}(F)$.

The above is an FNFL-type result: Every randomized algorithm obtained from $\theta$ by a benchmark symmetry has the same expected average performance over the benchmark.
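The symmetry group of a small benchmark can be computed by brute force, and the invariance of expected average performance under it checked directly. The sketch below is a toy model; the enumeration algorithm, the measure, and the two-function benchmark are illustrative assumptions.

```python
from itertools import permutations

X = range(3)

def run(g, f):
    trace = []
    while len(trace) < len(X):
        x = g(trace)
        trace.append((x, f[x]))
    return tuple(y for _, y in trace)

ascending = lambda tr: min(set(X) - {x for x, _ in tr})

def perm_alg(g, s):
    # search operator of A^sigma: g_sigma(T) = sigma(g(sigma^{-1} T))
    inv = {s[i]: i for i in X}
    return lambda tr: s[g([(inv[x], y) for x, y in tr])]

def measure(pv):
    return pv.index(1) if 1 in pv else len(pv)   # hypothetical measure

def expected_avg(g, bench):
    return sum(measure(run(g, f)) for f in bench) / len(bench)

bench = [(1, 0, 0), (0, 0, 1)]    # not permutation closed
# Sym(F): permutations sigma with sigma F = F, where (sigma f)(x) = f(sigma^{-1} x)
sym = [s for s in permutations(X)
       if {tuple(f[s.index(x)] for x in X) for f in bench} == set(bench)]
assert len(sym) == 2              # the identity and the reversal 0 <-> 2
for s in sym:
    assert expected_avg(perm_alg(ascending, s), bench) == expected_avg(ascending, bench)
```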

Let be the randomized algorithm
It follows via Linearity (Proposition 3) that
Moreover if is permutation closed, then
for all randomized algorithms , (Rowe and Vose, 2011), in which case
that is, every randomized algorithm has the same expected average performance. While not the focus of Rowe and Vose (2011), they present the demonstration above. Before generalizing to NFL-type results specific to particular measures (in the next section), we make observations which sharpen this NFL result.

The support of random algorithm $\theta$ is the set of algorithms $\mathcal{A}$ for which $\theta_{\mathcal{A}} > 0$. A deterministic algorithm is a randomized algorithm whose support is a singleton set; such support is called trivial. A nontrivial randomized algorithm has nontrivial support. A deterministic algorithm has a unique nonzero component $\theta_{\mathcal{A}} = 1$; to streamline notation, $\mathcal{A}$ may be used to denote such $\theta$. Measure $m$ is constant wrt benchmark $F$ and support set $S$ iff $\mathcal{E}_m(\mathcal{A}, F) = \mathcal{E}_m(\mathcal{B}, F)$ for all $\mathcal{A}, \mathcal{B} \in S$.

Proposition 4:
For every measure m, randomized algorithm , and benchmark , if m is nonconstant wrt and , then there exist infinitely many and with support such that
If m is constant wrt and , then
Proof:
Consider the first case, where for all . If , then
In the second case, is linear (by Proposition 3) and nonconstant on (because m is not constant wrt and ). Since is an interior point of with respect to the subspace topology induced from (because is the support of ), there exists some open set B such that . Since linear is not constant on B (it otherwise would be constant on ), it attains values larger and smaller than on B (Roberts and Varberg, 1973).
If benchmark is not permutation closed, then Proposition 4 applies as follows. For some , there exists such that . Let have support containing for some algorithm . Define the measure where is 1 if expression is true, and 0 otherwise. Note that m is nonconstant wrt and , since the following depends on choice of ,

### 4.2.  Performance Measures

Conventional use of the term benchmark includes the means by which performance is measured; a performance measure is itself a benchmark. Benchmark symmetries are generalized to depend upon the measure m as well as the set of test functions , and an FNFL-type result emerges which likewise depends upon m. Given benchmark , the collection of benchmark invariants is
Theorem 11:

is a group (under composition) for every measure m and benchmark . For every randomized algorithm , the randomized algorithms in all have the same expected average performance over the benchmark.

Proof:
Since is finite, is a group if it is closed under composition. Suppose . Appealing to Proposition 1,
Appealing to Expected Duality, Linearity, and the fact that ,

Theorem 11 is an FNFL-type result as follows. Let be for some and some measure m. For every randomized algorithm , the $\ell$-step expected average performance—as measured by m—of the collection of algorithms is identical over benchmark . It should be appreciated that the benchmark invariants are completely independent of , no assumptions whatsoever have been made concerning the support of , and the choice of is arbitrary.

Theorem 12:
For every measure m and benchmark,
Proof:
Let , and . Appealing to Expected Duality,
It follows (via Proposition 1) that . Hence, , since benchmark invariants form a group (Theorem 11). Thus, . Moreover,
(by Expected Duality), and therefore . Thus, .

It is interesting to contrast Theorem 12 with the closing remarks of Section 3.1. Even with , it does not follow that is invariant under the action of ; it is not generally true that when . But the message of Theorem 12 is precisely that invariance is regained, at the price of trading equality, =, for similarity, , defined by having identical invariants; .

We conclude with a generalization of the theoretical machinery used by Rowe and Vose (2011). Define to be8
The following is a direct consequence of Theorem 11 and Linearity.
Corollary 3:
For every measure m, randomized algorithm , and benchmark ,
Proposition 5:
For every measure m, randomized algorithm , and benchmark ,
Proof:
To streamline notation, abbreviate by G. Appealing to Proposition 1, and using the fact that G is a group (Theorem 11),
Thus, ; choosing , yields the first equality and choosing , yields the second. Finally (appealing to Theorem 12),
Proposition 6:
For every measure m, randomized algorithm , and benchmark ,
Proof:
Appealing to Proposition 5,

### 4.3.  Matching Performance

Consider the scenario where randomized algorithm outperforms on benchmark . Is there another benchmark for which outperforms by the same amount? The answer depends on the measure m; Wolpert and Macready (1997) point out negative examples which they term minimax distinctions. This section analyzes the following predicate, which asserts that the particular measure m admits no minimax distinctions.
$$\forall\, \theta, \theta' \in \Lambda,\; \forall F \;\; \exists F' \;:\;\; \mathcal{E}_m(\theta, F) - \mathcal{E}_m(\theta', F) \;=\; \mathcal{E}_m(\theta', F') - \mathcal{E}_m(\theta, F') \tag{1}$$
It turns out that this predicate makes an interesting geometric claim. Let $\mathbf{1}$ denote the vector of ones, and let $\parallel$ be the binary infix predicate asserting that its vector arguments are parallel.
Theorem 13:
Let m be a measure, and for , let be the -dimensional vector having components . Measure m admits no minimax distinctions iff
2
Proof:
Note that is equivalent to
3
whose left-hand side is
Replacing with yields an expression for the right-hand side of Equation (3), which when combined with the above shows is equivalent to
Stated more succinctly in vector notation, the above is
where superscript T denotes transpose. Let where . Observe that
hence . Conversely, given , define , , by
and note that
which implies . Since $\mathbf{1}^{\mathsf{T}} w = 0$, it follows that is
Thus, both and belong to the simplex , and therefore represent randomized algorithms. Furthermore, . It follows that quantifying over and can be replaced with quantifying over in the sense that predicate Equation (1)—which according to the above is
—is equivalent to
4
Define the vector and the set of vectors S(w) by
Using the notation above—and trading quantification over in Equation (4) for quantification over u below—transforms Equation (4) into
5
which asserts that for every , any is orthogonal to some . It follows that Equation (5) is equivalent to the assertion that for all
6
Observe that co- whereas co- is either 1 or 2. Hence, the only way Equation (6) can be true—since the union is finite—is that co- for some ; thus u is parallel to . In other words,
for some .

Observe in the proof above that, were the codimension always 2, the complement of the union on the right-hand side of Equation (6) would be dense and open in (subspaces of are closed). Since $\mathbb{Q}$ is a dense subfield of $\mathbb{R}$, that complement contains rational points. Therefore, the components of randomized algorithms can be restricted to $\mathbb{Q}$ without altering Theorem 13 or its proof (except to say ). If $m$ is likewise required to take values in $\mathbb{Q}$, the will also be rational (i.e., everything is computable).

Moreover, the domain over which benchmarks are quantified—for the definition of minimax distinctions in Equation (1) and the condition in Equation (2)—is intentionally unspecified. That domain may be chosen arbitrarily without altering Theorem 13 or its proof. In particular, the benchmarks and may be restricted to satisfy an arbitrarily chosen predicate (for instance, they may be restricted to singleton sets).

The next result is that, for every measure, the performance of any randomized algorithm on any benchmark can be matched in a nontrivial way.

Proposition 7:

For every measure m, randomized algorithm , and benchmark , there exist and such that where or may be chosen arbitrarily.

Proof:
If is permutation closed, then any works with (sharpened NFL). Otherwise, for some ; let . Appealing to Proposition 1 and Expected Duality,
One might hope for a variant of Proposition 7 asserting that for every m, , and , there exists for which when is nontrivial (nontriviality is essential; when m is injective and is minimal, the Uniqueness Theorem implies that is equivalent to ). Let
Appealing to Linearity and Corollary 3,
However, if for some randomized algorithm , then Proposition 6 implies . Alternatively, one could appeal to Theorem 11 with
and . However, that may also fail. If , then where
If benchmark contains only the identity function, and the performance measure is
it then follows that , because
The example above demonstrates that the following cannot be improved.
Proposition 8:

Let . For every measure m, benchmark , and nontrivial randomized algorithm , there exist infinitely many such that .

Proof:
To simplify notation, let abbreviate . Appealing to the finiteness of , define
It follows from Linearity that is a nondecreasing function of which attains the value for some ; let be determined by .

.

Then , because is trivial, whereas is not. Moreover, infinitely many alternatives to are provided by as varies.

That leaves the case . Since , there exists an algorithm such that .

Let . Since is a surrogate for —as far as performance is concerned—. Moreover, since they have different support. Infinitely many are provided by as varies.

Let where
and is chosen sufficiently small to put the coefficients in the linear combination of , , defining above in the open interval (0, 1). Observe that
It follows that
Moreover, since they have different support. Infinitely many are provided by as varies.

Note that the components of randomized algorithms can be restricted to without altering Proposition 8 or its proof ( is necessarily rational if performance measures are required to take values in ).

## 5.  Conclusion

Definitions permit and theorems address algorithms which are not Turing computable. It should be appreciated that general results which hold even for algorithms that need not be Turing computable necessarily specialize to algorithms that are. Readers who embrace transfinite induction will no doubt appreciate the probability-free treatment in Section 3 which is conducted within a framework amenable to generalization (Rowe et al., 2009).

As a concession to readers who desire computability, $\mathcal{X}$ and $\mathcal{Y}$ were kept finite, and conclusions were stated as if randomized algorithms had components in $\mathbb{Q}$ (for instance, Propositions 4 and 8 could have claimed uncountably many if components in $\mathbb{R}$ were permitted). Moreover, care was taken to point out the fact that Theorem 13 and Proposition 8 in no way require nonrational components.

In keeping with the outlook that the investigation of NFL-like properties need not be conflated with or encumbered by applications, this paper has focused on simplifying and extending theoretical results, and presenting new mathematical tools (Theorems, Corollaries, Lemmas, Propositions). Focused NFL (Theorem 7) simplifies and extends the previous account by Whitley and Rowe (2008), and the $\ell$-step focused NFL (Theorem 8) establishes the analogue for algorithms restricted to $\ell$ steps. Corollary 2 and Theorem 9 formalize cyclic aspects of focused benchmarks noted in Whitley and Rowe (2008).

A trend in NFL theory is the progression from permutations of the search space to group actions on functions and algorithms; our results and methods have for the most part followed that path. A series of propositions sort out how permutations act on randomized algorithms, establish linearity of expected average performance, and lead up to Expected Duality (Theorem 10). The demonstration of NFL for randomized algorithms by Rowe and Vose (2011) is first reviewed, and then sharpened (Proposition 4). By placing attention on the particular measure used, benchmark symmetries are extended to benchmark invariants whose properties admit an FNFL-type interpretation (Theorem 11). The theoretical machinery of Rowe and Vose (2011) is also generalized.

The paper concludes with a geometric characterization for minimax distinctions (Wolpert and Macready, 1997) and basic results on performance matching.

## Acknowledgments

The authors would like to thank Suzanne Sadedin, Marte Ramírez, and anonymous referees for comments on this manuscript. Ideas leading to Theorem 13 were initially formulated while M. D. Vose was visiting The University of Birmingham; he is grateful for the gracious support provided by the School of Computer Science, and for valuable discussions with Jon E. Rowe and Alden H. Wright during that visit. This work was supported in part by NIH grant R01GM056693, and by a Howard Hughes Medical Institute Collaborative Innovation Award.

## References

Allibone, S. A. (1879). Prose quotations from Socrates to Macaulay. Reprint Services Corp.

Auger, A., and Teytaud, O. (2007). Continuous lunches are free! Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO-2007, pp. 916–922.

Auger, A., and Teytaud, O. (2010). Continuous lunches are free plus the design of optimal optimization algorithms. Algorithmica, 57:121–146.

Corne, D., and Knowles, J. (2003). No free lunch and free leftovers theorems for multiobjective optimisation problems. In Evolutionary multi-criterion optimization. Lecture notes in computer science, Vol. 2632 (pp. 327–341). Berlin: Springer.

Domingos, P. (1998). How to get a free lunch: A simple cost model for machine learning applications. Proceedings of the AAAI98/ICML98 Workshop on the Methodology of Applying Machine Learning, pp. 1–7.

Droste, S., Jansen, T., and Wegener, I. (1999). Perhaps not a free lunch but at least a free appetizer. Proceedings of the First Genetic and Evolutionary Computation Conference, GECCO-1999, pp. 833–839.

Droste, S., Jansen, T., and Wegener, I. (2002). Optimization with randomized search heuristics—The (A)NFL theorem, realistic scenarios, and difficult functions. Theoretical Computer Science, 287:131–144.

Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley.

Griffiths, E., and Orponen, P. (2005). Optimization, block designs and no free lunch theorems. Information Processing Letters, 94:55–61.

Haupt, R., and Haupt, S. (2004). Practical genetic algorithms, 2nd ed. New York: Wiley.

Holland, J. (1975). Adaptation in natural and artificial systems. Ann Arbor: The University of Michigan Press.

Igel, C., and Toussaint, M. (2001). On classes of functions for which no free lunch results hold.

Igel, C., and Toussaint, M. (2004). A no-free-lunch theorem for non-uniform distributions of target functions. Journal of Mathematical Modeling and Algorithms, 3(4):313–322.

Mitchell, M. (1998). An introduction to genetic algorithms, 3rd ed. Cambridge, MA: MIT Press.

Radcliffe, N., and Surry, P. (1995). Fundamental limitations on search algorithms: Evolutionary computing in perspective. In van Leeuwen, J. (Ed.), Computer science today. Lecture notes in computer science, Vol. 1000 (pp. 275–291). Berlin: Springer.

Roberts, A., and Varberg, D. (1973). Convex functions. New York: Academic Press.

Rowe, J., and Vose, M. (2011). Unbiased black box search algorithms. In N. Krasnogor (Ed.), Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pp. 2035–2042.

Rowe, J., Vose, M., and Wright, A. (2009). Reinterpreting no free lunch. Evolutionary Computation, 17:117–129.

Schumacher, C. (2000). Black box search—Framework and methods. PhD thesis, The University of Tennessee, Knoxville.

Streeter, M. (2003). Two broad classes of functions for which a no free lunch result does not hold. In Genetic and evolutionary computation, GECCO 2003. Lecture notes in computer science, Vol. 2724 (pp. 1418–1430). Berlin: Springer.

Wardetzky, M., Mathur, S., Kalberer, F., and Grinspun, E. (2007). Discrete Laplace operators: No free lunch. In A. Belyaev and M. Garland (Eds.), Eurographics Symposium on Geometry Processing, pp. 33–37.

Whitley, D., and Rowe, J. (2008). Focused no free lunch theorems. Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO-2008, pp. 811–818.

Wolpert, D. (2001). The supervised learning no-free-lunch theorems. In Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications, pp. 25–42.

Wolpert, D., and Macready, W. (1995). No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute.

Wolpert, D., and Macready, W. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82.

Woodward, J., and Neil, J. (2003). No free lunch, program induction and combinatorial problems. In Genetic programming. Lecture notes in computer science, Vol. 2610 (pp. 287–313). Berlin: Springer.

## Notes

1. In the original context where probabilistic language is adequate (finite domains and codomains), it adds no essential value; probabilistic jargon may be expanded into set-theoretic definitions and simplified away.

2. Wolpert and Macready (1995) initially assumed a uniform distribution for objective functions. Whereas they later discuss nonuniform distributions (Wolpert and Macready, 1997), the nonuniformities they consider cannot ameliorate the limitations pointed out by Droste et al. (1999).

3. An interesting—though widely unrecognized—fact is that nonuniform NFL can be obtained from sharpened NFL by a suitable choice of performance measure (Rowe et al., 2009).

4. A multiset—delimited with doubled brackets—extends the set concept by allowing elements to appear more than once.

5. Previous use of G is superseded by the notation defined here.

6. Schumacher (2000) introduced this measure, but with different notation.

7. Previous use of G is superseded by the notation defined here.

8. Previous use of is superseded by the notation defined here.