We extend previous results concerning black box search algorithms, presenting new theoretical tools related to no free lunch (NFL) where functions are restricted to some benchmark (that need not be permutation closed), algorithms are restricted to some collection (that need not be permutation closed) or limited to some number of steps, or the performance measure is given. Minimax distinctions are considered from a geometric perspective, and basic results on performance matching are also presented.
Black box search algorithms are applied to search and optimization problems without exploiting a priori information about the objective function (such algorithms acquire information about the objective function by evaluating it). Evolutionary algorithms, and other nature-inspired search algorithms, are frequently implemented as black box algorithms, and have been regarded as robust optimization techniques (Holland, 1975; Goldberg, 1989; Mitchell, 1998; R. Haupt and S. Haupt, 2004). In this paper, we refer to nonrepeating/nonrevisiting black box search algorithms simply as algorithms.
The original no free lunch (NFL) theorems were expressed within a probabilistic framework and were used to investigate whether black box algorithms could differ in robustness (Wolpert and Macready, 1995). Sir Francis Bacon observed: “Men suppose that their reason has command over their words; still it happens that words in return exercise authority on reason” (Allibone, 1879). That opinion is apropos to NFL; it has been demonstrated that probability is inadequate to confirm unconstrained NFL results in the general case (Auger and Teytaud, 2007, 2010). In that sense, probability is an unfortunate historical artifact. With one exception, this paper abandons probability, preferring a set-theoretic framework which obviates measure-theoretic limitations by dispensing with probability altogether (Schumacher, 2000; Rowe et al., 2009).1 That exception is the analysis of randomized algorithms.
The original NFL theorems imply that, if an algorithm's performance were plotted against all objective functions (over a fixed but arbitrary finite domain and codomain), the resulting performance graph would simply be a permutation of the performance graph of any other algorithm. In that sense, no algorithm—including classical genetic algorithms—can be more robust than any other. Results for infinite domains or codomains have a similar nature but are of a more technical character; see Rowe et al. (2009). NFL-like results can be found in diverse areas, including machine learning (Domingos, 1998; Wolpert, 2001), induction and combinatorial problems (Woodward and Neil, 2003), multi-objective optimization (Corne and Knowles, 2003), Boolean functions (Griffiths and Orponen, 2005), discrete Laplace operators (Wardetzky et al., 2007), and others.
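The permutation property above can be checked exhaustively on a small instance. The sketch below is our illustration, not the paper's formalism: it assumes a three-point domain and two-value codomain, and the names `fixed_order` and `adaptive` are hypothetical examples of deterministic non-revisiting black box algorithms.

```python
from itertools import product

X = [0, 1, 2]          # search space
Y = [0, 1]             # codomain

def run(algorithm, f):
    """Run a non-revisiting black box algorithm on f and return its
    performance vector: the sequence of observed y-values."""
    history = []                      # (x, y) pairs seen so far
    for _ in X:
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)

def fixed_order(history):
    """A static algorithm: visits points in a fixed order."""
    return [2, 0, 1][len(history)]

def adaptive(history):
    """An adaptive algorithm: the next point depends on values seen."""
    unvisited = [x for x in X if x not in {h[0] for h in history}]
    return unvisited[-1] if history and history[-1][1] == 1 else unvisited[0]

def performance_multiset(algorithm):
    """Multiset (as a sorted list) of performance vectors over ALL
    functions f : X -> Y."""
    functions = [dict(zip(X, ys)) for ys in product(Y, repeat=len(X))]
    return sorted(run(algorithm, f) for f in functions)

# NFL: over the set of all functions, the multiset of performance
# vectors is identical for every algorithm.
assert performance_multiset(fixed_order) == performance_multiset(adaptive)
```

Because the trace map is injective (Section 2), the eight functions produce eight distinct performance vectors, so each multiset is in fact all of {0,1}³.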
Strong preconditions in the original NFL theorems constrained their application. Droste et al. (1999) observed that the likelihood of encountering optimization problems is nonuniform; if average performance is to be a meaningful statistic to summarize aggregate optimization behavior, then it should be a weighted average. Therefore, NFL—which assumed a uniform distribution2—was inappropriate for that purpose.
Later NFL variants were established which assumed that the set of functions, the distribution of functions, or the relationship between those functions and algorithms considered had special properties: permutation closure in the case of the Sharpened NFL (Schumacher, 2000), specialized distributions in the case of the nonuniform NFL3 (Streeter, 2003; Igel and Toussaint, 2004), and focused sets in the case of focused NFL (Whitley and Rowe, 2008).
It has been observed that permutation closure is an assumption rarely satisfied (Igel and Toussaint, 2001; Streeter, 2003), which constrains application of sharpened NFL and nonuniform NFL. On the one hand, such observations do not invalidate NFL results, but caution against their misinterpretation or misapplication. On the other hand, Droste et al. (2002) provide counterpoint to the idea that robust search flourishes where NFL does not by establishing an almost NFL theorem (the success of search strategies on functions is paid for with bad behavior on many other functions of similar complexity). Whatever implications that NFL-like theorems have for search, the mathematical properties concerning algorithms which such theorems reveal need not be conflated with applications. The position that a mathematical formalization and investigation of NFL-like properties may be conducted unencumbered by applications has precedent (Rowe et al., 2009), and that is the outlook taken by this paper.
Previous introductory remarks roughly sketch the context for the contributions made by this paper. Given widespread use of benchmarks to assess performance, it is natural to consider NFL from a benchmark perspective. Nevertheless, little has yet been said concerning focused sets. Whitley and Rowe (2008) study focused sets—benchmarks compatible with NFL results for given algorithms limited to a given number of steps—but do not completely characterize them. In this paper, results concerning focused sets are simplified and extended, and the sharpened NFL emerges as a special case. Rowe and Vose (2011) obtain results concerning the use of arbitrary benchmarks to evaluate randomized algorithms. The underpinnings of their analysis—which their paper used but did not establish—are presented in this paper. We show a sharpened NFL is a few steps from their results, and then generalize to NFL results specific to particular performance measures. Performance matching and minimax distinctions (Wolpert and Macready, 1997) are also briefly considered.
2. Theoretical Background
A trend in NFL theory is the progression from permutations of the search space (Radcliffe and Surry, 1995) to group actions on functions and algorithms. This can be seen at least as far back as Schumacher (2000). We will use the notation $\operatorname{Perm}(\mathcal{X})$ to refer to the set of permutations (bijections) on the search space $\mathcal{X}$. Given a permutation $\sigma \in \operatorname{Perm}(\mathcal{X})$, the permutation of $f$ by $\sigma$ is the function $\sigma f$ defined by $\sigma f = f \circ \sigma^{-1}$. Thus permutations of $\mathcal{X}$ may also be considered as permutations of functions via $f \mapsto \sigma f$. Such $\sigma$ are therefore regarded as elements of $\operatorname{Perm}(\mathcal{Y}^{\mathcal{X}})$, where $\mathcal{Y}$ is the codomain.
A key observation dating back at least as far as Radcliffe and Surry (1995) is the Uniqueness Theorem.
If $\operatorname{Tr}_a(f) = \operatorname{Tr}_a(g)$—where $\operatorname{Tr}_a(f)$ denotes the trace of algorithm $a$ on $f$—then $f = g$; i.e., $\operatorname{Tr}_a$ is injective. Hence, the map $\operatorname{Tr}_a$ is invertible for every algorithm $a$.
Although the function $\operatorname{Tr}_a$ does not necessarily map onto the set of all traces, the Uniqueness Theorem together with the Completeness Theorem (below) imply $\operatorname{Tr}_a$ is a bijection from $\mathcal{Y}^{\mathcal{X}}$ to the set of complete traces (the Completeness Theorem is actually a corollary of Theorem 1, because the finite domain and codomain of $\operatorname{Tr}_a$ have the same size).
The map $\operatorname{Tr}_a$ from $\mathcal{Y}^{\mathcal{X}}$ to the set of complete traces is surjective.
The map $\operatorname{Tr}_a$ is abbreviated by $a$, its inverse is denoted by $a^{-1}$, and juxtaposition denotes composition of such maps. Thus $b^{-1}a$ (composing $a$ with $b^{-1}$) yields a bijection on $\mathcal{Y}^{\mathcal{X}}$, and in particular $b^{-1}a$ is a member of the group of permutations of $\mathcal{Y}^{\mathcal{X}}$ under composition.
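The Uniqueness and Completeness Theorems admit a direct computational check. The following sketch assumes our own illustrative encoding (traces as sequences of (x, y) pairs over a three-point domain); the algorithm `algo` is a hypothetical adaptive, non-revisiting example.

```python
from itertools import product

X, Y = [0, 1, 2], [0, 1]

def trace(algorithm, f):
    """Complete trace: the full sequence of (x, y) pairs generated by
    running a non-revisiting algorithm on f to exhaustion."""
    history = []
    for _ in X:
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(history)

def algo(history):
    """Adaptive non-revisiting algorithm: the most recent y-value
    steers the choice among unvisited points."""
    unvisited = [x for x in X if x not in {h[0] for h in history}]
    return unvisited[-1] if history and history[-1][1] == 1 else unvisited[0]

functions = [dict(zip(X, ys)) for ys in product(Y, repeat=len(X))]
traces = [trace(algo, f) for f in functions]

# Uniqueness: distinct functions yield distinct traces (injectivity).
assert len(set(traces)) == len(functions)

# Completeness: every y-sequence occurs as the y-part of some complete
# trace, so the trace map is a bijection onto the complete traces.
assert {tuple(y for _, y in t) for t in traces} == set(product(Y, repeat=len(X)))
```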
Projecting traces in the displayed equality above to their y components yields the following Corollary:
Thus, the set of performance vectors is independent of the algorithm which generated it (iff the benchmark is permutation closed), and any performance measure of that set of performance vectors is likewise independent of the algorithm.
3. Focused No Free Lunch
Focused NFL (Whitley and Rowe, 2008) concerns benchmarks. The basic question is: if attention is restricted to a subset of algorithms, then for what benchmarks (sets of functions) must the algorithms have equal performance? Benchmark $\mathcal{F}$ is focused with respect to a set $\mathcal{A}$ of algorithms iff the set of performance vectors over $\mathcal{F}$ is independent of the algorithm $a \in \mathcal{A}$. In other words, algorithms in $\mathcal{A}$ have the same set of performance vectors over the benchmark. In particular, every performance measure—which therefore is a function of the same set of performance vectors—must necessarily yield the same result over the benchmark, for each algorithm in $\mathcal{A}$. It is possible, however, that algorithms in $\mathcal{A}$ may exhibit the same performance on different functions of the benchmark. Note that sharpened NFL can be rephrased as follows:
Benchmark $\mathcal{F}$ is focused wrt the set of all algorithms iff it is permutation closed.
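The "if" direction, and a failure of focus for a benchmark that is not permutation closed, can be illustrated concretely. This sketch uses our own encoding; the algorithms `ascending`, `descending`, and `adaptive` are hypothetical non-revisiting examples, and `is_focused` implements the multiset-of-performance-vectors criterion stated above.

```python
from itertools import permutations, product

X, Y = [0, 1, 2], [0, 1]

def perf(algorithm, f):
    """Performance vector: observed y-values, in order of evaluation."""
    history = []
    for _ in X:
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)

def ascending(history):               # static order 0, 1, 2
    return [0, 1, 2][len(history)]

def descending(history):              # static order 2, 1, 0
    return [2, 1, 0][len(history)]

def adaptive(history):                # order depends on observed values
    unvisited = [x for x in X if x not in {h[0] for h in history}]
    return unvisited[-1] if history and history[-1][1] == 1 else unvisited[0]

def is_focused(benchmark, algorithms):
    """Focused: the multiset of performance vectors over the benchmark
    is the same for every algorithm in the collection."""
    multisets = [sorted(perf(a, f) for f in benchmark) for a in algorithms]
    return all(ms == multisets[0] for ms in multisets)

def permutation_closure(f):
    closed = {tuple(f[s[x]] for x in X) for s in permutations(X)}
    return [dict(zip(X, vals)) for vals in closed]

f = {0: 0, 1: 0, 2: 1}
# A permutation-closed benchmark is focused wrt any algorithms at all.
assert is_focused(permutation_closure(f), [ascending, descending, adaptive])
# The singleton {f} is not permutation closed, and it fails to be
# focused wrt algorithms that visit its points in different orders.
assert not is_focused([f], [ascending, descending])
```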
The main result in Whitley and Rowe (2008) is Lemma 1, which asserts:
Let $\mathcal{A}$ be a set of algorithms and let $G$ be the group generated by $\{b^{-1}a : a, b \in \mathcal{A}\}$. The orbit $Gf$ is the smallest benchmark containing $f$ which is focused wrt $\mathcal{A}$.
Theorem 6 connects with the concept of closure under permutation; $G$ is a group of permutations with respect to which $Gf$ is closed. The implication in Whitley and Rowe's result (Theorem 6 above) can be strengthened to an iff as follows.
Benchmark $\mathcal{F}$ is focused wrt a set $\mathcal{A}$ of algorithms iff it is closed wrt $G$.
It should be noted that the focused NFL demonstrated above (FNFL) can be obtained as a corollary of Whitley and Rowe's Lemma 1 (remarks following their Lemma 1 intimate how). Their result is strengthened in the sense that whereas they presented an implication, FNFL is stated as an iff. Moreover, a significant contribution of FNFL to NFL theory is clarity and simplicity in both statement and proof.
3.1. Algorithms Limited to $k$ Steps
To further generalize NFL, Whitley and Rowe (2008) consider algorithms restricted to $k$ steps. Their results are expressed in terms of a pseudocoded computer algorithm. We simplify, extend, and express those results from a set-theoretic point of view.
Suppose one were interested in constructing a benchmark containing $f$ which is focused wrt a given set of algorithms limited to $k$ steps. The following corollary provides one answer.
Corollary 2 generalizes the corresponding algorithmic construction presented in Whitley and Rowe (2008); whereas they restrict attention to path-search algorithms—those for which the sequence of points visited is independent of $f$—there is no such restriction in Corollary 2. The $k$-step focused NFL theorem has no counterpart in Whitley and Rowe (2008), but it reflects (in a sense clarified by the following Corollary and Theorem) a cyclic aspect of focused benchmarks which, if not formalized, was certainly alluded to in their paper.
Observe that the benchmarks of Corollary 2 are cyclic. In fact, every focused benchmark can be decomposed into a disjoint union of cyclic benchmarks.
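The cyclic structure can be made concrete. In the sketch below (our illustrative encoding; the names `surrogate` and `cyclic_benchmark` are ours, not the paper's), for two algorithms the benchmark containing f is built as the orbit of f under the map sending g to the function whose performance vector under the second algorithm equals g's under the first; since that map permutes the finite function space, iteration cycles back to f.

```python
X, Y = [0, 1, 2], [0, 1]

def perf(algorithm, f):
    history = []
    for _ in X:
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)

def ascending(history):
    return [0, 1, 2][len(history)]

def descending(history):
    return [2, 1, 0][len(history)]

def surrogate(f, alg_a, alg_b):
    """The function g with perf(alg_b, g) == perf(alg_a, f): replay f's
    performance vector under alg_a into alg_b, one step at a time."""
    g, history = {}, []
    for y in perf(alg_a, f):
        x = alg_b(history)
        g[x] = y
        history.append((x, y))
    return g

def cyclic_benchmark(f, alg_a, alg_b):
    """Orbit of f under g -> surrogate(g, alg_a, alg_b)."""
    orbit, g = [], dict(f)
    while True:
        orbit.append(g)
        g = surrogate(g, alg_a, alg_b)
        if g == f:
            return orbit

f = {0: 0, 1: 0, 2: 1}
orbit = cyclic_benchmark(f, ascending, descending)

# The orbit is focused wrt the two algorithms ...
assert (sorted(perf(ascending, g) for g in orbit)
        == sorted(perf(descending, g) for g in orbit))
# ... yet smaller than the 3-element permutation closure of f.
assert len(orbit) == 2
```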
Combining previous results yields the following decomposition theorem.
4. Randomized Algorithms and Benchmarks
The expected average performance is linear in its first argument (the randomized algorithm).
4.1. Benchmark Symmetries
The above is a FNFL-type result: every randomized algorithm in the collection has the same expected average performance over the benchmark.
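The result above, together with linearity, can be illustrated numerically. In this sketch (our encoding; the measure `m`, "evaluations to first success," is a hypothetical choice), a randomized algorithm is modeled as a probability vector over deterministic algorithms, and its expected average performance over a permutation-closed benchmark is independent of the mixture.

```python
from itertools import permutations

X, Y = [0, 1, 2], [0, 1]

def perf(algorithm, f):
    history = []
    for _ in X:
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)

def ascending(history):
    return [0, 1, 2][len(history)]

def descending(history):
    return [2, 1, 0][len(history)]

def adaptive(history):
    unvisited = [x for x in X if x not in {h[0] for h in history}]
    return unvisited[-1] if history and history[-1][1] == 1 else unvisited[0]

def m(v):
    """A performance measure: evaluations until the first y == 1."""
    return v.index(1) + 1 if 1 in v else len(v) + 1

def avg(algorithm, benchmark):
    return sum(m(perf(algorithm, f)) for f in benchmark) / len(benchmark)

def expected(weights, algorithms, benchmark):
    """Expected average performance of the randomized algorithm playing
    algorithms[i] with probability weights[i]; linear in the weights."""
    return sum(w * avg(a, benchmark) for w, a in zip(weights, algorithms))

f = {0: 0, 1: 0, 2: 1}
closure = [dict(zip(X, vals)) for vals in
           {tuple(f[s[x]] for x in X) for s in permutations(X)}]

algs = [ascending, descending, adaptive]
# Over a permutation-closed benchmark, every mixture (every randomized
# algorithm supported on algs) has the same expected average performance.
e1 = expected([1.0, 0.0, 0.0], algs, closure)
e2 = expected([0.2, 0.3, 0.5], algs, closure)
assert abs(e1 - e2) < 1e-12
```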
The support of randomized algorithm $\alpha$ is the set of algorithms $a$ for which $\alpha_a > 0$. A deterministic algorithm is a randomized algorithm whose support is a singleton set; such support is called trivial. A nontrivial randomized algorithm has nontrivial support. A deterministic algorithm has a unique nonzero component $\alpha_a$; to streamline notation, $a$ may be used to denote such $\alpha$. Measure $m$ is constant wrt benchmark $\mathcal{F}$ and support set $S$ iff the average performance over $\mathcal{F}$, as measured by $m$, is the same for all $a \in S$.
4.2. Performance Measures
The set of such permutations forms a group (under composition) for every measure $m$ and benchmark $\mathcal{F}$. For every randomized algorithm $\alpha$, the randomized algorithms in the orbit of $\alpha$ under that group all have the same expected average performance over $\mathcal{F}$.
Theorem 11 is a FNFL-type result as follows. Let $G$ be the group determined by some benchmark $\mathcal{F}$ and some measure $m$. For every randomized algorithm $\alpha$, the $k$-step expected average performance—as measured by $m$—of the collection of algorithms in the orbit $G\alpha$ is identical over benchmark $\mathcal{F}$. It should be appreciated that the benchmark invariants are completely independent of $\alpha$, no assumptions whatsoever have been made concerning the support of $\alpha$, and the choice of $\mathcal{F}$ is arbitrary.
It is interesting to contrast Theorem 12 with the closing remarks of Section 3.1. Even so, it does not follow that expected performance is invariant under the action of the group; equality does not generally hold. But the message of Theorem 12 is precisely that invariance is regained, at the price of trading equality, $=$, for similarity, $\sim$, defined by having identical invariants.
4.3. Matching Performance
Consider in the proof above that, were the codimension always 2, the complement of the union on the right-hand side of Equation (6) is dense and open (proper subspaces are closed). Since $\mathbb{Q}$ is a dense subfield of $\mathbb{R}$, that complement contains rational points. Therefore, the components of randomized algorithms can be restricted to $\mathbb{Q}$ without altering Theorem 13 or its proof (except to say that the components are rational). If $m$ is likewise required to take values in $\mathbb{Q}$, the matched performances will also be rational (i.e., everything is computable).
Moreover, the domain over which benchmarks are quantified—for the definition of minimax distinctions in Equation (1) and the condition in Equation (2)—is intentionally unspecified. That domain may be chosen arbitrarily without altering Theorem 13 or its proof. In particular, the benchmarks and may be restricted to satisfy an arbitrarily chosen predicate (for instance, they may be restricted to singleton sets).
The next result is that, for every measure, the performance of any randomized algorithm on any benchmark can be matched in a nontrivial way.
For every measure $m$, randomized algorithm $\alpha$, and benchmark $\mathcal{F}$, there exist randomized algorithms $\beta$ and $\gamma$ such that the performance of $\alpha$ over $\mathcal{F}$ is matched, where $\beta$ or $\gamma$ may be chosen arbitrarily.
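A minimal instance of such matching can be computed directly. The sketch below is our illustration (the encoding, the measure `m`, and the helper `order` are ours): the performance of a target algorithm on a benchmark is matched by a mixture of two other algorithms, with the mixing weight solved by linear interpolation. The weight comes out rational, consistent with the earlier remark about restricting components to $\mathbb{Q}$.

```python
X, Y = [0, 1, 2], [0, 1]

def perf(algorithm, f):
    history = []
    for _ in X:
        x = algorithm(history)
        history.append((x, f[x]))
    return tuple(y for _, y in history)

def order(seq):
    """Deterministic algorithm visiting points in the given fixed order."""
    return lambda history: seq[len(history)]

def m(v):
    """Measure: evaluations until the first y == 1."""
    return v.index(1) + 1 if 1 in v else len(v) + 1

benchmark = [{0: 0, 1: 0, 2: 1}]        # a single function: not closed

alpha = order([1, 2, 0])                # algorithm to be matched
a, b = order([0, 1, 2]), order([2, 0, 1])

def avg(algorithm):
    return sum(m(perf(algorithm, f)) for f in benchmark) / len(benchmark)

# Solve for the weight w making the mixture w*a + (1-w)*b match alpha.
target = avg(alpha)
w = (target - avg(b)) / (avg(a) - avg(b))

matched = w * avg(a) + (1 - w) * avg(b)
assert abs(matched - target) < 1e-12
assert w == 0.5        # rational weight; support {a, b} is nontrivial
```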
For every measure $m$, benchmark $\mathcal{F}$, and nontrivial randomized algorithm $\alpha$, there exist infinitely many $\beta \neq \alpha$ whose performance over $\mathcal{F}$, as measured by $m$, equals that of $\alpha$.
Then $\beta \neq \alpha$, because $\beta$ is trivial, whereas $\alpha$ is not. Moreover, infinitely many alternatives to $\beta$ are provided as the construction's parameter varies.
That leaves the remaining case. Since $m$ is not constant wrt $\mathcal{F}$ and the support of $\alpha$, there exists an algorithm $b$ in that support whose performance differs.
Let $g$ be the corresponding surrogate function. Since $g$ is a surrogate for $f$—as far as performance is concerned—the match holds. Moreover, $\beta \neq \alpha$ since they have different support. Infinitely many $\beta$ are provided as the construction's parameter varies.
Note that the components of randomized algorithms can be restricted to $\mathbb{Q}$ without altering Proposition 8 or its proof (the mixing coefficient is necessarily rational if performance measures are required to take values in $\mathbb{Q}$).
Definitions permit and theorems address algorithms which are not Turing computable. It should be appreciated that general results which hold even for algorithms that need not be Turing computable necessarily specialize to algorithms that are. Readers who embrace transfinite induction will no doubt appreciate the probability-free treatment in Section 3 which is conducted within a framework amenable to generalization (Rowe et al., 2009).
As a concession to readers who desire computability, $\mathcal{X}$ and $\mathcal{Y}$ were kept finite, and conclusions were stated as if randomized algorithms had components in $\mathbb{Q}$ (for instance, Propositions 4 and 8 could have claimed uncountably many if components in $\mathbb{R}$ were permitted). Moreover, care was taken to point out the fact that Theorem 13 and Proposition 8 in no way require nonrational components.
In keeping with the outlook that the investigation of NFL-like properties need not be conflated with or encumbered by applications, this paper has focused on simplifying and extending theoretical results, and presenting new mathematical tools (Theorems, Corollaries, Lemmas, Propositions). Focused NFL (Theorem 7) simplifies and extends the previous account by Whitley and Rowe (2008), and FNFL (Theorem 8) establishes the analogue for algorithms restricted to a given number of steps. Corollary 2 and Theorem 9 formalize cyclic aspects of focused benchmarks noted in Whitley and Rowe (2008).
A trend in NFL theory is the progression from permutations of the search space to group actions on functions and algorithms; our results and methods have for the most part followed that path. A series of propositions sort out how permutations act on randomized algorithms, establish linearity of expected average performance, and lead up to Expected Duality (Theorem 10). The demonstration of NFL for randomized algorithms by Rowe and Vose (2011) is first reviewed, and then sharpened (Proposition 4). By placing attention on the particular measure used, benchmark symmetries are extended to benchmark invariants whose properties admit a FNFL-type interpretation (Theorem 11). The theoretical machinery of Rowe and Vose (2011) is also generalized.
The paper concludes with a geometric characterization for minimax distinctions (Wolpert and Macready, 1997) and basic results on performance matching.
The authors would like to thank Suzanne Sadedin, Marte Ramírez, and anonymous referees for comments on this manuscript. Ideas leading to Theorem 13 were initially formulated while M. D. Vose was visiting The University of Birmingham; he is grateful for the gracious support provided by the School of Computer Science, and for valuable discussions with Jon E. Rowe and Alden H. Wright during that visit. This work was supported in part by NIH grant R01GM056693, and by a Howard Hughes Medical Institute Collaborative Innovation Award.
In the original context where probabilistic language is adequate (finite domains and codomains), it adds no essential value; probabilistic jargon may be expanded into set-theoretic definitions and simplified away.
An interesting—though widely unrecognized—fact is that nonuniform NFL can be obtained from sharpened NFL by a suitable choice of performance measure (Rowe et al., 2009).
A multiset—delimited with doubled brackets $\{\!\{\ \}\!\}$—extends the set concept by allowing elements to appear more than once.
Previous use of G is superseded by the notation defined here.
Schumacher (2000) introduced this measure, but with different notation.
Previous use of G is superseded by the notation defined here.
Previous use of is superseded by the notation defined here.