Abstract

No Free Lunch (NFL) theorems have been developed in many settings over the last two decades. Whereas NFL is known to be possible in any domain based on set-theoretic concepts, probabilistic versions of NFL are presently believed to be impossible in continuous domains. This article develops a new formalization of probabilistic NFL that is sufficiently expressive to prove the existence of NFL in large search domains, such as continuous spaces or function spaces. This formulation is arguably more complicated than its set-theoretic variants, mostly as a result of the numerous technical complications within probability theory itself. However, a probabilistic conceptualization of NFL is important because stochastic optimization methods inherently need to be evaluated probabilistically. Thus the present study fills an important gap in the study of performance of stochastic optimizers.

1  Introduction

The No Free Lunch (NFL) theorems were originally proposed in a probabilistic form, stating loosely that on average all nonrepeating black-box algorithms perform equally. In recent years, this probabilistic interpretation has come into question due to the difficulty of expressing probabilistic NFL in infinite search domains. The issue became acute with the claim of Auger and Teytaud (2007, 2010) that “continuous lunches are free.” They asserted that there is no way of averaging fitness functions that produces the NFL property for a continuous search domain. This assertion was described by Rowe et al. (2009) as a “failure of the probabilistic framework.” Duéñez-Guzmán and Vose (2013) concluded that “probability is inadequate to affirm unconstrained NFL results in the general case,” and even that “probability is an unfortunate historical artifact.”

The goal of this article is to set the record straight with respect to probabilistic NFL. Although we cannot argue with Rowe et al. in their claim that “probabilistic language complicates both the statement and the proof of NFL results” (Rowe et al., 2009), we believe that this complication is necessary. In terms of applications, NFL is a statement about what kinds of performance are possible in optimization algorithms. From this point of view, NFL for stochastic optimization cannot be divorced from the probabilistic context. Though Rowe et al. (2009) have shown that set-theoretic results can be applied to a probabilistic context, it is not always natural to express general non-NFL performance evaluation results without reference to probability theory.

Much of the confusion stems from the explicit claim of Auger and Teytaud (2010) that “there is no random fitness for which all algorithms are equivalent when [the search space] is a continuous domain.” In fact, this claim is possible only due to the particularly restrictive conditions that these authors placed on the definition of a random fitness (see Section 2.2). We will construct a large family of NFL evaluation scenarios in continuous settings, demonstrating that the claim of “no continuous NFL” overstates the actual situation unnecessarily.

In sum, this article should dispose of the claim that “probability is inadequate to affirm unconstrained NFL results in the general case.” The revised version of probabilistic NFL expounded in this article provides novel insight into the nature of NFL from a probabilistic and functional analytic point of view.

2  Overview of Results

This section gives an overview of the approach and results of this article and provides an intuitive sense for what is proven in technical detail below.

2.1  Search and Optimization

In general terms, an iterative search method picks points from a search domain in a sequence with the goal of finding a point with a desired value. The search domain is denoted as a set $X$, and the set of values is another set $Y$. The search domain may consist of real numbers, integers, graphs, or anything else. The value set can consist of any distinct objects: real numbers, integers, or farm animals.

A search problem associates values with search points. There are many kinds of problems. One simple problem chooses values based on a single function $f: X \to Y$, so that if $X$ is the set of positive integers and $Y$ a set of farm animals, then for each integer $n \in X$, $f(n)$ is the associated farm animal. Such a problem might arise in a database recording the animals owned by a farm.

A search goal is needed to complete the setting. If the goal is to find the unique identifier of a sheep named Dolly, a search method might use a secondary index based on the name of each animal to locate the correct sheep in logarithmic time, or to determine that no such sheep exists. In this case, the search problem with this goal has an efficiently computable solution.

In an optimization problem, the value set is ordered, either totally or partially. The ordering indicates preference, and the goal is to find the search point with the least or the greatest value. Many if not most evolutionary algorithms are applied to optimization. Often, the problem is defined by a single fitness (or cost or objective) function that provides fixed values for each search point. This setting is called static optimization.

A search method proposes a sequence of search points, and a search problem provides the associated values for each point. The method may randomize its choice of search points, choosing the next point according to a probability distribution that depends on the set of prior points and their associated values. Such a method has been called a randomized search heuristic (Vose, 1999). The search problem may also be randomized, and with a (deterministic) search goal the search becomes a game-theoretic game, as explored by Lockett (2015).

Typically, randomness within the search problem is substantially constrained. Here, the problem is restricted to a fitness measure, defined in Section 3.6. A fitness measure is a probability distribution over all possible fitness functions from the search domain $X$ to the set of fitness values $Y$. If the same search point is proposed multiple times, it must still receive the same value each time. A more general form of randomness, not considered in this article, arises in practice when a fitness function is subject to noise.
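A fitness measure of this kind can be sampled lazily. A value is drawn only when a point is first proposed, and it is remembered afterwards. This enforces the requirement that a repeated point receives the same value. The Python sketch below assumes, purely for illustration, that values are independent standard normal draws. The class `LazyFitness` is hypothetical.

```python
import random


class LazyFitness:
    """Hypothetical lazily sampled fitness function.

    Assumption: values are i.i.d. standard normal draws. A point's value is
    drawn the first time it is queried and cached afterwards, so a repeated
    search point always receives the same value (unlike a noisy fitness).
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._values = {}

    def __call__(self, x):
        if x not in self._values:
            self._values[x] = self._rng.gauss(0.0, 1.0)
        return self._values[x]


f = LazyFitness(seed=0)
first = f(0.25)
assert f(0.25) == first  # same point, same value
```

A noisy fitness function would instead redraw on every call, which is exactly the case excluded above.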

2.2  No Free Lunch as Symmetry of Values

NFL looks at the trajectories of search points and fitness values produced by running a search method on a search problem. NFL is usually expressed in a way that is dependent on the search problem but not the search goal. To discuss NFL, one need not make any assumptions about the value set other than that it contains distinct points. In particular, NFL does not care whether the set of values is ordered; such questions are not relevant without a search goal. The reason the idea of a search goal was introduced in the last subsection was to point out that contrary to the folklore surrounding NFL, traditional forms of NFL have little to do with any notion of progress in optimization.

Rather, NFL is a symmetry property applying to the sequence of observed values. One might imagine an iterative search method proceeding as follows:
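$$x_1 \to y_1 \to x_2 \to y_2 \to x_3 \to y_3 \to \cdots \qquad (1)$$
Here each $x_n \in X$ is a search point proposed by the method, and each $y_n \in Y$ is the value assigned to it by the search problem. A sequence of such pairs $(x_n, y_n)$ is called a trace of the search. NFL refers to various situations in which the sequence $(y_n)$ does not depend on the sequence $(x_n)$. Note that such a property does not depend on any relative preference among the $y_n$'s.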

Such situations arise in various contexts. Set-theoretic NFL describes sets of fitness functions such that composition of any function in the set with any permutation of the search domain yields another function inside the set; that is, the set is closed under permutation (c.u.p.) (Schumacher, 2000; Rowe et al., 2009). Closure under permutation is equivalent to NFL (Igel and Toussaint, 2004), since it guarantees that no matter what sequence of search points is chosen by the search method, the sequence of values that is obtained cannot say which fitness values will be observed next. Set-theoretic NFL applies to search domains and fitness spaces of any cardinality, though proving as much requires index sets with cardinality potentially greater than that of the integers.
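For a finite search domain, closure under permutation can be checked by brute force. The sketch below uses hypothetical names. It represents a function $f$ on $X = \{0, \ldots, n-1\}$ as the tuple of its values. It then tests whether every composition $f \circ \sigma$ with a permutation $\sigma$ of $X$ stays in the set. Because it enumerates all $n!$ permutations, it is only practical for very small $n$.

```python
from itertools import permutations


def is_closed_under_permutation(functions, n):
    """Check whether a set of functions on X = {0, ..., n-1} is c.u.p.

    Each function f: X -> Y is stored as the tuple (f(0), ..., f(n-1)).
    The set is c.u.p. if, for every permutation sigma of X, the tuple
    (f(sigma(0)), ..., f(sigma(n-1))) is again in the set.
    """
    fs = set(functions)
    for f in fs:
        for sigma in permutations(range(n)):
            if tuple(f[sigma[i]] for i in range(n)) not in fs:
                return False
    return True


# Functions on X = {0, 1, 2} with exactly one value equal to 1: c.u.p.
one_hot = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
assert is_closed_under_permutation(one_hot, 3)

# A single non-constant function is not c.u.p.
assert not is_closed_under_permutation({(1, 0, 0)}, 3)
```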

Probabilistic NFL places a conditional probability distribution over fitness sequences $(y_n)$ given search sequences $(x_n)$. It states that the probability of observing a particular fitness sequence $(y_n)$ is independent of the (randomized) search sequence $(x_n)$. Given c.u.p. sets, set-theoretic NFL implies probabilistic NFL when the fitness function is chosen uniformly over the set. However, probabilistic NFL may include notions of NFL not included by the set-theoretic version. Within probability theory, it is difficult but not impossible to express probabilistic NFL for spaces beyond countable cardinality, due to the existence of unmeasurable sets. This subject is taken up in Sections 3 and 4.
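The implication from c.u.p. sets to probabilistic NFL can be verified exactly on a toy problem. In the following sketch (all names hypothetical), $X = \{0, 1, 2\}$ and the fitness function is drawn uniformly from the c.u.p. set of bijections $X \to \{0, 1, 2\}$. Two non-repeating strategies, one fixed and one adaptive, then produce identical distributions of value sequences. Replacing the c.u.p. set by a single function breaks the equality.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

N = 3  # size of the hypothetical search domain X = {0, 1, 2}


def value_distribution(strategy, functions):
    """Exact distribution of the observed values y_1, ..., y_N when the
    fitness function is drawn uniformly from `functions`.

    Each function is a tuple (f(0), ..., f(N-1)). strategy(history) maps the
    list of (x, y) pairs seen so far to the next point and never repeats one.
    """
    weight = Fraction(1, len(functions))
    dist = Counter()
    for f in functions:
        history = []
        for _ in range(N):
            x = strategy(history)
            history.append((x, f[x]))
        dist[tuple(y for _, y in history)] += weight
    return dict(dist)


def in_order(history):
    """Visit 0, 1, 2, ... regardless of the values observed."""
    return len(history)


def adaptive(history):
    """Start at the last point; let the first observed value pick the next one."""
    if not history:
        return N - 1
    visited = {x for x, _ in history}
    unvisited = [x for x in range(N) if x not in visited]
    return unvisited[0] if history[0][1] == 0 else unvisited[-1]


bijections = list(permutations(range(N)))  # c.u.p. set
assert value_distribution(in_order, bijections) == value_distribution(adaptive, bijections)

single = [(1, 0, 0)]  # not c.u.p.
assert value_distribution(in_order, single) != value_distribution(adaptive, single)
```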

In a previous article, Auger and Teytaud (2010) claimed that “continuous lunches are free.” This claim has been widely cited, but it overstates the results proven in their article and contradicts Theorems 28 and 30 below. Suppose that the search domain is the unit interval, $X = [0,1]$, and the value space is the real line, $Y = \mathbb{R}$. To demonstrate an NFL fitness measure, suppose that each fitness value $y_n$ above is chosen from a standard normal distribution independent of the search sequence $(x_n)$ and independent also of the prior values $y_1, \ldots, y_{n-1}$. That is, the search method chooses a point, and then the search problem randomly assigns a fitness value to that point that has no relationship to its previous choices or to the search point proposed. Thus the probability of observing certain fitness values is independent of the search points proposed, satisfying the intuitive criterion for probabilistic NFL. This fitness measure has mutually independent and identically distributed coordinates. It is thus path independent by Theorem 28 and has the NFL property by Theorem 25. Furthermore, the fitness measure in question does exist as a consequence of the Kolmogorov Extension Theorem.
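The continuous example above can be approximated numerically. The sketch below assumes $X = [0,1]$ and draws each new fitness value lazily from $N(0,1)$. It compares plain random search with a hypothetical greedy method that perturbs the best point found so far. Under this fitness measure, the value observed at any step is standard normal for both methods. The printed means and standard deviations should therefore agree up to Monte Carlo error, at roughly 0 and 1.

```python
import random
import statistics


def run(optimizer, steps, rng):
    """One search on X = [0, 1] against a fresh i.i.d. N(0, 1) fitness draw.

    Values are drawn lazily and cached, so a repeated point keeps its value.
    Returns the observed fitness values y_1, ..., y_steps.
    """
    cache = {}
    history = []
    for _ in range(steps):
        x = optimizer(history, rng)
        if x not in cache:
            cache[x] = rng.gauss(0.0, 1.0)
        history.append((x, cache[x]))
    return [y for _, y in history]


def random_search(history, rng):
    """Ignore the history and propose a uniform point."""
    return rng.random()


def greedy_local(history, rng):
    """Hypothetical 'smart' method: perturb the highest-valued point so far,
    wrapping around so that proposals stay in [0, 1]."""
    if not history:
        return rng.random()
    best_x, _ = max(history, key=lambda pair: pair[1])
    return (best_x + rng.gauss(0.0, 0.01)) % 1.0


rng = random.Random(42)
for opt in (random_search, greedy_local):
    last_values = [run(opt, 10, rng)[-1] for _ in range(10_000)]
    print(opt.__name__,
          round(statistics.mean(last_values), 2),
          round(statistics.pstdev(last_values), 2))
```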

This example of continuous NFL is intuitive and can be approximated on a digital computer. But it does not have a proper median and so does not satisfy the conditions imposed on NFL by Auger and Teytaud (2010). A proper median is a real number such that half of the fitness values are strictly above it and half strictly below. Further, this number must be the same across all fitness functions. If complete fitness functions could be drawn from this NFL fitness measure above (which they cannot, since it is a Baire measure), they would be imbalanced with probability one. The requirement of a proper median is too strong, but the proof against continuous NFL depends on it. That continuous NFL must satisfy certain properties does not imply that it cannot exist at all.

The main difference with Auger and Teytaud (2010) concerns the optimization process itself. The present article treats optimization as a stochastic process in which only the trace is observed. In probability theory, an event is measurable if a probability can be assigned to it. A probability distribution says nothing about events that are not measurable. Such events cannot be observed. Auger and Teytaud (2010) require that the entire fitness function be observable, that is, that one can sample entire fitness functions. This Lebesgue process requirement is critical to their proof. In this article, the optimization method only observes the fitness values as they are assigned. If the method is run for countably many steps, it observes countably many fitness values in order. The complete fitness function is unobservable.

This article goes further than just showing that continuous NFL does exist in a probabilistic setting. After all, that fact is a logical consequence of the most recent version of set-theoretic NFL (Rowe et al., 2009). Rather, this article demonstrates an equivalence between probabilistic NFL and a new concept of path independence, a probabilistic analog to c.u.p. sets. Path independence states that probabilistic NFL is invariant with respect to permutations in the search domain.

Path independence is not easy to prove, but it can be easily demonstrated for fitness measures that have mutually independent and identically distributed coordinates, such as the one above. Path independent fitness measures always have identically distributed coordinates (Theorem 29), but the converse is not true (Theorem 33). It is tempting to think that mutual independence together with identical distributions is not only sufficient but also necessary for path independence, and hence for NFL. But this conclusion is incorrect, as proven for a special case in Theorem 34. NFL fitness measures can be deficient in a sense that c.u.p. sets cannot be. They can exhibit correlations among search coordinates that do not disturb NFL, for example by choosing probabilistically between two distinct NFL fitness measures.

Further, weighted combinations of NFL fitness measures are also NFL fitness measures (Theorem 31). NFL fitness measures without mutually independent coordinates can arise in exactly this way (Theorem 34). The following question then suggests itself: are there NFL fitness measures that cannot be generated from convex combinations of uniform priors over sets that are c.u.p.? Or is path independence just the closure under vector operations of mutually independent and identically distributed coordinates? These questions are not answered in this article, but tools that help to answer them are developed.
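Here is a minimal sketch of the kind of mixture described above. It assumes two components, one with i.i.d. $N(0,1)$ coordinates and one with i.i.d. $N(\mu,1)$ coordinates, each with weight $1/2$. The component is chosen once per fitness function. As a result, any two coordinates are correlated, with covariance equal to the variance of the component mean, $\mu^2/4$. Even so, each coordinate has the same marginal distribution wherever it is observed. The function names are hypothetical.

```python
import random


def sample_mixture_trace(n, mu, rng):
    """Draw n fitness values from a 50/50 mixture of two i.i.d. measures.

    The component (mean 0 or mean mu, unit variance) is chosen once per
    fitness function, so the coordinates are identically distributed but
    not mutually independent.
    """
    center = 0.0 if rng.random() < 0.5 else mu
    return [rng.gauss(center, 1.0) for _ in range(n)]


def covariance(pairs):
    """Population covariance of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    return sum((a - ma) * (b - mb) for a, b in pairs) / n


rng = random.Random(0)
pairs = [tuple(sample_mixture_trace(2, 4.0, rng)) for _ in range(100_000)]
print(round(covariance(pairs), 1))  # close to mu**2 / 4 = 4.0
```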

2.3  Motivation for the Formalism

This article considers optimization and search as processes generating a pair of randomized trajectories, one through the search domain $X$ and one through the fitness space $Y$. The optimization method chooses the search trajectory in response to the fitness trajectory, whereas the optimization problem chooses the fitness trajectory in response to the search trajectory. The optimization method tries to control the fitness trajectory, and NFL describes the situation when this trajectory is uncontrollable. Such processes are not confined to computations; they may describe, for example, any input-output sequence mappings. NFL applies to all such systems irrespective of computational aspects.

The formalism below considers an “optimizer” as a map from fitness trajectories to probability distributions over search trajectories. Intuitively, one can imagine observing a fitness trajectory and asking which search trajectory produced it. Usually, one thinks of an optimizer as examining past search points and fitness values in order to decide the next search point iteratively. Formally, however, this iterative point of view can be obtained from the formalism by extracting conditional probabilities, as briefly discussed in Section 3.5. More complete discussions can be found elsewhere (Lockett, 2015; Lockett and Miikkulainen, 2013; Lockett, 2013). In particular, the notation in Equation 4 represents how the optimizer chooses the next search points given a sequence of previous search points and values determined by a fitness function $f$. The trace or history of an optimization is a sample from the history process from Equation 7, which takes values determined by the optimizer and fitness measure. One can examine the history process by sampling its coordinate projections.

The iterative nature of optimization is encoded in a totally ordered index set $\mathcal{T}$. If it is finite, optimization proceeds for a fixed number of steps. If it is countable, optimization runs in discrete steps forever. In either case, the trace can be observed with search points and fitness values at each step. Uncountable index sets were introduced to NFL by Rowe et al. (2009), where they were necessary for NFL proofs. Here, uncountable cardinality enables a way of thinking about optimization that transcends computation, though this perspective will not be fully developed. One example is exact gradient descent with a continuous index set such as $\mathcal{T} = [1, \infty)$. There, a system starts at a fixed point and follows derivatives to the nearest local optimum, taking infinitesimal steps. A further possibility, not explored here, would be to separate the search process from its observation, so that an uncountably indexed search might be observed at finite intervals. Such problems occur naturally in robotics, for example.

Probability theory pertains to what can be observed. At the outset, one specifies the events that can be measured. For probabilistic NFL in this article, the only observable aspect of optimization is its trace (i.e., Equation 7). Complete fitness functions can only be observed if all possible values are contained in the trace. Otherwise, the actual fitness function is a latent, unobservable factor. Other choices are possible; Auger and Teytaud (2010) required the entire fitness function to be observable. The existence of NFL proven here follows from weaker observability requirements, resulting in a technical distinction between Borel and Baire optimizers. A Baire optimizer can observe only countably many optimization steps, whereas a Borel one can observe uncountably many. Thus a Borel optimizer is needed whenever $\mathcal{T}$ is uncountable.

If the index set has countable cardinality, the history process can be approximately sampled one point at a time to generate an approximate trace. The word “approximate” is necessary because very often the spaces being searched consist of real numbers that cannot be represented directly in a digital computer. Not only that, but the probabilities being sampled are also real numbers with infinite representations, so that sampling is only approximate. Many methods claim to optimize real functions though they are computed with a finite subset of the rational numbers. This theory applies to arbitrary spaces. Of course, when a digital computer is used, finite representations are needed, and there are at most countably many programs available to approximate sets of optimizers that may have uncountable or higher cardinality depending on the space.
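To make the approximation concrete, consider how a computer stores a real search point. Python floats are IEEE 754 double-precision numbers on the usual platforms. Every such float is a rational number with a power-of-two denominator. A search over "real" points is therefore in fact a search over a finite set of dyadic rationals, as the standard-library `fractions` module shows for the value 0.1.

```python
from fractions import Fraction

# The float literal 0.1 is stored as the nearest dyadic rational, not 1/10.
x = Fraction(0.1)
print(x)                       # 3602879701896397/36028797018963968
print(x == Fraction(1, 10))    # False
print(x.denominator == 2**55)  # True
```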

In a theoretical setting, it is not always necessary or useful to limit oneself solely to computable objects. A physical robot is one example of an optimization that is only partially digital. For arbitrarily indexed search processes, NFL is precisely the situation in which fitness trajectories decouple from the search trajectories supposedly driving them. This generalized concept of NFL may in fact have repercussions in mathematics well beyond search and optimization.

3  Formal Grounding: Optimization Methods and Problems

As discussed, NFL for optimization can be viewed as a statement about the invariance of trajectories observed in fitness space when running a search method on a search problem. The first step in studying NFL, then, is to formalize what is meant by the terms search method and search problem. This section develops an abstraction of optimization methods as functionals from the space of fitness functions to a space of probability measures. A similar construction, but restricted to observing countably many search points, was developed by Lockett and Miikkulainen (2013) and Lockett (2013).

3.1  Basic Setting

In this study, the domain of optimization is a Hausdorff topological space $X$ (that is, a topological space in which any two distinct points have disjoint open neighborhoods). Optimization is performed with respect to a fitness function $f$ that maps points in $X$ to some topological space $Y$, called the fitness space, the fitness domain, or the co-domain. The space of all objective functions is $Y^X$, which is also a topological space under the product topology. As discussed above, the space $Y$ need not be ordered and represents settings including single-objective optimization, multiobjective optimization, and gradient-based methods. The topological setting in this article is used to induce probability spaces, and thus topological concepts are restricted to general facts about open, closed, and compact sets (see e.g., Munkres, 2000).

3.2  Index Sets

An iterative optimizer looks at points in the search domain in order. To provide a notion of order, a totally ordered index set $\mathcal{T}$ will be used. The set $\mathcal{T}$ is assumed to have a least element (here called 1, so that indices begin at 1), so that optimization has a starting point. The index set may have any cardinality, whether finite, countable, continuous, or larger, although the main focus here will be on index sets with at most countable cardinality. Traditionally, countably many steps are enough to formalize iterative optimization methods, but it is possible to imagine methods that follow continuous paths as well. Set-theoretic NFL requires arbitrary cardinality in order to exhaust the search domain in NFL proofs (Rowe et al., 2009). Cardinality plays a secondary role in this article.

For $s, t \in \mathcal{T}$, the intervals of $\mathcal{T}$ are denoted by $[s,t]$ for the closed interval, $(s,t)$ for the open interval, and $[s,t)$ and $(s,t]$ for the half-open intervals. If $s > t$, these intervals are empty. The notation $X^{\mathcal{T}}$ for an index set $\mathcal{T}$ indicates the product space formed by taking one copy of $X$ for each $t \in \mathcal{T}$. The index set $\mathcal{T}$ may be larger or smaller than $X$. Sometimes it is necessary to index only unique elements in $X$. When this is needed, an index set with the same cardinality as $X$ will be used.

A sequence is a function from some index set to some domain, denoted by parentheses, e.g., $x = (x_t)_{t \in \mathcal{T}}$. For a subset $S \subseteq \mathcal{T}$, the restriction of $x$ to $S$ is denoted by $x_S$. Restrictions to intervals are written in the same way, for example $x_{[s,t]}$ or $x_{[1,t)}$. A sequence is finite if its index set has finite cardinality, countable if its index set has countable cardinality, and so on. For any function $g$, $g(x) = (g(x_t))_{t \in \mathcal{T}}$ indicates pointwise application of the function.

For any set $S$, a permutation $\sigma: S \to S$ is a bijection on the set that rearranges its elements. If $x = (x_t)_{t \in S}$ is a sequence, a permutation $\sigma$ on $S$ yields $x \circ \sigma = (x_{\sigma(t)})_{t \in S}$, the sequence with its elements reordered according to $\sigma$. Similarly, for any permutation $\sigma$ of $X$ and any function $f: X \to Y$, the composition of $f$ and $\sigma$ is written $f \circ \sigma$.

For any $S \subseteq \mathcal{T}$ and any topological space $Z$, a projection $\pi_S: Z^{\mathcal{T}} \to Z^S$ is the function given by $\pi_S(z) = z_S$. A projection is finite, countable, or uncountable depending on whether $S$ is finite, countable, or uncountable.
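The restriction, permutation, and projection operations can be illustrated on finite sequences. The Python sketch below stores a sequence as a dictionary from indices to values. This representation and the function names are assumptions made for illustration only.

```python
def restrict(x, subset):
    """Restriction of a sequence {index: value} to the indices in `subset`."""
    return {t: x[t] for t in subset}


def permute(x, sigma):
    """Reorder a sequence by a permutation sigma of its index set:
    the result at index t is x[sigma[t]], i.e. x composed with sigma."""
    return {t: x[sigma[t]] for t in x}


def project(subset):
    """Coordinate projection onto `subset`: the map z -> restrict(z, subset)."""
    return lambda z: restrict(z, subset)


x = {1: "a", 2: "b", 3: "c"}
assert restrict(x, [1, 3]) == {1: "a", 3: "c"}

sigma = {1: 3, 2: 1, 3: 2}
assert permute(x, sigma) == {1: "c", 2: "a", 3: "b"}

assert project({2})(x) == {2: "b"}
```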

3.3  Probability Spaces

In the NFL theorems below, potential optima are sampled from Kolmogorov probability measures. A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$ where $\mathbb{P}$ is a probability measure over the σ-algebra $\mathcal{F}$. A σ-algebra on a space $\Omega$ is a set of subsets of $\Omega$ for which probabilities can be meaningfully defined. It contains the empty set and $\Omega$, and it is closed under complements and countable unions (and therefore also under countable intersections). Sets in a σ-algebra are called events, measurable events, or measurable sets. A probability measure is a nonnegative set function $\mathbb{P}$ defined on all sets of a σ-algebra $\mathcal{F}$, subject to the requirements that $\mathbb{P}(\Omega) = 1$ and $\mathbb{P}(\bigcup_n A_n) = \sum_n \mathbb{P}(A_n)$ for any countable sequence of pairwise disjoint events $(A_n)$. The technical motivation for these definitions is given, for example, by Billingsley (1986). The set of all probability measures over a σ-algebra $\mathcal{F}$ will be written as $\mathcal{P}(\mathcal{F})$.

A function $g$ from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to a measurable space $(Z, \mathcal{Z})$ is called measurable if, for every $B \in \mathcal{Z}$, the inverse image $g^{-1}(B)$ is contained in $\mathcal{F}$.
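On a finite set, both definitions can be checked directly. The sketch below uses hypothetical names. It builds the smallest σ-algebra containing given sets by closing under complements and pairwise unions. This suffices on a finite set, because every countable union reduces to a finite one. It then tests measurability by computing inverse images.

```python
from itertools import combinations


def generate_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on a finite set containing the generator sets."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:  # repeat until closed under complement and union
        changed = False
        current = list(sets)
        for a in current:
            if omega - a not in sets:
                sets.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in sets:
                sets.add(a | b)
                changed = True
    return sets


def is_measurable(g, domain_sigma, codomain_sigma, omega):
    """g is measurable if the inverse image of every codomain event is an event."""
    return all(
        frozenset(w for w in omega if g(w) in b) in domain_sigma
        for b in codomain_sigma
    )


omega = {1, 2, 3, 4}
F = generate_sigma_algebra(omega, [{1, 2}])  # {}, {1,2}, {3,4}, omega
G = generate_sigma_algebra({0, 1}, [{0}])    # all subsets of {0, 1}

# Parity separates 1 from 2, which F cannot do, so it is not measurable.
assert not is_measurable(lambda w: w % 2, F, G, omega)
assert is_measurable(lambda w: 0 if w <= 2 else 1, F, G, omega)
```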

On any topological space $Z$, the Borel σ-algebra $\mathcal{B}(Z)$ is the smallest σ-algebra containing the open and closed sets; probability measures defined on this σ-algebra are called Borel measures. A function is Borel measurable if it is measurable from the Borel σ-algebra on its input space to a σ-algebra on its output space.

This article studies probability measures on product spaces used to sample countable projections of optimization histories. For an index set $I$ (usually $I = X$ or $I = \mathcal{T}$) and a topological space $Z$, the Baire σ-algebra on $Z^I$, denoted $\mathcal{B}a(Z^I)$, is the smallest σ-algebra that makes all countable projections into Borel measurable functions. Probability measures on the Baire σ-algebra are called Baire measures. A function is called Baire measurable if it is measurable from the Baire σ-algebra on its input space to a given σ-algebra on its output space.

If $I$ is at most countable, then the Baire and Borel σ-algebras are equivalent, i.e., $\mathcal{B}a(Z^I) = \mathcal{B}(Z^I)$. Otherwise, $\mathcal{B}a(Z^I) \subseteq \mathcal{B}(Z^I)$, and the inclusion can be strict. In particular, the singletons are always contained in the Borel σ-algebra (that is, the singletons are Borel measurable), but the Baire σ-algebra may not contain the singletons (that is, the singletons may not be Baire measurable). On the space $Y^X$, one cannot sample entire functions from a Baire measure; one can only observe a countable projection of fitness values. Borel measures, however, can be sampled directly.

For this reason, it is worthwhile to mention some facts relating Baire and Borel measures. In many spaces, every Baire measure can be extended to a unique regular Borel measure that agrees with it on all Baire measurable sets. Such spaces are called Mařík spaces or simply Mařík. It has long been known that every normal and countably paracompact space is Mařík (Mařík, 1957). The following remark gives more familiar examples, with the space of fitness functions $Y^X$ particularly in mind.

Remark 1:

A product space is Mařík in any of the following cases:

  1. is finite.

  2. is finite and is at most countable.

  3. is finite and is locally compact and σ-compact, for example, or .

  4. is a compact space, for example, or for .

  5. is a metric space, for example, is metric and is countable.

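If the index set has uncountable cardinality, then the product space is not normal unless the factor space is compact. Thus, for instance, if $X = [0,1]$ and $Y = \mathbb{R}$, then $Y^X = \mathbb{R}^{[0,1]}$ is not normal and thus not Mařík.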

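When $Y^X$ is a Mařík space, it is possible to speak of sampling individual fitness functions. It is important to notice that although $\mathbb{R}^{[0,1]}$ is not Mařík, the extended space $\overline{\mathbb{R}}^{[0,1]}$ is Mařík because $\overline{\mathbb{R}} = [-\infty, \infty]$ is compact.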

There is a well-defined notion of integration in any probability space based on the Lebesgue integral. This integral is written as $\int_A g \, d\mathbb{P}$, or just $\int g \, d\mathbb{P}$ when $A$ is the whole space, for any measurable set $A$ and any Borel measurable real-valued function $g$. In probability theory, this integral is called the expectation of $g$ over $\mathbb{P}$, and the function $g$ is called a random variable. A stochastic process is a collection of random variables, here indexed by $\mathcal{T}$, which represents time. For further discussion, see Billingsley (1986); Halmos (1974); Chung and Williams (1990); Karatzas and Shreve (1991).

3.4  Kolmogorov Extensions

The Kolmogorov Extension Theorem (Kolmogorov, 1933) is a key tool for building Baire measures on product spaces. This version comes from Aliprantis and Border (2006) and uses tightness.

Definition 1 (Tightness):
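A probability measure $\mathbb{P}$ on a Borel measurable space $(Z, \mathcal{B}(Z))$ is tight if for all $A \in \mathcal{B}(Z)$,
$$\mathbb{P}(A) = \sup\{\mathbb{P}(K) : K \subseteq A,\ K \text{ compact}\}. \qquad (2)$$
That is, $\mathbb{P}(A)$ is the least upper bound on the probability mass of the compact subsets of $A$.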

Tightness and compactness are required here only in order to apply the Kolmogorov Extension Theorem. Every Borel probability measure on a complete separable metric space is tight. Such spaces include $\mathbb{R}$, $\mathbb{R}^d$, and other common spaces.
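Here is a concrete instance of Equation (2) with $A = \mathbb{R}$. The standard normal measure puts all but $\varepsilon$ of its mass on the compact interval $[-k, k]$ with $k = \Phi^{-1}(1 - \varepsilon/2)$, where $\Phi$ is the standard normal distribution function. The sketch below computes $k$ with `statistics.NormalDist` from the Python standard library. The helper `compact_radius` is hypothetical, and the sketch only illustrates the case $A = \mathbb{R}$.

```python
from statistics import NormalDist


def compact_radius(eps):
    """Smallest k with N(0, 1)([-k, k]) >= 1 - eps.

    The compact interval [-k, k] then carries all but eps of the mass, so the
    supremum over compact subsets of R reaches P(R) = 1 as eps -> 0.
    """
    return NormalDist().inv_cdf(1.0 - eps / 2.0)


for eps in (0.1, 0.01, 0.001):
    k = compact_radius(eps)
    mass = NormalDist().cdf(k) - NormalDist().cdf(-k)
    print(f"eps={eps}: K=[-{k:.3f}, {k:.3f}], mass={mass:.4f}")
```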

The next definition regards the consistency of a family of probability measures. Let $T$ be any set. Although $T$ will be used as an index set, it need not be ordered. Suppose we have a collection of Borel measurable spaces $(Y_t, \mathcal{B}(Y_t))_{t \in T}$, and for each subset $I \subseteq T$ write $Y_I = \prod_{t \in I} Y_t$. A family of finite-dimensional distributions over $T$ is a collection of probability measures indexed by all finite subsets of $T$, e.g., $(\mathbb{P}_I)_{I \subseteq T,\ |I| < \infty}$. Each $\mathbb{P}_I$ must be defined on the Borel product σ-algebra of $Y_I$. Consistency requires overlapping subsequences to agree on probabilities.

Definition 2 (Kolmogorov Consistency):
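Suppose $(\mathbb{P}_I)$ is a family of finite-dimensional distributions for a set $T$. This family is consistent if, for any finite $I \subseteq T$, any other finite subset $J$ of $T$ with $I \subseteq J$, and any Borel set $A \subseteq Y_I$,
$$\mathbb{P}_J\Big(A \times \prod_{t \in J \setminus I} Y_t\Big) = \mathbb{P}_I(A). \qquad (3)$$
This essentially says that members of the family defined on smaller index sets can be obtained by integrating out all extra indices from members defined on larger index sets.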
Theorem 3.1 (Kolmogorov Extension Theorem):

Let $(Y_t)_{t \in T}$ be a family of Hausdorff topological spaces over a set $T$, each equipped with its Borel σ-algebra. Let $(\mathbb{P}_I)$ be a consistent family of finite-dimensional distributions for $T$. Assume that each $\mathbb{P}_I$ is tight. Then there is a unique Baire probability measure $\mathbb{P}$ on $\prod_{t \in T} Y_t$ that extends each $\mathbb{P}_I$.

As mentioned above, in the case of a Mařík space (Remark 1), this probability measure can be extended further to a regular Borel probability measure. The Kolmogorov Extension Theorem is used primarily in Section 4 to construct NFL optimization settings. The only requirements are consistency and tightness.

3.5  Optimization Methods

A Baire optimizer is a setwise Baire-measurable function $\mathcal{G}$ that takes a fitness function $f \in Y^X$ and maps it to a Baire probability measure $\mathcal{G}[f]$ on the product space $X^{\mathcal{T}}$. “Setwise Baire-measurable” means that for all Baire events $A$ in $X^{\mathcal{T}}$, the map $f \mapsto \mathcal{G}[f](A)$ is Baire-measurable from $Y^X$ to $[0,1]$. Intuitively, the optimizer observes a fitness function and matches it with a sequence of search points chosen stochastically. Similarly, a Borel optimizer is a setwise Borel-measurable function that yields Borel probability measures, where “setwise Borel-measurable” means that $f \mapsto \mathcal{G}[f](A)$ is Borel-measurable for every Borel event $A$ in $X^{\mathcal{T}}$. Every Borel optimizer is also a Baire optimizer. The two are equivalent when $X$ and $\mathcal{T}$ are both at most countable.

Most, if not all, practical optimization methods can be placed inside this formalism by decomposing these formal optimizers into their step-by-step behavior. The process amounts to breaking up the probability measure into smaller parts representing small time scales, such that the whole optimizer may be specified from its parts. A complete construction of Borel and Baire optimizers is avoided to save space. The construction requires a well-ordered subset of the index set $\mathcal{T}$. If the total order on $\mathcal{T}$ is already a well-ordering (as when $\mathcal{T} = \mathbb{N}$), then this subset may be all of $\mathcal{T}$. In more general settings, the following definition applies.

Definition 3 (Sieve):

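A subset $\mathcal{S}$ of $\mathcal{T}$ is a sieve if $\mathcal{S}$ is well-ordered in $\mathcal{T}$ and contains the least element 1, the maximal element $\infty$, and all limit points of $\mathcal{S}$. For $s \in \mathcal{S}$ with $s < \infty$, $s^+$ denotes the next element of $\mathcal{S}$.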

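The element $\infty$ is added to $\mathcal{T}$ as a new greatest element, so that the interval $[1, \infty)$ is all of $\mathcal{T}$. Each index in the sieve other than $\infty$ has a next element, written $s^+$. A particularly interesting case when $\mathcal{T} = \mathbb{N}$ is the sieve $\{1, K+1, 2K+1, \ldots\} \cup \{\infty\}$, which represents a population-based algorithm with population size $K$. For $\mathcal{T} = [1, \infty)$, an example sieve might be $\{1 + n\delta : n = 0, 1, 2, \ldots\} \cup \{\infty\}$, which examines the fitness at intervals of size $\delta > 0$. One could consider what happens as $\delta$ vanishes, making the sieve arbitrarily fine.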
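A population sieve can be made concrete by truncating it to a finite horizon. In the sketch below, the sieve for population size $K$ is taken to be the block starting indices $1, K+1, 2K+1, \ldots$, followed by a greatest element represented by `math.inf`. The function names and the truncation are assumptions for illustration. Each block $[s, s^+)$ then holds the indices of one generation.

```python
import math


def population_sieve(pop_size, horizon):
    """Hypothetical finite stand-in for a population sieve.

    Block starts 1, K+1, 2K+1, ... up to `horizon`, followed by math.inf as
    the added greatest element.
    """
    return list(range(1, horizon + 1, pop_size)) + [math.inf]


def next_element(sieve, s):
    """s+ : the next element of the sieve after s."""
    return sieve[sieve.index(s) + 1]


def generation(sieve, s):
    """Indices in the block [s, s+) evaluated together as one population."""
    nxt = next_element(sieve, s)
    if math.isinf(nxt):
        raise ValueError("the last block of the truncated sieve is unbounded")
    return list(range(s, nxt))


S = population_sieve(pop_size=4, horizon=12)
print(S)                   # [1, 5, 9, inf]
print(next_element(S, 5))  # 9
print(generation(S, 1))    # [1, 2, 3, 4]
```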

Breaking up the index set, one can define objects that transition the state of an optimizer just before time $s$ to its state just before time $s^+$. From these, an optimizer is constructed from a sieve $\mathcal{S}$ and a collection of measurable and tight generators on that sieve, defined below. These properties produce a measure $\mathcal{G}[f]$ for each $f \in Y^X$. The function $\mathcal{G}$ is then a Baire optimizer and can be uniquely extended to a Borel optimizer when $X^{\mathcal{T}}$ is Mařík.

Definition 4:

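A generator set for a sieve $\mathcal{S}$ is a collection $(\mathcal{G}_s)_{s \in \mathcal{S},\, s < \infty}$, where each $\mathcal{G}_s$ assigns to every pair $(x, y) \in X^{[1,s)} \times Y^{[1,s)}$ of previous search points and fitness values a Borel probability measure $\mathcal{G}_s[x, y]$ on $X^{[s, s^+)}$. For $s = 1$, $X^{[1,1)}$ is the one-element set containing the empty sequence.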

Definition 5:

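A generator set for a sieve $\mathcal{S}$ is measurable if, for all $s \in \mathcal{S}$ and all Borel events $A \subseteq X^{[s,s^+)}$, the map $(x, y) \mapsto \mathcal{G}_s[x, y](A)$ is Borel measurable from $X^{[1,s)} \times Y^{[1,s)}$ to $[0,1]$.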

Definition 6:

A generator set for a sieve $\mathcal{S}$ is tight if, for all $s \in \mathcal{S}$, $x \in X^{[1,s)}$, and $y \in Y^{[1,s)}$, the measure $\mathcal{G}_s[x, y]$ is tight.

Theorem 3.2:

Every measurable and tight generator set corresponds to a unique Baire optimizer that has the black-box property. If $X^{\mathcal{T}}$ is a Mařík space (see Remark 1), then this Baire optimizer extends to a unique Borel optimizer as well.

Theorem 9 can be used to convert the familiar one-step behavior of black-box optimization methods into their infinite-time behavior. In the case where and , several examples of such constructions were developed by Lockett and Miikkulainen (2013); Lockett (2013, 2014). Since it is tangential to the NFL theorems below, a complete proof of Theorem 9 will not be offered, except to say that it involves a transfinite induction on integrals of the form
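$$\int_{A} \mathcal{G}_s\big[x, f(x)\big](B) \; \mathcal{G}_{[1,s)}[f](dx) \qquad (4)$$
for $s \in \mathcal{S}$, $f \in Y^X$, and Borel events $A$ and $B$ in $X^{[1,s)}$ and $X^{[s,s^+)}$, respectively, where $\mathcal{G}_{[1,s)}[f]$ denotes the measure on $X^{[1,s)}$ obtained at earlier stages of the induction. Such integrals are uniquely defined because the generator set is measurable, and they preserve tightness. With careful handling of the limit points of $\mathcal{S}$, a tight and consistent family of finite-dimensional distributions can be defined, to which the Kolmogorov Extension Theorem may be applied.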

Given a Borel or Baire optimizer, it is also possible to construct generator sets using Radon-Nikodym derivatives. These generator sets take the form above only when the optimizer obeys a black-box property.

3.6  Optimization Problems

In this article, an optimization problem is a probability measure used to select fitness functions. The space of fitness functions is , and either Borel or Baire probability measures on this space may be used. A Baire fitness measure is defined as a probability measure on and a Borel fitness measure is a probability measure on . It is always possible to define Baire fitness measures, and if is a Mařík space (see Remark 1), then every Baire fitness measure extends uniquely to a Borel fitness measure. Fitness measures (Baire or Borel) will generally be denoted by below.

Baire fitness measures must be treated with caution, since only countable projections are Baire measurable, as discussed in Section 3.3. That is, the behavior of a Baire fitness measure can only ever be observed at countably many search points. Thus one cannot always determine the true minimum or maximum of a “sample” drawn from a Baire fitness measure, nor is it always possible to assess properties such as continuity, differentiability, or integrability. One cannot simply assume that there exists a Baire fitness measure placing probability one on continuous functions; the existence of such a measure must be proven.

It is much easier to work with Borel fitness measures, since one can directly measure individual functions (these are the singletons in ). However, Baire fitness measures are easier to construct using the Kolmogorov extension theorem. In general, the Borel fitness measures encountered below are obtained by unique extension from a Baire fitness measure in a Mařík space.

The remainder of this article considers the interaction of an optimizer and a fitness measure. The following text pairs Borel optimizers with Borel fitness measures and Baire optimizers with Baire fitness measures. In each case, different assumptions can be supported about the index set .

Definition 7 (Borel Setting):

A triple consisting of an optimizer, a fitness measure, and an index set is called a Borel setting if is a Borel optimizer, is a Borel fitness measure, and has the property that for any interval there is a countable increasing sequence contained in such that (that is, is first-countable).

Definition 8 (Baire Setting):

If is a Baire optimizer, is a Baire fitness measure, and is a countable index set, then is a Baire setting.

In the Borel setting, a Borel product measure is defined on . For any and , this measure must satisfy
[Equation 5]
Exactly one probability measure satisfies this equation. This is a consequence of Carathéodory’s extension theorem, obtained by forming an algebra from the complements, finite unions, and intersections of all possible and requiring to satisfy the probability axioms on this algebra.

In the Baire setting, may be defined similarly, resulting in a Baire product measure rather than a Borel one. This measure is now used to derive a stochastic process for the fitness values observed at unique search points.
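On finite spaces, a product measure of this kind reduces to a weighted sum. The sketch below assumes the usual form of such a product, in which the probability of a rectangle is the optimizer's probability of the trajectory event averaged, under the fitness measure, over the fitness functions in the fitness event. The two-point domain, the uniform fitness measure, and the optimizer `G` are hypothetical.

```python
from fractions import Fraction
from itertools import product

X = (0, 1)                                               # toy search domain
Y = (0, 1)                                               # toy fitness values
functions = list(product(Y, repeat=len(X)))              # f[x] is the fitness of point x
F = {f: Fraction(1, len(functions)) for f in functions}  # uniform fitness measure

def G(f):
    """Hypothetical optimizer: evaluate 0; if f(0) == 1 move to 1, otherwise pick the
    second point uniformly. Returns a distribution over length-2 trajectories."""
    if f[0] == 1:
        return {(0, 1): Fraction(1)}
    return {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 2)}

def joint(A, B):
    """P(trajectory in A, fitness function in B) = sum over f in B of G_f(A) * F({f})."""
    return sum((sum((p for h, p in G(f).items() if h in A), Fraction(0)) * F[f] for f in B),
               Fraction(0))

print(joint({(0, 0), (0, 1)}, functions))   # 1
print(joint({(0, 1)}, functions))           # 3/4
```

In the general Borel or Baire setting, the outer sum becomes an integral against the fitness measure.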

3.7  The Fitness Process

In order to discuss NFL, a stochastic process called the fitness process will be derived corresponding to the sequence of fitness values observed at unique search points. NFL implies that the fitness process is independent of the search points.

Suppose that is a sequence of search points. For each , define so that is the projection of onto the coordinate in . The function is Borel and Baire measurable for all . A stochastic process can now be defined for each by
[Equation 6]
where each element of the process is a random variable from to . Recall that a stochastic process is just a collection of indexed random variables (i.e., measurable functions).
When is generated by some optimizer , there are two sources of randomness: the optimizer and the fitness measure . Thus we consider Baire probability measures on the space . In this setting, for all , define so that as before. Once again, is both Borel and Baire measurable. To see why, note that is a composition of three maps , each of which is a finite projection and thus Baire-to-Baire or Borel-to-Borel measurable; their composition is therefore both Borel and Baire measurable. Then there is a collection of Baire measurable random variables given by
[Equation 7]
and is a stochastic process with values in . It will be called the history process since it contains the fitness history of an optimization run. Notice that the definition of does not depend on an optimizer or a fitness measure.

Given a Baire (or Borel) optimizer and a Baire (or Borel) fitness measure , each random variable in the history process induces a Borel measure on . This measure determines the fitness value of the search point stochastically. The history process must then be filtered so that only unique search points remain.

Filtering is accomplished using stopping times, which are defined with respect to filtrations. The natural filtration of a process is an increasing sequence of σ-algebras such that each is the smallest σ-algebra that makes a measurable function for each . Intuitively, represents the information that can be observed from the history process up to time .

A stopping time of the history process is a function such that for all , the set is a measurable event in the σ-algebra . That is, it may not be possible to identify in advance when a stopping time will stop, but it is always possible to determine whether it has already stopped. The element is a special element, not contained in and greater than every element of , included to account for events that never happen.

Next, the unique coordinates are identified by a sequence of stopping times. Let be a function mapping a history and an index to the number of unique elements of in . The set is used because the number of unique elements is limited by the size of . If , then there are unique elements in in the first indices of (including index ). Define a sequence of stopping times such that . If the set under the infimum is empty, then by convention. In either the Borel or the Baire setting, each is a stopping time, since can be determined by examining the first coordinates of (Theorem 3.3). Furthermore, the unique stopping times are ordered: if , then . Every stopping time induces a σ-algebra that includes events prior to the stopping time. For a stopping time , let be the set on which stops by time . The stopped σ-algebra is given by
[Equation 8]
for the Baire setting. In a Borel setting is replaced by .
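For a finite prefix of a history, the unique-element count and the associated stopping times can be computed directly. The sketch below is hypothetical and counts from n = 1 (the article's own indexing convention is not visible here); `math.inf` plays the role of the special element for events that never happen. The helper `has_stopped_by` illustrates the stopping-time property: whether the n-th unique point has appeared by index t is decided from the prefix up to t alone.

```python
import math
from typing import Hashable, Sequence, Union

def unique_count(h: Sequence[Hashable], t: int) -> int:
    """Number of distinct points among h[0], ..., h[t] (index t included)."""
    return len(set(h[: t + 1]))

def stopping_time(h: Sequence[Hashable], n: int) -> Union[int, float]:
    """First index at which the n-th distinct point appears, or math.inf if it never does."""
    for t in range(len(h)):
        if unique_count(h, t) >= n:
            return t
    return math.inf

def has_stopped_by(h: Sequence[Hashable], n: int, t: int) -> bool:
    """Decide whether tau_n <= t by looking only at the prefix h[0], ..., h[t]."""
    return unique_count(h, t) >= n

h = [3, 3, 5, 3, 7, 5, 9]
taus = [stopping_time(h, n) for n in range(1, 6)]
print(taus)                                    # [0, 2, 4, 6, inf]
print([h[t] for t in taus if t != math.inf])   # [3, 5, 7, 9]
```

The second line reads off the unique search points at the stopping times, which is the filtering that the fitness process relies on.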
Next, it must be determined when there is also a random variable
[Equation 9]
that maps the history process to the first coordinate such that . The variable represents the unique search point in the search history. Its existence indicates that the trajectory of unique fitness values can be measured, which is required to define NFL. If exists, it would be -measurable. However, there are three potential problems. First, the set must be proven to be -measurable (that is, must be Borel or Baire measurable). Second, must be proven to be measurable. Finally, the event that must be considered. Each of these issues is dealt with in turn.

Theorem 3.3:

For all , the stopping time is Borel measurable in the Borel setting and Baire measurable in the Baire setting, with as the target σ-algebra in either case.

Proof:

To prove that is Borel (or Baire) measurable, it suffices to show that for all the sets and are Borel (or Baire) measurable, since these sets generate the Borel σ-algebra for the order topology on . The set is another special case, but since this set is the complement of , its measurability follows automatically once the above sets are proven measurable.

For the first type of set, note that is either the entire space (if ) or the empty set (if ) and is therefore Baire and Borel measurable. Assume for (transfinite) induction that is Borel (or Baire) measurable for all . Note the identities
[Equation 10]
In the Borel setting the existence of a countable increasing sequence was used to reduce a potentially uncountable union to a countable one on the far right in Equation 10. In the Baire setting simply enumerates the interval . Because this union is a countable union of measurable sets, the set on the far left of Equation 10 is Borel (or Baire) measurable. To complete the induction, it remains to show that is measurable.

Define and so that where and counts unique elements for . Applying these maps in sequence yields since both indicate the set of sequences indexed by that contain unique elements. The measurability of will follow if for all the are Borel (or Baire) measurable and the set is Borel measurable. The functions are projections with the same cardinality as . In the Baire setting, is at most countable, and so the are Baire measurable. In the Borel setting, all projections are Borel measurable. Finally, is a continuous function from to with the order topology. It is therefore Borel measurable, and since is a closed (and thus Borel) set, so is .

Once is measurable, Proposition 3.4 will demonstrate the measurability of whenever all take on values inside with probability one.

Definition 9 (Non-Repeating Optimizer):

A Borel or Baire setting is non-repeating if with -probability one for all . The setting is eventually non-repeating if with -probability one for all .

Proposition 3.4:

If a Borel or Baire setting is eventually non-repeating, each element of the collection is a -valued random variable, and is a stochastic process with natural filtration .

Proof:
The claim is that is -measurable for all with . For every and every , define sets
[Equation 11]
From Equation 8, the claim holds if is in . Define the capped stopping time , which is a stopping time and is measurable because is measurable by Theorem 3.3. But whereas is measurable only with respect to the whole process , is measurable with respect to the information available up to that time, because it depends only on the first indices of .

Because lies in with probability one for all , it suffices to show that the stopped projection is Borel or Baire measurable (depending on the setting) as a map to for any . First, the map is a composition of measurable projections and is therefore measurable to for every (where the first argument has and measurability in the first argument is Borel with respect to the order topology). Second, the map is measurable due to the measurability of . The composition of these two maps is , which must be measurable as the composition of measurable maps.

The index was arbitrary, so the collection is a collection of random variables, as desired.

The stochastic process is called the fitness process of an optimizer on a fitness measure . Let . The fitness process induces a measure on derived from . This measure is Baire or Borel depending on the setting. First define so that , and then for each measurable set
[Equation 12]
This construction is a standard way of obtaining a measure on a product space from a stochastic process. The function is measurable whenever is; such constructions will be used again below. NFL is formulated in the next section as a statement about the fitness process and its induced measure.
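A finite sketch in the spirit of Equation 12: the fitness measure is pushed forward through the map that runs an optimizer and records the first n fitness values at unique points. To keep the arithmetic exact, the sketch assumes a deterministic optimizer on a three-point domain with binary fitness (a stochastic optimizer would add an inner sum over trajectories). All names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def fitness_process(trajectory, f, n):
    """Fitness values at the first n unique search points of a trajectory."""
    seen, values = set(), []
    for x in trajectory:
        if x not in seen:                 # repeated points are filtered out
            seen.add(x)
            values.append(f[x])
            if len(values) == n:
                break
    return tuple(values)

def induced_measure(optimizer, fitness_measure, n):
    """Push the fitness measure forward through f -> fitness process of optimizer(f)."""
    dist = defaultdict(Fraction)
    for f, p in fitness_measure.items():
        dist[fitness_process(optimizer(f), f, n)] += p
    return dict(dist)

def adaptive(f):
    """Hypothetical deterministic optimizer on {0, 1, 2} that revisits one point."""
    return (0, 0, 1, 2) if f[0] == 1 else (0, 2, 2, 1)

functions = list(product((0, 1), repeat=3))
uniform = {f: Fraction(1, 8) for f in functions}
mu = induced_measure(adaptive, uniform, n=3)
print(mu[(1, 0, 1)])       # 1/8
print(sum(mu.values()))    # 1
```

Under the uniform measure every outcome in {0, 1}^3 receives probability 1/8, even though the optimizer adapts to the value at point 0.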

4  No Free Lunch Theorems

NFL has been studied by many authors (Radcliffe and Surry, 1995; Wolpert and Macready, 1997; Droste et al., 1997; Culberson, 1998; Schumacher et al., 2001; Igel and Toussaint, 2004; Auger and Teytaud, 2007, 2010; Rowe et al., 2009). For a recent full review, see Igel (2014). In this section, a new version of probabilistic NFL is formulated that holds in continuous spaces and function spaces, and conditions are given that are both necessary and sufficient for this version of NFL. Specifically, a fitness measure with the NFL property must yield probabilities that are independent of search paths.

4.1  NFL Basics

Wolpert and Macready (1995) introduced the concept of NFL as a property of search algorithms, and Wolpert and Macready (1997) extended the concept to optimization algorithms. Radcliffe and Surry (1995) had previously introduced an NFL-type theorem as well.

Several versions of NFL have been studied. The prior sections have laid the groundwork for probabilistic NFL, in which NFL is treated as a statement about the probability governing the fitness values observed. The original statements of NFL were probabilistic, but applied only to finite search domains and fitness spaces. Auger and Teytaud (2007, 2010) extended probabilistic NFL to continuous spaces. The following definition is introduced in this article.

Definition 10 (Probabilistic NFL):

A Borel (or Baire) fitness measure is of class P-NFL on an index set if the map is constant for all optimizers such that the setting is an eventually non-repeating Borel (or Baire) setting.

That is, a Borel (or Baire) fitness measure is of class P-NFL if its fitness process is identically distributed for every eventually non-repeating Borel (or Baire) optimizer, with limitations on the index set determined by the choice of Borel or Baire measures. As will be demonstrated, a fitness measure is of class P-NFL if and only if the probability over fitness trajectories is constant across search paths.

4.2  Path Independence

NFL implies that information extracted from any series of fitness evaluations is insufficient to suggest search points where more desirable fitness values may be found. This section provides rigor to this intuition.

Definition 11 (Non-Repeating Sequence):

A sequence is non-repeating if the set has cardinality .

That is, a sequence is non-repeating if it does not repeat elements of until all elements of have been exhausted. Recall that limits to the size of .
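For a finite search domain and a finite prefix of a sequence, this definition can be checked directly. The helper below is hypothetical and assumes the domain has `domain_size` elements.

```python
from typing import Hashable, Sequence

def is_non_repeating(h: Sequence[Hashable], domain_size: int) -> bool:
    """True if h repeats no point before all domain_size points have appeared.

    Only the first min(len(h), domain_size) entries matter: once the domain is
    exhausted, repeats are unavoidable and therefore allowed.
    """
    m = min(len(h), domain_size)
    return len(set(h[:m])) == m

print(is_non_repeating([0, 1, 2, 0, 1], domain_size=3))   # True
print(is_non_repeating([0, 1, 0, 2], domain_size=3))      # False
```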

In Section 3.7, the stochastic process of projections was defined. As in Equation 12, for each and each Borel (or Baire) fitness measure, this process induces a Borel (or Baire) measure on . To obtain this measure, define so that . Then let
[Equation 13]
for each event of . A fitness measure is of class P-NFL if and only if this measure is independent of the choice of non-repeating ; this property is called path independence.

Definition 12 (Path Independence):

A Borel (or Baire) fitness measure is path independent over an index set if the map is constant for all non-repeating . If is not path independent, then it is path dependent.

The term “path independence” captures the intuition that there are no paths through the search space that provide more information about the fitness value of the next unique point. Only the number of fitness values requested matters.
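On a finite domain, path independence can be checked by brute force. In this hypothetical sketch a fitness function is a tuple `f` with `f[x]` the fitness of point x, and a fitness measure is a dictionary of exact probabilities. The `pinned` measure is path dependent: on paths that start at point 0 the first fitness value is always 1, while on other paths it is uniform.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product

def path_measure(fitness_measure, path):
    """Distribution of the fitness values (f[x] for x in path) under the fitness measure."""
    dist = defaultdict(Fraction)
    for f, p in fitness_measure.items():
        dist[tuple(f[x] for x in path)] += p
    return dict(dist)

def is_path_independent(fitness_measure, domain_size):
    """Compare path measures across all full non-repeating paths.

    On a finite domain these paths are the permutations of the domain; every
    shorter non-repeating path is a prefix of one, so its measure is a marginal.
    """
    measures = [path_measure(fitness_measure, path)
                for path in permutations(range(domain_size))]
    return all(m == measures[0] for m in measures)

functions = list(product((0, 1), repeat=3))
uniform = {f: Fraction(1, 8) for f in functions}
# Hypothetical path-dependent measure: f(0) = 1 always, the other two values uniform.
pinned = {f: Fraction(1, 4) for f in functions if f[0] == 1}
print(is_path_independent(uniform, 3))   # True
print(is_path_independent(pinned, 3))    # False
```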

Path independence is obviously related to class P-NFL, since it implies that whatever points an optimizer chooses to evaluate, the distribution over fitness values is the same. In fact, the two are equivalent. This claim is the central result of this article. To prove the claim, path independence is first shown to be necessary for class P-NFL, then sufficient. These results are combined in the NFL Identification Theorem below.

To demonstrate necessity, deterministic optimizers are used, especially deterministic optimizers that are fitness agnostic, that is, independent of the fitness values. These concepts are defined using the indicator function. For a set , the indicator has if , and otherwise. An indicator function is measurable whenever is measurable.

Definition 13 (Deterministic Optimizer):

A Borel (or Baire) optimizer is deterministic if there exists a function such that for each Borel (or Baire) event in . The function is called the deterministic core of .

Definition 14 (Fitness Agnostic Optimizer):

A Borel (or Baire) optimizer is fitness agnostic if the map is constant.

Definition 15 (Invariant Trajectory):

If a Borel (or Baire) optimizer is deterministic and fitness agnostic, then its deterministic core is constant and equal to some sequence , called the invariant trajectory of .

Proposition 4.1:

If the index set satisfies the conditions for a Borel (or Baire) setting, then for any there is a Borel (or Baire) optimizer that is deterministic and fitness agnostic and has as its invariant trajectory. Furthermore, for any Borel (or Baire) fitness measure , the triple is a Borel (or Baire) setting.

Proposition 4.2:

Suppose is a Borel (or Baire) setting, and that is deterministic and fitness agnostic with non-repeating invariant trajectory . Then the fitness process has .

Proof:
The definitions imply that is a non-repeating setting so that the fitness process and the history process coincide. For every Borel (or Baire) event , the following equalities hold with -probability one:
[Equation 14]
where two sets are equal “-” if they have equal measure under . The last equality makes it possible to apply Equation 5 to since
[Equation 15]
This equation implies
[Equation 16]
That is, .

Lemma 4.3:

If a Borel (or Baire) fitness measure is of class P-NFL on an index set , it is path independent over .

Proof:

Assume is of class P-NFL on and path dependent. Take to be distinct non-repeating search histories such that . Such histories must exist because is path dependent. Let and be deterministic, fitness agnostic optimizers (which exist by Proposition 4.1) with invariant trajectories and , respectively. Then Proposition 4.2 implies that , which contradicts the claim that is of class P-NFL.

Lemma 4.4:

If a Borel (or Baire) fitness measure is path independent over an index set , then it is of class P-NFL.

Proof:
Suppose is any Borel (or Baire) event of . From Equation 12,
[Equation 17]
The proof centers on obtaining a Borel (or Baire) event such that , for which may be averaged out of Equation 17 using Equation 5. To this end, note that
[Equation 18]
In this last equation, represents the sequence filtered for uniqueness. Now is non-repeating, so path independence implies that for any non-repeating sequence , with - and therefore -probability one,
[Equation 19]
where in the second line the has been pulled outside the set since it does not appear on the right. Renaming this final set as
[Equation 20]
Equation 5 can be applied to obtain
[Equation 21]
where the right-hand side is independent of . Therefore the map is constant, and is of class P-NFL on .
Theorem 4.5 (NFL Identification Theorem):

A Borel (or Baire) fitness measure is of class P-NFL on an index set if and only if it is path independent over .

Proof:

This result combines Lemma 4.3 and Lemma 4.4.

The NFL Identification Theorem characterizes probabilistic NFL as a statement about fitness measures. Such a characterization yields a concept equivalent to class P-NFL that does not depend on optimizers at all. If each path is identified with its histogram, as done by Igel and Toussaint (2004), then the necessity of path independence has already been proven, or at least strongly suggested, by Igel and Toussaint (2004) for finite spaces and by Rowe et al. (2009) for infinite spaces. To our knowledge, this article contains the first discussion of path independence as a probabilistic concept, and the first proof that path independence is also a sufficient condition for NFL.
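For a tiny finite case, the equivalence can be checked exhaustively. As an assumption of this sketch, attention is restricted to deterministic optimizers, which on a three-point domain with binary fitness can all be enumerated; the article's class P-NFL ranges over all eventually non-repeating optimizers, including stochastic ones, so this is only a sanity check of the finite analogue and not a substitute for the proof. All names and measures are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def deterministic_optimizers():
    """All deterministic non-repeating optimizers on the domain {0, 1, 2} with binary fitness.

    An optimizer picks a first point, picks the second point as a function of the
    first fitness value, and then evaluates the remaining point.
    """
    optimizers = []
    for first in range(3):
        rest = [x for x in range(3) if x != first]
        for if_zero, if_one in product(rest, repeat=2):
            def optimizer(f, first=first, if_zero=if_zero, if_one=if_one):
                second = if_one if f[first] == 1 else if_zero
                third = next(x for x in range(3) if x not in (first, second))
                return (first, second, third)
            optimizers.append(optimizer)
    return optimizers

def fitness_process_law(optimizer, fitness_measure):
    """Distribution of the observed fitness values (all points are unique here)."""
    dist = defaultdict(Fraction)
    for f, p in fitness_measure.items():
        dist[tuple(f[x] for x in optimizer(f))] += p
    return dict(dist)

def is_p_nfl(fitness_measure):
    laws = [fitness_process_law(opt, fitness_measure) for opt in deterministic_optimizers()]
    return all(law == laws[0] for law in laws)

functions = list(product((0, 1), repeat=3))
uniform = {f: Fraction(1, 8) for f in functions}
pinned = {f: Fraction(1, 4) for f in functions if f[0] == 1}   # path dependent
print(len(deterministic_optimizers()))          # 12
print(is_p_nfl(uniform), is_p_nfl(pinned))      # True False
```

A fitness agnostic optimizer here is one whose second choice ignores the first value (`if_zero == if_one`). The failure on the pinned measure already appears among those optimizers, mirroring the proof of Lemma 4.3.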

4.3  Construction of NFL Fitness Measures

According to the NFL Identification Theorem, every path independent fitness measure is of class P-NFL. But it is not obvious how to construct path independent fitness measures. This subsection demonstrates that path independence follows whenever all coordinate projections are identically distributed and mutually independent. This fact was initially proven by English (1996, 2000) for NFL in finite spaces; it is extended below to arbitrary Hausdorff spaces. Critically, it implies that NFL priors exist in general.

Consider the collection of coordinate projections such that is given by . Each is both Borel and Baire measurable. Given a Borel (or Baire) fitness measure , each induces a measure such that for any Borel subset of , . Furthermore, for any subset of , a projection is defined by . Then is always Borel measurable and is Baire measurable if has at most countable cardinality. These projections also induce Borel measures such that for a Borel subset of ,
[Equation 22]
When is finite, is called a finite-dimensional distribution of . Next, two useful properties of and are introduced.
Definition 16 (Identically Distributed Coordinates):

A Borel (or Baire) fitness measure has identically distributed coordinates if for all the map is constant.

Definition 17 (Mutually Independent Coordinates):
A Borel (or Baire) fitness measure has mutually independent coordinates if for any finite and any collection of Borel sets of , the measure factorizes as
[Equation 23]

Finite subsets are used in the definition of mutual independence to avoid defining uncountable products, and because they suffice to prove Theorem 4.6 for Baire fitness measures and for most Borel fitness measures.
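On a finite domain with finitely many fitness values, both properties can be checked exhaustively. In this hypothetical sketch, `fdd` computes a finite-dimensional distribution, and the independence check tests the factorisation only on point events. That suffices here: every product event is a finite union of points, and the product of the marginal sums expands into the sum over those points of the products of point masses.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, product
from math import prod

def fdd(F, coords):
    """Finite-dimensional distribution of (f[x] for x in coords)."""
    dist = defaultdict(Fraction)
    for f, p in F.items():
        dist[tuple(f[x] for x in coords)] += p
    return dist

def identically_distributed(F, domain_size):
    marginals = [dict(fdd(F, (x,))) for x in range(domain_size)]
    return all(m == marginals[0] for m in marginals)

def mutually_independent(F, domain_size, values):
    """Check the factorisation on point events for every finite set of coordinates."""
    for k in range(2, domain_size + 1):
        for coords in combinations(range(domain_size), k):
            joint = fdd(F, coords)
            singles = [fdd(F, (x,)) for x in coords]
            for ys in product(values, repeat=k):
                rhs = prod((s[(y,)] for s, y in zip(singles, ys)), start=Fraction(1))
                if joint[ys] != rhs:
                    return False
    return True

P = {0: Fraction(2, 3), 1: Fraction(1, 3)}                    # hypothetical marginal
iid = {f: prod((P[y] for y in f), start=Fraction(1)) for f in product(P, repeat=3)}
constant = {(0, 0, 0): Fraction(1, 2), (1, 1, 1): Fraction(1, 2)}
print(identically_distributed(iid, 3), mutually_independent(iid, 3, (0, 1)))            # True True
print(identically_distributed(constant, 3), mutually_independent(constant, 3, (0, 1)))  # True False
```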

Fitness measures of class P-NFL can be constructed by guaranteeing tightness along with mutually independent and identically distributed coordinates. One limitation remains at present: Borel fitness measures of class P-NFL can be constructed in this way only when is a Mařík space.
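The finite analogue of this construction is easy to carry out: fix one distribution on the fitness values, multiply it across coordinates, and check path independence by brute force. The sketch below assumes a finite domain and a finite fitness space, where tightness holds automatically; the names `product_measure`, `path_measure`, and `is_path_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import prod

def product_measure(P, domain_size):
    """Fitness measure with i.i.d. coordinates: F({f}) = product over x of P(f[x])."""
    return {f: prod((P[y] for y in f), start=Fraction(1))
            for f in product(P, repeat=domain_size)}

def path_measure(F, path):
    """Distribution of (f[x] for x in path) under F."""
    dist = {}
    for f, p in F.items():
        key = tuple(f[x] for x in path)
        dist[key] = dist.get(key, Fraction(0)) + p
    return dist

def is_path_independent(F, domain_size):
    """Full non-repeating paths on a finite domain are the permutations of the domain."""
    measures = [path_measure(F, path) for path in permutations(range(domain_size))]
    return all(m == measures[0] for m in measures)

P = {0: Fraction(2, 3), 1: Fraction(1, 3)}   # hypothetical single-point distribution
F = product_measure(P, 4)
print(sum(F.values()))               # 1
print(is_path_independent(F, 4))     # True
```

The check runs in one direction only. In this finite example, the measure placing probability 1/2 on each of the two constant functions is also path independent even though its coordinates are not independent; the construction supplies a sufficient condition, not a necessary one.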

Theorem 4.6 (Construction of Path Independence):

If a Baire fitness measure has identically distributed and mutually independent coordinates and all of the finite-dimensional distributions of are tight, then is path independent over and therefore of class P-NFL on . The same holds for a Borel fitness measure if is a Mařík space.

Proof:
Suppose satisfies the conditions. Let be any finite subset of . Mutually independent and identically distributed coordinates together imply that for any collection of Borel sets of ,
[Equation 24]
where is a fixed element of independent of . Importantly, by Carathéodory’s extension theorem, Equation 24 uniquely determines . Then is a consistent family of finite-dimensional distributions. Each is a coordinate restriction of a tight measure and is thus tight. By the Kolmogorov Extension Theorem, there is a unique Baire measure on that extends the , and it must hold that since is unique and its finite-dimensional distributions agree with those of .

Now let be any permutation of . For any subset of , define . Define an isomorphism from so that , and note that the restriction of to has range . For arbitrary subsets of , define . Revisiting Equation 24, notice that for the collection above,
[Equation 25]
Due to uniqueness, it holds in general that .

It remains to identify the distribution with for some . The set need not be finite; it will in fact have . All such are uniquely defined as Kolmogorov extensions of some subset of . The extension theorem is necessary because may be uncountable and uncountable projections are not Baire measurable.

Now fix any such that (which may be strictly smaller than ). Choose to be any non-repeating sequence such that for all . It should be clear that exhausts , i.e., . For any Baire event of , a Baire event of can be defined such that
[Equation 26]
which pairs each element with an element that agrees with it. Consequently, since
[Equation 27]
Finally, for any permutation of , let . Observe that
[Equation 28]
By Equation 27 and the fact that