Abstract
Infinite population models are important tools for studying population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this article, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study are problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, in order to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements which take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of the stacking of operators and of iterating the algorithm, which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the -ary recombination operator. We show that these operators can provide accurate predictions for real population dynamics as the population size goes to infinity, provided that the initial population is independent and identically distributed.
1 Introduction
Evolutionary algorithms (EAs) are general purpose optimization algorithms with great successes in real-world applications. They are inspired by the evolutionary process in nature. A certain number of candidate solutions to the problem at hand are modeled as individuals in a population. The algorithm evolves the population by mutation, crossover, and natural selection so that individuals with more preferable objective function values have higher survival probability. By the “survival of the fittest” principle, it is likely that after many generations the population will contain individuals with high fitness values such that they are satisfactory solutions to the problem at hand.
Though conceptually simple, the underlying evolutionary processes and the behaviors of EAs remain to be fully understood. The difficulties lie in the fact that EAs are customizable population-based iterative stochastic algorithms, and the objective function also has a great influence on their behaviors. A successful model of EAs should describe both the mechanisms of the algorithm and the influence of the objective function. One way to study EAs is to model them as dynamical systems. The idea is to pick a certain quantity of interest first, such as the distribution of the population or a certain statistic about it. Then, transitions in the state space of the picked quantity are modeled. A transition matrix (when the state space is finite) or a difference equation (when the state space is not finite) for the Markov chain is derived to describe how the picked quantity changes between consecutive generations.
In order to characterize the population dynamics accurately, the state space of the Markov chain tends to grow rapidly as the population size increases. As a result, even for time-homogeneous EAs with moderate population size, the Markov chain is often too large to be analyzed or simulated. To overcome this issue, some researchers turn to studying the limiting behaviors of EAs as the population size goes to infinity. The idea is to exploit some kind of symmetry in the state space (such as all individuals having the same marginal distribution), and prove that in the limit the Markov chain can be described by a more compact model. Models built in this way are called infinite population models (IPMs).
In this article, we follow this line of research and study IPMs of EAs on continuous space. More specifically, we aim at rigorously proving the convergence of IPMs. In this study, by “convergence” we usually mean that IPMs characterize limiting behaviors of real EAs. Loosely speaking, “an IPM converges” means that as the population size goes to infinity, the population dynamics of the real EA converge in a sense to the population dynamics predicted by this model. This usage differs from the conventional one, in which convergence means that the EA eventually locates and gets stuck in some local or global optimum. Convergence results are the foundations and justifications of IPMs.
The main results of the article can be summarized as follows. First, we show that a widely cited study on the convergence of IPMs is problematic, mainly because the core assumption of exchangeability of individuals in its proof cannot lead to the convergence conclusion. Second, we build an analytical framework from a different perspective and show that it defines convergence in general settings. Third, to show the effectiveness of our framework, we prove the convergence of the IPM of the simple EA with mutation and crossover operators when the initial population is independent and identically distributed (i.i.d.). Finally, we discuss the results and point out that the convergence of the IPM of the simple EA with proportionate selection is yet to be established.
In Section 2, we examine the model of Qi and Palmieri (1994a,b) and show that their study is unsound and incomplete. First, we provide a counterexample to show that a key assertion in the authors' proof about the law of large numbers (LLN) for exchangeable random vectors is generally not true. Therefore, the whole proof is unsound. Furthermore, we show that the modeling assumption of exchangeability of individuals cannot yield the transition equation in general. This means that under the authors' modeling assumption, the conclusion (1) cannot be reached.
In addition, we show that the authors' proofs in Qi and Palmieri (1994a,b) are incomplete. The authors did not address the convergence of the stacking of operators and of recursively iterating the algorithm. In essence, the authors attempted to prove the convergence of the IPM for only one iteration step. Even if the proof for (1) is correct, it only shows that as the marginal p.d.f.s of the th population converge point-wise to , provided that the marginal p.d.f. of the th generation is . However, this convergence does not automatically hold for all subsequent generations. As a result, (1) cannot be iterated to make predictions for subsequent () generations.
Besides Qi and Palmieri (1994a,b), we found no other studies that attempted to prove the convergence of IPMs for EAs on continuous space. Therefore, in Section 3 we propose a general analytical framework. The novelty of our framework is that from the very start of the analysis, we model generations of the population as random elements taking values in the metric space of infinite sequences, and we use convergence in distribution instead of point-wise convergence to define the convergence of IPMs.
To illustrate the effectiveness of our framework, we perform convergence analysis of the IPM of the simple EA in Sections 4 and 5. In Section 4, we adopt a “stronger” modeling assumption that individuals of the same generation in the IPM are independent and identically distributed (i.i.d.), and we give sufficient conditions under which the IPM is convergent. For a general EA, this assumption may seem restrictive at first sight, but it turns out to be a reasonable one. In Section 5, we analyze the mutation operator and the -ary recombination operator. We show that these commonly used operators have the property of producing i.i.d. populations, in the sense that if the initial population is i.i.d., then as the population size goes to infinity, all subsequent generations are also i.i.d. in the limit. This means that for these operators, the transition equation in the IPM can predict the real population dynamics as the population size goes to infinity. We also show that our results hold even if these operators are stacked together and iterated repeatedly by the algorithm. Finally, in Section 6 we conclude the article and propose future research.
To be complete, regarding Qi and Palmieri (1994a,b), there is a comment by Yong et al. (1998) with a reply published. However, the comment was mainly about the latter part of Qi and Palmieri (1994a), where the authors analyzed the properties of EAs based on the IPM. It did not discuss the proof for the model itself. For IPMs of EAs on discrete optimization problems, extensive research was done by Vose et al. in a series of studies (Nix and Vose, 1992; Vose, 1999b, 1999a, 2004). The problems under consideration were discrete optimization problems with finite solution space. The starting point of the authors' analysis was to model each generation of the population as an “incidence vector,” which describes for each point in the solution space the proportion of the population it occupies. Based on this representation the authors derived transition equations between incidence vectors of consecutive generations and analyzed their properties as the population size goes to infinity. However, for EAs on continuous solution space, the analyses of Vose et al. are not immediately applicable. This is because for continuous optimization problems the solution space is not denumerable. Therefore, the population cannot be described by a finite-dimensional incidence vector.
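The incidence-vector representation described above can be sketched in a few lines. This is only an illustration under assumed names: the function `incidence_vector`, the toy solution space of two-bit strings, and the sample population are all hypothetical and do not come from the cited studies.

```python
from collections import Counter
from fractions import Fraction


def incidence_vector(population, solution_space):
    """Proportion of the population occupying each point of a finite solution space."""
    counts = Counter(population)
    size = len(population)
    return [Fraction(counts[point], size) for point in solution_space]


# Hypothetical finite solution space: all bit strings of length 2.
space = ["00", "01", "10", "11"]
# Hypothetical population of six individuals.
population = ["00", "01", "01", "11", "11", "11"]

vector = incidence_vector(population, space)
print(vector)  # one exact proportion per point of the solution space
```

The vector has one entry per point of the solution space, which is why the representation needs that space to be finite; for a continuous solution space no such finite vector exists.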
2 Discussion of the Works of Qi et al.
In this section we analyze the results of Qi and Palmieri (1994a,b). We begin by introducing some preliminaries for the analysis. Then, in Section 2.2, following the notations and derivations in the authors' papers, we provide a counterexample to show that the convergence proof for the transition equation in Qi and Palmieri (1994a) is problematic. We further show that the modeling assumption of exchangeability cannot yield the transition equation in general. In Section 2.3, we show that the analyses in Qi and Palmieri (1994a,b) are incomplete. The authors did not prove the convergence of IPMs in the cases where operators are stacked together and the algorithm is iterated for multiple generations.
2.1 Preliminaries
After presenting the optimization problem and the algorithm, the authors proved the convergence of the IPM if the distributions of individuals in the population are exchangeable. It is the main result in Qi and Palmieri (1994a).
In Theorem 1, is the marginal p.d.f. of the th generation predicted by the IPM. It should be emphasized that in Qi and Palmieri (1994a,b), the authors proved this theorem under the assumption that simple EA has exchangeable individuals in the population. Though not explicitly stated in the theorem, the assumption of exchangeability is the core assumption in their proof and an integral part of their formulation of the theorem.
For the analyses in this article, we use the concept of exchangeability from probability theory. Its definition and some basic facts are listed below.
A finite set of random variables is said to be exchangeable if the joint distribution of is invariant with respect to permutations of the indices . A collection of random variables is said to be exchangeable if every finite subset of is exchangeable.
Definition 1 can also be extended to cover exchangeable random vectors or exchangeable random elements by replacing the term “random variables” in the definition with the respective term. One property of exchangeability is that if are exchangeable random elements, then the joint distributions of any distinct ones of them are always the same (Proposition 1.1.1 in Taylor et al., 1985). When this property indicates that have the same marginal distribution. Another property is that a collection of random elements is exchangeable if and only if they are conditionally independent and identically distributed (c.i.i.d.) given some σ-field (Theorem 1.2.2 in Taylor et al., 1985). Conversely, a collection of c.i.i.d. random elements is always exchangeable. Finally, it is obvious that i.i.d. random elements are exchangeable, but the converse is not necessarily true.
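The last remark, that exchangeable random variables need not be independent, can be checked exactly on a small textbook example. The sketch below is an assumption-laden illustration, not taken from the cited references: it draws all three balls from a hypothetical urn without replacement and enumerates every equally likely drawing order.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: two balls labeled 1 and one ball labeled 0.
urn = (1, 1, 0)
orders = list(permutations(range(len(urn))))  # 6 equally likely drawing orders

# Exact joint distribution of (X1, X2, X3), the labels in drawing order.
joint = {}
for order in orders:
    outcome = tuple(urn[i] for i in order)
    joint[outcome] = joint.get(outcome, Fraction(0)) + Fraction(1, len(orders))


def permuted(dist, perm):
    """Joint distribution after permuting the indices of the random variables."""
    out = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in perm)
        out[key] = out.get(key, Fraction(0)) + prob
    return out


# Exchangeability: the joint distribution is invariant under every permutation.
is_exchangeable = all(permuted(joint, perm) == joint for perm in orders)

p_first = sum(p for o, p in joint.items() if o[0] == 1)
p_second = sum(p for o, p in joint.items() if o[1] == 1)
p_both = sum(p for o, p in joint.items() if o[0] == 1 and o[1] == 1)
```

Here the two marginals agree, as the first property above requires, while `p_both` differs from `p_first * p_second`, so the draws are exchangeable but not independent.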
It can be seen that the simple EA generates c.i.i.d. individuals given the current population. Therefore, the individuals within the same generation are exchangeable, and they have the same marginal distribution. This is the core assumption of the proof in Qi and Palmieri (1994a,b) and Condition 3 in Theorem 1.
2.2 Convergence Proof of the Transition Equation
2.2.1 On Assertion (11)
However, we use the following counterexample (modified from Example 1.1.1 and related discussions on pages 11–12 in Taylor et al., 1985) to show that assertion (16) is not true. Therefore, (11) is not true.
2.2.2 Counterexample
It can easily be verified that and satisfy (13) and (15). Since is bounded, for any . By the strong law of large numbers (SLLN) for i.i.d. random variables, ; therefore, (14) is also satisfied, that is, is the limit of as . However, because and is independent of , it can be seen that is not independent of except for some degenerate cases (e.g., when equals a constant). In particular, in general is not independent of for any . Therefore, assertion (16) is not true. Equivalently, assertion (11) is not true. This renders the authors' proof for (12) invalid.
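The phenomenon behind the counterexample can be observed numerically. The sketch below is not the construction used above, whose exact form is not reproduced here; it is a simplified stand-in in the same spirit, with every name and distribution an assumption. Each run draws a shared bounded random variable `z` and builds an exchangeable sequence `z + e_i` with i.i.d. zero-mean noise, so the sample mean settles near `z`, a random limit that is correlated with every individual.

```python
import random


def simulate_run(n, rng):
    """One exchangeable sequence: a shared component z plus i.i.d. bounded noise."""
    z = rng.uniform(-1.0, 1.0)                      # shared random component
    xs = [z + rng.uniform(-1.0, 1.0) for _ in range(n)]
    return z, xs[0], sum(xs) / n                    # (limit, first individual, sample mean)


rng = random.Random(1)
runs = [simulate_run(1000, rng) for _ in range(400)]

# The sample mean tracks the random limit z within each run ...
max_gap = max(abs(mean - z) for z, _, mean in runs)

# ... so across runs it is far from a single constant ...
means = [mean for _, _, mean in runs]
spread_of_means = max(means) - min(means)

# ... and it is correlated with the first individual of the sequence.
firsts = [first for _, first, _ in runs]
first_bar = sum(firsts) / len(firsts)
mean_bar = sum(means) / len(means)
cov_first_mean = sum(
    (f - first_bar) * (m - mean_bar) for f, m in zip(firsts, means)
) / len(runs)

print(max_gap, spread_of_means, cov_first_mean)
```

A clearly positive `cov_first_mean` is incompatible with the limit being independent of the individuals, which is the kind of independence the assertion under discussion would require.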
2.2.3 Further Analysis
In the following, we carry out further analysis to show that (12) cannot be true even considering other methods of proof and adding new sufficient conditions. Therefore, in general, Theorem 1 cannot be true.
To begin with, consider the random variable . We prove the following lemma.
as .
Now it is clear that if the only assumption is exchangeability, () is not true even considering other methods of proof. Of course, if (11) is true, and are independent, then () is true. However, as already shown by the counterexample, (11) is not true in general. Therefore, (), and equivalently Theorem 1, are in general not true.
A natural question then arises: Is it possible to introduce some reasonable sufficient conditions such that () can be proved? One such condition that is frequently used is that . However, the following analysis shows that given the modeling assumption of exchangeability, this condition is not true in general. Therefore, it cannot be introduced.
2.3 The Issue of the Stacking of Operators and Iterating the Algorithm
In the following, we discuss IPMs from another perspective and show that the proofs in Qi and Palmieri (1994a,b) are incomplete, as they only consider the convergence of one iteration. Because the discussion in this section relates very closely to our proposed framework, from here on we consistently use our own notation, which differs from that of the previous two sections, where we strictly followed the notation of Qi and Palmieri (1994a,b).
Consider an EA with only one operator. Let the operator be denoted by . When the population size is , denote this EA by and the operator it actually uses by . Let denote the th generation produced by . Then the transition rules between consecutive generations produced by can be described by . In Table 1, we write down the population dynamics of . Each row in Table 1 shows the population dynamics produced by . In the table is expanded as . Let denote the IPM, and denote the populations predicted by . Then we can summarize the results in Qi and Palmieri (1994a) in the following way.
However, apart from the fact that this proof is problematic, the authors' proof covers only one iteration step, corresponding to the column-wise convergence of the column in Table 1. The problem is that even if (21) is true, it does not automatically lead to the conclusion that for the arbitrary th step, as . In other words, one has to study whether the transition equation for one step can be iterated recursively to predict populations after multiple steps. In Table 1, this problem corresponds to whether other columns have similar column-wise convergence property when the convergence of the column is proved.
The issue of the stacking of operators is similar. Given some operator satisfying (20) and some operator satisfying as , it is not necessarily true that as . However, the authors in Qi and Palmieri (1994b) totally ignored this issue and combined the transition equations for selection, mutation and crossover together (in Section III of Qi and Palmieri 1994b) without any justification.
In addition, there are several statements in the authors' proofs in Qi and Palmieri (1994b) that are questionable. First, in the first paragraph of Appendix A (the proof for Theorem 1 in that paper), the authors considered a pair of parents and for the uniform crossover operator. and are “drawn from the population independently with the same density of .” Then, the authors claimed that “the joint density of and is therefore .” This does not follow. Two individuals drawn independently from the same population are conditionally independent given that population; they are not necessarily independent. In fact, without the i.i.d. assumption, it is very likely that individuals in the same population are dependent. Therefore, the joint density function of and is not necessarily , and the authors' proof for Theorem 1 in Qi and Palmieri (1994b) is dubious at best. On the other hand, if the authors' modeling assumption for the uniform crossover operator is that individuals are i.i.d., this assumption is incompatible with the modeling assumption of exchangeability in Qi and Palmieri (1994a) for selection and mutation. Therefore, combining the transition equations for all three operators is problematic, because the i.i.d. assumption cannot hold beyond one iteration step.
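The gap between conditional independence and independence can be seen in a small Monte Carlo sketch. Everything in it is a hypothetical setup chosen for illustration, not the setting of Qi and Palmieri (1994b): the population is made dependent through an assumed shared Gaussian component, and two parents are then drawn independently given that population.

```python
import random


def draw_parents(rng, pop_size=10):
    """Draw two parents independently GIVEN a random, dependent population."""
    shared = rng.gauss(0.0, 1.0)  # assumed shared component: makes individuals dependent
    population = [shared + rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    return rng.choice(population), rng.choice(population)


rng = random.Random(2)
trials = 40000
pairs = [draw_parents(rng) for _ in range(trials)]

# Estimate marginal and joint probabilities of the event "parent is positive".
p_first = sum(1 for a, _ in pairs if a > 0) / trials
p_second = sum(1 for _, b in pairs if b > 0) / trials
p_both = sum(1 for a, b in pairs if a > 0 and b > 0) / trials

print(p_first * p_second, p_both)
```

If the joint law of the two parents factorized into the product of its marginals, `p_both` would be close to `p_first * p_second`; in this setup it is visibly larger.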
Another issue in Qi and Palmieri (1994b) is that the uniform crossover operator produces two dependent offspring at the same time. As a result, after uniform crossover, the intermediate population is not even exchangeable because it has pair-wise dependency between individuals. Then the same incompatible assumption problem arises: that is, the transition equation for the uniform crossover operator cannot be combined with the transition equations for selection and mutation. Besides, the transition equation for the uniform crossover operator cannot be iterated beyond one step and still hold i.i.d. or exchangeability as its modeling assumption.
In summary, several issues arise from previous studies on IPMs for EAs on continuous optimization problems. Therefore, new frameworks and proof methods are needed for analyzing the convergence of IPMs and studying the issue of the stacking of operators and iterating the algorithm.
3 Proposed Framework
In this section, we present our proposed analytical framework. In constructing the framework we strive to achieve the following three goals.
1. The framework should be general enough to cover real-world operators and to characterize the evolutionary process of real EAs.
2. The framework should be able to define the convergence of IPMs and serve as a justification for using them. The definition should match one's intuition and at the same time be mathematically rigorous.
3. The framework should provide an infrastructure to study the issue of the stacking of operators and iterating the algorithm.
The contents of this section roughly reflect the pursuit of the first two goals. The third goal is reflected in the sufficient conditions for convergence and i.i.d. IPM construction in Section 4 and the analyses of the simple EA in Section 5. More specifically, in Section 3.1, we introduce notations and preliminaries for the remainder of this article. In Section 3.2, we present our framework. In the framework, each generation is modeled by a random sequence. This approach unifies the spaces of random elements modeling populations of different sizes. In Section 3.3, we define the convergence of the IPM as convergence in distribution on the space of random sequences. We summarize and discuss our framework in Section 3.4.
3.1 Notations and Preliminaries
From now on we use to denote the set of nonnegative integers and the set of positive integers. For any two real numbers and , let be the smaller one of them and be the larger one of them. Let be random elements of some measurable space . We use to represent the law of . If and follow the same law, that is, for every , we write . Note that and have different meanings. In particular, indicates dependency between and .
We use the notation to represent the array . When , represents the infinite sequence . and represent the collections and , respectively. When the range is clear, we use and or and for short.
Let denote the solution space . This simplifies the notation system when we discuss the spaces and . In the following, we define metrics and σ-fields on , and and state properties of the corresponding measurable spaces.
is equipped with the ordinary metric . Let denote the Borel σ-field on generated by the open sets under . Together defines a measurable space.
Similarly, is equipped with the metric , and the corresponding Borel σ-field under is denoted by . Together is the measurable space for tuples.
Next, consider the space of infinite sequences . It is equipped with the metric . The Borel σ-field on under is denoted by . Then is the measurable space for infinite sequences.
Since is separable and complete, it can be proved that and are also separable and complete (Appendix M6 in Billingsley, 1999). In addition, because of separability, the Borel σ-fields and are equal to and , respectively. In other words, the Borel σ-fields and generated by the collection of open sets under the corresponding metrics coincide with the product σ-fields generated by all measurable rectangles () and all measurable cylinder sets (), respectively (Lemma 1.2 in Kallenberg, 2002). Therefore, from now on we write and for the corresponding Borel σ-fields. Finally, let , and denote the set of all random elements of , and , respectively.
Let be the natural projection: . Since given , defines a random element of projected from , we also use to denote the mapping: where . By definition, is the operator which truncates random sequences to random vectors. Given , we use to denote the projection of ; that is, .
3.2 Analytical Framework for EA and IPMs
In this section, we present an analytical framework for the EA and IPMs. First, the modeling assumptions are stated. We deal only with operators that generate c.i.i.d. individuals. Then, we present an abstraction of the EA and IPMs. This abstraction serves as the basis for building our framework. Finally, the framework is presented. It unifies the range spaces of the random elements and defines the convergence of IPMs.
3.2.1 Modeling Assumptions
We assume that the EA on the problem (23) is time homogeneous and Markovian, such that the next generation depends only on the current one, and the transition rule from the th generation to the th generation is invariant with respect to . We further assume that individuals in the next generation are c.i.i.d. given the current generation. As this assumption is the only extra assumption introduced in the framework, it may need some further explanation.
The main reason for introducing this assumption is to simplify the analysis. Conditional independence implies exchangeability; therefore, individuals in the same generation are always exchangeable. As a result, it is possible to exploit the symmetry in the population and study the transition equations of marginal distributions. Besides, it is because of conditional independence that we can easily expand the random elements modeling finite-sized populations to random sequences, and therefore define convergence in distribution for random elements of the corresponding metric space. In addition, many real-world operators in EAs satisfy this assumption, such as the proportionate selection operator and the crossover operator analyzed in Qi and Palmieri (1994a,b). Finally, we emphasize that exchangeability is a property that facilitates the analysis of convergence across multiple iterations and different operators. Because of the generational nature of EAs, we want a property that leads to convergence for one iteration and also holds as a premise for the analysis of the next iteration. Exchangeability serves as this property in our analysis. By exchangeability, the convergence results can be extended to further iterations and stacked with results from other operators.
However, we admit that there are some exceptions to our assumption. The most notable one may be the mutation operator, though it does not pose significant difficulties. The mutation operator perturbs each individual in the current population independently, according to a common conditional p.d.f. If the current population is not exchangeable, then after mutation the resultant population is not exchangeable, either. Therefore, it seems that mutation does not produce c.i.i.d. individuals. However, considering the fact that mutation is often used along with other operators, as long as these other operators generate c.i.i.d. populations, the individuals after mutation will be c.i.i.d., too. Therefore, a combined operator of mutation and any other operator satisfying the c.i.i.d. assumption can satisfy our assumption. An example can be seen in Qi and Palmieri (1994a), where mutation is analyzed together with proportionate selection. On the other hand, an algorithm which only uses mutation is very simple. It can be readily modeled and analyzed without much difficulty.
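As a concrete picture of a mutation operator of the kind described above, the following minimal sketch perturbs each individual independently with the same noise law. The Gaussian noise, the one-dimensional solution space, and the names `mutate` and `sigma` are assumptions made for illustration; the analysis only requires a common conditional p.d.f.

```python
import random


def mutate(population, rng, sigma=0.1):
    """Perturb every individual independently with the same conditional law.

    Assumption: individuals are real numbers and the perturbation is
    Gaussian with standard deviation `sigma`.
    """
    return [x + rng.gauss(0.0, sigma) for x in population]


rng = random.Random(0)
current = [0.0, 0.5, 1.0]
mutated = mutate(current, rng)
print(mutated)
```

Because the i-th offspring depends only on the i-th parent, any dependence structure already present among the parents is carried over to the offspring, which is why mutation on its own does not turn a non-exchangeable population into an exchangeable one.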
Perhaps more significant exceptions are operators such as selection without replacement, or the crossover operator which produces two dependent offspring at the same time. In fact, for these operators not satisfying the c.i.i.d. assumption, it is still possible to expand the random elements modeling finite-sized population to random sequences. For example, the random elements can be padded with some fixed constants or random elements of known distributions to form the random sequences. In this way, our definition of the convergence of IPMs can still be applied. However, whether in this scenario convergence in distribution for these random sequences can still yield meaningful results similar to the transition equation is another research problem. It may need further investigation. Nonetheless, our assumption is equivalent to the exchangeability assumption generally used in previous studies.
3.2.2 The Abstraction of EA and IPMs
Given the modeling assumptions, we develop an abstraction to describe the population dynamics of the EA and IPMs.
Let the EA with population size be denoted by , and the th generation it produces be modeled as a random element , where is a random element representing the th individual in . Without loss of generality, assume that the EA has two operators, and . In each iteration, the EA first employs on the current population to generate an intermediate population, on which it then employs to generate the next population. Notice that here and are just terms representing the operators in the real EA. They facilitate describing the evolutionary process. For , and are actually instantiated as functions from to , denoted by and , respectively. For example, if represents proportionate selection, the function is the actual operator in generating c.i.i.d. individuals according to the conditional probability (3). Of course, for the above abstraction to be valid, the operators used in should actually produce random elements in ; that is, the newly generated population should be measurable on . As most operators in real EAs satisfy this condition and this is the assumption implicitly taken in previous studies, we assume that this condition is automatically satisfied.
3.2.3 The Proposed Framework
As stated before, for each generation , the elements of the sequence and the limit are all random elements of different metric spaces. Therefore, the core of developing our model is to expand to random sequences, while ensuring that this expansion will not affect modeling the evolutionary process of the real EA. The result of this step is the sequence of random sequences for each , which completely describes the population dynamics of . For the population dynamics of , we just let .
The expansion of and the relationships between , , and are the core of our framework. In the following, we present them rigorously.
3.2.4 The Expansion of
We start by decomposing each of and into two operators. One operator is from to . It corresponds to how to convert random sequences to random vectors. A natural choice is the projection operator .
To model the evolutionary process, we also have to define how to expand random vectors to random sequences. In other words, we have to define the expansions of and , which are functions from to .
For an operator satisfying the condition that for any , the elements of are c.i.i.d. given , the expansion of is the operator , satisfying that for any ,
.
The elements of are c.i.i.d. given .
In Definition 2, the operator is the expansion of . Condition 1 ensures that can be safely replaced by . Condition 2 ensures that the paddings for the sequence are generated according to the same conditional probability distribution as that used by to generate new individuals. In other words, if the operator describes how generates each new individual from the current population, is equivalent to invoking independently on the current population for times, and is equivalent to invoking independently infinitely many times. Finally, because satisfies the condition in the premise, the expansion always exists.
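The idea of invoking one per-individual rule a finite number of times versus indefinitely can be sketched with a lazy generator. The rule `sample_one` below (pick a parent uniformly, add Gaussian noise) and all function names are hypothetical; the sketch only shows the mechanics of expansion and truncation, not any specific operator from the analysis.

```python
import random
from itertools import islice


def sample_one(population, rng):
    """Hypothetical rule generating ONE new individual from the current population."""
    return rng.choice(population) + rng.gauss(0.0, 0.1)


def finite_operator(population, n, rng):
    """Invoke the rule independently n times: a finite new population."""
    return [sample_one(population, rng) for _ in range(n)]


def expanded_operator(population, rng):
    """Invoke the same rule indefinitely: a lazily generated infinite sequence."""
    while True:
        yield sample_one(population, rng)


def project(sequence, n):
    """Truncate a sequence to its first n coordinates."""
    return list(islice(sequence, n))


current = [0.0, 1.0, 2.0]
n = len(current)

# With the same source of randomness, truncating the expansion to n
# coordinates reproduces the finite population exactly.
finite = finite_operator(current, n, random.Random(42))
truncated = project(expanded_operator(current, random.Random(42)), n)
```

Every coordinate of the expanded sequence is generated from the same current population by the same rule, so the extra coordinates act as padding that does not alter how the first `n` individuals are produced.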
Essentially, (26) and (27) describe how the algorithm progresses in the order . It fully characterizes the population dynamics , and it is clear that the extra step of generating does not introduce modeling errors.
In summary, the relationships between , and are better illustrated in Figure 2. This is the core of our framework for modeling the EA and IPMs. For clarity, we also show the intermediate populations generated by (denoted by ), their expansions (denoted by ), and their counterparts generated by (denoted by ), respectively. How they fit in the evolutionary process can be clearly seen in the figure.
In Figure 2, a solid arrow with an operator on it means that the item at the arrow head equals the result of applying the operator to the item at the arrow tail. For example, from the figure it can be read that . A dashed arrow with a question mark on it marks a place where convergence in distribution should be checked. For example, when , it should be checked whether converges to as .
Finally, one distinction needs special notice. For and (), consider the operators to generate and . It is clear that and are two different operators because their domains and ranges are all different. The distinction still exists when we consider , though it is more subtle and easily overlooked. In Figure 2, if we consider the operator , it is clear that uses the same mechanism to generate new individuals as the one used in , and describes the same population dynamics as that generated by . However, if we choose , and are both functions from to . Therefore, checking domains and ranges is not enough to distinguish and . It is important to realize that the distinction between and lies in the contents of the functions. and use and individuals in the current population to generate the new population, respectively, although the new population contains an infinite number of individuals. In short, and are the EA instantiated with different population sizes. Mathematically, the corresponding population dynamics are modeled by stochastic processes involving different operators, even though their domains and ranges may be the same. The same conclusion also holds for the operator .
3.3 Convergence of IPMs
Given the framework modeling the EA and IPMs, first, we define convergence in distribution for random elements of . This is standard material. Then, the convergence of IPMs is defined by requiring that the sequence converges to for every .
3.3.1 Convergence in Distribution
As are random elements of , in the following we define convergence in distribution for sequences of -valued random elements. Convergence in distribution is equivalent to weak convergence of induced probability measures of the random elements. We use the former because, when individuals and populations are modeled as random elements, it is more intuitive and straightforward. The following material is standard. It contains the definition of convergence in distribution for random elements, as well as some useful definitions and theorems which are used in our analysis of the simple EA. Most of the material is collected from the theorems and examples in Sections 1–3 of Billingsley (1999). The definition of the Prokhorov metric is taken from Section 11.3 in Dudley (2002).
Let be random elements defined on a hidden probability space taking values in some separable metric space . is coupled with the Borel -field . Let be a separable measurable space other than .
If the sequence satisfies the condition that for every bounded, continuous function , we say converges in distribution to , and write .
For , let . Then it is well known that convergence in distribution on separable metric spaces can be metrized by the Prokhorov metric.
Call a set in an -continuity set if , where is the boundary set of .
The following statements are equivalent.
.
for all closed sets .
for all open sets .
for all -continuity sets .
Suppose is a measurable function. Denote by the set of discontinuities of . If and , then .
Let be random elements of , be random elements of , then and are random elements of . Note that is separable.
If is independent of and is independent of for all , then if and only if and .
Let be random elements of .
if and only if for any .
Theorem 5 basically asserts that convergence in distribution for countably infinite dimensional random elements can be studied through their finite-dimensional projections. It is adapted from Example 1.2 and Example 2.4 in Billingsley (1999). In Billingsley (1999), the metric space under consideration is . However, as both and are separable, it is not difficult to adapt the proofs for to a proof for Theorem 5. Note that are random elements defined on taking values in , and for every . The same is true for .
3.3.2 Convergence of IPM
Now that convergence in distribution is properly defined, we can use the theory to define convergence of IPMs. The idea is that an IPM is convergent (and thus justifiable) if and only if it can predict the limit distribution of the population dynamics of for every generation as the population size goes to infinity. In this sense, a convergent IPM captures the limiting behavior of real EAs.
Definition 5 is essentially the core of our proposed framework. It defines the convergence of IPM and is rigorous and clear.
3.4 Summary
In this section, we built a framework to analyze the convergence of IPMs. The most significant feature of the framework is that we model the populations as random sequences, thereby unifying the ranges of the random elements in a common metric space. Then, we gave a rigorous definition for the convergence of IPMs based on the theory of convergence in distribution.
Our framework is general. It only requires that operators produce c.i.i.d. individuals. In fact, any EA and IPM satisfying this assumption can be put into the framework. However, to obtain meaningful results, the convergence of IPMs has to be proved. This may require extra analyses on IPM and the inner mechanisms of the operators. These analyses are presented in Sections 4 and 5.
Finally, there is one question worth discussing. In our framework, the expansion of operator is carried out by padding the finite population with c.i.i.d. individuals following the same marginal distribution. Then a question naturally arises: why not pad the finite population with some other random elements, or just with the constant 0? This idea deserves consideration. After all, if the expansion is conducted by padding 0s, the requirement of c.i.i.d. can be discarded, and the framework and the convergence of IPMs stay the same. However, we did not choose this approach. The reason is that padding the population with c.i.i.d. individuals facilitates the analysis of the IPM. For example, in our analysis in Sections 4 and 5, the sufficient conditions for the convergence of IPMs require us to consider , where is the operator under analysis. uses the first elements of to generate new individuals. Now if and is expanded from by padding 0s, does not make any sense because the individuals used by include padded 0s. This restricts our options in proving the convergence of IPMs.
4 Sufficient Conditions for Convergence of IPMs and I.I.D. IPM Construction
In this section, we give practical sufficient conditions for the convergence of IPMs and discuss how IPMs should be constructed. Section 4.1 presents the sufficient conditions. To appreciate why they are needed, consider the framework in Figure 2. To prove the convergence of IPM, by Definition 5, we should check whether as for every . However, this direct approach is usually not viable. Manually checking the convergence for all values of is wearisome and sometimes difficult. This is because as increases, the distributions of and change. Therefore, the method needed to prove as may be different from the method needed to prove as . Of course, after proving the cases for several values of , it may be possible to discover some patterns in the proofs, which can be extended to cover other values of , thus proving the convergence of the IPM. But this process is still tedious and uncertain.
However, this approach still seems difficult because we have to prove this pass-on relation (30) holds for every . In essence, this corresponds to whether the operators in IPM can be stacked together and iterated for any number of steps. This is the issue of the stacking of operators and iterating the algorithm. Therefore, in Section 4.1, we give sufficient conditions for this to hold. These conditions are important. If they hold, proving the convergence of the overall IPM can be broken down to proving the convergence of one iteration step of each operator in IPM. This greatly reduces the difficulty in deriving the proof.
To model real EAs, IPM has to be constructed reasonably. As shown in Section 2, exchangeability cannot yield the transition equation for the simple EA. This creates the research problem of finding a suitable modeling assumption to derive IPM. Therefore, in Section 4.2, we discuss the issue and propose to use i.i.d. as the modeling assumption in IPM.
4.1 Sufficient Conditions for Convergence of IPMs
To derive sufficient conditions for the convergence of the overall IPM, the core step is to derive conditions under which the operators in the IPM can be stacked and iterated.
As before, let and denote the EA with population size and the IPM under analysis, respectively. Let be an operator in the EA, and and be its corresponding expanded operators in and , respectively. Note that and generate random elements of . To give an example, and may correspond to and in Figure 2, respectively.
We define a property under which can be stacked with some other operator satisfying the same property without affecting the convergence of the overall IPM. In other words, for an EA using and as its operators, we can prove the convergence of IPM by studying and separately. We call this property “the stacking property.” It is worth noting that if , then this property guarantees that can be iterated for any number of times. Therefore, it also resolves the issue of iterating the algorithm.
Let be random elements in for . We have the following results.
Given , if for any converging sequence , as always holds, then we say that has the stacking property on .
If and have the stacking property on , then has the stacking property on .
For any converging sequence , because has the stacking property on , we have . Then, is also a converging sequence. Since has the stacking property on , then by definition we immediately have .
By Theorem 6, any composition of and has the stacking property on . In particular, has the stacking property on . The stacking property essentially guarantees that the convergence on can be passed on to subsequent generations.
For an EA consisting of a single operator , let be modeled by in the IPM, and have the stacking property on some space . If the initial populations of both EA and follow the same distribution for some , then converges.
By Theorems 6 and 7, we can prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property. Compared with (30), it is clear that the stacking property is a sufficient condition. This is because the stacking property requires that converges to a point in for any converging sequence satisfying , while (29) requires the convergence to hold only for the specific converging sequence . Since is generated by the algorithm, it may have special characteristics regarding rate of convergence, distributions, etc. On the other hand, checking the stacking property may be easier than proving (29). This is because the stacking property is independent of the generation number .
Another point worth discussing is the introduction of in Definition 6. Of course, if we omit (or equivalently let ), the stacking property will become “stronger” because if it holds, the convergence of the IPM is proved for the EA starting from any initial population. However, in that case the condition is so restrictive that the stacking property cannot be proved for many operators.
In Definition 6, it is required that as . The sequence under investigation is , which is a sequence of changing operators on a sequence of changing inputs . As both the operators and the inputs change, the convergence of may still be difficult to prove. Therefore, in the following, we further derive two sufficient conditions for the stacking property.
First, let , where . Then, we have the following sufficient conditions for the stacking property.
For a space and all converging sequences , if the following two conditions
, such that for all , uniformly as , i.e. as ,
as ,
are both met, then has the stacking property on .
For a space and all converging sequences , if the following two conditions
, such that for all , uniformly as , i.e. as ,
as ,
are both met, then has the stacking property on .
Since Theorems 8 and 9 are symmetric in and , proving one of them leads to the other. In the following, we prove Theorem 8. Recall that is the Prokhorov metric (Definition 4) and gets the maximal in the expression.
To understand these two theorems, consider the relationships between and illustrated by Figure 3. In the figure, the solid arrow represents the premise in Definition 6; that is, as . The double line arrow represents the direction to be proved for the stacking property on , i.e. as . The dashed arrows are the directions to be checked for Theorem 8 to hold. The wavy arrows are the directions to be checked for Theorem 9 to hold.
Now it is clear that Theorems 8 and 9 bring benefits. For example, for Theorem 8, instead of proving the convergence for a sequence generated by changing operators and inputs (), this sufficient condition considers the convergence of sequences generated by the same operator on changing inputs () and of the sequence generated by changing operators on the same input ().
4.2 The I.I.D. Assumption
In this section, we address the issue of how to construct IPM. This issue also corresponds to how to choose the space for the stacking property.
Before introducing the i.i.d. assumption, let us give an example. Consider the space . If the initial population follows some distribution from , then the population consists entirely of identical individuals. If an EA with proportionate selection and crossover operates on this initial population, then all subsequent populations stay the same as the initial population. An IPM of this EA can be easily constructed, and it can be easily proved that the stacking property holds as long as the EA chooses its initial population from . However, this case is not very interesting, because is too small to model real EAs.
On the other hand, if , may be too big to derive meaningful results. This can be seen from our analysis in Section 2 which shows that under exchangeability it is not possible to derive transition equations of marginal distributions for the simple EA.
Therefore, choosing should strike a balance between the capacity and the complexity of the IPM. In the following analysis, we choose to be . IPMs of EAs are constructed using the i.i.d. assumption, and we prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property on .
We choose for the following reasons. First, in the real world, many EAs generate i.i.d. initial populations. Therefore this assumption is realistic. Secondly, i.i.d. random elements have the same marginal distributions. Therefore, the IPM can be described by transition equations of marginal distributions. Finally, there is abundant literature on the convergence laws and limit theorems of i.i.d. sequences. Therefore, the difficulty in constructing the IPM can be greatly reduced compared with using other modeling assumptions.
In the following, we show how to construct IPM under the i.i.d. assumption. This process also relates to condition 2 in Theorem 8. It essentially describes how the IPM generates new populations.
To better understand the construction, it is important to realize that for both the input and the output are i.i.d. In other words, generates i.i.d. population dynamics to simulate the real population dynamics produced by , only that the transition equation in is derived by mimicking how generates each new individual on i.i.d. inputs and taking the population size to infinity. In fact, if the stacking property on is proved and the initial population is i.i.d., will always take i.i.d. inputs and produce i.i.d. outputs. The behaviors of on are well-defined. On the other hand, is not defined in the construction. This leaves us freedom. We can define freely to facilitate proving the stacking property of . In particular, for in Figure 3 can be defined freely to facilitate the analysis.
In fact, under the i.i.d. assumption, deriving the transition equation for most operators is the easy part. The more difficult part is to prove the stacking property of on . To give an example, consider the transition equation (5) constructed in Qi and Palmieri (1994a), which models the joint effects of proportionate selection and mutation. As our analysis in Section 2 shows, it does not hold under the assumption of exchangeability. However, if the modeling assumption is i.i.d., the transition equation can be immediately proved (see our analysis in Section 2). This also applies to the transition equation built by the same authors for the uniform crossover operator (in Theorem 1 of Qi and Palmieri, 1994b), where the transition equation is in fact constructed under the i.i.d. assumption. Therefore, in the following analyses, we do not refer to the explicit form of the transition equation, unless it is needed. We only assume that the transition equation is successfully constructed, and it has the form (32) which is derived from (31) as .
The construction of the IPM also relates partly to condition 2 in Theorem 8. Comparing with this condition, it can be seen that for a successfully constructed , the following two facts are proved in the construction (m.p.w. stands for point-wise convergence of marginal p.d.f.s.).
as .
.
Of course, these two facts are not sufficient for this condition to hold. One still needs to prove as . In other words, one has to consider convergence of finite dimensional distributions.
Finally, we sometimes use for if is clear in the context. For example (30) can be rewritten as , where takes 's place and means the population size which the operator is operating on.
5 Analysis of the Simple EA
5.1 Analysis of the Mutation Operator
Having derived sufficient conditions for the stacking property and constructed the IPM, we prove the convergence of the IPM of the mutation operator first. Mutation adds an i.i.d. random vector to each individual in the population. If the current population is , then the population after mutation satisfies for all , where is a random element decided by the mutation operator. As the content of the mutation operator does not depend on , we just write to represent . To give an example, may be the sequence with all mutually independent and for all , where is the multivariate normal distribution with mean and covariance matrix , and is the -dimensional identity matrix. Note that every time is invoked, it generates perturbations independently. For example, let and be two populations, then we can write for satisfying and are mutually independent and independent from .
Next, consider . Recall that as an IPM, simulates real population dynamics by taking i.i.d. inputs and producing i.i.d. outputs. If the marginal p.d.f.s of and are and , respectively, then generates i.i.d. individuals whose p.d.f.s are , where stands for convolution. Given the construction, we can prove the stacking property of .
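The convolution statement above can be checked numerically in a simple special case. The following is a minimal sketch, not part of the original analysis: it assumes a one-dimensional search space, a hypothetical i.i.d. input population drawn from N(1, 2^2), and Gaussian mutation noise N(0, 1.5^2). The names `mutate`, `population`, and `sigma` are illustrative only. Since the convolution of two Gaussian densities is Gaussian with summed means and variances, the mutated individuals should have mean close to 1 and variance close to 6.25.

```python
import random
import statistics


def mutate(population, sigma, rng):
    """Add an independent N(0, sigma^2) perturbation to each individual."""
    return [x + rng.gauss(0.0, sigma) for x in population]


rng = random.Random(0)
n = 20000

# Hypothetical i.i.d. input population (assumption): one-dimensional, N(1, 2^2).
population = [rng.gauss(1.0, 2.0) for _ in range(n)]

# Each call to mutate draws fresh perturbations, independent of the input.
mutated = mutate(population, sigma=1.5, rng=rng)

# Convolving N(1, 2^2) with N(0, 1.5^2) gives N(1, 2^2 + 1.5^2) = N(1, 6.25).
sample_mean = statistics.fmean(mutated)
sample_var = statistics.pvariance(mutated)
print(sample_mean, sample_var)
```

The sketch only illustrates the marginal distribution after one mutation step on i.i.d. inputs; it says nothing about the stacking property, which concerns converging sequences of inputs.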
Let be the mutation operator, and be the corresponding operator in the IPM constructed under the i.i.d. assumption, then has the stacking property on .
We use the notations and premises in Theorem 8. Refer to Figure 3. In particular, the sequence and the limit are given and as .
Noting that condition 1 in Theorem 8 is equivalent to , we prove this condition by proving that for all . Then by Theorem 5, condition 1 in Theorem 8 holds. Then, as both conditions in Theorem 8 are satisfied, this theorem is proved.
Now, we prove for all . First, note that for all . are i.i.d. and independent from . In addition, for every , .
Since , it is apparent that . Then by Theorem 5, we have and .
In the proof, we concatenate the input () and the randomness () of the mutation operator in a common product space, and represent as a continuous function in that space. This technique is also used when analyzing other operators.
5.2 Analysis of -ary Recombination
Consider the IPM . As stated in Section 4.2, we do not give the explicit form of the transition equation in . We assume that the IPM is successfully constructed, and the transition equation is derived by taking in (31). We take this approach not only because deriving the transition equation is generally easier than proving the convergence of the IPM, but also because the formulation in (36) and (37) encompasses many real-world -ary recombination operators. We do not delve into details of the mechanisms of these operators and derive a transition equation for each one of them. Instead, our approach is general in that as long as the IPM is successfully constructed, our analysis on the convergence of the IPM can always be applied.
The following theorem is the primary result of our analysis for the -ary recombination operator.
Let be the -ary recombination operator, and be the corresponding operator in the IPM constructed under the i.i.d. assumption, then has the stacking property on .
We use the notations and premises in Theorem 9. Refer to Figure 3. In particular, the sequence and the limit are given and as .
The overall idea to prove (38) is that we first prove the convergence in distribution for the selected parents; then because the recombination operator is continuous, (39) follows.
First, we decompose the operator . generates the c.i.i.d. outputs one by one. This generation process can also be viewed as first selecting the groups of parents at once from the first elements of the input (in total the intermediate output is parents not necessarily distinct), then producing the outputs one by one by using each group of parents. In the following, we describe this process mathematically.
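The two-stage view of the generation process can be sketched in code. This is an illustration under stated assumptions rather than the operator analyzed here: the population is one-dimensional, only finitely many offspring are produced (the expanded operator produces an infinite sequence), and `combine` is a hypothetical recombination rule that averages the parents. The function names and parameters are ours.

```python
import random


def select_parent_groups(population, n, k, num_offspring, rng):
    """Stage 1: for each offspring, draw k parents uniformly with
    replacement from the first n individuals of the population."""
    return [[population[rng.randrange(n)] for _ in range(k)]
            for _ in range(num_offspring)]


def combine(parents):
    """Stage 2 (hypothetical rule, an assumption): average the parents."""
    return sum(parents) / len(parents)


def recombine(population, n, k, num_offspring, rng):
    """Select all parent groups first, then produce offspring one by one."""
    groups = select_parent_groups(population, n, k, num_offspring, rng)
    return [combine(group) for group in groups]


rng = random.Random(1)
population = [rng.uniform(0.0, 1.0) for _ in range(50)]

# Only the first n = 10 individuals are eligible as parents.
offspring = recombine(population, n=10, k=3, num_offspring=5, rng=rng)
print(offspring)
```

Separating the stages makes explicit that parent selection does not inspect the values of the individuals, while the combination step is a deterministic function of the selected parents in this sketch.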
To prove (44), we prove the following two conditions.
, such that for all , uniformly as ; that is, as .
as and is i.i.d.
These two conditions correspond to the conditions in Theorem 9. Since is from to , we cannot directly apply Theorem 9. However, it is easy to extend the proof of Theorem 9 to prove that these two conditions lead to as . Then, by (40) it is apparent that is a continuous function of its input and inner randomness. By concatenating the input and the inner randomness using the same technique as that used in the proof for Theorem 10, (44) can be proved. Then this theorem is proved.
In the remainder of the proof, we prove conditions 1 and 2. These conditions can be understood by replacing the top line with in Figure 3.
Proof of Condition 2
Since (recall that can be viewed both as a mapping from to and from to ), is continuous (see Example 1.2 in Billingsley, 1999). Since , by Theorem 3, . Apparently, is i.i.d. Therefore condition 2 is proved.
It is worth noting that this simple proof comes partly from our extension of to inputs . In fact, the only requirement for is (42); that is, should model on i.i.d. inputs. By defining to be , it can take non-i.i.d. inputs such as . Thus this condition can be proved. In Figure 3, this corresponds to our freedom of defining .
Proof of Condition 1
To prove condition 1, we first give another representation of , where and . This representation is based on the following mutually exclusive cases.
The parents chosen from by are distinct.
There are duplicates in the parents which are chosen from by .
We give a brief discussion of the proof. In our opinion, the most critical step of our proof is decomposing the -ary recombination operator into two suboperators: one is responsible for selecting parents (), the other is responsible for combining them (). In addition, for parent selection, the suboperator does not use the information of fitness values. Rather, it selects parents “blindly” according to its own rules (uniform sampling with replacement). This makes the operator easier to analyze because the way it selects parents does not rely on its input. Therefore, we can prove uniform convergence in (50).
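A standard counting fact helps explain why the split into distinct and duplicated parents is convenient under uniform sampling with replacement: the probability that a group of parents contains duplicates vanishes as the number of eligible individuals grows. The sketch below computes this probability exactly. It is an illustration, not a reproduction of the proof, and the symbols `n` (number of eligible individuals) and `k` (number of parents per offspring) are our labels.

```python
from fractions import Fraction


def prob_distinct(n, k):
    """Probability that k draws, uniform with replacement from n items,
    are all distinct: n (n-1) ... (n-k+1) / n^k."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - i, n)
    return p


# The probability of the "distinct parents" case approaches 1 as n grows.
for n in (10, 100, 1000, 10000):
    print(n, float(prob_distinct(n, k=3)))
```

For fixed `k`, the duplicate case therefore carries a probability of order 1/n, which is independent of the values of the individuals because selection is blind to them.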
Another point worth mentioning is the choice of Theorem 9 in our proof. Though Theorems 8 and 9 are symmetric, the difficulties of proving them are quite different. In fact, it is very difficult to prove the uniform convergence condition in Theorem 8.
Finally, our proof can be easily extended to cover -ary recombination operators using uniform sampling without replacement to select parents for each offspring. The overall proof framework roughly stays the same.
5.3 Summary
In this section, we analyzed the simple EA within the proposed framework. As the analysis shows, although the convergence of IPM is rigorously defined, actually proving the convergence for operators usually takes a lot of effort. We carried out the analysis using the IPM construction and sufficient conditions from Section 4, and used various techniques to analyze the mutation operator and the -ary recombination operator. It can be seen that although the sufficient conditions can provide general directions for the proofs, there are still many details to be worked out in order to analyze different operators.
To appreciate the significance of our work, it is worth noting that in Qi and Palmieri (1994a,b), the convergence of the IPMs of the mutation operator, the uniform crossover operator and the proportionate selection operator was not properly proved, and the issue of stacking of operators and iterating the algorithm was not addressed at all. In this article, however, we have proved the convergence of IPMs of several general operators. Since these general operators cover the operators studied in Qi and Palmieri (1994a,b) as special cases, the convergence of the IPMs of mutation and uniform crossover is actually proved in this article. Besides, our proof does not depend on the explicit form of the transition equation of the IPM. As long as the IPM is constructed under the i.i.d. assumption, our proof is valid.
As a consequence of our result, consider the explicit form of the transition equation for the uniform crossover operator derived in Section II in Qi and Palmieri (1994b). As the authors' proof was problematic and incomplete, the derivation of the transition equation was not well founded. However, it can be seen that the authors' derivation is in fact equivalent to constructing the IPM under the i.i.d. assumption. Since we have already proved the convergence of IPM of the -ary crossover operator, the analysis in Qi and Palmieri (1994b) regarding the explicit form of the transition equation can be retained.
6 Conclusion and Future Research
In this article, we revisited the existing literature on the theoretical foundations of IPMs, and proposed an analytical framework for IPMs based on convergence in distribution for random elements taking values in the metric space of infinite sequences. Under the framework, commonly used operators such as mutation and recombination were analyzed. Our approach and analyses are new. There are many topics worth studying for future research.
Perhaps the most immediate topic is to analyze the proportionate selection operator in our framework. The mutation operator and the -ary recombination operator can be readily analyzed partly because they do not use the information of the fitness value. Also, to generate a new individual, these operators draw information from a fixed number of parents. On the other hand, to generate each new individual, the proportionate selection operator actually gathers and uses fitness values of the whole population. This makes analyzing proportionate selection difficult.
We think further analysis on proportionate selection can be conducted in the following two directions.
In the analyses we tried to prove the stacking property on for the IPM of proportionate selection. Apart from more efforts trying to prove/disprove this property, it is worth considering modifying the space . For example, we can incorporate the rate of convergence into the space. If we can prove the stacking property on where is the space of converging sequences with rate , it is also a meaningful result.
Another strategy is to bypass the sufficient conditions and return to Definition 5 to prove for every . This is the original method. In essence, it requires studying the convergence of nested integrals.
Apart from proportionate selection, it is also worth studying whether other operators, such as ranking selection, can be analyzed in our framework. As many of these operators do not generate c.i.i.d. offspring, deriving the IPM and proving its convergence is difficult, if not impossible. In this regard, we believe new techniques of modeling and extensions of the framework are fruitful directions for further research.
Finally, it is possible to extend the concept of “incidence vectors” proposed by Vose to the continuous search space. After all, as noted by Vose himself, incidence vectors can also be viewed as marginal p.d.f.s of individuals. As a consequence, the cases of EAs on discrete and continuous solution spaces indeed do bear some resemblance. By an easy extension, the incidence vectors in the continuous space can be defined as functions with the form , where is the Dirac function and is the rational number representing the fraction that appears in the population. If similar analyses based on this extension can be carried out, many results in Nix and Vose (1992) and Vose (1999b,a, 2004) can be extended to the continuous space.
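To make the proposed extension concrete, the rational weights attached to the Dirac functions can be computed directly from a finite population. The following is a minimal sketch under our own assumptions: individuals are represented as tuples of floats so that identical individuals can be counted, and the function name `incidence_weights` is hypothetical. Each distinct individual is mapped to the fraction of the population it occupies, which is the coefficient of the Dirac function centered at that individual.

```python
from collections import Counter
from fractions import Fraction


def incidence_weights(population):
    """Map each distinct individual to the fraction of the population
    it occupies (the weight of its Dirac function)."""
    counts = Counter(population)
    n = len(population)
    return {x: Fraction(c, n) for x, c in counts.items()}


# Hypothetical population of four individuals in a 2-dimensional space.
population = [(0.5, 1.0), (0.5, 1.0), (2.0, -1.0), (0.0, 0.0)]
weights = incidence_weights(population)
for individual, weight in weights.items():
    print(individual, weight)
```

The weights are exact rationals that sum to one, mirroring the discrete incidence vector, while the support points now lie in the continuous search space.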