Infinite population models are important tools for studying the population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this article, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study were in fact problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, in order to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements that take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of the stacking of operators and of iterating the algorithm, which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the k-ary recombination operator. We show that these operators can provide accurate predictions for real population dynamics as the population size goes to infinity, provided that the initial population is independent and identically distributed.

Evolutionary algorithms (EAs) are general-purpose optimization algorithms with great success in real-world applications. They are inspired by the evolutionary process in nature. A certain number of candidate solutions to the problem at hand are modeled as individuals in a population. The algorithm evolves the population by mutation, crossover, and selection so that individuals with more preferable objective function values have a higher survival probability. By the "survival of the fittest" principle, it is likely that after many generations the population will contain individuals with high fitness values that are satisfactory solutions to the problem at hand.

Though conceptually simple, the underlying evolutionary processes and the behaviors of EAs remain to be fully understood. The difficulties lie in the fact that EAs are customizable, population-based, iterative stochastic algorithms, and the objective function also has great influence on their behaviors. A successful model of EAs should describe both the mechanisms of the algorithm and the influence of the objective function. One way to study EAs is to model them as dynamical systems. The idea is to first pick a certain quantity of interest, such as the distribution of the population or a certain statistic about it. Then, transitions in the state space of the picked quantity are modeled. A transition matrix (when the state space is finite) or a difference equation (when the state space is not finite) for the Markov chain is derived to describe how the picked quantity changes between consecutive generations.
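As a toy illustration of this dynamical-systems view (an illustrative sketch, not a model from the literature discussed below), consider a (1+1)-EA maximizing a single bit: the state space is the current bit value, and one generation of the algorithm corresponds to multiplying the state distribution by a 2x2 transition matrix.

```python
import numpy as np

# Toy dynamical-system model of a (1+1)-EA on one bit, mutation rate p.
# Elitist selection never accepts a worse bit, so state 1 is absorbing.
p = 0.1
T = np.array([[1.0 - p, p],    # from bit 0: flip to 1 w.p. p, else stay
              [0.0,    1.0]])  # from bit 1: a flip back to 0 is rejected
dist = np.array([1.0, 0.0])    # start at the worse bit with probability 1
for k in range(1, 6):
    dist = dist @ T            # one-generation update of the state distribution
    print(f"generation {k}: P(bit = 1) = {dist[1]:.4f}")
```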

In order to characterize the population dynamics accurately, the state space of the Markov chain tends to grow rapidly as the population size increases. As a result, even for time-homogeneous EAs with moderate population size, the Markov chain is often too large to be analyzed or simulated. To overcome this issue, some researchers turn to studying the limiting behaviors of EAs as the population size goes to infinity. The idea is to exploit some kind of symmetry in the state space (such as all individuals having the same marginal distribution), and prove that in the limit the Markov chain can be described by a more compact model. Models built in this way are called infinite population models (IPMs).

In this article, we follow this line of research and study IPMs of EAs on continuous spaces. More specifically, we aim at rigorously proving the convergence of IPMs. In this study, by "convergence" we usually mean that IPMs characterize the limiting behaviors of real EAs: "an IPM converges" loosely means that, as the population size goes to infinity, the population dynamics of the real EA converge in a certain sense to the population dynamics predicted by the model. This usage differs from the conventional one, in which convergence means that the EA eventually locates and gets stuck in some local or global optimum. Convergence results are the foundations and justifications of IPMs.

The main results of the article can be summarized as follows. First, we show that a widely cited study on the convergence of IPMs is problematic, mainly because the core assumption of exchangeability of individuals in its proof cannot lead to the convergence conclusion. Next, we build an analytical framework from a different perspective and show that it defines convergence in general settings. Then, to show the effectiveness of our framework, we prove the convergence of the IPM of the simple EA with mutation and crossover operators when the initial population is independent and identically distributed (i.i.d.). Finally, we discuss the results and point out that the convergence of the IPM of the simple EA with proportionate selection is yet to be established.

To our knowledge, there are very few research efforts that directly studied the convergence of IPMs. Among them, Vose (1999b) and Qi and Palmieri (1994a,b) are the classic ones. We focus on Qi and Palmieri (1994a,b) in this article, as their results are for EAs on continuous solution spaces and are most relevant here. In the first part of their study, the authors built a model of the simple EA with only mutation and proportionate selection. A transition equation is constructed which describes how the probability density functions (p.d.f.s) of the marginal distribution of the population change between consecutive generations. The authors proved that if individuals in a population are exchangeable, then as the population size goes to infinity, the marginal p.d.f.s of populations of the simple EA will converge pointwise to the p.d.f.s calculated by the following transition equation:
$$ f_{x_{k+1}}(x) = \int_F f_w(x \mid y)\, \frac{g(y)\, f_{x_k}(y)}{\int_F g(z)\, f_{x_k}(z)\, dz}\, dy, \tag{1} $$
where $F$ is the solution space, $f_{x_k}$ is the predicted marginal p.d.f. of the $k$th generation, $g$ is the objective function to be maximized, and $f_w(x \mid y)$ is the conditional p.d.f. determined by the mutation operator. In the second part of the research, the authors further analyzed the crossover operator and modified the transition equation to include all three operators in the simple EA.
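To make the transition equation concrete, here is a minimal numerical sketch of (1) on a discretized one-dimensional solution space; the fitness $g$, the Gaussian mutation kernel $f_w$, and the initial density are illustrative choices, not taken from Qi and Palmieri (1994a,b).

```python
import numpy as np

# Iterate the IPM transition equation (1) on a 1-D grid.
x = np.linspace(-5.0, 5.0, 401)                  # discretized solution space F
dx = x[1] - x[0]

g = lambda z: 1.0 + np.exp(-z**2)                # bounded fitness: 1 <= g <= 2
f_w = lambda out, inp: np.exp(-(out - inp)**2 / 2) / np.sqrt(2 * np.pi)

f_k = np.exp(-(x - 2.0)**2)                      # initial marginal p.d.f. ...
f_k /= f_k.sum() * dx                            # ... normalized on the grid

for k in range(10):
    weighted = g(x) * f_k                        # selection pressure: g(y) f_k(y)
    weighted /= weighted.sum() * dx              # divide by the integral of g f_k
    # mutation: integrate f_w(x|y) against the selected density over y
    f_k = np.array([np.sum(f_w(xi, x) * weighted) * dx for xi in x])

print("mean of predicted marginal after 10 generations:", np.sum(x * f_k) * dx)
```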

In Section 2, we will examine Qi and Palmieri's model and show that their study is unsound and incomplete. First, we provide a counterexample to show that a key assertion in the authors' proof about the law of large numbers (LLN) for exchangeable random vectors is generally not true. Therefore, the whole proof is unsound. Furthermore, we show that the modeling assumption of exchangeability of individuals cannot yield the transition equation in general. This means that under the authors' modeling assumption, the conclusion (1) cannot be reached.

In addition, we show that the authors' proofs in Qi and Palmieri (1994a,b) are incomplete. The authors did not address the convergence of the stacking of operators or of recursively iterating the algorithm. In essence, the authors attempted to prove the convergence of the IPM for only one iteration step. Even if the proof for (1) were correct, it would only show that as $N \to \infty$ the marginal p.d.f.s of the $(k+1)$th population converge pointwise to $f_{x_{k+1}}(x)$, provided that the marginal p.d.f. of the $k$th generation is $f_{x_k}(x)$. However, this convergence does not automatically hold for all subsequent generations. As a result, (1) cannot be iterated to make predictions for generations after the $(k+1)$th.

Besides Qi and Palmieri (1994a,b), we found no other studies that attempted to prove the convergence of IPMs for EAs on continuous space. Therefore, in Section 3 we propose a general analytical framework. The novelty of our framework is that from the very start of the analysis, we model generations of the population as random elements taking values in the metric space of infinite sequences, and we use convergence in distribution instead of point-wise convergence to define the convergence of IPMs.

To illustrate the effectiveness of our framework, we perform a convergence analysis of the IPM of the simple EA in Sections 4 and 5. In Section 4, we adopt the "stronger" modeling assumption that individuals of the same generation in the IPM are independent and identically distributed (i.i.d.), and we give sufficient conditions under which the IPM is convergent. For a general EA, this assumption may seem restrictive at first sight, but it turns out to be a reasonable one. In Section 5, we analyze the mutation operator and the k-ary recombination operator. We show that these commonly used operators have the property of producing i.i.d. populations, in the sense that if the initial population is i.i.d., then as the population size goes to infinity, in the limit all subsequent generations are also i.i.d. This means that for these operators, the transition equation in the IPM can predict the real population dynamics as the population size goes to infinity. We also show that our results hold even if these operators are stacked together and iterated repeatedly by the algorithm. Finally, in Section 6 we conclude the article and propose future research.

For completeness, we note that regarding Qi and Palmieri (1994a,b), a comment by Yong et al. (1998) was published along with a reply. However, the comment was mainly about the latter part of Qi and Palmieri (1994a), where the authors analyzed the properties of EAs based on the IPM; it did not discuss the proof for the model itself. For IPMs of EAs on discrete optimization problems, extensive research was done by Vose et al. in a series of studies (Nix and Vose, 1992; Vose, 1999b, 1999a, 2004). The problems under consideration were discrete optimization problems with finite solution spaces. The starting point of the authors' analysis was to model each generation of the population as an "incidence vector," which describes, for each point in the solution space, the proportion of the population it occupies. Based on this representation, the authors derived transition equations between incidence vectors of consecutive generations and analyzed their properties as the population size goes to infinity. However, for EAs on continuous solution spaces, the analyses of Vose et al. are not immediately applicable. This is because for continuous optimization problems the solution space is not denumerable; therefore, the population cannot be described by a finite-dimensional incidence vector.

In this section we analyze the results of Qi and Palmieri (1994a,b). We begin by introducing some preliminaries for the analysis. Then, in Section 2.2, following the notations and derivations in the authors' papers, we provide a counterexample to show that the convergence proof for the transition equation in Qi and Palmieri (1994a) is problematic. We further show that the modeling assumption of exchangeability cannot yield the transition equation in general. In Section 2.3, we show that the analyses in Qi and Palmieri (1994a,b) are incomplete. The authors did not prove the convergence of IPMs in the cases where operators are stacked together and the algorithm is iterated for multiple generations.

2.1  Preliminaries

In the authors' paper (Qi and Palmieri, 1994a), the problem to be optimized is
$$ \max_{x \in F} g(x), \tag{2} $$
where $F$ is the solution space and $g$ is some given objective function. The analysis intends to be general; therefore, no explicit form of $g$ is assumed. The algorithm to be analyzed is the simple EA with proportionate selection and mutation. Let $X_k = (x_k^j)_{j=1}^N$ denote the $k$th generation produced by the EA, where $N$ is the population size. To generate the $(k+1)$th population, proportionate selection first produces an intermediate population $X_k' = (x_k'^j)_{j=1}^N$ following the conditional probability
$$ P\big(x_k'^i = x_k^j \,\big|\, X_k\big) = \frac{g(x_k^j)}{\sum_{l=1}^N g(x_k^l)}, \qquad i, j = 1, \dots, N. \tag{3} $$
After selection, each individual in $X_k'$ is mutated independently to generate the individuals in $X_{k+1}$. Mutation follows the conditional p.d.f.
$$ f_{x_{k+1}^i \mid x_k'^i}(x \mid y) = f_w(x \mid y), \qquad i = 1, \dots, N. \tag{4} $$
Overall the algorithm is illustrated in Figure 1.
Figure 1: The pseudocode of the simple EA.

After presenting the optimization problem and the algorithm, the authors proved the convergence of the IPM if the distributions of individuals in the population are exchangeable. It is the main result in Qi and Palmieri (1994a).

Theorem 1

(Theorem 1 in Qi and Palmieri, 1994a): Assume that the fitness function $g(x)$ in (2) and the mutation operator of the simple EA described by (4) satisfy the following conditions:

  1. $0 < g_{\min} \le g(x) \le g_{\max} < \infty$, $\forall x \in F$.

  2. $\sup_{x, y \in \mathbb{R}^d} f_w(x \mid y) \le M < \infty$.

Then as $N \to \infty$, the time history of the simple EA can be described by a sequence of random vectors $(x_k)_{k=0}^\infty$ with densities
$$ f_{x_{k+1}}(x) = \int_F f_w(x \mid y)\, \frac{g(y)\, f_{x_k}(y)}{\int_F g(z)\, f_{x_k}(z)\, dz}\, dy. \tag{5} $$

In Theorem 1, $f_{x_k}$ is the marginal p.d.f. of the $k$th generation predicted by the IPM. It should be emphasized that in Qi and Palmieri (1994a,b), the authors proved this theorem under the assumption that the simple EA has exchangeable individuals in the population. Though not explicitly stated in the theorem, the assumption of exchangeability is the core assumption in their proof and an integral part of their formulation of the theorem.

For analyses in this article, we use the concept of exchangeability in probability theory. Its definition and some basic facts are listed.

Definition 1

(Exchangeable random variables, Definition 1.1.1 in Taylor et al., 1985): A finite set of random variables $\{x_i\}_{i=1}^n$ is said to be exchangeable if the joint distribution of $(x_i)_{i=1}^n$ is invariant with respect to permutations of the indices $1, 2, \dots, n$. A collection of random variables $\{x_\alpha : \alpha \in \Gamma\}$ is said to be exchangeable if every finite subset of $\{x_\alpha : \alpha \in \Gamma\}$ is exchangeable.

Definition 1 can also be extended to cover exchangeable random vectors or exchangeable random elements by replacing the term "random variables" in the definition with the respective term. One property of exchangeability is that if $\{x_i\}_{i=1}^n$ are $n$ exchangeable random elements, then the joint distributions of any $1 \le k \le n$ distinct ones of them are always the same (Proposition 1.1.1 in Taylor et al., 1985). When $k = 1$, this property indicates that $\{x_i\}_{i=1}^n$ have the same marginal distribution. Another property is that an infinite sequence of random elements is exchangeable if and only if its elements are conditionally independent and identically distributed (c.i.i.d.) given some σ-field $\mathcal{G}$ (Theorem 1.2.2 in Taylor et al., 1985). In particular, a collection of c.i.i.d. random elements is always exchangeable. Finally, it is obvious that i.i.d. random elements are exchangeable, but the converse is not necessarily true.
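To see concretely how exchangeability is weaker than independence, the following small simulation (an illustrative construction, not taken from Taylor et al., 1985) builds an exchangeable collection via a shared latent shift: given the shift, the coordinates are c.i.i.d., hence exchangeable, yet unconditionally they are positively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exchangeable but not i.i.d.: x_i = theta + z_i with a shared random shift.
# Given theta, the x_i are c.i.i.d., hence exchangeable; any permutation of
# the coordinates has the same joint law. Unconditionally they are dependent.
n, trials = 5, 200_000
theta = rng.normal(0.0, 1.0, size=(trials, 1))   # shared latent variable
z = rng.normal(0.0, 1.0, size=(trials, n))       # independent innovations
x = theta + z

# Same marginal for every coordinate, but nonzero pairwise correlation (~0.5).
print("corr(x1, x2) =", np.corrcoef(x[:, 0], x[:, 1])[0, 1])
```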

It can be seen that the simple EA generates c.i.i.d. individuals given the current population. Therefore, the individuals within the same generation are exchangeable, and they have the same marginal distribution. This exchangeability is the core assumption of the proof in Qi and Palmieri (1994a,b), and it enters Theorem 1 as an implicit third condition.

2.2  Convergence Proof of the Transition Equation

In this section, we analyze the proof of Theorem 1 and show that it is incorrect. The proof by Qi et al. is in Appendix A of Qi and Palmieri (1994a). In the proof, the authors assumed that individuals in the same generation are exchangeable and therefore have the same marginal distribution. After a series of derivation steps, the authors obtained a transition equation between the density functions of $x_{k+1}^i$ and $X_k$:
$$ f_{x_{k+1}}(x) = E\!\left[\frac{\xi_k(x)}{\eta_k^N}\right], \tag{6} $$
where in (6),
$$ \xi_k(x) = f_w(x \mid x_k^1)\, g(x_k^1), \tag{7} $$
$$ \eta_k^N = \frac{1}{N} \sum_{j=1}^N g(x_k^j). \tag{8} $$
Eq. (6) is correct. It accurately describes how the marginal p.d.f. of any individual in the next generation can be calculated from the joint p.d.f. of individuals in the current generation. Noticing that $\eta_k^N$ is the average of the exchangeable random variables $\{g(x_k^j)\}_{j=1}^N$, by the LLN for exchangeable random variables, the authors asserted that
$$ \eta_k^N \to \eta_k \quad \text{a.s. as } N \to \infty. \tag{9} $$
The authors further asserted that $\eta_k$ is itself a random variable, satisfying
$$ E(\eta_k) = E\big[g(x_k^j)\big]. \tag{10} $$
Eqs. (9) and (10) correspond to (A13) and (A14) in Appendix A of Qi and Palmieri (1994a), respectively. The authors' proof is correct up to this step. However, the authors then asserted that
$$ \eta_k \text{ is independent of } x_k^j, \quad j = 1, \dots, N. \tag{11} $$
Based on this assertion the authors then proved that for all $k$ and $x$,
$$ \lim_{N \to \infty} f_{x_{k+1}}(x) = \lim_{N \to \infty} E\!\left[\frac{\xi_k(x)}{\eta_k^N}\right] = \frac{E[\xi_k(x)]}{E[\eta_k]}. \tag{12} $$
As a result, the p.d.f. in (6) converges pointwise to $\frac{E[\xi_k(x)]}{E[\eta_k]}$, which is equal to the right-hand side of (5). Hence the authors claimed that Theorem 1 is proved.

In the following, we provide a counterexample to show that assertion (11) is not true when $N \ge 2$ ($N = 1$ is the degenerate case). Then, we carry out further analysis to show that under the modeling assumption of exchangeability, conclusion (12), or equivalently Theorem 1, cannot be true in general.

2.2.1  On Assertion (11)

We first reformulate the assertion. Since $\{x_k^l\}_{l=1}^N$ are exchangeable, $\{g(x_k^l)\}_{l=1}^N$ are exchangeable (Property 1.1.2 in Taylor et al., 1985). Let $y_l = g(x_k^l)$, $l = 1, \dots, N$. Then the premises of Theorem 1 imply
$$ \{y_l\}_{l=1}^N \text{ are exchangeable, and } 0 < g_{\min} \le y_l \le g_{\max} < \infty \text{ for all } l. \tag{13} $$
Let $y = \eta_k$. According to (8), (9), and (10), $y$ has the properties that
$$ \frac{1}{N} \sum_{l=1}^N y_l \to y \quad \text{a.s. as } N \to \infty, \tag{14} $$
$$ E(y) = E(y_l). \tag{15} $$
Since $g$ is a general function, there are no other restrictions on $\{y_l\}_{l=1}^N$ and $y$. Therefore, (11) is equivalent to the following assertion:
$$ y \text{ is independent of } \{y_l\}_{l=1}^N. \tag{16} $$

However, we use the following counterexample (modified from Example 1.1.1 and the related discussion on pages 11–12 in Taylor et al., 1985) to show that assertion (16) is not true. Therefore, (11) is not true.

2.2.2  Counterexample

Let $\{z_l\}_{l=1}^\infty$ be a sequence of i.i.d. random variables satisfying
$$ P(z_l = 1) = P(z_l = -1) = \tfrac{1}{2} $$
for all $l$. Let $y$ be a random variable independent of $\{z_l\}_{l=1}^\infty$ satisfying
$$ P(y = 3) = P(y = 5) = \tfrac{1}{2}, $$
so that $y$ is bounded and non-degenerate. Finally, let $y_l = z_l + y$ for all $l$.

It can easily be verified that $\{y_l\}_{l=1}^\infty$ and $y$ satisfy (13) and (15). Since $z_l$ is bounded, $E(|z_l|) < \infty$ for any $l$. By the strong law of large numbers (SLLN) for i.i.d. random variables, $\frac{1}{N}\sum_{l=1}^N z_l \to 0$ a.s. as $N \to \infty$; therefore, (14) is also satisfied; that is, $y$ is the limit of $\frac{1}{N}\sum_{l=1}^N y_l$ as $N \to \infty$. However, because $\frac{1}{N}\sum_{l=1}^N y_l = y + \frac{1}{N}\sum_{l=1}^N z_l$ and $y$ is independent of $\{z_l\}_{l=1}^\infty$, it can be seen that $\frac{1}{N}\sum_{l=1}^N y_l$ is not independent of $y$ except in some degenerate cases (e.g., when $y$ equals a constant). In particular, $y_l = y + z_l$ is not independent of $y$ for any $l$. Therefore, assertion (16) is not true. Equivalently, assertion (11) is not true. This renders the authors' proof for (12) invalid.
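A quick simulation of the counterexample (with the concrete distributions chosen above) confirms both claims numerically: the running average converges to $y$, yet it remains almost perfectly correlated with $y$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Counterexample: y_l = z_l + y, with z_l i.i.d. uniform on {-1, +1}
# and y independent, uniform on {3, 5}.
trials, N = 20_000, 500
z = rng.choice([-1.0, 1.0], size=(trials, N))
y = rng.choice([3.0, 5.0], size=trials)
avg = (z + y[:, None]).mean(axis=1)          # (1/N) * sum of the y_l

print("max |avg - y| over trials:", np.abs(avg - y).max())  # small, by the SLLN
print("corr(avg, y):", np.corrcoef(avg, y)[0, 1])           # ~1: not independent
```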

2.2.3  Further Analysis

In the following, we carry out further analysis to show that (12) cannot be true even considering other methods of proof and adding new sufficient conditions. Therefore, in general, Theorem 1 cannot be true.

To begin with, consider the random variable $\xi_k(x)/\eta_k^N$. We prove the following lemma.

Lemma 1:

$E\!\left[\frac{\xi_k(x)}{\eta_k^N}\right] \to E\!\left[\frac{\xi_k(x)}{\eta_k}\right]$ as $N \to \infty$.

Proof:

According to (9), $\eta_k^N \to \eta_k$ a.s.; since $g_{\min} \le \eta_k^N \le g_{\max}$, it follows that $0 < g_{\min} \le \eta_k \le g_{\max}$ almost surely.

Since $h(x) = \frac{1}{x}$ is continuous on $(0, \infty)$,
$$ \frac{1}{\eta_k^N} \to \frac{1}{\eta_k} \quad \text{a.s. as } N \to \infty. $$
Then
$$ \frac{\xi_k(x)}{\eta_k^N} \to \frac{\xi_k(x)}{\eta_k} \quad \text{a.s. as } N \to \infty. $$

Finally, by the conditions in Theorem 1, $0 \le \frac{\xi_k(x)}{\eta_k^N} \le \frac{M g_{\max}}{g_{\min}}$. By Lebesgue's Dominated Convergence Theorem (Proposition 11.30 in Port, 1994), $E\!\left[\frac{\xi_k(x)}{\eta_k^N}\right] \to E\!\left[\frac{\xi_k(x)}{\eta_k}\right]$ as $N \to \infty$.

Now by Lemma 1, (12) is equivalent to
$$ E\!\left[\frac{\xi_k(x)}{\eta_k}\right] = \frac{E[\xi_k(x)]}{E[\eta_k]}. \tag{$\Delta$} $$

Now it is clear that if the only assumption is exchangeability, ($\Delta$) is not true even considering other methods of proof. Of course, if (11) were true, then $\xi_k(x)$ and $\eta_k$ would be independent and ($\Delta$) would follow. However, as already shown by the counterexample, (11) is not true in general. Therefore, ($\Delta$), and equivalently Theorem 1, are in general not true.

A natural question then arises: is it possible to introduce some reasonable sufficient conditions such that ($\Delta$) can be proved? One such frequently used condition is that $\eta_k = E[g(x_k^j)]$; that is, $\eta_k$ is a constant. However, the following analysis shows that given the modeling assumption of exchangeability, this condition is not true in general. Therefore, it cannot be introduced.

For exchangeable random variables $\{g(x_k^l)\}_{l=1}^N$, we have
$$ V(\eta_k) = \lim_{N \to \infty} V(\eta_k^N), \tag{17} $$
$$ V(\eta_k^N) = \frac{1}{N} V\!\big(g(x_k^1)\big) + \frac{N-1}{N}\, C\!\big(g(x_k^1), g(x_k^2)\big), \tag{18} $$
$$ \lim_{N \to \infty} V(\eta_k^N) = C\!\big(g(x_k^1), g(x_k^2)\big), \tag{19} $$
where $V(x)$ is the variance of $x$ and $C(x, y)$ is the covariance of $x$ and $y$. (17) is by the boundedness of $\eta_k^N$ and Lebesgue's Dominated Convergence Theorem, (18) is by the exchangeability of $\{x_k^j\}_{j=1}^N$, and (19) is by the boundedness of $g$ and pushing $N$ to infinity. Now it is clear that if the only modeling assumption is exchangeability, there is no guarantee that $C\big(g(x_k^1), g(x_k^2)\big) = 0$. Therefore, in general $\eta_k^N$ does not converge to a constant, and this condition cannot be introduced as a sufficient condition for ($\Delta$).
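The following sketch illustrates (17)–(19) numerically; the shared-latent-shift population and the bounded fitness $g(x) = 2 + \tanh(x)$ are illustrative assumptions. The variance of $\eta_k^N$ levels off at the pairwise covariance instead of vanishing.

```python
import numpy as np

rng = np.random.default_rng(2)

# Exchangeable population: x_i = theta + z_i with a shared latent theta.
# Bounded fitness, 1 < g < 3, consistent with condition 1 of Theorem 1.
g = lambda x: 2.0 + np.tanh(x)
trials = 10_000
theta = rng.normal(size=(trials, 1))
for N in (10, 100, 1000):
    x = theta + rng.normal(size=(trials, N))
    eta = g(x).mean(axis=1)                 # one realization of eta_k^N per trial
    print(f"N={N:5d}  Var(eta^N) = {eta.var():.5f}")

pair = g(theta + rng.normal(size=(trials, 2)))
print("C(g(x1), g(x2)) =", np.cov(pair[:, 0], pair[:, 1])[0, 1])
```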

2.3  The Issue of the Stacking of Operators and Iterating the Algorithm

In the following, we discuss IPMs from another perspective and show that the proofs in Qi and Palmieri (1994a,b) are incomplete, as they only consider the convergence of one iteration. Because the discussion in this section relates closely to our proposed framework, from here on we use our own notation consistently, which differs from the preceding subsections, where we strictly followed the notation of Qi and Palmieri (1994a,b).

Consider an EA with only one operator. Let the operator be denoted by $H$. When the population size is $n$, denote this EA by $\mathrm{EA}^n$ and the operator it actually uses by $H^n$. Let $P_k^n = (x_{k,i}^n)_{i=1}^n$ denote the $k$th generation produced by $\mathrm{EA}^n$. Then the transition rule between consecutive generations produced by $\mathrm{EA}^n$ can be described by $P_{k+1}^n = H^n(P_k^n)$. Each row of Table 1 shows the population dynamics produced by $\mathrm{EA}^n$; in the table, $P_k^n$ is expanded as $[H^n]^k(P_0^n)$. Let $\mathrm{EA}^\infty$ denote the IPM, and let $P_k^\infty = [H^\infty]^k(P_0^\infty)$ denote the populations predicted by $\mathrm{EA}^\infty$. Then we can summarize the results in Qi and Palmieri (1994a) in the following way.

Table 1: Population dynamics of $\mathrm{EA}^n$ under operator $H$.

                        $k=0$           $k=1$                   $k=2$
$\mathrm{EA}^1$         $P_0^1$         $H^1(P_0^1)$            $H^1(H^1(P_0^1))$       …
   ⋮
$\mathrm{EA}^n$         $P_0^n$         $H^n(P_0^n)$            $H^n(H^n(P_0^n))$       …
   ⋮
$\mathrm{EA}^\infty$    $P_0^\infty$    $H^\infty(P_0^\infty)$  $H^\infty(H^\infty(P_0^\infty))$  …
Assume that the initial population comes from a known sequence of individuals, represented by $P_0 = (x_i)_{i=1}^\infty$. For $\mathrm{EA}^n$, its initial population $P_0^n$ consists of the first $n$ elements of $P_0$; that is, $P_0^n = (x_i)_{i=1}^n$. Let $P_0^\infty = P_0$. This setting represents the fact that $\mathrm{EA}^n$ and $\mathrm{EA}^\infty$ use the same initial population. $H^n$ can be viewed as an operator on $P_0$ that takes only the first $n$ elements to produce the next generation. Then the authors essentially proved that
$$ H^n(P_0) \xrightarrow{\mathrm{m.p.w.}} H^\infty(P_0) \quad \text{as } n \to \infty, \tag{20} $$
where m.p.w. stands for pointwise convergence of the marginal p.d.f.s.

However, apart from the fact that this proof is problematic, it covers only one iteration step, corresponding to the column-wise convergence of the $k=1$ column in Table 1. The problem is that even if (20) is true, it does not automatically lead to the conclusion that for an arbitrary $k$th step, $[H^n]^k(P_0) \xrightarrow{\mathrm{m.p.w.}} [H^\infty]^k(P_0)$ as $n \to \infty$. In other words, one has to study whether the transition equation for one step can be iterated recursively to predict populations after multiple steps. In Table 1, this problem corresponds to whether the other columns have a similar column-wise convergence property once the convergence of the $k=1$ column is proved.

To give an example, consider the column of $k=2$ in Table 1. To prove column-wise convergence, the authors need to prove that, given (20),
$$ H^n\big(H^n(P_0)\big) \xrightarrow{\mathrm{m.p.w.}} H^n\big(H^\infty(P_0)\big) \tag{21} $$
$$ H^n\big(H^\infty(P_0)\big) \xrightarrow{\mathrm{m.p.w.}} H^\infty\big(H^\infty(P_0)\big) \tag{22} $$
as $n \to \infty$. Comparing (20) with (21) and (22): (21) has the same sequence of operators but a sequence of converging inputs, and (22) has the same input but a sequence of different operators. Therefore, they are not necessarily true even if (20) is proved. A similar problem exists when considering an arbitrary $k$th generation. We call this problem the issue of iterating the algorithm. As the studies in Qi and Palmieri (1994a,b) ignored this issue, we believe their proofs are incomplete.

The issue of the stacking of operators is similar. Given some operator $H$ satisfying (20) and some operator $G$ satisfying $G^n(P_0) \xrightarrow{\mathrm{m.p.w.}} G^\infty(P_0)$ as $n \to \infty$, it is not necessarily true that $H^n(G^n(P_0)) \xrightarrow{\mathrm{m.p.w.}} H^\infty(G^\infty(P_0))$ as $n \to \infty$. However, the authors in Qi and Palmieri (1994b) ignored this issue and combined the transition equations for selection, mutation, and crossover (in Section III of Qi and Palmieri, 1994b) without any justification.

In addition, several statements in the authors' proofs in Qi and Palmieri (1994b) are questionable. First, in the first paragraph of Appendix A (the proof of Theorem 1 in that paper), the authors considered a pair of parents $x_k$ and $x_k'$ for the uniform crossover operator. $x_k$ and $x_k'$ are "drawn from the population independently with the same density $f_{x_k}$." Then, the authors claimed that "the joint density of $x_k$ and $x_k'$ is therefore $f_{x_k} \cdot f_{x_k'}$." This is simply not true. Two individuals drawn independently from the same population are conditionally independent; they are not necessarily independent. In fact, without the i.i.d. assumption, it is very likely that individuals in the same population are dependent. Therefore, the joint density function of $x_k$ and $x_k'$ is not necessarily $f_{x_k} \cdot f_{x_k'}$, and the authors' proof of Theorem 1 in Qi and Palmieri (1994b) is dubious at best. On the other hand, if the authors' modeling assumption for the uniform crossover operator is that individuals are i.i.d., this assumption is incompatible with the modeling assumption of exchangeability in Qi and Palmieri (1994a) for selection and mutation. Therefore, combining the transition equations for all three operators is problematic, because the i.i.d. assumption cannot hold beyond one iteration step.

Another issue in Qi and Palmieri (1994b) is that the uniform crossover operator produces two dependent offspring at the same time. As a result, after uniform crossover, the intermediate population is not even exchangeable, because it has pairwise dependency between individuals. Then the same incompatible-assumption problem arises: the transition equation for the uniform crossover operator cannot be combined with the transition equations for selection and mutation. Besides, the transition equation for the uniform crossover operator cannot be iterated beyond one step while retaining i.i.d. or exchangeability as its modeling assumption.

In summary, several issues arise from previous studies on IPMs for EAs on continuous optimization problems. Therefore, new frameworks and proof methods are needed for analyzing the convergence of IPMs and studying the issue of the stacking of operators and iterating the algorithm.

In this section, we present our proposed analytical framework. In constructing the framework we strive to achieve the following three goals.

  1. The framework should be general enough to cover real-world operators and to characterize the evolutionary process of real EA.

  2. The framework should be able to define the convergence of IPMs and serve as justifications of using them. The definition should match one's intuition and at the same time be mathematically rigorous.

  3. The framework should provide an infrastructure to study the issue of the stacking of operators and iterating the algorithm.

The contents of this section roughly reflect the pursuit of the first two goals. The third goal is reflected in the sufficient conditions for convergence and i.i.d. IPM construction in Section 4 and the analyses of the simple EA in Section 5. More specifically, in Section 3.1, we introduce notations and preliminaries for the remainder of this article. In Section 3.2, we present our framework. In the framework, each generation is modeled by a random sequence. This approach unifies the spaces of random elements modeling populations of different sizes. In Section 3.3, we define the convergence of the IPM as convergence in distribution on the space of random sequences. We summarize and discuss our framework in Section 3.4.

3.1  Notations and Preliminaries

In the remainder of this article, we focus on the unconstrained continuous optimization problem
$$ \max_{x \in \mathbb{R}^d} g(x), \tag{23} $$
where $g$ is some given objective function. Our framework is general in that it requires no further conditions on the objective function $g$. However, to prove the convergence of IPMs for mutation and recombination, conditions such as those in Theorem 1 are sometimes needed. We will introduce them where they are required.

From now on we use $\mathbb{N}$ to denote the set of nonnegative integers and $\mathbb{N}_+$ the set of positive integers. For any two real numbers $a$ and $b$, let $a \wedge b$ be the smaller of them and $a \vee b$ be the larger of them. Let $x, y$ be random elements of some measurable space $(\Omega, \mathcal{F})$. We use $\mathcal{L}(x)$ to represent the law of $x$. If $x$ and $y$ follow the same law, that is, $P(x \in A) = P(y \in A)$ for every $A \in \mathcal{F}$, we write $\mathcal{L}(x) = \mathcal{L}(y)$. Note that $\mathcal{L}(x) = \mathcal{L}(y)$ and $x = y$ have different meanings. In particular, $x = y$ indicates dependency between $x$ and $y$.

We use the notation $(x_i)_{i=m}^n$ to represent the array $(x_m, x_{m+1}, \dots, x_n)$. When $n = \infty$, $(x_i)_{i=m}^\infty$ represents the infinite sequence $(x_m, x_{m+1}, \dots)$. $\{x_i\}_{i=m}^n$ and $\{x_i\}_{i=m}^\infty$ represent the collections $\{x_m, x_{m+1}, \dots, x_n\}$ and $\{x_i \mid i = m, m+1, \dots\}$, respectively. When the range is clear, we write $(x_i)_i$ and $\{x_i\}_i$, or $(x_i)$ and $\{x_i\}$, for short.

Let $S$ denote the solution space $\mathbb{R}^d$. This simplifies the notation when we discuss the spaces $S^n$ and $S^\infty$. In the following, we define metrics and σ-fields on $S$, $S^n$, and $S^\infty$, and state properties of the corresponding measurable spaces.

$S$ is equipped with the ordinary metric $\rho(x, y) = \big[\sum_{i=1}^d (x_i - y_i)^2\big]^{1/2}$. Let $\mathcal{S}$ denote the Borel σ-field on $S$ generated by the open sets under $\rho$. Together, $(S, \mathcal{S})$ defines a measurable space.

Similarly, $S^n$ is equipped with the metric $\rho_n(x, y) = \big[\sum_{i=1}^n \rho^2(x_i, y_i)\big]^{1/2}$, and the corresponding Borel σ-field under $\rho_n$ is denoted by $\mathcal{S}'_n$. Together, $(S^n, \mathcal{S}'_n)$ is the measurable space for $n$-tuples.

Next, consider the space of infinite sequences $S^\infty = \{(x_1, x_2, \dots) \mid x_i \in S, i \in \mathbb{N}_+\}$. It is equipped with the metric $\rho_\infty(x, y) = \sum_{i=1}^\infty 2^{-i} \cdot \frac{\rho(x_i, y_i)}{1 + \rho(x_i, y_i)}$. The Borel σ-field on $S^\infty$ under $\rho_\infty$ is denoted by $\mathcal{S}'_\infty$. Then $(S^\infty, \mathcal{S}'_\infty)$ is the measurable space for infinite sequences.

Since $S$ is separable and complete, it can be proved that $S^n$ and $S^\infty$ are also separable and complete (Appendix M6 in Billingsley, 1999). In addition, because of separability, the Borel σ-fields $\mathcal{S}'_n$ and $\mathcal{S}'_\infty$ coincide with the product σ-fields $\mathcal{S}^n$ (generated by all measurable rectangles) and $\mathcal{S}^\infty$ (generated by all measurable cylinder sets), respectively (Lemma 1.2 in Kallenberg, 2002). Therefore, from now on we write $\mathcal{S}^n$ and $\mathcal{S}^\infty$ for the corresponding Borel σ-fields. Finally, let $M$, $M^n$, and $M^\infty$ denote the sets of all random elements of $S$, $S^n$, and $S^\infty$, respectively.

Let $\pi_n : S^\infty \to S^n$ be the natural projection: $\pi_n(x) = (x_1, x_2, \dots, x_n)$. Since for $x \in M^\infty$, $(\pi_n \circ x) : \Omega \to S^n$ defines a random element of $S^n$ projected from $S^\infty$, we also use $\pi_n$ to denote the mapping $\pi_n : M^\infty \to M^n$ with $\pi_n(x) = (x_1, x_2, \dots, x_n)$. By definition, $\pi_n$ is the operator that truncates random sequences to random vectors. Given $A \subseteq S^\infty$, we use $\pi_n(A)$ to denote the projection of $A$; that is, $\pi_n(A) = \{x \in S^n : x = \pi_n(y) \text{ for some } y \in A\}$.
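For concreteness, here is a small sketch of $\rho_\infty$ and $\pi_n$; the truncation level $K$ and the test sequences are illustrative. Since each summand of $\rho_\infty$ is bounded by $2^{-i}$, truncating after $K$ terms changes the value by at most $2^{-K}$.

```python
import numpy as np

# The ordinary metric rho on S = R^d, the (truncated) sequence metric
# rho_infty on S^infty, and the natural projection pi_n.
def rho(x, y):
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def rho_infty(xs, ys, K=50):
    return sum(2.0 ** -(i + 1) * rho(a, b) / (1.0 + rho(a, b))
               for i, (a, b) in enumerate(zip(xs[:K], ys[:K])))

def pi_n(xs, n):
    return xs[:n]

xs = [np.zeros(2)] * 100
ys = [np.full(2, float(k)) for k in range(100)]
print("rho_infty(x, y) =", rho_infty(xs, ys))
print("pi_3(y) =", pi_n(ys, 3))
```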

3.2  Analytical Framework for EA and IPMs

In this section, we present an analytical framework for the EA and IPMs. First, the modeling assumptions are stated. We deal only with operators that generate c.i.i.d. individuals. Then, we present an abstraction of the EA and IPMs. This abstraction serves as the basis for building our framework. Finally, the framework is presented. It unifies the range spaces of the random elements and defines the convergence of IPMs.

3.2.1  Modeling Assumptions

We assume that the EA on the problem (23) is time-homogeneous and Markovian, such that the next generation depends only on the current one, and the transition rule from the $k$th generation to the $(k+1)$th generation is invariant with respect to $k \in \mathbb{N}$. We further assume that individuals in the next generation are c.i.i.d. given the current generation. As this assumption is the only extra assumption introduced in the framework, it may need some further explanation.

The main reason for introducing this assumption is to simplify the analysis. Conditional independence implies exchangeability; therefore, individuals in the same generation $k \in \mathbb{N}_+$ are always exchangeable. As a result, it is possible to exploit the symmetry in the population and study the transition equations of marginal distributions. Besides, it is because of conditional independence that we can easily expand the random elements modeling finite-sized populations to random sequences, and therefore define convergence in distribution for random elements of the corresponding metric space. In addition, many real-world operators in EAs satisfy this assumption, such as the proportionate selection operator and the crossover operator analyzed in Qi and Palmieri (1994a,b). Finally, we emphasize that exchangeability is a property that facilitates the analysis of convergence across multiple iterations and different operators. Because of the generational nature of EAs, we want a property that leads to convergence for one iteration and also holds as a premise for the analysis of the next iteration. Exchangeability serves as this property in our analysis: by exchangeability, the convergence results can be extended to further iterations and stacked with results from other operators.

However, we admit that there are some exceptions to our assumption. A most notable one may be the mutation operator, though it does not pose significant difficulties. The mutation operator perturbs each individual in the current population independently, according to a common conditional p.d.f. If the current population is not exchangeable, then after mutation the resultant population is not exchangeable, either. Therefore, it seems that mutation does not produce c.i.i.d. individuals. However, considering the fact that mutation is often used along with other operators, as long as these other operators generate c.i.i.d. populations, the individuals after mutation will be c.i.i.d., too. Therefore, a combined operator of mutation and any other operator satisfying the c.i.i.d. assumption can satisfy our assumption. An example can be seen in Qi and Palmieri (1994a), where mutation is analyzed together with proportionate selection. On the other hand, an algorithm which only uses mutation is very simple. It can be readily modeled and analyzed without much difficulty.

Perhaps more significant exceptions are operators such as selection without replacement, or the crossover operator which produces two dependent offspring at the same time. In fact, for these operators not satisfying the c.i.i.d. assumption, it is still possible to expand the random elements modeling finite-sized population to random sequences. For example, the random elements can be padded with some fixed constants or random elements of known distributions to form the random sequences. In this way, our definition of the convergence of IPMs can still be applied. However, whether in this scenario convergence in distribution for these random sequences can still yield meaningful results similar to the transition equation is another research problem. It may need further investigation. Nonetheless, our assumption is equivalent to the exchangeability assumption generally used in previous studies.

3.2.2  The Abstraction of EA and IPMs

Given the modeling assumptions, we develop an abstraction to describe the population dynamics of the EA and IPMs.

Let the EA with population size $n$ be denoted by $\mathrm{EA}^n$, and let the $k$th ($k \in \mathbb{N}$) generation it produces be modeled as a random element $P_k^n = (x_{k,i}^n)_{i=1}^n \in M^n$, where $x_{k,i}^n \in M$ is a random element representing the $i$th individual in $P_k^n$. Without loss of generality, assume that the EA has two operators, $G$ and $H$. In each iteration, the EA first employs $G$ on the current population to generate an intermediate population, on which it then employs $H$ to generate the next population. Notice that here $G$ and $H$ are just terms representing the operators in the real EA; they facilitate describing the evolutionary process. For $\mathrm{EA}^n$, $G$ and $H$ are actually instantiated as functions from $M^n$ to $M^n$, denoted by $G^n$ and $H^n$, respectively. For example, if $G$ represents proportionate selection, the function $G^n : M^n \to M^n$ is the actual operator in $\mathrm{EA}^n$ generating $n$ c.i.i.d. individuals according to the conditional probability (3). Of course, for the above abstraction to be valid, the operators used in $\mathrm{EA}^n$ should actually produce random elements in $M^n$; that is, the newly generated population should be measurable on $(S^n, \mathcal{S}^n)$. As most operators in real EAs satisfy this condition, and as this is the assumption implicitly taken in previous studies, we assume that it is automatically satisfied.

Given these notations, the evolutionary process of $\mathrm{EA}^n$ can be described by the sequence $(P_k^n)_{k=0}^\infty$, where the initial population $P_0^n$ is known and the generation of $P_k^n$, $k \in \mathbb{N}_+$, follows the recurrence equation
$$ P_{k+1}^n = H^n\big(G^n(P_k^n)\big). \tag{24} $$
Then understanding the population dynamics of the EA can be achieved by studying the distributions and properties of $P_k^n$.
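As an illustration of the recurrence (24), the sketch below instantiates $\mathrm{EA}^n$ with proportionate selection as $G^n$ and Gaussian mutation as $H^n$; the fitness function, mutation scale, and initial distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def G_n(P, g):
    # Proportionate selection: n c.i.i.d. draws from the current population.
    w = g(P)
    idx = rng.choice(len(P), size=len(P), p=w / w.sum())
    return P[idx]

def H_n(P, sigma=0.1):
    # Mutation: perturb each individual independently.
    return P + rng.normal(0.0, sigma, size=P.shape)

g = lambda x: 1.0 + np.exp(-(x - 1.0) ** 2)      # bounded fitness, g_min > 0
P = rng.normal(0.0, 2.0, size=1000)              # i.i.d. initial population P_0^n
for k in range(30):
    P = H_n(G_n(P, g))                           # P_{k+1}^n = H^n(G^n(P_k^n))
print("population mean after 30 generations:", P.mean())
```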
Let the IPM of the EA be denoted by $\mathrm{EA}^\infty$. The population dynamics it produces can be described by the sequence $(P_k^\infty \in M^\infty)_{k=0}^\infty$, where $P_0^\infty$ is known and the generation of $P_k^\infty$, $k \in \mathbb{N}_+$, follows the recurrence equation
$$ P_{k+1}^\infty = H^\infty\big(G^\infty(P_k^\infty)\big), \tag{25} $$
in which $G^\infty, H^\infty : M^\infty \to M^\infty$ are operators in $\mathrm{EA}^\infty$ modeled after $G$ and $H$. Then, the convergence of $\mathrm{EA}^\infty$ basically requires that $(P_k^n)_{n=1}^\infty$ converge to $P_k^\infty$ for every generation $k$.

3.2.3  The Proposed Framework

As stated before, for each generation $k \in \mathbb{N}$, the elements of the sequence $(P_k^1, P_k^2, \dots)$ and the limit $P_k^\infty$ are random elements of different metric spaces. Therefore, the core of developing our model is to expand $P_k^n$ to random sequences, while ensuring that this expansion does not affect modeling the evolutionary process of the real EA. The result of this step is a sequence of random sequences $(Q_k^n \in M^\infty)_{k=0}^\infty$ for each $n \in \mathbb{N}_+$, which completely describes the population dynamics of $\mathrm{EA}^n$. For the population dynamics of $\mathrm{EA}^\infty$, we simply let $Q_k^\infty = P_k^\infty$.

The expansion of $P_k^n$ and the relationships between $P_k^n$, $Q_k^n$, and $Q_k^\infty$ are the core of our framework. In the following, we present them rigorously.

3.2.4  The Expansion of $P_k^n$

We start by decomposing each of $G^n$ and $H^n$ into two operators. One operator is from $S^\infty$ to $S^n$; it corresponds to converting random sequences to random vectors. A natural choice is the projection operator $\pi_n$.

To model the evolutionary process, we also have to define how to expand random vectors to random sequences. In other words, we have to define the expansions of $G^n$ and $H^n$, which are functions from $M^n$ to $M^\infty$.

Definition 2

(The expansion of an operator): For an operator $T^n : M^n \to M^n$ satisfying the condition that for any $x \in M^n$, the elements of $T^n(x)$ are c.i.i.d. given $x$, the expansion of $T^n$ is the operator $\tilde{T}^n : M^n \to M^\infty$ satisfying, for any $x \in M^n$:

  1. $T^n(x) = (\pi_n \circ \tilde{T}^n)(x)$.

  2. The elements of $\tilde{T}^n(x)$ are c.i.i.d. given $x$.

In Definition 2, the operator $\tilde{T}^n$ is the expansion of $T^n$. Condition 1 ensures that $T^n$ can be safely replaced by $\pi_n \circ \tilde{T}^n$. Condition 2 ensures that the paddings of the sequence are generated according to the same conditional probability distribution as that used by $T^n$ to generate new individuals. In other words, if the operator $\dot{T}^n : M^n \to M$ describes how $T^n$ generates each new individual from the current population, then $T^n$ is equivalent to invoking $\dot{T}^n$ independently on the current population $n$ times, and $\tilde{T}^n$ is equivalent to invoking $\dot{T}^n$ independently infinitely many times. Finally, because $T^n$ satisfies the condition in the premise, the expansion $\tilde{T}^n$ always exists.
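In code, the relationship between $T^n$, its expansion $\tilde{T}^n$, and the single-individual mechanism $\dot{T}^n$ can be sketched with a lazy generator; proportionate selection is used here as an illustrative $\dot{T}^n$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

def draw_one(pop, g):
    # T-dot: generate one new individual from the current population
    # (here: one proportionate-selection draw).
    w = g(pop) / g(pop).sum()
    return pop[rng.choice(len(pop), p=w)]

def T_n(pop, g):
    # T^n: n independent invocations of T-dot (c.i.i.d. given pop).
    return np.array([draw_one(pop, g) for _ in range(len(pop))])

def T_tilde_n(pop, g):
    # Expansion: invoke T-dot indefinitely; pi_n of this sequence has the
    # same law as T^n(pop).
    while True:
        yield draw_one(pop, g)

g = lambda x: 1.0 + np.exp(-x**2)
pop = rng.normal(size=5)
print(T_n(pop, g))
print(list(itertools.islice(T_tilde_n(pop, g), 8)))  # first 8 of the expansion
```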

By Definition 2, the operators in $\mathrm{EA}^n$ can be decomposed as $G^n = \pi_n \circ \tilde{G}^n$ and $H^n = \pi_n \circ \tilde{H}^n$, respectively. Then, the evolutionary process of $\mathrm{EA}^n$ can be described by the sequence of random sequences $\big(Q_k^n = (y_{k,i}^n)_{i=1}^\infty \in M^\infty\big)_{k=0}^\infty$, satisfying the recurrence equation
$$ Q_{k+1}^n = \tilde{H}^n\big(G^n(P_k^n)\big), \tag{26} $$
where $P_k^n$ follows the recurrence equation (24), and $Q_0^n = (P_0^n, 0, 0, \dots)$. It can also be proved that
$$ P_k^n = \pi_n(Q_k^n), \qquad k \in \mathbb{N}_+. \tag{27} $$

Essentially, (26) and (27) describe how the algorithm progresses in the order $\dots, Q_k^n, P_k^n, Q_{k+1}^n, P_{k+1}^n, \dots$. They fully characterize the population dynamics $(P_k^n)_k$, and it is clear that the extra step of generating $Q_k^n$ does not introduce modeling errors.

For $\mathrm{EA}^\infty$, because $P_k^\infty \in M^\infty$, there is no need for expansion. For convenience we simply let
$$ Q_k^\infty = P_k^\infty \tag{28} $$
for $k \in \mathbb{N}$.

In summary, the relationships between $P_k^n$, $Q_k^n$, and $Q_k^\infty$ are illustrated in Figure 2. This is the core of our framework for modeling the EA and IPMs. For clarity, we also show the intermediate populations generated by $G$ (denoted by $P_k'^n$), their expansions (denoted by $Q_k'^n$), and their counterparts generated by $G^\infty$ (denoted by $Q_k'^\infty$), respectively. How they fit into the evolutionary process can be clearly seen in the figure.

Figure 2: Relationships between $P_k^n$, $Q_k^n$, and $Q_k^\infty$.

In Figure 2, a solid arrow with an operator on it means that the item at the arrow's head equals the result of applying the operator to the item at the arrow's tail. For example, from the figure it can be read that $Q_1^n = \tilde{H}^n(P_0'^n)$. A dashed arrow with a question mark on it signals a place where convergence in distribution is to be checked. For example, when $k = 2$, it should be checked whether $(Q_2^n)_{n=1}^\infty$ converges to $Q_2^\infty$ as $n \to \infty$.

Finally, one distinction needs special notice. For $\mathrm{EA}^m$ and $\mathrm{EA}^n$ ($m \ne n$), consider the operators that generate $P_k^m$ and $P_k^n$. It is clear that $G^m : M^m \to M^m$ and $G^n : M^n \to M^n$ are two different operators because their domains and ranges differ. The distinction still exists when we consider $Q_k^n$, though it is more subtle and easy to overlook. In Figure 2, if we consider the operator $\hat{G}^n = \tilde{G}^n \circ \pi_n : M^\infty \to M^\infty$, it is clear that $\hat{G}^n$ uses the same mechanism to generate new individuals as that used in $G^n = \pi_n \circ \tilde{G}^n$, and $Q_k'^n = \hat{G}^n(Q_k^n)$ describes the same population dynamics as those generated by $P_k'^n = G^n(P_k^n)$. However, if we choose $m \ne n$, $\hat{G}^m$ and $\hat{G}^n$ are both functions from $M^\infty$ to $M^\infty$; checking domains and ranges is not enough to discern them. It is important to realize that the distinction between $\hat{G}^m$ and $\hat{G}^n$ lies in the contents of the functions: $\hat{G}^m$ and $\hat{G}^n$ use $m$ and $n$ individuals of the current population, respectively, to generate the new population, although the new population contains an infinite number of individuals. In short, $\mathrm{EA}^m$ and $\mathrm{EA}^n$ are the EA instantiated with different population sizes. Mathematically, the corresponding population dynamics are modeled by stochastic processes involving different operators, even though their domains and ranges may be the same. The same conclusion holds for the operator $H$.

3.3  Convergence of IPMs

Given the framework modeling the EA and IPMs, we first define convergence in distribution for random elements of $S^\infty$; this is standard material. Then, the convergence of IPMs is defined by requiring that the sequence $(Q_k^1, Q_k^2, \dots)$ converge to $Q_k^\infty$ for every $k \in \mathbb{N}$.

3.3.1  Convergence in Distribution

As the $Q_k^n$ are random elements of $S^\infty$, in the following we define convergence in distribution for sequences of $S^\infty$-valued random elements. Convergence in distribution is equivalent to weak convergence of the induced probability measures of the random elements. We use the former formulation because, when modeling individuals and populations as random elements, it is more intuitive and straightforward. The following materials are standard. They contain the definition of convergence in distribution for random elements, as well as some useful definitions and theorems used in our analysis of the simple EA. Most of the materials are collected from the theorems and examples in Sections 1–3 of Billingsley (1999). The definition of the Prokhorov metric is collected from Section 11.3 in Dudley (2002).

Let $x, y, x_n$, $n \in \mathbb{N}_+$, be random elements defined on a common underlying probability space $(\Omega, \mathcal{F}, P)$, taking values in some separable metric space $T$. $T$ is coupled with its Borel σ-field $\mathcal{T}$. Let $(T', \mathcal{T}')$ be a separable measurable space other than $(T, \mathcal{T})$.

Definition 3

(Convergence in distribution): If the sequence $(x_n)_{n=1}^\infty$ satisfies the condition that $E h(x_n) \to E h(x)$ for every bounded, continuous function $h : T \to \mathbb{R}$, we say $(x_n)_{n=1}^\infty$ converges in distribution to $x$, and write $x_n \xrightarrow{d} x$.

For $\varepsilon > 0$, let $A^\varepsilon = \{y \in T : d(x, y) < \varepsilon \text{ for some } x \in A\}$. It is well known that convergence in distribution on separable metric spaces can be metrized by the Prokhorov metric.

Definition 4
(Prokhorov metric): For two random elements $x$ and $y$ of $T$, the Prokhorov metric is defined as
$$ \rho_d(x, y) = \inf\big\{ \varepsilon > 0 : P(x \in A) \le P(y \in A^\varepsilon) + \varepsilon \text{ for all } A \in \mathcal{T} \big\}. $$
Call a set $A$ in $\mathcal{T}$ an $x$-continuity set if $P(x \in \partial A) = 0$, where $\partial A$ is the boundary of $A$.

Theorem 2

(The Portmanteau theorem): The following statements are equivalent.

  1. $x_n \xrightarrow{d} x$.

  2. $\limsup_n P(x_n \in F) \le P(x \in F)$ for all closed sets $F \subseteq T$.

  3. $\liminf_n P(x_n \in G) \ge P(x \in G)$ for all open sets $G \subseteq T$.

  4. $P(x_n \in A) \to P(x \in A)$ for all $x$-continuity sets $A \in \mathcal{T}$.

Theorem 3

(The mapping theorem): Suppose $h : (T, \mathcal{T}) \to (T', \mathcal{T}')$ is a measurable function. Denote by $D_h$ the set of discontinuities of $h$. If $x_n \xrightarrow{d} x$ and $P(x \in D_h) = 0$, then $h(x_n) \xrightarrow{d} h(x)$.

Let $a, a_n$ be random elements of $T$ and $b, b_n$ be random elements of $T'$; then $(a, b)$ and $(a_n, b_n)$ are random elements of $T \times T'$. Note that $T \times T'$ is separable.

Theorem 4

(Convergence in distribution on product spaces): If $a$ is independent of $b$ and $a_n$ is independent of $b_n$ for all $n \in \mathbb{N}_+$, then $(a_n, b_n) \xrightarrow{d} (a, b)$ if and only if $a_n \xrightarrow{d} a$ and $b_n \xrightarrow{d} b$.

Theorem 4 is adapted from Theorem 2.8 (ii) in Billingsley (1999).

Let $z, z_n$, $n \in \mathbb{N}_+$, be random elements of $S^\infty$.

Theorem 5

(Finite-dimensional convergence): $z_n \xrightarrow{d} z$ if and only if $\pi_m(z_n) \xrightarrow{d} \pi_m(z)$ for every $m \in \mathbb{N}_+$.

Theorem 5 basically asserts that convergence in distribution for countably infinite-dimensional random elements can be studied through their finite-dimensional projections. It is adapted from Examples 1.2 and 2.4 in Billingsley (1999). In Billingsley (1999), the metric space under consideration is $\mathbb{R}^\infty$. However, as both $\mathbb{R}^\infty$ and $S^\infty$ are separable, it is not difficult to adapt the proofs for $\mathbb{R}^\infty$ into a proof of Theorem 5. Note that $\pi_m(z)$ is a random element defined on $(\Omega, \mathcal{F}, P)$ taking values in $(S^m, \mathcal{S}^m)$, and $P[\pi_m(z) \in A] = P(z \in A \times S \times S \times \cdots)$ for every $A \in \mathcal{S}^m$. The same is true for $\pi_m(z_n)$.

3.3.2  Convergence of IPM

As convergence in distribution is properly defined, we can use the theory to define the convergence of IPMs. The idea is that an IPM is convergent (and thus justifiable) if and only if it can predict the limiting distribution of the population dynamics of $\mathrm{EA}^n$ for every generation $k \in \mathbb{N}$ as the population size $n$ goes to infinity; that is, it captures the limiting behaviors of real EAs.

Definition 5

(Convergence of IPMs): An infinite population model $\mathrm{EA}^\infty$ is convergent if and only if for every $k \in \mathbb{N}$, $Q_k^n \xrightarrow{d} Q_k^\infty$ as $n \to \infty$, where $Q_k^n$, $Q_k^\infty$ and the underlying $P_k^n$, $P_k^\infty$ are generated according to (26), (28), (24), and (25).

Definition 5 is essentially the core of our proposed framework. It defines the convergence of IPM and is rigorous and clear.

3.4  Summary

In this section, we built a framework to analyze the convergence of IPMs. The most significant feature of the framework is that we model the populations as random sequences, thereby unifying the ranges of the random elements in a common metric space. Then, we gave a rigorous definition for the convergence of IPMs based on the theory of convergence in distribution.

Our framework is general. It only requires that operators produce c.i.i.d. individuals. In fact, any EA and IPM satisfying this assumption can be put into the framework. However, to obtain meaningful results, the convergence of IPMs has to be proved. This may require extra analyses on IPM and the inner mechanisms of the operators. These analyses are presented in Sections 4 and 5.

Finally, there is one question worth discussing. In our framework, the expansion of an operator is carried out by padding the finite population with c.i.i.d. individuals following the same marginal distribution. Then a question naturally arises: why not pad the finite population with some other random elements, or just with the constant 0? This idea deserves consideration. After all, if the expansion is conducted by padding 0s, the requirement of c.i.i.d. can be discarded, and the framework and the convergence of IPMs stay the same. However, we did not choose this approach. The reason is that padding the population with c.i.i.d. individuals facilitates the analysis of the IPM. For example, in our analysis in Sections 4 and 5, the sufficient conditions for the convergence of IPMs require us to consider $\Gamma^m(Q_k^n)$, where $\Gamma$ is the operator under analysis. $\Gamma^m$ uses the first $m$ elements of $Q_k^n$ to generate new individuals. Now if $m > n$ and $Q_k^n$ were expanded from $P_k^n$ by padding 0s, $\Gamma^m(Q_k^n)$ would not make any sense, because the $m$ individuals used by $\Gamma^m$ would include $(m - n)$ zeros. This would restrict our options in proving the convergence of IPMs.

In this section, we give applicable sufficient conditions for the convergence of IPMs; they are presented in Section 4.1. To appreciate their necessity, consider the framework in Figure 2. To prove the convergence of an IPM, by Definition 5, we should check whether $Q_k^n \xrightarrow{d} Q_k^\infty$ as $n \to \infty$ for every $k \in \mathbb{N}$. However, this direct approach is usually not viable. Manually checking the convergence for all values of $k$ is wearisome and sometimes difficult: as $k$ increases, the distributions of $Q_k^n$ and $Q_k^\infty$ change, so the method needed to prove $Q_k^n \xrightarrow{d} Q_k^\infty$ as $n \to \infty$ may differ from the method needed to prove $Q_{k+1}^n \xrightarrow{d} Q_{k+1}^\infty$ as $n \to \infty$. Of course, after proving the cases for several values of $k$, it may be possible to discover patterns in the proofs that extend to other values of $k$, thus proving the convergence of the IPM, but this process is still tedious and uncertain.

In view of this, a "smarter" way to prove the convergence of an IPM is the following method. First, the convergence of the IPM for one iteration step of each operator is proved. Then, the results are combined and extended to cover the whole population dynamics. The idea is that if convergence holds for one generation number $k$, it is passed on automatically to all subsequent generations. For example, in Figure 2, consider the operators $G^\infty$ and $\tilde{G}^n \circ \pi_n$. The first step is to prove that if $Q_k^n \xrightarrow{d} Q_k^\infty$, then
$$ (\tilde{G}^n \circ \pi_n)(Q_k^n) \xrightarrow{d} G^\infty(Q_k^\infty) \quad \text{as } n \to \infty. \tag{29} $$
In other words, $G^\infty$ can model $\tilde{G}^n \circ \pi_n$ for one iteration step. Then, after obtaining similar results for $H^\infty$ and $\tilde{H}^n \circ \pi_n$, we combine the results, and the convergence of the overall IPM is proved.

However, this approach still seems difficult because we have to prove that the pass-on relation (29) holds for every $k$. In essence, this corresponds to whether the operators in the IPM can be stacked together and iterated for any number of steps; this is the issue of the stacking of operators and iterating the algorithm. Therefore, in Section 4.1, we give sufficient conditions for this to hold. These conditions are important: if they hold, proving the convergence of the overall IPM can be broken down into proving the convergence of one iteration step of each operator in the IPM. This greatly reduces the difficulty of deriving the proof.

To model real EAs, the IPM has to be constructed reasonably. As shown in Section 2, exchangeability cannot yield the transition equation for the simple EA. This creates the research problem of finding a suitable modeling assumption from which to derive the IPM. Therefore, in Section 4.2, we discuss the issue and propose to use i.i.d. as the modeling assumption in the IPM.

4.1  Sufficient Conditions for Convergence of IPMs

To derive sufficient conditions for the convergence of the overall IPM, the core step is to derive conditions under which the operators in the IPM can be stacked and iterated.

As before, let $\mathrm{EA}^n$ and $\mathrm{EA}^\infty$ denote the EA with population size $n$ and the IPM under analysis, respectively. Let $\Gamma$ be an operator in the EA, and let $\Gamma^n : M^\infty \to M^\infty$ and $\Gamma^\infty : M^\infty \to M^\infty$ be its corresponding expanded operators in $\mathrm{EA}^n$ and $\mathrm{EA}^\infty$, respectively. Note that $\Gamma^n$ and $\Gamma^\infty$ generate random elements of $S^\infty$. To give an example, $\Gamma^n$ and $\Gamma^\infty$ may correspond to $\tilde{G}^n \circ \pi_n$ and $G^\infty$ in Figure 2, respectively.

We define a property under which $\Gamma$ can be stacked with some other operator $\Psi$ satisfying the same property without affecting the convergence of the overall IPM. In other words, for an EA using $\Psi$ and $\Gamma$ as its operators, we can prove the convergence of the IPM by studying $\Psi$ and $\Gamma$ separately. We call this property "the stacking property." It is worth noting that if $\Psi = \Gamma$, then this property guarantees that $\Gamma$ can be iterated any number of times. Therefore, it also resolves the issue of iterating the algorithm.

Let $A_\alpha$ be random elements of $M^\infty$ for $\alpha \in \mathbb{N}_+ \cup \{\infty\}$. We have the following results.

Definition 6

(The stacking property): Given $U \subseteq M^\infty$, if for any converging sequence $A_n \xrightarrow{d} A_\infty \in U$, $\Gamma^n(A_n) \xrightarrow{d} \Gamma^\infty(A_\infty) \in U$ as $n \to \infty$ always holds, then we say that $\Gamma$ has the stacking property on $U$.

Theorem 6:

If $\Psi$ and $\Gamma$ have the stacking property on $U$, then $\Psi \circ \Gamma$ has the stacking property on $U$.

Proof:

For any converging sequence $A_n \xrightarrow{d} A_\infty \in U \subseteq M^\infty$, because $\Gamma$ has the stacking property on $U$, we have $\Gamma^n(A_n) \xrightarrow{d} \Gamma^\infty(A_\infty) \in U$. Then $(\Gamma^n(A_n))_n$ is also a converging sequence. Since $\Psi$ has the stacking property on $U$, by definition we immediately have $(\Psi^n \circ \Gamma^n)(A_n) \xrightarrow{d} (\Psi^\infty \circ \Gamma^\infty)(A_\infty) \in U$.

By Theorem 6, any composition of $\Psi$ and $\Gamma$ has the stacking property on $U$. In particular, $(\Gamma)^m$, the $m$-fold composition of $\Gamma$ with itself, has the stacking property on $U$. The stacking property essentially guarantees that the convergence on $U$ can be passed on to subsequent generations.

Theorem 7

(Sufficient condition 1): For an EA consisting of a single operator $\Gamma$, let $\Gamma$ be modeled by $\Gamma^\infty$ in the IPM $\mathrm{EA}^\infty$, and let $\Gamma$ have the stacking property on some space $U \subseteq M^\infty$. If the initial populations of both $\mathrm{EA}^n$ and $\mathrm{EA}^\infty$ follow the same distribution $\mathcal{L}(X)$ for some $X \in U$, then $\mathrm{EA}^\infty$ converges.

Proof:

Note that for $\mathrm{EA}^n$ and $\mathrm{EA}^\infty$, the $k$th populations they generate are $(\Gamma^n)^k(X)$ and $(\Gamma^\infty)^k(X)$, respectively. By Theorem 6, $(\Gamma)^k$ has the stacking property on $U$. Because the constant sequence $(X, X, \dots)$ converges in distribution to $X \in U$, by Definition 6, $(\Gamma^n)^k(X) \xrightarrow{d} (\Gamma^\infty)^k(X) \in U$ as $n \to \infty$. Since this holds for every $k \in \mathbb{N}$, by Definition 5, $\mathrm{EA}^\infty$ converges.

By Theorems 6 and 7, we can prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property. Comparing with (29), it is clear that the stacking property is a sufficient condition. This is because the stacking property requires that $(\Gamma^n(A_n))_n$ converge to a point in $U$ for any converging sequence $(A_n)_n$ satisfying $A_n \xrightarrow{d} A_\infty \in U$, while (29) requires the convergence to hold only for the specific converging sequence $(Q_k^n)_n$. Since $(Q_k^n)_n$ is generated by the algorithm, it may have special characteristics regarding convergence rate, distributions, and so on. On the other hand, checking the stacking property may be easier than proving (29), because the stacking property is independent of the generation number $k$.

Another point worth discussing is the introduction of $U$ in Definition 6. Of course, if we omit $U$ (or equivalently let $U = M^\infty$), the stacking property becomes "stronger," because if it holds, the convergence of the IPM is proved for the EA starting from any initial population. However, in that case the condition is so restrictive that the stacking property cannot be proved for many operators.

In Definition 6, it is required that $\Gamma^n(A_n) \xrightarrow{d} \Gamma^\infty(A_\infty) \in U$ as $n \to \infty$. The sequence under investigation is $(\Gamma^n(A_n))_n$, which is a sequence of changing operators $(\Gamma^n)_n$ applied to a sequence of changing inputs $(A_n)_n$. As both the operators and the inputs change, the convergence of $(\Gamma^n(A_n))_n$ may still be difficult to prove. Therefore, in the following, we derive two further sufficient conditions for the stacking property.

First, let $B_{\alpha,\beta} = \Gamma^\beta(A_\alpha)$, where $\alpha, \beta \in \mathbb{N}_+ \cup \{\infty\}$. Then, we have the following sufficient conditions for the stacking property.

Theorem 8

(Sufficient condition 2): For a space $U$ and all converging sequences $A_n \xrightarrow{d} A_\infty \in U$, if the following two conditions

  1. $\exists M \in \mathbb{N}_+$ such that, for all $m > M$, $B_{n,m} \xrightarrow{d} B_{\infty,m}$ uniformly as $n \to \infty$; that is, $\sup_{m > M} \rho_d(B_{n,m}, B_{\infty,m}) \to 0$ as $n \to \infty$;

  2. $B_{\infty,m} \xrightarrow{d} B_{\infty,\infty} \in U$ as $m \to \infty$;

are both met, then $\Gamma$ has the stacking property on $U$.

Theorem 9

(Sufficient condition 3): For a space $U$ and all converging sequences $A_n \xrightarrow{d} A_\infty \in U$, if the following two conditions

  1. $\exists N \in \mathbb{N}_+$ such that, for all $n > N$, $B_{n,m} \xrightarrow{d} B_{n,\infty}$ uniformly as $m \to \infty$; that is, $\sup_{n > N} \rho_d(B_{n,m}, B_{n,\infty}) \to 0$ as $m \to \infty$;

  2. $B_{n,\infty} \xrightarrow{d} B_{\infty,\infty} \in U$ as $n \to \infty$;

are both met, then $\Gamma$ has the stacking property on $U$.

Since Theorems 8 and 9 are symmetric in $m$ and $n$, proving one of them leads to the other. In the following, we prove Theorem 8. Recall that $\rho_d$ is the Prokhorov metric (Definition 4) and that $\vee$ takes the maximum in the expression.

Proof:
$\forall \varepsilon > 0$: by condition 1 in Theorem 8, $\exists N$ s.t. $\sup_{m > M} \rho_d(B_{n,m}, B_{\infty,m}) < \frac{1}{2}\varepsilon$ for all $n > N$. By condition 2 in Theorem 8, $\exists \tilde{M}$ s.t. $\rho_d(B_{\infty,m}, B_{\infty,\infty}) < \frac{1}{2}\varepsilon$ for all $m > \tilde{M}$. Now for all $l > M \vee N \vee \tilde{M}$,
$$ \rho_d(B_{l,l}, B_{\infty,\infty}) \le \rho_d(B_{l,l}, B_{\infty,l}) + \rho_d(B_{\infty,l}, B_{\infty,\infty}) < \varepsilon. $$
Therefore, $B_{n,n} \xrightarrow{d} B_{\infty,\infty}$ as $n \to \infty$.

To understand these two theorems, consider the relationships between $A_\alpha$ and $B_{\alpha,\beta}$ illustrated in Figure 3. In the figure, the solid arrow represents the premise in Definition 6; that is, $A_n \xrightarrow{d} A_\infty \in U$ as $n \to \infty$. The double-line arrow represents the direction to be proved for the stacking property on $U$; that is, $B_{n,n} \xrightarrow{d} B_{\infty,\infty} \in U$ as $n \to \infty$. The dashed arrows are the directions to be checked for Theorem 8 to hold. The wavy arrows are the directions to be checked for Theorem 9 to hold.

Figure 3: Relationships between $A_\alpha$ and $B_{\alpha,\beta}$.

Now it is clear that Theorems 8 and 9 bring benefits. For example, instead of proving the convergence of a sequence generated by changing operators on changing inputs ($B_{n,n} \xrightarrow{d} B_{\infty,\infty}$), Theorem 8 considers the convergence of sequences generated by the same operator on changing inputs ($B_{n,m} \xrightarrow{d} B_{\infty,m}$) and of the sequence generated by changing operators on the same input ($B_{\infty,m} \xrightarrow{d} B_{\infty,\infty}$).

The reason we introduce $M$ and $N$ in Theorems 8 and 9, respectively, is to exclude some of the starting columns and rows in Figure 3, if necessary. This is useful in proving the convergence of the IPM of the k-ary recombination operator.

4.2  The I.I.D. Assumption

In this section, we address the issue of how to construct the IPM. This issue also corresponds to how to choose the space $U$ for the stacking property.

Before introducing the i.i.d. assumption, let us give an example. Consider the space $U = \{x \in M \mid P[x = (c, c, \dots)] = 1 \text{ for some } c \in S\}$. If the initial population follows some distribution from $U$, then the population consists of identical individuals. If an EA with proportionate selection and crossover operates on this initial population, then all subsequent populations stay the same as the initial population. An IPM of this EA can easily be constructed, and it can easily be proved that the stacking property holds as long as the EA chooses its initial population from $U$. However, this is not a very interesting case, because $U$ is too small to model real EAs.

On the other hand, if $U = \{x \in M \mid x \text{ is exchangeable}\}$, then $U$ may be too big to derive meaningful results. This can be seen from our analysis in Section 2, which shows that under exchangeability it is not possible to derive transition equations of marginal distributions for the simple EA.

Therefore, choosing $U$ requires striking a balance between the capacity and the complexity of the IPM. In the following analysis, we choose $U$ to be $U_I = \{x \in M \mid x \text{ is i.i.d.}\}$. IPMs of EAs are constructed under the i.i.d. assumption, and we prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property on $U_I$.

We choose $U_I$ for the following reasons. First, in the real world, many EAs generate i.i.d. initial populations, so the assumption is realistic. Second, i.i.d. random elements have identical marginal distributions, so the IPM can be described by transition equations of marginal distributions. Finally, there is an abundant literature on convergence laws and limit theorems for i.i.d. sequences, so the difficulty of constructing the IPM is greatly reduced compared with other modeling assumptions.

In the following, we show how to construct the IPM under the i.i.d. assumption. This process also relates to condition 2 in Theorem 8. It essentially describes how the IPM generates new populations.

Let the operator in the EA be $\Gamma$, and the corresponding operator in EA$_m$ be $\Gamma_m: M \to M$. Recall that in our framework we only study EAs consisting of c.i.i.d. operators; therefore $\Gamma_m$ generates c.i.i.d. outputs by using the first $m$ elements of its input. The process by which $\Gamma_m$ generates each output can be described by the conditional p.d.f. $f_{\Gamma_m}(x \mid y_1, y_2, \dots, y_m)$. Let $a = (a_i)_{i=1}^{\infty} \in M$ be the input and $b = (b_i)_{i=1}^{\infty} = \Gamma_m(a)$ be the output; then the distribution of $b$ can be completely described by its finite-dimensional p.d.f.s
$$f_{\pi_l(b)}(x_1, \dots, x_l) = \int_{S^m} \prod_{i=1}^{l} f_{\Gamma_m}(x_i \mid y_1, \dots, y_m) \cdot f_{\pi_m(a)}(y_1, \dots, y_m) \, \mathrm{d}y_1 \cdots \mathrm{d}y_m \tag{30}$$
for every $l \in \mathbb{N}^+$.
To derive the IPM $\Gamma_\infty$ for $\Gamma$, consider the case when $l = 1$ and $a \in U_I$ in (30). Noting that in this case $f_{\pi_m(a)}(y_1, \dots, y_m) = \prod_{i=1}^{m} f_{a_1}(y_i)$, we have
$$f_{b_1}(x) = \int_{S^m} f_{\Gamma_m}(x \mid y_1, \dots, y_m) \prod_{i=1}^{m} f_{a_1}(y_i) \, \mathrm{d}y_1 \cdots \mathrm{d}y_m. \tag{31}$$
Now taking $m \to \infty$, (31) in the limit becomes the transition equation describing how $\Gamma_\infty$ generates each new individual. Let the transition equation be
$$f_{c_1}(x) = \lim_{m \to \infty} \int_{S^m} f_{\Gamma_m}(x \mid y_1, \dots, y_m) \prod_{i=1}^{m} f_{a_1}(y_i) \, \mathrm{d}y_1 \cdots \mathrm{d}y_m, \tag{32}$$
and let $c = (c_i)_{i=1}^{\infty} = \Gamma_\infty(a)$. Then how $\Gamma_\infty$ generates $l$ individuals can be described by the finite-dimensional p.d.f.s of $c$:
$$f_{\pi_l(c)}(x_1, \dots, x_l) = \prod_{i=1}^{l} f_{c_1}(x_i) \tag{33}$$
for every $l \in \mathbb{N}^+$. Overall, (33) describes the mapping $\Gamma_\infty: U_I \to U_I$.

To better understand the construction, it is important to realize that for $\Gamma_\infty$ both the input and the output are i.i.d. In other words, $\Gamma_\infty$ generates i.i.d. population dynamics to simulate the real population dynamics produced by $\Gamma$, except that the transition equation in $\Gamma_\infty$ is derived by mimicking how $\Gamma_m$ generates each new individual on i.i.d. inputs and taking the population size to infinity. In fact, if the stacking property on $U_I$ is proved and the initial population is i.i.d., $\Gamma_\infty$ will always take i.i.d. inputs and produce i.i.d. outputs. The behaviors of $\Gamma_\infty$ on $U_I$ are thus well defined. On the other hand, $\Gamma_\infty(A)$ for $A \notin U_I$ is not defined in the construction. This leaves us freedom: we can define $\Gamma_\infty(A)$ for $A \notin U_I$ in whatever way facilitates proving the stacking property of $\Gamma$. In particular, $B_{n,\infty}$ for $n \in \mathbb{N}^+$ in Figure 3 can be defined freely to facilitate the analysis.

In fact, under the i.i.d. assumption, deriving the transition equation for most operators is the easy part. The more difficult part is to prove the stacking property of $\Gamma$ on $U_I$. To give an example, consider the transition equation (5) constructed in Qi and Palmieri (1994a), which models the joint effects of proportionate selection and mutation. As our analysis in Section 2 shows, it does not hold under the assumption of exchangeability. However, if the modeling assumption is i.i.d., the transition equation can be proved immediately (see our analysis in Section 2). This also applies to the transition equation built by the same authors for the uniform crossover operator (in Theorem 1 of Qi and Palmieri, 1994b), where the transition equation is in fact constructed under the i.i.d. assumption. Therefore, in the following analyses, we do not refer to the explicit form of the transition equation unless it is needed. We only assume that the transition equation has been successfully constructed, and that it has the form (32) derived from (31) as $m \to \infty$.
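As a concrete illustration of the construction (ours, not part of the original argument), consider a hypothetical 2-ary recombination operator that outputs the mean of two parents drawn uniformly with replacement. For an i.i.d. $N(0,1)$ input, (31) evaluates to a mixture: with probability $1/m$ the same parent is drawn twice and the offspring equals that parent; otherwise the offspring is the mean of two independent parents. Taking $m \to \infty$, the transition equation (32) yields $f_{c_1} = N(0, 1/2)$. The sketch below checks this numerically.

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)

def finite_m_offspring(m, n_samples):
    """Exact simulation of the finite-m marginal (31) for mean recombination
    on an i.i.d. N(0,1) population: with prob. 1/m the two parent draws
    coincide (offspring = parent), otherwise the parents are independent."""
    same = rng.random(n_samples) < 1.0 / m
    z1 = rng.normal(size=n_samples)
    z2 = rng.normal(size=n_samples)
    return np.where(same, z1, 0.5 * (z1 + z2))

# The m -> inf transition equation predicts f_{c1} = N(0, 1/2).
for m in [2, 10, 100, 1000]:
    c = finite_m_offspring(m, 50_000)
    stat, _ = kstest(c, cdf=norm(scale=np.sqrt(0.5)).cdf)
    print(f"m={m:5d}  KS distance to N(0, 1/2): {stat:.4f}")
```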

The construction of the IPM also relates partly to condition 2 in Theorem 8. Comparing with this condition, it can be seen that for a successfully constructed $\Gamma_\infty$, the following two facts are proved in the construction (m.p.w. stands for point-wise convergence of marginal p.d.f.s).

  1. $B_{\infty,m} \xrightarrow{\text{m.p.w.}} B_{\infty,\infty}$ as $m \to \infty$.

  2. $B_{\infty,\infty} \in U_I$.

Of course, these two facts are not sufficient for the condition to hold. One still needs to prove $B_{\infty,m} \xrightarrow{d} B_{\infty,\infty}$ as $m \to \infty$; in other words, one has to consider convergence of finite-dimensional distributions.

Finally, we sometimes write $x$ for $(x_1, \dots, x_l)$ if $l$ is clear from the context. For example, (30) can be rewritten as $f_{\pi_l(b)}(x) = \int_{S^m} \prod_{i=1}^{l} f_{\Gamma_m}(x_i \mid y) \cdot f_{\pi_m(a)}(y) \, \mathrm{d}y$, where $y$ stands for $(y_1, \dots, y_m)$ and $m$ is the size of the population on which the operator operates.

5  Analysis of the Simple EA

In this section, we use the sufficient conditions to prove the convergence of IPMs for various simple EA operators. The mutation operator and the k-ary recombination operator are analyzed in Sections 5.1 and 5.2, respectively. In Section 5.3, we summarize this section and discuss our results.

5.1  Analysis of the Mutation Operator

Having derived sufficient conditions for the stacking property and constructed the IPM, we first prove the convergence of the IPM of the mutation operator. Mutation adds an i.i.d. random vector to each individual in the population. If the current population is $A \in M$, then the population after mutation, $B = \Gamma_m(A)$, satisfies $\mathcal{L}(B) = \mathcal{L}(A + X)$ for all $m \in \mathbb{N}^+$, where $X \in U_I$ is a random element decided by the mutation operator. As the mutation operator does not depend on $m$, we simply write $\Gamma$ for $\Gamma_m$. For example, $X$ may be the sequence $(x_1, x_2, \dots)$ with all $x_i$ mutually independent and $x_i \sim N(0, I_d)$ for all $i \in \mathbb{N}^+$, where $N(a, B)$ is the multivariate normal distribution with mean $a$ and covariance matrix $B$, and $I_d$ is the $d$-dimensional identity matrix. Note that every time $\Gamma$ is invoked, it generates perturbations independently. For example, let $A_1$ and $A_2$ be two populations; then we can write $\Gamma(A_i) = A_i + X_i$ for $i = 1, 2$, where $\mathcal{L}(X_1) = \mathcal{L}(X_2) = \mathcal{L}(X)$ and $\{X_i\}_{i=1,2}$ are mutually independent and independent from $\{A_i\}_{i=1,2}$.

Next, consider $\Gamma_\infty$. Recall that as an IPM, $\Gamma_\infty$ simulates real population dynamics by taking i.i.d. inputs and producing i.i.d. outputs. If the marginal p.d.f.s of $A$ and $X$ are $f_a$ and $f_x$, respectively, then $\Gamma_\infty(A)$ generates i.i.d. individuals whose p.d.f.s are $f_a * f_x$, where $*$ stands for convolution. Given the construction, we can prove the stacking property of $\Gamma$.
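The convolution prediction is easy to check numerically. The following sketch (ours, under the assumption of normal marginals) mutates a large i.i.d. population with standard normal noise and compares the empirical distribution of the result with $f_a * f_x$; here $f_a = N(1, 1)$ and $f_x = N(0, 1)$, so the predicted marginal is $N(1, 2)$.

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
m = 100_000                      # population size (finite stand-in for infinity)
A = rng.normal(1.0, 1.0, m)      # i.i.d. population with marginal f_a = N(1,1)
X = rng.normal(0.0, 1.0, m)      # i.i.d. mutation noise with marginal f_x = N(0,1)
B = A + X                        # mutated population

# IPM prediction: the marginal of the new population is f_a * f_x = N(1, 2).
stat, pvalue = kstest(B, cdf=norm(loc=1.0, scale=np.sqrt(2.0)).cdf)
print(f"KS statistic = {stat:.4f} (shrinks as m grows), p = {pvalue:.3f}")
```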

Theorem 10

(Mutation): Let $\Gamma$ be the mutation operator, and let $\Gamma_\infty$ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then $\Gamma$ has the stacking property on $U_I$.

Proof:

We use the notation and premises of Theorem 8; refer to Figure 3. In particular, the sequence $(A_n)$ and the limit $A$ are given, and $A_n \xrightarrow{d} A \in U_I$ as $n \to \infty$.

Apparently, for every $m \in \mathbb{N}^+$,
$$B_{\infty,m} = \Gamma(A) \stackrel{d}{=} \Gamma_\infty(A) = B_{\infty,\infty} \in U_I,$$
since both sides are i.i.d. with marginal p.d.f. $f_a * f_x$. Therefore, condition 2 in Theorem 8 is satisfied.

Noting that condition 1 in Theorem 8 is equivalent to $\Gamma(A_n) \xrightarrow{d} \Gamma(A)$ (since $B_{n,m} = \Gamma(A_n)$ does not depend on $m$), we prove this condition by proving that $\pi_i[\Gamma(A_n)] \xrightarrow{d} \pi_i[\Gamma(A)]$ for all $i \in \mathbb{N}^+$; condition 1 then holds by Theorem 5. As both conditions in Theorem 8 are then satisfied, the theorem is proved.

Now, we prove $\pi_i[\Gamma(A_n)] \xrightarrow{d} \pi_i[\Gamma(A)]$ for all $i \in \mathbb{N}^+$. First, note that $\Gamma(A_\alpha) = A_\alpha + X_\alpha$ for all $\alpha \in \mathbb{N}^+ \cup \{\infty\}$, where the $\{X_\alpha \in M\}$ are i.i.d. and independent from $\{A_\alpha \in M\}$. In addition, for every $\alpha$, $\mathcal{L}(X_\alpha) = \mathcal{L}(X)$.

Since $\mathcal{L}(X_\alpha) = \mathcal{L}(X)$, it is apparent that $X_n \xrightarrow{d} X$. Then by Theorem 5, we have $\pi_i(X_n) \xrightarrow{d} \pi_i(X)$ and $\pi_i(A_n) \xrightarrow{d} \pi_i(A)$.

Consider the product space $S^i \times S^i$. It is both separable and complete. Since $\pi_i(A_\alpha)$ and $\pi_i(X_\alpha)$ are independent, by Theorem 4 it follows that
$$\big(\pi_i(A_n), \pi_i(X_n)\big) \xrightarrow{d} \big(\pi_i(A), \pi_i(X)\big). \tag{34}$$
Note that
$$\pi_i[\Gamma(A_\alpha)] = \pi_i(A_\alpha) + \pi_i(X_\alpha) = h\big(\pi_i(A_\alpha), \pi_i(X_\alpha)\big), \qquad h\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} I & I \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{35}$$
where $I$ is the identity matrix of appropriate dimension and $h: S^i \times S^i \to S^i$. Apparently $h$ is continuous. Then by (34), (35), and Theorem 3, $\pi_i[\Gamma(A_n)] \xrightarrow{d} \pi_i[\Gamma(A)]$ for any $i \in \mathbb{N}^+$.

In the proof, we concatenate the input $(A_n)$ and the randomness $(X_n)$ of the mutation operator in a common product space, and represent $\Gamma$ as a continuous function on that space. This technique is also used when analyzing other operators.

5.2  Analysis of k-ary Recombination

Consider the k-ary recombination operator and denote it by $\Gamma$; in EA$_m$, the operator is denoted by $\Gamma_m$. $\Gamma_m$ works as follows. To generate a new individual, it first samples $k$ individuals from the current $m$-sized population uniformly at random with replacement. Assume the current population consists of $\{x_i\}_{i=1}^{m}$ and the selected $k$ parents are $\{y_i\}_{i=1}^{k}$; then $\{y_i\}_{i=1}^{k}$ follows the probability
$$P(y_1 = x_{i_1}, \dots, y_k = x_{i_k}) = \frac{1}{m^k} \quad \text{for all } i_1, \dots, i_k \in \{1, \dots, m\}. \tag{36}$$
After the $k$ parents are selected, $\Gamma_m$ produces a new individual $x$ following the formula
$$x = \sum_{i=1}^{k} U_i\, y_i, \tag{37}$$
where $\{U_i\}_{i=1}^{k}$ are random elements of $\mathbb{R}^{d \times d}$ (recall that $x$ and the $y_i$ are random elements of $S = \mathbb{R}^d$ modeling individuals in our framework). $\{U_i\}_{i=1}^{k}$ are also independent of $\{y_i\}_i$, and the joint distribution of $(U_i)_i$ is decided by the inner mechanism of $\Gamma$. Overall, $\Gamma_m$ generates the next population by repeatedly using this procedure to generate new individuals independently.
Our formulation may seem strange at first sight, but it covers many real-world recombination operators. For example, consider $k = 2$ and $U_1 = U_2 = \frac{1}{2} I$. This is the crossover operator taking the mean of its two parents. On the other hand, if $k = 2$ and the distributions of $U_1$ and $U_2$ satisfy
$$U_1 = \mathrm{Diag}(s_1, \dots, s_d), \qquad U_2 = I - U_1 = \mathrm{Diag}(1 - s_1, \dots, 1 - s_d),$$
where $\mathrm{Diag}$ constructs a diagonal matrix from its inputs and the $\{s_i\}$ are i.i.d. random variables taking values in $\{0, 1\}$ with $P(s_i = 0) = P(s_i = 1) = 1/2$, then this is the uniform crossover operator, which sets the value at each position from one of the two parents with probability $\frac{1}{2}$.
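For concreteness, here is a minimal sketch (ours) of the finite-population operator $\Gamma_m$ described by (36) and (37), parameterized by a routine that draws the matrices $(U_i)$; the two closures instantiate the mean crossover and the uniform crossover examples above.

```python
import numpy as np

rng = np.random.default_rng(1)

def recombine(pop, k, draw_U, n_offspring):
    """Sketch of the finite-population k-ary recombination (36)-(37):
    each offspring picks k parents uniformly with replacement, then
    x = sum_i U_i @ y_i with the (U_i) drawn afresh per offspring."""
    m, d = pop.shape
    out = np.empty((n_offspring, d))
    for j in range(n_offspring):
        parents = pop[rng.integers(0, m, size=k)]         # (36): with replacement
        Us = draw_U(k, d)                                 # list of k (d x d) matrices
        out[j] = sum(U @ y for U, y in zip(Us, parents))  # (37)
    return out

# Mean crossover: k = 2, U1 = U2 = I/2.
mean_xover = lambda k, d: [0.5 * np.eye(d)] * k

# Uniform crossover: k = 2, U1 = Diag(s), U2 = I - Diag(s), s_i ~ Bernoulli(1/2).
def uniform_xover(k, d):
    s = rng.integers(0, 2, size=d).astype(float)
    return [np.diag(s), np.diag(1.0 - s)]

pop = rng.normal(size=(1000, 3))
print(recombine(pop, 2, mean_xover, 3))
print(recombine(pop, 2, uniform_xover, 3))
```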

Consider the IPM $\Gamma_\infty$. As stated in Section 4.2, we do not give the explicit form of the transition equation in $\Gamma_\infty$. We assume that the IPM has been successfully constructed, with the transition equation derived by taking $m \to \infty$ in (31). We take this approach not only because deriving the transition equation is generally easier than proving the convergence of the IPM, but also because the formulation in (36) and (37) encompasses many real-world k-ary recombination operators; we do not delve into the mechanism of each such operator and derive a transition equation for each one of them. Instead, our approach is general: as long as the IPM is successfully constructed, our analysis of its convergence applies.

The following theorem is the primary result of our analysis of the k-ary recombination operator.

Theorem 11

(k-ary recombination): Let $\Gamma$ be the k-ary recombination operator, and let $\Gamma_\infty$ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then $\Gamma$ has the stacking property on $U_I$.

Proof:

We use the notation and premises of Theorem 9; refer to Figure 3. In particular, the sequence $(A_n)$ and the limit $A$ are given, and $A_n \xrightarrow{d} A \in U_I$ as $n \to \infty$.

We prove that
$$\pi_i[\Gamma_n(A_n)] \xrightarrow{d} \pi_i[\Gamma_\infty(A)] \tag{38}$$
as $n \to \infty$ for any $i \in \mathbb{N}^+$. Then by Theorem 5, the conclusion follows.

The overall idea for proving (38) is that we first prove convergence in distribution for the $k \cdot i$ selected parents; then, because the recombination operator is continuous, (38) follows.

First, we decompose the operator $\pi_i \circ \Gamma_m : M \to M^i$. $\pi_i \circ \Gamma_m$ generates the $i$ c.i.i.d. outputs one by one. This generation process can also be viewed as first selecting the $i$ groups of $k$ parents at once from the first $m$ elements of the input (in total, the intermediate output is $k \cdot i$ parents, not necessarily distinct), then producing the $i$ outputs one by one from each group of $k$ parents. In the following, we describe this process mathematically.

Consider $\Phi_m : M \to M^{k \cdot i}$. Let $x = (x_j)_{j=1}^{\infty} \in M$ and $y = (y_j)_{j=1}^{k \cdot i} = \Phi_m(x)$. Let $\Phi_m$ be described by the probability
$$P(y_1 = x_{i_1}, \dots, y_{k \cdot i} = x_{i_{k \cdot i}}) = \frac{1}{m^{k \cdot i}} \quad \text{for all } i_1, \dots, i_{k \cdot i} \in \{1, \dots, m\}. \tag{39}$$
In essence, $\Phi_m$ describes how to select the $k \cdot i$ parents from $x$.
Consider $\Psi : M^{k \cdot i} \to M^i$. Let $u = (u_j)_{j=1}^{k \cdot i} \in M^{k \cdot i}$ and $v = (v_j)_{j=1}^{i} = \Psi(u)$. Let $\Psi$ be described by
$$v_j = \sum_{l=1}^{k} U_{j,l}\, u_{(j-1)k + l}, \qquad j = 1, \dots, i, \tag{40}$$
in which $\mathcal{L}[(U_{j,l})_{l=1}^{k}] = \mathcal{L}[(U_l)_{l=1}^{k}]$, where $\{U_l\}$ are decided by the recombination operator $\Gamma$ as in (37), and the $(U_{j,l})_{l=1}^{k}$ are independent for different $j$. In essence, $\Psi$ describes how to generate the $i$ individuals from the $k \cdot i$ parents.
Now it is obvious that $\pi_i \circ \Gamma_m = \Psi \circ \Phi_m$. Therefore,
$$\pi_i[\Gamma_m(A_\alpha)] = \Psi[\Phi_m(A_\alpha)] \tag{41}$$
for all $m \in \mathbb{N}^+$ and $\alpha \in \mathbb{N}^+ \cup \{\infty\}$.
Next, consider $\pi_i \circ \Gamma_\infty : M \to M^i$. Let $\Phi_\infty = \pi_{k \cdot i}$; we prove that
$$\mathcal{L}\big\{\pi_i[\Gamma_\infty(A)]\big\} = \mathcal{L}\big\{\Psi[\Phi_\infty(A)]\big\} \quad \text{for all } A \in U_I. \tag{42}$$
Eq. (42) is almost obvious, because both operators generate i.i.d. outputs, and the marginal p.d.f.s of both outputs follow the same distribution, namely the one decided by $\Psi$ acting on $k$ i.i.d. parents from $A$. In other words, $\Psi \circ \Phi_\infty$ is a model of $\pi_i \circ \Gamma_\infty$ on i.i.d. inputs; the outputs they generate on the same i.i.d. input follow the same distribution.
Since $A \in U_I$, by (42),
$$\pi_i[\Gamma_\infty(A)] \stackrel{d}{=} \Psi[\Phi_\infty(A)]. \tag{43}$$
Then, by (41) and (43), (38) is equivalent to
$$\Psi[\Phi_n(A_n)] \xrightarrow{d} \Psi[\Phi_\infty(A)] \tag{44}$$
as $n \to \infty$ for any $i \in \mathbb{N}^+$.

To prove (44), we prove the following two conditions.

  1. $\exists N \in \mathbb{N}^+$, such that for all $n > N$, $\Phi_m(A_n) \xrightarrow{d} \Phi_\infty(A_n)$ uniformly as $m \to \infty$; that is, $\sup_{n>N} \rho_d[\Phi_m(A_n), \Phi_\infty(A_n)] \to 0$ as $m \to \infty$.

  2. $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A)$ as $n \to \infty$, and $\Phi_\infty(A)$ is i.i.d.

These two conditions correspond to the conditions in Theorem 9. Since $\Phi_\alpha$ maps $M$ to $M^{k \cdot i}$, we cannot apply Theorem 9 directly; however, it is easy to extend its proof to show that these two conditions lead to $\Phi_n(A_n) \xrightarrow{d} \Phi_\infty(A)$ as $n \to \infty$. Then, by (40), $\Psi$ is clearly a continuous function of its input and its inner randomness. By concatenating the input and the inner randomness, using the same technique as in the proof of Theorem 10, (44) can be proved, and the theorem follows.

In the remainder of the proof, we prove conditions 1 and 2. These conditions can be understood by replacing the operators in the top row of Figure 3 with $\Phi_m$.

Proof of Condition 2

Since $\Phi_\infty = \pi_{k \cdot i} : S^\infty \to S^{k \cdot i}$ (recall that $\pi_{k \cdot i}$ can be viewed both as a mapping from $S^\infty$ to $S^{k \cdot i}$ and from $M$ to $M^{k \cdot i}$), $\Phi_\infty$ is continuous (see Example 1.2 in Billingsley, 1999). Since $A_n \xrightarrow{d} A$, by Theorem 3, $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A)$. Apparently, $\Phi_\infty(A)$ is i.i.d. Therefore condition 2 is proved.

It is worth noting that this simple proof comes partly from our extension of $\Psi \circ \Phi_\infty$ to inputs $A \notin U_I$. In fact, the only requirement on $\Phi_\infty$ is (42); that is, $\Psi \circ \Phi_\infty$ should model $\pi_i \circ \Gamma_\infty$ on i.i.d. inputs. By defining $\Phi_\infty$ to be $\pi_{k \cdot i}$, it can take non-i.i.d. inputs such as $A_n$, and thus this condition can be proved. In Figure 3, this corresponds to our freedom in defining $B_{n,\infty}$, $n \in \mathbb{N}^+$.

Proof of Condition 1

To prove condition 1, we first give another representation of $\Phi_m(A_\alpha)$, where $m > k \cdot i$ and $\alpha \in \mathbb{N}^+ \cup \{\infty\}$. This representation is based on the following mutually exclusive cases.

  1. The $k \cdot i$ parents chosen from $A_\alpha$ by $\Phi_m$ are distinct.

  2. There are duplicates among the $k \cdot i$ parents chosen from $A_\alpha$ by $\Phi_m$.

Let $s_{m,\alpha}$ be random variables taking values in $\{0, 1\}$, with probability
$$P(s_{m,\alpha} = 1) = p(m) = \prod_{j=0}^{k \cdot i - 1} \frac{m - j}{m}, \qquad P(s_{m,\alpha} = 0) = 1 - p(m), \tag{45}$$
where $p(m)$ is the probability that $k \cdot i$ draws, made uniformly with replacement from $m$ individuals, are all distinct.
Let $x_{m,\alpha} \in M^{k \cdot i}$ follow the conditional distribution of the $k \cdot i$ parents given $s_{m,\alpha} = 1$, and let $y_{m,\alpha} \in M^{k \cdot i}$ follow the conditional distribution of the $k \cdot i$ parents given $s_{m,\alpha} = 0$. Then $\Phi_m(A_\alpha)$ can be further represented as
$$\Phi_m(A_\alpha) = s_{m,\alpha}\, x_{m,\alpha} + (1 - s_{m,\alpha})\, y_{m,\alpha}. \tag{46}$$
For our purpose, it is not necessary to describe the distributions of $x_{m,\alpha}$ and $y_{m,\alpha}$ explicitly. The only useful fact is that, by the exchangeability of $A_\alpha$,
$$\mathcal{L}(x_{m,\alpha}) = \mathcal{L}[\Phi_\infty(A_\alpha)] = \mathcal{L}[\pi_{k \cdot i}(A_\alpha)]. \tag{47}$$
To put it another way, $x_{m,\alpha}$ and $\Phi_\infty(A_\alpha)$ both follow the common distribution of $k \cdot i$ distinct individuals from the current exchangeable population $A_\alpha$. Also note that the $\{s_{m,\alpha}\}_\alpha$ are i.i.d. random variables, independent of $x_{m,\alpha}$ and $y_{m,\alpha}$.
Now consider $P[\Phi_m(A_n) \in \mathcal{A}]$ for any measurable $\mathcal{A} \subset S^{k \cdot i}$. By conditioning on whether the $k \cdot i$ parents are distinct, we have
$$P[\Phi_m(A_n) \in \mathcal{A}] = p(m)\, P(x_{m,n} \in \mathcal{A}) + (1 - p(m))\, P(y_{m,n} \in \mathcal{A}).$$
Then by (47),
$$P[\Phi_m(A_n) \in \mathcal{A}] = p(m)\, P[\Phi_\infty(A_n) \in \mathcal{A}] + (1 - p(m))\, P(y_{m,n} \in \mathcal{A}). \tag{48}$$
Since $p(m)$, $P[\Phi_\infty(A_n) \in \mathcal{A}]$, and $P(y_{m,n} \in \mathcal{A})$ are all less than or equal to 1, it follows that $\big|P[\Phi_m(A_n) \in \mathcal{A}] - P[\Phi_\infty(A_n) \in \mathcal{A}]\big| \le 1 - p(m)$ for all $\mathcal{A}$. Taking the supremum over all $\mathcal{A}$, we have
$$\sup_{\mathcal{A}} \big|P[\Phi_m(A_n) \in \mathcal{A}] - P[\Phi_\infty(A_n) \in \mathcal{A}]\big| \le 1 - p(m). \tag{49}$$
The left-hand side of (49) is the total variation distance between $\Phi_m(A_n)$ and $\Phi_\infty(A_n)$, which is an upper bound of the Prokhorov distance (see Gibbs and Su, 2002, for definitions and properties). Since the bound $1 - p(m)$ is uniform with respect to $n$ and $p(m) \to 1$ as $m \to \infty$, we have
$$\sup_{n > N} \rho_d[\Phi_m(A_n), \Phi_\infty(A_n)] \le 1 - p(m) \to 0 \quad \text{as } m \to \infty. \tag{50}$$
This is exactly condition 1. Therefore this theorem is proved.
Alternatively, if we do not want to use the total variation distance, we have the following bound for any $\Phi_\infty(A)$-continuity set $\mathcal{A} \subset S^{k \cdot i}$:
$$\big|P[\Phi_n(A_n) \in \mathcal{A}] - P[\Phi_\infty(A) \in \mathcal{A}]\big| \le \big(1 - p(n)\big) + \big|P[\Phi_\infty(A_n) \in \mathcal{A}] - P[\Phi_\infty(A) \in \mathcal{A}]\big|. \tag{51}$$
Since we have already proved $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A)$, by 4) in Theorem 2, $\big|P[\Phi_\infty(A_n) \in \mathcal{A}] - P[\Phi_\infty(A) \in \mathcal{A}]\big| \to 0$. Then apparently the right-hand side of (51) converges to 0. Noting that $\mathcal{A}$ is arbitrary, by applying 4) in Theorem 2 again, $\Phi_n(A_n) \xrightarrow{d} \Phi_\infty(A)$ is proved.

We close with a brief discussion of the proof. In our opinion, the most critical step is decomposing the k-ary recombination operator into two suboperators: one responsible for selecting parents ($\Phi$) and one responsible for combining them ($\Psi$). In addition, the parent-selection suboperator does not use fitness values; it selects parents "blindly" according to its own rule (uniform sampling with replacement). This makes $\Phi$ easy to analyze, because the way it selects parents does not depend on the contents of its input. Therefore, we can prove the uniform convergence in (50).
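The uniformity in (50) is driven entirely by $p(m)$, the probability that all $k \cdot i$ parent draws are distinct. The short sketch below (ours) evaluates the bound $1 - p(m)$ for a hypothetical configuration $k = 2$, $i = 3$, confirming that it vanishes as $m$ grows, regardless of the population sequence.

```python
import math

def p_distinct(m, k, i):
    """Probability that k*i uniform draws with replacement from m
    individuals are all distinct: prod_{j < k*i} (m - j) / m, as in (45)."""
    draws = k * i
    if m < draws:
        return 0.0
    return math.prod((m - j) / m for j in range(draws))

k, i = 2, 3
for m in [10, 100, 1_000, 10_000, 100_000]:
    print(f"m={m:7d}  1 - p(m) = {1 - p_distinct(m, k, i):.5f}")
```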

Another point worth mentioning is the choice of Theorem 9 in our proof. Though Theorems 8 and 9 are symmetric, the difficulties of verifying them are quite different; in fact, it is very difficult to prove the uniform convergence condition in Theorem 8 for this operator.

Finally, our proof can be easily extended to cover k-ary recombination operators using uniform sampling without replacement to select parents for each offspring. The overall proof framework roughly stays the same.

5.3  Summary

In this section, we analyzed the simple EA within the proposed framework. As the analysis shows, although the convergence of the IPM is rigorously defined, actually proving convergence for the operators takes considerable effort. Our analysis relied on the IPM construction and the sufficient conditions from Section 4, together with various techniques specific to the mutation operator and the k-ary recombination operator. Although the sufficient conditions provide general directions for the proofs, many details still have to be worked out for each operator.

To appreciate the significance of our work, it is worth noting that in Qi and Palmieri (1994a,b), the convergence of the IPMs of the mutation operator, the uniform crossover operator, and the proportionate selection operator was not properly proved, and the issue of stacking operators and iterating the algorithm was not addressed at all. In this article, we have proved the convergence of the IPMs of several general operators. Since these operators cover those studied in Qi and Palmieri (1994a,b) as special cases, the convergence of the IPMs of mutation and uniform crossover is now actually proved. Moreover, our proof does not depend on the explicit form of the transition equation of the IPM: as long as the IPM is constructed under the i.i.d. assumption, our proof is valid.

As a consequence of our result, consider the explicit form of the transition equation for the uniform crossover operator derived in Section II of Qi and Palmieri (1994b). As the authors' proof was problematic and incomplete, the derivation of the transition equation was not well founded. However, the authors' derivation is in fact equivalent to constructing the IPM under the i.i.d. assumption. Since we have proved the convergence of the IPM of the k-ary recombination operator, the analysis in Qi and Palmieri (1994b) regarding the explicit form of the transition equation can be retained.

6  Conclusion

In this article, we revisited the existing literature on the theoretical foundations of IPMs and proposed an analytical framework for IPMs based on convergence in distribution for random elements taking values in the metric space of infinite sequences. Under this framework, commonly used operators such as mutation and recombination were analyzed. Our approach and analyses are new, and many topics are worth studying in future research.

Perhaps the most immediate topic is to analyze the proportionate selection operator in our framework. The mutation operator and the k-ary recombination operator can be readily analyzed partly because they do not use fitness information, and because each new individual draws information from a fixed number of parents. By contrast, to generate each new individual, the proportionate selection operator gathers and uses the fitness values of the whole population. This makes analyzing proportionate selection difficult.

We believe further analysis of proportionate selection can proceed in the following two directions.

  1. In our analyses, we tried to prove the stacking property on $U_I$ for the IPM of proportionate selection. Apart from further efforts to prove or disprove this property, it is worth considering modifying the space $U_I$. For example, we can incorporate the rate of convergence into the space: if we can prove the stacking property on $U_I \cap U'$, where $U'$ is the space of converging sequences with rate $O(h(n))$, that is still a meaningful result.

  2. Another strategy is to bypass the sufficient conditions and return to Definition 5, proving $Q^n_k \xrightarrow{d} Q^\infty_k$ for every $k$ directly. This is the original method; in essence, it requires studying the convergence of nested integrals.

Apart from proportionate selection, it is also worth studying whether other operators, such as ranking selection, can be analyzed in our framework. As many of these operators do not generate c.i.i.d. offspring, deriving the IPM and proving its convergence is difficult, if not impossible. In this regard, we believe new modeling techniques and extensions of the framework are fruitful directions for further research.

Finally, it is possible to extend the concept of "incidence vectors" proposed by Vose to the continuous search space. After all, as noted by Vose himself, incidence vectors can also be viewed as marginal p.d.f.s of individuals, so the cases of EAs on discrete and continuous solution spaces do bear some resemblance. By an easy extension, an incidence vector in the continuous space can be defined as a function of the form $\sum_i c_i\, \delta(x - x_i)$, where $\delta$ is the Dirac delta function and $c_i$ is the rational number representing the fraction of $x_i$ in the population. If similar analyses based on this extension can be carried out, many results in Nix and Vose (1992) and Vose (1999a,b, 2004) can be extended to the continuous space.
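A minimal sketch (ours) of this extended representation: a finite population over a continuous space, stored as the rational weights $c_i$ of the atoms $\delta(x - x_i)$.

```python
from collections import Counter
from fractions import Fraction

def incidence_vector(population):
    """Empirical measure of a finite population on a continuous space:
    a map x_i -> c_i, with c_i the rational fraction of x_i in the
    population, i.e. the coefficients of sum_i c_i * delta(x - x_i)."""
    counts = Counter(population)
    n = len(population)
    return {x: Fraction(c, n) for x, c in counts.items()}

# Weights 1/2, 1/4, 1/4 for the atoms at 0.5, 1.25, 2.0.
print(incidence_vector([0.5, 0.5, 1.25, 2.0]))
```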

References

Billingsley, P. (1999). Convergence of probability measures. 2nd ed. New York: John Wiley & Sons.

Dudley, R. M. (2002). Real analysis and probability. 2nd ed. Cambridge: Cambridge University Press.

Gibbs, A. L., and Su, F. E. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3):419-435.

Kallenberg, O. (2002). Foundations of modern probability. 2nd ed. New York: Springer.

Nix, A., and Vose, M. D. (1992). Modeling genetic algorithms with Markov chains. Annals of Mathematics and Artificial Intelligence, 5(1):79-88.

Port, S. C. (1994). Theoretical probability for applications. New York: John Wiley & Sons.

Qi, X., and Palmieri, F. (1994a). Theoretical analysis of evolutionary algorithms with an infinite population size in continuous space. Part I: Basic properties of selection and mutation. IEEE Transactions on Neural Networks, 5(1):102-119.

Qi, X., and Palmieri, F. (1994b). Theoretical analysis of evolutionary algorithms with an infinite population size in continuous space. Part II: Analysis of the diversification role of crossover. IEEE Transactions on Neural Networks, 5(1):120-129.

Taylor, R. L., Daffer, P. Z., and Patterson, R. F. (1985). Limit theorems for sums of exchangeable random variables. Totowa, NJ: Rowman & Allanheld.

Vose, M. D. (1999a). The simple genetic algorithm: Foundations and theory. Cambridge, MA: MIT Press.

Vose, M. D. (1999b). What are genetic algorithms? A mathematical perspective. In L. Davis, K. De Jong, M. Vose, and L. Whitley (Eds.), Evolutionary algorithms, pp. 251-276. The IMA Volumes in Mathematics and Its Applications, vol. 111. New York: Springer.

Vose, M. D. (2004). Infinite population GA tutorial. Technical Report ut-cs-04-533. The University of Tennessee, Knoxville.

Yong, G., Xiaofeng, Q., and Palmieri, F. (1998). Comments on "Theoretical analysis of evolutionary algorithms with an infinite population size in continuous space. I. Basic properties of selection and mutation" [with reply]. IEEE Transactions on Neural Networks, 9(2):341-343.