## Abstract

Infinite population models are important tools for studying population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this article, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study were in fact problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, in order to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements which take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of the stacking of operators and of iterating the algorithm which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the $k$-ary recombination operator. We show that these operators can provide accurate predictions for real population dynamics as the population size goes to infinity, provided that the initial population is identically and independently distributed.

## 1 Introduction

Evolutionary algorithms (EAs) are general purpose optimization algorithms with great successes in real-world applications. They are inspired by the evolutionary process in nature. A certain number of candidate solutions to the problem at hand are modeled as individuals in a population. The algorithm evolves the population by mutation, crossover, and natural selection so that individuals with more preferable objective function values have higher survival probability. By the “survival of the fittest” principle, it is likely that after many generations the population will contain individuals with high fitness values such that they are satisfactory solutions to the problem at hand.

Though conceptually simple, the underlying evolutionary processes and the behaviors of EAs remain to be fully understood. The difficulties lie in the fact that EAs are customizable population-based iterative stochastic algorithms, and the objective function also has a great influence on their behaviors. A successful model of EAs should describe both the mechanisms of the algorithm and the influence from the objective function. One way to study EAs is to model them as dynamical systems. The idea is to pick a certain quantity of interest first, such as the distribution of the population or a certain statistic about it. Then, transitions in the state space of the picked quantity are modeled. A transition matrix (when the state space is finite) or a difference equation (when the state space is not finite) for a Markov chain is derived to describe how the picked quantity changes between consecutive generations.

In order to characterize the population dynamics accurately, the state space of the Markov chain tends to grow rapidly as the population size increases. As a result, even for time-homogeneous EAs with moderate population size, the Markov chain is often too large to be analyzed or simulated. To overcome this issue, some researchers turn to studying the limiting behaviors of EAs as the population size goes to infinity. The idea is to exploit some kind of symmetry in the state space (such as all individuals having the same marginal distribution), and prove that in the limit the Markov chain can be described by a more compact model. Models built in this way are called infinite population models (IPMs).

In this article, we follow this line of research and study IPMs of EAs on *continuous* space. More specifically, we aim at rigorously proving the convergence of IPMs. In this study, by “convergence” we usually mean that IPMs characterize limiting behaviors of real EAs. Loosely speaking, “an IPM converges” means that as the population size goes to infinity, the population dynamics of the real EA converge in a sense to the population dynamics predicted by this model. This usage is different from the conventional one, in which convergence means that the EA eventually locates and gets stuck in some local or global optimum. Convergence results are the foundations and justifications of IPMs.

The main results of the article can be summarized as follows. First, we show that a widely cited study on the convergence of IPMs is problematic, mainly because the core assumption of exchangeability of individuals in its proof cannot lead to the convergence conclusion. Second, we build an analytical framework from a different perspective and show that it defines convergence in general settings. Third, to show the effectiveness of our framework, we prove the convergence of the IPM of the simple EA with mutation and crossover operators when the initial population is independent and identically distributed (i.i.d.). Finally, we discuss the results and point out that the convergence of the IPM of the simple EA with proportionate selection is yet to be established.

*marginal* distribution of the population change between consecutive generations. The authors proved that if individuals are exchangeable in a population, as the population size goes to infinity, the marginal p.d.f.s of populations of simple EA will converge *point-wise* to the p.d.f.s calculated by the following transition equation:

In Section 2, we will examine Qi's model and show that their study is *unsound* and *incomplete*. First, we provide a counterexample to show that in the authors' proof a key assertion about the law of large numbers (LLN) for exchangeable random vectors is generally not true. Therefore, the whole proof is unsound. Furthermore, we show that the modeling assumption of exchangeability of individuals cannot yield the transition equation in general. This means that under the authors' modeling assumption, the conclusion (1) cannot be reached.

In addition, we show that the authors' proofs in Qi and Palmieri (1994a,b) are incomplete. The authors did not address the convergence of the stacking of operators and of recursively iterating the algorithm. In essence, the authors attempted to prove the convergence of the IPM for only *one* iteration step. Even if the proof for (1) were correct, it would only show that as $n\to\infty$ the marginal p.d.f. of the $(k+1)$th population converges point-wise to $f_{x_{k+1}}(x)$, provided that the marginal p.d.f. of the $k$th generation is $f_{x_k}(x)$. However, this convergence does not automatically hold for all subsequent generations. As a result, (1) cannot be iterated to make predictions for subsequent ($>k+1$) generations.

Besides Qi and Palmieri (1994a,b), we found no other studies that attempted to prove the convergence of IPMs for EAs on *continuous* space. Therefore, in Section 3 we propose a general analytical framework. The novelty of our framework is that from the very start of the analysis, we model generations of the population as random elements taking values in the metric space of infinite sequences, and we use convergence in distribution instead of point-wise convergence to define the convergence of IPMs.

To illustrate the effectiveness of our framework, we perform convergence analysis of the IPM of the simple EA in Sections 4 and 5. In Section 4, we adopt a “stronger” modeling assumption that individuals of the same generation in the IPM are independent and identically distributed (i.i.d.), and we give sufficient conditions under which the IPM is convergent. For general EAs, this assumption may seem restrictive at first sight, but it turns out to be a reasonable one. In Section 5, we analyze the mutation operator and the $k$-ary recombination operator. We show that these commonly used operators have the property of producing i.i.d. populations, in the sense that if the initial population is i.i.d., as the population size goes to infinity, in the limit all subsequent generations are also i.i.d. This means that for these operators, the transition equation in the IPM can predict the real population dynamics as the population size goes to infinity. We also show that our results hold even if these operators are stacked together and iterated repeatedly by the algorithm. Finally, in Section 6 we conclude the article and propose future research.

To be complete, regarding Qi and Palmieri (1994a,b), there is a comment by Yong et al. (1998) with a reply published. However, the comment was mainly about the latter part of Qi and Palmieri (1994a), where the authors analyzed the properties of EAs based on the IPM. It did not discuss the proof for the model itself. For IPMs of EAs on *discrete* optimization problems, extensive research was done by Vose et al. in a series of studies (Nix and Vose, 1992; Vose, 1999b, 1999a, 2004). The problems under consideration were discrete optimization problems with *finite* solution space. The starting point of the authors' analysis was to model each generation of the population as an “incidence vector,” which describes for each point in the solution space the proportion of the population it occupies. Based on this representation the authors derived transition equations between incidence vectors of consecutive generations and analyzed their properties as the population size goes to infinity. However, for EAs on *continuous* solution space, the analyses of Vose et al. are not immediately applicable. This is because for continuous optimization problems the solution space is not denumerable. Therefore, the population cannot be described by a finite-dimensional incidence vector.

## 2 Discussion of the Works of Qi et al.

In this section we analyze the results of Qi and Palmieri (1994a,b). We begin by introducing some preliminaries for the analysis. Then, in Section 2.2, following the notations and derivations in the authors' papers, we provide a counterexample to show that the convergence proof for the transition equation in Qi and Palmieri (1994a) is problematic. We further show that the modeling assumption of exchangeability cannot yield the transition equation in general. In Section 2.3, we show that the analyses in Qi and Palmieri (1994a,b) are incomplete. The authors did not prove the convergence of IPMs in the cases where operators are stacked together and the algorithm is iterated for multiple generations.

### 2.1 Preliminaries

After presenting the optimization problem and the algorithm, the authors proved the convergence of the IPM if the distributions of individuals in the population are exchangeable. It is the main result in Qi and Palmieri (1994a).

**Theorem 1** (Theorem 1 in Qi and Palmieri, 1994a): Assume that the fitness function $g(x)$ in (2) and the mutation operator of simple EA described by (4) satisfy the following conditions:

1. $0<g_{\min}\le g(x)\le g_{\max}<\infty,\ \forall x\in F$.

2. $\sup_{x,y\in\mathbb{R}^d} f_w(x|y)\le M<\infty$.

In Theorem 1, $f_{x_k}$ is the marginal p.d.f. of the $k$th generation predicted by the IPM. It should be emphasized that in Qi and Palmieri (1994a,b), the authors proved this theorem under the assumption that simple EA has exchangeable individuals in the population. Though not explicitly stated in the theorem, the assumption of exchangeability is the core assumption in their proof and an integral part of their formulation of the theorem.

For analyses in this article, we use the concept of exchangeability in probability theory. Its definition and some basic facts are listed below.

**Definition 2** (Exchangeable random variables, Definition 1.1.1 in Taylor et al., 1985): A finite set of random variables $\{x_i\}_{i=1}^{n}$ is said to be exchangeable if the joint distribution of $(x_i)_{i=1}^{n}$ is invariant with respect to permutations of the indices $1,2,\cdots,n$. A collection of random variables $\{x_\alpha:\alpha\in\Gamma\}$ is said to be exchangeable if every finite subset of $\{x_\alpha:\alpha\in\Gamma\}$ is exchangeable.

Definition 2 can also be extended to cover exchangeable random vectors or exchangeable random elements by replacing the term “random variables” in the definition with the respective term. One property of exchangeability is that if $\{x_i\}_{i=1}^{n}$ are $n$ exchangeable random elements, then the joint distributions of any $1\le k\le n$ distinct ones of them are always the same (Proposition 1.1.1 in Taylor et al., 1985). When $k=1$ this property indicates that $\{x_i\}_{i=1}^{n}$ have the same marginal distribution. Another property is that an infinite sequence of random elements is exchangeable if and only if its elements are conditionally independent and identically distributed (c.i.i.d.) given some $\sigma$-field $\mathcal{G}$ (Theorem 1.2.2 in Taylor et al., 1985). Moreover, any collection of c.i.i.d. random elements, finite or infinite, is exchangeable. Finally, it is obvious that i.i.d. random elements are exchangeable, but the converse is not necessarily true.

It can be seen that the simple EA generates c.i.i.d. individuals given the current population. Therefore, the individuals within the same generation are exchangeable, and they have the same marginal distribution. This is the core assumption of the proof in Qi and Palmieri (1994a,b) and Condition 3 in Theorem 1.

### 2.2 Convergence Proof of the Transition Equation

In this section, we examine the proof of Theorem 1 and show that it is incorrect. The proof by Qi et al. is in Appendix A of Qi and Palmieri (1994a). In the proof, the authors assumed that individuals in the same generation are exchangeable; therefore, they have the same marginal distribution. After a series of derivation steps, the authors obtained a transition equation between the density functions of $x_{k+1}^{i}$ and $X_k$:

The proof is *correct* until this step. However, the authors then asserted that

Theorem 1 is proved.

In the following, we provide a counterexample to show that assertion (11) is not true when $N\ge 2$ ($N=1$ is the degenerate case). Then, we carry out further analysis to show that under the modeling assumption of exchangeability, conclusion (12), or equivalently Theorem 1, cannot be true in general.

#### 2.2.1 On Assertion (11)

Theorem 1 imply

However, we use the following counterexample (modified from Example 1.1.1 and the related discussion on pages 11–12 of Taylor et al., 1985) to show that assertion (16) is not true. Therefore, (11) is not true.

#### 2.2.2 Counterexample

*independent* of $\{z_l\}_{l=1}^{\infty}$ satisfying

It can easily be verified that $\{y_l\}_{l=1}^{\infty}$ and $y$ satisfy (13) and (15). Since $z_l$ is bounded, $E(|z_l|)<\infty$ for any $l$. By the strong law of large numbers (SLLN) for i.i.d. random variables, $\frac{1}{N}\sum_{l=1}^{N}z_l\to 0$ a.s. as $N\to\infty$; therefore, (14) is also satisfied, that is, $y$ is the limit of $\frac{1}{N}\sum_{l=1}^{N}y_l$ as $N\to\infty$. However, because $\frac{1}{N}\sum_{l=1}^{N}y_l=y+\frac{1}{N}\sum_{l=1}^{N}z_l$ and $y$ is independent of $\{z_l\}_{l=1}^{\infty}$, it can be seen that $\frac{1}{N}\sum_{l=1}^{N}y_l$ is *not* independent of $y$ except for some degenerate cases (e.g., when $y$ equals a constant). In particular, in general $y_l=y+z_l$ is *not* independent of $y$ for any $l$. Therefore, assertion (16) is not true. Equivalently, assertion (11) is not true. This renders the authors' proof for (12) invalid.
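The counterexample can be checked numerically. The following is a minimal sketch under assumed distributions, chosen only for illustration: $y$ takes the values 0 and 1 with equal probability, and the $z_l$ are i.i.d. uniform on $[-1,1]$, so they are bounded with mean zero. The function name `simulate` and its parameters are hypothetical.

```python
import numpy as np

def simulate(num_trials=1000, N=2000, seed=0):
    """Draw num_trials independent copies of (y, y_1, ..., y_N) with y_l = y + z_l."""
    rng = np.random.default_rng(seed)
    # Assumed distributions (illustrative only): y is 0 or 1 with equal
    # probability; z_l is i.i.d. uniform on [-1, 1], hence bounded with mean 0.
    y = rng.integers(0, 2, size=num_trials).astype(float)
    z = rng.uniform(-1.0, 1.0, size=(num_trials, N))
    y_l = y[:, None] + z            # exchangeable, but not independent
    sample_mean = y_l.mean(axis=1)  # (1/N) * sum_l y_l, one value per trial
    return y, y_l, sample_mean

y, y_l, sample_mean = simulate()

# SLLN: in every trial the sample mean is close to the random limit y,
# not to a single deterministic constant.
max_gap = np.max(np.abs(sample_mean - y))

# Dependence: y_1 is correlated with y, and y_1 is correlated with y_2.
corr_with_limit = np.corrcoef(y_l[:, 0], y)[0, 1]
corr_between_individuals = np.corrcoef(y_l[:, 0], y_l[:, 1])[0, 1]

print(f"max |mean - y| = {max_gap:.3f}")
print(f"corr(y_1, y)   = {corr_with_limit:.3f}")
print(f"corr(y_1, y_2) = {corr_between_individuals:.3f}")
```

Under these assumed distributions, the theoretical correlations are $\sqrt{3/7}\approx 0.65$ between $y_1$ and $y$, and $3/7\approx 0.43$ between $y_1$ and $y_2$, so the estimates should be clearly bounded away from zero.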

#### 2.2.3 Further Analysis

In the following, we carry out further analysis to show that (12) cannot be true even considering other methods of proof and adding new sufficient conditions. Therefore, in general, Theorem 1 cannot be true.

To begin with, consider the random variable $\frac{\xi_k(x)}{\eta_k^N}$. We prove the following lemma.

**Lemma 3:** $E\left[\frac{\xi_k(x)}{\eta_k^N}\right]\to E\left[\frac{\xi_k(x)}{\eta_k}\right]$ as $N\to\infty$.

According to (9), $\eta_k^N\xrightarrow{a.s.}\eta_k$. Since $g_{\min}\le\eta_k^N\le g_{\max}$, it follows that $0<g_{\min}\le\eta_k\le g_{\max}$ almost surely. Consequently, $\frac{\xi_k(x)}{\eta_k^N}\to\frac{\xi_k(x)}{\eta_k}$ almost surely.

Finally, by the conditions in Theorem 1, $0\le\frac{\xi_k(x)}{\eta_k^N}\le\frac{M g_{\max}}{g_{\min}}$. By Lebesgue's Dominated Convergence Theorem (Proposition 11.30 in Port, 1994), $E\left[\frac{\xi_k(x)}{\eta_k^N}\right]\to E\left[\frac{\xi_k(x)}{\eta_k}\right]$ as $N\to\infty$. $\square$

By Lemma 3, (12) is equivalent to

Now it is clear that if the only assumption is exchangeability, ($\triangle$) is *not* true even considering other methods of proof. Of course, if (11) is true, so that $\xi_k(x)$ and $\eta_k$ are independent, then ($\triangle$) is true. However, as already shown by the counterexample, (11) is not true in general. Therefore, ($\triangle$), and equivalently Theorem 1, are in general not true.

A natural question then arises: Is it possible to introduce some reasonable sufficient conditions such that ($\triangle$) can be proved? One such frequently used condition is that $\eta_k=E[g(x_k^j)]$. However, the following analysis shows that given the modeling assumption of exchangeability, this condition is not true in general. Therefore, it cannot be introduced.

### 2.3 The Issue of the Stacking of Operators and Iterating the Algorithm

In the following, we discuss IPMs from another perspective and show that the proofs in Qi and Palmieri (1994a,b) are incomplete, as they only consider convergence of one iteration. Because discussion in this section relates very closely to our proposed framework, **we will use our notation from here on in our article consistently**, which is different from the previous two sections where we followed strictly the notations of Qi and Palmieri (1994a,b).

Consider an EA with only one operator. Let the operator be denoted by $H$. When the population size is $n$, denote this EA by $EA_n$ and the operator it *actually* uses by $H_n$. Let $P_k^n=(x_{k,i}^n)_{i=1}^{n}$ denote the $k$th generation produced by $EA_n$. Then the transition rule between consecutive generations produced by $EA_n$ can be described by $P_{k+1}^n=H_n(P_k^n)$. Table 1 lists the population dynamics: each row shows the populations produced by $EA_n$ for one value of $n$, with $P_k^n$ expanded as $[H_n]^k(P_0^n)$. Let $EA_\infty$ denote the IPM, and let $P_k^\infty=[H_\infty]^k(P_0^\infty)$ denote the populations predicted by $EA_\infty$. Then we can summarize the results in Qi and Palmieri (1994a) in the following way.

| | 0 | 1 | 2 | $\cdots$ |
|---|---|---|---|---|
| $EA_1$ | $P_0^1$ | $H_1(P_0^1)$ | $H_1(H_1(P_0^1))$ | $\cdots$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | |
| $EA_n$ | $P_0^n$ | $H_n(P_0^n)$ | $H_n(H_n(P_0^n))$ | $\cdots$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | |
| $EA_\infty$ | $P_0^\infty$ | $H_\infty(P_0^\infty)$ | $H_\infty(H_\infty(P_0^\infty))$ | $\cdots$ |


Consider a *known* sequence of individuals, represented by $P_0=(x_i)_{i=1}^{\infty}$. For $EA_n$, its initial population $P_0^n$ consists of the first $n$ elements of $P_0$; that is, $P_0^n=(x_i)_{i=1}^{n}$. Let $P_0^\infty=P_0$. This setting represents the fact that $EA_n$ and $EA_\infty$ use the same initial population. $H_n$ can be viewed as an operator on $P_0$ that takes only the first $n$ elements to produce the next generation. Then the authors essentially proved that

However, apart from the fact that this proof is problematic, the authors' proof covers only *one* iteration step, corresponding to the column-wise convergence of the $k=1$ column in Table 1. The problem is that even if (21) is true, it does not automatically lead to the conclusion that for an arbitrary $k$th step, $[H_n]^k(P_0)\overset{m.p.w.}{\longrightarrow}[H_\infty]^k(P_0)$ as $n\to\infty$. In other words, one has to study whether the transition equation for one step can be iterated recursively to predict populations after multiple steps. In Table 1, this problem corresponds to whether the other columns have a similar column-wise convergence property once the convergence of the $k=1$ column is proved.

In this sense, the proofs in Qi and Palmieri (1994a,b) are *incomplete*.

The issue of the stacking of operators is similar. Given some operator $H$ satisfying (20) and some operator $G$ satisfying $G_n(P_0)\overset{m.p.w.}{\longrightarrow}G_\infty(P_0)$ as $n\to\infty$, it is *not* necessarily true that $H_n(G_n(P_0))\overset{m.p.w.}{\longrightarrow}H_\infty(G_\infty(P_0))$ as $n\to\infty$. However, the authors in Qi and Palmieri (1994b) totally ignored this issue and combined the transition equations for selection, mutation and crossover together (in Section III of Qi and Palmieri 1994b) without any justification.

In addition, there are several statements in the authors' proofs in Qi and Palmieri (1994b) that are questionable. First, in the first paragraph of Appendix A (the proof for Theorem 1 in that paper), the authors considered a pair of parents $x_k$ and $x_k'$ for the uniform crossover operator. $x_k$ and $x_k'$ are “drawn from the population independently with the same density of $f_{x_k}\equiv f_{x_k'}$.” Then, the authors claimed that “the joint density of $x_k$ and $x_k'$ is therefore $f_{x_k}\cdot f_{x_k'}$.” This is simply not true. Two individuals drawn independently from the same population are *conditionally* independent; they are not necessarily independent. In fact, without the i.i.d. assumption, it is very likely that individuals in the same population are dependent. Therefore, the joint density function of $x_k$ and $x_k'$ is not necessarily $f_{x_k}\cdot f_{x_k'}$, and the authors' proof for Theorem 1 in Qi and Palmieri (1994b) is dubious at best. On the other hand, if the authors' modeling assumption for the uniform crossover operator is that individuals are i.i.d., this assumption is incompatible with the modeling assumption of exchangeability in Qi and Palmieri (1994a) for selection and mutation. Therefore, combining the transition equations for all these three operators is problematic, because the i.i.d. assumption cannot hold beyond one iteration step.

Another issue in Qi and Palmieri (1994b) is that the uniform crossover operator produces two *dependent* offspring at the same time. As a result, after uniform crossover, the intermediate population is not even exchangeable because it has pair-wise dependency between individuals. Then the same incompatible assumption problem arises: that is, the transition equation for the uniform crossover operator cannot be combined with the transition equations for selection and mutation. Besides, the transition equation for the uniform crossover operator cannot be iterated beyond one step and still hold i.i.d. or exchangeability as its modeling assumption.

In summary, several issues arise from previous studies on IPMs for EAs on continuous optimization problems. Therefore, new frameworks and proof methods are needed for analyzing the convergence of IPMs and studying the issue of the stacking of operators and iterating the algorithm.

## 3 Proposed Framework

In this section, we present our proposed analytical framework. In constructing the framework we strive to achieve the following three goals.

1. The framework should be general enough to cover real-world operators and to characterize the evolutionary process of real EA.

2. The framework should be able to define the convergence of IPMs and serve as justifications of using them. The definition should match one's intuition and at the same time be mathematically rigorous.

3. The framework should provide an infrastructure to study the issue of the stacking of operators and iterating the algorithm.

The contents of this section roughly reflect the pursuit of the first two goals. The third goal is reflected in the sufficient conditions for convergence and i.i.d. IPM construction in Section 4 and the analyses of the simple EA in Section 5. More specifically, in Section 3.1, we introduce notations and preliminaries for the remainder of this article. In Section 3.2, we present our framework. In the framework, each generation is modeled by a random sequence. This approach unifies the spaces of random elements modeling populations of different sizes. In Section 3.3, we define the convergence of the IPM as convergence in distribution on the space of random sequences. We summarize and discuss our framework in Section 3.4.

### 3.1 Notations and Preliminaries

^{1}are sometimes needed. We will introduce them when they are required.

From now on we use $\mathbb{N}$ to denote the set of nonnegative integers and $\mathbb{N}^+$ the set of positive integers. For any two real numbers $a$ and $b$, let $a\wedge b$ be the smaller one of them and $a\vee b$ be the larger one of them. Let $x,y$ be random elements of some measurable space $(\Omega,\mathcal{F})$. We use $\mathcal{L}(x)$ to represent the law of $x$. If $x$ and $y$ follow the same law, that is, $P(x\in A)=P(y\in A)$ for every $A\in\mathcal{F}$, we write $\mathcal{L}(x)=\mathcal{L}(y)$. Note that $\mathcal{L}(x)=\mathcal{L}(y)$ and $x=y$ have different meanings. In particular, $x=y$ indicates dependency between $x$ and $y$.

We use the notation $(x_i)_{i=m}^{n}$ to represent the array $(x_m,x_{m+1},\cdots,x_n)$. When $n=\infty$, $(x_i)_{i=m}^{\infty}$ represents the infinite sequence $(x_m,x_{m+1},\cdots)$. $\{x_i\}_{i=m}^{n}$ and $\{x_i\}_{i=m}^{\infty}$ represent the collections $\{x_m,x_{m+1},\cdots,x_n\}$ and $\{x_i \mid i=m,m+1,\cdots\}$, respectively. When the range is clear, we use $(x_i)_i$ and $\{x_i\}_i$, or $(x_i)$ and $\{x_i\}$, for short.

Let $\mathbb{S}$ denote the solution space $\mathbb{R}^d$. This simplifies the notation system when we discuss the spaces $\mathbb{S}^n$ and $\mathbb{S}^\infty$. In the following, we define metrics and $\sigma$-fields on $\mathbb{S}$, $\mathbb{S}^n$ and $\mathbb{S}^\infty$ and state properties of the corresponding measurable spaces.

$\mathbb{S}$ is equipped with the ordinary metric $\rho(x,y)=\left[\sum_{i=1}^{d}(x_i-y_i)^2\right]^{1/2}$. Let $\mathcal{S}$ denote the Borel $\sigma$-field on $\mathbb{S}$ generated by the open sets under $\rho$. Together $(\mathbb{S},\mathcal{S})$ defines a measurable space.

Similarly, $\mathbb{S}^n$ is equipped with the metric $\rho_n(x,y)=\left[\sum_{i=1}^{n}\rho^2(x_i,y_i)\right]^{1/2}$, and the corresponding Borel $\sigma$-field under $\rho_n$ is denoted by $\mathcal{S}'^{n}$. Together $(\mathbb{S}^n,\mathcal{S}'^{n})$ is the measurable space for $n$-tuples.

Next, consider the space of infinite sequences $\mathbb{S}^\infty=\{(x_1,x_2,\cdots) \mid x_i\in\mathbb{S},\ i\in\mathbb{N}^+\}$. It is equipped with the metric $\rho_\infty(x,y)=\sum_{i=1}^{\infty}\frac{1}{2^i}\cdot\frac{\rho(x_i,y_i)}{1+\rho(x_i,y_i)}$. The Borel $\sigma$-field on $\mathbb{S}^\infty$ under $\rho_\infty$ is denoted by $\mathcal{S}'^{\infty}$. Then $(\mathbb{S}^\infty,\mathcal{S}'^{\infty})$ is the measurable space for infinite sequences.
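To make the metric on infinite sequences concrete, the following minimal sketch evaluates a partial sum of $\rho_\infty$ over a finite prefix. Because each summand is at most $2^{-i}$, the terms neglected after $m$ coordinates add up to at most $2^{-m}$. The function names `rho` and `rho_inf_partial` are illustrative and not part of the framework.

```python
import numpy as np

def rho(x, y):
    """Euclidean metric on S = R^d."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

def rho_inf_partial(xs, ys):
    """Partial sum of rho_infinity over the coordinates supplied.

    xs and ys are equal-length lists holding the first m coordinates of two
    sequences in S^infinity; the neglected tail is at most 2**(-m).
    """
    if len(xs) != len(ys):
        raise ValueError("both prefixes must have the same length")
    total = 0.0
    for i, (x, y) in enumerate(zip(xs, ys), start=1):
        d = rho(x, y)
        total += d / (1.0 + d) / 2.0 ** i
    return total

# A unit gap in the first coordinate contributes (1/2) * (1/2).
early = rho_inf_partial([[0.0], [0.0], [0.0]], [[1.0], [0.0], [0.0]])

# A huge gap in the tenth coordinate contributes less than 2**-10.
zeros = [[0.0]] * 10
late = rho_inf_partial(zeros, [[0.0]] * 9 + [[1.0e6]])
```

The two values illustrate why this metric suits sequences of growing populations: differences in early coordinates dominate, while any difference far down the sequence has only a small bounded effect.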

Since $\mathbb{S}$ is separable and complete, it can be proved that $\mathbb{S}^n$ and $\mathbb{S}^\infty$ are also separable and complete (Appendix M6 in Billingsley, 1999). In addition, because of separability, the Borel $\sigma$-fields $\mathcal{S}'^{n}$ and $\mathcal{S}'^{\infty}$ are equal to $\mathcal{S}^n$ and $\mathcal{S}^\infty$, respectively. In other words, the Borel $\sigma$-fields $\mathcal{S}'^{n}$ and $\mathcal{S}'^{\infty}$ generated by the collection of open sets under the corresponding metrics coincide with the product $\sigma$-fields generated by all measurable rectangles ($\mathcal{S}^n$) and all measurable cylinder sets ($\mathcal{S}^\infty$), respectively (Lemma 1.2 in Kallenberg, 2002). Therefore, from now on we write $\mathcal{S}^n$ and $\mathcal{S}^\infty$ for the corresponding Borel $\sigma$-fields. Finally, let $\mathcal{M}$, $\mathcal{M}_n$ and $\mathcal{M}_\infty$ denote the set of all random elements of $\mathbb{S}$, $\mathbb{S}^n$ and $\mathbb{S}^\infty$, respectively.

Let $\pi_n:\mathbb{S}^\infty\to\mathbb{S}^n$ be the natural projection: $\pi_n(x)=(x_1,x_2,\cdots,x_n)$. Since given $x\in\mathcal{M}_\infty$, $(\pi_n\circ x):\Omega\to\mathbb{S}^n$ defines a random element of $\mathbb{S}^n$ projected from $\mathbb{S}^\infty$, we also use $\pi_n$ to denote the mapping $\pi_n:\mathcal{M}_\infty\to\mathcal{M}_n$ where $\pi_n(x)=(x_1,x_2,\cdots,x_n)$. By definition, $\pi_n$ is the operator which truncates random sequences to random vectors. Given $A\subset\mathbb{S}^\infty$, we use $\pi_n(A)$ to denote the projection of $A$; that is, $\pi_n(A)=\{x\in\mathbb{S}^n : x=\pi_n(y)\ \text{for some}\ y\in A\}$.
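As a small illustration of the projection, the sketch below truncates a lazily generated sequence and projects a finite set of sequences. It uses scalar coordinates for brevity, and the names `pi_n` and `pi_n_of_set` are illustrative assumptions. For a random sequence, the same truncation would be applied outcome by outcome.

```python
from itertools import count, islice

def pi_n(seq, n):
    """Natural projection: keep the first n coordinates of a sequence."""
    return tuple(islice(seq, n))

def pi_n_of_set(A, n):
    """Projection of a set A of sequences: {pi_n(y) : y in A}."""
    return {pi_n(y, n) for y in A}

# First three terms of the infinite sequence 1, 2, 3, ...
prefix = pi_n(count(1), 3)

# Two sequences that agree on their first two coordinates project to one point.
projected = pi_n_of_set([(1, 2, 3), (1, 2, 4)], 2)
```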

### 3.2 Analytical Framework for EA and IPMs

In this section, we present an analytical framework for the EA and IPMs. First, the modeling assumptions are stated. We deal only with operators that generate c.i.i.d. individuals. Then, we present an abstraction of the EA and IPMs. This abstraction serves as the basis for building our framework. Finally, the framework is presented. It unifies the range spaces of the random elements and defines the convergence of IPMs.

#### 3.2.1 Modeling Assumptions

We assume that the EA on the problem (23) is time homogeneous and Markovian, such that the next generation depends only on the current one, and the transition rule from the $k$th generation to the $(k+1)$th generation is invariant with respect to $k\in\mathbb{N}$. We further assume that individuals in the next generation are c.i.i.d. given the current generation. As this assumption is the only extra assumption introduced in the framework, it may need some further explanation.

The main reason for introducing this assumption is to simplify the analysis. Conditional independence implies exchangeability; therefore, individuals in the same generation $k\in\mathbb{N}^+$ are always exchangeable. As a result, it is possible to exploit the symmetry in the population and study the transition equations of marginal distributions. Besides, it is because of conditional independence that we can easily expand the random elements modeling finite-sized populations to random sequences, and therefore define convergence in distribution for random elements of the corresponding metric space. In addition, many real-world operators in EAs satisfy this assumption, such as the proportionate selection operator and the crossover operator analyzed in Qi and Palmieri (1994a,b). Finally, we emphasize that exchangeability is a property that facilitates the analysis of convergence across multiple iterations and different operators. Because of the generational nature of EA, we want a property that leads to convergence for one iteration and also holds as a premise for the analysis of the next iteration. Exchangeability serves as this property in our analysis. By exchangeability the convergence results can be extended to further iterations and stacked with results from other operators.

However, we admit that there are some exceptions to our assumption. A most notable one may be the mutation operator, though it does not pose significant difficulties. The mutation operator perturbs each individual in the current population independently, according to a common conditional p.d.f. If the current population is not exchangeable, then after mutation the resultant population is not exchangeable, either. Therefore, it seems that mutation does not produce c.i.i.d. individuals. However, considering the fact that mutation is often used along with other operators, as long as these other operators generate c.i.i.d. populations, the individuals after mutation will be c.i.i.d., too. Therefore, a combined operator of mutation and any other operator satisfying the c.i.i.d. assumption can satisfy our assumption. An example can be seen in Qi and Palmieri (1994a), where mutation is analyzed together with proportionate selection. On the other hand, an algorithm which only uses mutation is very simple. It can be readily modeled and analyzed without much difficulty.

Perhaps more significant exceptions are operators such as selection *without* replacement, or the crossover operator which produces two dependent offspring at the same time. In fact, for these operators not satisfying the c.i.i.d. assumption, it is still possible to expand the random elements modeling finite-sized population to random sequences. For example, the random elements can be padded with some fixed constants or random elements of known distributions to form the random sequences. In this way, our definition of the convergence of IPMs can still be applied. However, whether in this scenario convergence in distribution for these random sequences can still yield meaningful results similar to the transition equation is another research problem. It may need further investigation. Nonetheless, our assumption is equivalent to the exchangeability assumption generally used in previous studies.

#### 3.2.2 The Abstraction of EA and IPMs

Given the modeling assumptions, we develop an abstraction to describe the population dynamics of the EA and IPMs.

Let the EA with population size $n$ be denoted by $EA_n$, and let the $k$th ($k\in\mathbb{N}$) generation it produces be modeled as a random element $P_k^n=(x_{k,i}^n)_{i=1}^n\in M^n$, where $x_{k,i}^n\in M$ is a random element representing the $i$th individual in $P_k^n$. Without loss of generality, assume that the EA has two operators, $G$ and $H$. In each iteration, the EA first applies $G$ to the current population to generate an intermediate population, to which it then applies $H$ to generate the next population. Notice that here $G$ and $H$ are just terms representing the operators in the real EA. They facilitate describing the evolutionary process. For $EA_n$, $G$ and $H$ are actually *instantiated* as functions from $M^n$ to $M^n$, denoted by $G_n$ and $H_n$, respectively. For example, if $G$ represents proportionate selection, the function $G_n:M^n\to M^n$ is the actual operator in $EA_n$ generating $n$ c.i.i.d. individuals according to the conditional probability (3). Of course, for the above abstraction to be valid, the operators used in $EA_n$ should actually produce random elements in $M^n$; that is, the newly generated population should be measurable on $(S^n,\mathcal{S}^n)$. As most operators in real EAs satisfy this condition, and as it is implicitly assumed in previous studies, we assume that it is automatically satisfied.

#### 3.2.3 The Proposed Framework

As stated before, for each generation $k\in\mathbb{N}$, the elements of the sequence $(P_k^1,P_k^2,\cdots)$ and the limit $P_k^\infty$ are all random elements of different metric spaces. Therefore, the core of developing our model is to expand $P_k^n$ to random sequences, while ensuring that this expansion does not affect the modeling of the evolutionary process of the real EA. The result of this step is the sequence of random sequences $(Q_k^n\in M^\infty)_{k=0}^\infty$ for each $n\in\mathbb{N}_+$, which completely describes the population dynamics of $EA_n$. For the population dynamics of $EA_\infty$, we just let $Q_k^\infty=P_k^\infty$.

The expansion of $P_k^n$ and the relationships between $P_k^n$, $Q_k^n$, and $Q_k^\infty$ are the core of our framework. In the following, we present them rigorously.

#### 3.2.4 The Expansion of $P_k^n$

We start by decomposing each of $G_n$ and $H_n$ into two operators. One operator is from $S^\infty$ to $S^n$. It describes how to convert random sequences to random vectors. A natural choice is the projection operator $\pi_n$.

To model the evolutionary process, we also have to define how to expand random vectors to random sequences. In other words, we have to define the expansions of $G_n$ and $H_n$, which are functions from $S^n$ to $S^\infty$.

Definition 4 (The expansion of operator): For an operator $T_n:M^n\to M^n$ satisfying the condition that for any $x\in M^n$, the elements of $T_n(x)$ are c.i.i.d. given $x$, the expansion of $T_n$ is the operator $\tilde{T}_n:M^n\to M^\infty$, satisfying that for any $x\in M^n$,

1. $T_n(x)=(\pi_n\circ\tilde{T}_n)(x)$.

2. The elements of $\tilde{T}_n(x)$ are c.i.i.d. given $x$.

In Definition 4, the operator $\tilde{T}_n$ is the expansion of $T_n$. Condition 1 ensures that $T_n$ can be safely replaced by $\pi_n\circ\tilde{T}_n$. Condition 2 ensures that the paddings for the sequence are generated according to the same conditional probability distribution as that used by $T_n$ to generate new individuals. In other words, if the operator $\dot{T}_n:M^n\to M$ describes how $T_n$ generates each new individual from the current population, $T_n$ is equivalent to invoking $\dot{T}_n$ independently on the current population $n$ times, and $\tilde{T}_n$ is equivalent to invoking $\dot{T}_n$ independently infinitely many times. Finally, because $T_n$ satisfies the condition in the premise, the expansion $\tilde{T}_n$ always exists.
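The relation between $T_n$, its expansion, and the projection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-individual rule `t_dot` (pick a parent uniformly, add Gaussian noise) is hypothetical and not taken from the article, and the infinite sequence is represented lazily by a generator.

```python
import itertools
import random


def t_dot(population, rng):
    """Hypothetical per-individual rule: pick a parent uniformly, add Gaussian noise."""
    parent = rng.choice(population)
    return parent + rng.gauss(0.0, 0.1)


def expand(population, rng):
    """Expansion of T_n: an endless stream of c.i.i.d. individuals given `population`."""
    while True:
        yield t_dot(population, rng)


def project(sequence, n):
    """Projection pi_n: keep the first n coordinates of a (possibly infinite) sequence."""
    return list(itertools.islice(sequence, n))


def t_n(population, rng):
    """T_n: invoke the per-individual rule once per slot of the finite population."""
    return [t_dot(population, rng) for _ in population]


population = [0.0, 1.0, 2.0, 3.0]
direct = t_n(population, random.Random(42))
via_expansion = project(expand(population, random.Random(42)), len(population))
```

With identical seeds, `direct` and `via_expansion` coincide, which mirrors Condition 1: projecting the expansion reproduces the finite operator.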

By Definition 4, the operators in $EA_n$ can be decomposed as $G_n=\pi_n\circ\tilde{G}_n$ and $H_n=\pi_n\circ\tilde{H}_n$, respectively. Then, the evolutionary process of $EA_n$ can be described by the sequence of random sequences $[Q_k^n=(y_{k,i}^n)_{i=1}^\infty\in M^\infty]_{k=0}^\infty$, satisfying the recurrence equation

$$P_k^n=\pi_n(Q_k^n),\tag{26}$$

$$Q_{k+1}^n=(\tilde{H}_n\circ\pi_n\circ\tilde{G}_n)(P_k^n).\tag{27}$$

Essentially, (26) and (27) describe how the algorithm progresses in the order $\cdots,Q_k^n,P_k^n,Q_{k+1}^n,P_{k+1}^n,\cdots$. They fully characterize the population dynamics $(P_k^n)_k$, and it is clear that the extra step of generating $Q_k^n$ does not introduce modeling errors.
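The progression through $Q_k^n$, $P_k^n$, $Q_{k+1}^n$ can be sketched as a small simulation. The concrete operators here are assumptions made for illustration only: $G$ is taken to be proportionate selection with a made-up positive fitness, and $H$ is a hypothetical operator that resamples a parent uniformly and perturbs it, so that both produce c.i.i.d. individuals given the current population.

```python
import itertools
import random


def pi(sequence, n):
    """Projection: the first n elements of a lazily generated sequence."""
    return list(itertools.islice(sequence, n))


def fitness(x):
    """Assumed positive fitness with its peak at 0 (illustrative only)."""
    return 1.0 / (1.0 + x * x)


def g_tilde(population, rng):
    """Expanded proportionate selection: endless c.i.i.d. draws given `population`."""
    weights = [fitness(x) for x in population]
    while True:
        yield rng.choices(population, weights=weights, k=1)[0]


def h_tilde(population, rng, sigma=0.1):
    """Hypothetical expanded second operator: uniform resampling plus Gaussian noise."""
    while True:
        yield rng.choice(population) + rng.gauss(0.0, sigma)


def run(n, generations, seed=0):
    rng = random.Random(seed)
    q = (rng.uniform(-5.0, 5.0) for _ in itertools.count())  # i.i.d. initial sequence
    history = []
    for _ in range(generations):
        p = pi(q, n)                      # finite population read off the sequence
        p_mid = pi(g_tilde(p, rng), n)    # intermediate population produced by G_n
        q = h_tilde(p_mid, rng)           # next random sequence, generated lazily
        history.append(p)
    return history


history = run(n=5, generations=3)
```

Only the first $n$ coordinates of each sequence are ever consumed, so the lazy padding does not change the finite populations that the simulated algorithm sees.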

In summary, the relationships between $P_k^n$, $Q_k^n$, and $Q_k^\infty$ are illustrated in Figure 2. This is the core of our framework for modeling the EA and IPMs. For clarity, we also show the intermediate populations generated by $G$ (denoted by ${P'}_k^n$), their expansions (denoted by ${Q'}_k^n$), and their counterparts generated by $G_\infty$ (denoted by ${Q'}_k^\infty$), respectively. How they fit in the evolutionary process can be clearly seen in the figure.

In Figure 2, a solid arrow with an operator on it means that the item at the arrow head equals the result of applying the operator to the item at the arrow tail. For example, from the figure it can be read that $Q_1^n=\tilde{H}_n({P'}_0^n)$. A dashed arrow with a question mark on it marks a place where convergence in distribution should be checked. For example, when $k=2$, it should be checked whether $(Q_2^n)_{n=1}^\infty$ converges to $Q_2^\infty$ as $n\to\infty$.

Finally, one distinction needs special notice. For $EA_m$ and $EA_n$ ($m\neq n$), consider the operators that generate $P_k^m$ and $P_k^n$. It is clear that $G_m:M^m\to M^m$ and $G_n:M^n\to M^n$ are two different operators because their domains and ranges are different. The distinction still exists when we consider $Q_k^n$, though it is more subtle and easily overlooked. In Figure 2, if we consider the operator $\hat{G}_n=\tilde{G}_n\circ\pi_n:M^\infty\to M^\infty$, it is clear that $\hat{G}_n$ uses the same mechanism to generate new individuals as the one used in $G_n=\pi_n\circ\tilde{G}_n$, and ${Q'}_k^n=\hat{G}_n(Q_k^n)$ describes the same population dynamics as that generated by ${P'}_k^n=G_n(P_k^n)$. However, if we choose $m\neq n$, $\hat{G}_m$ and $\hat{G}_n$ are both functions from $M^\infty$ to $M^\infty$. Therefore, checking domains and ranges is not enough to distinguish $\hat{G}_m$ from $\hat{G}_n$. It is important to realize that the distinction between $\hat{G}_m$ and $\hat{G}_n$ lies in the *contents* of the functions. $\hat{G}_m$ and $\hat{G}_n$ use $m$ and $n$ individuals in the current population to generate the new population, respectively, although the new population contains an infinite number of individuals. In short, $EA_m$ and $EA_n$ are the EA instantiated with different population sizes. Mathematically, the corresponding population dynamics are modeled by stochastic processes involving *different* operators, even though their domains and ranges may be the same. The same conclusion also holds for the operator $H$.
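A short sketch makes the point about the *contents* of the operators concrete. The function `g_hat` below is a hypothetical stand-in: it reads only the first $n$ elements of an infinite input and then emits an endless stream of uniform draws from them (uniform selection is an assumption chosen for simplicity, not the article's operator).

```python
import itertools
import random


def g_hat(n, seed=0):
    """Build an operator on infinite sequences that reads only the first n elements."""
    def operator(sequence):
        first_n = list(itertools.islice(sequence, n))  # projection step
        rng = random.Random(seed)
        while True:                                    # expansion step
            yield rng.choice(first_n)
    return operator


g_hat_2, g_hat_5 = g_hat(2), g_hat(5)
# Feed both operators the same infinite input (0, 1, 2, ...).
out_2 = list(itertools.islice(g_hat_2(itertools.count()), 1000))
out_5 = list(itertools.islice(g_hat_5(itertools.count()), 1000))
```

Both operators map infinite sequences to infinite sequences, yet `out_2` only ever contains the values 0 and 1, while `out_5` draws from 0 through 4.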

### 3.3 Convergence of IPMs

Given the framework modeling the EA and IPMs, first, we define convergence in distribution for random elements of $S^\infty$. This is standard material. Then, the convergence of IPMs is defined by requiring that the sequence $(Q_k^1,Q_k^2,\cdots)$ converges to $Q_k^\infty$ for every $k\in\mathbb{N}$.

#### 3.3.1 Convergence in Distribution

As $Q_k^n$ are random elements of $S^\infty$, in the following we define convergence in distribution for sequences of $S^\infty$-valued random elements. Convergence in distribution is equivalent to weak convergence of *induced* probability measures of the random elements. We use the former theory because when modeling individuals and populations as random elements, the former theory is more intuitive and straightforward. The following materials are standard. They contain the definition of convergence in distribution for random elements, as well as some useful definitions and theorems which are used in our analysis of the simple EA. Most of the materials are collected from the theorems and examples in Sections 1–3 of Billingsley (1999). The definition of the Prokhorov metric is collected from Section 11.3 in Dudley (2002).

Let $x,y,x_n,n\in\mathbb{N}_+$ be random elements defined on a hidden probability space $(\Omega,\mathcal{F},P)$ taking values in some separable metric space $T$. $T$ is coupled with the Borel $\sigma$-field $\mathcal{T}$. Let $(T',\mathcal{T}')$ be a separable measurable space other than $(T,\mathcal{T})$.

Definition 5 (Convergence in distribution): If the sequence $(x_n)_{n=1}^\infty$ satisfies the condition that $Eh(x_n)\to Eh(x)$ for every bounded, continuous function $h:T\to\mathbb{R}$, we say $(x_n)_{n=1}^\infty$ converges in distribution to $x$, and write $x_n\xrightarrow{d}x$.
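As a small numerical illustration of this definition, the sketch below takes $x_n$ to be uniform on the grid $\{1/n,2/n,\cdots,n/n\}$, which converges in distribution to a uniform random variable on $(0,1)$, and uses the test function $h=\cos$. Both choices are assumptions made only for the example.

```python
import math


def expected_h(n, h):
    """E h(x_n) when x_n is uniform on the grid {1/n, 2/n, ..., n/n}."""
    return sum(h(i / n) for i in range(1, n + 1)) / n


h = math.cos             # bounded and continuous on the real line
limit = math.sin(1.0)    # E h(x) for x uniform on (0, 1): the integral of cos over [0, 1]
errors = [abs(expected_h(n, h) - limit) for n in (10, 100, 1000)]
```

The gap between $Eh(x_n)$ and $Eh(x)$ shrinks as $n$ grows, as the definition requires for this particular $h$; the definition itself demands this for every bounded continuous $h$.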

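Definition 6 (Prokhorov metric): For $\epsilon>0$ and $A\in\mathcal{T}$, let $A^\epsilon=\{y\in T:d(x,y)<\epsilon \text{ for some } x\in A\}$. For random elements $u$ and $v$ of $T$, the Prokhorov metric is $\rho_d(u,v)=\inf\{\epsilon>0:P(u\in A)\le P(v\in A^\epsilon)+\epsilon \text{ for all } A\in\mathcal{T}\}$. It is well known that convergence in distribution on separable metric spaces can be metrized by the Prokhorov metric.

Call a set $A$ in $\mathcal{T}$ an $x$-continuity set if $P(x\in\partial A)=0$, where $\partial A$ is the boundary set of $A$.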

Theorem 7 (The Portmanteau theorem): The following statements are equivalent.

1. $x_n\xrightarrow{d}x$.

2. $\limsup_nP(x_n\in F)\le P(x\in F)$ for all closed sets $F\in\mathcal{T}$.

3. $\liminf_nP(x_n\in G)\ge P(x\in G)$ for all open sets $G\in\mathcal{T}$.

4. $P(x_n\in A)\to P(x\in A)$ for all $x$-continuity sets $A\in\mathcal{T}$.
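The role of closed sets, open sets, and continuity sets in the four statements above can be checked on a toy case. The sketch assumes the deterministic sequence $x_n=1/n$ with limit $x=0$, so every probability is either 0 or 1; the sets $F=\{0\}$, $G=(0,2)$, and $A=(-1,1)$ are chosen only for illustration.

```python
def prob_in(point, indicator):
    """P(x in A) for a random element equal to `point` with probability one."""
    return 1.0 if indicator(point) else 0.0


def closed_f(t):
    return t == 0.0          # F = {0}, a closed set


def open_g(t):
    return 0.0 < t < 2.0     # G = (0, 2), an open set


def continuity_a(t):
    return -1.0 < t < 1.0    # A = (-1, 1); its boundary {-1, 1} has probability 0 under x


ns = range(1, 201)
p_closed = [prob_in(1.0 / n, closed_f) for n in ns]      # always 0, while P(x in F) = 1
p_open = [prob_in(1.0 / n, open_g) for n in ns]          # always 1, while P(x in G) = 0
p_cont = [prob_in(1.0 / n, continuity_a) for n in ns]    # 0 for n = 1, then 1
```

The inequalities for $F$ and $G$ hold but are strict, and the probabilities converge for the continuity set $A$. The set $F=\{0\}$ is not an $x$-continuity set, which is why $P(x_n\in F)$ is allowed to stay away from $P(x\in F)$.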

Theorem 8 (The mapping theorem): Suppose $h:(T,\mathcal{T})\to(T',\mathcal{T}')$ is a measurable function. Denote by $D_h$ the set of discontinuities of $h$. If $x_n\xrightarrow{d}x$ and $P(x\in D_h)=0$, then $h(x_n)\xrightarrow{d}h(x)$.

Let $a,a_n$ be random elements of $T$, and $b,b_n$ be random elements of $T'$; then $(a\ b)^{\top}$ and $(a_n\ b_n)^{\top}$ are random elements of $T\times T'$. Note that $T\times T'$ is separable.

Theorem 9 (Convergence in distribution for product spaces): If $a$ is independent of $b$ and $a_n$ is independent of $b_n$ for all $n\in\mathbb{N}_+$, then $(a_n\ b_n)^{\top}\xrightarrow{d}(a\ b)^{\top}$ if and only if $a_n\xrightarrow{d}a$ and $b_n\xrightarrow{d}b$.

Theorem 9 is adapted from Theorem 2.8 (ii) in Billingsley (1999).

Let $z,z_n,n\in\mathbb{N}_+$ be random elements of $S^\infty$.

Theorem 10 (Finite-dimensional convergence): $z_n\xrightarrow{d}z$ if and only if $\pi_m(z_n)\xrightarrow{d}\pi_m(z)$ for any $m\in\mathbb{N}_+$.

Theorem 10 basically asserts that convergence in distribution for countably infinite dimensional random elements can be studied through their finite-dimensional projections. It is adapted from Example 1.2 and Example 2.4 in Billingsley (1999). In Billingsley (1999), the metric space under consideration is $\mathbb{R}^\infty$. However, as both $\mathbb{R}$ and $S$ are separable, it is not difficult to adapt the proofs for $\mathbb{R}^\infty$ to a proof for Theorem 10. Note that $\pi_m(z)$ are random elements defined on $(\Omega,\mathcal{F},P)$ taking values in $(S^m,\mathcal{S}^m)$, and $P[\pi_m(z)\in A]=P(z\in A\times S\times S\times\cdots)$ for every $A\in\mathcal{S}^m$. The same is true for $\pi_m(z_n)$.

#### 3.3.2 Convergence of IPM

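As convergence in distribution is properly defined, we can use the theory to define convergence of IPMs. The idea is that an IPM is convergent (thus justifiable) if and only if it can predict the limit distribution of the population dynamics of $EA_n$ for *every* generation $k\in\mathbb{N}$ as the population size $n$ goes to infinity. It captures the limiting behaviors of real EAs.

Definition 11 (Convergence of IPM): An infinite population model $EA_\infty$ is convergent if and only if for every $k\in\mathbb{N}$, $Q_k^n\xrightarrow{d}Q_k^\infty$ as $n\to\infty$, where $Q_k^n$ and $Q_k^\infty$ are the random sequences modeling the $k$th generation of $EA_n$ and $EA_\infty$, respectively.

Definition 11 is essentially the core of our proposed framework. It defines the convergence of IPM and is rigorous and clear.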

### 3.4 Summary

In this section, we built a framework to analyze the convergence of IPMs. The most significant feature of the framework is that we model the populations as random sequences, thereby unifying the ranges of the random elements in a common metric space. Then, we gave a rigorous definition for the convergence of IPMs based on the theory of convergence in distribution.

Our framework is general. It only requires that operators produce c.i.i.d. individuals. In fact, any EA and IPM satisfying this assumption can be put into the framework. However, to obtain meaningful results, the convergence of IPMs has to be proved. This may require extra analyses on IPM and the inner mechanisms of the operators. These analyses are presented in Sections 4 and 5.

Finally, there is one question worth discussing. In our framework, the expansion of an operator is carried out by padding the finite population with c.i.i.d. individuals following the *same* marginal distribution. Then a question naturally arises: why not pad the finite population with some other random elements, or just with the constant 0? This idea deserves consideration. After all, if the expansion is conducted by padding 0s, the requirement of c.i.i.d. can be discarded, and the framework and the convergence of IPMs stay the same. However, we did not choose this approach. The reason is that padding the population with c.i.i.d. individuals facilitates the analysis of the IPM. For example, in our analysis in Sections 4 and 5, the sufficient conditions for the convergence of IPMs require us to consider $\Gamma_m(Q_k^n)$, where $\Gamma$ is the operator under analysis. $\Gamma_m$ uses the first $m$ elements of $Q_k^n$ to generate new individuals. Now if $m>n$ and $Q_k^n$ is expanded from $P_k^n$ by padding 0s, $\Gamma_m(Q_k^n)$ does not make any sense because the $m$ individuals used by $\Gamma_m$ include $(m-n)$ 0s. This restricts our options in proving the convergence of IPMs.

## 4 Sufficient Conditions for Convergence of IPMs and I.I.D. IPM Construction

In Section 4.1, we give applicable sufficient conditions for the convergence of IPMs. To appreciate the necessity, consider the framework in Figure 2. To prove the convergence of IPM, by Definition 11, we should check whether $Q_k^n\xrightarrow{d}Q_k^\infty$ as $n\to\infty$ for every $k\in\mathbb{N}$. However, this direct approach is usually not viable. To manually check the convergence for all values of $k$ is wearisome and sometimes difficult. This is because as $k$ increases, the distributions of $Q_k^n$ and $Q_k^\infty$ change. Therefore, the method needed to prove $Q_k^n\xrightarrow{d}Q_k^\infty$ as $n\to\infty$ may be different from the method needed to prove $Q_{k+1}^n\xrightarrow{d}Q_{k+1}^\infty$ as $n\to\infty$. Of course, after proving the cases for several values of $k$, it may be possible to discover some patterns in the proofs, which can be extended to cover other values of $k$, thus proving the convergence of the IPM. But this process is still tedious and uncertain.

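In view of this, we take a different approach. First, the convergence of *one* iteration step for *each* operator is proved. Then, the results are combined and extended to cover the whole population dynamics. The idea is that if the convergence holds for one generation number $k$, then it can be passed on automatically to all subsequent generations. For example, in Figure 2, consider the operators $G_\infty$ and $\tilde{G}_n\circ\pi_n$. The first step is to prove that convergence at generation $k$ is passed on through this operator, that is, that $Q_k^n\xrightarrow{d}Q_k^\infty$ implies $(\tilde{G}_n\circ\pi_n)(Q_k^n)\xrightarrow{d}G_\infty(Q_k^\infty)$ as $n\to\infty$.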

However, this approach still seems difficult because we have to prove this pass-on relation (30) holds for every $k$. In essence, this corresponds to whether the operators in IPM can be stacked together and iterated for any number of steps. This is the issue of the stacking of operators and iterating the algorithm. Therefore, in Section 4.1, we give sufficient conditions for this to hold. These conditions are important. If they hold, proving the convergence of the overall IPM can be broken down to proving the convergence of one iteration step of each operator in IPM. This greatly reduces the difficulty in deriving the proof.

To model real EAs, IPM has to be constructed reasonably. As shown in Section 2, exchangeability cannot yield the transition equation for the simple EA. This creates the research problem of finding a suitable modeling assumption to derive IPM. Therefore, in Section 4.2, we discuss the issue and propose to use i.i.d. as the modeling assumption in IPM.

### 4.1 Sufficient Conditions for Convergence of IPMs

To derive sufficient conditions for the convergence of the overall IPM, the core step is to derive conditions under which the operators in the IPM can be stacked and iterated.

As before, let $EA_n$ and $EA_\infty$ denote the EA with population size $n$ and the IPM under analysis, respectively. Let $\Gamma$ be an operator in the EA, and $\Gamma_n:M^\infty\to M^\infty$ and $\Gamma_\infty:M^\infty\to M^\infty$ be its corresponding expanded operators in $EA_n$ and $EA_\infty$, respectively. Note that $\Gamma_n$ and $\Gamma_\infty$ generate random elements of $S^\infty$. To give an example, $\Gamma_n$ and $\Gamma_\infty$ may correspond to $\tilde{G}_n\circ\pi_n$ and $G_\infty$ in Figure 2, respectively.

We define a property under which $\Gamma_\infty$ can be stacked with some other operator $\Psi_\infty$ satisfying the same property without affecting the convergence of the overall IPM. In other words, for an EA using $\Psi$ and $\Gamma$ as its operators, we can prove the convergence of IPM by studying $\Psi$ and $\Gamma$ separately. We call this property “the stacking property.” It is worth noting that if $\Psi=\Gamma$, then this property guarantees that $\Gamma_\infty$ can be iterated any number of times. Therefore, it also resolves the issue of iterating the algorithm.

Let $A_\alpha$ be random elements in $M^\infty$ for $\alpha\in\mathbb{N}_+\cup\{\infty\}$. We have the following results.

Definition 12 (The stacking property): Given $U\subset M^\infty$, if for any converging sequence $A_n\xrightarrow{d}A_\infty\in U$, $\Gamma_n(A_n)\xrightarrow{d}\Gamma_\infty(A_\infty)\in U$ as $n\to\infty$ always holds, then we say that $\Gamma_\infty$ has the stacking property on $U$.

Theorem 13: If $\Psi_\infty$ and $\Gamma_\infty$ have the stacking property on $U$, then $\Psi_\infty\circ\Gamma_\infty$ has the stacking property on $U$.

Proof: For any converging sequence $A_n\xrightarrow{d}A_\infty\in U\subset M^\infty$, because $\Gamma_\infty$ has the stacking property on $U$, we have $\Gamma_n(A_n)\xrightarrow{d}\Gamma_\infty(A_\infty)\in U$. Then, $(\Gamma_n(A_n))_n$ is also a converging sequence. Since $\Psi_\infty$ has the stacking property on $U$, by definition we immediately have $(\Psi_n\circ\Gamma_n)(A_n)\xrightarrow{d}(\Psi_\infty\circ\Gamma_\infty)(A_\infty)\in U$. $\square$

By Theorem 13, any composition of $\Psi_\infty$ and $\Gamma_\infty$ has the stacking property on $U$. In particular, $(\Gamma_\infty)^m$ has the stacking property on $U$. The stacking property essentially guarantees that the convergence on $U$ can be passed on to subsequent generations.

Theorem 14 (Sufficient condition 1): For an EA consisting of a single operator $\Gamma$, let $\Gamma$ be modeled by $\Gamma_\infty$ in the IPM $EA_\infty$, and let $\Gamma_\infty$ have the stacking property on some space $U\subset M^\infty$. If the initial populations of both the EA and $EA_\infty$ follow the same distribution $P_X$ for some $X\in U$, then $EA_\infty$ converges.

Proof: Note that for $EA_n$ and $EA_\infty$, the $k$th populations they generate are $(\Gamma_n)^k(X)$ and $(\Gamma_\infty)^k(X)$, respectively. By Theorem 13, $(\Gamma_\infty)^k$ has the stacking property on $U$. Because the sequence $(X,X,\cdots)$ converges to $X\in U$, by Definition 12, $(\Gamma_n)^k(X)\xrightarrow{d}(\Gamma_\infty)^k(X)\in U$ as $n\to\infty$. Since this holds for any $k\in\mathbb{N}$, by Definition 11, $EA_\infty$ converges. $\square$

By Theorems 13 and 14, we can prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property. Comparing with (30), it is clear that the stacking property is a sufficient condition. This is because the stacking property requires that $(\Gamma_n(A_n))_n$ converges to a point in $U$ for *any* converging sequence $(A_n)_n$ satisfying $A_n\xrightarrow{d}A_\infty\in U$, while (29) requires the convergence to hold only for the *specific* converging sequence $(Q_k^n)_n$. Since $(Q_k^n)_n$ is generated by the algorithm, it may have special characteristics regarding convergence rate, distributions, etc. On the other hand, checking the stacking property may be easier than proving (29). This is because the stacking property is independent of the generation number $k$.

Another point worth discussing is the introduction of $U$ in Definition 12. Of course, if we omit $U$ (or equivalently let $U=M^\infty$), the stacking property will become “stronger” because if it holds, the convergence of the IPM is proved for the EA starting from *any* initial population. However, in that case the condition is so restrictive that the stacking property cannot be proved for many operators.

In Definition 12, it is required that $\Gamma_n(A_n)\xrightarrow{d}\Gamma_\infty(A_\infty)\in U$ as $n\to\infty$. The sequence under investigation is $(\Gamma_n(A_n))_n$, which is a sequence of changing operators $(\Gamma_n)_n$ on a sequence of changing inputs $(A_n)_n$. As both the operators and the inputs change, the convergence of $(\Gamma_n(A_n))_n$ may still be difficult to prove. Therefore, in the following, we further derive two sufficient conditions for the stacking property.

First, let $B_{\alpha,\beta}=\Gamma_\beta(A_\alpha)$, where $\alpha,\beta\in\mathbb{N}_+\cup\{\infty\}$. Then, we have the following sufficient conditions for the stacking property.

Theorem 15 (Sufficient condition 2): For a space $U$ and all converging sequences $A_n\xrightarrow{d}A_\infty\in U$, if the following two conditions

1. $\exists M\in\mathbb{N}_+$, such that for all $m>M$, $B_{n,m}\xrightarrow{d}B_{\infty,m}$ uniformly as $n\to\infty$, i.e., $\sup_{m>M}\rho_d(B_{n,m},B_{\infty,m})\to0$ as $n\to\infty$,

2. $B_{\infty,m}\xrightarrow{d}B_{\infty,\infty}\in U$ as $m\to\infty$,

are both met, then $\Gamma_\infty$ has the stacking property on $U$.

Theorem 16 (Sufficient condition 3): For a space $U$ and all converging sequences $A_n\xrightarrow{d}A_\infty\in U$, if the following two conditions

1. $\exists N\in\mathbb{N}_+$, such that for all $n>N$, $B_{n,m}\xrightarrow{d}B_{n,\infty}$ uniformly as $m\to\infty$, i.e., $\sup_{n>N}\rho_d(B_{n,m},B_{n,\infty})\to0$ as $m\to\infty$,

2. $B_{n,\infty}\xrightarrow{d}B_{\infty,\infty}\in U$ as $n\to\infty$,

are both met, then $\Gamma_\infty$ has the stacking property on $U$.

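Since Theorems 15 and 16 are symmetric in $m$ and $n$, proving one of them leads to the other. In the following, we prove Theorem 15. Recall that $\rho_d$ is the Prokhorov metric (Definition 6) and $\vee$ denotes the maximum.

Proof: For any $\epsilon>0$, by condition 1 in Theorem 15, $\exists N$ s.t. $\sup_{m>M}\rho_d(B_{n,m},B_{\infty,m})<\frac{1}{2}\epsilon$ for all $n>N$. By condition 2 in Theorem 15, $\exists\tilde{M}$ s.t. $\rho_d(B_{\infty,m},B_{\infty,\infty})<\frac{1}{2}\epsilon$ for all $m>\tilde{M}$. Now for all $l>M\vee N\vee\tilde{M}$, by the triangle inequality,

$$\rho_d(B_{l,l},B_{\infty,\infty})\le\rho_d(B_{l,l},B_{\infty,l})+\rho_d(B_{\infty,l},B_{\infty,\infty})<\tfrac{1}{2}\epsilon+\tfrac{1}{2}\epsilon=\epsilon.$$

Therefore, $B_{n,n}\xrightarrow{d}B_{\infty,\infty}\in U$ as $n\to\infty$. $\square$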

To understand these two theorems, consider the relationships between $A_\alpha$ and $B_{\alpha,\beta}$ illustrated by Figure 3. In the figure, the solid arrow represents the premise in Definition 12; that is, $A_n\xrightarrow{d}A_\infty\in U$ as $n\to\infty$. The double line arrow represents the direction to be proved for the stacking property on $U$, i.e., $B_{n,n}\xrightarrow{d}B_{\infty,\infty}\in U$ as $n\to\infty$. The dashed arrows are the directions to be checked for Theorem 15 to hold. The wavy arrows are the directions to be checked for Theorem 16 to hold.

Now it is clear that Theorems 15 and 16 bring benefits. For example, for Theorem 15, instead of proving the convergence for a sequence generated by changing operators and inputs ($B_{n,n}\xrightarrow{d}B_{\infty,\infty}$), this sufficient condition considers the convergence of sequences generated by the *same* operator on changing inputs ($B_{n,m}\xrightarrow{d}B_{\infty,m}$) and of the sequence generated by changing operators on the *same* input ($B_{\infty,m}\xrightarrow{d}B_{\infty,\infty}$).

The reason we introduce $M$ and $N$ in Theorems 15 and 16, respectively, is to exclude some of the starting columns and rows in Figure 3, if necessary. This is useful in proving the convergence of the IPM of the $k$-ary recombination operator.
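The need for *uniform* convergence in these sufficient conditions can be seen in a deterministic toy case. The sketch below assumes that every $B_{\alpha,\beta}$ is a point mass on the real line, in which case the Prokhorov distance between the point masses at $x$ and $y$ is $\min(|x-y|,1)$. The two arrays `good` and `bad` are made-up examples, and the supremum is approximated over a finite range of $m$.

```python
def rho(x, y):
    """Prokhorov distance between the point masses at x and y."""
    return min(abs(x - y), 1.0)


def sup_gap(b, b_inf_m, n, m_values):
    """Largest rho(B_{n,m}, B_{inf,m}) over the sampled m; a stand-in for the supremum."""
    return max(rho(b(n, m), b_inf_m(m)) for m in m_values)


def good(n, m):
    return 1.0 / n + 1.0 / m      # B_{inf,m} = 1/m and B_{inf,inf} = 0


def good_inf_m(m):
    return 1.0 / m


def bad(n, m):
    return m / (m + n)            # B_{inf,m} = 0 for each fixed m, and B_{inf,inf} = 0


def bad_inf_m(m):
    return 0.0


m_values = range(1, 100001, 7)
n = 1000
good_sup = sup_gap(good, good_inf_m, n, m_values)   # about 1/n: uniform in m
bad_sup = sup_gap(bad, bad_inf_m, n, m_values)      # close to 1: not uniform in m
good_diag = rho(good(n, n), 0.0)                    # 2/n, so the diagonal converges
bad_diag = rho(bad(n, n), 0.0)                      # 1/2 for every n
```

In the second example each column converges and the bottom row converges, yet the diagonal stays at $1/2$. The uniformity requirement in condition 1 is what rules this case out.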

### 4.2 The I.I.D. Assumption

In this section, we address the issue of how to construct IPM. This issue also corresponds to how to choose the space $U$ for the stacking property.

Before introducing the i.i.d. assumption, let us give an example. Consider the space $U=\{x\in M^\infty\mid P[x=(c,c,\cdots)]=1 \text{ for some } c\in S\}$. If the initial population follows some distribution from $U$, then the population consists of all identical individuals. If an EA with proportionate selection and crossover operates on this initial population, then all subsequent populations stay the same as the initial population. An IPM of this EA can be easily constructed, and it can be easily proved that the stacking property holds as long as the EA chooses its initial population from $U$. However, this is not a very interesting case. This is because $U$ is too small to model real EAs.
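The degenerate example above is easy to reproduce. The following sketch assumes a made-up positive fitness function and uses uniform crossover as the crossover operator; neither choice is specified by the example itself.

```python
import random


def proportionate_select(population, fitness, rng):
    """Draw one individual with probability proportional to its fitness."""
    weights = [fitness(x) for x in population]
    return rng.choices(population, weights=weights, k=1)[0]


def uniform_crossover(parent_a, parent_b, rng):
    """Each coordinate of the child is copied from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def next_generation(population, fitness, rng):
    return [
        uniform_crossover(
            proportionate_select(population, fitness, rng),
            proportionate_select(population, fitness, rng),
            rng,
        )
        for _ in population
    ]


def fitness(x):
    return 1.0 + sum(v * v for v in x)   # assumed positive fitness


rng = random.Random(0)
c = (0.3, -1.2, 2.0)
population = [c] * 6                      # all individuals identical
for _ in range(10):
    population = next_generation(population, fitness, rng)
```

After ten generations the population still consists of six copies of `c`, since selection can only pick `c` and crossing `c` with itself returns `c`.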

On the other hand, if $U=\{x\in M^\infty\mid x \text{ is exchangeable}\}$, $U$ may be too big to derive meaningful results. This can be seen from our analysis in Section 2, which shows that under exchangeability it is not possible to derive transition equations of marginal distributions for the simple EA.

Therefore, choosing $U$ should strike a balance between the capacity and the complexity of the IPM. In the following analysis, we choose $U$ to be $U_I=\{x\in M^\infty\mid x \text{ is i.i.d.}\}$. IPMs of EAs are constructed using the i.i.d. assumption, and we prove the convergence of the overall IPM by proving that the operators in the IPM have the stacking property on $U_I$.

We choose $U_I$ for the following reasons. First, in the real world, many EAs generate i.i.d. initial populations. Therefore this assumption is realistic. Secondly, i.i.d. random elements have the same marginal distributions. Therefore, IPM can be described by transition equations of marginal distributions. Finally, there is abundant literature on the convergence laws and limit theorems of i.i.d. sequences. Therefore, the difficulty in constructing IPM can be greatly reduced compared with using other modeling assumptions.

In the following, we show how to construct IPM under the i.i.d. assumption. This process also relates to condition 2 in Theorem 15. It essentially describes how the IPM generates new populations.

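Let $\Gamma_m$ generate c.i.i.d. outputs, so that *each* output can be described by the conditional p.d.f. $f_{\Gamma_m}(x|y_1,y_2,\cdots,y_m)$. Let $a=(a_i)_{i=1}^\infty\in M^\infty$ be the input and $b=(b_i)_{i=1}^\infty=\Gamma_m(a)$ be the output, then the distribution of $b$ can be completely described by its *finite-dimensional* p.d.f.s

$$f_{\pi_l(b)}(x_1,\cdots,x_l)=\int\cdots\int_{S^m}\prod_{i=1}^{l}f_{\Gamma_m}(x_i|y_1,\cdots,y_m)\cdot f_{\pi_m(a)}(y_1,\cdots,y_m)\,dy_1\cdots dy_m\tag{31}$$

for every $l\in\mathbb{N}_+$.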

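The IPM $\Gamma_\infty$ is then constructed by considering how $\Gamma_m$ generates *each* new individual on i.i.d. inputs as $m\to\infty$. Let the resulting transition equation be denoted by (32).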

To better understand the construction, it is important to realize that for $\Gamma_\infty$ *both* the input and the output are i.i.d. In other words, $\Gamma_\infty$ generates i.i.d. population dynamics to simulate the real population dynamics produced by $\Gamma$, only that the transition equation in $\Gamma_\infty$ is derived by mimicking how $\Gamma$ generates each new individual on i.i.d. inputs and taking the population size to infinity. In fact, if the stacking property on $U_I$ is proved and the initial population is i.i.d., $\Gamma_\infty$ will always take i.i.d. inputs and produce i.i.d. outputs. The behaviors of $\Gamma_\infty$ on $U_I$ are well-defined. On the other hand, $\Gamma_\infty(A)$ for $A\notin U_I$ is not defined in the construction. This leaves us freedom. We can define $\Gamma_\infty(A)$ for $A\notin U_I$ freely to facilitate proving the stacking property of $\Gamma_\infty$. In particular, $B_{n,\infty}$ for $n\in\mathbb{N}_+$ in Figure 3 can be defined freely to facilitate the analysis.

In fact, under the i.i.d. assumption, deriving the transition equation for most operators is the easy part. The more difficult part is to prove the stacking property of $\Gamma_\infty$ on $U_I$. To give an example, consider the transition equation (5) constructed in Qi and Palmieri (1994a), which models the joint effects of proportionate selection and mutation. As our analysis in Section 2 shows, it does not hold under the assumption of exchangeability. However, if the modeling assumption is i.i.d., the transition equation can be immediately proved (see our analysis in Section 2). This also applies to the transition equation built by the same authors for the uniform crossover operator (in Theorem 1 of Qi and Palmieri, 1994b), where the transition equation is in fact constructed under the i.i.d. assumption. Therefore, in the following analyses, we do not refer to the explicit form of the transition equation, unless it is needed. We only assume that the transition equation is successfully constructed, and it has the form (32) which is derived from (31) as $m\to\infty$.

The construction of the IPM also relates partly to condition 2 in Theorem 15. Comparing with this condition, it can be seen that for a successfully constructed $\Gamma_\infty$, the following two facts are proved in the construction (m.p.w. stands for point-wise convergence of marginal p.d.f.s).

1. $B_{\infty,m}\xrightarrow{m.p.w.}B_{\infty,\infty}$ as $m\to\infty$.

2. $B_{\infty,\infty}\in U_I$.

Of course, these two facts are not sufficient for this condition to hold. One still needs to prove $B_{\infty,m}\xrightarrow{d}B_{\infty,\infty}$ as $m\to\infty$. In other words, one has to consider convergence of finite-dimensional distributions.

Finally, we sometimes use $\mathbf{x}$ for $x_1,\cdots,x_l$ if $l$ is clear in the context. For example, (31) can be rewritten as $f_{\pi_l(b)}(\mathbf{x})=\int\cdots\int_{S^m}\prod_{i=1}^{l}f_{\Gamma_m}(x_i|\mathbf{y})\cdot f_{\pi_m(a)}(\mathbf{y})\,d\mathbf{y}$, where $\mathbf{y}$ stands for $y_1,\cdots,y_m$; here $m$ takes the place of $l$ and is the population size on which the operator operates.

## 5 Analysis of the Simple EA

### 5.1 Analysis of the Mutation Operator

Having derived sufficient conditions for the stacking property and constructed the IPM, we prove the convergence of the IPM of the mutation operator first. Mutation adds an i.i.d. random vector to each individual in the population. If the current population is $A\in M^\infty$, then the population after mutation satisfies $\mathcal{L}[B=\Gamma_m(A)]=\mathcal{L}(A+X)$ for all $m\in\mathbb{N}_+$, where $X\in U_I$ is a random element decided by the mutation operator. As the content of the mutation operator does not depend on $m$, we just write $\Gamma$ to represent $\Gamma_m$. To give an example, $X$ may be the sequence $(x_1,x_2,\cdots)$ with all $x_i\in M$ mutually independent and $x_i\sim\mathcal{N}(0,I_d)$ for all $i\in\mathbb{N}_+$, where $\mathcal{N}(a,B)$ is the multivariate normal distribution with mean $a$ and covariance matrix $B$, and $I_d$ is the $d$-dimensional identity matrix. Note that every time $\Gamma$ is invoked, it generates perturbations independently. For example, let $A_1$ and $A_2$ be two populations, then we can write $\Gamma(A_i)=A_i+X_i$ for $i=1,2$ satisfying $\mathcal{L}(X_1)=\mathcal{L}(X_2)=\mathcal{L}(X)$, where $\{X_i\}_{i=1,2}$ are mutually independent and independent of $\{A_i\}_{i=1,2}$.

Next, consider $\Gamma_\infty$. Recall that as an IPM, $\Gamma_\infty$ simulates real population dynamics by taking i.i.d. inputs and producing i.i.d. outputs. If the marginal p.d.f.s of $A$ and $X$ are $f_a$ and $f_x$, respectively, then $\Gamma_\infty(A)$ generates i.i.d. individuals whose p.d.f.s are $f_a*f_x$, where $*$ stands for convolution. Given the construction, we can prove the stacking property of $\Gamma_\infty$.
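The convolution form of the mutation transition can be checked numerically in one dimension. The sketch below assumes Gaussian marginals, with the input population distributed as a normal with mean 1 and standard deviation 0.5 and the perturbation as a standard normal; these parameters are illustrative. It approximates the convolution integral by a Riemann sum and compares it with the known closed form for the sum of two independent normal variables.

```python
import math


def normal_pdf(t, mean, std):
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def convolve_at(t, f_a, f_x, lo=-12.0, hi=12.0, step=0.01):
    """Riemann-sum approximation of (f_a * f_x)(t), the integral of f_a(s) f_x(t - s) ds."""
    count = int(round((hi - lo) / step))
    return step * sum(
        f_a(lo + j * step) * f_x(t - (lo + j * step)) for j in range(count + 1)
    )


def f_a(s):
    return normal_pdf(s, 1.0, 0.5)    # assumed marginal p.d.f. of the input population


def f_x(s):
    return normal_pdf(s, 0.0, 1.0)    # assumed marginal p.d.f. of the perturbation


points = [-1.0, 0.0, 1.0, 2.5]
numeric = [convolve_at(t, f_a, f_x) for t in points]
# Sum of independent normals: means add and variances add.
closed_form = [normal_pdf(t, 1.0, math.sqrt(0.5 ** 2 + 1.0 ** 2)) for t in points]
```

The numerical values agree with the closed form to high precision, which is the one-dimensional, Gaussian instance of the marginal p.d.f. $f_a*f_x$ produced by $\Gamma_\infty$.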

Theorem 17 (Mutation): Let $\Gamma$ be the mutation operator, and $\Gamma_\infty$ be the corresponding operator in the IPM constructed under the i.i.d. assumption, then $\Gamma_\infty$ has the stacking property on $U_I$.

Proof: We use the notations and premises in Theorem 15. Refer to Figure 3. In particular, the sequence $(A_n)$ and the limit $A_\infty$ are given and $A_n\xrightarrow{d}A_\infty\in U_I$ as $n\to\infty$.

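Because the mutation operator does not depend on $m$, $B_{\infty,m}=\Gamma(A_\infty)$ for every $m\in\mathbb{N}_+$. Since $A_\infty\in U_I$ and the perturbation is i.i.d. and independent of $A_\infty$, $\Gamma(A_\infty)$ is i.i.d. with marginal p.d.f. $f_a*f_x$, which is the distribution of $B_{\infty,\infty}=\Gamma_\infty(A_\infty)$ by construction. Therefore, condition 2 in Theorem 15 is satisfied.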

Noting that condition 1 in Theorem 15 is equivalent to $\Gamma(A_n)\xrightarrow{d}\Gamma(A_\infty)$, we prove this condition by proving that $\pi_i[\Gamma(A_n)]\xrightarrow{d}\pi_i[\Gamma(A_\infty)]$ for all $i\in\mathbb{N}_+$. Then by Theorem 10, condition 1 in Theorem 15 holds. Then, as both conditions in Theorem 15 are satisfied, this theorem is proved.

Now, we prove $\pi_i[\Gamma(A_n)]\xrightarrow{d}\pi_i[\Gamma(A_\infty)]$ for all $i\in\mathbb{N}_+$. First, note that $\Gamma(A_\alpha)=A_\alpha+X_\alpha$ for all $\alpha\in\mathbb{N}_+\cup\{\infty\}$. $\{X_\alpha\in M^\infty\}$ are i.i.d. and independent of $\{A_\alpha\in M^\infty\}$. In addition, for every $\alpha$, $\mathcal{L}(X_\alpha)=\mathcal{L}(X)$.

Since $\mathcal{L}(X_\alpha)=\mathcal{L}(X)$, it is apparent that $X_n\xrightarrow{d}X_\infty$. Then by Theorem 10, we have $\pi_i(X_n)\xrightarrow{d}\pi_i(X_\infty)$ and $\pi_i(A_n)\xrightarrow{d}\pi_i(A_\infty)$.

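Since $\pi_i(A_n)$ is independent of $\pi_i(X_n)$ for all $n$, and $\pi_i(A_\infty)$ is independent of $\pi_i(X_\infty)$, by Theorem 9, it follows that

$$(\pi_i(A_n)\ \ \pi_i(X_n))^{\top}\xrightarrow{d}(\pi_i(A_\infty)\ \ \pi_i(X_\infty))^{\top}.$$

Since $\pi_i[\Gamma(A_\alpha)]=\pi_i(A_\alpha)+\pi_i(X_\alpha)$ and addition is a continuous function on the product space, by Theorem 8, $\pi_i[\Gamma(A_n)]\xrightarrow{d}\pi_i[\Gamma(A_\infty)]$ for any $i\in\mathbb{N}_+$. $\square$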

In the proof, we concatenate the input ($A_n$) and the randomness ($X_n$) of the mutation operator in a common product space, and represent $\Gamma$ as a continuous function in that space. This technique is also used when analyzing other operators.

### 5.2 Analysis of $k$-ary Recombination

*with* replacement. Assume the current population consists of $\{x_i\}_{i=1}^m$ and the selected $k$ parents are $\{y_i\}_{i=1}^k$; then $\{y_i\}_{i=1}^k$ follows the probability:

*joint* distribution of $(U_i)_i$ is decided by the inner mechanism of $\Gamma$. Overall, $\Gamma_m$ generates the next population by repeatedly using this procedure to generate new individuals independently.

Consider the IPM $\Gamma_\infty$. As stated in Section 4.2, we do not give the explicit form of the transition equation in $\Gamma_\infty$. We assume that the IPM is successfully constructed, and that the transition equation is derived by taking $m \to \infty$ in (31). We take this approach not only because deriving the transition equation is generally easier than proving the convergence of the IPM, but also because the formulation in (36) and (37) encompasses many real-world $k$-ary recombination operators. We do not delve into the mechanisms of these operators or derive a transition equation for each of them. Instead, our approach is general: as long as the IPM is successfully constructed, our analysis of the convergence of the IPM can always be applied.

The following theorem is the primary result of our analysis for the $k$-ary recombination operator.

Theorem ($k$-ary recombination): Let $\Gamma$ be the $k$-ary recombination operator, and let $\Gamma_\infty$ be the corresponding operator in the IPM constructed under the i.i.d. assumption. Then $\Gamma_\infty$ has the stacking property on $U_I$.

Proof. We use the notation and premises of Theorem 16 (refer to Figure 3). In particular, the sequence $(A_n)$ and the limit $A_\infty$ are given, and $A_n \xrightarrow{d} A_\infty \in U_I$ as $n \to \infty$.

[…] Theorem 10, the conclusion follows.

The overall idea to prove (38) is that we first prove the convergence in distribution for the $k \cdot i$ selected parents; then, because the recombination operator is continuous, (39) follows.

First, we decompose the operator $\pi_i \circ \Gamma_m : M^\infty \to M^i$. The operator $\pi_i \circ \Gamma_m$ generates the $i$ c.i.i.d. outputs one by one. This generation process can also be viewed as first selecting the $i$ groups of $k$ parents at once from the first $m$ elements of the input (the intermediate output consists of $k \cdot i$ parents in total, not necessarily distinct), then producing the $i$ outputs one by one using each group of $k$ parents. In the following, we describe this process mathematically.
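The two-stage view of this generation process can also be sketched in code. This is an illustration under assumptions, not the formal construction used in the proof: individuals are taken to be real vectors, parents are drawn uniformly with replacement from the first $m$ elements, and the combining step is a coordinate-wise uniform crossover chosen only as one example of a $k$-ary recombination. The names `select_parents`, `combine` and `first_offspring` are hypothetical stand-ins for the selection stage, the combination stage and their composition.

```python
import random


def select_parents(population, m, k, i, rng):
    """Selection stage: draw i groups of k parents at once, uniformly with
    replacement, from the first m elements of the input sequence."""
    pool = population[:m]
    return [[rng.choice(pool) for _ in range(k)] for _ in range(i)]


def combine(parents, rng):
    """Combination stage (illustrative): coordinate-wise uniform crossover,
    where each coordinate is copied from one of the k parents chosen at random."""
    dim = len(parents[0])
    return tuple(rng.choice(parents)[d] for d in range(dim))


def first_offspring(population, m, k, i, rng):
    """Produce the first i offspring: select all k*i parents, then combine
    each group of k parents into one offspring."""
    groups = select_parents(population, m, k, i, rng)
    return [combine(group, rng) for group in groups]


rng = random.Random(0)
population = [(rng.random(), rng.random()) for _ in range(50)]
offspring = first_offspring(population, m=20, k=3, i=4, rng=rng)
print(len(offspring))  # one offspring per group of parents
```

Note that `select_parents` never inspects the values of the individuals, only their positions; this separation is what lets the selection stage be analyzed independently of the combination stage.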

To prove (44), we prove the following two conditions.

1. $\exists N \in \mathbb{N}_+$ such that for all $n > N$, $\Phi_m(A_n) \xrightarrow{d} \Phi_\infty(A_n)$ uniformly as $m \to \infty$; that is, $\sup_{n>N} \rho_d[\Phi_m(A_n), \Phi_\infty(A_n)] \to 0$ as $m \to \infty$.

2. $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$ as $n \to \infty$, and $\Phi_\infty(A_\infty)$ is i.i.d.

These two conditions correspond to the conditions in Theorem 16. Since $\Phi_\alpha$ maps $M^\infty$ to $M^{k \cdot i}$, we cannot directly apply Theorem 16. However, it is easy to extend the proof of Theorem 16 to show that these two conditions lead to $\Phi_n(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$ as $n \to \infty$. Then, by (40), it is apparent that $\Psi$ is a continuous function of its input and inner randomness. By concatenating the input and the inner randomness using the same technique as in the proof of Theorem 17, (44) can be proved, and the theorem follows.

In the remainder of the proof, we prove conditions 1 and 2. These conditions can be understood by replacing the top line with $\Phi_m$ in Figure 3. $\square$

### Proof of Condition 2

Since $\Phi_\infty = \pi_{k \cdot i} : S^\infty \to S^{k \cdot i}$ (recall that $\pi_{k \cdot i}$ can be viewed both as a mapping from $S^\infty$ to $S^{k \cdot i}$ and from $M^\infty$ to $M^{k \cdot i}$), $\Phi_\infty$ is continuous (see Example 1.2 in Billingsley, 1999). Since $A_n \xrightarrow{d} A_\infty$, by Theorem 8, $\Phi_\infty(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$. Apparently, $\Phi_\infty(A_\infty)$ is i.i.d. Therefore, condition 2 is proved.

It is worth noting that this simple proof comes partly from our extension of $\Psi \circ \Phi_\infty$ to inputs $A \notin U_I$. In fact, the only requirement for $\Phi_\infty$ is (42); that is, $\Psi \circ \Phi_\infty$ should model $\pi_i \circ \Gamma_\infty$ on *i.i.d.* inputs. By defining $\Phi_\infty$ to be $\pi_{k \cdot i}$, it can take non-i.i.d. inputs such as $A_n$. Thus this condition can be proved. In Figure 3, this corresponds to our freedom of defining $B_{n,\infty}$, $n \in \mathbb{N}_+$.

### Proof of Condition 1

To prove condition 1, we first give another representation of $\Phi_m(A_\alpha)$, where $m > k \cdot i$ and $\alpha \in \mathbb{N}_+ \cup \{\infty\}$. This representation is based on the following mutually exclusive cases.

The $k \cdot i$ parents chosen from $A_\alpha$ by $\Phi_m$ are distinct.

There are duplicates among the $k \cdot i$ parents chosen from $A_\alpha$ by $\Phi_m$.

*conditional* distribution of the $k \cdot i$ parents when $s_{m,\alpha} = 1$, and $y_{m,\alpha} \in M^{k \cdot i}$ follow the *conditional* distribution of the $k \cdot i$ parents when $s_{m,\alpha} = 0$, then $\Phi_m(A_\alpha)$ can be further represented as

*distinct* individuals from the current *exchangeable* population $A_\alpha$. Also note that $\{s_{m,\alpha}\}_\alpha$ are i.i.d. random variables. They are independent of $x_{m,\alpha}$ and $y_{m,\alpha}$.

[…] Theorem 7, $|P[\Phi_\infty(A_n) \in A] - P[\Phi_\infty(A_\infty) \in A]| \to 0$. Then apparently (51) converges to 0. Noting that $A$ is arbitrary, by applying 4) in Theorem 7 again, $\Phi_n(A_n) \xrightarrow{d} \Phi_\infty(A_\infty)$ is proved. $\square$

We give a brief discussion of the proof. In our opinion, the most critical step of our proof is decomposing the $k$-ary recombination operator into two suboperators: one is responsible for selecting parents ($\Phi$), the other for combining them ($\Psi$). In addition, for parent selection, the suboperator does *not* use the information of fitness values. Rather, it selects parents “blindly” according to its own rules (uniform sampling with replacement). This makes the operator $\Phi$ easier to analyze because the way it selects parents does not rely on its input. Therefore, we can prove uniform convergence in (50).
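The benefit of “blind” selection can be illustrated with a small computation. Because parents are drawn uniformly with replacement, the probability that all $k \cdot i$ selected parents come from distinct positions in the population depends only on $m$ and $k \cdot i$, not on the content of the population. It equals $\prod_{j=0}^{k \cdot i - 1} (1 - j/m)$, which tends to 1 as $m \to \infty$. The sketch below evaluates this product exactly; the function name `prob_all_distinct` and the chosen values of $k$, $i$ and $m$ are illustrative assumptions, and the code is not part of the original proof.

```python
from fractions import Fraction


def prob_all_distinct(m, draws):
    """Probability that `draws` uniform draws with replacement from m
    positions all land on distinct positions."""
    if draws > m:
        return Fraction(0)  # pigeonhole: a duplicate is unavoidable
    p = Fraction(1)
    for j in range(draws):
        p *= Fraction(m - j, m)  # the (j+1)-th draw must avoid j used positions
    return p


k, i = 2, 3  # illustrative values: 3 offspring, 2 parents each
for m in (10, 100, 1000, 10_000):
    print(m, float(prob_all_distinct(m, k * i)))
```

Since the result does not involve the population itself, the same bound holds for every input sequence simultaneously, which is the intuition behind a convergence statement that is uniform in $n$.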

Another point worth mentioning is the choice of Theorem 16 in our proof. Though Theorems 15 and 16 are symmetric, the difficulty of verifying their conditions is quite different. In fact, it is very difficult to prove the uniform convergence condition in Theorem 15.

Finally, our proof can be easily extended to cover $k$-ary recombination operators using uniform sampling *without* replacement to select parents for each offspring. The overall proof framework roughly stays the same.

### 5.3 Summary

In this section, we analyzed the simple EA within the proposed framework. As the analysis shows, although the convergence of the IPM is rigorously defined, actually proving convergence for specific operators usually takes considerable effort. We carried out the analysis under the IPM construction and sufficient conditions from Section 4, and used various techniques to analyze the mutation operator and the $k$-ary recombination operator. Although the sufficient conditions provide general directions for the proofs, many details remain to be worked out for each operator.

To appreciate the significance of our work, it is worth noting that in Qi and Palmieri (1994a,b), the convergence of the IPMs of the mutation operator, the uniform crossover operator and the proportionate selection operator was not properly proved, and the issue of stacking of operators and iterating the algorithm was not addressed at all. In this article, however, we have proved the convergence of IPMs of several general operators. Since these general operators cover the operators studied in Qi and Palmieri (1994a,b) as special cases, the convergence of the IPMs of mutation and uniform crossover is actually proved in this article. Besides, our proof does not depend on the explicit form of the transition equation of the IPM. As long as the IPM is constructed under the i.i.d. assumption, our proof is valid.

As a consequence of our result, consider the explicit form of the transition equation for the uniform crossover operator derived in Section II in Qi and Palmieri (1994b). As the authors' proof was problematic and incomplete, the derivation of the transition equation was not well founded. However, it can be seen that the authors' derivation is in fact equivalent to constructing the IPM under the i.i.d. assumption. Since we have already proved the convergence of the IPM of the $k$-ary crossover operator, the analysis in Qi and Palmieri (1994b) regarding the explicit form of the transition equation can be retained.

## 6 Conclusion and Future Research

In this article, we revisited the existing literature on the theoretical foundations of IPMs, and proposed an analytical framework for IPMs based on convergence in distribution for random elements taking values in the metric space of infinite sequences. Under the framework, commonly used operators such as mutation and recombination were analyzed. Our approach and analyses are new. There are many topics worth studying for future research.

Perhaps the most immediate topic is to analyze the proportionate selection operator in our framework. The mutation operator and the $k$-ary recombination operator can be readily analyzed partly because they do not use fitness values. Also, to generate a new individual, these operators draw information from a fixed number of parents. In contrast, to generate each new individual, the proportionate selection operator gathers and uses the fitness values of the whole population. This makes analyzing proportionate selection difficult.

We think further analysis on proportionate selection can be conducted in the following two directions.

1. In our analyses we tried to prove the stacking property on $U_I$ for the IPM of proportionate selection. Apart from further efforts to prove or disprove this property, it is worth considering modifying the space $U_I$. For example, we can incorporate the *rate* of convergence into the space. If we can prove the stacking property on $U_I \cap U$, where $U$ is the space of converging sequences with rate $O(h(n))$, that is also a meaningful result.

2. Another strategy is to bypass the sufficient conditions and return to Definition 11 to prove $Q_k^n \xrightarrow{d} Q_k^\infty$ for every $k$. This is the original method. In essence, it requires studying the convergence of nested integrals.

Apart from proportionate selection, it is also worth studying whether other operators, such as ranking selection, can be analyzed in our framework. As many of these operators do not generate c.i.i.d. offspring, deriving the IPM and proving its convergence is difficult, if not impossible. In this regard, we believe new modeling techniques and extensions of the framework are fruitful directions for further research.

Finally, it is possible to extend the concept of “incidence vectors” proposed by Vose to the continuous search space. After all, as noted by Vose himself, incidence vectors can also be viewed as marginal p.d.f.s of individuals. As a consequence, the cases of EAs on discrete and continuous solution spaces do bear some resemblance. By an easy extension, the incidence vectors in the continuous space can be defined as functions of the form $\sum c_i \delta(x_i)$, where $\delta$ is the Dirac function and $c_i$ is the rational number representing the fraction of the population equal to $x_i$. If similar analyses based on this extension can be carried out, many results in Nix and Vose (1992) and Vose (1999b,a, 2004) can be extended to the continuous space.
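As a rough illustration of this extension (a sketch under assumptions, not a construction taken from the cited works), a finite population in a continuous space can be summarized by the rational weights $c_i$ attached to its distinct points $x_i$. The helper name `incidence_weights` is hypothetical, and individuals are assumed to be hashable values such as floats or tuples of floats.

```python
from collections import Counter
from fractions import Fraction


def incidence_weights(population):
    """Map each distinct individual x_i to the rational fraction c_i of the
    population equal to x_i, i.e. the weights in sum_i c_i * delta(x_i)."""
    counts = Counter(population)
    size = len(population)
    return {x: Fraction(c, size) for x, c in counts.items()}


# A small population in which one point appears twice
population = [0.5, 1.25, 0.5, 3.0]
weights = incidence_weights(population)
for x, c in sorted(weights.items()):
    print(x, c)
```

The weights always sum to one, so the resulting object behaves like a discrete probability measure supported on the points of the population.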