In this article, we consider a fitness-level model of a non-elitist mutation-only evolutionary algorithm (EA) with tournament selection. The model provides upper and lower bounds for the expected proportion of the individuals with fitness above given thresholds. In the case of so-called monotone mutation, the obtained bounds imply that increasing the tournament size improves the EA performance. As corollaries, we obtain an exponentially vanishing tail bound for the Randomized Local Search on unimodal functions and polynomial upper bounds on the runtime of EAs on the 2-SAT problem and on a family of Set Cover problems proposed by E. Balas.
Evolutionary algorithms are randomized heuristic algorithms employing a population of tentative solutions (individuals) and simulating an evolutionary type of search for optimal or near-optimal solutions by means of selection, crossover, and mutation operators. The evolutionary algorithms with crossover operator are usually called genetic algorithms (GAs). Evolutionary algorithms in general have a more flexible outline and include genetic programming, evolution strategies, estimation of distribution algorithms, and other evolution-inspired paradigms. Evolutionary algorithms are now frequently used in areas of operations research, engineering, and artificial intelligence.
Two major outlines of an evolutionary algorithm are the elitist evolutionary algorithm, which keeps a certain number of the most promising individuals from the previous iteration, and the non-elitist evolutionary algorithm, which computes all individuals of a new population independently using the same randomized procedure. In this article, we focus on the non-elitist case.
One of the first theoretical results in the analysis of non-elitist GAs is the Schemata Theorem (Goldberg, 1989), which gives a lower bound on the expected number of individuals from some subsets of the search space (schemata) in the next generation, given the current population. Significant progress in understanding the dynamics of GAs with a non-elitist outline was made in Vose (1995) by means of dynamical systems. However, most of the findings in Vose (1995) apply to the infinite-population case, and it is not clear how these results can be used to estimate the applicability of GAs to practical optimization problems. A theoretical possibility of constructing GAs that provably optimize an objective function with high probability in polynomial time was shown in Vitányi (2000) using rapidly mixing Markov chains. However, Vitányi (2000) provides only a very simple artificial example where this approach is applicable, and further developments in this direction are not known to us.
One of the standard approaches to studying evolutionary algorithms in general is based on fitness levels (Wegener, 2002). In this approach, the solution space is partitioned into disjoint subsets, called fitness levels, according to the values of the fitness function. In Lehre (2011), the fitness-level approach was first applied to upper-bound the runtime of non-elitist mutation-only evolutionary algorithms. Here and below, by the runtime we mean the expected number of fitness evaluations made until an optimum is found for the first time. Upper bounds on the runtime of non-elitist GAs, involving crossover operators, were obtained later in Corus et al. (2014) and Eremeev (2017). The runtime bounds presented in Corus et al. (2014) and Lehre (2011) are based on drift analysis. In Moraglio and Sudholt (2015), a runtime result is proposed for a class of convex search algorithms, including some non-elitist crossover-based GAs without mutation, on the so-called concave fitness landscapes.
In this article, we consider a non-elitist evolutionary algorithm which uses tournament selection and a mutation operator but no crossover. The k-tournament selection randomly chooses k individuals from the existing population and selects the best one of them (see, e.g., Thierens and Goldberg, 1994). The mutation operator is viewed as a randomized procedure which computes one offspring with a probability distribution depending on the given parent individual. In what follows, evolutionary algorithms with such an outline are simply called EAs. We study the probability distribution of the EA population with regard to a set of fitness levels. The estimates of the EA behavior are based on a priori known parameters of the mutation operator. Using the proposed model, we obtain upper and lower bounds on the expected proportion of individuals with fitness above certain thresholds. The lower bounds are formulated in terms of linear algebra and resemble the bound in the Schemata Theorem (Goldberg, 1989). Instead of schemata, here we consider sets of genotypes with fitness bounded from below. Besides that, the bounds obtained in this article may be applied recursively up to any given iteration.
The lower bounds on expected proportions of sufficiently fit individuals at iteration t also imply lower bounds on the probability of finding a genotype with fitness above a specified threshold by any given iteration t. Such results are closely related to the area of fixed-budget computations, where one has a fixed budget of fitness evaluations that may be spent and the question is how good a solution one can expect to find within this budget (Jansen and Zarges, 2012).
This article pays particular attention to the special case when mutation is monotone. Informally speaking, a mutation operator is monotone if fitter parents have a higher probability of producing fit offspring. One of the best-known examples of monotone mutation is bitwise mutation in the case of the OneMax fitness function. As shown in Borisovsky and Eremeev (2008), in the case of monotone mutation, one of the simplest evolutionary algorithms, known as the (1+1) EA, has the best-possible performance in terms of runtime and probability of finding the optimum.
In the case of monotone mutation, the lower bounds on expected proportions of the individuals turn into equalities for the trivial evolutionary algorithm, the (1,1) EA. This implies that tournament selection at least has no negative effect on the EA performance in such a case. This observation is complemented by an asymptotic analysis of the EA with monotone mutation indicating that, given a sufficiently large population size and some technical conditions, increasing the tournament size k always improves the EA performance.
As corollaries of the general lower bounds on expected proportions of sufficiently fit individuals, we obtain polynomial upper bounds on the Randomized Local Search (RLS) runtime on unimodal functions and upper bounds on the runtime of EAs on the 2-SAT problem and on a family of Set Cover problems proposed by Balas (1984). Unlike the upper bounds on the runtime of evolutionary algorithms with tournament selection from Corus et al. (2014), Eremeev (2017), and Lehre (2011), which require a sufficiently large tournament size, the upper bounds on runtime obtained here hold for any tournament size.
The rest of the article is organized as follows. In Section 2, we give a formal description of the considered EA, introduce an approximating model of the EA population, and define some required parameters of the probability distribution of a mutation operator in terms of fitness levels. In Section 3, using the model from Section 2, we obtain lower and upper bounds on expected proportions of genotypes with fitness above some given thresholds. Section 4 is devoted to the analysis of an important special case of a monotone mutation operator, where the bounds obtained in the previous section become tight or asymptotically tight. In Section 5, we consider some illustrative examples of monotone mutation operators and demonstrate some applications of the general results from Section 3. In particular, in this section we obtain new lower bounds on the probability of generating optimal genotypes at any given iteration for a class of unimodal functions, for the 2-SAT problem, and for a family of set cover problems proposed by E. Balas (in the latter two cases we also obtain upper bounds on the runtime of the EA). Besides that, in Section 5, we give an upper bound on the expected proportion of optimal genotypes for the OneMax fitness function. Section 6 contains concluding remarks.
This work extends the conference paper (Eremeev, 2000). The extension consists in a comparison of the EA behavior to that of the (1,1) EA, the (1,λ) EA, and the (1+1) EA in Section 3 and in the new runtime bounds and tail bounds demonstrated in Section 5. The main results from the conference paper are refined and provided with more detailed proofs.
2 Description of Algorithms and Approximating Model
2.1 Notation and Algorithms
Let the optimization problem consist in maximization of an objective function f on the set of feasible solutions, which here coincides with the search space of all binary strings of length n.
The Evolutionary Algorithm EA. The EA searches for optimal or suboptimal solutions using a population of λ individuals, where each individual (genotype) is a bitstring of length n, and its components are called genes.
The individuals of the population may be ordered according to the sequence in which they are generated; thus the population may be considered as a vector of λ genotypes, where λ, the size of the population, is constant during the run of the EA, and t is the number of the current iteration. In this article, we consider a non-elitist algorithmic outline, where all individuals of a new population are generated independently of each other with an identical probability distribution depending on the existing population only.
Each individual is generated through selection of a parent genotype by means of a selection operator and modification of this genotype by a mutation operator. During mutation, a subset of genes in the genotype string is randomly altered. In general, the mutation operator may be viewed as a random variable with a probability distribution depending on the parent genotype.
The genotypes of the initial population are generated with some a priori chosen probability distribution. The stopping criterion may be, for example, an upper bound t_max on the number of iterations. The result is the best solution generated during the run. The EA has the following scheme:
Generate the initial population X_0.
For t := 1 to t_max do
For i := 1 to λ do
Choose a parent genotype x from X_{t-1} by k-tournament selection.
Add the mutated genotype Mut(x) to the population X_t.
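For concreteness, the scheme above can be sketched in Python. This is our illustrative implementation, not code from the article; the names (fitness, mutate, lam for the population size, k for the tournament size) are our own placeholders.

```python
import random

def tournament_select(population, fitness, k):
    """Pick k individuals uniformly at random (with replacement)
    and return the fittest of them."""
    contenders = [random.choice(population) for _ in range(k)]
    return max(contenders, key=fitness)

def ea(fitness, mutate, n, lam, k, iterations):
    """Non-elitist EA: every individual of the new population is produced
    independently by k-tournament selection followed by mutation."""
    population = [[random.randrange(2) for _ in range(n)] for _ in range(lam)]
    best = max(population, key=fitness)
    for _ in range(iterations):
        population = [mutate(tournament_select(population, fitness, k))
                      for _ in range(lam)]
        best = max(population + [best], key=fitness)  # best solution seen so far
    return best
```

On an easy fitness function such as OneMax (the number of ones), this sketch with bitwise mutation quickly concentrates the population near the optimum.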
In theoretical studies, evolutionary algorithms are usually treated without a stopping criterion (see, e.g., Neumann and Witt, 2010). Unless otherwise stated, in the EA we will also assume that the number of iterations is unlimited.
Note that in the special case of the EA with population size λ = 1, we can assume that k = 1, since the tournament selection has no effect in this case.
(1,λ) EA and (1+1) EA. In the following sections, we will also need a description of two simple evolutionary algorithms, known as the (1,λ) EA and the (1+1) EA.
The genotype of the current individual on iteration t of the (1,λ) EA will be denoted by x_t, and in the (1+1) EA it will be denoted by y_t. The initial genotypes x_0 and y_0 are generated with some a priori chosen probability distribution. The only difference between the (1,λ) EA and the (1+1) EA consists in the method of constructing the individual for iteration t + 1 using the current individual of iteration t as a parent. In both algorithms the new individual is built with the help of a mutation operator. In the case of the (1,λ) EA, the mutation operator is independently applied λ times to the parent genotype x_t, and out of the λ offspring a single genotype with the highest fitness value is chosen as x_{t+1}. (If there are several offspring with the highest fitness, the new individual is chosen arbitrarily among them.) In the (1+1) EA, the mutation operator is applied to y_t once. If the offspring is at least as fit as y_t, then it becomes y_{t+1}; otherwise y_{t+1} = y_t.
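The two update rules can be sketched as follows; this is our own minimal illustration, assuming a generic mutate operator supplied by the caller.

```python
def step_one_comma_lambda(x, fitness, mutate, lam):
    """(1,lambda) EA step: mutate the parent lam times independently and
    keep the best offspring, even if it is worse than the parent."""
    offspring = [mutate(x) for _ in range(lam)]
    return max(offspring, key=fitness)

def step_one_plus_one(x, fitness, mutate):
    """(1+1) EA step: accept the single offspring only if it is
    at least as fit as the parent."""
    y = mutate(x)
    return y if fitness(y) >= fitness(x) else x
```

Note the difference in acceptance: the comma strategy may move to a worse genotype, while the plus strategy never does.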
2.2 The Proposed Model
The EA may be considered as a Markov chain in a number of ways. For example, the states of the chain may correspond to different vectors of genotypes that constitute the population (see Rudolph, 1994); in this case, the number of states in the Markov chain is exponential both in n and in λ. Another model representing the GA as a Markov chain is proposed in Nix and Vose (1992), where all populations that differ only in the ordering of individuals are considered to be equivalent. Each state of this Markov chain may be represented by a vector where the proportion of each genotype in the population is indicated by the corresponding coordinate, and the total number of states equals the number of multisets of size λ composed of the possible genotypes. In the framework of this model, Vose and collaborators have obtained a number of general results concerning the emergent behavior of GAs by linking these algorithms to the infinite-population GAs (Vose, 1995).
The major difficulties in applying the above-mentioned models to the analysis of GAs for combinatorial optimization problems are connected with the necessity to use fine-grained information about the fitness value of each genotype. In the present article, we consider one of the ways to avoid these difficulties by grouping the genotypes into larger classes on the basis of their fitness.
Suppose the search space is partitioned into disjoint subsets (fitness levels) in order of nondecreasing fitness, and for each level consider the union of this level with all higher levels. Let p_j(t) be the probability that an individual, which is added after selection and mutation into X_t, has a genotype from the union of the j-th and all higher levels. According to the scheme of the EA, this probability is identical for all individuals of X_t.
Consider the sequence of identically distributed random variables ξ_1, …, ξ_λ, where ξ_i = 1 if the i-th individual in the population X_t belongs to this union of levels, and ξ_i = 0 otherwise. By definition, E[ξ_i] = p_j(t); consequently, the expected proportion of such individuals in X_t equals p_j(t) as well.
Level-Based Mutation. If, for some mutation operator, the matrices of lower and upper bounds on the transition probabilities between fitness levels coincide, then the mutation operator will be called level-based. By this definition, in the case of level-based mutation, the probability of mutating into a given union of upper levels does not depend on the choice of a genotype within the parent's level, so these probabilities are well defined. In what follows, we call the probability that mutation of a parent from a given level produces an offspring at or above a given level a cumulative transition probability, and we denote by Γ the matrix of cumulative transition probabilities of a level-based mutation operator.
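For a concrete operator, the cumulative transition probabilities can be estimated empirically. The following sketch is our own illustration (not from the article): it Monte Carlo samples the mutation of one representative genotype per level, which is justified precisely when the operator is level-based, since then the choice of representative does not matter.

```python
import random

def estimate_cumulative_matrix(mutate, level, representatives, samples=1000):
    """Monte Carlo estimate of gamma[i][j] ~ P(level(mutate(x)) >= j),
    where x is a representative genotype of level i.  For a level-based
    operator the estimate does not depend on the choice of representative."""
    m = len(representatives) - 1          # levels are numbered 0..m
    gamma = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i, x in enumerate(representatives):
        counts = [0] * (m + 1)
        for _ in range(samples):
            counts[level(mutate(x))] += 1
        tail = 0
        for j in range(m, -1, -1):        # accumulate counts from the top level down
            tail += counts[j]
            gamma[i][j] = tail / samples
    return gamma
```

Each row of the returned matrix is nonincreasing in j, since reaching level j or above is at least as likely as reaching level j + 1 or above.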
The cardinality of the set of possible population states may be evaluated analogously to the number of states in the model of Nix and Vose (1992). Now the levels replace individual elements of the search space, which gives a total number of population vectors equal to the number of multisets of size λ composed of the levels.
3 Bounds on Expected Proportions of Fit Individuals
In this section, our aim is to obtain lower and upper bounds on the expected proportions of individuals with fitness above given thresholds for an arbitrary iteration t, if the distribution of the initial population is known.
3.1 Lower Bounds
Note that Eq. (5) turns into an equality in the case of level-based mutation and tournament size k = 1. We would like to use Eq. (5) recursively t times in order to estimate the expected population vector for any t, given the initial vector. It will be shown in the sequel that such a recursion is possible under the monotonicity assumptions defined below.
Monotone Matrices and Mutation Operators. In what follows, any matrix of bounds with elements α_{i,j} will be called monotone iff α_{i,j} ≤ α_{i+1,j} for all admissible i and j. Monotonicity of a matrix of bounds on transition probabilities means that the greater the fitness level of a parent solution, the greater is its bound on the transition probability to any union of upper levels. Note that for any mutation operator, monotone upper and lower bounds exist. Formally, for any mutation operator, a valid monotone matrix of lower bounds is the zero matrix, and a valid monotone matrix of upper bounds is the matrix with all elements equal to 1. These are extreme and impractical examples. In reality, a problem may be connected with the absence of bounds which are sharp enough to evaluate the mutation operator properly.
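The column-wise condition above is easy to check mechanically. The following helper is our own sketch, assuming the matrix is given as a list of rows indexed by the parent's level:

```python
def is_monotone(gamma):
    """A matrix of (bounds on) cumulative transition probabilities is
    monotone iff every column is nondecreasing in the parent's level:
    gamma[i][j] <= gamma[i+1][j] for all i, j."""
    return all(gamma[i][j] <= gamma[i + 1][j]
               for i in range(len(gamma) - 1)
               for j in range(len(gamma[i])))
```

A single violating pair, for instance a fitter parent with a smaller chance of reaching some level, already makes the matrix non-monotone.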
If, given some set of levels, there exist two equal matrices of lower and upper bounds and these matrices are monotone, then the operator is called monotone with regard to this set of levels. In this article, we will also call such operators monotone for short. Informally speaking, in the case of monotone mutation, fitter parents have a higher probability of producing fit offspring. Note that, by the definition, any monotone mutation operator is level-based, since the lower and upper bounds coincide. The following proposition shows how the monotonicity property may be equivalently defined in terms of cumulative transition probabilities.
Finally, note that lower bound Eq. (5) holds as an equality if the mutation operator is monotone and k = 1; therefore, the last lower bound is an equality in the case of monotone mutation and k = 1.
Lower Bounds from Linear Algebra. Let Φ be the matrix composed of the bounds introduced above, and let E be the identity matrix of the same size. With these notations, Inequality (6) takes a short form. Here and below, the inequality sign "≥" for two vectors means the component-wise comparison, i.e. one vector is at least another iff the corresponding inequality holds for every pair of components. The following theorem gives a component-wise lower bound on the expected population vector for any iteration t.
The proof of this theorem is similar to the well-known inductive proof of the formula for the sum of the first terms of a geometric series. Note that the recursion is similar to the scalar recursion s_t = b + q s_{t-1} for such partial sums, assuming s_0 = 0. However, in our case, matrices and vectors replace numbers, we have to deal with inequalities rather than equalities, and the initial element may be nonzero, unlike s_0.
Let us consider a sequence of vectors z^(0), z^(1), …, where z^(0) is the assumed lower bound on the initial expected population vector and each subsequent vector is obtained from the previous one by the right-hand side of Inequality (6). We will show that z^(t) bounds the expected population vector from below for any t, using induction on t. Indeed, for t = 0 the inequality holds by the definition of z^(0). Now note that the right-hand side of Inequality (6) will not increase if the components of the expected population vector are substituted with their lower bounds. Therefore, assuming we already have the bound for some t and substituting z^(t), we make the inductive step.
By properties of linear operators (see, e.g., Kolmogorov and Fomin, 1999, Chapter III, § 29), due to the assumption that the norm of the matrix of bounds is less than one, we conclude that the required inverse matrix exists.
In many evolutionary algorithms, an arbitrary given genotype may be produced with a nonzero probability as a result of mutation of any given genotype. Suppose that the probability of such a mutation is lower-bounded by some β > 0 for all pairs of genotypes. Then one can obviously choose some monotone matrix of lower bounds whose elements are all at least β. In this case, one can consider the matrix norm given by the maximum absolute row sum. Due to the monotonicity of the matrix, its norm is at most 1 − β < 1, and the conditions of Theorem 4 are satisfied. A trivial example of a matrix that satisfies the above description would be a matrix where all elements are equal to β.
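The matrix-geometric recursion behind the theorem can be illustrated numerically. The sketch below is our own toy example (the matrix phi and vector b are made up for illustration): iterating z ← b + phi·z, the matrix analogue of s_t = b + q·s_{t-1}, converges to the fixed point solving (E − phi) z = b whenever some norm of phi is below 1.

```python
def iterate_lower_bound(phi, b, z, t):
    """Iterate z <- b + phi * z for t steps (pure-Python matrix-vector form).
    If a matrix norm of phi is below 1, the iterates approach the unique
    fixed point z* with (E - phi) z* = b, mirroring the geometric series."""
    for _ in range(t):
        z = [b[i] + sum(phi[i][j] * z[j] for j in range(len(z)))
             for i in range(len(z))]
    return z
```

Here the maximum row sum of phi plays the role of the ratio q of the scalar geometric series; the smaller it is, the faster the iterates stabilize.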
Application of Theorem 4 may be complicated due to difficulties in finding the required initial vector and in estimating the effect of multiplication by the matrix of bounds. Some known results from linear algebra can help to solve these tasks, as the example in Subsection 5.2 shows. However, sometimes it is possible to obtain the required lower bound via analysis of the (1,1) EA, choosing an appropriate mutation operator for it. This approach is discussed below.
Note that Inequalities (7) and (8) in Theorems 4 and 5 turn into equalities if these theorems are applied to the EA with k = 1 and the monotone mutation operator defined above. Therefore, both theorems guarantee equal lower bounds on the expected proportions, given equal matrices of bounds.
3.2 Upper Bounds
Under the expectation in the right-hand side we have a convex function. Therefore, in the case of a monotone matrix of upper bounds, using Jensen's inequality (see, e.g., Rudin, 1987, Chapter 3), we obtain the following proposition.
By iterative application of Inequality (13), the components of the expected population vectors may be bounded up to an arbitrary iteration t, starting from the initial vector. The nonlinearity in the right-hand side of Inequality (13), however, creates an obstacle to obtaining an analytical result similar to the bounds of Theorems 4 and 5.
3.3 Comparison of the EA to the (1,λ) EA and the (1+1) EA
This subsection shows how the probability of generating optimal genotypes at a given iteration of the EA relates to the analogous probabilities of the (1,λ) EA and the (1+1) EA. The analysis here is based on upper bound Inequality (13) and on some previously known results provided in the appendix.
Suppose that a matrix gives the upper bounds on the cumulative transition probabilities of the mutation operator used in the EA. Consider the (1,λ) EA and the (1+1) EA based on a monotone mutation operator for which this matrix is the matrix of cumulative transition probabilities, and suppose that the initial solutions of both algorithms have the same distribution over the fitness levels as the best incumbent solution in the initial EA population. In what follows, for the (1,λ) EA we consider the probability that the current individual on iteration t has fitness above a given threshold, and the analogous probability for the (1+1) EA.
The following proposition is based on upper bound Inequality (13) and the results from Borisovsky (2001) and Borisovsky and Eremeev (2001) that allow us to compare the performance of the EA, the (1,λ) EA, and the (1+1) EA.
Let us compare the EA to the (1,λ) EA and to the (1+1) EA using the mutation and initialization procedures described above. Theorem 16 (see the appendix) together with Proposition 1 implies that, at every iteration, the probability that the EA has reached a given fitness level is at most the corresponding probability for the (1,λ) EA. Furthermore, Theorem 15 from Borisovsky and Eremeev (2001) (see the appendix) implies the analogous inequality between the (1,λ) EA and the (1+1) EA. Using Proposition 6 and the monotonicity of the mutation operator, we conclude that both claimed inequalities hold.
4 EA with Monotone Mutation Operator
In general, the population vectors are random values whose distributions depend on the population size λ. To express this in the notation, we will indicate the population size explicitly when denoting the proportion of genotypes with fitness above a given threshold in the population.
The following Lemma 8, and Theorem 9 based on this lemma, indicate that in the case of monotone mutation, recursive application of the formula from the right-hand side of upper bound Eq. (13) allows one to compute the expected population vector of the infinite-population EA at any iteration t.
The main step in the proof of Lemma 8 (i) consists in showing that, for a supplementary random variable, the expected deviation of interest is upper-bounded by an arbitrarily small ε > 0. This step is made by splitting the range of the variable into a “high-probability” area and a “low-probability” area in such a way that the deviation is at most ε in the “high-probability” area. An analogous technique is used, for example, in the proof of the Lebesgue theorem (see, e.g., Kolmogorov and Fomin, 1999, Chapter VII, Section 44).
From Eq. (14), we conclude that if statement (i) holds, then the convergence of the expected population vector at iteration t implies its convergence at iteration t + 1. Thus, statement (ii) follows by induction on t.
For any level and any iteration, the corresponding term of the sequence defined by Eq. (17) is nondecreasing in the tournament size k. With this in mind, we can expect that the components of the population vector of the infinite-population EA will typically increase with the tournament size. Theorem 10 below gives a rigorous proof of this fact under some technical conditions on the distributions of the initial populations.
Let the two sequences be defined as in Lemma 8, corresponding to tournament sizes k and k′ with k ≤ k′. By the above assumptions, the required inequality holds for the initial terms of these sequences.
Furthermore, if we assume that the claimed inequality between the two sequences holds for all components at some iteration, then analogously to Eq. (19) we obtain it for all components at the next iteration. Besides that, just as in the base case, the remaining components satisfy the inequality as well. So by induction we conclude that the inequality holds for all iterations and all components.
Finally, by claim (ii) of Lemma 8, for any iteration and any ε > 0, given a sufficiently large population size, the expected proportions differ from their infinite-population limits by at most ε, and the claimed comparison follows.
Informally speaking, Theorem 10 implies that in the case of a monotone mutation operator, an optimal selection mechanism consists in taking the tournament size as large as possible, which in the limit converts the EA into the (1,λ) EA.
5 Applications and Illustrative Examples
5.1 Examples of Monotone Mutation Operators
Let us consider two cases where the mutation is monotone and the cumulative transition matrices have a similar form.
First, we consider the simple fitness function OneMax, the number of ones in the genotype. Suppose that the EA uses the bitwise mutation operator, changing every gene with a given probability p, independently of the other genes. Let the subsets of the fitness-level partition be defined by the level lines of the fitness function. The matrix of cumulative transition probabilities for this operator could be obtained using the result from Bäck (1992), but here we shall consider this example as a special case of a more general setting.
Let the representation of the problem admit a decomposition of the genotype string into nonoverlapping substrings (called blocks here) in such a way that the fitness function equals the number of blocks for which a certain property holds. The functions of this type belong to the class of additively decomposed functions, where the elementary functions are Boolean and the substrings are non-overlapping (see, e.g., Mühlenbein et al., 1999). Let the indicator of a block equal 1 if the property holds for this block of the genotype, and 0 otherwise.
Now the matrix of cumulative transition probabilities for the bitwise mutation on the OneMax function is obtained by treating each gene as a separate block whose property is that the gene equals 1. This operator is monotone in view of the above-mentioned result if p ≤ 1/2, since in this case 1 − p ≥ p. The monotonicity of bitwise mutation on OneMax is used in the works of Doerr et al. (2010) and Witt (2013).
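For bitwise mutation on OneMax, the cumulative transition matrix can also be computed exactly, since the offspring's number of ones is determined by two independent binomial counts. The following sketch is our own illustration of this calculation, not the article's Expression (20):

```python
from math import comb

def exact_cumulative_onemax(n, p):
    """Exact cumulative transition matrix of bitwise mutation on OneMax:
    gamma[i][j] = P(OneMax(offspring) >= j | OneMax(parent) = i).
    The offspring gains Binomial(n - i, p) ones among the parent's zeros
    and loses Binomial(i, p) of the parent's ones, independently."""
    def pmf(k, trials, q):               # binomial point probability
        return comb(trials, k) * q**k * (1 - q)**(trials - k)
    gamma = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist = [0.0] * (n + 1)           # distribution of offspring ones-count
        for gained in range(n - i + 1):
            for lost in range(i + 1):
                dist[i + gained - lost] += pmf(gained, n - i, p) * pmf(lost, i, p)
        tail = 0.0
        for j in range(n, -1, -1):       # cumulative sums from the top level down
            tail += dist[j]
            gamma[i][j] = tail
    return gamma
```

For any p ≤ 1/2, every column of the resulting matrix is nondecreasing in the parent's level, in agreement with the monotonicity claim above.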
Expression (20) may be also used for finding the cumulative transition matrices of some other optimization problems with a regular structure. As an example, below we consider the vertex cover problem (VCP) on graphs of a special structure.
In general, the vertex cover problem is formulated as follows. Let G = (V, E) be a graph with a set of vertices V and an edge set E, where |E| = n. A subset C ⊆ V is called a vertex cover of G if every edge has at least one endpoint in C. The vertex cover problem is to find a vertex cover of minimal cardinality.
Suppose that the VCP is handled by the EA with the following representation: each gene corresponds to an edge of G, assigning one of its endpoints, which has to be included in the cover. To be specific, we can assume that a gene equal to 0 means that the first endpoint of the edge is chosen and a gene equal to 1 means that the second one is chosen. The vertices not assigned by any of the chosen endpoints do not belong to the cover. On one hand, this edge-based representation is degenerate in the sense that one vertex cover may be encoded by different genotypes. On the other hand, any genotype defines a feasible cover. A natural way to choose the fitness function in the case of this representation is to let the fitness equal the number of vertices outside the cover.
Note that most publications on evolutionary algorithms for the VCP use the vertex-based representation with |V| genes, where a gene equal to 1 implies inclusion of the corresponding vertex into the cover (see, e.g., Neumann and Witt, 2010, § 12.1). In contrast to the edge-based representation, the vertex-based representation is not degenerate, but some genotypes in this representation may define infeasible solutions.
Following Saiko (1989), we denote by T_s the graph consisting of s disconnected triangle subgraphs. Each triangle is covered optimally by two vertices, and a redundant cover of a triangle consists of three vertices. In spite of the simplicity of this problem, it is proven in Saiko (1989) that some well-known algorithms of branch-and-bound type require a number of iterations exponential in s when applied to the VCP on this graph.
In the case of the graph of disconnected triangles, the fitness coincides with the number of optimally covered triangles (i.e., triangles where only two different vertices are chosen), since covering all triangles nonoptimally gives a cover of all vertices and each optimally covered triangle decreases the size of the cover by one. Let the genes representing the same triangle constitute a single block, and let the property imply that a triangle is optimally covered. Then, by looking at the possible ways a gene triplet may change its status under mutation, (i) given a redundantly covered triangle and (ii) given an optimally covered triangle, we obtain the probabilities of producing an optimally covered triangle in both cases. Using Expression (20), we obtain the cumulative transition matrix for this mutation operator. It is easy to verify that in this case the inequality required for monotonicity holds for any mutation probability p, and therefore the operator is always monotone.
The experimental results are shown in dashed lines. The solid lines correspond to the lower and upper bounds given by Expressions (7) and (13). The plot shows that upper bound Expression (13) gives a good approximation to the expected proportions even if the population size is not large. The lower bound Expression (7) coincides with the experimental results when k = 1, up to a minor sampling error.
5.2 Lower Bound for Randomized Local Search on Unimodal Functions.
First of all, let us describe the RLS algorithm, which will be implicitly studied in this subsection. At each iteration of RLS, the current genotype x is stored. In the beginning of the RLS execution, x is initialized with some probability distribution (e.g., uniformly over the search space). An iteration of RLS consists in building an offspring y of x by flipping exactly one randomly chosen bit in x. If y is at least as fit as x, then x is replaced by the new genotype y. The process continues until some termination condition is met.
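The procedure just described can be sketched in a few lines of Python; this is our own minimal implementation with a simple iteration budget in place of a general termination condition.

```python
import random

def rls(fitness, n, max_iters):
    """Randomized Local Search: flip one uniformly chosen bit per iteration
    and accept the offspring iff it is at least as fit as the current genotype."""
    x = [random.randrange(2) for _ in range(n)]   # uniform initialization
    for _ in range(max_iters):
        i = random.randrange(n)
        y = x.copy()
        y[i] = 1 - y[i]                           # flip exactly one bit
        if fitness(y) >= fitness(x):
            x = y
    return x
```

On OneMax, for example, a flip of a zero bit is always accepted and a flip of a one bit is always rejected, so the number of ones never decreases during the run.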
Below we will illustrate the usage of Theorem 4 on a class of unimodal functions. In this class, each function has a fixed number of distinct fitness values, and each solution in the search space is either optimal or its fitness may be improved by flipping a single bit. Naturally, we assume that the fitness values increase with the level index and that the highest level consists of optimal solutions.
As a mutation operator in the EA, we will use the following routine: given a genotype, it first changes one randomly chosen gene and, if this modification improves the genotype fitness, outputs the modified genotype; otherwise it outputs the genotype unchanged. Note that in the case of the EA with λ = 1, this mutation turns the algorithm into a version of RLS. The lower bounds from Section 3 are tight for (which implies