We present a general method for analyzing the runtime of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel runtime. This allows for a rigorous estimate of the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the performance guarantees improve with the density of the topology. Surprisingly, even sparse topologies such as ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which number of processors leads to the best guaranteed speedups, thus giving hints on how to parameterize parallel evolutionary algorithms.
Due to the increasing number of CPU cores, exploiting possible speedups by parallel computations is currently more important than ever. Parallel evolutionary algorithms (EAs) form a popular class of heuristics with many applications to computationally expensive problems (Luque and Alba, 2011; Nedjah et al., 2006; Tomassini, 2005). This includes island models, also called distributed EAs, multi-deme EAs, or coarse-grained EAs. Evolution is parallelized by evolving subpopulations, called islands, on different processors. Individuals are periodically exchanged in a process called migration, where selected individuals, or copies of these, are sent to other islands, according to a migration topology that determines which islands are neighboring. More fine-grained models are also known, where neighboring subpopulations communicate in every generation, first and foremost in cellular EAs (Tomassini, 2005).
By restricting the flow of information through spatial structures and/or infrequent communication, diversity in the whole system is increased. Researchers and practitioners frequently report that parallel EAs speed up the computation time, and at the same time lead to a better solution quality (Luque and Alba, 2011).
Despite these successes, a long history (Cantú Paz, 1997), and very active research in this area (Alba, 2005; Luque and Alba, 2011; Rudolph, 2006), the theoretical foundation of parallel EAs is still in its infancy. The impact of even the most basic parameters on performance is not well understood (Skolicki and De Jong, 2005). Past and present research is mostly empirical, and a solid theoretical foundation is missing. Theoretical studies are mostly limited to artificial settings. In the study of takeover times, an important question is how long it takes for a single optimum to spread throughout the whole parallel EA if the EA uses selection and migration but neither mutation nor crossover (Rudolph, 2000, 2006). This gives a useful indicator for the speed at which information is spread, but it does not give any formal results about the runtime of evolutionary algorithms with mutation and/or crossover.
One way of gaining insight into the capabilities and limitations of parallel EAs is by means of rigorous runtime analysis (He and Yao, 2003; Wegener, 2002). By asymptotic bounds on the runtime, it is possible to compare different implementations of parallel EAs and assess the speedup gained by parallelization in a rigorous manner. Many runtime analyses have been presented (Auger and Doerr, 2011; Jansen, 2013; Neumann and Witt, 2010; Oliveto, He, and Yao, 2007), from simple pseudo-Boolean test functions (Droste et al., 2002) to NP-hard problems from combinatorial optimization (Friedrich et al., 2010; Horoba, 2010; Witt, 2005; Yu et al., 2012).
Lässig and Sudholt (2010a) presented the first runtime analysis of a parallel evolutionary algorithm with a nontrivial migration topology. It was demonstrated for a constructed problem that migration is essential in the following way. A suitably parameterized island model with migration has a polynomial runtime, whereas the same model without migration as well as comparable panmictic populations need exponential time, with overwhelming probability. Neumann, Oliveto, Rudolph, and Sudholt (2011) presented a similar result for island models using crossover. If islands perform crossover with immigrants during migration, this can drastically speed up optimization. This was demonstrated for a pseudo-Boolean example as well as for instances of the VertexCover problem (Neumann et al., 2011).
In this work, we take a broader view and consider the speedup gained by parallelization in terms of the number of generations, for various common pseudo-Boolean functions and function classes of varying difficulty. A general method is presented for proving upper bounds on the parallel runtime of parallel EAs. The latter is defined as the number of generations of the parallel EA until a global optimum is found for the first time. This allows us to estimate the speedup gained by parallelization, defined as the ratio of the expected parallel runtime of a single island and the expected runtime of an island model with multiple islands (see Section 2 for formal definitions). It also can be used to determine how to choose the number of islands such that the best possible upper bounds on the parallel runtime are obtained, while still maintaining an asymptotically optimal speedup.
Our method is based on the fitness-level method or method of f-based partitions, a simple and well-known tool for the analysis of evolutionary algorithms (Droste et al., 2002; Wegener, 2002). The main idea of this method is to divide the search space into sets $A_1, \dots, A_m$, strictly ordered according to the fitness values of elements therein. Elitist EAs, that is, EAs where the best fitness value in the population can never decrease, can only increase their current best fitness. If for each set $A_i$ we know a lower bound $s_i$ on the probability that an elitist EA finds an improvement, that is, finds a new search point in a new best fitness-level set $A_{i+1} \cup \dots \cup A_m$, this gives rise to the upper bound $\sum_{i=1}^{m-1} 1/s_i$ on the expected runtime. The method is described in more detail in Section 2.
In Section 3 we first derive a general upper bound for parallel EAs, based on fitness levels. Our general method is then tailored toward different spatial structures often used in fine-grained or cellular evolutionary algorithms and parallel architectures in general: ring graphs (Theorem 8 in Section 4), torus graphs (Theorem 10 in Section 5), hypercubes (Theorem 12 in Section 6), and complete graphs (Theorems 14 and 17 in Section 7).
The only assumption made is that islands run elitist algorithms, and that in each generation each island has a chance of transmitting individuals from its best current fitness level to each neighboring island, independently with probability at least p. We call the latter the transmission probability. It can be used to model various stochastic effects such as disruptive variation operators, the impact of selection operators, probabilistic migration, probabilistic emigration and immigration policies, and transient faults in the network. This renders our method applicable to a broad range of settings.
1.1 Main Results
Our estimates of parallel runtimes from Theorems 8, 10, 12, 14, and 17 are summarized in the following theorem, hence characterizing our main results. Throughout this work $\mu$ always denotes the number of islands.
Consider an island model with $\mu$ islands where each island runs an elitist EA. For each island let there be a fitness-based partition $A_1, \dots, A_m$ such that for all $1 \le i < m$ all points in $A_i$ have a strictly worse fitness than all points in $A_{i+1}$, and $A_m$ contains all global optima. We say that an island is in $A_i$ if the best search point on the island is in $A_i$. Let $s_i$ be a lower bound for the probability that in one generation a fixed island in $A_i$ finds a search point in $A_{i+1} \cup \dots \cup A_m$.
Further, assume that for each edge in the migration topology in every iteration there is a probability of at least p that the following holds, independently from other edges and for all $1 \le i < m$: if the source island is in $A_i$, then after the generation the target island is in $A_i \cup \dots \cup A_m$. Then the expected parallel runtime of the island model is bounded from above by
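$\frac{2}{p^{1/2}} \sum_{i=1}^{m-1} \frac{1}{s_i^{1/2}} + \frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}$ for every ring graph or any other strongly connected topology (Theorem 8),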
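$\frac{3}{p^{2/3}} \sum_{i=1}^{m-1} \frac{1}{s_i^{1/3}} + \frac{1}{\mu} \sum_{i=1}^{m-1} \frac{1}{s_i}$ for every undirected grid or torus graph whose side lengths are at least in both directions (Theorem 10),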
for the $\log \mu$-dimensional hypercube graph (Theorem 12),
for the complete topology $K_\mu$, as well as (Theorems 14 and 17).
A remarkable feature of our method is that it can automatically transfer upper bounds for panmictic EAs to parallel versions thereof. The only requirement is that the bounds on panmictic EAs have been derived using the fitness-level method, and that the partition and the probabilities for improvements used therein are known. Then the expected parallel time of the corresponding island model can be estimated for all mentioned topologies simply by plugging the $s_i$ into Theorem 1. Fortunately, many published runtime analyses use the fitness-level method, either explicitly or implicitly, and the mentioned details are often stated or easy to derive. Hence even researchers with limited expertise in runtime analysis can easily reuse previous analyses to study parallel EAs.
Further, note that we can easily determine which choice of $\mu$, the number of islands, will give an upper bound of order $\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$, the best upper bound we can hope for using the fitness-level method. In all bounds from Theorem 1 we have a first term that varies with the topology and p, and a second term that is always $\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$. The first term reflects how quickly information about good fitness levels is spread throughout the island model. Choosing $\mu$ such that the second term becomes asymptotically as large as the first one, or larger, we get an upper bound of $O\left(\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}\right)$. For settings where $\sum_{i=1}^{m-1}\frac{1}{s_i}$ is an asymptotically tight upper bound for a single island, this corresponds to an asymptotically linear speedup. The maximum feasible value for $\mu$ depends on the problem, the topology, and the transmission probability p.
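The balancing step can be made concrete with a short calculation. The sketch below is a minimal illustration, not the paper's procedure: it assumes the ring bound has the form $\frac{2}{p^{1/2}}\sum_{i=1}^{m-1} s_i^{-1/2} + \frac{1}{\mu}\sum_{i=1}^{m-1} s_i^{-1}$ (the shape discussed for Theorem 8 in Section 4), uses the standard lower bound $s_i \ge 1/(en)$ for the (1+1) EA on LO, and returns the largest integer $\mu$ for which the second term still dominates. The function names are hypothetical.

```python
import math

def ring_terms(s, mu, p=1.0):
    """Return (first, second) for the assumed ring bound.

    first  = 2 / sqrt(p) * sum_i 1 / sqrt(s_i)   (propagation term, independent of mu)
    second = (1 / mu) * sum_i 1 / s_i            (single-island bound divided by mu)
    """
    first = 2.0 / math.sqrt(p) * sum(1.0 / math.sqrt(si) for si in s)
    second = sum(1.0 / si for si in s) / mu
    return first, second

def largest_balanced_mu(s, p=1.0):
    """Largest integer mu for which the second term is still at least the first."""
    first, single_island = ring_terms(s, 1, p)  # mu = 1: second term is sum_i 1/s_i
    return max(1, math.floor(single_island / first))

# Hypothetical example: (1+1) EA on LO with n levels below the optimum, s_i = 1/(e n).
n = 100
s_lo = [1.0 / (math.e * n)] * n
mu_star = largest_balanced_mu(s_lo)
print(mu_star)  # prints 8, i.e. floor(sqrt(e * n) / 2)
```

For this example the ratio of the two terms simplifies to $\sqrt{enp}/2$, so the balanced number of islands grows like $\sqrt{n}$ when p is constant.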
We give simple examples that demonstrate how our method can be applied. Our examples are from pseudo-Boolean optimization, but the method works in any setting where the fitness-level method is applicable. The simple (1+1) EA is used on each island (see Section 2 for details). Table 1 summarizes the resulting runtime bounds for the considered algorithms and problem classes. For simplicity we assume p = 1; Table 3 in the appendix shows all our results for a variable number of islands and general transmission probabilities p. The number of islands was chosen as explained above: to give the smallest possible parallel runtime while not increasing the sequential time asymptotically. The table also shows the expected communication effort, defined as the total number of individuals migrated throughout the run. Details are given in Theorems 8, 10, 12, 14, and 17. Bounds on the expected communication effort follow easily from bounds on the parallel runtime using Theorem 2. The functions used in this table are explained in Section 2.
| | (1+1) EA | Ring | Grid/torus | Hypercube | Complete |
The method has already found several applications and has spawned a number of follow-up papers. After the preliminary version of this work (Lässig and Sudholt, 2010b) was presented, the authors applied it to various problems from combinatorial optimization: the sorting problem (as maximizing sortedness), finding shortest paths in graphs, and Eulerian cycles (Lässig and Sudholt, 2011b). Very recently, Mambrini, Sudholt, and Yao (2012) also used it for studying how quickly island models find good approximations for the NP-hard SetCover problem. This work has also led to the discovery of simple adaptive schemes for changing the number of islands dynamically throughout the run, see Lässig and Sudholt (2011a). These schemes lead to near-optimal parallel runtimes, while asymptotically not increasing the sequential runtime on many examples (Lässig and Sudholt, 2011a). These schemes are tailored toward island models with complete topologies, which include offspring populations as a special case. The study of offspring populations in comma strategies is another recent development that was inspired by this work (Rowe and Sudholt, 2012).
Many common definitions of speedup consider the wall-clock time, where differences in $\mathrm{Exec}_{\mathrm{migr}}$ play a role. Here speedups are considered with regard to the number of generations only, ignoring differences in $\mathrm{Exec}_{\mathrm{migr}}$. This makes sense in settings where $\mathrm{Exec}_{\mathrm{gen}} \gg \mathrm{Exec}_{\mathrm{migr}}$; for instance, when fitness evaluations are so expensive that they dominate the execution time. Otherwise, the speedups stated here may be optimistic, as the overhead induced by migration is ignored. Note, however, that with additional information about $\mathrm{Exec}_{\mathrm{gen}}$ and $\mathrm{Exec}_{\mathrm{migr}}$, and Equation (1), the results of the current paper easily extend to more sophisticated notions of speedup.
To get a more complete picture of the resources used in a parallel system and to take into account the overhead by communication, this paper also considers the communication effort $T_{\mathrm{com}}$. It is defined as the total number of individuals migrated to other islands during the course of a run. The communication effort therefore captures the total bandwidth used during a run of an island model. It represents an important factor for determining the performance of a parallel EA, alongside the parallel runtime.
The expected communication effort is a multiple of the expected parallel runtime, with the factor depending on the number of (directed) edges in the topology, the transmission probability p, and the number of individuals migrated in each migration event.
The following theorem lists various topologies: a unidirectional ring is a graph consisting of a single directed cycle, whereas a bidirectional ring has undirected edges. Note that an undirected edge can be regarded as two directed edges. In a torus graph, all vertices are arranged on a two-dimensional grid, with undirected edges wrapping around (vertices in the top row are neighbors to the ones in the bottom row and vice versa, similarly for the leftmost and rightmost columns). Each vertex in a torus thus has four distinct neighbors, provided that the torus has at least three rows and at least three columns. Hypercubes are formally defined in Section 6, and the complete graph contains undirected edges between all pairs of nodes.
for a unidirectional ring and for a bidirectional ring,
for any torus graph where both sides have length at least three,
for the $\log \mu$-dimensional hypercube, and
for the complete graph.
Hence to estimate the expected communication effort it suffices to analyze the expected parallel runtime.
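The factor relating communication effort to parallel runtime depends on the number of directed edges. The sketch below, with our own vertex numbering and hypothetical function names, builds the directed edge sets of the topologies listed above and counts them. It also reproduces the caveat from the text: a torus with fewer than three rows or columns has fewer than four distinct neighbors per vertex.

```python
import itertools

def unidirectional_ring(mu):
    """One directed cycle 0 -> 1 -> ... -> mu-1 -> 0."""
    return {(v, (v + 1) % mu) for v in range(mu)}

def bidirectional_ring(mu):
    """Every ring edge in both directions."""
    return unidirectional_ring(mu) | {((v + 1) % mu, v) for v in range(mu)}

def torus(rows, cols):
    """rows x cols torus with wrap-around; vertex (r, c) has number r * cols + c."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                w = ((r + dr) % rows) * cols + (c + dc) % cols
                edges.add((r * cols + c, w))  # duplicates collapse for side length 2
    return edges

def hypercube(d):
    """2**d vertices; neighbours differ in exactly one bit of their label."""
    return {(v, v ^ (1 << b)) for v in range(2 ** d) for b in range(d)}

def complete(mu):
    """All ordered pairs of distinct vertices."""
    return set(itertools.permutations(range(mu), 2))

for name, edges in [("uni ring", unidirectional_ring(16)),
                    ("bi ring", bidirectional_ring(16)),
                    ("torus 4x4", torus(4, 4)),
                    ("hypercube d=4", hypercube(4)),
                    ("complete", complete(16))]:
    print(name, len(edges))
```

For 16 islands this gives 16, 32, 64, 64, and 240 directed edges, respectively, that is, $\mu$, $2\mu$, $4\mu$, $\mu \log \mu$, and $\mu(\mu-1)$.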
Our method for proving upper bounds is based on the fitness-level method (Droste et al., 2002; Wegener, 2002). The idea is to partition the search space into sets called fitness levels that are ordered with respect to fitness values. We say that an algorithm is in Ai or on level i if the current best individual in the population is in Ai. An evolutionary algorithm where the best fitness value in the population can never decrease (called an elitist EA) can only improve the current fitness level. If one can derive lower bounds on the probability of leaving a specific fitness level toward higher levels, this yields an upper bound on the expected runtime.
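As a minimal sketch of the fitness-level bound, the code below sums $1/s_i$ over the non-optimal levels. The improvement probabilities are the standard ones for the (1+1) EA with mutation rate 1/n (an assumption here, since the algorithm's definition is not reproduced in this section): $s_i \ge (n-i)/(en)$ on OneMax and $s_i \ge 1/(en)$ on LO. Function names are our own.

```python
import math

def fitness_level_bound(s):
    """Fitness-level upper bound sum_i 1/s_i on the expected number of
    generations of an elitist EA; s[i] lower-bounds the probability of
    leaving level i for a better one (the optimal level is not included)."""
    if any(not 0.0 < si <= 1.0 for si in s):
        raise ValueError("each s_i must lie in (0, 1]")
    return sum(1.0 / si for si in s)

def onemax_levels(n):
    # Level i: best point has i ones. Flipping exactly one of the n - i zeros
    # has probability at least (n - i) * (1/n) * (1 - 1/n)**(n-1) >= (n - i)/(e n).
    return [(n - i) / (math.e * n) for i in range(n)]

def leadingones_levels(n):
    # Level i: best point has i leading ones. Flipping only the first 0-bit
    # has probability at least 1/(e n).
    return [1.0 / (math.e * n) for _ in range(n)]

n = 100
onemax_bound = fitness_level_bound(onemax_levels(n))
lo_bound = fitness_level_bound(leadingones_levels(n))
```

For n = 100 this gives $enH_n \approx 1410$ for OneMax and $en^2 \approx 27183$ for LO, where $H_n$ is the n-th harmonic number.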
In contrast to other methods such as drift analysis (He and Yao, 2004; Johannsen, 2010), the fitness-level method is applicable in cases where “easy” and “hard” fitness levels are mixed up, so that the progress toward the optimum cannot reasonably be bounded by a closed formula.
The fitness-level method has also been applied to other elitist optimization methods, including elitist ant colony optimizers (Gutjahr and Sebastiani, 2008; Neumann et al., 2009) and a binary particle swarm optimizer (Sudholt and Witt, 2010). It gives rise to powerful tail inequalities (Zhou et al., 2012) and it can be used to prove lower bounds as well, when combined with additional knowledge on transition probabilities (Sudholt, 2013). Finally, Lehre (2011) recently showed that the fitness-level method can be extended toward nonelitist EAs with additional mild conditions on transition probabilities and the population size.
Note that the method only requires a finite number of fitness-level sets, not a finite set of fitness values. In principle, the method can be applied to continuous fitness functions as well, provided a suitable discretization is made and the goal is to find the best fitness-level set. However, this might not be the most practical approach.
In the following, the fitness-level method is applied to parallel EAs. We assume that the considered EAs use a migration topology given by a directed graph. Islands are the vertices of the topology, and directed edges indicate neighborhoods between the islands. We often describe undirected graphs for use as migration topology, understanding that an undirected edge stands for two directed edges (u, v) and (v, u). In other words, though formally the migration topology is a directed graph, we often use the language of undirected graphs to describe it.
Our methods for proving upper bounds require that the islands run elitist evolutionary algorithms. All islands create new offspring independently by mutation and/or recombination among individuals on the island. In every generation there is a chance that migration sends an individual on the current best fitness level to some target island and that this individual is included on the target island. This effectively raises the fitness level of the target island to the current best level (or an even better one). For every pair of connected islands, we call this probability the transmission probability and denote it by p. Transmission events for different pairs of islands are assumed to be independent.
The transmission probability can model various settings, where randomness and stochasticity may be involved:
Migrations do not take place in every generation, but only probabilistically with probability p,
Islands do not automatically select individuals on the best fitness level for emigration, but there is a probability of at least p that this happens,
Similarly, islands do not automatically include immigrants on higher fitness levels, but only with probability of at least p,
During migration crossover is performed, and p is a lower bound on the probability that crossover does not disrupt the fitness of an individual on a current best fitness level (if a crossover probability $p_c$ is used, then clearly $p \ge 1 - p_c$), and
The physical architecture suffers from transient faults and p is a lower bound on the probability that migration is executed correctly.
Of course, the transmission probability can also model any combination of the above, in which case the product of all above probabilities gives a lower bound on the transmission probability, provided the individual effects occur independently.
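Under the assumption that the effects listed above act independently, a lower bound on p is the product of the per-effect lower bounds. The helper below is hypothetical (its parameter names are ours) and only illustrates this arithmetic, including the factor $1 - p_c$ for crossover during migration.

```python
import math

def transmission_lower_bound(migration=1.0, emigration=1.0, immigration=1.0,
                             crossover_prob=0.0, reliability=1.0):
    """Hypothetical helper: multiply lower bounds for independent effects.

    migration      -- probability that a migration takes place
    emigration     -- probability that a best-level individual is selected to emigrate
    immigration    -- probability that the immigrant is accepted
    crossover_prob -- crossover probability p_c; an immigrant that is not
                      recombined keeps its level, contributing 1 - p_c
    reliability    -- probability that the network delivers the migrant
    """
    factors = [migration, emigration, immigration, 1.0 - crossover_prob, reliability]
    if any(not 0.0 <= f <= 1.0 for f in factors):
        raise ValueError("all probabilities must lie in [0, 1]")
    return math.prod(factors)

p = transmission_lower_bound(migration=0.5, crossover_prob=0.2, reliability=0.9)
```

Here p is about 0.5 · 0.8 · 0.9 = 0.36.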
Most of our results also apply when a fixed migration interval $\tau$ is used instead of probabilistic migration. This is similar to a migration probability $1/\tau$; in fact, it can be regarded as a derandomized or quasirandom version of probabilistic migration. With a fixed migration interval the variance in the information propagation is reduced, and all islands operate synchronously. Probabilistic migrations are asynchronous; this simplifies the analysis, as we do not need to keep track of how much time has passed since the last migration. We expect our results for probabilistic migration to transfer to the study of migration intervals. The only notable exception is the case of a complete topology when the migration probability is rather small (Theorem 17), as there, synchronous and asynchronous migrations lead to different effects.
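To illustrate the difference in variance, the sketch below compares the time information needs to travel a fixed number of hops along a unidirectional ring with a fixed migration interval $\tau$ (our symbol) and with migration probability $1/\tau$ per generation. It is a toy model that assumes every migration transmits successfully; the function names are hypothetical.

```python
import random
import statistics

def hops_time_fixed_interval(hops, tau):
    """Worst-case generations for information to cross `hops` edges of a
    unidirectional ring when all islands migrate together every tau
    generations: one hop per migration."""
    return hops * tau

def hops_time_probabilistic(hops, tau, rng):
    """Generations to cross `hops` edges when, in each generation, migration
    occurs independently with probability 1/tau (geometric waiting times)."""
    t = 0
    for _ in range(hops):
        t += 1                             # try to migrate in this generation
        while rng.random() >= 1.0 / tau:   # migration did not happen
            t += 1                         # try again in the next generation
    return t

rng = random.Random(42)
samples = [hops_time_probabilistic(10, 4, rng) for _ in range(5000)]
print(hops_time_fixed_interval(10, 4), statistics.mean(samples), statistics.pvariance(samples))
```

Both versions need about hops × $\tau$ = 40 generations here, but only the probabilistic one spreads around that value, with variance hops · $(1 - 1/\tau)\tau^2$ = 120.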
3 Proving Upper Bounds for Parallel EAs
3.1 A General Upper Bound
This section covers how to prove upper bounds on the runtime of parallel EAs. In contrast to panmictic EAs, in an island model several islands might participate in the search for improvements from the current-best fitness level. The number of participating islands may vary over time according to the spread of information.
The following theorem transfers upper bounds for panmictic EAs derived by the fitness-level method into upper bounds for parallel EAs in a systematic way.
The upper bound from Theorem 4 is very general as it does not restrict the communication among the islands in any way. These aspects are hidden in the definition of the variables . When looking at one particular fitness level, say level i, we also speak of islands being informed if and only if they contain an individual on level i. The variable then gives the number of informed islands t generations after the first island has become informed by reaching level i.
The spread of information obviously depends on the migration topology, the migration interval, and the selection strategies used to choose migrants that are sent and how migrants are included in the population. The basic method works for all choices of these design aspects. We elaborate on these aspects and then move on to more specific scenarios where we can obtain more concrete results.
3.2 How to Deal with Migration Intervals
With a migration interval of , the value remains fixed for periods of generations, unless further islands are raised toward the current best fitness level by variation. If we pessimistically ignore this effect, then we have for appropriate t that . In any case, we have , hence the sum of values is at least . This implies the following simplified upper bound.
The values can be estimated like the values in a setting with . In order to keep the presentation simple, in the following applications we only consider the case that , that is, migration happens in every generation. This reflects common principles used in fine-grained or cellular evolutionary algorithms. The following considerations can always be combined with the above arguments to handle migration intervals larger than 1.
3.3 Stochastic Communication and Finding Improvements
In order to arrive at more concrete bounds on the parallel runtime for common migration topologies, it is important to understand how the number of informed islands grows on each fitness level, that is, the growth curves underlying the variables. Note that these variables are random variables in all settings where we have a transmission probability less than 1. This means that getting a closed formula for the expected parallel runtime is not easy. In Theorem 4 we cannot simply replace the variables by their expectations as by Jensen's (1906) inequality this would yield an estimation in the wrong direction (i.e., it would give a lower bound where an upper bound is needed). More work is required in order to arrive at closed formulas for common topologies.
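A toy numerical example shows why the direction matters. Assume, purely for illustration, that with X informed islands the expected waiting time for an improvement is $1/(Xs)$. Since $x \mapsto 1/x$ is convex, Jensen's inequality gives $E[1/(Xs)] \ge 1/(E[X]s)$, so replacing X by its expectation underestimates the waiting time. The distribution and function name below are hypothetical.

```python
from fractions import Fraction

def waiting_time_estimates(dist, s):
    """dist maps a number x of informed islands to its probability. In a toy
    model where x informed islands need 1/(x*s) generations on average,
    return (E[1/(X*s)], 1/(E[X]*s))."""
    expected_wait = sum(prob / (x * s) for x, prob in dist.items())
    expected_x = sum(prob * x for x, prob in dist.items())
    return expected_wait, 1 / (expected_x * s)

dist = {1: Fraction(1, 2), 9: Fraction(1, 2)}  # hypothetical: 1 or 9 informed islands
print(waiting_time_estimates(dist, Fraction(1, 100)))
# prints (Fraction(500, 9), Fraction(20, 1)): plugging in E[X] = 5 underestimates the wait
```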
Instead of arguing with the random number of informed islands, it is easier to argue with expected hitting times for the time until a specified number of islands is informed. If we know such expected hitting times, or upper bounds thereof, the time until the parallel EA finds a better fitness level can be estimated.
A good choice for k is one where as this is likely to minimize the bound from Lemma 6, at least asymptotically.
The lemma ignores the fact that during the first generations islands can already find improvements. It also ignores that the number of informed islands might grow beyond k after this time. However, we will see that for appropriate choices of k, the lemma can still give near-optimal results. In the first generations, the number of informed islands is likely to be too small to yield a significant benefit. In addition, after k islands have been informed, this number is large enough to guarantee that improvements are found quickly, for appropriate k.
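The trade-off in choosing k can be explored numerically. The sketch assumes that the bound of Lemma 6 for level i has the form "expected time until k islands are informed" plus the expected waiting time $1/(1-(1-s_i)^k)$ until at least one of k informed islands improves, and it uses the estimate (k-1)/p for informing k islands on a unidirectional ring. Both the exact form of Lemma 6 and the function names are assumptions.

```python
import math

def level_time_bound(k, s, p):
    """Hypothetical Lemma-6-style bound for one fitness level on a
    unidirectional ring: at most (k - 1)/p generations until k islands are
    informed, then a geometric wait until at least one of k islands, each
    succeeding independently with probability s, finds an improvement."""
    return (k - 1) / p + 1.0 / (1.0 - (1.0 - s) ** k)

def best_k(s, p, mu):
    """Brute-force the k in 1..mu that minimizes level_time_bound."""
    return min(range(1, mu + 1), key=lambda k: level_time_bound(k, s, p))

s, p, mu = 1e-4, 1.0, 1000
k = best_k(s, p, mu)
bound = level_time_bound(k, s, p)
print(k, bound, 1 / s)
```

For $s_i = 10^{-4}$ and p = 1 the minimum lies near $k = \sqrt{p/s_i} = 100$, with a value close to $2/\sqrt{p s_i} = 200$, far below the single-island waiting time $1/s_i = 10^4$.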
3.4 Information Propagation in Networks
The task remains to estimate the first hitting time for informing a certain number of vertices. Note that this is similar to studying growth curves and takeover times. In fact, is the expected time until the whole island model is informed. Growth curves and takeover times have been studied in artificial settings where no variation takes place (see Alba and Luque, 2004; Giacobini, Alba, et al., 2005; Giacobini, Alba, et al., 2003; Giacobini, Tomassini, et al., 2003; Giacobini, Tomassini, et al., 2005; Rudolph, 2000, 2006; Sarma and De Jong, 1997) or recent surveys (Luque and Alba, 2011, Chapter 4; Sudholt, 2013).
In the following, we restrict ourselves to our model of transmission probabilities, as it is a general model that captures many stochastic components in the dynamic behavior of island models. At the same time, it is simple enough to allow for a theoretical analysis.
Interestingly, the same probabilistic process also underlies the way randomized search heuristics find shortest paths in weighted undirected graphs. Doerr et al. (2007, 2011) showed that the (1+1) EA can find shortest paths in graphs by simulating the Bellman-Ford algorithm. The task is to find shortest paths from a source to all other vertices. For vertices whose shortest paths have few edges, shortest paths are found quickly. In our terminology, these vertices would be called informed. If u is informed and the graph contains an edge $(u, v)$, then v can become informed with a fixed probability during a lucky mutation, if the shortest path from the source to v contains u. This way, shortest paths propagate through the graph in the same fashion as information does. The same can be observed for ant colony optimizers (Sudholt and Thyssen, 2012).
Doerr et al. (2007) independently used a different argument for bounding the expected propagation time. Fix a shortest path in the graph, leading from the source to some fixed vertex v; then in every generation there is a chance of informing the first uninformed vertex on the path, until eventually the information reaches v. If the path has at least log n edges, the time until v is informed is highly concentrated: using tail bounds, the probability of significantly exceeding the expectation is very small. This allows us to apply a union bound over all considered vertices v.
Following the proof of Doerr et al. (2007, Lemma 3), we get the following lemma. An advantage over the general bound from Rowe et al. (2008) is that it not only bounds the propagation time for the whole network; it also bounds expected hitting times for informing smaller numbers of vertices.
The first claim follows from the proof of Lemma 3 in Doerr et al. (2007) and the fact that for .
The next section will analyze parallel EAs with concrete topologies.
4 Parallel EAs with Ring Structures
Ring graphs are the starting point as they are often used as topologies (Tomassini, 2005). Rings can either be unidirectional, in which case there is exactly one directed cycle, or bidirectional, when all edges are undirected. The following theorem holds for both kinds of graphs, and in fact for all strongly connected graphs. Recall that a directed graph is called strongly connected if for every two vertices u, v there is a directed path from u to v (implying that there is also a path from v to u).
The shape of this formula deserves some explanation. The second term is by a factor of $\mu$ smaller than the upper bound for a single island by Theorem 3. If the latter is asymptotically tight, the second term in Theorem 8, regarded in isolation, would give a perfect linear speedup. The first term is related to the speed at which information is propagated; it reflects the time needed to bring a reasonably large number of islands to the current best fitness level. Unlike the second term, it is independent of $\mu$, but it depends on the transmission probability p. We have a linear speedup if the first term asymptotically does not grow faster than the second term, again assuming that the bound for a single island is tight.
As $\mu$ grows, the second term becomes smaller, while the first term remains fixed. Thus, if we have a linear speedup for small $\mu$, there is a point where, with growing $\mu$, the linear speedup disappears. This threshold can easily be computed by checking which value of $\mu$ makes the first and second terms of equal asymptotic order. As will be seen in the next sections, the same also holds for other migration topologies.
For the unidirectional ring, in every generation a new island is informed with probability at least p. As this happens independently in each generation, the expected waiting time for each new island is at most 1/p, so the expected time until k islands are informed is at most (k-1)/p. In fact, this argument holds for all strongly connected topologies and in particular for the bidirectional ring.
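The argument can be checked by simulation. In the sketch below (a toy model with hypothetical function names), the informed islands of a unidirectional ring always form an arc, only the edge leaving that arc can inform a new island, and it succeeds with probability p per generation. The time to inform k islands is therefore a sum of k - 1 geometric waiting times with mean 1/p each.

```python
import random
import statistics

def generations_unidirectional_ring(mu, k, p, rng):
    """Generations until k of mu islands on a unidirectional ring are
    informed, starting from one informed island. Only the edge leaving the
    informed arc can inform a new island; this succeeds with probability p."""
    if not 1 <= k <= mu:
        raise ValueError("need 1 <= k <= mu")
    informed, t = 1, 0
    while informed < k:
        t += 1
        if rng.random() < p:
            informed += 1
    return t

rng = random.Random(1)
mu, k, p = 50, 11, 0.25
samples = [generations_unidirectional_ring(mu, k, p, rng) for _ in range(5000)]
print(statistics.mean(samples), (k - 1) / p)
```

With these parameters the sample mean should be close to (k - 1)/p = 40.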
The claim on the expected communication effort follows from Theorem 2.
As remarked in the proof, the bound from Theorem 8 holds for arbitrary strongly connected topologies as the unidirectional ring is a worst case for the values. Along with Theorem 2, this also gives a general upper bound on the expected communication effort for any strongly connected topology.
For bidirectional rings, we have . This can be seen from applying Johannsen's drift theorem, stated in the appendix as Theorem 20, applied to the difference between k and the current number of informed vertices. If there is more than one uninformed vertex, there are always at least two vertices neighboring to informed ones. The number of informed vertices then increases by 2p in expectation. This means that we can use h(1)=p and h(x)=2p for x>1 as the drift function. This decreases the constant 2 in the first term toward , at the expense of an additional term m−1. In some settings this upper bound may be better than the upper bound from Theorem 8 for unidirectional rings; where it is not, we may still use the latter for bidirectional rings as Theorem 8 applies to all strongly connected topologies.
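The effect of the second direction can be explored with a small simulation, again a toy model with hypothetical names. The informed islands of a bidirectional ring form an arc whose two outer neighbors are each informed with probability p. The function drift_bound is our own evaluation of the variable drift bound $x_{\min}/h(x_{\min}) + \int_{x_{\min}}^{X_0} 1/h(x)\,dx$ with $x_{\min} = 1$, $X_0 = k-1$, and the drift function h(1) = p, h(x) = 2p for x > 1 from the text; it is not taken from the paper.

```python
import random
import statistics

def generations_bidirectional_ring(mu, k, p, rng):
    """Generations until k of mu islands (mu >= 3) on a bidirectional ring are
    informed, starting from one informed island. The informed islands form an
    arc; the two islands just outside the arc are each informed with prob. p."""
    if mu < 3 or not 1 <= k <= mu:
        raise ValueError("need mu >= 3 and 1 <= k <= mu")
    informed, t = 1, 0
    while informed < k:
        t += 1
        if mu - informed >= 2:
            informed += (rng.random() < p) + (rng.random() < p)
        else:
            # a single uninformed island left, with two informed neighbours
            informed += rng.random() < 1.0 - (1.0 - p) ** 2
    return t

def drift_bound(k, p):
    """1/p + (k - 2)/(2p): variable drift bound for h(1) = p, h(x) = 2p, k >= 2."""
    return 1.0 / p + (k - 2) / (2.0 * p)

rng = random.Random(7)
mu, k, p = 50, 20, 0.5
samples = [generations_bidirectional_ring(mu, k, p, rng) for _ in range(4000)]
print(statistics.mean(samples), drift_bound(k, p), (k - 1) / p)
```

For k = 20 and p = 0.5 the drift bound is 20 generations, while the estimate (k - 1)/p used for the unidirectional ring would give 38.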
Also note that if $p < s_i$, then the trivial bound $1/s_i$ gives a better estimate for the time spent on this fitness level. If this holds for all fitness levels, information is propagated too slowly and our method does not give any provable speedups for the parallel model.
Contrarily, if, say, , compared to a single island in a ring, the expected wait time for every fitness level can be replaced by its square root. This can yield significant speedups. We make this precise for concrete functions in the following theorem. For comparing these times with runtime bounds for the (1+1) EA, we refer to Table 1.
The following holds for the parallel (1+1) EA with transmission probability at least p on a unidirectional or bidirectional ring (or any other strongly connected topology):
for every unimodal function with d+1 function values,
The speedups obtained are indeed significant, particularly for those functions where improvements are hard to find.
5 Parallel EAs with Two-Dimensional Grids and Tori
For two-dimensional grids and tori, Theorem 4 is adapted in a similar manner, in order to get the best possible leading constant in the first term of the runtime bound. We also consider applications of the resulting theorem similar to the applications for ring graphs.
The claim on the expected communication effort follows from Theorem 2.
Note that the communication effort in one generation is asymptotically as large as for ring graphs, but for large p the upper bound on the parallel runtime is generally smaller (or asymptotically equal, in the case where the upper bounds are dominated by the term $\frac{1}{\mu}\sum_{i=1}^{m-1}\frac{1}{s_i}$). If $p < 3s_i$, then again the trivial upper bound $1/s_i$ is better, because then the spread of information is too slow.
Compared to a single island, in a torus, the expected wait time for every fitness level can be replaced by its third root. This leads to improved upper bounds for unimodal functions and Jumpk.
The following holds for the parallel (1+1) EA with transmission probability p on a grid or torus topology whose side lengths are at least in both directions:
for every unimodal function with d+1 function values,
for Jump$_k$ with.
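The same comparison for LO, now also including the cube root that appears for tori, is sketched below. As before, this is only an illustration of the first-term effect under an assumption not derived here: each of the n non-optimal levels of LO is left with probability at least $1/(en)$. The dependence on p, the constants, and the second terms of the bounds are ignored; n=256 matches the instance size used in the experiments.

```python
import math


def lo_waiting_times(n):
    # Assumed fitness-level bound for LO: each of the n non-optimal levels is
    # left with probability at least 1/(e*n), i.e. waiting time at most e*n.
    return [math.e * n] * n


def sum_of_roots(waits, root):
    # Replace every waiting time by its root-th root and sum over the levels.
    return sum(w ** (1.0 / root) for w in waits)


n = 256
waits = lo_waiting_times(n)
single = sum_of_roots(waits, 1)  # e * n^2 for one island
ring = sum_of_roots(waits, 2)    # square roots, as for rings
torus = sum_of_roots(waits, 3)   # cube roots, as for tori
print(f"single={single:.0f}  ring={ring:.0f}  torus={torus:.0f}")
```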
We choose the same partitions as in the proof of Theorem 9. Note that the second terms in Theorems 8 and 10 are identical, so we only estimate the first terms and refer to Theorem 9 for the second terms.
6 Parallel EAs with Hypercube Graphs
Hypercube graphs are popular topologies in parallel computation. In a d-dimensional hypercube, each vertex has a label of d bits, and two vertices are neighbors if and only if their labels differ in exactly one bit. The number of vertices is then $2^d$, and each vertex has d neighbors. The diameter of a d-dimensional hypercube is d, which is only logarithmic in the number of vertices. The small diameter implies that in many communication models information spreads rapidly, even though the vertex degree is quite small. For the propagation process investigated here, we obtain a small first term in the following runtime bound while keeping the communication effort very moderate.
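These properties are easy to check mechanically. The following sketch (our own illustration) encodes each vertex as an integer whose binary digits form its d-bit label, so that flipping one bit with XOR yields a neighbor, and runs a breadth-first search to confirm the number of vertices, the degree, and the diameter for d = 6, the size of the 64-island hypercube used in the experiments.

```python
from collections import deque


def hypercube_neighbors(v, d):
    # Neighbors of vertex v: flip exactly one of its d label bits.
    return [v ^ (1 << b) for b in range(d)]


def bfs_distances(src, d):
    # Breadth-first search over the d-dimensional hypercube starting at src.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in hypercube_neighbors(u, d):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


d = 6
dist = bfs_distances(0, d)
# Hypercubes are vertex-transitive, so the eccentricity of vertex 0 equals the diameter.
print(len(dist), len(hypercube_neighbors(0, d)), max(dist.values()))  # 64 6 6
```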
The claim on the expected communication effort follows from Theorem 2.
The results for the example applications are as follows.
The following holds for the parallel (1+1) EA with transmission probability p on a -dimensional hypercube:
for every unimodal function with d+1 function values,
for Jump$_k$ with.
If , linear speedups arise for OneMax if , and for unimodal functions where the bound O(dn) for a single island is tight if . For Jump$_k$, if k=O(n/log n), can be chosen to obtain a linear speedup. Note from Table 1 that, if p is large, the upper bounds on the expected parallel times for LO and Jump$_k$ are much better for the hypercube than for rings and torus graphs.
7 Parallel EAs with Complete Topologies
Finally, the densest topology is considered, the complete graph , where every island is a neighbor to every other island. The complete graph is interesting because it represents an extreme case: the largest possible communication effort with regard to one generation, but also the fastest possible spread of information.
For the special case of p=1, a parallel (1+1) EA is basically equivalent to a () EA, which creates offspring independently and then compares a best offspring against the current search point. The only difference is that the parallel (1+1) EA can store different individuals of the same fitness. However, this issue is irrelevant when using the fitness-level method. Hence, the results for a parallel (1+1) EA with a complete topology and p=1 also apply to the (1 ) EA. For p<1, the two models are generally different.
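For concreteness, the (1+λ) EA referred to above can be sketched as follows, where we write λ for the number of offspring (our notation here; with p=1 on the complete topology it plays the role of the number of islands). The sketch uses OneMax as the fitness function and standard bit mutation with rate 1/n; it is an illustration, not the implementation analyzed in the paper.

```python
import random


def one_plus_lambda_ea(n, lam, rng):
    """Sketch of a (1+lambda) EA on OneMax. Each generation creates lam
    offspring independently by standard bit mutation and keeps a best
    offspring if it is at least as fit as the current search point.
    Returns the number of generations until the optimum is found."""
    x = [rng.randint(0, 1) for _ in range(n)]
    generations = 0
    while sum(x) < n:
        generations += 1
        # Flip each bit independently with probability 1/n, lam times.
        offspring = [[b ^ (rng.random() < 1.0 / n) for b in x] for _ in range(lam)]
        best = max(offspring, key=sum)
        if sum(best) >= sum(x):
            x = best
    return generations


rng = random.Random(42)
print(one_plus_lambda_ea(n=50, lam=8, rng=rng))
```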
A simple argument is the following: if there is at least one informed island, then in each generation every other island becomes informed with probability at least p.
The expected time is estimated until at least islands are informed after an improvement. If more than islands are uninformed, the expected number of islands that become informed in one generation is at least . By standard drift analysis arguments (He and Yao, 2004) the desired expectation is bounded from above by 2/p.
The claim on the expected communication effort follows from Theorem 2.
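A small simulation illustrates the spreading process on the complete graph. The sketch below assumes, for illustration only, that in every generation each informed island transmits to each other island independently with probability p; the target of half the islands is a hypothetical choice, and 2/p is printed only for orientation.

```python
import random


def generations_until_informed(mu, p, target, rng):
    """Spread of an improvement on the complete graph with mu islands.
    Assumed model: each generation, every informed island transmits to every
    other island independently with probability p. Starting from one informed
    island, returns the number of generations until >= target are informed."""
    if not 0 < p <= 1 or not 1 <= target <= mu:
        raise ValueError("need 0 < p <= 1 and 1 <= target <= mu")
    informed, generations = 1, 0
    while informed < target:
        generations += 1
        # An uninformed island stays uninformed only if all transmissions to it fail.
        p_hit = 1.0 - (1.0 - p) ** informed
        informed += sum(rng.random() < p_hit for _ in range(mu - informed))
    return generations


rng = random.Random(1)
mu, p = 64, 0.1
runs = [generations_until_informed(mu, p, mu // 2, rng) for _ in range(1000)]
print(sum(runs) / len(runs), "vs. 2/p =", 2 / p)
```

In this model the simulated mean for small p stays far below 2/p, because islands informed earlier also transmit, which is the effect that a more detailed analysis for small p can exploit.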
As mentioned previously, the complete graph leads to the fastest possible spread of information, and it yields the best upper bounds for the considered function classes. However, a maximum amount of migration takes place in each generation, so the expected total communication effort is also the highest (cf. Table 1 and Table 3 in the appendix).
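The difference in communication per generation can be quantified by counting directed links. The following sketch (our own illustration) assumes that in each generation every island sends a copy of its individual along each outgoing link with probability p, so the expected number of migrants per generation is p times the number of directed links; the total communication effort additionally depends on the number of generations. It uses 64 islands, as in the experiments.

```python
import math


def neighbors(topology, mu, v):
    """Out-neighbors of island v (0 <= v < mu) for the topologies considered here."""
    if topology == "ring":  # bidirectional ring
        return {(v - 1) % mu, (v + 1) % mu} - {v}
    if topology == "torus":  # square torus; mu must be a perfect square
        side = math.isqrt(mu)
        if side * side != mu:
            raise ValueError("torus needs a square number of islands")
        r, c = divmod(v, side)
        cells = {((r + dr) % side) * side + (c + dc) % side
                 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))}
        return cells - {v}
    if topology == "hypercube":  # mu must be a power of two
        if mu & (mu - 1):
            raise ValueError("hypercube needs a power of two islands")
        d = mu.bit_length() - 1
        return {v ^ (1 << b) for b in range(d)}
    if topology == "complete":
        return set(range(mu)) - {v}
    raise ValueError(f"unknown topology: {topology}")


def expected_migrants_per_generation(topology, mu, p):
    # Each directed link carries one individual with probability p (assumed model).
    links = sum(len(neighbors(topology, mu, v)) for v in range(mu))
    return p * links


for topo in ("ring", "torus", "hypercube", "complete"):
    print(topo, expected_migrants_per_generation(topo, 64, 1.0))
# ring 128.0, torus 256.0, hypercube 384.0, complete 4032.0
```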
Let . The following holds for the expected parallel runtime of the parallel (1+1) EA with topology . In the case where p=1, the same holds for the (1+) EA:
for every unimodal function with d+1 function values, and
for Jump$_k$ with.
The proof is obvious by now.
The term 2/p for the time until at least islands are informed is a reasonable estimate if p is large (e.g., ). For small p, however, this estimate is quite loose, as it completely neglects that all informed islands can inform other islands.
This work therefore also presents a more detailed analysis for small p. The motivation for studying complete graphs with small p is that this setting captures random migration policies: if each island decides randomly, with probability p for each other island, whether to migrate individuals to that island, this can be regarded as a complete topology with transmission probability p.
Values around seem particularly interesting as then in each generation one migration takes place for each island in expectation. In fact, there are different results for and .
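Under the random migration policy described above, each island sends $p(\mu-1)$ individuals per generation in expectation, where we write μ for the number of islands (our notation here). The sketch below checks this by simulation for the hypothetical choice $p = 1/(\mu-1)$, for which the expectation is exactly one migration per island and which is close to $1/\mu$ for large μ.

```python
import random


def mean_migrations_per_island(mu, p, generations, rng):
    """Random migration policy: in every generation each island decides,
    independently with probability p for each of the mu - 1 other islands,
    whether to send it an individual. Returns the average number of
    migrations sent per island and generation."""
    total = 0
    for _ in range(generations):
        for _ in range(mu):
            total += sum(rng.random() < p for _ in range(mu - 1))
    return total / (mu * generations)


mu = 64
p = 1 / (mu - 1)  # expected migrations per island: p * (mu - 1) = 1
print(mean_migrations_per_island(mu, p, generations=100, rng=random.Random(0)))
```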
The claim is obvious for , so is assumed in the following. Let Xt denote the random number of informed vertices after t iterations. First, the expected time until at least vertices become informed is estimated, and then an estimate is made of how long it takes to get from informed vertices to informed ones.
Combining Lemma 16 with Lemma 6 gives the following. Apart from an additive term m, the case of yields a bound where the first term is smaller by a factor on the order of . For fairly large transmission probabilities, , in the first term we have replaced the factor 1/p by . These improvements reflect that the complete graph can spread information much more quickly than previously estimated in the proof of Theorem 14.
For the example applications, the refinements in Theorem 17 result in the following refined bounds. As there are only improved upper bounds for , it is not necessary to mention the special case of the (1+) EA with p=1.
Let. The following holds for the expected parallel runtime of the parallel (1+1) EA with topology:
for OneMax if and otherwise,
for unimodal functions with d+1 values, if and otherwise, and
for Jump$_k$ with, if and otherwise.
To complement the analytical results above, we now present experimental results on the behavior of island models with different topologies. As a detailed experimental evaluation is beyond the scope of this paper, we only show illustrative results for the two functions OneMax and LO.
We first investigate the parallel runtime $T^{\mathrm{par}}$ and the communication effort for different transmission probabilities. For both example functions, the parallel (1+1) EA was run 100 times per data point with 64 islands and an instance size of n=256, varying the transmission probability p in steps of 0.01. Figure 1 shows the behavior for the topologies $K_{64}$, a bidirectional ring graph, an $8 \times 8$ torus graph, and a 6-dimensional hypercube.
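The experimental setup can be sketched in a few lines. The code below is our own minimal, hypothetical reconstruction, not the implementation used for Figure 1: the order of mutation and migration, synchronous migration, and the rule that a migrant is accepted only if strictly fitter are assumptions. It reports the parallel runtime in generations and the communication effort as the total number of migrated individuals.

```python
import random


def onemax(x):
    return sum(x)


def leading_ones(x):
    # Number of consecutive one-bits at the start of x.
    count = 0
    for b in x:
        if b == 0:
            break
        count += 1
    return count


def parallel_one_plus_one_ea(n, neighbors, p, rng, fitness=onemax):
    """Sketch of a parallel (1+1) EA with one individual per island.
    neighbors[u] lists the islands that u sends to. Assumed order per generation:
    mutation and selection on every island, then each directed link transmits a
    copy with probability p, and a migrant replaces the receiver's individual
    only if it is strictly fitter. Assumes the optimum has fitness n.
    Returns (parallel runtime in generations, number of migrated individuals)."""
    mu = len(neighbors)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    generations, migrants = 0, 0
    while max(fitness(x) for x in pop) < n:
        generations += 1
        for i, x in enumerate(pop):
            y = [b ^ (rng.random() < 1.0 / n) for b in x]  # standard bit mutation
            if fitness(y) >= fitness(x):
                pop[i] = y
        # Collect all migrants first, so migration uses this generation's individuals.
        incoming = [[] for _ in range(mu)]
        for u in range(mu):
            for v in neighbors[u]:
                if rng.random() < p:
                    incoming[v].append(pop[u])
                    migrants += 1
        for v in range(mu):
            for z in incoming[v]:
                if fitness(z) > fitness(pop[v]):
                    pop[v] = z
    return generations, migrants


mu = 8
ring = [[(v - 1) % mu, (v + 1) % mu] for v in range(mu)]  # bidirectional ring
print(parallel_one_plus_one_ea(64, ring, 0.5, random.Random(1)))
print(parallel_one_plus_one_ea(32, ring, 0.5, random.Random(1), fitness=leading_ones))
```

Passing neighbor lists for a torus, a hypercube, or the complete graph instead of the ring allows the same kind of comparison on a much smaller scale than the experiments reported here.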
The influence of the transmission probability on the runtime is in line with the theoretical analysis: a higher transmission probability improves the runtime behavior of the algorithm. In particular, all values of p that are not too small lead to much smaller runtimes than the pathological setting p=0, where there is no communication and the islands perform independent runs of the (1+1) EA. For our functions, this demonstrates that parallelization and migration can lead to drastic speedups. For intermediate or large values of p, the parallel runtime does not vary much, as then for all topologies the runtime is dominated by the second terms of our bounds: and for OneMax and LO, respectively.
Comparing the topologies, we see that the parallel runtime indeed depends on the density of the topology: denser topologies spread information more efficiently, which results in faster convergence. As expected, the complete topology $K_{64}$ performs best and the ring graph performs worst.
We used two-sided Mann-Whitney U tests on the data from Figure 1(a) and 1(b), along with a comparison of mean ranks, to make pairwise comparisons between the topologies with respect to the parallel runtime. Separate tests were performed for each individual data point (i.e., each tested transmission probability), as this shows in which settings one topology is better than another.3 For both OneMax and LO and all transmission probabilities of at least 0.01, the outcome is hypercube < torus < ring at a significance level of 0.001.
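For readers who want to reproduce such comparisons, the sketch below computes the Mann-Whitney U statistic with average ranks for ties and a two-sided p-value from the normal approximation with tie correction. The variant used for the reported tests (exact or approximate, with or without continuity correction) is not stated here, so this is only one possible choice; an established implementation such as `scipy.stats.mannwhitneyu` is preferable in practice. Both samples must be non-empty.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U statistic of sample x, approximate two-sided p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the run pooled[i..j] of tied values and give it the average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - mean) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical parallel runtimes (generations) of two topologies.
hypercube = [101, 97, 110, 95, 104, 99]
ring = [131, 125, 140, 128, 119, 135]
print(mann_whitney_u(hypercube, ring))
```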
Looking at the communication effort in Figure 1(c) and 1(d), it appears to be larger for denser topologies, as expected. Hence, although the complete topology $K_{64}$ shows the best runtime behavior, its communication effort is the highest for all transmission probabilities. Interestingly, the communication effort is about the same for the other three topologies. This holds in particular for LO: although the ring graph, for example, is sparser, its parallel runtime is higher, so its communication effort remains similar to that of the hypercube and the torus graph.
For the communication effort on OneMax, the Mann-Whitney U test yields ring < torus < hypercube < $K_{64}$ for all transmission probabilities of at least 0.01 at a significance level of 0.001. For LO, the comparison between ring and torus is less significant: the relation ring < torus holds only at a significance level of 0.05 for transmission probabilities of at least 0.01, with three exceptions: for transmission probabilities 0.13, 0.14, and 0.17, the results were not significant.
Again, the instance size of the benchmark functions was set to n=256, and the number of islands was varied from 1 to 64. Only square torus graphs were used; hence torus graphs are only defined for square numbers of islands and hypercubes only for powers of 2, which leads to fewer data points for these topologies.
For smaller numbers of islands, the efficiency of the algorithm is higher than for larger numbers of islands. This is expected, as a single (1+1) EA, that is, the setting with only one island, minimizes the number of function evaluations for both OneMax and LO among all EAs that only use standard bit mutation (Sudholt, 2013). From a theoretical perspective, this rules out superlinear speedups on OneMax and LO for such EAs.
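The efficiency measure is not restated in this section; a common definition, assumed here, is the speedup divided by the number of islands, where the speedup is the ratio of the mean runtime of a single island to the mean parallel runtime. Because each (1+1) EA island evaluates one offspring per generation, this equals the ratio of the function evaluations of one island to the total evaluations of all islands, which is why a single (1+1) EA minimizing the evaluations implies that the efficiency, computed from expected values, cannot exceed 1. The numbers below are hypothetical.

```python
def speedup(mean_gens_single, mean_gens_parallel):
    # Ratio of the mean runtime of one island to the mean parallel runtime.
    return mean_gens_single / mean_gens_parallel


def efficiency(mean_gens_single, mean_gens_parallel, islands):
    # Speedup per island; 1.0 corresponds to a linear speedup.
    # Equivalently: evaluations of one island / total evaluations of all islands.
    return speedup(mean_gens_single, mean_gens_parallel) / islands


# Hypothetical numbers, for illustration only.
print(efficiency(1000, 250, 4))  # 1.0: linear speedup
print(efficiency(1000, 500, 4))  # 0.5
```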
Denser topologies are more efficient than sparser ones. In accordance with the theoretical analyses, the efficiency decreases more rapidly for OneMax. For OneMax, and , only values were guaranteed to give a linear speedup. Indeed, the efficiency in Figure 2(c) degrades quite quickly for OneMax and p=1.0.
Larger numbers of islands remain efficient for LO. For the ring, the range of good values is up to . This is reflected in Figure 2(d), where the efficiency degrades as increases beyond . For denser topologies, the efficiency only degrades for large . The complete graph remains effective across the whole range; moreover, for values up to (not shown in Figure 2), the efficiency was always above 0.75. This was also expected, as still guarantees a linear speedup for LO.
Comparing the runtime behavior for different transmission probabilities, the plots confirm again that in our examples a higher transmission probability for individuals allows for a better overall performance.
As before, pairwise two-sided Mann-Whitney U tests were performed for the results in Figure 2. For the efficiency, the mean ranks in each setup indicate the ordering ring < torus < hypercube < complete graph, but with different levels of significance. Table 2 lists the numbers of islands for which the pairwise comparisons are statistically significant at a significance level of 0.001. Note that, by definition, torus and hypercube share very few data points, namely the squares of powers of two: 1, 4, 16, and 64 islands. For very small values of , the results are not significant, as the topologies are too similar and hence show indistinguishable performance. For larger topologies, that is, , all comparisons are significant at the very low level of 0.001.
Table 2: Numbers of islands for which the pairwise comparisons of the efficiency are significant at a level of 0.001 (columns: Torus, Hypercube, Complete).
We have provided a general method for the runtime analysis of parallel evolutionary algorithms, including applications to a set of well-known and illustrative example functions. Our method provides a way of automatically transforming runtime bounds obtained for panmictic EAs to parallel EAs with spatial structures. In addition to a general result, we have provided methods tailored toward specific topologies: ring graphs, torus graphs, hypercubes, and complete graphs. The latter also covers offspring populations and random migration topologies as special cases. Our results can estimate the expected parallel runtime from above, thus lower-bounding the speedup obtained through parallelization with regard to the number of generations. They also bound the expected total communication effort in terms of the total number of individuals migrated as an indicator of the bandwidth used.
The example applications revealed insights that are remarkable in their own right (see Table 1, and a more general version in Table 3 in the appendix). Compared to upper bounds obtained for a single panmictic island by the fitness-level method, for ring graphs the expected waiting time for an improvement can be replaced by its square root in the parallel runtime, provided that the number of islands is large enough and improvements are transmitted efficiently, that is, . This leads to a speedup on the order of log n for OneMax and on the order of for some unimodal functions, such as LO. On Jump$_k$, the speedup is even on the order of at least $n^{k/2}$. A similar effect is observed for torus graphs, where the expected waiting time can be replaced by its cube root. The hypercube reduces the (upper bound on the) expected waiting time on each level to its logarithm, and on the complete graph it is reduced to a constant, again provided there are sufficiently many islands. This way, even on functions such as LO and Jump$_k$, the expected parallel time can be reduced to O(n). In all these results, the population size can be chosen such that the total number of function evaluations does not increase, in an asymptotic sense. The population sizes leading to the best possible upper bounds on the parallel runtime have been stated explicitly (cf. Table 1 and Table 3 in the appendix), thus giving hints on how to parameterize parallel EAs.