Abstract

Structured evolutionary algorithms have been investigated for some time. However, they have been underexplored, especially in the field of multi-objective optimization. Despite good results, the use of complex dynamics and structures keeps the understanding and adoption rate of structured evolutionary algorithms low. Here, we propose a general subpopulation framework that is capable of integrating optimization algorithms without restrictions as well as aiding the design of structured algorithms. The proposed framework is capable of generalizing most of the structured evolutionary algorithms, such as cellular algorithms, island models, spatial predator-prey, and restricted-mating-based algorithms. Moreover, we propose two algorithms based on the general subpopulation framework, demonstrating that with the simple addition of a number of single-objective differential evolution algorithms, one for each objective, the results improve greatly, even when the combined algorithms behave poorly when evaluated alone on the tests. Most importantly, the comparison between the subpopulation algorithms and their related panmictic algorithms suggests that the competition between different strategies inside one population can have deleterious consequences for an algorithm, and it reveals a strong benefit of using the subpopulation framework.

1  Introduction

Although particle swarm optimization algorithms, differential evolution, and genetic algorithms follow different lines of thought, they can all be seen from the same framework or structure. Not only these, but most of the algorithms in evolutionary computation share the same framework. They are based on a single population of individuals, which interact in some form to produce new ones inside the same population. These types of algorithms are usually called unstructured or panmictic evolutionary algorithms (EAs) (Sprave, 1999).

On the other hand, island-based models and cellular algorithms have achieved relevant improvements, indicating that evolutionary bioinspiration, when extended to include subpopulation and neighborhood concepts, can be advantageous (Tomassini, 2005). These types of algorithms are called structured models.

Nonetheless, the use of structured algorithms in multi-objective optimization has been underexplored (Nebro et al., 2006). Recently, we started asking ourselves what the next step could be (future research trends; Coello et al., 2006), since very simple and effective algorithms have been developed, and it is hard to improve them without losing their benefits. This article tackles this problem from a different perspective. Here, we switch the focus from algorithms to frameworks. Moreover, when changing from a panmictic to a structured framework, small and simple changes may give relevant improvements to state-of-the-art algorithms.

This article proposes a subpopulation framework which has the following features:

  1. Integration Capability.  It allows for the addition of any number of algorithms, which are integrated as subpopulations of the framework. Although this feature is not new (it was explored similarly in island models; Li and Yang, 2008), here we show that not only EAs but any optimization algorithm can be integrated in this framework. These algorithms are not required to be population-based either (examples of how this can be done are given in Section 6.2).

  2. General Formulation.  This framework is a general case of most structured approaches, including but not limited to cellular algorithms and island-based models (Section 6.1). The formalized subpopulation framework also generalizes the panmictic framework, because the panmictic framework is its special case when the number of subpopulations is fixed to one and the matrix set (which describes the interactions among subpopulations of the proposed framework; further explained in Section 6) can be ignored.

  3. State-of-the-Art Solutions.  Experimentally, it is shown that algorithms based on the subpopulation framework can achieve state-of-the-art results (Section 9). In fact, the results of the subpopulation algorithm based on novelty (SAN; Section 7.2) allow it to be reasonably regarded as one of the most robust algorithms to date in multi-objective optimization, solving different types of problems in bi-objective and many-objective settings with excellent results and surpassing the third version of the generalized differential evolution algorithm (GDE3, currently one of the most suitable MOEAs; Durillo et al., 2010) in most of the tests.

Experiments are conducted with two novel algorithms that implement the proposed subpopulation framework. These algorithms are developed based on single-population (panmictic) ones. The chosen panmictic algorithms, which were also used for comparison, are GDE3 and a simple novelty search algorithm called the multi-objective novelty algorithm (MONA, also a contribution of this article, described in Section 5.2). Here, the intention is to choose algorithms as different as possible, to show some aspects of the subpopulation framework and its applicability to any type of algorithm. Note that the dissimilarities between GDE3 and MONA arise from the fact that the former is objective-based while the latter is novelty-based (a further explanation of novelty search is given in Section 5). In fact, it will be shown that the differences between the strategies of two or more subpopulations benefit their integration, in contrast with the competition which arises when different strategies are present in a single population.

This article shows that simple subpopulation dynamics can give relevant improvements when combined with a state-of-the-art algorithm under the proposed framework, demonstrating the strong benefits of the subpopulation framework. Additionally, the competition between different strategies inside the traditional single-population framework can have deleterious consequences for an algorithm. This is analyzed and verified experimentally in Section 9.5. The problems confronted by panmictic algorithms are similar to the ones confronted by objective-based algorithms when contrasted with novelty-search-based algorithms (Lehman and Stanley, 2011), since both are easily trapped in deceptive fitness landscapes. The solution provided by the subpopulation framework is that the presence of multiple populations with different dynamics makes the algorithm less sensitive to local optima.

Finally, this article presents a discussion of an unexpected result, where a combination of three simple subpopulations achieved state-of-the-art quality on the WFG Toolkit (Huband et al., 2006; presented and explained in Sections 9.3 and 9.5).

Sections 2, 3, 4, and 5 briefly review the literature on structured EAs, differential evolution in single-objective optimization, differential-evolution-based multi-objective algorithms, and novelty search, respectively. Thereafter, Section 6 proposes the general subpopulation framework. Section 7 describes two subpopulation algorithms which use the general subpopulation framework as their basis. Section 8 presents the methodology used for comparison. Section 9 describes the problems’ characteristics and shows the results obtained on them. Lastly, conclusions are presented in Section 10.

2  Structured EAs

On one hand, the usual type of EA pertains to a class of single-population algorithms, which we call here the single-population framework; these are also known as panmictic EAs. On the other hand, there are algorithms that spread their population over a structure with some defined interrelationship (Alba and Tomassini, 2002). This article will follow the definition that structured algorithms are any procedures whose population can be formulated as subpopulation groups, with the number of possible nontrivial subpopulation groups necessarily greater than one. For example, the simple genetic algorithm cannot be seen as a structured algorithm, since the number of possible subpopulation groups can never be formulated as greater than one (Goldberg, 1989). Multi-objective ELSA, a local selection algorithm, also cannot be seen as a structured algorithm (Menczer et al., 2000). Note that some procedures, such as restricted mating, fit the previous definition of structured algorithms (Zitzler and Thiele, 1999). Therefore, restricted-mating-based algorithms can be seen as structured algorithms (see Section 6.1 for the complete description).

Parallel EAs are usually examples of structured EAs that are sometimes divided into three classes (Gorges-Schleuter, 1991; Sprave, 1999).

  1. Island Model.  The basic structure used by this model consists of multiple subpopulations, where a limited amount of genetic information is exchanged between any of them arbitrarily.

  2. Stepping Stone Model.  In this model a neighborhood relation is defined, where only the adjacent subpopulations can exchange information. Aside from that, it is defined in the same way as the island model.

  3. Neighborhood Model.  A complex single population structure, where individuals interact only with adjacent individuals.

The cellular algorithm (Manderick and Spiessens, 1989; also called fine grained model or lattice model), for example, pertains to the third class.

According to De Toro et al. (2002), parallel MOEA models can also be divided into three classes: global parallelization, coarse grain, and fine grain. Global parallelization does not present any structured population aspect, while coarse-grain (also called island GAs) and fine-grain (also called cellular GAs) models are parallel versions of the structured algorithms already mentioned. In Talbi et al. (2008), the classification of parallel models differs from the previous three classes, although from a population structure point of view the models can still be converted to the previous three classes.

Other types of EAs were also developed where the evolutionary conditions differed from subpopulation to subpopulation. These were called nonstandard structured EAs and they were reviewed by Alba and Tomassini (2002). Another extensive review of single-objective structured EAs can be found in the book of Tomassini (2005).

Regarding multi-objective algorithms, there are also some that are structured. To cite some: multi-objective cellular algorithms (Nebro et al., 2006), some rudimentary subpopulation algorithms (Santos et al., 2010; Delbem et al., 2005), the spatial predator-prey MOEA (Laumanns et al., 1998), and multi-colony ant algorithms (Iredi et al., 2001). The spatial predator-prey MOEA defines an adjacency matrix with solutions on its vertices, where each predator makes a random walk and erases the worst solution in its neighborhood with respect to a given objective (Laumanns et al., 1998). The number of predators walking equals the number of objectives. Ant colony optimization algorithms construct a population of solutions by sampling from a probabilistic model (usually in the form of a matrix of pheromone trails). This pheromone matrix is constantly updated by the ants. Although they cannot be defined as structured algorithms by the definition above, their multi-colony versions can be so defined. Multi-colony optimization algorithms normally use multiple pheromone matrices with some rules to decide how and which pheromone matrix is updated/used.

Moreover, a generalized framework for structured algorithms is still nonexistent. This article fills this gap by formalizing a general unifying framework capable of representing most, if not all, of these structured models.

2.1  Related Methods

Although it is a single population algorithm, AMALGAM is related to the proposed framework since both can be used to integrate algorithms. AMALGAM is a panmictic multi-objective algorithm that creates a number of offspring points using genetic operators from different algorithms. Fast nondominated sorting is used to rank the current offspring together with the previous population, subsequently defining the next population (Vrugt and Robinson, 2007).

As mentioned before, one important difference between AMALGAM and the proposed framework is that the former is panmictic. Therefore, it has the disadvantage that multiple algorithms joined together may conflict with each other in the single pool of solutions. Another important difference is that AMALGAM can only integrate algorithms with biological models of population evolution, since genetic operators are necessary for the integration. The framework proposed here defines the integration of any optimization algorithm.

The portfolio design proposed by Gomes and Selman (1997) runs different algorithms (strategies), or copies of the same strategy, with the objective of selecting the best strategy for the given problem. The details of the selection and evaluation of strategies, as well as the strategies themselves, depend on the problem at hand (Guerri and Milano, 2004). The strategies run without communication between each other. Therefore, when considered in the light of the framework described here, the set of interaction matrices is null and can be ignored (interaction matrices are part of the framework defined in Section 6). The similarities between this method and the proposed framework are limited to the use of multiple algorithms together.

3  Differential Evolution

Differential evolution (DE) is a meta-heuristic in the subfield of evolutionary computation which can be employed for optimizing multi-dimensional real-valued functions, where these functions are required neither to be continuous nor to be differentiable. It solves problems using a simple algorithm similar to the ones used by EAs, but the operators used by DE are not based on the evolution of species (Storn and Price, 1997). The algorithm is described succinctly in Table 1, and the procedures of mutation, crossover, and selection are explained in the following sections.

Table 1:
Differential evolution algorithm.
1. Initialize population with random samples uniformly distributed over the search space 
2. Repeat for each individual until a criterion of convergence is met 
 (a)   Mutation 
 (b)   Crossover 
 (c)   Selection 
3. Return solution 

3.1  Mutation

For each vector $x_{i,g}$, where $i$ is the index of the vector (which relates to the individual index in the population, since each individual has its own vector) and $g$ is the current generation where the vector takes place, the mutation is applied by creating a mutant vector $v_{i,g+1}$ based on the numerical operator described in Equation (1):

$$v_{i,g+1} = x_{r_1,g} + F \, (x_{r_2,g} - x_{r_3,g}) \tag{1}$$

where $x_{r_1,g}$, $x_{r_2,g}$, and $x_{r_3,g}$ are randomly selected individuals of the population that must differ from individual $i$. $F$ is a parameter which should meet the condition $F \in [0, 2]$.

3.2  Crossover

During crossover, trial vectors $u_{i,g+1}$ are created from a combination of the mutant vector $v_{i,g+1}$ and the original vector $x_{i,g}$. The trial vector created is expressed in Equation (2):

$$u_{j,i,g+1} = \begin{cases} v_{j,i,g+1} & \text{if } rand_j \leq CR \text{ or } j = rnd_i \\ x_{j,i,g} & \text{otherwise} \end{cases} \tag{2}$$

where $rand_j$ is a uniformly distributed random number in $[0,1]$, $CR \in [0,1]$ is a parameter passed to DE, $j$ is the vector component index, and $rnd_i$ is a randomly chosen component index, with the objective of choosing at least one component from the mutant vector $v_{i,g+1}$.

3.3  Selection

The selection is the last step of the generation, where it is determined for each vector whether the trial vector $u_{i,g+1}$ will substitute the original vector $x_{i,g}$ or not. For this, both $x_{i,g}$ and $u_{i,g+1}$ are evaluated, and the vector with the better fitness value is kept, forming the next-generation vector $x_{i,g+1}$.
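For concreteness, the following is a minimal sketch of one DE generation for a minimization problem, combining the three steps above (rand/1 mutation with binomial crossover; the function name, the NumPy-based population layout, and the default parameter values are our own illustrative choices):

```python
import numpy as np

def de_generation(pop, fitness, objective, F=0.5, CR=0.1, rng=None):
    """One DE (rand/1/bin) generation: mutation, crossover, selection."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        # Mutation (Equation 1): three mutually distinct individuals, all different from i
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])
        # Crossover (Equation 2): keep mutant components with probability CR,
        # guaranteeing at least one mutant component via a random index
        mask = rng.random(d) <= CR
        mask[rng.integers(d)] = True
        u = np.where(mask, v, pop[i])
        # Selection: the trial vector replaces the original only if it is not worse
        fu = objective(u)
        if fu <= fitness[i]:
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```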

3.4  Comparison with Other Evolutionary Algorithms

The DE algorithm and its variations are known for their robustness, the quality of their solutions, short runtime, ease of use, and applicability to a wide range of problems, not limited by the type of the objective function (Storn and Price, 1997; Brest et al., 2006).

Promising results were obtained in numerous different experiments. Two variations of DE achieved the best solutions on all problems from ICEC’96 (Storn and Price, 2002). In the work of Vesterstrom and Thomsen (2004), it was shown to achieve more accurate solutions, faster and with greater robustness, than particle swarm optimization (PSO) and EAs. Currently, DE is still compared on equal grounds with complex optimization algorithms (e.g., estimation of distribution algorithms; García et al., 2009).

4  Differential-Evolution-Based Multi-Objective Methods

DE was shown to achieve significant improvements over other algorithms in single-objective (Vesterstrom and Thomsen, 2004) as well as multi-criteria optimization (Tušar and Filipič, 2007; Durillo et al., 2010). The reason behind these overall better results lies partially in the rotationally invariant behavior of DE’s operators, which adapt to the fitness landscape, in contrast with NSGA-II’s genetic operators and other algorithms with similar genetic operators (Iorio and Li, 2005). Recent studies show that in multi-objective problems, DE is one of the best approaches when the problem increases in scale (Durillo et al., 2010).

There are various multi-objective methods based on differential evolution (Chakraborty, 2008). They can be divided into older algorithms, which used only Pareto dominance to select individuals, and modern methods, which use Pareto dominance together with a diversity measure for selection (Tušar and Filipič, 2007). It is generally accepted that the last version of the generalized differential evolution algorithm (GDE3; Kukkonen and Lampinen, 2005) and the differential evolution multi-objective algorithm (DEMO; Robič and Filipič, 2005) are the representatives of the modern class of multi-objective algorithms based on DE (Tušar and Filipič, 2007; Durillo et al., 2010). Taking into account that DEMO (Robič and Filipič, 2005) is similar to GDE3 (Kukkonen and Lampinen, 2005), although without constraint handling and the fallback to the original DE in the single-objective case, we conduct the comparison and study solely on GDE3.

Recently, a comparison between eight modern multi-objective algorithms was made (Durillo et al., 2010). This work presented evidence that GDE3 is currently one of the most suitable MOEAs. The results showed that GDE3 tends not only to be faster, but also to scale better with the number of decision variables. In the tests made, there was only one other algorithm, based on the PSO approach, with similar performance.

4.1  General Differential Evolution 3

GDE3 has the same basic loop as DE, with a modification in the selection phase and the addition of a pruning stage. In the selection phase, the algorithm considers Pareto dominance and the constraints. Let s and t correspond to a solution and its respective trial solution. Then, in the selection phase, the following statements apply:

  1. If both s and t are infeasible, the trial solution t substitutes s only when it dominates s in the unconstrained space.

  2. If one solution is feasible and the other is not feasible, the feasible solution is chosen.

  3. Finally, if both solutions are feasible, the solution that dominates the other is kept. However, if neither one dominates the other, both solutions are added to the next population, increasing the size of the population.

As a consequence of the modifications in the selection stage, a pruning stage was added, because the GDE3 selection phase described above can cause the population to grow. The pruning stage consists of sorting the population by a diversity measure and selecting the first individuals until the next population is filled.

In its first version, GDE3 used the crowding distance as its diversity measure (Kukkonen and Lampinen, 2005), similarly to NSGA-II (Deb et al., 2002). In its most recent version, however, a k-nearest-neighbors measure is used as the distance measure. This metric was shown to be more consistent than the crowding distance when the number of objectives is greater than two (Kukkonen and Lampinen, 2007). The experiments conducted in this article use GDE3 with the k-nearest-neighbors measure.
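A minimal sketch of this selection-plus-pruning scheme for the unconstrained case follows; the full GDE3 also applies constraint handling and nondominated sorting before pruning, both omitted here, and all function names are ours:

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return np.all(a <= b) and np.any(a < b)

def gde3_select(pop_objs, trial_objs):
    """Unconstrained GDE3 selection over objective vectors: the next
    population may grow when a solution and its trial are incomparable."""
    nxt = []
    for s, t in zip(pop_objs, trial_objs):
        if dominates(t, s):
            nxt.append(t)            # trial replaces the solution it dominates
        elif dominates(s, t):
            nxt.append(s)
        else:
            nxt.extend([s, t])       # incomparable: keep both
    return nxt

def knn_prune(objs, target_size, k=3):
    """Pruning stage: iteratively drop the most crowded point, i.e., the one
    with the smallest summed distance to its k nearest neighbors."""
    objs = list(objs)
    while len(objs) > target_size:
        pts = np.array(objs)
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        knn = np.sort(dists, axis=1)[:, :k].sum(axis=1)
        objs.pop(int(np.argmin(knn)))
    return objs
```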

5  Novelty Search

In nature, evolution is usually observed as an open-ended process that continually creates individuals with greater complexity and diversity (Maley, 1999). Novelty search is a method developed by Lehman and Stanley that mimics the open-ended evolutionary process with a simple novelty metric (Lehman and Stanley, 2008, 2011), rewarding novel individuals with a direct measure of novelty.

Moreover, in the perspective of optimization, problems are sometimes deceptive. This is usually the case for real-world problems, because when problems increase in size and complexity, it is improbable that a fitness function exists that can drive the algorithm directly to the goal. Novelty search aids the optimization in these deceptive spaces by identifying stepping stones, which are the novel individuals found by the novelty metric.

Recently, novelty search has been used in very distinct areas such as neuroevolution (Lehman and Stanley, 2011; Mouret and Doncieux, 2009), genetic programming (Lehman and Stanley, 2010), multi-objective evolution (Mouret and Doncieux, 2009), and robotics (Doncieux and Mouret, 2010; Doncieux et al., 2009). Moreover, there is an ever-increasing number of articles with further evidence of the benefits of novelty search in deceptive problems. Some articles even reported the astonishing finding that novelty search can sometimes be used as a substitute for objective-based search (Lehman and Stanley, 2011; Woolley and Stanley, 2011). The good results of novelty search relative to objective-based search revealed that objective-based search may have deleterious effects on search.

5.1  Novelty Metric

For measuring the novelty of a solution, novelty search relies on a metric that can be any equation capable of describing how novel an individual is in comparison with the past individuals of the archive. The usual metric is the k-nearest-neighbors one, which was also employed by Lehman and Stanley in their pioneering work on novelty search (Lehman and Stanley, 2008). The following equation defines it exactly:

$$\rho(x) = \frac{1}{k} \sum_{i=1}^{k} dist(x, \mu_i) \tag{3}$$

where $k$ is a parameter defined arbitrarily, and $\mu_i$ is the $i$th nearest neighbor of $x$ according to the distance measure $dist(\cdot,\cdot)$. The distance measure is problem dependent. Usually, it is calculated in the behavior space rather than the fitness space, where the behavior space is the small set of features which identifies a unique behavior (reducing the search space and thus differing from exhaustive enumeration). The archive is an incremental set of individuals, receiving new individuals only if they surpass a novelty threshold $n_{min}$ adjusted automatically by some rule.

Although it often goes unnoticed, one of the problems of this novelty metric lies in its dynamic adjustment, that is, in the parameters used to update the archive. The following dynamics are commonly used to update the metric:

  1. If more than $n_a$ individuals entered the archive, multiply $n_{min}$ by $n_{inc}$;

  2. If $n_r$ individuals did not enter the archive, multiply $n_{min}$ by $n_{dec}$;

where $n_a$ and $n_r$ are positive integers (referring to numbers of individuals), and $n_{inc} > 1$ and $0 < n_{dec} < 1$ are multiplicative factors (referring to values of the novelty metric). These parameters define the rate at which individuals enter the archive. It follows that the bigger the archive, the more sensitive the novelty metric is in identifying new individuals, because the higher the number of points, the less separated the points will be from each other. Therefore, a bigger archive is the direct result of a smaller $n_{min}$ and consequently of a more sensitive search with fewer chances of letting new individuals go unnoticed. On the other hand, a bigger archive makes the metric evaluation slower.
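A minimal sketch of the metric and of these archive dynamics follows; $n_{inc} = 1.1$, $n_{dec} = 0.999$, and $n_r = 50{,}000$ mirror the values later reported in Table 4, while the values of k and $n_a$ are illustrative placeholders, since they are not reproduced here:

```python
import numpy as np

def novelty(x, archive, k=15):
    """Mean distance of x to its k nearest neighbors in the archive (Equation 3)."""
    if len(archive) == 0:
        return float("inf")                      # an empty archive makes everything novel
    dists = np.sort(np.linalg.norm(np.asarray(archive) - x, axis=1))
    return dists[:k].mean()

class NoveltyArchive:
    """Unlimited archive with the threshold dynamics of Section 5.1."""
    def __init__(self, n_min, n_inc=1.1, n_dec=0.999, n_a=4, n_r=50_000):
        # n_a = 4 is a placeholder; its value is not shown in Table 4
        self.items, self.n_min = [], n_min
        self.n_inc, self.n_dec, self.n_a, self.n_r = n_inc, n_dec, n_a, n_r
        self.added, self.rejected = 0, 0

    def offer(self, x, k=15):
        """Insert x when its novelty exceeds n_min, then update the threshold."""
        if novelty(x, self.items, k) > self.n_min:
            self.items.append(np.asarray(x))
            self.added, self.rejected = self.added + 1, 0
            if self.added > self.n_a:            # too many entries: raise the threshold
                self.n_min *= self.n_inc
                self.added = 0
        else:
            self.rejected += 1
            if self.rejected >= self.n_r:        # long drought: lower the threshold
                self.n_min *= self.n_dec
                self.rejected = 0
```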

5.2  Multi-Objective Novelty Algorithm

In this section, we propose MONA, the first multi-objective algorithm to rely solely on novelty search. This algorithm follows the same line as the Lehman and Stanley (2011) study, hypothesizing that an algorithm based on novelty alone might be better than objective-based methods. MONA is a very simple algorithm in which the space of all objectives is taken to be the behavior space of the novelty search, which differentiates the method from Mouret’s approach (Mouret and Doncieux, 2009), where novelty was treated as an additional objective. Table 2 describes the algorithm.

Table 2:
Multi-Objective Novelty Algorithm.
1. Initialize population with random samples uniformly distributed over the search space 
2. Repeat for each individual in the population until a criterion of convergence is met 
 (a)   Apply the same mutation and crossover operators as used by DE 
 (b)   Calculate the novelty metric 
 (c)   Verify whether its novelty metric is above the nmin threshold; if it is, then insert it into the archive (unlimited in size) 
 (d)   Update nmin (see Section 5.1) 
 (e)   Create a new population by sampling uniformly with replacement from the archive 
3. Return the archive’s non-dominated solutions as the solution set 

MONA is intended to be a very simple algorithm, which will be both compared against and used within the general subpopulation framework, showing that efficient and robust algorithms can be constructed from very simple bases.
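Putting Table 2 together with the novelty machinery of Section 5.1, one MONA generation might look as follows. This is a sketch reusing the novelty function defined earlier; de_variation reimplements the DE operators of Section 3 with the MONA parameter values of Table 4, the value of k is illustrative, and the threshold update is left out for brevity:

```python
import numpy as np

def de_variation(i, pop, rng, F=0.1, CR=0.1):
    """DE rand/1/bin variation of individual i (Table 4 values for MONA)."""
    n, d = len(pop), len(pop[0])
    r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
    v = pop[r1] + F * (pop[r2] - pop[r3])
    mask = rng.random(d) <= CR
    mask[rng.integers(d)] = True
    return np.where(mask, v, pop[i])

def mona_generation(pop, evaluate, archive, n_min, rng, k=15):
    """One MONA generation (Table 2). The behavior space is the objective
    space, so novelty is measured between objective vectors; `archive` holds
    (solution, objective_vector) pairs and is unlimited in size."""
    behaviors = [b for _, b in archive]
    for i in range(len(pop)):
        u = de_variation(i, pop, rng)            # step (a): DE mutation and crossover
        b = evaluate(u)                          # step (b): novelty over objectives
        if novelty(b, behaviors, k) > n_min:     # step (c): archive if novel enough
            archive.append((u, b))
            behaviors.append(b)
        # step (d): update n_min with the dynamics of Section 5.1 (omitted here)
    idx = rng.integers(len(archive), size=len(pop))  # step (e): resample from archive
    return [archive[i][0] for i in idx]
```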

6  General Subpopulation Framework

The general subpopulation framework (GSF) is proposed here as the underlying structure of a class of multi-objective algorithms which unifies a number of structured EAs in its formalization. Additionally, it is capable of integrating different optimization algorithms without restriction. This flexible ability of joining algorithms together is important, as will be shown in the experiments, mostly because this type of cooperation between algorithms can sum their benefits while the competition between them inside each subpopulation is decreased to a minimum.

In this context, we define several terms of art.

Definition 1

Subpopulation

A subpopulation is a finite set of individuals related to a group of well-defined dynamics. These dynamics are usually (although not necessarily) composed of interactions of these individuals with either themselves or individuals of other subpopulations, but they are not in any way limited to that.

When connecting these subpopulations together, a new matrix set appears, to which the name $\mathbb{IM}$ is given. It is formally defined as follows.

Definition 2

Subpopulation Interaction Probability Matrix Set ($\mathbb{IM}$)

The subpopulation interaction probability matrix set is a set of matrices of the form:

$$\mathbb{IM} = \{IM_1, IM_2, \ldots, IM_m\} \tag{4}$$

where $m$ is the number of types of interactions used in an optimization algorithm, and each $IM_i$ corresponds to the following matrix:

$$IM_i = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,s} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{s,1} & p_{s,2} & \cdots & p_{s,s} \end{pmatrix} \tag{5}$$

where $s$ is the number of subpopulations and $p_{a,b}$ is the probability of an interaction $i$ occurring in subpopulation $a$ and taking as parameters the individuals of subpopulation $b$ or the subpopulation $b$ itself.

The evolutionary operators are examples of interactions. For example, in the case of a subpopulation-based version of DE’s operators, let us assume their interactions are described by $IM_d$. Then, for each individual of a given subpopulation, the trial vector would be composed of three individuals chosen based on the probabilities in the corresponding row of the $IM_d$ matrix. Recall that the matrix set can be ignored in the case of only one subpopulation, which is why it can be ignored for panmictic algorithms.
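As an illustration, the sketch below shows how a row of an interaction matrix could drive the choice of partners for an interaction; representing $IM_d$ as a NumPy array and the function name are our choices:

```python
import numpy as np

def sample_partners(im_row, subpops, count, rng):
    """Draw `count` individuals for an interaction taking place in one
    subpopulation: first pick a source subpopulation with the probabilities
    of the corresponding IM row, then pick an individual uniformly inside it."""
    chosen = []
    for _ in range(count):
        b = rng.choice(len(subpops), p=im_row)   # source subpopulation index
        chosen.append(subpops[b][rng.integers(len(subpops[b]))])
    return chosen

# Example: an interaction in subpopulation 0 of a 3-subpopulation GSF that
# takes the three DE parents from any subpopulation with equal probability.
rng = np.random.default_rng(0)
subpops = [rng.random((10, 5)) for _ in range(3)]  # three subpopulations of 10 individuals
IM_d = np.full((3, 3), 1.0 / 3.0)                  # uniform interaction matrix
parents = sample_partners(IM_d[0], subpops, 3, rng)
```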

Note also that the interactions can differ from subpopulation to subpopulation. In the case of just one subpopulation $k$ having an interaction $i$, $IM_i$ would be of the following form:

$$IM_i = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ p_{k,1} & \cdots & p_{k,s} \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \tag{6}$$

where only row $k$ contains nonzero probabilities. Naturally, more complicated global dynamics might also be present, such as dynamical probabilities that depend on time $t$:

$$IM_i(t) = \begin{pmatrix} p_{1,1}(t) & \cdots & p_{1,s}(t) \\ \vdots & \ddots & \vdots \\ p_{s,1}(t) & \cdots & p_{s,s}(t) \end{pmatrix} \tag{7}$$

Additionally, the population size variable is extended to a vector version. This is possible because the proposed framework has a number of subpopulations, each with a given size. This vector is hereby called $\vec{n}$ and is defined as follows:

Definition 3

Vector of Subpopulation Sizes ($\vec{n}$)

The subpopulations’ sizes are defined by the vector $\vec{n}$, which corresponds to:

$$\vec{n} = (n_1, n_2, \ldots, n_s) \tag{8}$$

where $n_a$ is the size of subpopulation $a$. The total subpopulation size ($n_t$) is naturally:

$$n_t = \sum_{a=1}^{s} n_a \tag{9}$$

An equivalent and more convenient representation exists which is independent of the total subpopulation size. Let $r_a = n_a / n_t$, corresponding to subpopulation $a$’s ratio of the total population. Then, the following representation is also verified:

$$\vec{r} = (r_1, r_2, \ldots, r_s) \tag{10}$$

With the previous definitions it is possible to explicitly describe the GSF:

Definition 4

General Subpopulation Framework (GSF)

Suppose we have $s$ subpopulations. Then $\mathbb{P} = \{P_1, \ldots, P_s\}$ is the set of subpopulations and $\mathbb{A} = \{A_1, \ldots, A_s\}$ is the set of panmictic algorithms, where a subpopulation $P_i$ is constructed by an algorithm (strategy) $A_i$. Therefore, the GSF is defined as a 4-tuple $(\mathbb{P}, \mathbb{A}, \vec{n}, \mathbb{IM})$, where $\vec{n}$ and $\mathbb{IM}$ were previously defined as the vector of subpopulation sizes and the set of interaction probability matrices, respectively.
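A minimal rendering of this 4-tuple as a data structure might look as follows; the field names, types, and the step method are our own reading of the formal definition:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import numpy as np

@dataclass
class GSF:
    """The 4-tuple of Definition 4: subpopulations, their algorithms, the
    vector of subpopulation sizes, and the interaction matrix set."""
    subpops: List[np.ndarray]      # P: one array of individuals per subpopulation
    algorithms: List[Callable]     # A: A_i advances subpopulation P_i by one step
    sizes: np.ndarray              # vector of subpopulation sizes (Definition 3)
    im: Dict[str, np.ndarray]      # IM: one s-by-s matrix per interaction type

    def step(self):
        """One framework iteration: each algorithm updates its own subpopulation,
        consulting self.im to select cross-subpopulation interaction partners."""
        for i, alg in enumerate(self.algorithms):
            self.subpops[i] = alg(i, self.subpops, self.im)
```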

The subpopulations may be used to join arbitrary algorithms, which need not even be population-based. That is, as long as each algorithm can generate a set of solutions representative of its dynamics to compose the subpopulation, the subpopulation framework can handle the joining process (examples are given in Sections 6.2 and 7). For example, in the case of the random search algorithm, the subpopulation can be constructed from the last generated solutions. Therefore, to the knowledge of the authors, any algorithms can be joined (mixed) by using this framework. Naturally, for the inclusion of an algorithm in this framework it is also relevant, but not necessary, to have:

  1. Dynamics taking into account different individuals of its population (which can be modified to handle any individual of any population by means of the $\mathbb{IM}$ set of matrices).

  2. Dynamics different from those of the other subpopulations present in the framework. This can be relevant, since the more similar the subpopulations are, the less important the subpopulations become; in other words, multiple subpopulations with similar dynamics will produce results similar to a single population.

The following sections demonstrate how the GSF can represent most optimization algorithms. Section 6.1 shows how the GSF can represent various types of structured EAs, while Section 6.2 takes two well-known algorithms (one a panmictic EA and the other a non-evolutionary algorithm) and shows how they can be converted to the GSF approach without losing many of their characteristics. In Section 7, two new algorithms are proposed based on their related panmictic algorithms. This time, however, the objective is not merely illustrative, since the algorithms described possess important features detailed later on. In fact, these features enable them to surpass state-of-the-art algorithms.

6.1  Representation Capabilities

The general subpopulation framework can represent various types of structured EAs, including:

  1. Island-Based Models.  From Tomassini (2005), each panmictic island forms a subpopulation $P_i$, with the set of algorithms containing identical algorithms for all subpopulations. Let the number of panmictic islands be $s$; then $\mathbb{P} = \{P_1, \ldots, P_s\}$ and $\mathbb{A} = \{A_1, \ldots, A_s\}$ with $A_1 = A_2 = \cdots = A_s$. Between the subpopulations, an interaction defined by the exchange of genetic information can be formalized with an $IM_1$ matrix of the form:
    $$IM_1 = \begin{pmatrix} 0 & \frac{1}{s-1} & \cdots & \frac{1}{s-1} \\ \frac{1}{s-1} & 0 & \cdots & \frac{1}{s-1} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{s-1} & \frac{1}{s-1} & \cdots & 0 \end{pmatrix}$$
    That is, each individual selected for exchange must necessarily go to another subpopulation, therefore the diagonal entries are zero (a code sketch constructing this matrix, together with the cellular one below, is given after this list). This dynamic is usually the only one in which other subpopulations can participate. Inside the algorithms, other dynamics can take place (e.g., crossover), and these would also have a trivial set of $IM_i$s with the only non-null probabilities residing on the diagonal (i.e., the interactions happen only inside the same subpopulation).
  2. Cellular Algorithms.  From Manderick and Spiessens (1989), this type of algorithm can be thought of as the opposite of island-based models, in that the number of subpopulations is maximized with the minimum possible subpopulation size; that is, cellular algorithms can be seen as a large number of subpopulations $P_i$ of equal size 1. Let the number of cells in a given cellular algorithm be $s$; then $\mathbb{P} = \{P_1, \ldots, P_s\}$ and $\mathbb{A} = \{A_1, \ldots, A_s\}$, with each individual cell corresponding to a subpopulation $P_i$, and the update of each cell can be divided into $s$ algorithms forming the set of panmictic algorithms. Consider the case of a cellular algorithm with nine individuals and a von Neumann neighborhood; then it possesses nine subpopulations and an $IM_c$ matrix whose entries are defined by:
    $$p_{a,b} = \begin{cases} \frac{1}{|N(a)|} & \text{if } b \in N(a) \\ 0 & \text{otherwise} \end{cases}$$
    where $N(a)$ is the von Neumann neighborhood of cell $a$. Moreover, all interactions of cellular algorithms use the same neighborhood, therefore the set of matrices is given by:
    $$\mathbb{IM} = \{IM_1, \ldots, IM_m\} \quad \text{with} \quad IM_1 = \cdots = IM_m = IM_c$$
    In certain cellular algorithms, a dynamical $IM_c$ has to be used to represent the change of neighborhood of each cell.
  3. Restricted Mating.  From Zitzler and Thiele (1999), some procedures, although not related to subpopulations at first glance, can be converted to this formalization. Restricted mating, for example, can be formalized with subpopulations. By considering each subpopulation as containing only one individual, the restricted mating interaction is defined by entries:
    $$p_{a,b} = \begin{cases} c > 0 & \text{if } dist(a,b) < \sigma \\ 0 & \text{otherwise} \end{cases}$$
    where $\sigma$ is an arbitrary threshold and $dist(a, b)$ is usually the Euclidean distance between solutions $a$ and $b$ (Zitzler and Thiele, 1999).
  4. Spatial Predator-Prey MOEA.  From Laumanns et al. (1998), this algorithm defines an adjacency matrix $G$ with solutions on its vertices, over which each predator makes a random walk. This algorithm can be reformulated into the subpopulation framework by considering as the interaction the replacement of the prey selected by the predators. Although the replacement can be done in multiple ways, only the vertices in the predator’s neighborhood participate. Therefore, for the replacement interaction, each position of the interaction matrix becomes:
    $$p_{a,b} = \begin{cases} c > 0 & \text{if } a, b \in N(k) \\ 0 & \text{otherwise} \end{cases}$$
    where $k$ is the vertex of the predator responsible for this interaction matrix and $N(k)$ is its neighborhood in $G$. Basically, two solutions can only interact if they are in the neighborhood of $k$ (the predator’s vertex).
  5. Multi-Colony Ant Algorithms.  Ant colony optimization algorithms in general are difficult to map into the subpopulation framework because they use population models instead of the solutions themselves. A similar problem is faced when trying to convert estimation of distribution algorithms (Larranaga and Lozano, 2002; Pelikan, 2005). Additionally, some of these methods do not possess a population structure. For example, ant colony optimization algorithms with one colony do not use a structured approach to optimization according to the definition above; that is, although the construction of solutions by the ants uses solution components organized in a structured way, the population of solutions itself is not structurally formulated (Iredi et al., 2001). However, some of them, such as the multi-colony ant algorithms, do have a population structure. In this case, it is possible to roughly approximate each population model (e.g., each pheromone matrix) as a subpopulation and consider the interrelations between them as interactions with their respective interaction matrices. That is, when the update is only realized at the original colony, the update interaction of pheromone matrices can be represented as:
    $$p_{a,b} = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}$$
    And when the update is done by region ($R_b$) in the nondominated front, for a given solution $a$ we have:
    $$p_{a,b} = \begin{cases} 1 & \text{if solution } a \text{ lies in region } R_b \\ 0 & \text{otherwise} \end{cases}$$
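Below is a sketch constructing the island-model exchange matrix and the cellular neighborhood matrix discussed in items 1 and 2; the uniform off-diagonal migration probabilities and the toroidal grid (so that every cell has exactly four von Neumann neighbors) are our assumptions:

```python
import numpy as np

def island_exchange_im(s):
    """Migration matrix of the island model: zero diagonal, uniform elsewhere."""
    im = np.full((s, s), 1.0 / (s - 1))
    np.fill_diagonal(im, 0.0)
    return im

def cellular_im(rows, cols):
    """Neighborhood matrix of a cellular algorithm on a toroidal grid with a
    von Neumann neighborhood: each cell interacts only with its four neighbors."""
    s = rows * cols
    im = np.zeros((s, s))
    for r in range(rows):
        for c in range(cols):
            a = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                b = ((r + dr) % rows) * cols + (c + dc) % cols
                im[a, b] = 0.25
    return im

# The nine-cell example of the text corresponds to cellular_im(3, 3).
```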

6.2  Examples of Panmictic to GSF Conversion

This section shows how optimization algorithms of almost any type can be converted to multi-population versions represented by the GSF. Examples of both the simple genetic algorithm (Goldberg, 1989) and simulated annealing (Kirkpatrick et al., 1983) are presented. Their matrix sets will be defined and, among other things, it will be shown how their dynamics could be used to affect other subpopulations. Note that the conversions will not make explicit the vector of subpopulation sizes $\vec{n}$, since this parameter is not related to the representation and thus can be established independently.

6.2.1  Simple Genetic Algorithm

There are three basic procedures in a simple genetic algorithm: crossover, mutation, and selection. However, mutation does not depend on other individuals, and selection is executed over a set of individuals of its own population; in these cases, it does not make sense to define an interaction matrix. Mutation and selection can be applied as usual, with the only difference from the single-population version being that the target is now the current subpopulation (i.e., not the entire population). In fact, this slight modification defines the algorithm $A_i$ which constructs its respective subpopulation $P_i$ under the GSF formulation.

Thus, let us define the $\mathbb{IM}$ set, which consists of only the crossover interaction ($\mathbb{IM} = \{IM_1\}$). The crossover interaction matrix $IM_1$ defines the probability that an individual of a given subpopulation will participate in the crossover. The exact value of $IM_1$ is the trivial $1 \times 1$ matrix $(1)$. Note that the simple GA is not a structured algorithm (because there are no other subpopulations to interact with). However, the designer might want to modify $IM_1$ when joining this algorithm with other algorithms.

6.2.2  Simulated Annealing

One of the main difficulties that can be spotted in simulated annealing is that it is not a population-based algorithm. This problem can be circumvented by adding the recent modifications of the variables’ values to a “first in–first out” data structure, creating a subpopulation derived from its dynamics. Therefore, the simulated annealing algorithm plus the creation of a subpopulation defines the algorithm $A_i$ to be applied on its created subpopulation $P_i$.

Lastly, the interaction matrices are defined by an empty set ($\mathbb{IM} = \emptyset$), since there is no interaction between solutions in the dynamics of simulated annealing. An empty $\mathbb{IM}$ might be unappealing at first glance, but when joined with the subpopulations of other algorithms, the subpopulation constructed by this algorithm might be used by other interactions and consequently influence the global dynamics.
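A minimal sketch of this conversion follows: a standard simulated annealing loop whose last accepted solutions are exposed as a FIFO subpopulation; the Gaussian neighborhood move, the cooling schedule, and all names are illustrative choices:

```python
import math
import random
from collections import deque

def annealing_as_subpopulation(objective, x0, pop_size=20, temp=1.0,
                               cooling=0.995, steps=10_000, step_size=0.1):
    """Simulated annealing wrapped for the GSF: the last accepted solutions
    are kept in a first in-first out buffer, which plays the role of the
    subpopulation P_i constructed by this algorithm."""
    x, fx = list(x0), objective(x0)
    subpop = deque([list(x0)], maxlen=pop_size)       # FIFO subpopulation
    for _ in range(steps):
        y = [xi + random.gauss(0.0, step_size) for xi in x]
        fy = objective(y)
        # Metropolis acceptance: always accept improvements, sometimes accept worse moves
        if fy < fx or random.random() < math.exp((fx - fy) / max(temp, 1e-12)):
            x, fx = y, fy
            subpop.append(list(y))                    # accepted moves refresh the subpopulation
        temp *= cooling
    return x, fx, list(subpop)
```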

6.3  What Is the Benefit of Using GSF to Describe a Panmictic Algorithm?

It was shown above that panmictic algorithms can be converted to the GSF. However, they possess a trivial $\mathbb{IM}$, and the conversion by itself brings little explanation. Thus, one might question the usefulness of such a conversion.

The answer is that, once converted to the GSF, any panmictic algorithms can be integrated seamlessly as a subpopulation in other GSF-based algorithms. Section 7 will show some examples of algorithms constructed using the GSF.

Last but not least, the pressures of different panmictic algorithms can be compared by weighting their subpopulations’ sizes. The comparison of algorithms is an important and complicated subject which is aided by the GSF. The GSF also enables a relatively easy evaluation of the cooperation between algorithms, facilitating the construction of hybrid algorithms by the simple addition or deletion of subpopulations.

6.4  What Is the Benefit of Using GSF?

One feature of the subpopulation framework is the division of interactions over interaction matrices. Thus, one can isolate only the interactions of interest and compare their structural behavior by looking at those matrices. For example, it is possible to see that the spatial predator-prey and cellular algorithms are similar in the sense that both use similar (neighborhood-based) interaction matrices.

Moreover, designing structured algorithms may become easier by looking at different interactions and interaction matrices instead of multiple structures and their internal behavior. The framework also aids other abstractions such as a mix between structures (i.e., sometimes the structure behaves like a cellular algorithm, and sometimes like the island model) by the simple inclusion of other interaction matrices. One example would be the inclusion of a cellular interaction matrix in an island model algorithm.

7  Evaluation of General Subpopulation Algorithms (GSAs)

To evaluate the subpopulation framework appropriately, we elaborate two subpopulation algorithms: one based on GDE3 (see Section 4.1) and the other based on MONA (see Section 5.2). We will hereby call these GSAs the “subpopulation algorithm based on general differential evolution” (SAGDE) and the “subpopulation algorithm based on novelty” (SAN).

Both SAGDE and SAN are motivated by the fact that single-objective DEs, each evolved on one objective, usually achieve good results. Take, for example, the WFG1 problem (Huband et al., 2006). If we apply a GSA made uniquely of subpopulations of single-objective DEs, each evolving a different objective, we usually achieve the result plotted in Figure 1. Note that the DEs achieve good results on each single objective, with the resulting individuals very close to the Pareto front, but the front is hardly covered. Then, what if another subpopulation were added to this algorithm, one which might wisely “mix” these DE solutions? The following algorithms are motivated by this question, and in Section 9 an extensive answer is given based on the experiments.

Figure 1:

Solutions of single-objective DEs, each one evolving a different objective. For the test, the WFG1 problem is used with two objectives, distance parameters, and four position parameters. Each DE had a population of individuals, with , , and the maximum number of generations set to .


7.1  SAGDE

In a problem with $n$ objectives, SAGDE has $n+1$ subpopulations $P_1, \ldots, P_{n+1}$, where $P_1, \ldots, P_n$ are single-objective DEs, each evolving a different objective, and GDE3 (a multi-objective algorithm) is used as the algorithm for subpopulation $P_{n+1}$. The GDE3 subpopulation as well as the $n$ single-objective differential evolution subpopulations behave in the same way as usual, aside from the fact that a uniform matrix $IM_1$, shown in Equation (20), is used to determine which individuals will be part of the trial vector in the differential operator:

$$IM_1 = \begin{pmatrix} \frac{1}{n+1} & \cdots & \frac{1}{n+1} \\ \vdots & \ddots & \vdots \\ \frac{1}{n+1} & \cdots & \frac{1}{n+1} \end{pmatrix} \tag{20}$$

7.2  SAN

In the same way as SAGDE, SAN has $n+1$ subpopulations, with $n$ of them made of single-objective DEs ($P_1, \ldots, P_n$), where $n$ is the number of objectives of the problem. Each single-objective DE optimizes a different objective, and there is an additional subpopulation $P_{n+1}$ corresponding to MONA (i.e., the multi-objective algorithm based on the novelty search approach proposed in this article; see Section 5.2).

Both MONA and the $n$ single-objective DE subpopulations behave in the same way as usual, with the only differences being the use of the same uniform matrix $IM_1$ described in Equation (20) (i.e., a chosen individual has a uniform probability of $\frac{1}{n+1}$ of coming from any subpopulation) to select individuals for the trial vector in the DE operator used in both algorithms. Moreover, MONA verifies any new individual generated by any subpopulation for inclusion in the novelty archive (i.e., not only the individuals it generates itself). In other words, the inclusion of solutions in the novelty archive is a different interaction, defined by $IM_2$, which is activated every time a new solution is created in any subpopulation. The $IM_2$ matrix is defined below:

$$IM_2 = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix} \tag{21}$$

where the last column refers to MONA’s subpopulation.
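For reference, the two matrices of Equations (20) and (21) can be built in a few lines (a sketch; the function name is ours):

```python
import numpy as np

def san_matrices(n):
    """IM1 and IM2 of SAN for an n-objective problem (Equations 20 and 21):
    n single-objective DE subpopulations plus MONA as subpopulation n + 1."""
    s = n + 1
    im1 = np.full((s, s), 1.0 / s)  # DE trial-vector parents: uniform over all subpopulations
    im2 = np.zeros((s, s))
    im2[:, -1] = 1.0                # every new solution is offered to MONA's archive
    return im1, im2
```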

8  Comparison Methodology

To compare algorithms, the following procedure is used:

  1. Perform multiple runs of the algorithm and store the solution sets.

  2. For each solution set:

     (a) Compute the hypervolume indicator (Section 8.1.1);

     (b) Compute the ε indicator (Section 8.1.2);

     (c) Store each quality indicator result in a separate vector.

  3. Algorithms are compared in three ways:

     (a) A group of algorithms is compared using their respective quality indicators’ mean values and standard deviations. Algorithms with a mean value inside the standard deviation of the best mean value are considered equally good.

     (b) The statistical significance between a pair of algorithms is verified with a non-parametric Mann-Whitney test (Hollander and Wolfe, 1999). The alternative hypothesis that one method has a better (smaller) quality indicator than the other is accepted if the p-value is below the significance level (a code sketch of this test is given after this list).

     (c) The 50% attainment surface (Section 8.2) is calculated based on the solution sets.
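As an illustration of step 3(b), the test can be run with SciPy as sketched below; the significance level shown is a placeholder, since the value used in the paper is not reproduced here:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def compare_indicators(ind_a, ind_b, alpha=0.05):
    """One-sided Mann-Whitney test of step 3(b): is algorithm A's quality
    indicator statistically smaller (better) than algorithm B's?
    alpha = 0.05 is a placeholder significance level."""
    stat, p = mannwhitneyu(ind_a, ind_b, alternative="less")
    return p, p < alpha

# Example with two vectors of per-run indicator values (synthetic data):
rng = np.random.default_rng(1)
p, better = compare_indicators(rng.normal(0.10, 0.02, 30), rng.normal(0.15, 0.02, 30))
```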

8.1  Quality Indicators

In this article, to compare the quality of the algorithms, the hypervolume indicator (Beume et al., 2009; Zitzler and Thiele, 1998) and the ε indicator (Zitzler et al., 2003) are used. These unary quality indicators were recommended by Fonseca et al. (2005), since they are based on different preference information. The following sections define these quality indicators.

8.1.1  Hypervolume Indicator

The hypervolume indicator ($I_h$) is defined as the difference between the hypervolume of the Pareto front and the hypervolume of the non-dominated solution set in objective space (Beume et al., 2009; Zitzler and Thiele, 1998). This indicator requires a reference point for its calculation; therefore, the nadir point is used in this article.
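A sketch of the computation for the bi-objective case follows (minimization; names are ours). It computes the area dominated by a front with respect to a reference point, so the indicator $I_h$ is the Pareto front’s value minus the approximation set’s value:

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Area dominated by a bi-objective non-dominated front with respect to
    a reference point such as the nadir point (minimization)."""
    pts = sorted(tuple(p) for p in front)      # ascending in the first objective
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        area += (ref[0] - f1) * (prev_f2 - f2) # rectangle added by this point
        prev_f2 = f2
    return area
```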

8.1.2  ε Indicator

The ε indicator ($I_\epsilon$) is defined as the minimum factor by which a non-dominated approximation set (i.e., a set of objective vectors which do not dominate each other) is worse than the Pareto optimal front. Let $a$ and $p$ be vectors in $Z$ (the objective space) with $a = (a_1, \ldots, a_d)$ and $p = (p_1, \ldots, p_d)$, where $d$ is the number of objectives; then the ε-dominance between two vectors is defined by Equation (22):

$$a \succeq_\epsilon p \iff \forall i \in \{1, \ldots, d\}: a_i \leq \epsilon \cdot p_i \tag{22}$$

Then, according to Zitzler et al. (2003), the ε indicator is formally defined in Equation (23):

$$I_\epsilon(T, O) = \inf\{\epsilon \in \mathbb{R} \mid \forall p \in O \;\, \exists a \in T : a \succeq_\epsilon p\} \tag{23}$$

where $T$ is the target approximation set and $O$ is the Pareto optimal set. In this article, $O$ refers to a reference set which approximates the Pareto optimal set.
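A direct rendering of Equation (23) for finite sets follows (a sketch assuming minimization and strictly positive objective values; the function name is ours):

```python
import numpy as np

def eps_indicator(target, reference):
    """Multiplicative epsilon indicator: the smallest factor by which every
    reference point is epsilon-dominated by some target point."""
    T, O = np.asarray(target, float), np.asarray(reference, float)
    # for each reference point p, the best target point a minimizes max_i a_i / p_i
    return max(min(np.max(a / p) for a in T) for p in O)
```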

As shown in Okabe et al. (2003), quality indicators may be misleading. Therefore, when visually possible, attainment surfaces were also computed for the comparison.

8.2  Attainment Surfaces

An attainment surface (AS) is the boundary in objective space of the area dominated by a single run of an algorithm. Such surfaces are important because they show detailed information about the performance differences between algorithms. To infer a statistically significant attainment surface, multiple runs of the algorithms are required and an approximate mean result is calculated. Usually, the 50% attainment surface is used as a mean-measure approximation, defined as the area dominated by at least 50% of the approximation sets (Fonseca and Fleming, 1996; Grunert da Fonseca et al., 2001). In this article, the code provided by López-Ibánez et al. (2010) is used to obtain the 50% attainment surfaces.

9  Experiments

Some of the usual multi-objective benchmark problems poorly represent important classes such as non-separable and multimodal problems. Therefore, this article makes use of a relatively recent set of tests called the WFG Toolkit (Huband et al., 2006). The WFG set of problems presents a varied set of properties which can test the scalability of algorithms in both the number of parameters and the number of objectives. Table 3 summarizes the characteristics of its test problems. The WFG Toolkit makes use of position and distance parameters. On one hand, when a distance parameter is modified, the new solution may dominate, be dominated by, or be equivalent to the previous one. On the other hand, when a position parameter is modified, the new solution is either incomparable or equivalent to the previous one. Tests were performed for the WFG problems with four position parameters and a fixed number of distance parameters.

Table 3:
Properties of the WFG test problems.

Problem     Obj.  Separable  Modality           Bias                 Geometry
WFG1              Yes        Uni                Polynomial, flat     Convex, mixed
WFG2              No         Uni                —                    Convex, disconnected
WFG2 (fM)         No         Multi              —
WFG3              No         Uni                —                    Linear, degenerate
WFG4              Yes        Multi              —                    Concave
WFG5              Yes        Deceptive          —                    Concave
WFG6              No         Uni                —                    Concave
WFG7              Yes        Uni                Parameter dependent  Concave
WFG8              No         Uni                Parameter dependent  Concave
WFG9              No         Multi, deceptive   Parameter dependent  Concave

9.1  Results and Discussion

Each empirical attainment surface and quality indicator was calculated based on solution sets obtained from multiple independent runs of the algorithm in question, with different seeds used for each run. Both the maximum number of generations and the total subpopulation size (or population size in the case of panmictic algorithms) were fixed to the same values for all algorithms, ensuring that all algorithms have the same number of evaluations.

9.2  Choice of Parameters

Table 4 shows the parameters used for GDE3. They correspond to the ones used by Kukkonen and Lampinen (2007). The reader may observe that, when compared with the usual single-objective DE settings, the parameters of all algorithms possess lower values of CR and F. This is because multi-objective optimization maintains a high diversity; therefore, it is not necessary to have a higher value of F or CR for better exploration of the search space, because individuals are different enough and the trial vectors are also suitably different. Tests with even smaller values of F were shown to improve the coverage, but with great impact on the distance to the optimal Pareto front (OPF). The gain in coverage was not enough to surpass SAN’s coverage, and the distance to the front became poor enough that GDE3 was surpassed by SAN in all problems tested (even on some problems for which it had performed similarly to SAN with the original value of F).

Table 4:
Parameters used for the different algorithms. The first two ratios of the vector $\vec{r}$ correspond to the subpopulations of DEs used, and the third ratio is either MONA (for SAN) or GDE3 (for SAGDE).

Parameter      Value
GDE3
  CR           0.1
  F            0.5
MONA
  CR           0.1
  F            0.1
  ninc         1.1
  ndec         0.999
  na
  nr           50,000
SAGDE
  CR           0.1
  F            0.1
  IM           Uniform
  $\vec{r}$
SAN
  CR           0.1
  F            0.1
  IM           Uniform
  $\vec{r}$
  ninc         1.1
  ndec         0.999
  na
  nr           50,000

In the case of GSA algorithms, F should logically take an even lower value. This is justified by the fact that a GSA’s subpopulations are usually very different from each other. We conducted preliminary tests with an even smaller F, and many results were the same as the ones obtained with the settings of Table 4, although some problems showed, as expected, slightly worse results. For MONA and SAN, the novelty parameters were decided based on a quality-efficiency trade-off, with both algorithms having the same fixed parameters.

The chosen subpopulation sizes of SAN and SAGDE are directly related to each subpopulation algorithm’s strength in “mixing” the solutions of the single-objective DEs’ subpopulations. Some subpopulations mix the solutions better than others (which is directly related to the coverage of the OPF), requiring a smaller subpopulation size (the MONA subpopulation), while others require a bigger subpopulation size to get a similar coverage (GDE3). This happens especially because GDE3 has various strategies, and coverage is just one of them. Recall that in SAN and SAGDE there are two single-objective DEs. These algorithms explore the problems as shown in Figure 1 and discussed in Section 7. Therefore, mixing the solutions is necessary for coverage, and this is only achieved by the other subpopulations (the GDE3 and MONA subpopulations for SAGDE and SAN, respectively).

9.3  Study on Bi-objective Optimization

Tests were performed for the WFG problems with two objectives. Parameters used by the algorithms are fixed and summarized in Table 4.

The comparison between the 50% attainment surfaces of SAN, SAGDE, GDE3, and MONA is shown in Figures 2 and 3. Before discussing the results, it is necessary to show Tables 5 and 6, with the mean and standard deviation (SD) of the ε and hypervolume quality indicators, as well as Tables 7 and 8, with the statistical significance of both quality indicators. Most of the time, the tables and figures agree with each other. Therefore, when not stated otherwise, the discussion concerns the overall behavior of all three comparisons (attainment surfaces, mean/SD, and statistical hypothesis testing). For more information on the construction of these tables and figures, please refer to Section 8 or to the tables and figures themselves.

Figure 2:

Attainment surfaces at a level of 50% for the WFG Toolkit problems (minimization problems). Calculated for independent runs.


Figure 3:

Attainment surfaces at a level of 50% for the WFG Toolkit problems (minimization problems). Calculated for independent runs.


Table 5:
ε indicator’s mean and standard deviation for SAN, SAGDE, GDE3, and MONA. For each problem, the best mean value, as well as any other mean value inside the standard deviation of the best, is marked in bold.

Problem    SAN Mean (SD)    SAGDE Mean (SD)    GDE3 Mean (SD)    MONA Mean (SD)
WFG1
WFG2
WFG3
WFG4
WFG5
WFG6
WFG7
WFG8
WFG9
Table 6:
Hypervolume indicator’s mean and standard deviation for SAN, SAGDE, GDE3, and MONA. For each problem, the best mean value, as well as any other mean value inside the standard deviation of the best, is marked in bold.

Problem    SAN Mean (SD)    SAGDE Mean (SD)    GDE3 Mean (SD)    MONA Mean (SD)
WFG1
WFG2
WFG3
WFG4
WFG5
WFG6
WFG7
WFG8
WFG9
Table 7:
Comparison of p-values between the SAN, SAGDE, GDE3, and MONA algorithms with the Mann-Whitney significance test using the ε indicator. Results are marked in bold when the null hypothesis is rejected at the adopted significance level. The alternative hypothesis is that the algorithm in the row is statistically better (smaller quality indicator) than the algorithm in the column.

Algorithm  Problem    SAN    SAGDE    GDE3    MONA
SAN WFG1     
 WFG2     
 WFG3     
 WFG4     
 WFG5     
 WFG6     
 WFG7     
 WFG8     
 WFG9     
SAGDE WFG1     
 WFG2     
 WFG3     
 WFG4     
 WFG5     
 WFG6     
 WFG7     
 WFG8     
 WFG9     
GDE3 WFG1   
 WFG2     
 WFG3    
 WFG4     
 WFG5    
 WFG6    
 WFG7   
 WFG8     
 WFG9     
MONA WFG1  
 WFG2    
 WFG3    
 WFG4    
 WFG5    
 WFG6    
 WFG7    
 WFG8    
 WFG9    
Table 8:
Comparison of p-values between the SAN, SAGDE, GDE3, and MONA algorithms with the Mann-Whitney significance test using the hypervolume indicator. Results are marked in bold when the null hypothesis is rejected at the adopted significance level. The alternative hypothesis is that the algorithm in the row is statistically better (smaller quality indicator) than the algorithm in the column.

Algorithm  Problem    SAN    SAGDE    GDE3    MONA
SAN WFG1     
 WFG2     
 WFG3     
 WFG4     
 WFG5     
 WFG6     
 WFG7     
 WFG8     
 WFG9     
SAGDE WFG1     
 WFG2     
 WFG3     
 WFG4     
 WFG5    
 WFG6     
 WFG7     
 WFG8     
 WFG9     
GDE3 WFG1   
 WFG2     
 WFG3     
 WFG4     
 WFG5     
 WFG6     
 WFG7     
 WFG8     
 WFG9     
MONA WFG1 
 WFG2   
 WFG3  
 WFG4  
 WFG5  
 WFG6   
 WFG7  
 WFG8   
 WFG9  