Abstract

Adaptive population sizing aims at improving the overall progress of an evolution strategy. At each generation, it determines the parental population size that promises the largest fitness gain, based on information collected during the evolutionary process. In this paper, we develop an adaptive variant of an evolution strategy. Based on considerations on the sphere, we derive two approaches for adaptive population sizing. We then test these approaches empirically on the sphere model using a normalized mutation strength and cumulative mutation strength adaption. Finally, we compare the methodology on more general functions with a fixed-population covariance matrix adaption evolution strategy (CMA-ES). The results confirm that our adaptive population sizing methods yield better results than even the best fixed population size.

1  Introduction

Evolutionary algorithms (EAs) usually have a number of parameters that require tuning, such as mutation rate, crossover rate, population size, or selection pressure. How to set these parameters properly has been subject to a lot of research over the past 20 years. Approaches range from offline methods that try to determine the most suitable parameters on a set of training problem instances before the algorithm is applied to the actual problem instance, to online methods that adapt the parameters during the optimization run. A good overview and case study can be found in Eiben et al. (2007).

Determining a proper population size for an evolution strategy (ES) is still a challenge. This paper proposes two new mechanisms to adaptively control the parental population parameter of an ES based on the locations and fitness values of the generated offspring. The first mechanism derives an approximation for the quality gain on a sphere fitness function as a function of the population size, and then chooses the population size that maximizes this approximation. The second mechanism estimates an evolutionary directional derivative, and chooses the population size for which this becomes maximal. We proceed to show the improvement made by varying the parental population size in three different test environments: (1) the sphere fitness function with normalized mutation strengths, (2) the sphere fitness function with cumulative mutation strength adaption, and (3) general fitness functions with a covariance matrix adaption ES (CMA-ES; Hansen and Ostermeier, 2001).

The research into determining the appropriate population size can be broken down into two main categories: offline and online. The necessity of finding good parameter values has led to extensive work on finding the progress rate equations for evolution strategies based on their fitness function, normalized mutation strength, recombination strategy, parental population, and offspring population in a deterministic setting (Beyer, 1993, 1994, 1995). For some simple fitness functions which have progress rate equations, it is a matter of computation to find the population size maximizing the progress rate for a particular ES. In expectation, the fixed parental population sizes derived from maximizing these formulas have been proven to yield the largest progress rates. In applications where the shape of the fitness function is more complex, it is impossible to estimate a priori all of the parameters needed to maximize the progress rate equations. In those cases, attempts at tuning the offspring population parameter of an ES include, for example, the restart CMA-ES by Auger and Hansen (2005), where the population size is increased by a factor of two at each independent restart.

Parameter control, on the other hand, attempts to change the population size based on information gained during the evolutionary process. The CMA-ES developed by Hansen and Ostermeier (2001) is used by Eiben et al. (2007) as an example of how the parameters of an EA can be controlled during evolution. The idea of controlling the population size of an EA has produced very encouraging results in the genetic algorithm (GA) community. An overview of adaptive population sizing schemes in GAs can be found in Lobo and Lima (2007). Additional work in this area since 2007 includes the following: Affenzeller et al. (2007) adapt the size of the population based on the fitness of potential offspring; Lareo et al. (2009) provide an empirical study on the benefits of decreasing the population size of a GA following a predetermined schedule which is adjusted according to a speed and a severity parameter; and Hu et al. (2010) vary the population size with the aim of efficiently using all of the computing power available. The genetic programming (GP) class of EAs has also benefitted from dynamic sizing of the population. Tomassini et al. (2004) decrease the negative effects of bloat by shrinking the population when the algorithm is progressing and growing it when the algorithm stagnates, while Hu and Banzhaf (2009) adjust the population size based on the rate of adaptive genetic substitutions relative to a background silent substitution rate. The overall theme in the GA and GP classes of EAs is to adjust the population size so that the algorithm maintains a certain level of search capability, or genetic diversity, while simultaneously remaining as small as possible to conserve the number of fitness function evaluations made on a generational basis.

The ES branch of EAs has not seen as much research in controlling the parental population parameter. Hansen et al. (1995) sought to size the offspring population with respect to the estimated local serial progress rate, while Jansen et al. (2005) varied the offspring population size of an ES based on the success rate of an offspring replacing its parent, with promising results in correctly tuning the population size to the complexity of the problem. Rowe and Sudholt (2012) found a threshold for the choice of the offspring population size that allows an EA to efficiently search for unique optima, and demonstrated those results on the ONEMAX function. In an attempt to control the parental population parameter, Storch (2008) investigates the strengths and weaknesses of maintaining diversity through the population size of an EA on the problem of maximizing pseudo-Boolean functions.

Since we attempt to control the size of the parental population in evolution strategies, our research is most closely related to Storch’s work. It differs from the work mentioned above in several ways. First, our optimization searches a real-valued parameter space. Second, we are interested in varying the parental population parameter in an ES that uses intermediate recombination. Our basic assumption is that we can make a small change to a current fixed population size strategy, as dictated by the fitness values and mutation vectors of the offspring, in order to generate slightly better results. The remainder of the paper is organized as follows. The variable population size techniques are formulated and explained in Section 2. The methods are tested and the aggregate results are shown in Section 3. The paper concludes with a summary and some ideas for future work.

2  Methods of Population Control

We derive two methods in this section, both using the fitness values and mutation vectors of each generation to dynamically adapt the parent population size. In Section 2.1, we show theoretically the usefulness of adapting the parental population size on a generational basis. By finding the set of mutation vectors which maximizes the quality gain estimate on a generational basis, we are able to use the fitness values and mutation vectors to determine the best parental population size for the sphere fitness function. In Section 2.2, a second strategy is heuristically derived for general fitness functions with varying mutation strength adaption policies.

2.1  Maximizing the Quality Gain on Simple Fitness Functions

The decomposition of the components of the progress vector and the quality gain calculation, shown in Equation (1) below, is derived again for completeness but comes from Arnold and Beyer (2002) and Arnold (2005). In Figure 1, the centroid in parameter space of the top μ offspring from the previous generation is ⟨x⟩; those μ points are known as the parental population of the previous generation, and ⟨x⟩ becomes the start point of the current generation. Mutation creates λ offspring by adding N-dimensional normal vectors z_l with mean 0 and unit variance per component, so that x_l = ⟨x⟩ + σz_l. Evaluation is accomplished in this section using the sphere fitness function, which is the square of the Euclidean distance in parameter space from the point in question to the optimum, so f(x) = ‖x − x̂‖², where x̂ is the optimum. Selection is accomplished by keeping the top μ offspring (as ordered in the evaluation process), and recombination starts the process over again at the new centroid. The progress vector ⟨z⟩ is the step taken during each generation, which depends on the number of offspring selected to be parents (μ) and is found using ⟨z⟩ = (1/μ) Σ_{k=1}^{μ} z_{k;λ}, where z_{k;λ} is the mutation vector associated with the kth best fitness. In Figure 1, let the distance from the previous generation's centroid to the optimal point be R and let the current generation's distance to the optimum be r.

Figure 1:

This shows one generational step toward the optimum. The components of the progress vector ⟨z⟩ have been broken into the component z_A that points in the direction of the optimum and the component z_⊥ orthogonal to it. As such, ‖⟨z⟩‖² = z_A² + z_⊥².


Geometrically, the quality gain (Q) can be derived as follows:

r² = (R − σz_A)² + (σz_⊥)²,

where z_A is the component of the progress vector that points in the direction of the optimum, z_⊥ is the component of the progress vector which is orthogonal to that, and ‖⟨z⟩‖² = z_A² + z_⊥². Expanding and subtracting from R² gives

Q = R² − r² = 2Rσz_A − σ²‖⟨z⟩‖².    (1)
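The identity in Equation (1) is easy to check numerically. The following sketch (our own illustration, not part of the paper's experiments) draws a random point and mutation vector on the sphere and verifies both the decomposition and the resulting quality gain:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 40                        # search space dimension
x_opt = np.zeros(N)           # optimum of the sphere
x = rng.normal(size=N)        # current search point
R = np.linalg.norm(x - x_opt)

sigma = 1.0
z = rng.normal(size=N)        # a mutation/progress vector

# Unit vector from the current point toward the optimum.
e = (x_opt - x) / R
z_A = z @ e                   # component of z pointing toward the optimum
z_perp_sq = z @ z - z_A ** 2  # squared orthogonal component

# Distance to the optimum after the step x + sigma * z.
r = np.linalg.norm(x + sigma * z - x_opt)

# Geometric identity: r^2 = (R - sigma z_A)^2 + sigma^2 z_perp^2.
assert np.isclose(r ** 2, (R - sigma * z_A) ** 2 + sigma ** 2 * z_perp_sq)

# Quality gain of Equation (1): Q = R^2 - r^2 = 2 R sigma z_A - sigma^2 ||z||^2.
Q = R ** 2 - r ** 2
assert np.isclose(Q, 2 * R * sigma * z_A - sigma ** 2 * (z @ z))
```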
The goal of changing our population size on a generational basis is to use the fitness values from the mutation vectors to determine the maximum quality gain attainable by the progress vector ⟨z⟩. Our objective is to find the parental population size μ which maximizes our step toward the optimum:

max_μ Q(μ) = 2Rσ z_A(μ) − σ²‖⟨z⟩(μ)‖².    (2)
Making a slight extension to Equation (1), we define r_i as the distance of the ith mutation to the optimum. Each generation has λ of these mutations, with the following quality gains associated with them:

q_i = R² − r_i² = 2Rσ z_{A,i} − σ²‖z_i‖²,  i = 1, …, λ,    (3)

where R is the starting distance to the optimum of a generation, σ is the mutation strength of the ES, z_{A,i} is the component of the ith mutation vector that points in the direction of the optimum, z_i is the ith mutation vector, and λ is the number of offspring for the strategy. Using Equation (3) as an estimate for the quality gain of the mutation vectors on a sphere fitness function is a critical assumption which does not hold for general fitness functions.

With the information we obtain during the evolutionary process, it is impossible to calculate the quality gain in Equation (2). Therefore, we seek an approximation to this measure that can be used to estimate this value for a particular population size. For the rest of this section, we show the relationship between k-means clustering and the objective function in Equation (2). From Beyer (1996), we know that for large dimensional search spaces, roughly one-fourth of the offspring population should become parents in the next generation. For this reason, we begin our analysis by clustering the fitness values into four groups. Our intention is not simply to cluster the fitness values into four groups and set the population size equal to the cardinality of the best group, but to use the information gained by clustering the fitness values to find an optimal μ which maximizes the objective function in Equation (1).

We now introduce the following definitions and notation from Krishna and Murty (1999) for the clustering algorithm, simplified from the more general results of their work. The main objective is to group like quality gains together by minimizing the total within cluster variation (TWCV). The main equations and general clustering notation are

TWCV = Σ_{i=1}^{λ} Σ_{k=1}^{K} w_{ik} (q_i − c_k)²,    (4)

where the weights w_{ik} ∈ {0, 1} indicate whether quality gain q_i belongs to cluster k (with Σ_{k=1}^{K} w_{ik} = 1 for each i), and the cluster centroids are

c_k = (Σ_{i=1}^{λ} w_{ik} q_i) / (Σ_{i=1}^{λ} w_{ik}).    (5)
By ordering the quality gains q_i associated with the mutation vectors from best to worst, the clusters become contiguous segments and Equation (4) simplifies to

TWCV = Σ_{k=1}^{4} Σ_{i ∈ S_k} (q_i − c_k)²,    (6)

where each cluster S_k is a contiguous segment of the ordered quality gains.
To simplify notation, we label the four contiguous clusters S₁, S₂, S₃, and S₄ with centroids c₁, c₂, c₃, and c₄, so that n₁ + n₂ + n₃ + n₄ = λ; n₁ is the number of elements in the first set; n₂ is the number of elements in the second set; n₃ is the number of elements in the third set; and n₄ is the number of elements in the last set.
Figure 2 is a graphical representation of clustering the quality gains into the top two groups. If the quality gain estimates were the only consideration, then finding the weight matrix which minimizes the TWCV would give a variable population size of n₁. Since we would like to consider the length of the mutation and progress vectors as well, we continue with the derivation. Assuming that the best population size is relatively close to n₁, the sum over the last two clusters remains unchanged, since moving members from group 1 to group 2 or from group 2 to group 1 would not change S₃ or S₄. In our comparison, we are therefore only concerned with the simplified version of Equation (6):
TWCV′ = Σ_{i=1}^{n₁} (q_i − c₁)² + Σ_{i=n₁+1}^{n₁+n₂} (q_i − c₂)².    (7)
Expanding the general equation for the TWCV in Equation (7), we get

TWCV′ = Σ_{i=1}^{n₁} q_i² + Σ_{i=n₁+1}^{n₁+n₂} q_i² − n₁c₁² − n₂c₂².
Since the first two terms do not change regardless of our choice for n₁, we treat them as constants and define D as

D = n₁c₁² + n₂c₂²,    (8)
so that
formula
Using the definition of c₂, c₂ = (1/n₂) Σ_{i=n₁+1}^{n₁+n₂} q_i (and likewise for c₁), we substitute to get

D = (1/n₁) (Σ_{i=1}^{n₁} q_i)² + (1/n₂) (Σ_{i=n₁+1}^{n₁+n₂} q_i)².
We now substitute Equation (3) for the quality gains q_i and simplify to get

D = (1/n₁) (Σ_{i=1}^{n₁} (2Rσ z_{A,i} − σ²‖z_i‖²))² + (1/n₂) (Σ_{i=n₁+1}^{n₁+n₂} (2Rσ z_{A,i} − σ²‖z_i‖²))².    (9)
Since D depends on the quality gains of the individual mutation vectors, we could use it to obtain a lower bound on the value that we are trying to maximize in Equation (2). Instead of using a lower bound, we opt to use a few more calculations to find an approximation to the quality gain. As discussed earlier, it is impossible to find z_A from the information gained during the evolutionary process. Using Equation (9), we find an estimate Ê_A(μ) for the term 2Rσ z_A and then subtract the square of the mutation strength times the square of the norm of the progress vector to get:

Q̂(μ) = Ê_A(μ) − σ²‖⟨z⟩(μ)‖².    (10)
Equation (10) gives us an approximation for the quality gain in Equation (2). We can maximize this quantity with the information gained during the evolutionary process in order to get a parental population size that maximizes the fitness gain of simple fitness functions.
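To make the selection step concrete, the sketch below maximizes Equation (2) directly, using the true optimum of the sphere (an oracle the real strategy does not have; VP1 instead substitutes the clustering-based estimate of Equation (10)). All function and variable names are our own:

```python
import numpy as np

def best_mu_sphere(x, x_opt, sigma, Z, mu_max=None):
    """Choose the parental population size maximising the quality gain of
    Equation (2) on the sphere.  Uses the true optimum for illustration only;
    the paper estimates the z_A term from fitness values via clustering."""
    lam = Z.shape[0]
    mu_max = mu_max or lam // 2
    R = np.linalg.norm(x - x_opt)
    e = (x_opt - x) / R
    # Order mutations by the sphere fitness of the offspring they generate.
    fitness = np.array([np.sum((x + sigma * z - x_opt) ** 2) for z in Z])
    order = np.argsort(fitness)          # best (smallest) first
    best_mu, best_gain = 1, -np.inf
    for mu in range(1, mu_max + 1):
        zbar = Z[order[:mu]].mean(axis=0)        # progress vector <z>(mu)
        gain = 2 * R * sigma * (zbar @ e) - sigma ** 2 * (zbar @ zbar)
        if gain > best_gain:
            best_mu, best_gain = mu, gain
    return best_mu

rng = np.random.default_rng(2)
N, lam = 40, 32
x = rng.normal(size=N) * 10
Z = rng.normal(size=(lam, N))
mu = best_mu_sphere(x, np.zeros(N), 1.0, Z)
```

The loop simply enumerates the candidate sizes, which is cheap because μ is bounded by λ/2.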
Figure 2:

Grouping using k-means clustering.

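The grouping step of this section can be sketched as follows. Because the quality gains are ordered, the optimal k-means clusters are contiguous segments, so the cluster boundaries can be enumerated directly; the function names below are our own, and the paper itself uses the k-means formulation of Krishna and Murty (1999):

```python
import itertools
import numpy as np

def twcv(groups):
    """Total within-cluster variation (Equation (4)) for a list of 1-D clusters."""
    return sum(((g - g.mean()) ** 2).sum() for g in groups)

def cluster_sorted(q, k=4):
    """Cluster sorted quality gains into k contiguous groups minimising TWCV.

    For sorted 1-D data, optimal k-means clusters are contiguous, so we can
    enumerate the k-1 boundary positions directly (Equation (6))."""
    n = len(q)
    best = None
    for cuts in itertools.combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [q[bounds[i]:bounds[i + 1]] for i in range(k)]
        score = twcv(groups)
        if best is None or score < best[0]:
            best = (score, groups)
    return best[1]

rng = np.random.default_rng(1)
q = np.sort(rng.normal(size=16))[::-1]   # quality gains, best first
groups = cluster_sorted(q, k=4)
n1 = len(groups[0])                      # cardinality of the best cluster
```

With λ = 16 offspring this enumerates only C(15, 3) = 455 partitions; for larger λ a standard k-means routine would be used instead.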

2.2  Maximizing the Quality Gain on General Fitness Functions

In the previous section, a major assumption was that the quality gain could be calculated using Equation (3), which was shown geometrically. In this section, we no longer restrict ourselves to objective functions for which Equation (3) holds. Therefore, we seek another method of determining the best parental population size to use each generation. For general fitness functions, we equate changing μ with changing the direction in which our optimization proceeds (see Figure 3). This leads to the definition of an evolutionary directional derivative as the change in fitness over the length of the progress vector. For a particular parental population size μ, the progress vector is

⟨z⟩(μ) = (1/μ) Σ_{k=1}^{μ} z_{k;λ},

where z_{k;λ} is the kth best of the λ mutation vectors ordered by their fitness function values. The current fitness value is estimated using the average of all the offspring fitness values, F̄ = (1/λ) Σ_{l=1}^{λ} f_l. An estimate of the fitness associated with the progress vector is the average of the top μ fitness values, F̄_μ = (1/μ) Σ_{k=1}^{μ} f_{k;λ}, where f_{k;λ} is the kth best of the λ fitness values. By taking the difference of these two values, we estimate the fitness gain for the progress vector. The length of the progress vector, or the distance covered from one generation to the next, is σ‖⟨z⟩(μ)‖. In this context, we estimate the directional derivative using the equation

D̂(μ) = (F̄ − F̄_μ) / (σ‖⟨z⟩(μ)‖),

where D̂(μ) is the evolutionary directional derivative estimate with respect to the parental population size.
Figure 3:

The direction that the optimization proceeds depends on the size of the parental population. The dashed vector represents choosing only one offspring, while the solid vector represents choosing the top two offspring.


Finding the estimated directional derivatives for the potential values of μ and choosing the parental population size which maximizes these estimates is our evolutionary version of the gradient. Since the mutation vectors are realizations of random vectors and the intermediate recombination of these mutation vectors restricts the possible directions, it is highly unlikely that the chosen direction corresponds to the actual gradient. Instead, we define the direction which maximizes the estimated directional derivative with respect to the information gained during the evolutionary process as the evolutionary gradient.

As the size of the parental population increases, the difference in the numerator of D̂(μ) (the fitness gain) gets smaller, while the length of the progress vector also gets smaller. Maximizing the numerator supports keeping the parental population size as small as possible, since the offspring fitness values are ordered. Minimizing the denominator supports allowing the parental population size to grow as large as possible (to a certain point) because of what Beyer termed “genetic repair” (Beyer, 1995). An issue with allowing the population size to grow without bound is that the mutation vectors which point in the opposite direction of the optimum (which are not desirable) have a significant and negative effect on the ratio D̂(μ). Therefore, large parental population sizes would be detrimental to the progress of the evolution strategy. To remedy this, we limit the size of the parental population to a fixed interval. By enumerating all of the sensible population sizes and finding the one that maximizes the evolutionary directional derivative, we select the population size which maximizes
μ_g = argmax_μ D̂(μ),    (11)

where μ_g is the generational population size which maximizes the directional derivative estimate. Equation (11) finds the parental population size that, on a generational basis, has the largest estimated fitness gain per distance covered in parameter space.
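A minimal sketch of the selection rule of Equation (11), with our own variable names and assuming a minimization problem:

```python
import numpy as np

def best_mu_directional(f, x, sigma, Z, mu_max=None):
    """Pick the parental population size maximising the estimated evolutionary
    directional derivative of Equation (11):
        D(mu) = (mean fitness of all offspring - mean of the mu best)
                / (sigma * ||<z>(mu)||)."""
    lam = Z.shape[0]
    mu_max = mu_max or lam // 2
    fitness = np.array([f(x + sigma * z) for z in Z])
    order = np.argsort(fitness)              # minimisation: best first
    f_all = fitness.mean()                   # estimate of the current fitness
    best_mu, best_dd = 1, -np.inf
    for mu in range(1, mu_max + 1):
        zbar = Z[order[:mu]].mean(axis=0)    # progress vector <z>(mu)
        step = sigma * np.linalg.norm(zbar)  # distance covered by this step
        dd = (f_all - fitness[order[:mu]].mean()) / step
        if dd > best_dd:
            best_mu, best_dd = mu, dd
    return best_mu

rng = np.random.default_rng(3)
x = rng.normal(size=10) * 5
Z = rng.normal(size=(20, 10))
mu = best_mu_directional(lambda v: np.sum(v ** 2), x, 0.5, Z)
```

Note that only the offspring fitness values and mutation vectors are used, so unlike the sphere-specific VP1 criterion this rule needs no knowledge of the optimum.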

3  Experimental Verification of the Variable Population Size

The purpose of this section is to test the validity of our variable population size strategy against a fixed population ES in a noiseless environment under the following conditions: (1) sphere fitness function and normalized mutation strengths, (2) sphere fitness function and cumulative mutation strength adaption, and (3) general fitness functions and CMA-ES. In Section 3.1, we compare the fixed and variable population evolution strategies on the sphere fitness function with normalized mutation strengths in order to take the mutation strength adaption mechanism out of the problem. This allows us to remove some of the randomness by comparing one step of the optimization using the same mutation vectors for both strategies. In Section 3.2, we compare the fixed and variable population strategies using the sphere fitness function and cumulative mutation strength adaption as described by Arnold (2002). In this experiment, we choose a starting mutation strength and let the adaption mechanism change its value. This allows observation of multiple generations of the evolution process to better understand whether the variable population strategy has an effect on the progress rate of a standard ES. In Section 3.3, we test the variable population size strategy in a more general setting with small offspring population sizes and a CMA-ES strategy as described by Hansen and Ostermeier (2001).

3.1  Sphere Function with Normalized Mutation Strength

In this section, we present results which verify the usefulness of a variable population size strategy. We hold as many of the evolutionary parameters constant as possible and use the same mutation vectors for a one step comparison of the standard (μ/μ, λ) ES versus the variable population strategy. This is done so that we are able to isolate the part of the algorithm that we propose to change. At this point, the only difference between the fixed population and variable population strategies for this one step analysis is the number of mutation vectors that are used in intermediate recombination. The one step distance (OSD) experiment is conducted in MATLAB and is described in Algorithm 1.

Algorithm 1: The one step distance (OSD) experiment.

We chose a standard offspring population size of λ = 32 and a parameter space dimension of N = 40 in order to use settings similar to Arnold and Beyer (2002). Using Beyer's progress rate equations, we find that μ = 6 is a good setting for a wide range of mutation strengths.
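The structure of the OSD comparison can be sketched as follows. For illustration, the variable strategy below is an oracle that picks the μ minimizing the true one step distance (a stand-in for the estimated criterion of Section 2.1), so the measured gain is an upper bound on what VP1 could achieve with the same mutation vectors; all names are our own:

```python
import numpy as np

def one_step_distance(x, sigma, Z, mu):
    """Distance to the optimum (origin) after one (mu/mu, lambda) step
    built from the mu best of the given mutation vectors."""
    fitness = np.array([np.sum((x + sigma * z) ** 2) for z in Z])
    order = np.argsort(fitness)
    zbar = Z[order[:mu]].mean(axis=0)
    return np.linalg.norm(x + sigma * zbar)

rng = np.random.default_rng(4)
N, lam, reps = 40, 32, 200
sigma, diffs = 1.0, []
for _ in range(reps):
    x = rng.normal(size=N)
    x *= 10 / np.linalg.norm(x)          # fix the starting distance R = 10
    Z = rng.normal(size=(lam, N))        # the SAME mutations for both strategies
    d_fixed = one_step_distance(x, sigma, Z, mu=6)
    # Oracle variable strategy: best mu chosen with full knowledge of the
    # resulting distances (the paper's VP1 uses the Equation (10) estimate).
    d_var = min(one_step_distance(x, sigma, Z, mu) for mu in range(1, 17))
    diffs.append(d_fixed - d_var)
mean_gain = float(np.mean(diffs))        # >= 0 by construction here
```

Because the oracle's minimum ranges over μ = 6 as well, the average gain is nonnegative by construction; the paper's experiments show that the estimated criterion retains most of this gain.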

After 5,000 repetitions of the OSD procedure for a (6/6,32)-ES compared with the variable population strategy described in Section 2.1, Figure 4 shows the average difference in the distance to the optimum for a fixed population strategy and a variable population strategy. The line represents how much closer, on average, the variable population strategy progressed toward the optimum after one generational step, for normalized mutation strengths from 0.5 to 11.

Figure 4:

Variable population versus standard evolution strategy (difference in distance to optimum, with a difference greater than zero meaning the variable population size performs better).


Figure 4 demonstrates that for every normalized mutation strength between 0.5 and 11, the variable population size method (VP1 from Section 2.1) is better than the standard ES with a fixed population size of 6. This variable population strategy is used for the remaining experiments on the simple sphere function. The variable population size strategy from Section 2.2 (VP2) does not perform as well as VP1, due to the fact that VP1 is designed specifically for the sphere fitness function while VP2 is designed for general fitness functions. In order to get an idea of what the variable population strategy (VP1) accomplishes with respect to its normalized mutation strength, we provide a histogram of the population size distribution for small, close to optimal, and large normalized mutation strengths in Figure 5. This graph shows the number of times (frequency) that VP1 chooses a particular population size over the 5,000 replications, for each given normalized mutation strength. Figure 6 shows the mean of the population size distributions in Figure 5 as well as all of the mutation strengths considered in the experiment.

Figure 5:

The distribution of population sizes for the variable population for small, close to optimal, and large normalized mutation strengths. Frequency is the number of times, out of 5,000 replications, that a particular population size was chosen for each normalized mutation strength.


Figure 6:

Average population size by mutation strength.


The smaller mutation strengths result in smaller average population sizes and the larger mutation strengths result in larger average population sizes, consistent with Beyer’s theoretical results. The average population size in Figure 6 over the 5,000 one step computations coincides with the optimal values from Beyer’s equations. For example, we find that for a mutation strength of 5.5, the best population size is 6. The average population size in Figure 6 for a mutation strength of 5.5 is approximately 6. The difference is that our average population size is not constrained to an integer, and it can vary depending on the mutation vectors and fitness values.

Instead of comparing one fixed population strategy against a variable population strategy, we use Table 1 to compare all of the viable fixed population strategies against a variable one for the normalized mutation strengths between 0.5 and 11. The column listed as VP is the average one step distance to optimum for the VP strategy, the column listed as VP pop size is the average population size that the strategy chose, and the remaining columns are the average one step distance to optimum for fixed ES strategies with the indicated parental population sizes and mutation strengths. The table also highlights the best fixed population strategy for each mutation strength, noting that the VP strategy is closer to the optimum than the best fixed population strategy at every normalized mutation strength. The VP strategy is also robust with respect to the mutation strength selected: it does equally well over several mutation strengths, whereas most fixed population sizes are close to optimal for only one or two mutation strengths.

Table 1:
Average one step distance to optimum with given mutation strength and population size for a (μ/μ, 32)-ES. VP is the variable population size strategy, while the remaining columns are the fixed population size strategies.
The final test in this section consists of using the progress rate formula (Equation (38) from Beyer, 1995) in order to numerically compute the normalized progress rate and optimal settings for an evolution strategy that has 32 offspring and 40 dimensions:

φ*(σ*) = c_{μ/μ,λ} σ* − σ*²/(2μ),

where σ* is the normalized mutation strength and c_{μ/μ,λ} is the progress coefficient of the (μ/μ, λ)-ES.
Using Mathematica, we find good settings for μ and the normalized mutation strength for a search space dimension of 40, and Table 1 confirms that these parameter settings are good. Using the equation, we compute the corresponding normalized progress rate. We then compare a variable population strategy to the optimally chosen fixed population ES. After 50,000 one step comparisons, the variable population strategy's progress rate estimate has a standard error of 0.007187, compared to a standard error of 0.007165 for the fixed strategy. We use these results for two purposes. First, we verify that the fixed population progress rate is approximately equal to the value from Beyer's formula. Since we use numerical integration for the progress coefficient, we consider these two approximations close. Second, we compare the variable population approach to the fixed population and find an approximate 15% increase in the progress rate over one of the best fixed population strategies. In the next section, we test a more realistic mutation strength adaption strategy over a number of generations while utilizing the same mutation vectors.

3.2  Sphere Fitness Function with Cumulative Mutation Strength Adaption

In this section, we take the next step in our investigation of the variable population strategy by testing to see whether the strategy will increase the progress rate over the course of a number of generations. Instead of relying on a normalized mutation strength which is based on the actual distance from the optimum, we now test the variable population strategy using a cumulative mutation strength adaption (CMSA) as introduced by Hansen and Ostermeier and described by Arnold (2002). This will allow us to observe multiple steps of the evolutionary path in order to determine whether the strategy is capable of progressing toward the optimum of the sphere fitness function faster than a fixed population ES.

The difference between this experiment and the OSD procedure is how we calculate the mutation strength and the number of generations that we let it run. In this test, we use an initial mutation strength and change it based on the mutation vectors selected to move forward in the next generation. The basic idea, as described by Arnold, is that the mutation strength is increased if consecutive steps are parallel and decreased if consecutive steps are anti-parallel. Since both strategies converge to the optimum, we look at the first 20 generations, where the most significant gains are made. Algorithm 2, the generation (GEN) algorithm, is used to test the variable population strategy over multiple generations with a standard mutation strength adaption technique.

Algorithm 2: The generation (GEN) experiment.

Despite using the same mutation vectors, Algorithm 2 allows for two separate evolutionary paths for the fixed and variable population ES. The CMSA process maintains an accumulated progress vector s, initialized to 0, and uses an initial mutation strength that is the same for both the fixed and variable population strategies. The constant c determines the influence of the history, much like an exponential smoothing constant, while D is a damping constant. The constants c and D are set according to Hansen and Ostermeier (1996) and Arnold (2002).
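A sketch of one CMSA run on the sphere, using one common formulation of the cumulative update (after Arnold, 2002) with c = 1/√N and D = √N; the exact constants and update used in the paper may differ, and all names below are our own:

```python
import numpy as np

def cmsa_sphere(n_gen=20, N=40, lam=32, mu=6, sigma=60.0, seed=5):
    """(mu/mu, lambda)-ES with cumulative mutation strength adaptation on the
    sphere.  One common formulation of the update (after Arnold, 2002):
        s     <- (1 - c) s + sqrt(mu c (2 - c)) <z>
        sigma <- sigma * exp((||s||^2 - N) / (2 D N))
    with c = 1/sqrt(N) and D = sqrt(N)."""
    rng = np.random.default_rng(seed)
    c, D = 1.0 / np.sqrt(N), np.sqrt(N)
    x = np.full(N, 100.0)                    # fixed starting point
    s = np.zeros(N)                          # accumulated progress vector
    for _ in range(n_gen):
        Z = rng.normal(size=(lam, N))
        fitness = np.array([np.sum((x + sigma * z) ** 2) for z in Z])
        order = np.argsort(fitness)
        zbar = Z[order[:mu]].mean(axis=0)    # progress vector <z>
        x = x + sigma * zbar                 # recombination / step
        s = (1 - c) * s + np.sqrt(mu * c * (2 - c)) * zbar
        sigma *= np.exp((s @ s - N) / (2 * D * N))
    return np.linalg.norm(x)                 # final distance to the optimum

final_dist = cmsa_sphere()
```

Longer-than-expected accumulated paths (‖s‖² > N) grow σ, shorter ones shrink it; a variable population version would additionally re-select μ each generation before recombination.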

Since selecting the wrong mutation strength can be detrimental in the analysis of evolution strategies, we determine the initial mutation strength that has good progress toward the optimum point. This is done by averaging the distance from the optimum of 100 replications of a 20 generation run for various starting mutation strengths. Algorithm GEN is used except that in this case we use a fixed starting point of [100] and the fixed strategy of (6/6,32)-ES is used since it was one of the best in the normalized mutation strength case. The results are summarized in Figure 7. The error bars in Figure 7 represent the standard error associated with 1,000 twenty-generation replications.

Figure 7:

Distance from optimum after 20 generations for a (6/6,32)-ES and VP strategy with standard error.


Figure 7 shows that a mutation strength between 50 and 70 is a good initial mutation strength for this fixed population ES problem. It also shows that an initial mutation strength of 30 is less than optimal, while 100 is greater than optimal. We test the variable population ES against the fixed population strategy for mutation strengths near, above, and below the optimal, using the values of 30, 60, and 100 as initial mutation strengths. Figures 8–10 show the average difference in the distance to the optimum of 1,000 runs over 100 generations for the three different starting mutation strengths. The relative difference between the ES distance and the variable population distance is calculated using the following equation:
RelDiff(g) = (D_ES(g) − D_VP(g)) / D_ES(g),  g = 1, …, NumGen,

where D_ES(g) is the average distance to the optimum of a fixed population evolution strategy at generation g, D_VP(g) is the average distance to the optimum of the variable population strategy at generation g, and NumGen is the number of generations to test. The relative distance is used because both methods converge to the optimum, which results in the absolute difference being very close to zero.
Figure 8:

Relative and actual difference in distance to optimum by generation (initial mutation strength 30) for a (6/6,32)-ES.


Figure 9:

Relative and actual difference in distance to optimum by generation (initial mutation strength 60) for a (6/6,32)-ES.


Figure 10:

Relative and actual difference in distance to optimum by generation (initial mutation strength 100) for a (6/6,32)-ES.


Figures 8, 9, and 10 show that for varying initial mutation strengths, the variable population size strategies perform better on the sphere fitness function with a cumulative mutation strength adaption for every generation of the evolutionary cycle. The relative distance reaches a steady state due to the fact that both strategies are converging after a number of generations. The steady state value that it converges to is influenced by the initial mutation strength. Figure 7 shows a much greater difference in the convergence for a mutation strength of 30 than it does for 100. Therefore, we expect the steady state relative distance to be larger for a starting mutation strength of 30, which it is.

A final test on the noiseless sphere with cumulative mutation strength adaption ensures that the best fixed population size does not perform better than the variable population strategy under the experimental conditions (N = 40, CMSA, noiseless sphere fitness function). We search for the best possible parental population size by running Algorithm 2 on a fixed population ES for each parental population size in the tested range.

Figure 11 graphs the average distance to the optimum after 20 generations, averaged over 100 runs for each population size, and compares the results to the variable population strategy. This experiment is conducted to ensure that the variable population results are better than any fixed parental population size under the given experimental conditions. For the conditions used, a fixed parental population size of 4 is the best choice for the first 20 generations, but it is still not better than the variable population method with an initial mutation strength of 60. Figure 12 is a histogram of the parental population size distribution associated with the variable population strategy in Figure 11. The histogram aggregates 16,000 recorded population sizes and shows that the chosen sizes are spread over many different possibilities. The frequency referred to in Figure 12 is the relative frequency with which a particular population size is chosen by the strategy. The population sizes the variable strategy chooses are thus not concentrated on a narrow subset of values but cover a large range, underscoring the importance of choosing the size correctly.
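The relative-frequency histogram of Figure 12 amounts to a simple normalized count. A sketch (the variable `mu_choices`, standing in for the 16,000 recorded parental population sizes, is a hypothetical stand-in filled with uniform toy data):

```python
from collections import Counter
import random

def relative_frequencies(mu_choices):
    """Map each chosen parental population size to its relative frequency."""
    counts = Counter(mu_choices)          # absolute count per population size
    total = len(mu_choices)
    return {mu: c / total for mu, c in counts.items()}

# Toy data standing in for the sizes recorded by the variable-population ES
# over all runs and generations (16,000 choices, sizes between 1 and 32).
random.seed(0)
mu_choices = [random.randint(1, 32) for _ in range(16000)]
freqs = relative_frequencies(mu_choices)
```

Plotting `freqs` as a bar chart would reproduce the style of Figure 12: a wide spread of bars rather than a single dominant population size.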

Figure 11:

ESs with CMSA and a starting mutation strength of 60.


Figure 12:

Histogram with relative frequency of population sizes chosen by variable population ES.


3.3  General Fitness Function with CMA-ES

The fitness functions used in this section are more general than the sphere. For these general fitness functions, Equation (3) from Section 2.1 can no longer be used as an approximation for the quality gain of an individual mutation vector, which was a key assumption in the derivation of Equation (10). Therefore, the variable population strategy used in the comparisons that follow is the evolutionary gradient version of Section 2.2. The fitness functions used to test the variable population strategy are listed in Table 2.

Table 2:
Test functions.

Name         Function                                                                   Initial point
Cigar        $f(\mathbf{x}) = x_1^2 + 10^6 \sum_{i=2}^{N} x_i^2$
Ellipsoid    $f(\mathbf{x}) = \sum_{i=1}^{N} 10^{6(i-1)/(N-1)} x_i^2$
Rosenbrock   $f(\mathbf{x}) = \sum_{i=1}^{N-1} [\,100(x_i^2 - x_{i+1})^2 + (x_i - 1)^2\,]$   0.1
Rastrigin    $f(\mathbf{x}) = 10N + \sum_{i=1}^{N} (x_i^2 - 10\cos(2\pi x_i))$              2.5

The mutation vector adaption policy used for these general fitness functions is the CMA-ES. The starting mutation strength and starting point are taken from the MATLAB code in Hansen and Ostermeier (2001) for the cigar fitness function, the MATLAB code in Hansen (2011) for the ellipsoid fitness function, and Table 1 in Hansen and Kern (2004) for the Rastrigin fitness function.

Hansen and Ostermeier (2001) give default parameter values for the offspring population size $\lambda$, the parental population size $\mu$, and logarithm-based weights $w_i$ for a given dimensionality $N$ of the problem. The parameter values are summarized in Table 3.

Table 3:
Parameter values for specific search space dimensions.

N     λ     μ     w_i
10    10
The values from Table 3 are used in weighted recombination of the top $\mu$ offspring to get the start point of the next generation,
$$\mathbf{x}^{(g+1)} = \sum_{i=1}^{\mu} w_i\, \mathbf{y}_{i;\lambda}^{(g)},$$
where $\mathbf{y}_{i;\lambda}^{(g)}$ is the $i$th best of the $\lambda$ offspring of generation $g$. The population sizes in Table 3 are for the fixed strategy. The variable population size is computed using Equation (11), the fitness values, and the mutation vectors of each generation. The variable population strategy uses a weighting scheme similar to the one in the CMA-ES paper (Hansen and Ostermeier, 2001).
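Weighted recombination with logarithm-based weights can be sketched as follows. The weight formula $w_i \propto \ln(\mu+1) - \ln i$ follows the default in Hansen and Ostermeier (2001); the function names and the toy sphere usage are illustrative, not the paper's implementation:

```python
import numpy as np

def log_weights(mu):
    """Logarithm-based recombination weights for the mu best offspring."""
    w = np.log(mu + 1.0) - np.log(np.arange(1, mu + 1))
    return w / w.sum()  # normalize so the weights sum to one

def recombine(offspring, fitnesses, mu):
    """Weighted recombination of the mu best of lambda offspring.

    offspring: (lambda, N) array of candidate points; fitnesses: length-lambda
    array (minimization). Returns the start point of the next generation.
    """
    order = np.argsort(fitnesses)[:mu]   # indices of the mu best offspring
    w = log_weights(mu)
    return w @ offspring[order]          # weighted mean of the selected points

# Toy usage on a 3-dimensional sphere fitness function.
rng = np.random.default_rng(1)
pop = rng.normal(size=(10, 3))
fit = np.sum(pop ** 2, axis=1)
x_next = recombine(pop, fit, mu=5)
```

The decreasing weights give the best offspring the largest influence on the next start point; intermediate recombination is recovered by replacing `log_weights` with uniform weights `1/mu`.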
We compare a fixed population CMA-ES to a variable population CMA-ES on the cigar fitness function using the same mutation vectors each generation; the single difference between the two algorithms is the number of parents used in weighted recombination. The results in Figure 13 use the absolute difference in fitness values and the relative fitness difference defined as
$$\Delta f_{\mathrm{rel}}(g) = \frac{\bar{f}_{\mathrm{ES}}(g) - \bar{f}_{\mathrm{VP}}(g)}{\bar{f}_{\mathrm{ES}}(g)},$$
where $\bar{f}_{\mathrm{ES}}(g)$ is the average fitness of the fixed population ES at generation $g$, and $\bar{f}_{\mathrm{VP}}(g)$ is the average fitness of the variable population ES at generation $g$. The relative fitness difference is needed because both strategies quickly converge to the optimum, which causes the difference between their fitness values to go to zero. Since we want to plot both differences on the same graph, we use the relative fitness difference to normalize the scale. After running 1,000 replications of 100 generations of Algorithm 3, the results for the cigar function are summarized in Figure 13.
Figure 13:

Fitness difference for the cigar fitness function () with a CMA-ES and a variable population version.


Figure 13 shows that the variable population strategy performs well over the first 100 generations. Next, we determine the number of function evaluations each strategy requires to reach the target fitness level in low-dimensional search spaces.

Table 4 shows that the results of the first 100 generations carry forward to longer running comparisons of the two strategies. On average, the variable population strategy requires fewer function evaluations than the fixed population version to reach its goal in all three tested search space dimensionalities.

Table 4:
Average number of function evaluations for a fixed and variable population CMA-ES on a cigar fitness function to reach the target fitness level.

      (μ,λ)-CMA-ES          (VP,λ)-CMA-ES
N     Average     SE        Average     SE
      701.90      1.76      681.58      1.60
      2,154.35    2.87      2,085.22    2.67
10    4,506.07    4.10      4,441.62    3.71

For the ellipsoid and Rosenbrock fitness functions, we produce the same figure, except that we use 200 generations instead of 100. Figures 14 and 15 each average 1,000 replications of 200 generations, graphing both the absolute and relative difference between the strategies. Again, we determine the number of function evaluations necessary for the fixed and variable population CMA-ES to reach a fitness level below the target. The three dimensions tested are listed in Tables 5 and 6. For the Rosenbrock function, not all of the runs reached the target fitness level within 1,000 generations, so Table 6 also gives the number of times out of 1,000 that this occurred.

Figure 14:

Fitness difference for the ellipsoid fitness function () with a CMA-ES and a variable population version.


Figure 15:

Fitness difference for the Rosenbrock fitness function () with a CMA-ES and a variable population version.


Table 5:
Average number of function evaluations for a fixed and variable population CMA-ES on an ellipsoid fitness function to reach the target fitness level.

      (μ,λ)-CMA-ES          (VP,λ)-CMA-ES
N     Average     SE        Average     SE
      704.98      1.73      683.36      1.63
      2,554.22    3.64      2,469.81    3.39
10    7,858.55    6.01      7,669.32    5.75
Table 6:
Average number of function evaluations for a fixed and variable population CMA-ES on a Rosenbrock fitness function to reach the target fitness level. Converge is the number of times out of 1,000 that the strategy reached the target within 1,000 generations.

      (μ,λ)-CMA-ES                     (VP,λ)-CMA-ES
N     Average     SE      Converge     Average     SE      Converge
      649.94      3.07    1,000        624.58      2.90    1,000
      2,121.47    30.15   977          1,911.30    19.27   992
10    6,090.99    67.67   918          4,984.14    58.65   957

The last fitness function that we test is a multimodal problem which we do not expect to converge unless we use the rank-μ update version of CMA-ES and a restart strategy with an increasing offspring population similar to the one in Hansen and Kern (2004). Our intent is to show an improvement of a variable population strategy over a fixed population strategy, not necessarily convergence to the optimum. Therefore, we show the actual fitness values for the fixed and variable population CMA-ES with weighted and intermediate recombination. Figure 16 shows that the variable population CMA-ES with intermediate recombination performs better in the short term on the Rastrigin fitness function than the other strategies. We do not report the number of function evaluations needed to reach a certain fitness level, since there is an extremely small chance that these strategies converge with such a small offspring population size.
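The increasing-population restart idea mentioned above can be sketched generically. This is only an illustration of the control loop, not the paper's algorithm: `run_es` is a hypothetical callable that runs a strategy with offspring population size `lam` and returns the best fitness it reached, and the toy stand-in `1/lam**4` merely mimics "larger populations reach better fitness levels":

```python
def restart_with_increasing_lambda(run_es, lam0=10, target=1e-8, max_restarts=5):
    """Restart an ES, doubling the offspring population after each failed run.

    run_es(lam): hypothetical callable running the strategy with offspring
    population size lam and returning the best fitness found (minimization).
    Returns the best fitness of the successful (or last) run and its lam.
    """
    lam = lam0
    for _ in range(max_restarts + 1):
        best = run_es(lam)
        if best < target:      # success: this run reached the target level
            return best, lam
        lam *= 2               # otherwise restart with twice the offspring
    return best, lam

# Toy stand-in: pretend a larger offspring population reaches a better fitness.
best, lam = restart_with_increasing_lambda(lambda lam: 1.0 / lam ** 4)
```

Doubling `lam` on restart trades extra function evaluations per generation for a better chance of escaping local optima on multimodal functions such as Rastrigin.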

Figure 16:

Fitness values for (2, 10)-CMA-ES, (5,10)-CMA-ES, (VP2-5,10)-CMA-ES, and (VP2-5,10)-CMA-ES on a Rastrigin fitness function starting at [5].


4  Conclusions

The main purpose of this paper was to demonstrate the benefit of adapting the parent population size each generation based on the sampled fitness values, and two adaptation mechanisms were proposed. It has been shown that adapting the parent population size increases the progress compared to even the best fixed parent population size on the sphere function, both with normalized mutation strengths and with a CMSA adaption mechanism. Additionally, tests on general unimodal and multimodal functions using a CMA-ES with weighted and intermediate recombination have shown improved fitness values and convergence over the fixed population counterpart. The conclusion is that, compared to a fixed population ES, the variable population ES is able to make modest improvements in fitness gain over several generations.

Future work in this area includes expanding the deterministic settings to determine how increasing the dimensionality of the search space and changing the number of offspring affect the results. More research is needed on a hybrid method that utilizes the strengths of both algorithms depending on the objective function, and on whether a variable population strategy can be used to optimize stochastic fitness functions.

Acknowledgments

This work has been supported in part by the National Science Foundation under Award CMMI-1233376, the Department of Energy under Award DE-SC0002223, the NIH under Grant 1R21DK088368-01, and the National Science Council of Taiwan under Award NSC-100-2218-E-002-027-MY3.

References

Affenzeller, M. S., Wagner, S., and Winkler, S. (2007). Self-adaptive population size adjustment for genetic algorithms. In Proceedings of the International Conference on Computer Aided Systems Theory, pp. 820–828.

Arnold, D. (2002). Noisy optimization with evolution strategies. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Arnold, D. (2005). Optimal weighted recombination. In K. A. DeJong and L. M. Schmitt (Eds.), Foundations of genetic algorithms. Lecture notes in computer science, Vol. 3469 (pp. 215–237). Berlin: Springer-Verlag.

Arnold, D., and Beyer, H. (2002). Performance analysis of evolution strategies with multi-recombination in high-dimensional R^N search spaces disturbed by noise. Theoretical Computer Science, 289(1):629–647.

Auger, A., and Hansen, N. (2005). A restart CMA evolution strategy with increasing population size. In Proceedings of the Congress on Evolutionary Computation, pp. 1769–1776.

Beyer, H.-G. (1993). Toward a theory of evolution strategies: Some asymptotical results from the (1,+λ)-theory. Evolutionary Computation, 1(2):165–188.

Beyer, H.-G. (1994). Toward a theory of evolution strategies: The (μ,λ)-theory. Evolutionary Computation, 2(4):381–407.

Beyer, H.-G. (1995). Toward a theory of evolution strategies: On the benefits of sex—The (μ/μ,λ)-theory. Evolutionary Computation, 3(1):81–111.

Beyer, H.-G. (1996). On the asymptotic behavior of multirecombinant evolution strategies. In Parallel problem solving from nature (PPSN). Lecture notes in computer science, Vol. 1141 (pp. 122–133). Berlin: Springer-Verlag.

Eiben, G., Michalewicz, Z., Schoenauer, M., and Smith, J. (2007). Parameter control in evolutionary algorithms. In Proceedings of Parameter Setting in Evolutionary Algorithms, pp. 19–46.

Hansen, N. (2011). The CMA evolution strategy: A tutorial. Retrieved from https://www.lri.fr/∼hansen/cmatutorial.pdf

Hansen, N., Gawelczyk, A., and Ostermeier, A. (1995). Sizing the population with respect to the local progress in evolution strategies—A theoretical analysis. In Proceedings of the International Conference on Evolutionary Computation, pp. 80–85.

Hansen, N., and Kern, S. (2004). Evaluating the CMA evolution strategy on multimodal test functions. In Parallel problem solving from nature (PPSN). Lecture notes in computer science, Vol. 3242 (pp. 282–291). Berlin: Springer-Verlag.

Hansen, N., and Ostermeier, A. (1996). Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of the International Conference on Evolutionary Computation, pp. 312–317.

Hansen, N., and Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195.

Hu, T., and Banzhaf, W. (2009). The role of population size in rate of evolution in genetic programming. In Proceedings of the European Conference on Genetic Programming. Lecture notes in computer science, Vol. 5481 (pp. 85–96). Berlin: Springer-Verlag.

Hu, T., Harding, S., and Banzhaf, W. (2010). Variable population size and evolution acceleration: A case study with a parallel evolutionary algorithm. Genetic Programming and Evolvable Machines, 11(2):205–225.

Jansen, T., DeJong, K. A., and Wegener, I. (2005). On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13(4):413–440.

Krishna, K., and Murty, N. (1999). Genetic k-means algorithm. Systems, Man and Cybernetics, Part B, 29(3):433–439.

Lareo, J. L., Fernandes, C., Merelo, J. J., and Gagne, C. (2009). Improving genetic algorithms performance via deterministic population shrinkage. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 819–826.

Lobo, F. G., and Lima, C. F. (2007). Adaptive population sizing schemes in genetic algorithms. In F. G. Lobo, C. F. Lima, and Z. Michalewicz (Eds.), Parameter Setting in Evolutionary Algorithms (pp. 185–204). Berlin: Springer-Verlag.

Rowe, J. E., and Sudholt, D. (2012). The choice of the offspring population size in the (1,λ) EA. In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1349–1356.

Storch, T. (2008). On the choice of the parent population size. Evolutionary Computation, 16(4):557–578.

Tomassini, M., Vanneschi, L., Cuendet, J., and Fernandez, F. (2004). A new technique for dynamic size populations in genetic programming. In Proceedings of the Congress on Evolutionary Computation, pp. 486–493.