Adaptive population sizing aims at improving the overall progress of an evolution strategy. At each generation, it determines the parental population size that promises the largest fitness gain, based on the information collected during the evolutionary process. In this paper, we develop an adaptive variant of a $(\mu/\mu,\lambda)$ evolution strategy. Based on considerations on the sphere, we derive two approaches for adaptive population sizing. We then test these approaches empirically on the sphere model using a normalized mutation strength and cumulative mutation strength adaption. Finally, we compare the methodology on more general functions with a fixed-population covariance matrix adaption evolution strategy (CMA-ES). The results confirm that our adaptive population sizing methods yield better results than even the best fixed population size.
Evolutionary algorithms (EAs) usually have a number of parameters that require tuning, such as mutation rate, crossover rate, population size, or selection pressure. How to set these parameters properly has been subject to a lot of research over the past 20 years. Approaches range from offline methods that try to determine the most suitable parameters on a set of training problem instances before the algorithm is applied to the actual problem instance, to online methods that adapt the parameters during the optimization run. A good overview and case study can be found in Eiben et al. (2007).
Determining a proper population size for an evolution strategy (ES) is still a challenge. This paper proposes two new mechanisms to adaptively control the parental population parameter of a $(\mu/\mu,\lambda)$-ES based on the location and fitness values of the generated offspring. The first mechanism derives an approximation for the quality gain depending on the population size of a sphere fitness function, and then chooses the population size that maximizes this approximation. The second mechanism estimates the evolutionary directional derivative, and chooses the population size where this becomes maximal. We proceed to show the improvement made by varying the parental population size in three different test environments: (1) the sphere fitness function with normalized mutation strengths, (2) the sphere fitness function with cumulative mutation strength adaption, and (3) general fitness functions with covariance matrix adaption ES (CMA-ES; Hansen and Ostermeier, 2001).
The research into determining the appropriate population size can be broken down into two main categories: offline and online. The necessity of finding good parameter values has led to extensive work on finding the progress rate equations for evolution strategies based on their fitness function, normalized mutation strength, recombination strategy, parental population, and offspring population in a deterministic setting (Beyer, 1993, 1994, 1995). For some simple fitness functions which have progress rate equations, it is a matter of computation to find the population size maximizing the progress rate for a particular ES. In expectation, the fixed parental population sizes derived from maximizing these formulas have been proven to have the largest progress rates. In applications where the shape of the fitness function is more complex, it is impossible to estimate a priori all of the parameters needed to maximize the progress rate equations. In those cases, attempts at tuning the offspring population parameter of an ES include, for example, the restart CMA-ES by Auger and Hansen (2005), where the population size is increased by a factor of two for each independent restart.
Parameter control, on the other hand, attempts to change the population size based on the information gained during the evolutionary process. The CMA-ES developed by Hansen and Ostermeier (2001) is used by Eiben et al. (2007) as an example of how parameters of an EA can be controlled during evolution. The idea of controlling the population size of an EA has produced very encouraging results in the genetic algorithm (GA) community. An overview of the adaptive population sizing schemes in GAs can be found in Lobo and Lima (2007). Additional work done in this area since 2007 includes the following: Affenzeller et al. (2007) adapt the size of the population based on the fitness of potential offspring; Lareo et al. (2009) provide an empirical study on the benefits of decreasing the population size of a GA following a predetermined schedule which is adjusted according to a speed and severity parameter; and Hu et al. (2010) vary the population size with the aim of efficiently using all of the available computing power. Additionally, the genetic programming (GP) class of EAs has also benefitted from dynamic sizing of the population. Tomassini et al. (2004) reduce the negative effects of bloat by decreasing the size of the population when the algorithm is progressing and increasing it when the algorithm is in stagnation, while Hu and Banzhaf (2009) adjust the population size based on the ratio of the rate of adaptive genetic substitutions relative to a background silent substitution rate. The overall theme in the GA and GP class of EAs is to adjust the population size so that the algorithm maintains a certain level of search capability, or genetic diversity, while keeping the population as small as possible to conserve the number of fitness function evaluations that are made on a generational basis.
The ES branch of EAs has not seen as much research in controlling the parental population parameter. Hansen et al. (1995) sought to size the offspring population with respect to the estimated local serial progress rate, while Jansen et al. (2005) varied the offspring population size of a -ES based on the success rate of an offspring replacing its parent. Their approach showed promising results in correctly tuning the population size to the complexity of the problem. Rowe and Sudholt (2012) found a threshold for the choice of the offspring population size which efficiently searches for unique optima of a -EA and demonstrated those results on the ONEMAX function. In an attempt to control the parental population parameter, Storch (2008) investigated the strengths and weaknesses of maintaining diversity through the population size on the problem of maximizing pseudo-Boolean functions using a -EA.
Since we attempt to control the size of the parental population in evolution strategies, our research is most closely related to Storch’s work. It is different from the work mentioned above in several ways. First, our optimization searches the real-valued parameter space $\mathbb{R}^N$. Next, we are interested in varying the parental population parameter in an ES that uses intermediate recombination, the $(\mu/\mu,\lambda)$-ES. Our basic assumption is that we can make a small change to a current fixed population size strategy, as dictated by the fitness values and mutation vectors of the offspring, in order to generate slightly better results. The remainder of the paper is organized into two sections. The variable population size techniques are formulated and explained in Section 2. The methods are tested and the aggregate results are shown in Section 3. The paper concludes with a summary and some ideas for future work.
2 Methods of Population Control
We derive two methods in this section, both using the fitness values and mutation vectors of each generation to dynamically adapt the parent population size. In Section 2.1, we show theoretically the usefulness of adapting the parental population size on a generational basis. By finding the set of mutation vectors which maximizes the quality gain estimate on a generational basis, we are able to use the fitness values and mutation vectors to determine the best parental population size for the sphere fitness function. In Section 2.2, a second strategy is heuristically derived for general fitness functions with varying mutation strength adaption policies.
2.1 Maximizing the Quality Gain on Simple Fitness Functions
The decomposition of the components of the progress vector and the quality gain calculation, shown in Equation (1) below, is derived again for completeness but comes from Arnold and Beyer (2002) and Arnold (2005). In Figure 1, the centroid in parameter space of the top $\mu$ offspring from the previous generation is $\langle x \rangle = \frac{1}{\mu}\sum_{k=1}^{\mu} x_{k;\lambda}$, where the points $x_{k;\lambda}$ are known as the parental population of the previous generation, and this centroid becomes the start point of the current generation. Mutation creates $\lambda$ offspring by adding $N$-dimensional normal vectors with mean 0 and variance $\sigma^2$, so that $y_i = \langle x \rangle + \sigma z_i$ with $z_i \sim \mathcal{N}(0, I)$. Evaluation is accomplished in this section using the sphere fitness function, which is the square of the Euclidean distance in parameter space from the point in question to the optimum, so $f(y) = \lVert y - \hat{y} \rVert^2$, where $\hat{y}$ is the optimum. Selection is accomplished by finding the top $\mu$ offspring (as ordered in the evaluation process), and recombination starts the process over again at the new centroid $\langle x \rangle + \sigma \langle z \rangle_{\mu}$. The progress vector is the step taken during each generation, which depends on the number of offspring selected to be parents ($\mu$) and is found using $\langle z \rangle_{\mu} = \frac{1}{\mu}\sum_{k=1}^{\mu} z_{k;\lambda}$, where $z_{k;\lambda}$ is the mutation vector associated with the $k$th best fitness. In Figure 1, let the distance from the previous generation to the optimal point be $R$ and let the current generation’s distance to the optimum be $r$.
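The generational cycle described above (mutation, evaluation, selection, intermediate recombination) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names `sphere` and `es_generation`, the starting point, and the value `sigma=0.1` are all hypothetical choices made for the example.

```python
import math
import random

def sphere(y, optimum):
    """Squared Euclidean distance from y to the optimum."""
    return sum((yi - oi) ** 2 for yi, oi in zip(y, optimum))

def es_generation(centroid, sigma, mu, lam, optimum, rng):
    """One generation with intermediate recombination.

    Returns the new centroid and the progress vector (mean of the
    mu selected mutation vectors).
    """
    n = len(centroid)
    # Mutation: lam standard-normal vectors, scaled by sigma when applied
    zs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
    offspring = [[c + sigma * z for c, z in zip(centroid, zv)] for zv in zs]
    # Evaluation and selection: rank offspring by fitness (smaller is better)
    order = sorted(range(lam), key=lambda i: sphere(offspring[i], optimum))
    selected = [zs[i] for i in order[:mu]]
    # Intermediate recombination: average the selected mutation vectors
    progress = [sum(z[j] for z in selected) / mu for j in range(n)]
    new_centroid = [c + sigma * p for c, p in zip(centroid, progress)]
    return new_centroid, progress

rng = random.Random(1)
optimum = [0.0] * 40                      # assumed optimum at the origin
x = [1.0] * 40                            # assumed starting centroid
R = math.sqrt(sphere(x, optimum))         # distance before the step
x_new, z_avg = es_generation(x, sigma=0.1, mu=6, lam=32, optimum=optimum, rng=rng)
r = math.sqrt(sphere(x_new, optimum))     # distance after the step
print(f"R = {R:.4f}, r = {r:.4f}")
```

Changing only the `mu` argument while keeping the same random seed reuses the same mutation vectors, which is the kind of like-for-like comparison the variable population idea relies on.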
With the information we obtain during the evolutionary process, it is impossible to calculate from Equation (2). Therefore, we seek an approximation to this measure that can be used to estimate this value for a particular population size. For the rest of this section, we intend to show the relationship between k-means clustering and the objective function in Equation (2). From Beyer (1996), we know that for large dimensional search spaces, roughly one-fourth of the offspring population should become parents in the next generation. For this reason, we begin our analysis by clustering the fitness values into four groups. Our intention is not to simply cluster the fitness values into four groups and set the population size equal to the cardinality of the best group, but to use the information gained by clustering the fitness values to find an optimal $\mu$ which maximizes the objective function in Equation (1).
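To make the clustering step concrete, the following sketch runs a plain one-dimensional k-means (Lloyd's algorithm) on a list of fitness values with four clusters. It only illustrates the clustering itself; how the clusters feed into the choice of the parental population size is the authors' derivation and is not reproduced here. The function name `kmeans_1d`, the quantile initialisation, and the sample fitness values are assumptions made for the example.

```python
def kmeans_1d(values, k=4, iters=100):
    """Lloyd's algorithm on scalars; returns cluster centers and per-value labels."""
    vs = sorted(values)
    # Initialise the centers at evenly spaced quantiles of the sorted data
    centers = [vs[(2 * j + 1) * len(vs) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical fitness values of ten offspring (smaller is better)
fitness = [9.0, 1.0, 5.1, 20.0, 1.2, 9.2, 5.0, 21.0, 1.1, 9.1]
centers, labels = kmeans_1d(fitness, k=4)
best_cluster = min(range(len(centers)), key=lambda j: centers[j])
best_size = labels.count(best_cluster)
print("cluster centers:", centers)
print("size of best cluster:", best_size)
```

As the text notes, the size of the best cluster is only a starting point and is not itself taken as the population size.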
2.2 Maximizing the Quality Gain on General Fitness Functions
Finding the estimated directional derivatives for the potential values of $\mu$ and choosing the parental population size which maximizes these estimates is our evolutionary version of the gradient. Since the mutation vectors are realizations from random vectors and the intermediate recombination of these mutation vectors restricts the possible directions, we know that it is highly unlikely that the direction chosen corresponds to the actual gradient. Instead, we define the direction which maximizes the estimated directional derivative with respect to the information gained during the evolutionary process as the evolutionary gradient.
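One way to picture the selection rule is sketched below. The estimator is an assumption chosen for illustration and is not claimed to be the authors' formula: for each candidate population size, the recombined direction is formed from the best mutation vectors, and the directional derivative is estimated as the least-squares slope of offspring fitness against displacement along that direction. The names `slope_along` and `choose_mu` and all numeric settings are hypothetical.

```python
import math
import random

def slope_along(direction, zs, fits, sigma):
    """Least-squares slope of fitness against displacement along a unit direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    u = [d / norm for d in direction]
    # Displacement of every offspring projected onto the direction
    t = [sigma * sum(zi * ui for zi, ui in zip(z, u)) for z in zs]
    t_bar = sum(t) / len(t)
    f_bar = sum(fits) / len(fits)
    cov = sum((ti - t_bar) * (fi - f_bar) for ti, fi in zip(t, fits))
    var = sum((ti - t_bar) ** 2 for ti in t)
    return cov / var

def choose_mu(zs, fits, sigma, mu_max):
    """Pick the mu whose recombined direction has the steepest estimated descent."""
    order = sorted(range(len(zs)), key=lambda i: fits[i])  # minimization
    running = [0.0] * len(zs[0])
    best_mu, best_rate = 1, -math.inf
    for mu in range(1, mu_max + 1):
        # Add the mu-th best mutation vector to the running sum
        running = [r + zj for r, zj in zip(running, zs[order[mu - 1]])]
        direction = [r / mu for r in running]
        rate = -slope_along(direction, zs, fits, sigma)  # positive means descent
        if rate > best_rate:
            best_mu, best_rate = mu, rate
    return best_mu, best_rate

# Demo on a sphere centred at the origin (assumed settings)
rng = random.Random(2)
n, lam, sigma = 10, 32, 0.3
x = [1.0] * n
zs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
fits = [sum((xi + sigma * zi) ** 2 for xi, zi in zip(x, z)) for z in zs]
best_mu, best_rate = choose_mu(zs, fits, sigma, mu_max=lam // 2)
print(best_mu, round(best_rate, 3))
```

Because the same offspring are used both to form the direction and to estimate the slope, the estimate is optimistic; the sketch is meant to show the shape of the computation, not to reproduce reported results.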
3 Experimental Verification of the Variable Population Size
The purpose of this section is to test the validity of our variable population size strategy against a fixed population ES in a noiseless environment under the following conditions: (1) sphere fitness function and normalized mutation strengths, (2) sphere fitness function and cumulative mutation strength adaption, and (3) general fitness functions and CMA-ES. In Section 3.1, we compare the fixed and variable population strategies on the sphere fitness function with normalized mutation strengths in order to take the mutation strength adaption mechanism out of the problem. This allows us to remove some of the randomness by comparing one step of the optimization using the same mutation vectors for both strategies. In Section 3.2, we compare the fixed and variable population strategies using the sphere fitness function and cumulative mutation strength adaption as described by Arnold (2002). In this experiment, we choose a starting mutation strength and let the adaption mechanism change its value. This allows observation of multiple generations of the evolution process to better understand whether the variable population strategy has an effect on the progress rate of a standard ES. In Section 3.3, we test the variable population size strategy in a more general setting with small offspring population sizes and a CMA-ES strategy as described by Hansen and Ostermeier (2001).
3.1 Sphere Function with Normalized Mutation Strength
In this section, we present results which verify the usefulness of a variable population size strategy. We hold as many of the evolutionary parameters constant as possible and use the same mutation vectors for a one-step comparison of the standard $(\mu/\mu,\lambda)$-ES versus the variable population strategy. This is done so that we are able to isolate the part of the algorithm that we propose to change. At this point, the only difference between the fixed population and variable population strategies for this one-step analysis is the number of mutation vectors that are used in intermediate recombination. The one step distance (OSD) experiment is conducted in MATLAB and is described in Algorithm 1.
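The structure of a one-step comparison on shared mutation vectors can be sketched as follows. This is not Algorithm 1: in place of the authors' variable population rule, the sketch uses a hindsight oracle that picks, for each trial, whichever population size lands closest to the optimum, which gives an upper bound on what any selection rule could gain from the same mutation vectors. The normalized mutation strength is taken as the usual sigma times N over R, and the function names, the 200 repetitions, and the seed are assumptions.

```python
import math
import random

def one_step_distances(n, lam, sigma_star, rng):
    """Distance to the optimum after one step for every mu = 1..lam,
    all computed from the same set of mutation vectors."""
    R = 1.0
    x = [R] + [0.0] * (n - 1)           # centroid at distance R from the origin
    sigma = sigma_star * R / n          # undo the normalization
    zs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
    # Rank mutation vectors by the sphere fitness of the offspring they create
    zs.sort(key=lambda z: sum((xi + sigma * zi) ** 2 for xi, zi in zip(x, z)))
    dists, acc = [], [0.0] * n
    for mu in range(1, lam + 1):
        acc = [a + zj for a, zj in zip(acc, zs[mu - 1])]
        new = [xi + sigma * a / mu for xi, a in zip(x, acc)]
        dists.append(math.sqrt(sum(v * v for v in new)))
    return dists                         # dists[mu - 1] belongs to population size mu

def compare(n=40, lam=32, fixed_mu=6, sigma_star=5.5, reps=200, seed=0):
    """Average one-step distance for a fixed mu and for the hindsight-best mu."""
    rng = random.Random(seed)
    fixed_total = oracle_total = 0.0
    for _ in range(reps):
        d = one_step_distances(n, lam, sigma_star, rng)
        fixed_total += d[fixed_mu - 1]
        oracle_total += min(d)
    return fixed_total / reps, oracle_total / reps

fixed_avg, oracle_avg = compare()
print(f"fixed mu=6: {fixed_avg:.4f}   hindsight-best mu: {oracle_avg:.4f}")
```

The oracle needs the true distance to the optimum, so it is not usable as a strategy; a practical rule has to approximate it from fitness values and mutation vectors alone.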
We chose to use a standard and let the parameter dimension be 40 in order to use similar settings to Arnold and Beyer (2002). Using Beyer’s progress rate equations, we find that is a good setting for a wide range of mutation strengths.
After 5,000 repetitions of the OSD procedure for a (6/6,32)-ES compared with the variable population strategy described in Section 2.1, Figure 4 shows the average difference in the distance to the optimum between a fixed population strategy and a variable population strategy. The line represents how much closer, on average, the variable population strategy progressed toward the optimum for the mutation strengths 0.5, 1, ..., 11 after one generational step.
Figure 4 demonstrates that for every normalized mutation strength between 0.5 and 11, the variable population size method (VP1 from Section 2.1) is better than the standard ES with a fixed population size of 6. This variable population strategy is used for the remaining experiments on the simple sphere function. The variable population size strategy from Section 2.2 (VP2) does not perform as well as the variable population size strategy from Section 2.1. This is because VP1 is designed specifically for the sphere fitness function, whereas VP2 is designed for general fitness functions. In order to get an idea of what the variable population strategy (VP1) accomplishes with respect to its normalized mutation strength, we provide a histogram of the population size distribution for small, close to optimal, and large normalized mutation strengths in Figure 5. This graph shows the number of times (frequency) that VP1 chooses a particular population size over the 5,000 replications, for each normalized mutation strength that is given. Figure 6 shows the mean of the population size distributions in Figure 5 as well as all of the mutation strengths considered in the experiment.
The smaller mutation strengths result in smaller average population sizes and the larger mutation strengths result in larger average population sizes, consistent with Beyer’s theoretical results. The average population size in Figure 6 over the 5,000 one step computations coincides with the optimal values from Beyer’s equations. For example, we find that for a mutation strength of 5.5, the best population size is 6. The average population size in Figure 6 for a mutation strength of 5.5 is approximately 6. The difference is that our average population size is not constrained to an integer, and it can vary depending on the mutation vectors and fitness values.
Instead of comparing one fixed population strategy against a variable population strategy, we use Table 1 to compare all of the viable fixed population strategies against a variable one for the normalized mutation strengths between 0.5 and 11. The column listed as VP is the average one step distance to optimum for the VP strategy, the column listed as VP pop size is the average population size that the strategy chose, and the columns labeled 2, are the average one step distance to optimum for a fixed ES strategy of that parental population size and said mutation strengths. The table also highlights the best fixed population strategy for each mutation strength noting that the VP strategy is closer to the optimum than the best fixed population strategy at every normalized mutation strength. The VP strategy is robust with respect to the mutation strength selected, since there are three mutation strengths that seem to do equally well compared with most fixed population sizes where only one or two mutation strengths are close to optimal.
Using Mathematica, we find that and are good settings for a search space dimension of 40. Table 1 also confirms that these parameter settings are good. Using the equation, we find that . We compare a variable population strategy to the optimally chosen fixed population ES. After 50,000 one step comparisons, we get with a standard error of 0.007187, compared to with a standard error of 0.007165. We use these results for two purposes. First, we verify that the fixed population progress rate is approximately equal to Beyer’s formula. Since we use numerical integration for the constant, , we consider these two approximations close. Second, we compare the variable population approach to the fixed population and find that there is an approximate 15% increase in the progress rate over one of the best fixed population strategies. In the next section, we test on a more realistic mutation strength adaption strategy over a number of generations while utilizing the same mutation vectors.
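The kind of calculation referred to here can be sketched numerically. The sketch below assumes the well-known large-dimension approximation of the normalized progress rate on the sphere, namely the progress coefficient times the normalized mutation strength minus the squared normalized mutation strength divided by twice the parental population size; the finite-dimension formula used in the paper may differ. The progress coefficient is computed as the mean of the expected values of the largest order statistics of standard normal samples, using trapezoidal integration. All function names are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_order_stat(k, lam, steps=2000, lo=-8.0, hi=8.0):
    """Expected value of the k-th largest of lam standard normals (trapezoid rule)."""
    coef = lam * math.comb(lam - 1, k - 1)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        cdf = normal_cdf(z)
        val = z * normal_pdf(z) * cdf ** (lam - k) * (1.0 - cdf) ** (k - 1)
        total += val * (0.5 if i in (0, steps) else 1.0)
    return coef * total * h

def progress_coefficients(lam):
    """Progress coefficient for mu = 1..lam as running means of the top order statistics."""
    coeffs, running = [], 0.0
    for k in range(1, lam + 1):
        running += expected_order_stat(k, lam)
        coeffs.append(running / k)
    return coeffs

def progress_rate(sigma_star, mu, coeffs):
    """Large-dimension approximation of the normalized progress rate (assumed form)."""
    return coeffs[mu - 1] * sigma_star - sigma_star ** 2 / (2.0 * mu)

lam, sigma_star = 32, 5.5
coeffs = progress_coefficients(lam)
best_mu = max(range(1, lam), key=lambda mu: progress_rate(sigma_star, mu, coeffs))
print("c for mu=1:", round(coeffs[0], 4))
print("best mu under the approximation:", best_mu)
```

Under this approximation, neighbouring population sizes often give nearly identical progress rates, so the reported best value should be read as one of several near-optimal settings.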
3.2 Sphere Fitness Function with Cumulative Mutation Strength Adaption
In this section, we take the next step in our investigation of the variable population strategy by testing to see whether the strategy will increase the progress rate over the course of a number of generations. Instead of relying on a normalized mutation strength which is based on the actual distance from the optimum, we now test the variable population strategy using a cumulative mutation strength adaption (CMSA) as introduced by Hansen and Ostermeier and described by Arnold (2002). This will allow us to observe multiple steps of the evolutionary path in order to determine whether the strategy is capable of progressing toward the optimum of the sphere fitness function faster than a fixed population ES.
The difference between this experiment and the OSD procedure is how we calculate the mutation strength and the number of generations that we let it run. In this test, we use an initial mutation strength and change it based on the selected mutation vectors chosen to move forward in the next generation. The basic idea, as described by Arnold, is to increase the mutation strength if consecutive steps are parallel and to decrease it if consecutive steps are anti-parallel. Since both strategies are converging to the optimum, we look at the first 20 generations, where the most significant gains are made. Algorithm 2, the generation (GEN) algorithm, is used to test the variable population strategy in multiple generations with a standard mutation strength adaption technique.
Despite using the same mutation vectors, Algorithm 2 allows for two separate evolutionary paths for the fixed and variable population ES. The CMSA process has = the accumulated progress vector and is used in this calculation with and = initial mutation strength for both the fixed and variable population strategies. The constant c determines the influence of the history much like the exponential smoothing constant, while D is a damping constant. The constants c and D are set to and , respectively, according to Hansen and Ostermeier (1996) and Arnold (2002).
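A cumulative mutation strength update of this kind can be sketched as below. The update form and the default constants (c equal to one over the square root of N, and D equal to the square root of N) follow a commonly used formulation and are assumptions here, since the exact values are not legible in this section. The function name `csa_update` and the demo inputs are hypothetical.

```python
import math

def csa_update(s, sigma, z_avg, mu, n, c=None, damping=None):
    """One cumulative step-size adaptation update.

    s      : accumulated progress vector
    sigma  : current mutation strength
    z_avg  : progress vector (mean of the selected mutation vectors)
    c, damping : cumulation and damping constants (assumed defaults)
    """
    c = 1.0 / math.sqrt(n) if c is None else c
    damping = math.sqrt(n) if damping is None else damping
    # Exponentially fading record of past progress vectors
    scale = math.sqrt(mu * c * (2.0 - c))
    s_new = [(1.0 - c) * si + scale * zi for si, zi in zip(s, z_avg)]
    # Long accumulated path (parallel steps) raises sigma; a short one lowers it
    norm_sq = sum(v * v for v in s_new)
    sigma_new = sigma * math.exp((norm_sq - n) / (2.0 * damping * n))
    return s_new, sigma_new

n, mu = 4, 4
# Repeated identical steps: the path grows and sigma increases
s, sigma = [0.0] * n, 1.0
history = [sigma]
for _ in range(5):
    s, sigma = csa_update(s, sigma, [2.0, 0.0, 0.0, 0.0], mu, n)
    history.append(sigma)
# No progress at all: the path stays short and sigma decreases
_, sigma_shrunk = csa_update([0.0] * n, 1.0, [0.0] * n, mu, n)
print(history, sigma_shrunk)
```

Because the fixed and variable population strategies select different numbers of vectors, each keeps its own accumulated path and mutation strength even when both draw on the same mutation vectors.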
Since selecting the wrong mutation strength can be detrimental in the analysis of evolution strategies, we determine the initial mutation strength that has good progress toward the optimum point. This is done by averaging the distance from the optimum of 100 replications of a 20 generation run for various starting mutation strengths. Algorithm GEN is used except that in this case we use a fixed starting point of  and the fixed strategy of (6/6,32)-ES is used since it was one of the best in the normalized mutation strength case. The results are summarized in Figure 7. The error bars in Figure 7 represent the standard error associated with 1,000 twenty-generation replications.
Figures 8, 9, and 10 show that for varying initial mutation strengths, the variable population size strategies perform better on the sphere fitness function with a cumulative mutation strength adaption for every generation of the evolutionary cycle. The relative distance reaches a steady state due to the fact that both strategies are converging after a number of generations. The steady state value that it converges to is influenced by the initial mutation strength. Figure 7 shows a much greater difference in the convergence for a mutation strength of 30 than it does for 100. Therefore, we expect the steady state relative distance to be larger for a starting mutation strength of 30, which it is.
A final test on the noiseless sphere with cumulative mutation strength adaption is to ensure that the best fixed population size does not perform better than the variable population strategy for the experimental conditions (, N = 40, CMSA, noiseless sphere fitness function, ). We search for the best possible parental population size by running Algorithm 2 on a fixed population ES from to .
Figure 11 is a graph of the average distance to the optimum after 20 generations of 100 runs each with different population sizes, comparing the results to the variable population strategy. This experiment is conducted to ensure that the results are better than any fixed parental population size with the given experimental conditions. For the conditions used, a parental population size of 4 is the best fixed strategy for the first 20 generations, but it is still not better than the variable population method with an initial mutation strength of 60. Figure 12 is a histogram of the parental population distribution associated with the variable population strategy in Figure 11. This histogram has 16,000 separate population sizes and shows that the sizes were spread out evenly over many different possibilities. The frequency referred to in Figure 12 is the relative frequency that a particular population size is chosen by the strategy. This shows that the population sizes that the variable strategy chooses are not concentrated on a narrow subset of numbers, but have a large range of possibilities, implying the importance of choosing the size correctly.
3.3 General Fitness Function with CMA-ES
The fitness functions used in this section are more general than the sphere. For these general fitness functions, Equation (3) from Section 2.1 can no longer be used as an approximation for the quality gain of an individual mutation vector, which was a key assumption in the derivation of Equation (10). Therefore, the variable population strategy used in the comparisons that follow is the evolutionary gradient version of Section 2.2. The fitness functions which are used to test the variable population strategy are listed in Table 2.
| Name | Function | Initial point | |
The mutation vector adaption policy used for these general fitness functions is the CMA-ES. The starting mutation strength and starting point are from the MATLAB code in Hansen and Ostermeier (2001) for the cigar fitness function, the MATLAB code in Hansen (2011) for the ellipsoid fitness function, and Table 1 in Hansen and Kern (2004) for the Rastrigin fitness function.
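For orientation, the benchmark functions named in this section are commonly defined as in the sketch below. These are the textbook forms and are assumptions here: the exact scaling constants and initial points used by the authors are given in their Table 2, which is not legible in this section.

```python
import math

def cigar(x):
    """Cigar: one easy axis, all others scaled by 1e6 (assumed conditioning)."""
    return x[0] ** 2 + 1e6 * sum(v * v for v in x[1:])

def ellipsoid(x):
    """Axis-parallel ellipsoid with condition number 1e6 (assumed); needs len(x) >= 2."""
    n = len(x)
    return sum(10 ** (6 * i / (n - 1)) * v * v for i, v in enumerate(x))

def rosenbrock(x):
    """Rosenbrock function, minimum 0 at the all-ones vector."""
    return sum(100 * (x[i] ** 2 - x[i + 1]) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    """Rastrigin function, multimodal, minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

zero, ones = [0.0] * 5, [1.0] * 5
print(cigar(zero), ellipsoid(zero), rosenbrock(ones), rastrigin(zero))
```

The first three are unimodal in low dimensions and test how well the covariance adaption handles ill-conditioning and curved valleys, while Rastrigin adds many regularly spaced local optima.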
| N | | | w_i |
Figure 13 shows that the variable population strategy performs well over the first 100 generations. Next, we determine the number of function evaluations required to reach a fitness value less than for each strategy in low dimensional search spaces.
Table 4 shows that the results of the first 100 generations carry forward to longer running comparisons of the two strategies. On average, the variable population strategy takes fewer function evaluations to reach its goal than the fixed population version for all three tested search space dimensionalities.
| | (,)-CMA-ES | | (VP,)-CMA-ES | |
| N | Average | SE | Average | SE |
For the ellipsoid and Rosenbrock fitness functions, we produce the same figure except we use 200 generations instead of 100. Figures 14 and 15 represent an average of 1,000 iterations of 200 generations, graphing both the absolute and relative difference between each strategy. Again, we determine the number of function evaluations that are necessary for the fixed and variable population CMA-ES to reach a fitness level below . The three dimensions tested are listed in Tables 5 and 6. For the Rosenbrock function, not all of the runs reached a fitness level of in 1,000 generations, so Table 6 also gives the number of times out of 1,000 that this occurred.
| | (,)-CMA-ES | | (VP,)-CMA-ES | |
| N | Average | SE | Average | SE |
The last fitness function that we test is a multimodal problem which we do not expect to converge unless we use the rank-$\mu$ update version of CMA-ES and a restart strategy with an increasing offspring population similar to the one in Hansen and Kern (2004). Our intent is to show an improvement of a variable population strategy over a fixed population strategy and not necessarily its convergence to the optimum. Therefore, we show the actual fitness values for the fixed and variable population CMA-ES for weighted and intermediate recombination. Figure 16 shows that the variable population CMA-ES strategy with intermediate recombination performs better in the short term on the Rastrigin fitness function than the others. We do not say anything about the number of function evaluations needed to reach a certain level, since there is an extremely small chance that this strategy converges with such a small offspring population size.
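The difference between weighted and intermediate recombination comes down to the weights applied to the selected vectors. The sketch below assumes logarithmically decreasing weights proportional to ln(mu + 1) - ln(i), a form commonly used with CMA-ES; whether the authors use exactly this form is not stated in this section. The function names are hypothetical.

```python
import math

def log_weights(mu):
    """Normalized, logarithmically decreasing recombination weights (assumed form)."""
    raw = [math.log(mu + 1) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def intermediate_weights(mu):
    """Equal weights: every selected vector counts the same."""
    return [1.0 / mu] * mu

def recombine(vectors, weights):
    """Weighted average of the selected vectors, best first."""
    n = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(n)]

w_log = log_weights(6)
w_int = intermediate_weights(6)
midpoint = recombine([[1.0, 0.0], [3.0, 0.0]], intermediate_weights(2))
print([round(w, 3) for w in w_log], w_int, midpoint)
```

With a variable population size, the weight vector has to be recomputed whenever the number of selected parents changes, since its length equals that number.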
The main purpose of this paper was to demonstrate the benefit of adapting the parent population size each generation based on the sampled fitness values, and two adaptation mechanisms were proposed. It has been shown that adapting the parent population size increases the progress compared to even the best fixed parent population size on the sphere function, both with normalized mutation strengths and with a CMSA adaption mechanism. Additionally, tests on general unimodal and multimodal functions using a CMA-ES with weighted and intermediate recombination have shown improved fitness values and convergence over the fixed population counterpart. The conclusion is that, compared to a fixed population ES, the variable population ES is able to make modest improvements in fitness gain over several generations.
Future work in this area includes expanding the deterministic settings to determine how increasing the dimensionality of the search space and changing the number of offspring affect the results. More research is needed on a hybrid method which utilizes the strengths of both algorithms depending on the objective function, and on determining whether a variable population strategy can be used to optimize stochastic fitness functions.
This work has been supported in part by the National Science Foundation under Award CMMI-1233376, the Department of Energy under Award DE-SC0002223, the NIH under Grant 1R21DK088368-01, and the National Science Council of Taiwan under Award NSC-100-2218-E-002-027-MY3.