## Abstract

During the recent decades, many niching methods have been proposed and empirically verified on some available test problems. They often rely on some particular assumptions associated with the distribution, shape, and size of the basins, which can seldom be made in practical optimization problems. This study utilizes several existing concepts and techniques, such as taboo points, normalized Mahalanobis distance, and the Ursem’s hill-valley function in order to develop a new tool for multimodal optimization, which does not make any of these assumptions. In the proposed method, several subpopulations explore the search space in parallel. Offspring of a subpopulation are forced to maintain a sufficient distance to the center of fitter subpopulations and the previously identified basins, which are marked as taboo points. The taboo points repel the subpopulation to prevent convergence to the same basin. A strategy to update the repelling power of the taboo points is proposed to address the challenge of basins of dissimilar size. The local shape of a basin is also approximated by the distribution of the subpopulation members converging to that basin. The proposed niching strategy is incorporated into the covariance matrix self-adaptation evolution strategy (CMSA-ES), a potent global optimization method. The resultant method, called the covariance matrix self-adaptation with repelling subpopulations (RS-CMSA), is assessed and compared to several state-of-the-art niching methods on a standard test suite for multimodal optimization. An organized procedure for parameter setting is followed which assumes a rough estimation of the desired/expected number of minima available. Performance sensitivity to the accuracy of this estimation is also studied by introducing the concept of robust mean peak ratio. Based on the numerical results using the available and the introduced performance measures, RS-CMSA emerges as the most successful method when robustness and efficiency are considered at the same time.

## 1 Introduction

In recent decades, optimization of practical problems by using meta-heuristic methods has gained much interest (Rechenberg, 2000; Coello and Lamont, 2004), where the problem landscape may exhibit challenging features such as multimodality, discontinuity, ill-condition, and correlation (Hansen et al., 2009). On the other hand, practical considerations have evolved the flexibility of optimization methods in order to solve problems in which more than one objective is pursued (multi-objective optimization) (Deb, 2001), uncertainties exist (robust optimization) (Beyer and Sendhoff, 2007), or distinct good solutions are desired (multimodal optimization) (Das et al., 2011). Finding a set of good solutions, instead of a single one, may provide a variety of distinct and reasonable alternatives for the decision maker. This allows him/her to consider some other factors that were possibly overlooked during the mathematical modeling of the problem, to select the final solution. Moreover, finding all the high-fitness optima might be inherently critical. A typical example is finding all the resonance frequencies that lead to high vibration amplitude (Das et al., 2011). Even when a single minimum is sought, niching can still be utilized to prevent premature convergence. Niching is also an inevitable part of multi-objective optimization methods, without which diversity of the nondominated set significantly declines and a good representation of the Pareto front cannot be reached (Deb, 2001).

Multimodal optimization is usually achieved by using a diversity preservation strategy, called *niching*, incorporated to a global optimization method, which we call the *core algorithm*, to enable parallel convergence to different minima. Early niching methods were originally proposed for genetic algorithms (GAs), including crowding (De Jong, 1975) and fitness sharing (Goldberg and Richardson, 1987). Several subsequent studies analyzed and developed the niching methods in realm of GAs (Mahfoud, 1995; Sareni and Krahenbuhl, 1998; Mengshoel and Goldberg, 2008). Similar or new niching strategies were also incorporated to other metaheuristics such as particle swarm optimization (PSO) (Qu et al., 2013; Schoeman and Engelbrecht, 2010) and differential evolution (DE) (Basak et al., 2013; Biswas et al., 2014).

Regardless of the core algorithm, niching strategies can be classified into two groups: radius-based and non-radius-based (Stoean et al., 2010).

### 1.1 Radius-Based Methods

Radius-based niching methods are those that rely on a distance threshold, generally referred to as *niche radius*, to check whether two individuals share the same niche. The oldest technique in this group is fitness sharing, which reduces the fitness of similar individuals that share a niche. A similar idea is followed in some other techniques including clearing (Pétrowski, 1996), in which inferior individuals within the niche distance are eliminated, and also clustering. A comprehensive review of classical niching methods for continuous parameter optimization can be found in the work of Singh and Deb (2006), and a review paper by Das et al. (2011).

Beasley et al. (1993) used a sequential niching technique according to which the fitness function in the region of already found solutions is deteriorated so that the subsequent restarts avoid such regions. The technique relies on definition of a threshold distance, within which the fitness function is degraded. Species conserving genetic algorithm (SCGA) (Li et al., 2002) iteratively identifies sufficiently distant fittest individuals as seeds and allocates close individuals, with respect to a threshold distance, to those seeds. A similar procedure is followed in niching with evolution strategies (Shir and Bäck, 2009), in which classification into niches is performed iteratively over the whole population, with respect to a preset niche radius parameter.

Multimodal optimization by artificial immune networks (AIN) (de Castro and Timmis, 2002) resembles a nested evolution strategy, where evolution is performed within the subpopulation and the elite of each subpopulation survives to the next generation. The niching strategy employs the Euclidean distance between the elites as affinity of the subpopulations and may suppress inferior ones if their affinity is less than the suppression threshold. New subpopulations are randomly generated to replace the suppressed subpopulations. The user should set the value of the suppression threshold (de Castro and Timmis, 2002), or the rate at which the new subpopulations are introduced (Freschi and Repetto, 2006). The latter case can be interpreted as a radius-free niching method; nevertheless, empirical results from AIN and comparison with other multimodal optimization methods are scarce.

In the grenade explosion method (GEM) (Ahrari et al., 2009; Ahrari and Atai, 2010), several subpopulations explore the search space at the same time. Selection and recombination are rendered locally (within the subpopulations). Interaction among the subpopulations is limited to the sampling step, which forces individuals of a subpopulation to lie sufficiently far from the center of superior subpopulations. The required distance is gradually decreased to let the subpopulations converge to the basins that are close to each other. The main problem of the method is the high amount of tuning effort.

### 1.2 Radius-Free Methods

Radius-free niching methods, in contrast, do not depend upon definition of the threshold distance, and hence, can be considered more robust and desirable than radius-based methods. The oldest method in this group is crowding, in which a descendant replaces the most similar parent. Mengshoel and Goldberg (2008) performed a comprehensive theoretical study on deterministic and stochastic crowding in genetic algorithms. A similar idea is followed in restricted tournament selection, in which the descendant competes with the closest among the *k* randomly selected individuals (Das et al., 2011). A radius-free variant of fitness sharing with an asymmetric sharing function was proposed in another study (van der Goes et al., 2008). This strategy assigns a niche radius to each individual which undergoes self-adaptation.

Stoean et al. (2007) improved SCGA (Li et al., 2002) by proposing topological species conservation (TSC), which checks whether two individuals share the same basin. This task is rendered by calling a function named *DetectMultimodal*, the idea of which was proposed by Ursem (1999). This function generates and evaluates a number of samples along the line segment connecting the individuals. If a sample has a lower fitness than both individuals, then it is concluded that those individuals belong to different niches; otherwise, they are assumed to share the same niche. Although the dependency on the user-tuned niche radius parameter was eliminated, the iterative calls of the mentioned function appeared to be costly. Additionally, it seems that the performance of the proposed algorithm depends on well-tuning of some parameters including mutation and crossover rates, since these parameters were tuned for each problem independently. The same authors revised the method later (Stoean et al., 2010), but the challenge of performance sensitivity to the control parameters remained unsolved.

Li (2010) proposed different variants of particle swarm optimization (PSO) with ring topology, which restricts the interaction to the neighbor particles. In distance-based locally informed particle swarm (LIPS) (Qu et al., 2013), particles are relocated based on information from their nearest neighbors (measured in terms of the Euclidean distance). This method requires only the population size to be tuned and was demonstrated to surpass nine other niching methods, including lbest-PSO, proposed by Li (2010). In higher dimensions, only LIPS and the niching covariance matrix adaptation (NCMA) (Shir et al. 2010) could detect a reasonable fraction of the desired optima.

A quite similar niching technique was pursued by Qu et al. (2012) to perform local mutation in neighborhood-based species-based differential evolution (NSDE), which demonstrated promising results when compared to earlier niching methods. Biswas et al. (2014) proposed parent-centric normalized mutation with proximity-based crowding differential evolution (PNPCDE), which probabilistically selects parents based on their similarities, measured in terms of their Euclidean distance. Their comparison with different variants of PSO and DE demonstrated a considerable improvement.

Some recent studies perform multimodal optimization by defining another objective. In shifting balance genetic algorithm (Wineberg and Chen, 2004), several small subpopulations (colonies) search the space around a larger subpopulation (core). The second objective ensures that the colonies remain sufficiently far from the core, and thus searching new regions. The method was proposed merely to boost diversity, and multimodal optimization was not pursued. Bi-objective multipopulation genetic algorithm (BMPGA) (Yao et al., 2010) defines minimization of the norm of the gradient vector as the second objective. The algorithm requires derivatives of the objective function, calculation of which might be more costly than the objective function itself, even if it is mathematically defined. Performance of the method depends on a user-tuned problem-dependent convergence threshold parameter. Deb and Saha (2012) developed a bi-objective GA for multimodal optimization in which the second objective was minimization of the derivatives of the original function. A heuristic to avoid analytical calculation of the derivatives of the objective function was employed and the method was validated on problems having up to 10 variables; however, in the employed test problems, distribution of the global minima were quite uniform and their attraction regions were roughly circular, which might make them relatively easy problems. The second objective in multimodal optimization with bi-objective DE (MOBiDE) (Basak et al., 2013) is maximization of the average distance to other solutions in the search space. The method requires only the population size to be tuned by the user and the recommended values for other parameters were demonstrated to work well on a comparatively large test suite which included some hard composite test problems. Bandaru and Deb (2013) applied a parameter-free bi-objective GA for multimodal optimization, in which the second objective was increasing diversity of the individuals. Despite its robustness, it ranked seventh among the 15 methods participating in the CEC2013 competition on multimodal optimization (Li et al., 2013a).

Covariance matrix adaptation with adaptive niching (NCMA) (Shir et al., 2010) was shown to be successful in finding high-fitness local minima in addition to the global minimum. The niche radius was coupled to the mutation strength and adjusted such that every niche consists of ten individuals. The Mahalanobis distance metric, a scaled version of the Euclidean distance metric, was employed to identify the limit of a niche, which allows for an elliptic instead of spherical estimation of the niche shape; however, using a fixed and rather small number of individuals per desired niche can be a drawback when there are many nondesirable local minima, since CMA-ES needs a large population size to find the global minimum in such landscapes (Ahrari and Shariat-Panahi, 2015). The absence of recombination, one of the main operators in ESs, and having only one parent per niche, in comparison with the recommended setting in CMA-ES (Hansen and Ostermeier, 2001; Beyer and Sendhoff, 2008), can drastically deteriorate the global search capability of the core algorithm.

Preuss (2010) incorporated the concept of nearest-better clustering (NBC) to CMA-ES. The resulting algorithm (NBC-CMA) identifies basins by finding the nearest better solution and cutting the longest edges to separate them into different clusters. NBC-CMA and its later variant, called NEA2 (Preuss, 2012), were demonstrated to be successful when tested on problems of higher dimensions; however, in the employed experimental setup, the goal was to find a single global minimum. NEA2 also ranked first among the niching methods participating in CEC2013 competition on multimodal optimization (Li et al., 2013a). Particularly, for composite functions in higher dimensions, it outperformed the closest competitors by a clear margin.

Fieldsend (2014) proposed the niching migratory multiswarm optimizer (NMMSO), in which multiple swarms search the space. The goal of NMMSO is that each swarm searches a distinct basin. NMMSO allows a particle in vicinity of another basin to migrate from the parent swarm and form a new swarm. NMMSO can merge two swarms if they are searching the same basin as well. NMMSO is dynamic in the number of swarms and relatively outperformed NEA2 in the CEC2015 competition on multimodal optimization (Li et al., 2015).

### 1.3 Critiques on Multimodal Optimization

The main advantage of EAs over deterministic methods is their robustness with respect to the assumptions on the fitness landscape of the objective function. Similarly, in multimodal optimization, algorithms with minimal assumptions on properties of the fitness landscape are highly desirable; hence, radius-free niching methods show a significant advantage over those that depend on a user-tuned niche radius parameter. In most radius-based techniques with fixed niche radius, the default value of the niche radius is set based on the recommendation of Deb and Goldberg (1989), which assumes the basins are almost uniformly distributed in the entire search space. In general, even fine-tuning of the niche radius might be of little use, especially when distribution of minima is not uniform or basins are of dissimilar shape and size. Other control parameters such as window size in restricted tournament or crowding factor in crowding (Das et al., 2011) do not make such explicit assumptions of the fitness landscape.

Regardless of the underlying idea, niching methods exploit some distance measures which determine whether two solutions share the same basin. Quite often, the Euclidean distance is measured to group individuals to different niches, according to which closer individuals are more likely to share the same basins, or equivalently, farther individuals belong to different niches. This strategy implicitly presumes that niches are roughly spherical; otherwise it can be misleading. Figure 1 illustrates this challenge for a typical case, where basins have a moderate condition number of 100. Considering ill-condition problems are one of the main challenges in real-parameter optimization (Hansen et al., 2009), exploitation of the Euclidean distance metric may considerably deteriorate reliability of the results. As an alternative, some studies utilized the Mahalanobis distance metric (Shir and Bäck, 2009; Shir et al., 2010), which may handle this problem to a great extent, even though in general, basins might have any arbitrary shape.

Another shortcoming is the lack of well-developed comprehensive test suites for analyzing pros and cons of different niching methods. Most commonly used niching benchmark functions are quite primitive. The most critical point is the low dimension of the search space, probably because these functions were originally defined for 1D or 2D space. Some popular examples are Branin, Himmelblau, and Six Hump functions, which have frequently been used to validate niching methods (Ahrari and Atai, 2010; Wang et al., 2012; Schoeman and Engelbrecht, 2010; Liu et al., 2012; Liang and Leung, 2011; Roy et al., 2013; Li et al., 2012; Epitropakis et al., 2011). A few of these functions are scalable, such as the Shubert or the Vincent function; however, the number of global minima exponentially grows with respect to the problem dimension, which practically hinders employing dimensions higher than three. In such low dimensions, the search space can be exhaustively searched, and hence results from such evaluations can hardly be generalized to higher dimensions, which are of significant practical interest. Considering multimodal optimization as a generalization of global optimization, niching test problems should reflect a variety of difficulties that are encountered in global optimization, including correlation and ill-condition (Hansen et al., 2009), in addition to those peculiar to multimodal optimization such as nonuniform distribution of minima.

In an attempt to introduce more challenging benchmarks for multimodal optimization, Qu and Suganthan (2010) introduced a procedure to pose functions with several global minima using the weighted sum of some basic functions, in which the weights are determined based on the distance to the global minimum of the basic functions. This method enables formation of basins with different shape and size while the number of global minima is equal to the number of the basic functions. The drawback is the high computation cost of a function evaluation if many global minima are desired, since each basic function should be called independently. This method was also employed to compose some scalable test problems for the CEC2013 special session on multimodal optimization (Li et al., 2013b).

Although finding high-fitness local minima might be practically interesting, the criteria specifying whether a local minimum is desirable should be provided for the method; for example, asking the method to find the best minima, or local optima within a specific tolerance of the global optimum value. Nevertheless, most available niching methods do not show such flexibility and numerical results commonly aim at finding only global minima.

### 1.4 Contribution of This Study

This study develops a novel niching method for multimodal optimization, called covariance matrix self-adaptation with repelling subpopulations (RS-CMSA). RS-CMSA adapts and reformulates multiple existing concepts from different methods, as summarized in Table 1. In comparison with the existing methods for multimodal optimization, RS-CMSA can learn the relative size and possibly the shape of the basins, and properly handle the challenge of noncircular basins, as depicted in Figure 1. Limitations of RS-CMSA are only those imposed by the core algorithm, such as continuity of the design parameters. No assumption on the distribution of global minima, the shape, or the size of the basins is required. RS-CMSA has no niche radius parameters and for all other parameters, the default values are shown to be robust provided that a rough estimate for the expected/desired number of minima is given.

Concept . | Available form . | This study . |
---|---|---|

Taboo regions (Glover, 1986) (Siarry and Berthiau, 1997) (Ahrari et al., 2009) | The size of the taboo regions is fixed or decreases dynamically. | Taboo regions are adaptively resized and reshaped. |

Taboo regions are similar for all solutions. | Taboo regions differ for different subpopulations. | |

The size of the taboo regions is independent of the basin size. | The size of the taboo region is adapted to the basin size. | |

Taboo regions are the recently visited regions. | Taboo regions are the previously identified basins and the center of fitter subpopulations. | |

All taboo regions are checked. | Only critical taboo regions are checked. | |

Core algorithm (CMSA-ES) (Beyer and Sendhoff, 2008) | Comma selection is employed. | Elite selection is employed. |

The global step size is adapted by simple averaging. | A bias compensation is applied in adaptation of the global step size. | |

Restarts (Auger and Hansen, 2005) | Independent restarts are performed and population size increases. | Outcome of a restart affects the subsequent restarts. (Similarly employed) |

Hill-valley function (Ursem, 1999) | Checks whether two solutions share the same basin. | (Similarly employed) |

Semi-random initialization (Ahrari et al., 2009) | Initialization maximizes the initial distance to other subpopulations. | Initialization maximizes the initial distance to other subpopulations and the archived points. |

Concept . | Available form . | This study . |
---|---|---|

Taboo regions (Glover, 1986) (Siarry and Berthiau, 1997) (Ahrari et al., 2009) | The size of the taboo regions is fixed or decreases dynamically. | Taboo regions are adaptively resized and reshaped. |

Taboo regions are similar for all solutions. | Taboo regions differ for different subpopulations. | |

The size of the taboo regions is independent of the basin size. | The size of the taboo region is adapted to the basin size. | |

Taboo regions are the recently visited regions. | Taboo regions are the previously identified basins and the center of fitter subpopulations. | |

All taboo regions are checked. | Only critical taboo regions are checked. | |

Core algorithm (CMSA-ES) (Beyer and Sendhoff, 2008) | Comma selection is employed. | Elite selection is employed. |

The global step size is adapted by simple averaging. | A bias compensation is applied in adaptation of the global step size. | |

Restarts (Auger and Hansen, 2005) | Independent restarts are performed and population size increases. | Outcome of a restart affects the subsequent restarts. (Similarly employed) |

Hill-valley function (Ursem, 1999) | Checks whether two solutions share the same basin. | (Similarly employed) |

Semi-random initialization (Ahrari et al., 2009) | Initialization maximizes the initial distance to other subpopulations. | Initialization maximizes the initial distance to other subpopulations and the archived points. |

PID . | Function . | Dimension . | Niche radius . | . | MaxEvals . |
---|---|---|---|---|---|

1 | Five-Uneven-Peak Trap | 1 | 0.01 | 2 | 50,000 |

2 | Equal Maxima | 1 | 0.01 | 5 | 50,000 |

3 | Uneven Decreasing Maxima | 1 | 0.01 | 1 | 50,000 |

4 | Himmelblau | 2 | 0.01 | 4 | 50,000 |

5 | Six-Hump Camel Back | 2 | 0.5 | 2 | 50,000 |

6 | Shubert | 2 | 0.5 | 18 | 200,000 |

7 | Vincent | 2 | 0.2 | 36 | 200,000 |

8 | Shubert | 3 | 0.5 | 81 | 400,000 |

9 | Vincent | 3 | 0.2 | 216 | 400,000 |

10 | Modified Rastrigin | 2 | 0.01 | 12 | 200,000 |

11 | Composition Function 1 | 2 | 0.01 | 6 | 200,000 |

12 | Composition Function 2 | 2 | 0.01 | 8 | 200,000 |

13 | Composition Function 3 | 2 | 0.01 | 6 | 200,000 |

14 | Composition Function 3 | 3 | 0.01 | 6 | 400,000 |

15 | Composition Function 4 | 3 | 0.01 | 8 | 400,000 |

16 | Composition Function 3 | 5 | 0.01 | 6 | 400,000 |

17 | Composition Function 4 | 5 | 0.01 | 8 | 400,000 |

18 | Composition Function 3 | 10 | 0.01 | 6 | 400,000 |

19 | Composition Function 4 | 10 | 0.01 | 8 | 400,000 |

20 | Composition Function 4 | 20 | 0.01 | 8 | 400,000 |

PID . | Function . | Dimension . | Niche radius . | . | MaxEvals . |
---|---|---|---|---|---|

1 | Five-Uneven-Peak Trap | 1 | 0.01 | 2 | 50,000 |

2 | Equal Maxima | 1 | 0.01 | 5 | 50,000 |

3 | Uneven Decreasing Maxima | 1 | 0.01 | 1 | 50,000 |

4 | Himmelblau | 2 | 0.01 | 4 | 50,000 |

5 | Six-Hump Camel Back | 2 | 0.5 | 2 | 50,000 |

6 | Shubert | 2 | 0.5 | 18 | 200,000 |

7 | Vincent | 2 | 0.2 | 36 | 200,000 |

8 | Shubert | 3 | 0.5 | 81 | 400,000 |

9 | Vincent | 3 | 0.2 | 216 | 400,000 |

10 | Modified Rastrigin | 2 | 0.01 | 12 | 200,000 |

11 | Composition Function 1 | 2 | 0.01 | 6 | 200,000 |

12 | Composition Function 2 | 2 | 0.01 | 8 | 200,000 |

13 | Composition Function 3 | 2 | 0.01 | 6 | 200,000 |

14 | Composition Function 3 | 3 | 0.01 | 6 | 400,000 |

15 | Composition Function 4 | 3 | 0.01 | 8 | 400,000 |

16 | Composition Function 3 | 5 | 0.01 | 6 | 400,000 |

17 | Composition Function 4 | 5 | 0.01 | 8 | 400,000 |

18 | Composition Function 3 | 10 | 0.01 | 6 | 400,000 |

19 | Composition Function 4 | 10 | 0.01 | 8 | 400,000 |

20 | Composition Function 4 | 20 | 0.01 | 8 | 400,000 |

At a secondary level, a general framework to quantify performance sensitivity of multimodal optimization methods to the expected/desired number of minima is provided; and a performance measure, called robust mean peak ratio (*RMPR*), is developed to encompass such sensitivity in comparison.

The rest of this article is organized as follows. Section 2 elaborates different components of RS-CMSA in detail. In Section 3, a few descriptive experiments are performed to demonstrate how the method can tackle the challenge of differently sized basins. In Section 4, RS-CMSA is compared with some of the state-of-the-art niching methods on the CEC2013 test problems for multimodal optimization. Finally, conclusions are drawn in Section 5.

## 2 The Proposed Niching Method

Different parts of the proposed method are explained in this section.

### 2.1 The Core Algorithm

The core algorithm of the proposed method is a variant of CMA-ES, which falls in the category of evolution strategies that adapt the full covariance matrix of multivariate normal distribution (Hansen and Ostermeier, 2001). The adaptation process consists of adaptation of the global mutation strength, the so called step size, , based on the concept of cumulation, and a few other heuristics to update the covariance matrix (**C**). By adapting the covariance matrix, distribution of sampled solutions in CMA-ES may gradually conform to the global/local structure of the fitness landscape in order to handle ill-conditioned problems efficiently (Hansen and Kern, 2004). One may speculate that this property can be particularly utilized to learn the shape of the basin to which the algorithm is converging, in order to cope with the challenge of non-spherical basins (see Figure 1), as investigated in a former study (Shir et al., 2010).

In a variant of CMA-ES, called CMSA-ES (Beyer and Sendhoff 2008), the mutative step size control replaced the cumulative step size adaptation and a simplified procedure to adapt the covariance matrix was proposed. The new variant was demonstrated to relatively outperform the original CMA-ES in some multimodal problems; however, in ill-conditioned problems, where efficient adaptation of the covariance matrix plays the critical role, it falls behind CMA-ES. Nevertheless, CMSA has a significant advantage from the practical point of view: it relies on fewer assumptions on the problem, which makes it more flexible. For example, it can employ intermediate selection schemes, in which a fraction of parents may survive to the next generation. Employing such strategies in CMA-ES violates the assumptions on distribution of samples, upon which the concept of cumulative step size adaptation relies, and thus a performance decline can be assumed. CMSA-ES was preferred in this study and employed as the core algorithm because of the mentioned advantages.

### 2.2 Main Niching Ideas

In the proposed niching method, the population size is divided into subpopulations of size (, ), which explore the search space in parallel. Every subpopulation has its own mutation parameters (, ), center () and elite members. Members of each subpopulation must maintain a sufficient distance from some specific points in the search space, called *taboo points*. The set of taboo points for is the union of:

Previously identified minima (), which are stored in an array called

**Archive**, unless is better than .The centers of fitter subpopulations.

The fitness of a subpopulation is measured by the fitness of its best individual. It is remarkable that the set of taboo points is subpopulation-dependent, and a better subpopulation has a smaller taboo set.

A sampled solution is called *taboo acceptable* if it satisfies the distance criterion with respect to all the taboo points; otherwise, it is discarded without evaluation. This process goes on until taboo acceptable solutions are generated. The overall effect of such rejection is reshaping distribution of solutions so that subpopulations do not search previously explored regions or those which are being explored by other subpopulations. This distance-based rejection defines taboo regions around the taboo points, where a subpopulation may not produce any offspring. The taboo regions are ellipsoids whose centers lie on the taboo points. The shape of the taboo regions for is exclusively determined by the strategy parameters of , while the size is affected by a property of the taboo point, the *normalized taboo distance* () as well. ’s are adapted so that a larger basin has a larger , which enables the RS-CMSA to handle the challenge of unequally sized basins efficiently. The niching strategy affects only the sampling step, which means selection, recombination, and adaptation are performed locally; therefore, subpopulations may converge at different times and may have totally different strategy parameters, which allows for identification of basins with different shape and size.

The idea of taboo regions may remind of the tabu search (TS) (Glover, 1986), in which a short-term memory helps the algorithm avoid the recently visited regions (Siarry and Berthiau, 1997); however, the size and the shape of the regions are preset in TS while RS-CMSA can learn the relative size and possibly the shape of the basins. The archived solutions in RS-CMSA, in contrast to the memory in TS, contains minima identified during the optimization process, disregarding how recent the identification has occurred. More importantly, taboo regions are coupled to the mutation strength of the subpopulation, which means they shrink as the subpopulation converges. Even for a fixed iteration, the taboo regions perceived by different subpopulations differ in number, shape and size.

### 2.3 Evolution of Subpopulations

In RS-CMSA, evolution of a subpopulation consists of sampling, selection, and recombination, in which the diversity preservation strategy is applied to the sampling stage. The augmentations applied to RS-CMSA, in comparison with the original CMSA-ES, are discussed in this section.

#### 2.3.1 Distance Metric

*j*-th solution () of satisfies the distance condition with respect to the

*k*-th taboo point (). A simple way is to calculate the normalized Euclidean distance () as follows: where

*u*’s are the square roots of the eigenvalues of . encompasses the effect of on the mutation strength, although the mutation strength is mainly controlled by .

_{ir}*d*) of the point

_{M}*x*from the center with respect to the covariance matrix is defined as follows: The Mahalanobis distance can be interpreted as the scaled version of the Euclidean distance, where the scaling factors and directions are determined by . This definition can be employed to define the normalized Mahalanobis distance () between and with respect to the parameters of the sampling distribution () as follows:

Contours of points with identical normalized Euclidean distance from form concentric spheres whose centers lie on . For the case of the normalized Mahalanobis distance, they form concentric ellipsoids, the shape of which is defined by . In both cases, the normalized distance increases inversely proportional to , which means for similar values of the absolute distance (), the normalized distance () increases as the subpopulation converges, which can be beneficial for landscapes with multiple global minima close to each other. The superscript denotes that the normalized distance can be either Euclidean or Mahalanobis.

#### 2.3.2 Rejection Rule

*K*is the number of the taboo points for . is accepted and evaluated if it satisfies the distance condition with respect to all the taboo points; otherwise, it is rejected and a new candidate solution is generated. This process continues until solutions are accepted.

_{i}If the normalized Euclidean distance is employed, the taboo region around perceived by is a ball of radius , while for the Mahalanobis case, it is an ellipsoid with main axes of . This means for the latter case, the shape of the taboo region is defined by the covariance matrix, which is assumed to be able to gradually conform to the shape of the basin to which the subpopulation is converging. Consequently, the taboo region may gradually conform to the basin shape to overcome the challenge of nonspherical basins, which was highlighted in Section 1.2. For the rest of this study, the distance metric is assumed to be the Mahalanobis, unless mentioned otherwise.

Taboo regions are illustrated for a typical case in Figure 2, where two subpopulations face two archived points () with and . is better than and both are worse than the archived points. The subpopulations have a default normalized taboo distance of . The boundary of a taboo region, depicted by a thick continuous line, forms an ellipsoid, on the center of which lies the taboo point. The taboo regions perceived by (see Figure 2a) and (see Figure 2b) differ in three aspects:

The number of taboo regions is different for and , since is a taboo point for but is

*not*a taboo point for .For each subpopulation, the shape of the taboo regions is similar to the shape of the isodensity contours of the sampling distribution of that subpopulation.

The size of a taboo region for a subpopulation is proportional to the normalized taboo distance of the corresponding taboo point; therefore, for either of the subpopulations, the taboo region of is times larger than the taboo region of .

Sampled solutions that fall inside the taboo regions are rejected without evaluation. Rejection because of proximity to a taboo point results in an anisotropy in distribution of offspring such that is biased to move farther from the taboo regions; however, whether it actually moves that way depends on the fitness landscape of the objective function as well.

#### 2.3.3 Selection and Recombination

While contemporary evolution strategies, including CMSA, employ the comma selection scheme (Bäck et al., 2013), preserving the elite member has been preferred for niching with ESs (Debski et al., 2008). Some previous studies demonstrated superiority of the plus scheme in multimodal optimization for the special cases of (1+)-ES and (1,)-ES (Shir and Bäck, 2007, 2009), in which the global step-size is updated iteratively while the covariance matrix is updated only if the selected offspring is better than the parent.

RS-CMSA-ES employs a more general elite preservation scheme, in which parents may survive to the next generation. The surviving parents may participate in subsequent update of the subpopulation parameters (). Similar to the concept of finite life span (Schwefel and Rudolph, 1995), in which the life span of offspring is limited to generations, or gradual fitness decay (Ahrari and Kramer, 2015), in which the fitness of individuals declines as they grow older, this general case of elitism enables arbitrary trade-off between advantages and disadvantages of both extreme selection schemes, and it can even surpass both extremes in some landscapes (Ahrari and Kramer, 2015).

*successful directions*in a similar way: preserving successful directions () of the surviving parents, which is employed in this study. At the end of each generation, the solutions (the union of recently generated and surviving solutions from the previous iteration) are sorted in increasing order of the function value. Parameters of the subpopulations are then updated as follows: where ’s are logarithmically decreasing weights (Hansen and Ostermeier, 2001), is the number of parents in the subpopulation, and is the adaptation interval constant for (Beyer and Sendhoff, 2008). Equivalently, can be interpreted as the learning rate for the covariance matrix. If there are bounds on the search space, is relocated to the nearest point inside the search space. Note that parameters of the surviving elites () from the previous iteration are included in Equations 6a, 6b, and 6c provided that they are among the best solutions of the current iteration. The best solutions of this iteration will survive to the next iteration accordingly. By default, .

A few deviations from the original CMSA-ES (Beyer and Sendhoff, 2008) are observed in the update scheme. First, the parental subpopulation size has been doubled, however, since weighted recombination is used, the effective number of parents () (Hansen et al., 2013) is similar for both cases. The adaption of the global mutation strength has also been revised to compensate for anisotropic distribution of ’s in the subpopulation, caused by rejection of certain samples, such that under random selection of the parents, the expected change of is zero. Ignoring such a compensation results in a bias towards increasing . The reason is that a greater is more likely to produce a sample farther from , which is more likely to lie outside the taboo regions. A larger means larger taboo regions, and preference of greater ’s again. This ever increasing risks divergence of . Although some statistical criteria will be provided to terminate nonconverging subpopulations, such subpopulations waste many evaluations.

### 2.4 Restarts with Increasing Population Size

The optimal population size for CMA-ES and presumably CMSA-ES varies from a small value () for unimodal functions to several hundreds for highly multimodal ones (Ahrari and Shariat-Panahi, 2015). Auger and Hansen (2005) employed the restart strategy with increasing population size (IPOP-CMA-ES) to overcome the challenge of parameter tuning. Starting with the recommended minimum value, the population size is doubled after each restart. The goal is to reach a restart with a sufficiently large population size which can identify the global minimum of highly multimodal functions. The drawback emerges when a large population size is required. In such a situation, several restarts should be performed so that the population size becomes large enough to be able to find the global minimum. Since the restarts are independent, the evaluation effort in the earlier restarts is actually wasted, resulting in considerable increase in the number of evaluations (Ahrari and Shariat-Panahi, 2015). More sophisticated schemes were later proposed to gain more feedback from the previous restarts (Wessing et al., 2011) or to enhance local search capabilities of the method (Hansen, 2009). The method proposed in another study (Ahrari and Shariat-Panahi, 2015) adapts the population size based on variation of the fitness of the recombinant point, which was demonstrated to outperform the original CMA-ES in both efficiency and robustness; however, it still requires a parameter to be tuned.

*Condition C*: the condition number of**C**exceeds .*Stagnation*: the median of the 20 newest values is not smaller than the median of the 20 oldest values, respectively, in the two arrays containing the best recent function values and the median of recent function values of the last iterations, excluding elites.*TolHistFun*: the range of the best recent function values (excluding elites) during the last iterations is smaller than*TolHistFun*.

The terminated subpopulations are analyzed at the end of the restart to check whether they specify new basins. The newly identified basins are added to **Archive** and the normalized taboo distance of each archived solution is updated based on the attraction power of the corresponding basin.

### 2.5 Adaptation of the Normalized Taboo Distance

A universal value for the normalized taboo distance may work well if all the basins are roughly similar in size; otherwise, it would be too small for large basins and too large for the others. The magnitude of can be considered as the repelling power of the taboo point . For an arbitrary basin, this parameter is adapted based on the number of subpopulations that converge to that basin. When a restart concludes, the best solution of every terminated subpopulation is compared to the solutions in **Archive**, to check whether it is desirable. For the case when only global minima are desired, a converged solution is desirable only if its fitness is not worse than the fitness of the best solution in **Archive** minus (target tolerance on the objective function). All undesirable solutions are discarded. For all the remaining (desirable) solutions (), the hill-valley function of Ursem (1999) is utilized in order to determine whether a solution refers to a new basin. The line search is rendered using Golden Section Search with a maximum of ten function evaluations. The solution is added to **Archive** if it appears to be a new basin; otherwise, the corresponding archived point is identified. At the end, the normalized taboo distance of each solution in **Archive** is updated based on the number of subpopulations () that have converged to the corresponding basin. If this number is large, is increased to reduce the probability of convergence to the same basin in the future restarts; otherwise, it is decreased since it might be unnecessarily large. Algorithm 1 explains the procedure.

is the default value of the normalized taboo distance, which is assigned to the new members of **Archive** as well as the subpopulations during the restart. is a parameter which specifies the fraction of solutions that can converge to an already identified basin without further increase in the corresponding taboo distance. If , is decreased since it might be unnecessary large. A greater indicates convergence of more subpopulations to the *m*-th archived solution, which demonstrates that the current value of is too small. specifies the learning rate of the normalized taboo distance. Default values are and . is equal to the 25th percentile of normalized taboo distances of the solutions stored in **Archive**. The justification is that basins detected in the latest restart are probably harder to find than those identified in the previous restarts. Among the factors that make a basin hard to find, the size of the basin is a critical one, and thus the recently detected minima are likely to have a narrower basin than those found previously. If **Archive** is empty, then .

### 2.6 Boosting Time Efficiency

The main time complexity of the proposed niching strategy originates from the sampling part, according to which the distance of each sampled solution should be checked against all the taboo points. It is also likely that most sampled solutions are rejected because of the distance criteria (Equation 4), especially in the early iterations of each restart when the mutation strength is great. Each sampled solution should on average be checked against taboo points, where *M* and *N _{s}* refer to the number of archived points and subpopulations, respectively. For the best-case scenario, when no sampled solutions are rejected, the time complexity is proportional to

*N*, assuming that . This intensifies the time complexity of generating a taboo acceptable solution when the number of desired/estimated minima is great. Two different strategies are proposed and employed in RS-CMSA to alleviate this problem: temporary shrinkage of the taboo regions and checking for more critical taboo points earlier. These two strategies are explained in this section.

_{s}#### 2.6.1 Temporary Shrinkage of the Taboo Regions

The fraction of the rejected sampled points is directly related to the overall size of the taboo regions. There might be situations in which the taboo regions are so huge that almost all the sampled solutions of a subpopulation are rejected. To avoid stagnation, the distance condition is loosened by temporary reduction of the normalized taboo distance of every taboo point whenever a sampled solution is rejected: . This modification is valid only for the current subpopulation and only for the current iteration. When taboo acceptable solutions are generated and evaluated, the original values of ’s are restored. A greater value of speeds up reduction of the normalized taboo distances and thus reduces the fraction of taboo-rejected samples; however, it may detectably suppress the effect of the proposed niching strategy. The default setting is , which means the size of each taboo region shrinks by 1% whenever a sample is rejected.

#### 2.6.2 Checking for More Critical Taboo Points Earlier

According to Equation 4, each taboo point () may reject the sampled solution () with the probability of . If violates the distance condition to any taboo point, it is rejected, and checking the distance condition with respect to the other taboo points is not required. This can save a lot of computation if the most critical taboo points, those that are most likely to reject a sampled solution, are known so that the distance condition is checked for them earlier. This means that if the sampled solution is going to be rejected, the algorithm discovers it sooner.

We propose mean rejection probability () as a measure to quantify *criticality* of for . Before starts sampling, the criticality of the taboo points is calculated. A sampled solution is checked against the taboo points in order of their criticality. Note that the criticality measure is not a function of the sampled solution; therefore, for each subpopulation, it is calculated once per iteration and used to generate all the taboo acceptable solutions. Moreover, it can be safely assumed that a taboo point never rejects a sampled solution if its criticality is sufficiently small. Such a taboo point is called *non-critical*, and ignored while checking the distance condition. It will be demonstrated in Section 3.2 that most taboo points become non-critical as the subpopulation converges. By default, if , then is considered as a non-critical taboo point. A conservative estimation for is provided in this section.

#### 2.6.3 Calculation of the Mean Rejection Probability

Figure 3 depicts the subpopulation and the taboo point . The origin of the auxiliary Cartesian coordinate system lies on , while *x*_{1} is along the eigenvector corresponding to the greatest eigenvalue of . For simplicity of calculations, is relocated to such that is aligned with *x*_{1} and . The iso-density contour line of sampling distribution that passes through an arbitrary point in the relocated taboo region has a higher value than the one passing through the corresponding point in the original taboo region. This means for an arbitrary sample, the probability of falling inside the relocated taboo region is more than the original taboo region; therefore, this relocation results in a greater than the actual one.

*x*

_{1}axis of the auxiliary coordinate system (). Among all the points on , has the highest rejection probability. The mean rejection probability of when

*R*varies from to is computed as follows: is one inside the taboo region and zero outside it. Therefore: where computes the of standard normal distribution. computed using Equation 11 gives an upper limit for the actual mean rejection probability of because of the violation of the distance metric to the taboo point . It is the employed measure to calculate and compare criticality of taboo points.

Since the normalized distance is inversely proportional to , the number of critical taboo points gradually decreases as the algorithm converges. This significantly reduces time complexity of sampling especially for large values of *N _{s}*. The employed model for computation of the mean rejection probability reminds one of the corridor model described by Schwefel (1993). In our model, the infeasible region is cut out of the feasible one whereas it is the other way around in the corridor model.

### 2.7 Initialization of Subpopulations

At the beginning of each restart, subpopulation are initialized such that their centers lie far from each other and the archived solutions. All new subpopulations have similar initial strategy parameters: , in which function returns a diagonal matrix whose diagonal elements are the elements of . Starting from a conservatively great , a candidate point for the center () is randomly sampled in the search space and its normalized Mahalanobis distance to the archived solutions () and the centers of other subpopulations () is calculated. If the distance condition is satisfied, the candidate is accepted as the center of , otherwise discarded, and a new candidate is sampled. If multiple consecutive tries (say, 100) fail, is slightly reduced (), which reduces the overall size of the taboo regions. This process continues until subpopulations are generated (Algorithm 2).

Such initialization, although not critically important, helps RS-CMSA avoid taboo regions more efficiently in early iterations of a restart, especially in lower dimensions. In higher dimensions, contribution of Algorithm 2 subsides due to the distance concentration effect (Beyer et al., 1999); however, for dimensions commonly used for empirical evaluation of multimodal optimization methods (), the effect of Algorithm 2 should be detectable, since there is still a considerable variance among distances of randomly generated points (Beyer et al., 1999).

### 2.8 Pseudocode

### 2.9 Parameter Tuning

Except for the default number of subpopulations (), all the parameters are set to their default values, as follows: , , , , , , , , and criticality threshold = 0.01.

should be proportional to the desired number of minima. Smaller values result in a higher short-term success. One subpopulation per desired minimum is the recommended minimal value. Stopping criteria *TolHistFun* is set equal to the desired tolerance of the objective function. The crafting effort is thus zero, provided that an estimate for the number of the desired/available minima is given.

## 3 Descriptive Experiments

A few descriptive experiments are rendered in this section to highlight some aspects of RS-CMSA. At this point, the objective is not to benchmark and compare RS-CMSA with other methods but to show these four factors:

Contribution of proper initialization of subpopulations (Algorithm 2).

Adaptation of the normalized taboo distance and its importance when basins significantly vary in size, shape, and relative distance (Algorithm 1).

Effect of the criticality measure (Equation 11) on reduction of the number of the critical taboo points.

Possible advantages of the Mahalanobis distance metric, in comparison with the Euclidean distance metric.

### 3.1 First Descriptive Experiment

In the first descriptive experiment, RS-CMSA is tested on the 2D Vincent function. It has 36 minima, all global, which significantly vary in size and relative distance (see Figure 4a). Restarts are performed, but for this experiment, the population size is kept unchanged (). Initialized subpopulations are depicted with black circles, where the center and radius of each circle represent and , respectively. The gray circles correspond to the archived points and their taboo regions (), respectively. At this point of each restart, and ; therefore, all of the taboo regions are spherical and the difference in size of the taboo regions originates merely from the difference in the normalized taboo distances (). It is remarkable that illustration of these regions is much harder for an arbitrary point during the restart, since the shape, the size, and the number of taboo regions would be different for each subpopulation.

Figure 4b depicts the generated subpopulations immediately after the initialization part in the zeroth restart. There is no archived point at this moment, and the centers of the subpopulations are at least far from each other, in which is the geometric average of the square root of the initial covariance matrix eigenvalues.

Figure 4c illustrates the solutions in **Archive** and their absolute taboo distances, immediately after initialization of the subpopulations in the first restart. There are 21 archived solutions, which implies 21 global minima were identified during the zeroth restart. Comparing taboo regions of the archived points with the basin sizes of the global minima (see Figure 4a) demonstrates that RS-CMSA has correctly increased the normalized taboo distances of minima with a wider basin.

At the beginning of the second restart (see Figure 4d), there are 27 solutions in **Archive**, which means six new minima were located during the first restart. The four largest taboo regions pertaining to the four largest basins have defined huge taboo regions in the top-right part of the search space. This not only pushes subpopulations away from these minima in the current restart, but also disrupts the distribution of the initialized subpopulations such that most of them are generated on the bottom-left side of the search space, where most of the unidentified minima lie. These factors result in a more efficient identification of new minima in the current restart.

There are 33 archived solutions at the beginning of the third restart (see Figure 4e), which implies six new minima were identified in the second restart. It seems few solutions of the second restart have converged to the largest minima, and thus their normalized taboo distances have dwindled for the third restart.

At the beginning of the fourth restart, the size of **Archive** is 34, which means only one new minimum was identified in the third restart. Nevertheless, the third restart has significantly resized the normalized taboo distances of the archived points. It seems many subpopulations of the third restart have converged to the largest basin, such that its taboo region covers most of the search space in the fourth restart. All new subpopulations are initialized at the bottom-left side, and at the end of this restart, all 36 minima were identified.

As it can be observed, this procedure may reasonably adapt the normalized taboo distance of each archived point. If it is larger than what it should be, more subpopulations will converge to that basin in the subsequent restarts, which increases the corresponding normalized taboo distance, and vice versa.

### 3.2 Second Descriptive Experiment

In the second experiment, the importance of the adaptation of the normalized taboo distance and the effect of ignoring non-critical taboo points on time complexity of the method are studied on the 3D Vincent function, which has 216 minima, all global. Different learning rates () for the normalized taboo distance are tried with and (fixed). Figure 5a illustrates peak ratio (*PR*), which is the fraction of identified minima, versus the number of function evaluations, averaged over 50 independent runs. Markers represent the average time in which a restart was performed. For , adaptation of the normalized taboo distances is suppressed and thus for . Results after the zeroth restart are predictably similar, since the adaptation of the normalized taboo distance is performed when a restart is concluded. A considerable decline in the rate of identification of new basins is observed when is small. It seems for this problem, is a logical choice.

Figure 5b illustrates the fraction of the taboo points that turned out to be critical, averaged over all subpopulations in each iteration. The horizontal axis is the normalized iteration number, which is obtained by dividing the iteration number by the maximum number of the iterations in the corresponding restart. As it can be observed, the fraction of the critical taboo points is fairly small, which demonstrates that most taboo points are not critical, and therefore, they can be ignored. This fraction, as predicted earlier, reduces quite fast with the iteration number of the restart. For example, when 20% of the iterations of a restart are completed, only about 2% of the taboo points remain critical. The result is a significant reduction of the algorithmic computation time, since samples of each subpopulation are checked against only the critical taboo points.

### 3.3 Third Descriptive Experiment

In the third experiment, the benefits of the Mahalanobis distance metric is investigated and the capability of CMSA in learning the fitness landscape structure is explored. Five variants of RS-CMSA are tested and compared to reach this goal:

RS-CMSA-M: The proposed method with the Mahalanobis distance metric.

RS-CMSA-E: The proposed method with the Euclidean distance metric.

RS-CMSA-M-I: Similar to RS-CMSA-M, but the covariance matrix is set to after initialization in each restart, in which is the geometric average of the search space side lengths.

RS-CMSA-E-I: Similar to RS-CMSA-E, but the covariance matrix is set to after initialization in each restart.

RS-CMSA-M-I0: Similar to RS-CMSA-M-I, but is set to (no learning for the covariance matrix).

*s*determines the scaling factor. Since the condition number of the original Shubert function () is about one and the search space is linearly scaled in our problem, the condition number of this function is about . The search space undergoes the same scaling; therefore, the number of the global minima is independent of the scaling factor.

Each variant is tested on this problem for , 0.5, 1.0, and 1.5 with and , for a maximum of seven restarts. Each problem is solved 50 times independently and the fraction of identified global minima is plotted versus the number of evaluations in Figure 6, in which the markers specify the end of restarts. This figure demonstrates that:

increasing the scaling factor has no effect on the performance of RS-CMSA-M. This was predictable, since the initial covariance matrix is scaled accordingly and the Mahalanobis distance metric neutralizes the scaling effect when calculating the normalized taboo distance.

RS-CMSA-E falls slightly behind RS-CMSA-M when the scaling factor increases. This inferiority exacerbates for greater scaling factors, but can be observed only for a great .

no considerable difference between performance of RS-CMSA-M-I and RS-CMSA-E-I can be detected. It can be justified by the insignificance of performance difference between RS-CMSA-E and RS-CMSA-M for a low and the fact that the adaptation of the covariance matrix is gradual.

starting with the isotropic distribution instead of the near-optimal one results in a significant performance decline; however, it is much more drastic for RS-CMSA-M-I0 than RS-CMSA-M-I. This demonstrates that RS-CMSA can gradually learn the structure of the fitness landscape, the contribution of which is significant.

It might be concluded that the normalized Euclidean distance metric should be employed; however, since there could be situations in which the Mahalanobis distance metric may appear superior, it is selected as the default choice. Computation of , required by the Mahalanobis metric, is not expensive, since the core algorithm performs eigen decomposition of , from which can be easily computed.

## 4 Performance Evaluation

A number of the best available methods in the literature for multimodal optimization are selected. These algorithms are tested and compared to RS-CMSA on the CEC2013 test suite on multimodal optimization, which has also been employed for the CEC2015 and CEC2016 competitions on multimodal optimization. By selecting a standard and up-to-date test suite, we avoid any ad-hoc selection of test problems and biased evaluation of the methods.

### 4.1 Selection of the Methods

Several factors are considered to select the niching methods for comparing RS-CMSA, including the year of publication, strength of the reported numerical results, minimal tuning effort, and niche-radius independence. Following these criteria, the following methods are selected: NMMSO (Fieldsend, 2014), NEA2 (Preuss, 2012), LIPS (Qu et al., 2013), NSDE (Qu et al., 2012), PNPCDE (Biswas et al., 2014), and IPOP-CMA-ES (Auger and Hansen, 2005).

Each of the selected methods was demonstrated, in the corresponding publications, to surpass several other methods, including more traditional ones like crowding. For example, LIPS was compared in the referenced study with nine other niching methods, including crowding differential evolution, and was demonstrated to outperform all of them; therefore, we exclude the other nine algorithms. The other methods have been selected on a similar basis. Methods for which several problem-dependent parameters should be manually set have been excluded and the selected methods were tested either by using a default parameter setting or at most one parameter was tuned for each problem so that by performing an exhaustive search in the control parameter space, a reasonable and robust default value for the control parameter could be derived. Most of these methods were also tested and compared on more recent and challenging test problems including those proposed by Qu and Suganthan (2010) or employed in CEC2013 niching test suite (Li et al., 2013b). Although IPOP-CMA-ES is not tailored for multimodal optimization, it is employed as a benchmark to demonstrate the contribution of a niching strategy. We employed the default parameter setting for this method, except the increase factor, which may strongly affect the number of identified basins. *TolHhistFun* was set to . NMMSO, NEA2, and IPOP-CMA-ES were benchmarked without any ad-hoc parameter tuning; however, for PNPCDE, LIPS, and NSDE, population size was tuned for each problem in the referenced studies.

In practice, the exact number of minima is not known a priori; however, one usually has a rough idea of the desired number of minima. For benchmarking, it is assumed a rough estimate for is available which can be used for parameter setting. This estimate () is used for parameter setting as follows:

The optimum population size for LIPS, NSDE, PNPCDE, and NEA2 is a function of the number of global minima and probably problem dimension; however, our preliminary results suggested dependence on only.

The optimum increase factor for the population size in IPOP-CMA-ES depends on .

The only problem sensitive parameter of NMMSO is the swarm size, which was set to by Fieldsend (2014). can hardly be used to set this parameter, since the swarm size is intuitively independent of the number of global minima; therefore, the optimal swarm size is assumed to be proportional to the problem dimension only.

The optimum for RS-CMSA is presumably proportional to . We also investigate the case where is not available and a predefined should be used for the problem at hand ()

A parameter study is performed in the next section to find the best value of for each method. Performance sensitivity to accuracy of the estimated number of global minima is also analyzed by varying for a given problem.

### 4.2 Performance Measures

*MPR*) is computed as a function of in Equation 13, assuming that : where subscript 0 refers to the assumption that . is the average fraction of global minima found over 50 independent runs of the

*i*-th problem with respect to the

*j*-th target precision. is considered to be a too-loose tolerance and hence it is removed. The precision of the provided global minimum value for the 2D Shubert function (

*PID*= 6) was not sufficient, and thus for this problem, peak ratios for and are extrapolated to compute

*PR*for .

*MPR*is a function of both and : Robust mean peak ratio (

*RMPR*) computes the expected performance when varies from to : where is a random variable with probability density function of , which is assumed to be symmetric around the origin.

*RMPR*measures efficiency and robustness with respect to at the same time.

### 4.3 Results and Discussion

Several aspects of these methods are compared by post-processing the results. Figure 7 illustrates and for all the methods. The range of 99% confidence interval for these plots is fairly narrow (on the order of 0.001), and thus the effect of random nature of the results is ignored, unless the difference between two methods is in the same range. Figure 8 plots *PR* (averaged over all the target function tolerances) versus *PID* for each method when and . Figure 9 demonstrates *PR* versus *PID* for different values of , assuming .

Figure 7a demonstrates that for a large range of , RS-CMSA and outperform NMMSO and NEA2, and they all significantly surpass PNPCDE, LIPS, NSDE, and IPOP-CMA when is regarded.

Figure 7b demonstrates that for , (

*RMPR*= 0.850) and RS-CMSA (*RMPR*= 0.846) relatively outperform NMMSO (*RMPR*= 0.825) and NEA2 (*RMPR*= 0.804), and they all significantly surpass NSDE, LIPS, PNPCDE, and IPOP-CMA. It is remarkable that although IPOP-CMA is not specialized for multimodal optimization, it can fairly compete with LIPS and PNPCDE and to some extent, with NSDE.Superiority of over RS-CMSA is an unexpected and interesting observation. It demonstrates that the proposed method can still excel even if no estimate for the number of minima is available. The evaluation budget in the CEC2013 competition setup is not proportional to the number of global minima, which may justify suitability of a fixed for all the problems. A parameter setting scheme in which grows at a slower rate, for example, proportional to the square root of the estimated number of minima, can further improve

*RMPR*of RS-CMSA, at least on this test suite.The first five problems are the easy ones for which most niching methods can reach

*PR*= 1 within a small evaluation budget (see Figure 8); nevertheless, LIPS cannot reach a good*PR*for*PID*= 2, a 1D problem with two peaks at the bounds. This implies that LIPS cannot handle this situation efficiently, and some revisions of its operators seem necessary. IPOP-CMA cannot reach*PR*= 1 for the 4 problem, a classical and rather easy test problem for multimodal optimization.NMMSO outperforms , RS-CMSA, and NEA2 on 3D Shubert and Vincent functions (PID = 8, 9). For the composite functions (

*PID*= 11–20), a detectable domination of , RS-CMSA, and NEA2 over the other methods is observed.It is unlikely that for identical number of iterations, increasing results in a lower

*PR*, and hence performance decline for a larger is attributed to termination of the optimization process before convergence (see Figure 9). This means increasing/decreasing the evaluation budget can modify the optimum value of for each method.Figure 9 demonstrates that NEA2 has the minimum sensitivity to the value of when

*PR*for each problem is regarded. Aside from , this sensitivity is small for , RS-CMSA, and NMMSO as well, while it is quite significant for the other methods. The more critical point is that the optimum value of is highly problem dependent for these methods, which challenges finding a good universal parameter setting. For example, for NSDE; however, the highest*PR*for*PID*= 8 is achieved when , and increasing significantly reduces*PR*for this problem. The reason is depletion of the evaluation budget before convergence. It seems the sequential nature of NEA2 and IPOP-CMA and to some extent, RS-CMSA, mitigates this problem, since at least a few restarts are performed even if the evaluation budget might be small.

## 5 Summary and Conclusions

A new niching method called covariance matrix self-adaptation with repelling subpopulations (RS-CMSA) was developed in this study, which hybridizes and integrates several concepts and techniques from different existing methods, such as the taboo points in tabu search, normalized Mahalanobis distance, and the hill-valley function. RS-CMSA mainly consists of the covariance matrix self-adaptation evolution strategy (CMSA-ES), reinforced with elitism, as the core search engine of several equally sized subpopulations. Distribution of members of each subpopulation is reshaped by presence of taboo points to prevent searching previously explored or concurrently under exploration regions.

RS-CMSA was evaluated and compared with a number of the state-of-the-art niching methods on a recent test suite proposed for the CEC2013 competition on multimodal optimization. Robust mean peak ratio was proposed and utilized to compare the benchmarked methods, not only to quantify their success in finding different minima, but also to evaluate their performance sensitivity to accuracy of the given estimate for the number of global minima. According to the employed performance measures, RS-CMSA emerged as the most successful method, followed by NMMSO and NEA2. A considerable superiority of NEA2 and RS-CMSA over the other methods was observed on the more difficult composite functions.

RS-CMSA can adapt the niche radius parameter, referred to here as the normalized taboo distance, for each taboo point independently. This enables RS-CMSA to address the critical challenge of dissimilar and nonuniformly distributed minima, without dependence on a user-specified niche radius parameter. Another hallmark of this method is that no problem-dependent parameter tuning is required, although providing a rough estimate for the desired number of minima can improve the performance. Simplicity of the main niching idea, its robustness, and superior results amply demonstrate the usefulness and practicality of RS-CMSA as a potent niching evolutionary algorithm.

## Acknowledgments

Computational work in support of this research was performed at Michigan State University's High Performance Computing Facility. The source codes for NSDE and LIPS were provided by Ponnuthurai N. Suganthan. The source code of PNPCDE was provided by Subhodip Biswas. The source code of NMMSO was provided by Jonathan Fieldsend. The authors would like to thank them for sharing their source codes and providing guidelines to tune the control parameters. The authors would also like to thank Matt Ryerkerk for his comments on this article.

This material is based in part upon work supported by the National Science Foundation under Cooperative Agreement No. DBI-0939454. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.