Abstract

Four methods for global numerical black box optimization with origins in the mathematical programming community are described and experimentally compared with the state-of-the-art evolutionary method BIPOP-CMA-ES. The methods chosen for the comparison exhibit various features that are potentially interesting for the evolutionary computation community: systematic sampling of the search space (DIRECT, MCS) possibly combined with a local search method (MCS), or a multi-start approach (NEWUOA, GLOBAL) possibly equipped with a careful selection of points from which to start a local optimizer (GLOBAL). The recently proposed “comparing continuous optimizers” (COCO) methodology was adopted as the basis for the comparison. Based on the results, we offer suggestions about which algorithm should be used depending on the available budget of function evaluations, and we propose several possibilities for hybridizing evolutionary algorithms (EAs) with features of the other compared algorithms.

1.  Introduction

Global optimization is a subfield of applied numerical analysis which studies methods that should be able to find the globally optimal solution to an optimization problem. The issues and methods of global optimization are studied in several different communities. This article focuses on the mathematical programming (MP) and evolutionary computation (EC) communities.

The ultimate goal of global black box optimization—to make the fastest possible progress toward the best possible solution—is certainly common to both communities. There are, however, certain differences. The MP community strives for methods with sound theoretical properties. The methods often search the space systematically, build and use models of the objective function, and/or store all the points sampled during the run. Maintaining the models or using the archive of sampled points can be very time- and space-consuming. Due to practical limits in available CPU time and storage space, the MP community usually tests these algorithms using relatively small budgets of allowed function evaluations. As a result, the MP methods are designed to show good progress right from the beginning of the search. The EC community studies algorithms with roots in nature and biology, often using randomly initialized populations. An EC algorithm needs some time to move the population to a promising region of the search space, and its performance in the initial phases is usually not as good as it could be. On the other hand, EC methods usually do not use any complex model or solution-archive maintenance procedures, and can be tested with higher evaluation budgets. As a consequence of these differences, the findings based on experimental results in these communities are often contradictory. This contributes to the gap between the two communities: despite the fact that they could learn a lot from each other, such an exchange does not happen very often.

This article adds a brick to the bridge between the MP and EC communities. It (re-)introduces several MP algorithms to the EC community and, by means of an experimental comparison, highlights the differences among them, identifies the algorithms suitable for various goals and situations, and finally points out the features that may be profitable for the members of the EC community, and vice versa.

The article focuses on three MP methods with features not widely known in the EC community. These are complemented with two reference algorithms which are not discussed in as much detail as the three main algorithms, since the experiments were not performed by us (the authors of this article) and we thus have only limited experience with them. The first method chosen for the comparison is the DIRECT algorithm (Jones et al., 1993). It systematically samples points from the search space and does not contain any dedicated local search method. The second algorithm, MCS (Huyer and Neumaier, 1999), works on similar principles as DIRECT, but also contains a specialized local search procedure. Both algorithms were described as “good complete general purpose global optimization algorithms” by Neumaier (2004, p. 298). The third algorithm, GLOBAL (Csendes, 1988), is a multi-start method equipped with a filter meant to prevent starting a local search in the basin of attraction of an already known local optimum. To contrast the effect of this filter with that of a usual multi-start method, the restarted version of the NEWUOA algorithm (Powell, 2006) was also included in the comparison. It is a local optimizer proposed quite recently, and its reported results are promising. The final algorithm is the BIPOP-CMA-ES by Hansen (2009), a restarted version of the state-of-the-art CMA-ES algorithm using different population sizes in individual restarts. It represents the only evolutionary approach in the comparison and serves as the baseline algorithm. All the methods are described in Section 2.

A suitable experimental framework must be chosen to discover the potentially profitable features of the algorithms. The framework must be able to show the differences among the algorithms at all stages of the search, not just after a certain number of evaluations, as is the usual practice. The COCO (comparing continuous optimizers) methodology (Hansen, Auger, et al., 2009) was chosen since it fulfills these requirements. It was used as the basis of the black box optimization benchmarking (BBOB) workshops of the GECCO 2009 and 2010 conferences. The testbed consists of 24 carefully chosen scalable noiseless benchmark functions (Hansen, Finck, et al., 2009) which represent various types of difficulties observed in real-world problems (ill-conditioning, multimodality, etc.). The COCO experimental framework is described in Section 3.

The results of the algorithms were already separately presented as workshop articles (Hansen, 2009; Pošík, 2009; Ros, 2009b), or as unpublished reports (Huyer and Neumaier, 2009; Pál et al., 2009). One of the original contributions of this article is to collect these results, compare them conveniently in one place, and provide a discussion of the pros and cons of the algorithms compared to each other. In the original articles, the discussion (if any) was based solely on the results of the respective algorithm and no comparison was made. We also discuss the results in more detail than the summary article of Hansen et al. (2010).

The setup of the experiments and the algorithm settings are described in Section 4. The results of the comparison are presented in Section 5. Sections 6 and 7 contain the discussion of the results broken down by the function group and by the algorithm, respectively. Section 8 summarizes the article and suggests several possible ways of using some of the MP principles to improve the evolutionary algorithms (EAs).

2.  The Compared Algorithms

All the described algorithms are iterative. They sequentially sample points from the real-valued search space R^D, where D is the search space dimensionality. It is assumed hereafter that the points are evaluated as soon as they are sampled, and that the variables holding the best point found so far, xbest, its function value, fbest, and the number of objective function evaluations are updated accordingly.

2.1.  DIRECT

The DIRECT algorithm was introduced by Jones et al. (1993). The algorithm name not only expresses that it belongs to the class of direct search algorithms, it also describes the main principle of the algorithm: the DIRECT acronym stands for DIviding RECTangles. A slightly modified MATLAB implementation of DIRECT by Finkel (2003) is used. Only the basic algorithm design principles are described here; for the detailed description, see the original article by Jones et al. (1993) or the implementation description by Finkel (2003). The pseudocode is shown in Algorithm 1.
[Algorithm 1: pseudocode of the DIRECT algorithm]

The algorithm is a branching scheme which recursively divides the search space and forms a tree of hyper-rectangles (boxes). The leaves of the tree form a set of nonoverlapping boxes; at each time instant, the whole search space is completely covered by the leaves. The point c in the middle of each box—the base point—is evaluated. Each box thus has two important characteristics: (1) the function value of its base point, and (2) the size of the box. There are many possible definitions of the box size; here, the distance from the base point to the box corner is used.

In each iteration, the algorithm decides which of the existing boxes should be split (see Algorithm 1, line 3). The potentially optimal boxes are identified using two design principles. It is expected that the chance of finding an improvement inside a box is proportional to

  • the fitness of the base point (exploitation), and to

  • the box size (exploration, global search).

The identification of the potentially optimal boxes is thus basically a multi-objective problem. In each iteration, all the nondominated boxes, described by their size and their base point function value, are divided by the algorithm (see Algorithm 1, lines 5–8). The division of boxes which are not potentially optimal, that is, small boxes and boxes with worse base points, is thus postponed to later iterations.

The DIRECT algorithm does not contain any local search method which could be used to improve its efficiency. The algorithm is guaranteed to eventually sample a point arbitrarily close to the global optimum, if it is allowed to run for sufficient time, and if the splitting procedure is not constrained by a maximal depth.
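The selection of potentially optimal boxes described above can be sketched as a Pareto filter over (box size, base point value) pairs; the following minimal illustration omits the Jones improvement factor used by the actual implementation:

```python
def potentially_optimal(boxes):
    """Return indices of nondominated boxes.

    Each box is a (size, f_center) pair; a box is potentially optimal
    (in the Pareto sense used by DIRECT) if no other box is at least
    as large AND has an at-least-as-good base point value, with at
    least one of the two comparisons strict.
    """
    selected = []
    for i, (si, fi) in enumerate(boxes):
        dominated = any(
            sj >= si and fj <= fi and (sj > si or fj < fi)
            for j, (sj, fj) in enumerate(boxes) if j != i
        )
        if not dominated:
            selected.append(i)
    return selected
```

Large boxes with poor base points and small boxes with good base points both survive the filter, which is exactly the exploration/exploitation balance described above.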

2.2.  MCS

Inspired by DIRECT, the global optimization algorithm multilevel coordinate search (MCS) was developed by Huyer and Neumaier (1999) to minimize an objective function on a box [u, v] with finite or infinite bounds. The algorithm proceeds by splitting the search space into smaller boxes, but the splitting procedure is much more irregular than the one in DIRECT. By starting a local search from certain good points, an improved result is obtained. The pseudocode of the basic steps of MCS can be found in Algorithm 2. The implementation used for the experiments is available for download.1
[Algorithm 2: pseudocode of the MCS algorithm]

Each box in the partitioning process is characterized by (1) its bounds, (2) its base point, and (3) its level s. The function is evaluated at the base points. A base point may lie on the box boundary, and therefore the same base point can be shared by two or more boxes. The level of a box is a rough measure of the number of times the box has been processed. Like DIRECT, the MCS algorithm combines exploration (splitting boxes with a large unexplored territory) and exploitation (splitting boxes with good function values). Boxes with the level smax are considered too small for further splitting. Whenever a box with a level s < smax is split, its descendants get the level s+1 or s+2. At each stage of the algorithm, the partitioning of the search space consists of a set of boxes with levels between 1 and smax.

The algorithm starts with the so-called initialization procedure (lines 2–5 of Algorithm 2). For each coordinate i, at least three values x1i < x2i < x3i in [ui, vi] are needed, and the coordinate x0i of the initial point must be one of them. These values are used whenever a box is split in the coordinate i for the first time (in the initialization procedure or later). Splits are made (at the values of the initialization list and between them) into several parts along each coordinate i.

The main iteration loop (lines 6–20 of Algorithm 2) proceeds (in the absence of other stopping criteria) until all boxes of the current partitioning have the level smax. Additional stopping criteria, such as reaching a target function value or a limit on the number of function evaluations, are implemented but not shown in the pseudocode. In each iteration, the algorithm splits one box at each level, starting with the smallest nonempty level (i.e., with the largest boxes). When a box with the base point x is split, that is done along a single coordinate i and the function is evaluated at one or more points differing from x only in the coordinate i.

To split a box at the level s with the base point x and the given bounds, the algorithm has to choose the splitting dimension i and the position of the split (based on information gained from already sampled points). Two kinds of splits can occur: splitting by rank and splitting by expected gain.

  1. Splitting by Rank. If a box has already reached a high level but still has not been split very often in some coordinate i, the function is evaluated at a point obtained by changing the ith coordinate of x to a value depending on xi and on the box bounds in that coordinate, and the box is split into three parts.

  2. Splitting by Expected Gain. Otherwise, the splitting coordinate i and the ith coordinate of the new point are determined by building a separable local quadratic model around x and minimizing it, with safeguards to prevent too narrow splits. Two or three sub-boxes are obtained.

In both cases, the given recipes only apply to the case that the box has already been split along the coordinate i. If that is not the case, the function is evaluated at the points obtained by changing xi to the other values of the initialization list. The splits between two points where the function has been evaluated (according to the initialization list or otherwise) are not made symmetrically: the part with the lower function value gets the larger space. The larger parts of splits get the level s+1 and the smaller parts get the level min(s+2, smax).
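The unsymmetric split and the level bookkeeping can be illustrated as follows; the golden-section ratio is an assumed stand-in for MCS's actual split position, used only to show that the part containing the better point receives the larger share and the lower level:

```python
import math

GOLDEN = (math.sqrt(5) - 1) / 2  # ~0.618, an assumed split ratio

def unsymmetric_split(a, b, fa, fb, s, smax):
    """Split the interval [a, b] between two evaluated points with
    values fa, fb into two parts (returned as (lo, hi, level) tuples).

    The part containing the point with the lower function value gets
    the larger share and the level s+1; the smaller part gets the
    level min(s+2, smax), as in the MCS level bookkeeping.
    """
    if fa <= fb:          # left point is better -> left part larger
        cut = a + GOLDEN * (b - a)
        return (a, cut, s + 1), (cut, b, min(s + 2, smax))
    else:                 # right point is better -> right part larger
        cut = a + (1 - GOLDEN) * (b - a)
        return (a, cut, min(s + 2, smax)), (cut, b, s + 1)
```

Because the smaller part is promoted faster toward smax, unpromising slivers of the search space stop being split sooner.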

MCS with local search (line 20) tries to accelerate the convergence of the algorithm by starting local searches from the points belonging to boxes of level smax. The local search algorithm essentially consists of building a local quadratic model by triple searches, then defining a promising search direction by minimizing the quadratic model on a suitable box and finally making a line search along this direction. This is repeated until the maximal number of iterations nsloc for the local search algorithm is reached, the algorithm does not make any further progress, or the estimated gradient becomes too small (unless one of the stopping criteria used for MCS is satisfied first).

If the number of levels smax goes to infinity, MCS is guaranteed to converge to the globally optimal function value if the objective function is continuous in the neighborhood of a global optimizer. This follows from the fact that the set of points sampled by MCS is a dense subset of the search space.

2.3.  GLOBAL

The stochastic global optimization method called GLOBAL (Csendes, 1988) was inspired by Boender et al. (1982) and was developed to solve bound constrained global optimization problems with black box type objective functions. The goal of GLOBAL is to find all local minima that are potentially global. For this purpose, it is equipped with a multi-start strategy and clustering to promote finding distinct local optima.

Based on the old GLOBAL method (Csendes, 1988), after a careful study, a new version (Csendes et al., 2008) was developed, which achieved better reliability and efficiency while allowing higher dimensional problems to be solved. In the new version, we use the quasi-Newton local search method with the BFGS update instead of the earlier DFP. The algorithm implementation is available for academic and nonprofit purposes.2
[Algorithm 3: pseudocode of the GLOBAL algorithm]

The main steps of GLOBAL are summarized in Algorithm 3. As a multi-start method, GLOBAL iteratively samples new points from the search space X according to the uniform distribution (global phase, line 3 of Algorithm 3), and executes a local search procedure starting from some of those points (local phase, line 8). GLOBAL differs from other multi-start methods in two important aspects:

  1. Not all the points sampled during the global phase (the cumulated sample SC) are considered to be good candidates for starting the local search. Only a small percentage of the best points is used (the reduced sample SR, line 4).

  2. These selected points are further filtered. The algorithm tries to prevent running a local search in the basin of attraction of an already detected local minimizer.

The filter is realized by a clustering procedure whose goal is to maintain one cluster per basin of attraction of a local optimum. GLOBAL uses the single linkage clustering rule (Boender et al., 1982; Rinnooy Kan and Timmer, 1987). The clusters are updated at each iteration and only grow with time. A new point is added to a cluster if it lies within a critical distance from a point already in that cluster. If the new point is not close enough to any already clustered point, it remains unclustered and is thus a candidate for starting a local search. The seed points of the clusters are the local optima found so far. The distribution of all the clustered points approximates a level set of the function; each connected component of the level set (each cluster) then approximates the shape of one basin of attraction around its respective local optimum.

The filtering (using clustering) is applied in each iteration after the creation of the reduced sample (line 4), and after the identification of a new seed point (line 12). In the first iteration, the set of known local minimizers is empty, and thus no clustering takes place. The applied critical distance depends on the total sample size |SC| and is constructed in such a way that the probability of starting the local method tends to zero as the size of the sample grows (Boender et al., 1982; Rinnooy Kan and Timmer, 1987). The algorithm stops the search when it does not find any new local minimizer during the last iteration.
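The filter can be sketched as a single-linkage proximity test; here the critical distance r_c, which in GLOBAL shrinks as the cumulated sample grows, is taken as a given parameter (a simplification of the actual rule):

```python
import math

def start_local_search(point, clustered_points, r_c):
    """GLOBAL-style filter, sketched: a local search is started from
    `point` only if it lies farther than the critical distance r_c
    from every already clustered point (single-linkage rule);
    otherwise the point is assumed to belong to an already explored
    basin of attraction."""
    for q in clustered_points:
        if math.dist(point, q) <= r_c:
            return False   # point joins an existing cluster instead
    return True
```

As the sample grows and r_c shrinks, fewer and fewer points pass the test, so the number of local searches stays bounded.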

2.4.  Reference Algorithms

Two other optimization algorithms were selected as competitors for the global search algorithms just described: the NEWUOA procedure and the bi-population covariance matrix adaptation evolution strategy (BIPOP-CMA-ES).

NEWUOA (Powell, 2006) was selected since it is a relatively recent optimization procedure with very promising reported results on various test functions. It is a deterministic (with the exception of initialization) local search procedure using quadratic modeling and a trust-region approach. The method maintains a quadratic model of the objective function in the trust region. Before each iteration, the model must interpolate the function at m points, with m typically equal to 2D+1, which is a much lower number of constraints than would be needed to specify a full quadratic model. The remaining degrees of freedom are taken up by minimizing the Frobenius norm of the difference between the new and the old quadratic model.
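For illustration, the m = 2D+1 interpolation conditions can be compared with the (D+1)(D+2)/2 coefficients that determine a full quadratic model in D variables (a standard count, not specific to NEWUOA):

```python
def newuoa_point_counts(D):
    """Typical NEWUOA interpolation point count (2D+1) versus the
    number of coefficients of a full quadratic model in D variables:
    1 constant + D linear + D*(D+1)/2 quadratic terms."""
    m = 2 * D + 1
    full_quadratic = (D + 1) * (D + 2) // 2
    return m, full_quadratic

# in 10 dimensions: 21 interpolation conditions instead of 66
```

The gap between the two counts is exactly the freedom that the Frobenius-norm minimization mentioned above takes up.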

BIPOP-CMA-ES (Hansen, 2009) was chosen since it was one of the best algorithms in the BBOB-2009 comparison regarding the proportion of functions solved (Hansen et al., 2010). It is a multi-start strategy using the original CMA-ES algorithm (with slightly modified parameter values) as the basic local search engine. The individual restarts differ only in the population size. Two strategies of population size setting are interlaced. The first strategy multiplies its population size by a factor of two each time it is executed. The second strategy chooses the population size randomly, somewhere between the initial minimal population size and half of the last population size used by the first strategy. Increasing the population size slows down the algorithm's convergence; on the other hand, it results in a more global and robust search.
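The interplay of the two restart regimes can be sketched as follows; strict alternation between the regimes is an assumption made here for brevity (the real BIPOP-CMA-ES picks the regime based on the budgets spent so far):

```python
import random

def bipop_popsizes(lam0, n_restarts, rng):
    """Sketch of BIPOP's two interlaced restart regimes: the first
    doubles the population size at each of its own restarts; the
    second draws a size between the initial size lam0 and half the
    largest size used so far by the first regime."""
    sizes, lam_large = [], lam0
    for k in range(n_restarts):
        if k % 2 == 0:                      # large-population regime
            lam_large *= 2
            sizes.append(lam_large)
        else:                               # small-population regime
            hi = max(lam_large // 2, lam0)
            sizes.append(rng.randint(lam0, hi))
    return sizes
```

The large-population runs provide the increasingly global search, while the interleaved small-population runs keep probing with cheap, fast-converging restarts.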

3.  Experimental Framework Description

The experiments were carried out using the COCO framework (Hansen, Auger, et al., 2009), which was also used as the basis for the black box optimization benchmarking (BBOB) workshops at the GECCO 2009 and 2010 conferences. The numerical experiments are performed on a testbed consisting of 24 noiseless test functions (Finck et al., 2009a; Hansen, Finck, et al., 2009). These functions reflect real-world application difficulties and are categorized by properties such as multimodality, ill-conditioning, global structure, and separability. The role of the categories is to reveal the different aspects of the algorithms. All functions are scalable with the dimension D, and their search domain is [−5, 5]^D. Each of the functions has five instances which differ in rotation and offset. The experiment is repeated three times for each instance, which means 15 trials for an algorithm on each function. Since DIRECT is a deterministic algorithm, only one trial of each instance was carried out.

An optimization problem is defined as a particular (function, requested target value) pair. Each function is used to define several optimization problems differing in the requested target value ft = fopt + Δf, where fopt is the optimal function value and Δf is the precision (or tolerance) to reach. The success criterion of a trial (for each optimization problem) is to reach the requested target value ft. Many precision levels are defined. If the optimizer solves a function to the ultimate precision Δf = 10^−8, it actually solves many optimization problems along the way, and we shall say that it has found the optimum of the function. If the optimizer cannot reach the ultimate precision, it can gain some points for optimizing the function at least partially.

The main performance measure used in the COCO framework is the expected running time, ERT (Hansen, Auger, et al., 2009; Price, 1997). The ERT estimates the expected number of function evaluations needed to reach the particular target function value if the algorithm is restarted until a single success. The ERT thus depends on the given target function value, ft, and is computed as “the number of function evaluations conducted in all trials, while the best function value was not smaller than ft during the trial, divided by the number of trials that actually reached ft” (Hansen, Auger, et al., 2009, p. 12).
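Following the quoted definition, a sketch of the ERT computation for a single target value (the function name is illustrative, not part of the COCO API):

```python
def expected_running_time(evals, successes):
    """ERT for one target value: evaluations summed over all trials,
    divided by the number of trials that reached the target.

    evals[i]     -- evaluations conducted in trial i while the best
                    f-value was still above the target (the full
                    trial length for unsuccessful trials)
    successes[i] -- True if trial i reached the target
    """
    n_succ = sum(successes)
    if n_succ == 0:
        return float('inf')   # target never reached in any trial
    return sum(evals) / n_succ
```

For example, two successful trials of 100 and 300 evaluations plus one unsuccessful trial of 500 evaluations give an ERT of (100 + 500 + 300) / 2 = 450 evaluations.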

The results are presented using the empirical cumulative distribution function (ECDF). The ECDF shows the empirical cumulative probability of success on the considered problems depending on the allocated budget. The ECDF of the ERT is constructed as a bootstrap distribution of the ERT divided by the problem dimension D. In the bootstrapping process, 100 instances of ERT are generated by repeatedly drawing single trials with replacement until a successful trial is drawn for each optimization problem.
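One bootstrapped instance corresponds to a simulated restart sequence: trials are drawn with replacement and their evaluation counts summed until a successful trial is drawn. A minimal sketch (not the framework's actual code):

```python
import random

def bootstrap_run_length(evals, successes, rng):
    """One bootstrapped run length for a single problem: keep drawing
    trials with replacement, accumulating their evaluation counts,
    until a successful trial is drawn."""
    total = 0
    while True:
        i = rng.randrange(len(evals))
        total += evals[i]
        if successes[i]:
            return total

# repeating this 100 times per problem yields the distribution whose
# ECDF (divided by the dimension D) is plotted in Figures 1 and 2
```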

Since the ECDF graphs express the proportion of solved problems, rather than the reached function values, it is possible to meaningfully aggregate the ECDF graphs for several functions of the same class into one graph. The downside of this aggregation is that we are not able to distinguish the individual functions. In an ECDF graph aggregating the results of five functions, reaching the 20% level of solved problems after n evaluations may mean many things. On the one hand, the algorithm could have found the minimum of one of the five functions, while the other functions may still remain completely unsolved. On the other hand, it may mean that only the problems related to the loose target levels were solved across all the aggregated functions. The latter case is the usual one. If the former explanation is the right one, we will point it out explicitly.

An additional measure used in COCO is the crafting effort (Price, 1997; Hoos and Stützle, 1998), which characterizes the parameter tuning effort for an algorithm. The crafting effort is calculated for each dimension as CrE = −Σ_{k=1..K} (nk/n) ln(nk/n), where K is the number of different parameter settings, n is the number of functions in the testbed, and each nk is the number of functions for which the kth parameter setting was used. The CrE is zero in a given dimension D when the setting was identical for all functions.
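A sketch of the computation, assuming the entropy form CrE = −Σ (nk/n) ln(nk/n), which is consistent with CrE = 0 for a single setting used on all functions:

```python
import math

def crafting_effort(nk):
    """Crafting effort from the list nk of per-setting function
    counts; n = sum(nk) is the total number of functions in the
    testbed. A single setting gives zero tuning effort."""
    n = sum(nk)
    return -sum(c / n * math.log(c / n) for c in nk)

# one setting for all 24 functions gives zero tuning effort;
# a 5/19 split over 24 functions gives roughly 0.51
```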

4.  Algorithm and Experiment Parameter Settings

This section describes the experimental setup and the parameter settings of DIRECT, MCS, and GLOBAL. For the settings of the reference algorithms, we refer the reader to the original reports (Ros, 2009b; Hansen, 2009).

All experiments were run using the BBOB-2009 settings which required us to benchmark all the algorithms in the dimensions D=2, 3, 5, 10, 20 and optionally in D=40. In this article, we do not consider the 40D case since the 20D space is already enough to show the main characteristics of the individual algorithms and to emphasize the differences among them.

4.1.  DIRECT

The DIRECT algorithm was not restarted; a single run was carried out and stopped after reaching the final precision or after 10^5 function evaluations.

The Jones factor is the minimal amount of improvement which is considered to be significant by the algorithm. The value was set to .

The maximal depth of the division tree was set to 21. This is roughly equivalent to setting the minimal allowed distance between two neighboring sampled points (under the assumption that the division always takes place along the shortest box side, which is not true). With the maximal depth set to 21, the theoretical minimal distance is on the order of 10^−9, but it is larger in practice.
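As a rough check of the 10^−9 figure (under the stated, optimistic assumption that every split trisects the same side, and taking an illustrative initial width of 10):

```python
def min_box_side(width, depth):
    """Shortest possible box side after `depth` trisections along one
    coordinate (DIRECT splits a chosen side into three equal
    parts)."""
    return width / 3 ** depth

# for an initial width of 10 and the maximal depth 21, the result is
# on the order of 1e-9, matching the order of magnitude quoted above
```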

The initial bounding hypercube was set larger than the search domain, despite the fact that all the benchmark functions have their global optimum in [−5, 5]^D (Hansen, Auger, et al., 2009). Several of the benchmark functions have a global optimum near (or directly on) the search space boundary. Since DIRECT is pretty bad at approaching such solutions, the larger box was chosen.3

DIRECT is completely deterministic—only one run (instead of three) for each function instance was carried out. The same parameter settings were used for all experiments on all functions, and the crafting effort is CrE=0 for all D.

4.2.  MCS

MCS is equipped with meaningful default values for all parameters. We use smax = 5D + 10 (the default) for the number of levels, a limit on the overall number of function calls, a limit on the number of iterations in a local search much larger than the default value 50, and reaching the final precision as an additional stopping criterion.

The bounding box is given by u = (−5, …, −5) and v = −u. The default MCS initialization list for finite u and v consists of the boundaries and the midpoint, with the midpoint as the starting point. The second initialization list for finite bounds uses a different fixed set of values in each coordinate, again with the midpoint as the starting point. The third option is to use global line searches along each coordinate, starting from the absolutely smallest point in the box, and to generate at least three values for each coordinate. We call the MCS algorithm with these three kinds of initialization lists MCS1, MCS2, and MCS3, respectively. A user-defined initialization list is another option. After the initialization list has been chosen, MCS is purely deterministic.

In order to give MCS another chance to solve a problem in case the algorithm gets stuck in a nonglobal minimizer, and to introduce a random element into the algorithm at the same time (so that repeating the experiment three times for each instance becomes meaningful), we do not make a single call to MCS with a larger function evaluation budget; instead, each experiment consists of up to 10 independent calls to MCS with the above parameters (i.e., each call to MCS does not use any results of the previous calls). First, MCS1, MCS2, and MCS3 are applied to the problem. Then initialization lists with the values x1i < x2i < x3i drawn uniformly from [ui, vi] for each coordinate i, with x0i = x2i, are used at most seven times for the dimensions D = 2, 3, 5 and at most five times for the dimensions D = 10, 20 (in order to save CPU time).

Since the same parameter settings were used for all experiments on all functions, the crafting effort is CrE=0 for all D.

4.3.  GLOBAL

The COCO framework suggests a comparison of the multi-start versions of the base algorithms, that is, to conduct independent restarts during each trial. However, the GLOBAL algorithm itself is a multi-start procedure, so no restarts of GLOBAL were carried out.

GLOBAL has six parameters to set: the number N of points to sample in each iteration, the proportion of the best points selected for the reduced sample, the stopping criterion for the local search, the maximum number of function evaluations allowed for local search, the maximum number of local minima to be found (i.e., the maximum number of clusters to be maintained), and the type of local search method. All these parameters have a default value and usually it is enough to change only the first three of them.

In all dimensions and for all functions, we sampled N = 300 new points in each iteration,4 so that the reduced sample contains less than 1% of the best points ever sampled.

The following settings were used for D=2, 3, 5. We used the Nelder-Mead simplex method (Nelder and Mead, 1965) implemented in MATLAB by Kelley (1999) as the local search procedure. The termination tolerance parameter TolFun was set to 10^−8 and the maximum number of function evaluations was equal to 5,000.

For D=10, 20, two different settings were used. For the functions f3, f4, f7, f16, and f23, we used the previous settings with the TolFun parameter set to 10^−9. The reason for this choice was that the functions f7, f16, and f23 are not smooth, and the BFGS method performs worse on them. On functions f3 and f4, the simplex method performs slightly better. For the remaining functions, we used the MATLAB fminunc function as the local search method using the BFGS update formula with 10,000 as the maximum number of function evaluations and with TolFun set to 10^−9. The meaning of the termination tolerance TolFun is different for each of the two local search methods. In the case of the Nelder-Mead simplex method, it is related to the diameter of the simplex, while in the case of BFGS, it relates to the size of the gradient.

The crafting effort CrE is equal to 0 for dimensions 2, 3, and 5. However, for D=10, 20, two different settings were used: one for 5 functions and the other for the remaining 19. The crafting effort can thus be calculated as CrE = −(5/24 ln(5/24) + 19/24 ln(19/24)) ≈ 0.51.

5.  Results

The results from experiments according to Hansen, Auger, et al. (2009) on the benchmark functions (Finck et al., 2009b; Hansen, Finck, et al., 2009) are presented in Figures 1 and 2. Only the results for D=5 (exemplar of low dimensionality) and D=20 (exemplar of higher dimensionality) are presented.

Figure 1:

Empirical cumulative distribution of the bootstrapped distribution of ERT over dimension for 50 targets in 10^[−8‥2] for all functions and subgroups in 5D. The best ever line corresponds to the algorithms from BBOB-2009 with the best ERT for each of the targets considered.


Figure 2:

Empirical cumulative distribution of the bootstrapped distribution of ERT over dimension for 50 targets in 10^[−8‥2] for all functions and subgroups in 20D. The best ever line corresponds to the algorithms from BBOB-2009 with the best ERT for each of the targets considered.


Tables 1 to 10 give the ERT for the target precisions 10^{1, 0, −1, −3, −5, −7} divided by the best ERT obtained during BBOB-2009 (given in the ERTbest row), together with a measure of its spread (the value typeset in parentheses with a smaller font gives half of the range between the 10th and 90th percentile). Bold entries correspond to the three best values among the algorithms compared. The median number of conducted function evaluations is additionally given in italics if the final target was never reached. The number of trials that reached the final target fopt + 10^−8 is given as #succ.

Table 1:
ERT on f1–f5 in 5D over ERTbest obtained in BBOB-2009.
1 Sphere
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ
ERT 11 12 12 12 12 12 15/15
BIPOP-CMA-ES 3.2 9.0 15 27 40 53 15/15
GLOBAL 6.8 26 28 32 35 39 13/15
DIRECT 2.0 7.0 19 44 84 153 5/5
MCS 1 1.8 2.5 2.6 2.6 2.6 15/15
NEWUOA 1.1 1 1 1 1 1 15/15
2 Ellipsoid separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 83 87 88 90 92 94 15/15 
BIPOP-CMA-ES 13 16 18 20 21 22 15/15 
GLOBAL 6.3 6.9 7.3 7.8 8.2 8.5 15/15 
DIRECT 5.7 7.2 8.4 14 22 381 4/5 
MCS 1.1 1.5 2.2 4.7 6.5 29 14/15 
NEWUOA 5.7 22 45 85 129 166 15/15 
3 Rastrigin separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 716 1,622 1,637 1,646 1,650 1,654 15/15 
BIPOP-CMA-ES 1.4 16 139 139 139 140 14/15 
GLOBAL 3.3      2613 0/15 
DIRECT 45 304     1e5 0/5 
MCS 1.2 24 216 215 214 214 2/15 
NEWUOA 6.1 229     3e4 0/15 
4 Skew Rastrigin-Bueche separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 809 1,633 1,688 1,817 1,886 1,903 15/15 
BIPOP-CMA-ES 2.7      2e6 0/15 
GLOBAL 8.3      3,167 0/15 
DIRECT 192 105 249    1e5 0/5 
MCS 4.1      5e4 0/15 
NEWUOA 27 305     3e4 0/15 
5 Linear slope 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 10 10 10 10 10 10 15/15 
BIPOP-CMA-ES 4.5 6.5 6.6 6.6 6.6 6.6 15/15 
GLOBAL 32 33 34 34 34 34 15/15 
DIRECT 9.2 12 13 13 13 13 5/5 
MCS 1 1 1 1 1 1 15/15 
NEWUOA 1.3 1.5 1.5 1.5 1.5 1.5 15/15

For notation, see text.

The BIPOP-CMA-ES algorithm was used as the baseline for the statistical comparison of the other algorithms studied in this article. Each algorithm is tested for whether it improved on the results obtained by BIPOP-CMA-ES. The statistical significance is tested with the rank-sum test for a given target ft using, for each trial, either the number of function evaluations needed to reach ft (inverted and multiplied by −1), or, if the target was not reached, the best Δf-value achieved, measured only up to the smallest number of overall function evaluations for any unsuccessful trial under consideration, if available. Entries marked with the ↓ symbol are statistically significantly better (according to the rank-sum test) than BIPOP-CMA-ES, with p = .05 or p = 10^−k, where k > 1 is the number following the symbol, with a Bonferroni correction of 24.
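The test above can be sketched as follows. This is an illustrative stdlib-only implementation using the normal approximation to the rank-sum statistic (with midranks for ties but no tie correction of the variance); it is not the exact procedure of the benchmarking framework, and the function names are ours:

```python
import math
import statistics

def ranksum_p(x, y):
    """Two-sided Wilcoxon-Mann-Whitney rank-sum p-value via the
    normal approximation; tied values receive midranks."""
    nx, ny = len(x), len(y)
    pooled = sorted(x + y)
    midrank = {}
    i = 0
    while i < nx + ny:
        j = i
        while j < nx + ny and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + j + 1) / 2  # average of ranks i+1 .. j
        i = j
    w = sum(midrank[v] for v in x)            # rank sum of sample x
    mean = nx * (nx + ny + 1) / 2
    sd = math.sqrt(nx * ny * (nx + ny + 1) / 12)
    if sd == 0:
        return 1.0
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))   # 2 * (1 - Phi(|z|))

def better_than_baseline(algo, baseline, n_tests=24, alpha=0.05):
    """Bonferroni-corrected significance; lower trial scores are
    better under the scoring convention described in the text."""
    return (ranksum_p(algo, baseline) < alpha / n_tests
            and statistics.median(algo) < statistics.median(baseline))
```

With 24 hypotheses tested, each individual comparison must reach p < 0.05/24 ≈ 0.0021 before it is reported as significant.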

6.  Discussion by Function Group

In this section, the discussion of the results is broken down by function groups. The discussion mostly applies to the presented results for 5D and 20D. For a discussion on the individual algorithms, see Section 7.

6.1.  All Functions Aggregated

The results for all functions are aggregated in the ECDF graphs of ERT in the upper left parts of Figures 1 and 2 for the 5D and 20D functions, respectively.
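The ECDF curves are built from bootstrapped runtimes: recorded trials are resampled with replacement, simulating restarts of an unsuccessful run, and their evaluation counts are summed until a successful trial is drawn. A rough sketch under that reading (names are ours, not the framework's API):

```python
import random

def bootstrap_runtime(evals, successes, rng=random):
    """One bootstrapped runtime: resample recorded trials with
    replacement (simulated restarts), summing evaluation counts,
    until a successful trial is drawn. At least one trial must be
    successful, otherwise the simulated runtime is unbounded."""
    total = 0
    while True:
        i = rng.randrange(len(evals))
        total += evals[i]
        if successes[i]:
            return total

def ecdf_at(runtimes, budget):
    """Fraction of bootstrapped runtimes that fit within the budget."""
    return sum(r <= budget for r in runtimes) / len(runtimes)
```

Sweeping the budget over a range of #FEs values and plotting ecdf_at against it, aggregated over all function/target pairs, yields curves of the kind shown in the figures.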

In the 5D space, for very low budgets of function evaluations (#FEs<20D), NEWUOA and MCS are (close to) the best of all algorithms ever compared using the BBOB methodology. They stay the best among the algorithms compared in this article for #FEs<200D. For 200D<#FEs<500D, GLOBAL takes over, solving the highest proportion of the problems. For budgets larger than 500D, however, BIPOP-CMA-ES is the best algorithm, solving almost 100% of the problems, while the other algorithms solve about 65% of the problems, with GLOBAL being fastest, followed by NEWUOA, MCS, and DIRECT.

In the 20D space, the differences start being more pronounced. For low evaluation budgets (#FEs<100D), NEWUOA holds the lead, closely followed by MCS. For 100D<#FEs<1000D, GLOBAL followed by NEWUOA is most successful. And again, for budgets larger than 1000D, BIPOP-CMA-ES is the best, solving about 92% of the problems, followed by NEWUOA, GLOBAL, MCS, and DIRECT solving about 60%, 50%, 40%, and 20% of the problems, respectively.

6.2.  Separable Functions f1–f5

The results for the separable functions f1–f5 are aggregated in the ECDF graphs of ERT in the upper right parts of Figures 1 and 2 for the 5D and 20D functions, respectively. The detailed results are presented in Table 1 for the 5D functions, and in Table 2 for the 20D functions.

Table 2:
ERT on f1–f5 in 20D over ERTbest obtained in BBOB-2009.
1 Sphere
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ
ERT 43 43 43 43 43 43 15/15
BIPOP-CMA-ES 7.9 14 20 33 45 57 15/15
GLOBAL 8.0 8.0 8.0 8.0 8.0 8.0 15/15
DIRECT 48 112 225 485 874 1,393 4/5
MCS 2.4 6.4 6.8 7.0 7.0 7.0 15/15
NEWUOA 1.0 1.0 1.0 1.0 1.0 1.0 15/15
2 Ellipsoid separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 385 386 387 390 391 393 15/15 
BIPOP-CMA-ES 35 40 44 47 48 50 15/15 
GLOBAL 18 23 26 33 51 63 13/15 
DIRECT 134 471 487 537   1e5 0/5 
MCS 5.4 14 21 43 45  8e4 0/15 
NEWUOA 18 42 71 125 174 219 15/15 
3 Rastrigin separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 5,066 7,626 7,635 7,643 7,646 7,651 15/15 
BIPOP-CMA-ES 12      6e6 0/15 
GLOBAL       5e4 0/15 
DIRECT       1e5 0/5 
MCS 28      8e4 0/15 
NEWUOA       1e5 0/15 
4 Skew Rastrigin-Bueche separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 4,722 7,628 7,666 7,700 7,758 1.41e5 9/15 
BIPOP-CMA-ES       6e6 0/15 
GLOBAL       8e4 0/15 
DIRECT       1e5 0/5 
MCS       8e4 0/15 
NEWUOA       2e5 0/15 
5 Linear slope 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERT 41 41 41 41 41 41 15/15 
BIPOP-CMA-ES 5.1 6.2 6.3 6.3 6.3 6.3 15/15 
GLOBAL 10 11 11 11 11 11 15/15 
DIRECT 180 224 226 226 226 226 5/5 
MCS 1 1 1 1 1 1 15/15
NEWUOA 1.2 1.5 1.6 1.6 1.6 1.6 15/15 