Abstract

Several local search algorithms for real-valued domains (axis parallel line search, Nelder-Mead simplex search, Rosenbrock's algorithm, quasi-Newton method, NEWUOA, and VXQR) are described and thoroughly compared in this article, each embedded in a multi-start method. The comparison aims (1) to help researchers from the evolutionary community choose the right opponent for their algorithm (that is, an opponent that constitutes a hard-to-beat baseline), (2) to describe individual features of these algorithms and show how they influence the algorithms' behavior on different problems, and (3) to provide inspiration for the hybridization of evolutionary algorithms with these local optimizers. The recently proposed Comparing Continuous Optimizers (COCO) methodology was adopted as the basis for the comparison. The results show that in low dimensional spaces, the old method of Nelder and Mead is still the most successful among those compared, while in spaces of higher dimensions, it is better to choose an algorithm based on quadratic modeling, such as NEWUOA or a quasi-Newton method.

1  Introduction

Local search algorithms still constitute a very popular class of optimization problem solvers. Often, they are easy to use, easy to understand, and even easy to construct. In the black box scenario, they are usually the first choice when an experimenter needs to solve a problem.

Evolutionary algorithms (EAs) are not primarily used to solve unimodal problems, for which local search algorithms are suitable. On the other hand, unimodal problems (1) often constitute basic test cases that should ensure that the EA does not fail on a trivial function, and (2) can be made hard for a large subset of EAs, for example, by making them ill-conditioned. Moreover, even if a particular EA is effective on unimodal problems, this is usually not sufficient; we also want it to be efficient, that is, about as fast as a good local optimizer (and of course, to also be effective and hopefully efficient on multimodal problems). Real-valued EAs should thus be compared to good exemplars of local optimizers.

The results of this article will also be of high interest for the designers of memetic algorithms, MAs (Moscato, 1989). MAs are hybrids between EAs and local search techniques, which often exhibit both the robustness of EAs and the speed of local search methods at the same time. It is an important asset to know which local search algorithm is suitable for a particular situation. This article compares several local search methods and shows which of them are good under which circumstances.

Seven different algorithms for real-valued black box optimization are compared. The first two fall into the class of axis parallel line search, LS (Whitley et al., 1996). They differ in the method used for the univariate search along the axes. One of them, LSfminbnd, uses the MATLAB function fminbnd, a univariate bounded local search method. The second one, LSstep, uses the STEP procedure (Swarzberg et al., 1994), a univariate bounded global search method.

The third method was proposed by Rosenbrock (1960). His algorithm, RA, bears some similarity to the line search: it also searches along mutually perpendicular directions in the space. However, in RA, the steps in individual directions alternate and the directions are adaptive.

The next method, valley exploration based on QR factorization, VXQR1 (Neumaier et al., 2011), is rather recent. Line searches along adaptive perpendicular directions are also part of this method, similar to RA. VXQR1 also adds quadratic modeling of the objective function and a certain amount of stochastic elements, which should help it avoid getting stuck in local optima.

The above mentioned four algorithms are described and their results are discussed in detail since we performed the experiments with them ourselves. We chose three other algorithms for this comparison as a reference. For the reference algorithms, we use the results obtained by others. Consequently, the description of these focuses on their main principles only, and the discussion is limited.

BFGS, a restarted quasi-Newton method with the BFGS update formula (as implemented in the MATLAB function fminunc), and NEWUOA (Powell, 2006), a recent successful local search method, were selected since, similarly to VXQR1, they also model the objective function by a quadratic model, though of course in a different way. The last method in the comparison is the well-known Nelder-Mead simplex search method, NMSS (Nelder and Mead, 1965). It differs from the above mentioned algorithms in that it maintains a population of points (while the others use only a single point, possibly complemented with a certain kind of model of its neighborhood).

The COCO (Comparing Continuous Optimizers) methodology (Hansen, Auger, et al., 2009) was chosen as the tool for the comparison. The framework is able to show the differences among the algorithms at all stages of the search, not just after a certain number of evaluations, as is the usual practice. It was used as the basis of the black box optimization benchmarking (BBOB) workshops of the GECCO-2009 and 2010 conferences. The testbed consists of 24 carefully chosen noiseless benchmark functions (Hansen et al., 2009b) which represent various types of difficulties observed in real-world problems (ill-conditioning, multimodality, etc.). The dimension of the search space varied from two to 40 during the experiments.

The results of the algorithms obtained using the COCO framework were already separately presented as workshop articles (Pošík, 2009a, 2009b; Ros, 2009a, 2009b; Hansen, 2009). The results for VXQR1 are, however, presented for the first time and constitute one of the original contributions of this article. Another goal of this article is to collect the results of all the above mentioned algorithms, compare them conveniently in one place, provide a discussion of the pros and cons of the algorithms, and suggest successful exemplars of local search methods. In the above mentioned original articles, the discussion (if any) was based solely on the results of the respective algorithm and no comparison was made. We also discuss the results in more detail than the summary article of Hansen et al. (2010).

There are some prior works that attempt to systematically compare various direct search methods (e.g., Schwefel, 1995). These comparisons, however, differ in the set of chosen optimizers and in the set of benchmark problems. Even for the benchmark functions which the studies have in common with the BBOB function set, a direct comparison of the results is questionable. The COCO framework transforms individual design variables with smooth nonlinear monotonic functions to break the symmetries observed in many benchmark functions. It also uses the functions in a rotated and shifted form. All of these features make the functions effectively different and render the results incomparable.

The rest of the article is organized as follows. After describing the algorithms in Section 2, the experimental framework in Section 3, and the experiment and algorithm parameter settings in Section 4, the article continues with the presentation of the benchmarking results in Section 5 and discusses them in Sections 6 and 7. The discussion is broken down by the individual function groups and by the algorithms, respectively. The article is summarized and concluded in Section 8.

2  Local Search Algorithms

The algorithms used in the comparison are described in this section. In the experiments, they are embedded into the multi-start method, that is, they are restarted after they detect slow progress, or after the budget for a single run is exhausted. The following subsections describe the LS, RA, and VXQR1 algorithms in detail, and briefly also the reference algorithms, BFGS, NEWUOA, and NMSS.

2.1  Axis Parallel Line Search

The line search algorithm is one of the basic and simplest optimization algorithms. In any comparison, it should serve as a baseline algorithm (Whitley et al., 1996). The axis parallel line search is effective and often also efficient for separable functions. The results of the line search algorithm should thus indicate those test functions which are (nearly) separable or functions which are easy for algorithms exploiting separability. The line search algorithm is not expected to be effective for nonseparable functions.

The algorithm starts from a randomly selected point. Then it iterates through individual directions and optimizes the function with respect to the chosen direction, keeping the other solution components fixed. After the optimization of one direction, it moves to the best solution found and switches to another direction. If the solution does not change after going through all the directions, the algorithm finishes since a local optimum (with respect to its neighborhood) was found.
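The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation benchmarked in this article: the univariate optimizer is a plain golden-section search (a stand-in for fminbnd or STEP), and the names `golden_section`, `axis_parallel_line_search`, `move_tol`, and `max_cycles` are hypothetical. The starting point is passed in explicitly so that the example is deterministic.

```python
def golden_section(g, lo, hi, tol=1e-8):
    """Minimize a unimodal 1D function g on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = g(c), g(d)
    while b - a > tol:
        if fc < fd:            # the minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = g(c)
        else:                  # the minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = g(d)
    return (a + b) / 2


def axis_parallel_line_search(f, x0, lo=-5.0, hi=5.0, move_tol=1e-10, max_cycles=100):
    """Optimize one coordinate at a time, keeping the other components fixed."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_cycles):
        start = list(x)
        for i in range(len(x)):
            def along_axis(t, i=i):
                trial = list(x)
                trial[i] = t
                return f(trial)
            t_best = golden_section(along_axis, lo, hi)
            f_new = along_axis(t_best)
            if f_new < fx:     # move only to a better solution
                x[i], fx = t_best, f_new
        moved = sum((p - q) ** 2 for p, q in zip(x, start)) ** 0.5
        if moved < move_tol:   # a full cycle changed (almost) nothing: stop
            break
    return x, fx
```

On a separable function such as a shifted sphere, a single pass over the coordinates already lands very close to the optimum, which is the behavior the paragraph above attributes to the axis parallel line search.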

In this article, two multi-start versions of the axis parallel line search method are considered and compared; each trial begins at a different initial point chosen uniformly from the search space. The two versions differ in the univariate optimization technique used, either the MATLAB function fminbnd or the STEP algorithm. Neither of the resulting multivariate optimization algorithms is invariant with respect to rotation of the search space. Both algorithms also directly use the objective function values when deciding where to sample a new point, and thus are not invariant with respect to order-preserving transformations of the objective function.

2.1.1  Line Search with fminbnd

The MATLAB function fminbnd (revision 1.18.4.11 was used) is based on the golden-section search and parabolic interpolation. It is able to identify the optimum of quadratic functions in a few steps. On the other hand, it is a local search technique; it can miss the global optimum (of the 1D function). Since this is a rather standard ready-to-use algorithm, it will not be described here in more detail.

2.1.2  Line Search with STEP

The acronym STEP stands for “Select The Easiest Point.” The STEP method (Swarzberg et al., 1994) is a univariate global search algorithm based on interval division. It starts from one interval initialized with xl and xu, lower and upper bound of the interval, with both points evaluated. In each iteration, it selects one interval and divides it in half by sampling and evaluating the point in the middle.

The STEP algorithm selects the interval used to sample the next point based on the interval difficulty, that is, by its belief of how difficult it would be to improve the best-so-far solution by sampling from the respective interval. The measure of the interval difficulty chosen in STEP is the value of the coefficient $a$ of the quadratic function $f(x) = ax^2 + bx + c$ which goes through both interval boundary points and somewhere on the interval reaches the value $f_{\mathrm{best}} - \varepsilon$ (where $\varepsilon$ is a small positive number, typically from $10^{-3}$ to $10^{-8}$, meaning an improvement of $f_{\mathrm{best}}$ by a nontrivial amount). An example of a few STEP iterations can be seen in Figure 1.
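The difficulty measure can be made concrete with a short derivation (ours, based only on the description above; the original STEP paper may write it differently). Let $y_l = f(x_l) - (f_{\mathrm{best}} - \varepsilon)$ and $y_u = f(x_u) - (f_{\mathrm{best}} - \varepsilon)$. A parabola through both boundary points that just touches the target level inside the interval has the coefficient $a = \left(\sqrt{y_l} + \sqrt{y_u}\right)^2 / (x_u - x_l)^2$. The sketch below uses this formula and always halves the interval with the lowest difficulty. The function names and the default values of `eps`, `max_difficulty`, and `min_width` are assumptions chosen for illustration.

```python
import math


def step_difficulty(xl, fl, xu, fu, f_target):
    """Coefficient a of the parabola through (xl, fl) and (xu, fu) touching f_target."""
    yl = fl - f_target   # positive, because f_target lies below f_best
    yu = fu - f_target
    return (math.sqrt(yl) + math.sqrt(yu)) ** 2 / (xu - xl) ** 2


def step_minimize(f, xl, xu, eps=1e-8, max_evals=1000,
                  max_difficulty=1e7, min_width=1e-10):
    """Univariate search by interval halving, always choosing the easiest interval."""
    xs = [xl, xu]
    fs = [f(xl), f(xu)]
    evals = 2
    while evals < max_evals:
        target = min(fs) - eps
        best_i, best_a = None, math.inf
        for i in range(len(xs) - 1):
            if xs[i + 1] - xs[i] < min_width:
                continue                      # interval too narrow to split
            a = step_difficulty(xs[i], fs[i], xs[i + 1], fs[i + 1], target)
            if a <= max_difficulty and a < best_a:
                best_i, best_a = i, a
        if best_i is None:                    # no interval left to sample
            break
        xm = 0.5 * (xs[best_i] + xs[best_i + 1])
        xs.insert(best_i + 1, xm)
        fs.insert(best_i + 1, f(xm))
        evals += 1
    k = min(range(len(xs)), key=fs.__getitem__)
    return xs[k], fs[k]
```

As a sanity check of the formula, the interval $[0, 2]$ with boundary values 1 and target level 0 has difficulty 1, which matches the parabola $(x-1)^2$.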

Figure 1:

Demonstration of the behavior of the STEP algorithm. Bold line: the objective function. Dots: previously sampled data points, interval boundaries. Dashed horizontal line: $f_{\mathrm{best}}$ level. Solid horizontal line: $f_{\mathrm{best}} - \varepsilon$ level. Parabolic curves: start and end in the interval boundaries and touch the objective level $f_{\mathrm{best}} - \varepsilon$. Large square: last sampled data point. Vertical line: the place where the next point will be sampled. Numbers: the difficulty indices of the respective intervals, that is, the coefficients $a$ of the respective parabolas.

2.2  Rosenbrock's Method

The Rosenbrock algorithm, RA (Rosenbrock, 1960), is a classical local search technique for unconstrained black box optimization. It maintains the best-so-far solution and searches in its neighborhood for improvements. What distinguishes this algorithm from many other local search techniques is the fact that it also maintains a model of the current local neighborhood: it adapts the model orientation and size. This feature can be observed in many recent successful optimization techniques, for example, in CMA-ES (Hansen and Ostermeier, 2001).

The RA local search technique is depicted as Algorithm 1. The model of the local neighborhood consists of $D$ vectors $e_1, \ldots, e_D$ forming an orthonormal basis, and of $D$ multipliers (or step lengths) $d_1, \ldots, d_D$, where $D$ is the dimensionality of the search space. In each iteration, the algorithm performs a kind of pattern line search along the directions given by the orthonormal basis. If an improvement is found in one direction $e_i$, then the next time (after trying all other directions), a point $\alpha$ times farther in that direction is sampled; if no improvement is found in the $e_i$ direction, then the next time, a closer point on the other side is sampled (governed by the $\beta$ parameter). Usually, the values of parameters are and .

As soon as at least one successful and one unsuccessful move in each direction are carried out, the algorithm updates its orthonormal basis to reflect the cumulative effect of all successful steps in all directions. It also resets the multipliers to their original values (not used in the current implementation). The update of the orthonormal basis is done using Palmer's orthogonalization method (Palmer, 1969) so that the first basis vector is always parallel to the last vector $x - x_0$.
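The following Python sketch illustrates the mechanics described above under several loudly stated assumptions. The expansion and contraction factors (`alpha=3.0`, `beta=0.5`) are the classical textbook choices and are not claimed to be the values used in this article. The basis update uses plain Gram-Schmidt orthogonalization in place of Palmer's method. The names `rosenbrock_search`, `gram_schmidt`, `d0`, `min_step`, and `reset_steps` are hypothetical. With `reset_steps=False`, the multipliers are kept across basis updates, as in the variant discussed in this article.

```python
def gram_schmidt(vectors):
    """Orthonormalize vectors in order; nearly dependent ones are dropped."""
    basis = []
    for v in vectors:
        v_norm = sum(c * c for c in v) ** 0.5
        w = list(v)
        for q in basis:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        w_norm = sum(c * c for c in w) ** 0.5
        if v_norm > 0.0 and w_norm > 1e-8 * v_norm:
            basis.append([c / w_norm for c in w])
    return basis


def rosenbrock_search(f, x0, alpha=3.0, beta=0.5, d0=0.1,
                      max_evals=5000, min_step=1e-9, reset_steps=False):
    """Sketch of Rosenbrock's method; alpha, beta, d0 are assumed values."""
    n = len(x0)
    x, fx, evals = list(x0), f(x0), 1
    basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [d0] * n                   # step lengths (multipliers)
    lam = [0.0] * n                # cumulative successful step per direction
    succeeded, failed = [False] * n, [False] * n
    # The budget is checked once per sweep, so it may be exceeded by up to n - 1.
    while evals < max_evals and max(abs(di) for di in d) >= min_step:
        for i in range(n):
            cand = [xj + d[i] * ej for xj, ej in zip(x, basis[i])]
            fc = f(cand)
            evals += 1
            if fc < fx:            # success: accept and lengthen the step
                x, fx = cand, fc
                lam[i] += d[i]
                d[i] *= alpha
                succeeded[i] = True
            else:                  # failure: shorter step on the other side
                d[i] *= -beta
                failed[i] = True
        if all(succeeded) and all(failed):
            # a_k = sum_{i >= k} lam_i * e_i; a_0 is the total move of this stage
            vectors = [[sum(lam[i] * basis[i][j] for i in range(k, n))
                        for j in range(n)] for k in range(n)]
            new_basis = gram_schmidt(vectors)
            if len(new_basis) == n:
                basis = new_basis  # first vector is parallel to the total move
            lam = [0.0] * n
            succeeded, failed = [False] * n, [False] * n
            if reset_steps:
                d = [d0] * n
    return x, fx, evals
```

Because only comparisons `fc < fx` are used, the sketch shares the invariance to order-preserving transformations of the objective function that is noted for RA.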

The demonstration of the RA behavior on the 2D sphere and the 2D Rosenbrock function can be seen in Figure 2.
Algorithm 1: Rosenbrock's algorithm, RA (listing not reproduced here).
Figure 2:

The behavior of Rosenbrock's optimization algorithm on the sphere function (left) and on Rosenbrock's function (right). The circles indicate successful steps.

To improve the performance on multimodal functions, a restarting strategy was used. Each restart begins with an initial point uniformly chosen from the search space. The original RA resets the multipliers $d_i$ at each stage of the algorithm (line 14 of Algorithm 1), that is, each time the orthonormal basis is updated. In the particular implementation used in this article, the multipliers are not reset. It was observed (Pošík, 2010) that this modification improves the results on many (mostly low-dimensional) benchmark problems: it saves the function evaluations otherwise needed to re-adapt the multipliers at each stage, so the algorithm converges faster, allowing for more restarts.

With the exception of initialization, the algorithm is invariant with respect to translation and rotation. The algorithm is also invariant with respect to order-preserving transformations of the objective function since it uses only comparisons between two individuals.

2.3  Valley Exploration Based on QR Factorization (VXQR)

Based on the results of the BBOB-2009 comparison (Hansen et al., 2010), Neumaier et al. (2011) developed the class of VXQR algorithms (valley exploration based on QR factorizations) for bound-constrained optimization problems with the aim of preserving the advantages of the multilevel coordinate search, MCS (Huyer and Neumaier, 1999), in case of a low budget or low dimensions while improving the performance in case of a generous budget and in high dimensions. The main features of MCS, (1) a successful initialization strategy and (2) local searches building quadratic models, developed into the scout phase and the subspace phase of the new algorithm, respectively. The deterministic multilevel strategy of dividing boxes was replaced by stochastic techniques in order to prevent getting stuck in a nonglobal minimum. VXQR1 is a particular realization of this class of algorithms tuned to yield good results on a set of test problems. In the following paragraphs, we present the main characteristics of the algorithm.

The VXQR1 algorithm uses two kinds of line searches: the global line search and the local line search. Both contain random elements and form the stochastic part of the algorithm.

The global line search has two parts. In the first part, it tries to improve the current best point by evaluating the function at 10 new points (five on each side of the best point) on a random subsegment of a line along the search direction. This procedure is executed as long as an improvement of the best point is found, but at most 10 times. In the second part, a further improvement of the best solution is sought by safeguarded quadratic and piecewise linear interpolation steps made from the local minimizers among the 11 points from the last iteration of the first part.

The local line search evaluates the function at a random point along the search direction. Then it obtains the third point by reflecting the worse point at the better point. Then four more points are generated, always from the current best point, by choosing from several methods (safeguarded quadratic interpolation, geometric mean step, and piecewise linear interpolation) according to heuristic criteria.

The VXQR1 algorithm starts with an initial scaling phase. Afterward, the so-called scout phase and the subspace phase alternate until a stopping criterion is fulfilled. In the scaling phase, the algorithm searches for a well-scaled initial point with a finite function value. It is done by making a local line search from the initial point (an input parameter) in the direction of the point in the search region closest to the origin.

The scout phase consists of a sequence of local line searches in the directions of an orthonormal basis (to be efficient for nonseparable smooth problems), occasionally preceded or followed by global line searches in all coordinate directions (to be efficient for approximately separable problems). At the beginning of each scout phase, the orthonormal basis is created anew by a QR factorization with column pivoting of the matrix $(x_1 - x_{\mathrm{best}}, \ldots, x_D - x_{\mathrm{best}})$. The point $x_{\mathrm{best}}$ is the current best point. For the first scout phase, the points $x_i$ are randomly generated in the search space, and for all of the subsequent scout phases, they are the results of the line searches of the preceding scout phase.

In the subspace phase, an affine $n_a$-dimensional quadratic model is constructed, with $n_a \le n_a^{\max}$. The saturation dimension $n_a^{\max}$ depends on the dimension $D$ of the problem: $n_a^{\max} = 2$ for $D = 1$, and $n_a^{\max} = \max(3, \min((D+1)/2, 11))$ otherwise. The affine subspace initially consists of the initial point. The point $x$ obtained after a scout phase is either added to the affine basis if the affine basis has fewer than $n_a^{\max}$ elements, or replaces the worst point otherwise. In the new subspace, a local quadratic model is created and minimized, subject to some safeguards. Then a local line search is performed from the current best point to the model minimizer. Since the local quadratic models are only generated in subspaces, these models can already be built when only a few evaluated points are available. The limit $n_a^{\max}$ keeps the quadratic models tractable in high dimensions.

Emphasis is put on a fast descent with a low number of function evaluations, not necessarily on finding the global minimum. Since all line searches start from the current best point, the algorithm has a greedy tendency.

2.4  Reference Algorithms

We selected three other local search algorithms for the comparison. The first of them is the quasi-Newton method with the BFGS update formula. Newton methods search for a stationary point of a function (a point with a zero gradient). They assume that in the neighborhood of an optimum, the function can be approximated by a quadratic function. Newton methods use both first-order (gradient) and second-order (Hessian matrix) information about the function. Quasi-Newton methods do not need the precise Hessian matrix; instead, they approximate it from successive gradients. The individual quasi-Newton methods differ in the way they update the approximation of the Hessian matrix; BFGS is one such method. In this article, BFGS denotes the restarted quasi-Newton method with the BFGS update formula (as implemented in the MATLAB function fminunc). The details can be found in, and the results are taken from, Ros (2009a).
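For reference, the textbook form of the BFGS update of the Hessian approximation $B_k$ is given below. The notation is ours and is not taken from Ros (2009a): $s_k$ denotes the step between two successive iterates and $y_k$ the corresponding change of the gradient.

```latex
s_k = x_{k+1} - x_k, \qquad
y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad
B_{k+1} = B_k
        - \frac{B_k s_k s_k^{\mathsf{T}} B_k}{s_k^{\mathsf{T}} B_k s_k}
        + \frac{y_k y_k^{\mathsf{T}}}{y_k^{\mathsf{T}} s_k}
```

The update uses only gradient differences, which is why no exact Hessian matrix is needed; in the black box setting the gradients themselves are typically estimated by finite differences.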

NEWUOA (Powell, 2006) was chosen as a competitor since it is a relatively recent optimization procedure with very promising reported results on various test functions. It is a deterministic (with the exception of initialization) local search procedure using quadratic modeling and a trust-region approach. The method maintains a quadratic model of the objective function in the trust region. Before each iteration, the model must interpolate the function at m points, with m typically equal to 2D+1, which is a much lower number of constraints than the number required to specify a full quadratic model. The remaining degrees of freedom are taken up by minimizing the Frobenius norm of the difference between the new and the old quadratic model. In this article, a restarted version of NEWUOA is used as described by Ros (2009b).
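To make the claim about the number of constraints concrete, the following small sketch counts the coefficients of a full quadratic model in $D$ variables (one constant, $D$ linear terms, and $D(D+1)/2$ entries of the symmetric Hessian) and compares them with $m = 2D+1$ interpolation points. The function names are ours and are used for illustration only.

```python
def full_quadratic_coefficients(dim):
    """Constant + linear terms + symmetric Hessian entries of a quadratic model."""
    return (dim + 1) * (dim + 2) // 2


def typical_interpolation_points(dim):
    """The typical choice m = 2D + 1 mentioned in the text."""
    return 2 * dim + 1


for dim in (5, 20, 40):
    print(dim, full_quadratic_coefficients(dim), typical_interpolation_points(dim))
```

For $D = 20$, a full quadratic model has 231 coefficients, while only 41 interpolation conditions are imposed; the difference is what the minimum Frobenius norm update has to resolve.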

Any comparison of local search methods in the real-valued space would not be complete without the Nelder-Mead simplex search (NMSS) method. The algorithm is rather old (Nelder and Mead, 1965), but is still used very often—it survived the test of time, although it was shown by McKinnon (1999) that the algorithm can converge to a nonstationary point even on smooth functions. In the D-dimensional space, it maintains the so-called simplex, a set of D+1 points. Their relative positions and function values determine where the next point(s) will be sampled. Since the simplex changes its shape, and can become elongated or stretched, the algorithm is sometimes called amoeba. In this article, NMSS denotes the restarted version of the method as described by Hansen (2009).
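A compact sketch of the simplex search is given below to illustrate how the relative positions and function values of the $D+1$ points determine the next sample. It is a common textbook variant (reflection 1, expansion 2, contraction 0.5, shrink 0.5), not the restarted version of Hansen (2009); the names `nelder_mead`, `step`, `max_iter`, and `ftol` are hypothetical.

```python
def nelder_mead(f, x0, step=1.0, max_iter=500, ftol=1e-12):
    """Textbook Nelder-Mead simplex search (illustrative sketch)."""
    n = len(x0)
    # Initial simplex: x0 plus one point shifted along each coordinate axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < ftol:
            break
        worst = simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line from the worst vertex through the centroid.
            return [centroid[j] + t * (centroid[j] - worst[j]) for j in range(n)]

        xr = along(1.0)                      # reflection
        fr = f(xr)
        if fr < values[0]:
            xe = along(2.0)                  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Outside contraction if the reflected point beats the worst one,
            # inside contraction otherwise.
            xc = along(0.5) if fr < values[-1] else along(-0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:                            # shrink toward the best vertex
                for k in range(1, n + 1):
                    simplex[k] = [simplex[0][j] + 0.5 * (simplex[k][j] - simplex[0][j])
                                  for j in range(n)]
                    values[k] = f(simplex[k])
    best = min(range(n + 1), key=lambda k: values[k])
    return simplex[best], values[best]
```

The expansion and contraction steps are what let the simplex become elongated or squeezed, which is the shape change behind the "amoeba" nickname.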

3  Experimental Framework Description

The experiments presented in this article were carried out using the Comparing Continuous Optimizers (COCO) framework (Hansen, Auger, et al., 2009), which was also used as the basis for the black box optimization benchmarking workshop at the GECCO-2009 and 2010 conferences.

The numerical experiments are performed on a testbed consisting of 24 noiseless test functions (Finck et al., 2009a; Hansen, Finck, et al., 2009). These functions were constructed so that they reflect the difficulties of real-world applications and are categorized by function properties such as multimodality, ill-conditioning, global structure, and separability. The role of the categories is to reveal the different aspects of the algorithms. All functions are scalable with dimension $D$. The search domain is $[-5, 5]^D$, where $D = 2, 3, 5, 10, 20, 40$. The functions have many instances differing in rotation and offset. Each algorithm is given 15 trials on each function.

An optimization problem is defined as a particular (function, requested target value) pair. Each function is used to define several optimization problems differing in the requested target value $f_t = f_{\mathrm{opt}} + \Delta f$, where $f_{\mathrm{opt}}$ is the optimal function value, and $\Delta f$ is the precision (or tolerance) to reach. The success criterion of a trial (for each optimization problem) is to reach the requested target value $f_t$. Many precision levels are defined. If the optimizer finds a solution with the ultimate precision value $10^{-8}$, it actually solves many optimization problems along the way, and we shall say that it has found the optimum of the function, or in other words, that it solved the function. If the optimizer cannot reach the ultimate precision, it can gain some points for optimizing the function at least partially.

The main performance measure used in the COCO framework is the expected running time, ERT (Hansen, Auger, et al., 2009; Price, 1997). The ERT estimates the expected number of function evaluations needed to reach the particular target function value if the algorithm is restarted until a single success. The ERT thus depends on the given target function value, ft, and is computed as “the number of function evaluations conducted in all trials, while the best function value was not smaller than ft during the trial, divided by the number of trials that actually reached ft” (Hansen, Auger, et al., 2009).
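The quoted definition translates directly into a few lines of Python. This is a minimal sketch, assuming that for every trial we know the number of evaluations at which the target was first reached (or `None` if it never was) and the total number of evaluations the trial used; the function and argument names are hypothetical and do not come from the COCO code.

```python
def expected_running_time(evals_to_target, evals_total):
    """ERT for one target: evaluations spent above the target / number of successes.

    evals_to_target[i] is None if trial i never reached the target, otherwise
    the number of evaluations used when the target was first reached.
    evals_total[i] is the total number of evaluations conducted in trial i.
    """
    successes = sum(e is not None for e in evals_to_target)
    if successes == 0:
        return float("inf")
    spent = sum(e if e is not None else total
                for e, total in zip(evals_to_target, evals_total))
    return spent / successes
```

For example, two successful trials that needed 100 and 300 evaluations and one unsuccessful trial that used 1,000 evaluations give an ERT of (100 + 300 + 1000) / 2 = 700 evaluations.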

The results are conveniently presented using the empirical cumulative distribution function (ECDF). It shows the empirical cumulative probability of success depending on the allocated budget. The ECDF of the ERT is constructed as a bootstrap distribution of the ERT divided by the problem dimension D. In the bootstrapping process, 100 instances of ERT are generated by repeatedly drawing single trials with replacement until a successful trial is drawn for each optimization problem.
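The bootstrapping step can be sketched as follows for a single optimization problem. This is our reading of the description above, not the COCO implementation: each trial is represented by a hypothetical pair (evaluations counted for this target, success flag), trials are drawn with replacement until a successful one appears, and the accumulated evaluations are divided by the dimension.

```python
import random


def bootstrap_runtimes(trials, dim, n_samples=100, seed=0):
    """Simulated restarted run lengths (divided by dim) for one problem.

    trials is a list of (evals, success) pairs; evals is the number of
    evaluations counted for this target in the trial.
    """
    if not any(ok for _, ok in trials):
        return [float("inf")] * n_samples
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        total = 0
        while True:
            evals, ok = rng.choice(trials)   # draw one trial with replacement
            total += evals
            if ok:
                break
        samples.append(total / dim)
    return samples


def ecdf(samples, budget):
    """Fraction of simulated runs that succeed within the given budget."""
    return sum(s <= budget for s in samples) / len(samples)
```

Pooling such samples over all targets and functions of a group and plotting `ecdf` against the budget yields graphs of the kind discussed in this section.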

Since the ECDF graphs do not express the reached function values, but rather the proportion of solved problems, it is possible to meaningfully aggregate the ECDF graphs for several functions of the same class into one graph. The downside of this aggregation is that we are not able to distinguish the individual functions. If a graph shows aggregated ECDFs of five functions for a certain dimension D, reaching the 20% level of solved problems after n evaluations may mean many things. On the one hand, the algorithm could have found the minimum of one of the five functions, while the other functions may remain completely unsolved. On the other hand, it may mean that only the problems related to the loose target levels were solved across all the aggregated functions. The latter case is the usual one. If the former explanation is the right one, we will point it out explicitly.

4  Algorithm and Experiment Parameter Settings

The following sections describe the experimental setup and the parameter settings of both LS methods, RA, and VXQR1. For the settings of the reference algorithms, we refer the reader to the original reports (Ros, 2009a, 2009b; Hansen, 2009). All the presented algorithms use the same parameter settings across all functions and dimensions.

The LS methods and Rosenbrock's algorithm were benchmarked using the BBOB-2009 settings, that is, the algorithms were run on the 24 benchmark functions, for five instances each, with three trials per instance. The VXQR1 algorithm was benchmarked using the BBOB-2010 settings, that is, the algorithm was run on 24 benchmark functions, for 15 instances each, with one trial per instance.

4.1  Line Search

The fminbnd function does not have any parameters (except for the search space bounds and termination criteria, both of which are described later). The STEP algorithm has two parameters: the Jones factor and the maximum interval difficulty, the latter set to $10^7$ (value determined by experimenting with the Rastrigin function).

All benchmark functions have their optimum in $[-5, 5]^D$ (Hansen, Finck, et al., 2009), yet both algorithms were set to search for the optimum in a larger hypercube. This decision was made because the fminbnd function almost never samples the boundaries of the interval, which would make it extremely difficult for the algorithm to find the solution, for example, for the linear slope function.

Each multi-start trial was finished either (1) after finding a solution with the ultimate precision $10^{-8}$, or (2) after exceeding the allowed number of function evaluations. Each individual trial of the basic line search algorithm was interrupted (1) if any of the above mentioned criteria were satisfied, or (2) if two consecutive cycles over all directions ended up with solutions closer to each other than $10^{-10}$, in which case the algorithm was restarted from a new randomly generated point. Finally, the individual univariate searches were stopped as follows:

  • in the case of fminbnd, when the target function value was reached, or the maximum allowed number of function evaluations for fminbnd was reached (100), or when the boundary points of the interval got closer than $10^{-10}$, and

  • in the case of STEP, when the target function value was reached, or when the maximum allowed number of function evaluations was reached (1,000); moreover, an interval was not used for further sampling if its boundary points were closer than $10^{-10}$, or when the difficulty of the interval was higher than $10^7$.

The difference in the maximal number of allowed evaluations (100 vs. 1,000) is due to the fact that fminbnd is a local line search technique, that is, if the function is unimodal, it needs a relatively small number of evaluations to find the global optimum. A larger value would not help the algorithm on multimodal functions. On the other hand, STEP is a global line search technique, and it needs a larger budget to converge to the global optimum on multimodal functions.

4.2  Rosenbrock's Algorithm

The algorithm has two parameters, and , set to their default values: and . The algorithm was run in the unconstrained setting.

Each multi-start trial was finished either (1) after finding a solution with the ultimate precision $10^{-8}$, or (2) after exceeding the allowed number of function evaluations. Each individual run of the basic Rosenbrock algorithm was interrupted (1) after finding a solution with the ultimate precision $10^{-8}$, or (2) after performing more than the allowed number of function evaluations, or (3) after the model converged too much, that is, when $\max_i |d_i| < 10^{-9}$, in which case the algorithm was restarted.

4.3  VXQR1

For all control variables in the algorithm, default values are implemented. The only parameters we used in our experiments were the ones determining the stopping criterion, namely, the limit $\mathit{nf}_{\max}$ on the number of function calls and the target value $f_{\mathrm{opt}} + 10^{-8}$ to be reached, where $f_{\mathrm{opt}}$ is the global minimum of the function. Each test function instance was solved with VXQR1 with a limit of $\mathit{nf}_{\max} = 500 \max(D, 10)$ function evaluations. At most nine independent restarts of VXQR1 with the origin as the initial point were made if the target value was not reached. That is, at most 10 independent attempts were made to solve each problem with VXQR1. The same function evaluation budget per call and 10 calls were also used in the experiments carried out by the coauthor with MCS (Huyer and Neumaier, 2009).
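The resulting budgets are easy to tabulate. The helper below is a hypothetical illustration of the arithmetic only: it returns the limit per call, 500 max(D, 10), and the largest possible total over the 10 attempts.

```python
def vxqr1_budget(dim, attempts=10):
    """Evaluation limit per call and the maximum total over all attempts."""
    per_call = 500 * max(dim, 10)
    return per_call, attempts * per_call


for dim in (2, 5, 20, 40):
    print(dim, vxqr1_budget(dim))
```

For $D = 5$ this gives 5,000 evaluations per call and at most 50,000 in total; for $D = 20$ it gives 10,000 and at most 100,000. For all $D \le 10$ the budget per call is the same.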

5  Results

The results from experiments according to Hansen, Auger, et al. (2009) on the benchmark functions (Finck et al., 2009b; Hansen, Finck, et al., 2009) are presented in Figures 3 and 4 (discussed in Section 6), and Figure 5 (discussed in Section 7). Only the results for D=5 (exemplar of “low” dimensionality) and D=20 (exemplar of “higher” dimensionality) are presented.

Figure 3:

Empirical cumulative distribution of the bootstrapped distribution of ERT over dimension for 50 targets in $10^{[-8..2]}$ for all functions and subgroups in 5D. The light "best ever" line corresponds to the algorithms from BBOB-2009 with the best ERT for each of the targets considered.

Figure 4:

Empirical cumulative distribution of the bootstrapped distribution of ERT over dimension for 50 targets in $10^{[-8..2]}$ for all functions and subgroups in 20D. The light "best ever" line corresponds to the algorithms from BBOB-2009 with the best ERT for each of the targets considered.

Figure 5:

ERT divided by dimension versus dimension in log-log presentation for the target function value 10^-8 (filled symbols). Different symbols correspond to the different algorithms given in the legend of f1 and f24. Hollow symbols give the average number of function evaluations divided by the dimension when there is no success. Horizontal lines indicate linear scaling; dashed lines indicate quadratic scaling.

Tables 1 to 10 (discussed in Section 6) give the ERT for the target precisions 10^{1, 0, -1, -3, -5, -7} divided by the best ERT obtained during BBOB 2009 (given in the ERTbest row), together with a measure of its spread (the value typeset in parentheses in a smaller font gives half of the range between the 10th and 90th percentiles). Bold entries correspond to the three best values among the algorithms compared. The median number of conducted function evaluations is additionally given in italics if the target in the last column was never reached. The #succ column gives the number of trials that reached the final target fopt + 10^-8.
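As a rough illustration of how such a table entry is formed, the sketch below assumes the usual COCO definition of ERT (all function evaluations spent across the trials, divided by the number of trials that reached the target). The function names, the linear-interpolation percentile rule, and the sample data are assumptions made for this example only.

```python
def ert(evals, success):
    """Expected running time: all evaluations spent divided by the number of successes."""
    n_succ = sum(success)
    return sum(evals) / n_succ if n_succ else float("inf")

def half_range_10_90(values):
    """Half of the distance between the 10th and 90th percentiles (finite values only)."""
    v = sorted(values)

    def pct(q):
        pos = q * (len(v) - 1) / 100          # fractional index, linear interpolation
        lo = int(pos)
        hi = min(lo + 1, len(v) - 1)
        return v[lo] + (pos - lo) * (v[hi] - v[lo])

    return (pct(90) - pct(10)) / 2

# Made-up data: three trials, two of which reached the target.
evals = [100, 200, 300]                       # evaluations spent in each trial
success = [True, True, False]                 # whether the trial reached the target
ert_best = 12.0                               # hypothetical value from an ERTbest row
ratio = ert(evals, success) / ert_best        # the kind of number shown in a table cell
```

An ERT of infinity, returned when no trial succeeds, corresponds to the rows in which only the median number of function evaluations is reported.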

Each algorithm is tested for whether it improved on the results obtained by a baseline algorithm. We chose NMSS as the baseline in the 5D space and NEWUOA as the baseline in the 20D space, since these two algorithms were the most successful with respect to the final success rate. The comparison should thus reveal the situations in which other algorithms are more suitable than these good solvers. Statistical significance is tested with the rank-sum test for a given target ft using, for each trial, either the number of function evaluations needed to reach ft (inverted and multiplied by −1) or, if the target was not reached, the best value achieved, measured only up to the smallest number of overall function evaluations for any unsuccessful trial under consideration, if available. Entries marked with the significance symbol are statistically significantly better (according to the rank-sum test) than the baseline algorithm, with p = 0.05 or p = 10^-k, where k > 1 is the number following the symbol, with a Bonferroni correction of 24.
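The per-trial quantity and the test can be sketched as follows. This is a simplified illustration, not the evaluation code used for the tables: it uses a normal approximation of the rank-sum statistic without a tie correction, it omits the truncation of unsuccessful trials to a common number of evaluations, and it reads the Bonferroni correction as multiplying the p-value by 24. All function names are hypothetical.

```python
import math

def trial_score(evals_to_target, best_delta_f):
    """Smaller is better: -1/evals for a success, best f-distance reached for a failure."""
    if evals_to_target is not None:
        return -1.0 / evals_to_target
    return best_delta_f

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_p(alg, base):
    """One-sided p-value that `alg` scores tend to be smaller than `base` scores."""
    n, m = len(alg), len(base)
    ranks = average_ranks(list(alg) + list(base))
    w = sum(ranks[:n])                           # rank sum of the tested algorithm
    mean = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (w - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))    # standard normal CDF at z

def significantly_better(alg, base, n_tests=24, alpha=0.05):
    """Bonferroni-corrected decision (assumed form: p multiplied by n_tests)."""
    return rank_sum_p(alg, base) * n_tests < alpha
```

With this scoring, every successful trial (negative score) ranks ahead of every unsuccessful one (positive score), and among successful trials fewer evaluations give a smaller score.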

Table 1:
ERT on f1–f5 in 5D over ERTbest obtained in BBOB-2009. The NMSS is used as the baseline for statistical comparisons.
1 Sphere
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 11 12 12 12 12 12 15/15 
LSfminbnd 6.0 6.3 6.7 6.8 6.8 6.8 15/15 
LSstep 92 121 129 132 132 132 15/15 
NMSS 1.5 3.3 5.4 9.2 13 17 15/15 
RA 2.9 4.2 5.5 8.7 12 15 15/15 
NEWUOA 1.1 1 1 1 1 1 15/15 
BFGS 1.2 1.1 1.1 1.1 1.1 1.1 15/15 
VXQR1 1.5 1.7 1.7 1.8 1.8 1.8 15/15 
2 Ellipsoid separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 83 87 88 90 92 94 15/15 
LSfminbnd 1 1 1 1 1 1 15/15 
LSstep 16 16 16 15 15 15 15/15 
NMSS 5.0 6.8 7.4 7.9 8.3 8.6 15/15 
RA 13 102 136 153 188 241 12/15 
NEWUOA 5.7 22 45 85 129 166 15/15 
BFGS 3.8 5.6 6.2 6.6 6.9 7.1 15/15 
VXQR1 2.8 3.2 4.1 5.9 12 22 15/15 
3 Rastrigin separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 716 1,622 1,637 1,646 1,650 1,654 15/15 
LSfminbnd 1 52     2e4 0/15 
LSstep 2.2 1 1 1 1 1 15/15 
NMSS 5.4 282 1,464 1,456 1,452 1,449 3/15 
RA 24 394     5e4 0/15 
NEWUOA 6.1 229     3e4 0/15 
BFGS 107      2e4 0/15 
VXQR1 0.32 0.48 0.94 0.97 1.0 1.1 15/15 
4 Skew Rastrigin-Bueche separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 809 1,633 1,688 1,817 1,886 1,903 15/15 
LSfminbnd 7.8      2e4 0/15 
LSstep 2.0 1 1 1 1 1 12/15 
NMSS 26      5e5 0/15 
RA 57      5e4 0/15 
NEWUOA 27 305     3e4 0/15 
BFGS 169      2e4 0/15 
VXQR1 0.28 0.96 2.0 1.9 1.9 2.2 15/15 
5 Linear slope 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 10 10 10 10 10 10 15/15 
LSfminbnd 13 14 14 14 14 14 15/15 
LSstep 141 160 160 160 160 160 15/15 
NMSS 2.5 4.1 4.2 4.2 4.2 4.2 15/15 
RA 4.0 4.2 4.2 4.2 4.2 4.2 15/15 
NEWUOA 1.3 1.5 1.5 1.5 1.5 1.5 15/15 
BFGS 1.9 3.0 3.1 3.1 3.1 3.1 15/15 
VXQR1 3.0 4.4 4.4 4.6 4.6 4.6 15/15 

For notation, see text.

6  Discussion by Function Group

In this section, the discussion of the results is broken down by the function groups. The discussion mostly applies to the presented results for 5D and 20D. For a discussion on the individual algorithms, see Section 7.

6.1  All Functions Aggregated

The results for all the functions are aggregated in the ECDF graphs of ERT for the 5D and 20D functions in the upper left parts of Figures 3 and 4, respectively.

In the 5D space, for low evaluation budgets (#FEs < 20D), NEWUOA was the most successful method, solving almost 20% of the problems. For larger budgets, NMSS and VXQR1 solved the largest proportion of the problems (about 80%). NEWUOA and BFGS were similarly fast but solved only about 65% of the problems. RA reached this level as well, but was about 10 times slower. Both line search methods were slower and less successful than the other algorithms.

For the 20D problems, NEWUOA eventually solved almost 60% of the problems. Its success rate was the highest for almost all evaluation budgets, with the exception of a short range, 40D < #FEs < 100D, where RA dominated. Up to about 2000D evaluations, BFGS closely followed NEWUOA and eventually solved about 50% of the problems. This level was also reached by VXQR1 and NMSS, but they were slower than NEWUOA and BFGS. RA and both LS algorithms were slow and eventually solved about 40% of the problems.
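The proportions quoted in this subsection are read off the ECDF graphs: for a given budget, the ECDF value is the fraction of problems (function and target pairs) whose runtime does not exceed that budget. A minimal sketch, with made-up ERT values and without the bootstrapping used for the figures:

```python
def ecdf(runtimes, budgets):
    """Fraction of problems solved within each budget; unsolved problems carry inf."""
    n = len(runtimes)
    return [sum(r <= b for r in runtimes) / n for b in budgets]

dim = 5
# Made-up ERT values (in function evaluations) for eight problems; inf = never solved.
ert_values = [40, 250, 1200, float("inf"), 90, float("inf"), 5000, 30]
runtimes = [e / dim for e in ert_values]   # budgets are expressed in multiples of D
budgets = [10, 100, 1000]
proportions = ecdf(runtimes, budgets)      # [0.25, 0.5, 0.75]
```

The curve never reaches 1 when some problems remain unsolved, which is how a statement such as "eventually solved almost 60% of the problems" appears in the graphs.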

6.2  Separable Functions f1–f5

The ECDF graphs of ERT for the 5D and the 20D separable functions f1–f5 are shown in the upper right parts of Figures 3 and 4, respectively. Tables 1 and 2 contain the detailed results for the 5D and the 20D functions, respectively.

Table 2:
ERT on f1–f5 in 20D over ERTbest obtained in BBOB-2009. The NEWUOA is used as the baseline for statistical comparisons.
1 Sphere
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 43 43 43 43 43 43 15/15 
LSfminbnd 9.3 10 10 10 10 10 15/15 
LSstep 164 175 176 177 177 177 15/15 
NMSS 5.2 12 19 32 40 49 15/15 
RA 3.8 5.8 7.2 11 14 17 15/15 
NEWUOA 1.0 1.0 1.0 1.0 1.0 1.0 15/15 
BFGS 1 1 1 1 1 1 15/15 
VXQR1 1.4 1.5 1.5 1.5 1.6 1.6 15/15 
2 Ellipsoid separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 385 386 387 390 391 393 15/15 
LSfminbnd 1 1 1 1 1 1 15/15 
LSstep 17 17 17 17 17 17 15/15 
NMSS 7.0 7.8 8.6 10 11 12 15/15 
RA 1.4 1.6 5.8 29 73 73 14/15 
NEWUOA 18 42 71 125 174 219 15/15 
BFGS 20 24 26 27 28 28 15/15 
VXQR1 5.6 7.5 12 54   1e5 0/15 
3 Rastrigin separable 
ftarget 1e1 1e0 1e–1 1e–3 1e–5 1e–7 #succ 
ERTbest 5,066 7,626 7,635 7,643 7,646 7,651 15/15 
LSfminbnd       1e5 0/15 
LSstep 1.5 1 1 1 1 1 15/15 
NMSS       2e5 0/15 
RA