Abstract

The solution of many combinatorial optimization problems is carried out by metaheuristics, which generally make use of local search algorithms. These algorithms use some kind of neighborhood structure over the search space, and their performance strongly depends on the properties that the neighborhood imposes on the search space. One of these properties is the number of local optima. Given an instance of a combinatorial optimization problem and a neighborhood, estimating the number of local optima can help not only to measure the complexity of the instance, but also to choose the most convenient neighborhood to solve it. In this paper we review and evaluate several methods for estimating the number of local optima in combinatorial optimization problems. The methods reviewed come not only from the combinatorial optimization literature, but also from the statistical literature. A thorough evaluation on synthetic as well as real problems is given, and we conclude by recommending methods for several scenarios.

1  Introduction

Metaheuristic algorithms have proven to be efficient methods for solving hard combinatorial optimization problems. Most of these methods are based on, or make use of, some kind of local search that relies on a neighborhood structure over the search space. The properties of this neighborhood can cause dramatic differences in the performance of local search methods (Mattfeld and Bierwirth, 1999; Reeves and Eremeev, 2004; Kirkpatrick and Toulouse, 1985; Hertz et al., 1994; Fonlupt et al., 1999; Tomassini et al., 2008). One of the characteristics that the neighborhood imposes on the search space is the number of local optima. This property has attracted much attention because it seems to be related to the difficulty of finding the global optima (Eremeev and Reeves, 2002; Albrecht et al., 2008, 2010; Reeves and Aupetit-Bélaidouni, 2004; Reeves, 2001; Grundel et al., 2007; Garnier and Kallel, 2001; Caruana and Mullin, 1999; Eremeev and Reeves, 2003). Therefore, the number of local optima has been considered an indirect measure of the complexity of an instance when solving it with a local search algorithm. Moreover, as the number of local optima depends on the neighborhood, it would be useful to take it into account in order to choose the most suitable neighborhood for solving a particular problem instance efficiently.

Unfortunately, in practice, given an instance of a combinatorial optimization problem and a neighborhood, we do not know the number of local optima in advance. Thus, the development of methods that efficiently estimate the number of local optima seems to be a requirement for designing algorithms that work in the right neighborhood. This work is also connected to the area of evolutionary computation, as the analysis of the set of local optima in a landscape associated with the problem can be related to the investigation of some properties of evolutionary algorithms, such as the stable steady-state points in genetic algorithms (Vose and Wright, 1995; Reeves, 2002).

While several approaches have been proposed to estimate and bound the expected number of local optima for combinatorial optimization problems (Grundel et al., 2007; Albrecht et al., 2008, 2010), the literature is less extensive when it comes to estimating the number of local optima of a particular instance. Still, it is possible to find works on this topic. A key aspect to take into account when designing a method to estimate the number of local optima of an instance is the distribution of the sizes of the attraction basins (informally, the attraction basin of a local optimum is the set of initial solutions from which a local search algorithm ends at that local optimum). Most of these approaches assume that the attraction basins are equally sized (Caruana and Mullin, 1999; Eremeev and Reeves, 2003), and they propose methods to obtain lower bounds for the number of local optima. Under this assumption on the attraction basins, there are works where biased estimators are obtained, and the bias is then corrected in order to provide an unbiased estimator (Eremeev and Reeves, 2002; Reeves, 2001; Reeves and Aupetit-Bélaidouni, 2004). On the other hand, there are papers where the sizes of the attraction basins are assumed to fit a certain type of parametric distribution, such as the gamma or lognormal. For example, in Garnier and Kallel (2001), the authors assume a gamma distribution for the relative sizes of the attraction basins.

The problem of estimating the number of local optima can be compared with a well-known problem in biology: estimating the number of different species in a population. In the field of statistics, we can find plenty of algorithms and methods used to estimate the number of species in a population. In fact, Eremeev and Reeves (2003) noted the connection between these two problems, and applied the Schnabel-Census method, already used for estimating the number of species, to the estimation of the number of local optima. In Bunge and Fitzpatrick (1993), as well as in Seber (1986, 1992) and Schwarz and Seber (1999), an exhaustive classification of methods for estimating the number of species is given. In those works the literature is organized according to the sampling model, the population specification, and the philosophy of the estimation. We can also find recent papers (Chao and Bunge, 2002; Wang, 2010) that give estimators for the number of classes by assuming that the species abundance follows a Poisson-Gamma model, and others (Wang and Lindsay, 2005, 2008) in which estimators are based on the conditional likelihood of a Poisson mixture model. Unfortunately, there are considerable difficulties associated with estimating the size of a population (Link, 2003).

In this work, we present an evaluation of methods for estimating the number of local optima, covering not only the methods proposed in the combinatorial optimization field, but also those developed for the species problem in the statistics arena. After describing them in detail, we test their performance under three different scenarios:

  1. Simulated instances of combinatorial optimization problems.

  2. Random instances of the traveling salesman problem (TSP).

  3. Instances of the TSP with real distances between cities, and instances of the flow shop scheduling problem (FSSP) obtained from the well-known Taillard's benchmark.

The rest of the paper is organized as follows. The preliminary mathematical background is given in Section 2. In Section 3 we explain the selected estimation methods in detail. Section 4 shows the experimental results when applying the methods to synthetic data as well as to real instances, and discusses the results observed for the different methods, providing clues to help select the most suitable estimation algorithm for a given instance. Finally, the conclusions are presented in Section 5.

2  Preliminaries

A combinatorial optimization problem (COP) consists of finding an optimal solution of (from now on, minimizing) a function
$$f: \Omega \rightarrow \mathbb{R},$$
where the search space $\Omega$ is a finite or countably infinite set.
A common way of solving these problems is by means of metaheuristic algorithms, most of which use some kind of local search based on a neighborhood structure. A neighborhood $N$ in a search space $\Omega$ is a mapping that assigns a set of neighbor solutions to each solution $x \in \Omega$:
$$N: \Omega \rightarrow 2^{\Omega}, \qquad x \mapsto N(x) \subseteq \Omega.$$
Based on the definition of neighborhood, we say that a solution $x^*$ is a local optimum if $f(x^*) \leq f(x)$ for all $x \in N(x^*)$ (local minimum) or if $f(x^*) \geq f(x)$ for all $x \in N(x^*)$ (local maximum). Obviously, one solution can be a local optimum under a neighborhood $N_1$, but not when considering a different neighborhood $N_2$. If $x^*$ is a solution such that $f(x^*) \leq f(x)$ for all $x \in \Omega$, then $x^*$ is a global optimum. Clearly, the global optima are local optima for any neighborhood.
Algorithm 1: Deterministic best-improvement local search.
    Input: a solution $x \in \Omega$
    repeat
        let $y^*$ be the first neighbor encountered, in the fixed evaluation order, among those of $N(x)$ with minimal $f$ value
        if $f(y^*) < f(x)$ then $x \leftarrow y^*$
    until no neighbor of $x$ improves $f(x)$
    Output: the local optimum $x$
A concept that plays an important role in this paper is that of the basin of attraction of a local optimum $x^*$. The basin of attraction depends on the neighborhood; however, we omit the neighborhood from the notation in order to simplify it. Roughly speaking, an attraction basin is composed of all the solutions from which a local search algorithm, when started at them, finishes at $x^*$. In our case, we use a deterministic best-improvement local search (see Algorithm 1). It is important to note that the neighbors are evaluated in a specific order, so that, in the case of two neighbors having the same function value, the algorithm always chooses the one that was encountered first. We denote by $\mathcal{H}$ the operator that associates, to each solution $x$, the local optimum $\mathcal{H}(x)$ obtained after applying the algorithm. So, the attraction basin of a local optimum $x^*$ is the set $\mathcal{B}(x^*)$ that can be defined in the following way:
$$\mathcal{B}(x^*) = \{x \in \Omega \mid \mathcal{H}(x) = x^*\}.$$
The relative size of the attraction basin $\mathcal{B}(x^*)$ with respect to the search space $\Omega$, that is, the proportion of solutions of the whole search space that belong to $\mathcal{B}(x^*)$, denoted by $|\mathcal{B}(x^*)|/|\Omega|$, is of special relevance for the estimation of the number of local optima. Given the deterministic nature of $\mathcal{H}$, an important property of this concept is that the attraction basins of the local optima define a partition of $\Omega$.
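To make the operator $\mathcal{H}$ concrete, the following minimal Python sketch implements the deterministic best-improvement descent described above (the function names are ours, and solutions are assumed to be hashable objects such as tuples):

```python
def best_improvement_descent(x, f, neighbors):
    """Deterministic best-improvement local search: the operator H that maps
    a solution x to its local optimum. Neighbors are scanned in a fixed
    order, so ties in f are always resolved in favor of the neighbor
    encountered first."""
    while True:
        best = x
        for y in neighbors(x):      # fixed, deterministic evaluation order
            if f(y) < f(best):      # strict inequality keeps the first of any tie
                best = y
        if best == x:               # no improving neighbor: x is a local optimum
            return x
        x = best
```

Because the descent is deterministic, every solution is mapped to exactly one local optimum, which is precisely why the attraction basins form a partition of $\Omega$.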

The problem we are considering in this paper, that is, estimating the number of local optima, can be considered a classical estimation problem in statistics. Therefore, the most common way of solving it is by constructing an estimator from a set of statistics collected from a sample. Because we want to estimate the number of local optima, our sample has to be related to the local optima. Basically, most of the presented methods start by taking a uniformly distributed random sample $S = \{x_1, \ldots, x_M\}$ of size $M$. The local search algorithm is applied to each solution in $S$, and from the $M$ local optima obtained we keep the $r$ ($r \leq M$) different ones in the set $\{x_1^*, \ldots, x_r^*\}$.

Two important statistics that will be used by the estimation methods are $\alpha_i$ and $f_j$. We denote by $\alpha_i$ ($i = 1, \ldots, r$) the number of initial solutions of the sample that belong to the attraction basin of the local optimum $x_i^*$:
$$\alpha_i = |\{x \in S \mid \mathcal{H}(x) = x_i^*\}|.$$
With this information, the following statistic is calculated:
$$f_j = |\{i \mid \alpha_i = j\}|, \qquad j = 1, \ldots, M.$$
So, $f_j$ is the number of local optima that have been seen exactly $j$ times in the sample. In the following section, we call $f_0$ the number of local optima in the search space that have not been found in the sample. Note that $\alpha_i \geq 1$, $\forall i \in \{1, \ldots, r\}$. Two interesting relations are the following:
$$\sum_{j=1}^{M} f_j = r, \qquad \sum_{j=1}^{M} j\, f_j = M.$$
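As an illustration, these statistics can be collected as in the following sketch, which reuses the descent function above (sample_uniform is an assumed problem-specific helper that returns one solution drawn uniformly at random):

```python
from collections import Counter

def sample_statistics(sample_uniform, f, neighbors, M):
    """Draw M solutions uniformly at random, descend to their local optima,
    and return r, the alpha_i counts, and the occupancy numbers f_j."""
    hits = Counter()                     # local optimum -> alpha_i
    for _ in range(M):
        x = sample_uniform()
        hits[best_improvement_descent(x, f, neighbors)] += 1
    alphas = list(hits.values())
    r = len(alphas)                      # number of distinct local optima seen
    f_j = Counter(alphas)                # f_j = #{i : alpha_i = j}
    # The two relations above: sum_j f_j = r and sum_j j*f_j = M
    assert sum(f_j.values()) == r
    assert sum(j * c for j, c in f_j.items()) == M
    return r, alphas, f_j
```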

3  Estimation Methods

In this section we present a review of the methods proposed in the literature for estimating the number of local optima, as well as the most widespread methods for estimating the number of species in a population. Table 1 shows the methods collected from the combinatorial optimization and also from the statistics area. Inside the combinatorial optimization field we find the following methods: a method based on the birthday problem (Caruana and Mullin, 1999), first repetition time (FRT), maximal first repetition time (MFRT), and Schnabel-census (Sch-Cen; Eremeev and Reeves, 2003), jackknife and bootstrap (Jckk and Boots; Eremeev and Reeves, 2002), and a method based on Gamma distributions (Garnier and Kallel, 2001). In the statistics field, the selected methods are: Chao1984 (Chao, 1984), ChaoBunge (Chao and Bunge, 2002), ChaoLee1 and ChaoLee2 (Chao and Lee, 1992), a Poisson-compound Gamma model (Wang, 2010) and a penalized nonparametric maximum likelihood approach (Wang and Lindsay, 2005).

Table 1:
A classification of the estimation methods selected from both the combinatorial optimization field and the statistics area, according to the sampling model. An asterisk (*) marks the methods analyzed in this paper, with their abbreviations.

Field / Sampling model                  Method (Abbreviation)                                  Reference
Combinatorial optimization
                                        Method based on the birthday problem                  Caruana and Mullin (1999)
  Multinomial (confidence intervals)    *First repetition time (FRT)                          Eremeev and Reeves (2003)
  Multinomial (confidence intervals)    *Maximal first repetition time (MFRT)                 Eremeev and Reeves (2003)
  Multinomial (confidence intervals)    *Schnabel-Census (Sch-Cen)                            Eremeev and Reeves (2003)
  Multinomial (bias correction)         *Jackknife (Jckk), *Bootstrap (Boots)                 Eremeev and Reeves (2002)
  Gamma                                 Method based on a Gamma model                         Garnier and Kallel (2001)
Statistics
  Multinomial                           *Chao1984                                             Chao (1984)
  Multinomial                           *ChaoLee1, *ChaoLee2                                  Chao and Lee (1992)
  Poisson-Gamma                         *ChaoBunge                                            Chao and Bunge (2002)
  Poisson-Gamma                         Poisson-compound Gamma model                          Wang (2010)
  Mixed Poisson                         Penalized nonparametric maximum likelihood approach   Wang and Lindsay (2005)

We discarded some of the methods shown in Table 1, due to their poor performance (Caruana and Mullin, 1999), high dependence on the sample size (Garnier and Kallel, 2001), or high computation time (Wang and Lindsay, 2005, 2008; Wang, 2010), as observed in preliminary experiments. The methods analyzed in this paper are those marked with an asterisk in Table 1.

The five methods proposed in the field of optimization are explained in detail. FRT, MFRT, and Sch-Cen are methods that provide lower bounds and that can be used for computing confidence intervals, while Jckk and Boots are bias-correcting nonparametric methods. FRT has attracted our interest because it is a parameterless method, whereas MFRT and Sch-Cen depend on only one parameter, the sample size. Thus, these three methods do not require much computation time. Jckk also depends only on the sample size and is very fast. Boots needs not only the sample size, but also another parameter: the number of repetitions inside the method. Carrying out these repetitions makes Boots slower than the other methods in providing the estimated value.

In a second step, we present methods proposed in the field of statistics, used by biologists and ecologists to determine how many different species are present in a population of plants or animals. They are nonparametric methods, but they are based on particular sampling models. Chao1984, ChaoLee1, and ChaoLee2 are based on multinomial sampling, while ChaoBunge is based on a mixed Poisson sampling model. The main reason for choosing these methods is that, according to preliminary experiments, they do not require much computation time and, in general, they give very good estimates.

3.1  Methods Proposed in the Field of Optimization

3.1.1  Methods Used for Computing Confidence Intervals

In this section, we describe the methods proposed in Reeves and Eremeev (2004): FRT, MFRT, and Sch-Cen. These three methods assume that all the attraction basins of the local optima are equal in size. Under this assumption, and supposing a finite multinomial model, they give (with a high probability) lower bounds for the number of local optima. First, we explain the common aspects of these three methods, and then, the particular details of each of them are given.

Let us start by considering that the continuous distribution function $F_v(t)$ of a random variable $T$ depending on a parameter $v$ is known, and that this distribution function is strictly monotonically decreasing in the parameter $v$, that is:
$$v' < v'' \Longrightarrow F_{v'}(t) > F_{v''}(t), \quad \forall t.$$
At this point, the $(1 - \alpha) \cdot 100\%$ confidence interval for the parameter $v$ is calculated. Thus, the goal is to find $[v_1, v_2]$, with $v_1 < v_2$, such that $P(v_1 \leq v \leq v_2) = 1 - \alpha$, from the known distribution function $F_v(t)$. Taking into account that $t^*$ is an observed value sampled from this distribution, we obtain:
$$F_{v_1}(t^*) = 1 - \frac{\alpha}{2}, \qquad F_{v_2}(t^*) = \frac{\alpha}{2}.$$
Following the framework of Reeves and Eremeev (2004), we work with the fixed value $\alpha = 0.1$. Thus, these methods define $v_1$ as a lower bound with probability 0.95, and consequently $v_2$ is infinite.

In particular, in the FRT, MFRT, and Sch-Cen methods, $v$ denotes the number of local optima. Let $p = (p_1, \ldots, p_v)$ be the vector of probabilities of finding the corresponding local optima, that is, the relative sizes of their attraction basins. If $p = (1/v, \ldots, 1/v)$, then all the local optima have the same probability of being found. These methods calculate the distribution function according to a random variable $T$ which, in each case, denotes a different concept.

3.1.1.1 The First Repetition Time Method

The FRT method starts by taking, uniformly at random, a solution $x_1$ from the search space $\Omega$; a local search algorithm (in our case, Algorithm 1) is applied to $x_1$, ending at a local optimum $x_1^*$. This process is repeated until a local optimum is seen twice.

The random variable $T$ denotes, in this case, the number of initial solutions $x_i$ taken until a local optimum is repeated. The distribution function of $T$ corresponding to the vector $p$ is $F_p(t)$. It can be proven (Reeves and Eremeev, 2004) that, for any $t$, $F_p(t)$ is minimal only at $p = (1/v, \ldots, 1/v)$, where $v$ is the number of local optima.

Next, the distribution function is calculated for the variable $T$ under $p = (1/v, \ldots, 1/v)$:
$$F_v(t) = P(T \leq t) = 1 - P(T > t),$$
where $P(T > t)$ is the probability of finding no repetition among the first $t$ optima:
$$P(T > t) = \prod_{i=1}^{t-1}\left(1 - \frac{i}{v}\right).$$
This yields
$$F_v(t) = 1 - \prod_{i=1}^{t-1}\left(1 - \frac{i}{v}\right).$$
Assuming that $t^*$ is the value obtained from the sample for the variable $T$, the estimate for the number of local optima $v_1$ is given by the following formula:
$$v_1 = \min\{v \mid F_v(t^*) \leq 0.95\}.$$
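A sketch of the resulting computation follows (it assumes, as in the reconstruction above, that the lower bound is the smallest v whose distribution still makes the observed repetition time plausible):

```python
def F_birthday(v, t):
    """P(T <= t) under equally sized basins: the birthday-problem probability
    of seeing some local optimum twice within the first t descents."""
    p_no_repeat = 1.0
    for i in range(1, t):
        p_no_repeat *= 1.0 - i / v
    return 1.0 - p_no_repeat

def frt_lower_bound(t_star, alpha=0.1):
    """Smallest v with F_v(t_star) <= 1 - alpha/2; F_v is decreasing in v."""
    v = 1
    while F_birthday(v, t_star) > 1.0 - alpha / 2:
        v += 1
    return v
```

For example, if the first repetition occurs at $t^* = 50$, frt_lower_bound(50) returns the smallest number of equally sized basins for which such a late repetition would not be unusual.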

3.1.1.2 The Maximal First Repetition Time Method

In the case of the MFRT, a uniformly distributed random sample $S = \{x_1, \ldots, x_M\}$ of size $M$ is taken from the search space. Then, a local search is applied to each solution of the sample, so that $M$ local optima are obtained. Note that not all of them have to be different. Then, starting from the first and taking the local optima in order of appearance, subsequences $S_1, \ldots, S_s$ are created, where each $S_j$ ends with its first reoccurrence of a local optimum. It is as if we were repeating the FRT procedure many times. If no local optimum is repeated, then the unique subsequence obtained is $S_1 = S$, of size $M$.

The number of subsequences obtained is denoted by $s$. The variable that denotes the length of the $j$th subsequence is $T_j$, and $T_{(s)} = \max_j T_j$ represents the maximum length over all the subsequences.
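The construction of the subsequences can be sketched as follows (the splitting restarts after every first reoccurrence, exactly as if FRT were run repeatedly over the sequence of sampled optima):

```python
def subsequence_lengths(optima_sequence):
    """Split the M local optima (in order of appearance) into subsequences
    S_1, ..., S_s, each ending at its first repeated optimum, and return
    their lengths T_j. The MFRT statistic is T_(s) = max(lengths)."""
    lengths, seen, current = [], set(), 0
    for opt in optima_sequence:
        current += 1
        if opt in seen:                  # first reoccurrence closes S_j
            lengths.append(current)
            seen, current = set(), 0
        else:
            seen.add(opt)
    if current:                          # trailing run with no repetition
        lengths.append(current)
    return lengths
```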

The distribution function of the variable $T_{(s)}$ corresponding to the vector of probabilities of the local optima $p$ is
$$F_p^{(s)}(t) = P(T_{(s)} \leq t).$$
As in the previous case, for a fixed value of $t$ and a fixed value of $s$, it can be proven (Reeves and Eremeev, 2004) that $F_p^{(s)}$ is minimal when $p = (1/v, \ldots, 1/v)$.
The distribution function for the variable $T_{(s)}$ under $p = (1/v, \ldots, 1/v)$ is:
$$F_v^{(s)}(t) = \left(F_v(t)\right)^s = \left(1 - \prod_{i=1}^{t-1}\left(1 - \frac{i}{v}\right)\right)^{s}.$$
If $t^*$ is the value obtained from the sample for the variable $T_{(s)}$, then the estimate for the number of local optima $v_1$ is given by the following formula:
$$v_1 = \min\{v \mid F_v^{(s)}(t^*) \leq 0.95\}.$$

3.1.1.3 Schnabel Census Procedure

The Sch-Cen method has also been used in ecology to estimate the size of a population of animals: it consists of taking a sample of size $M$ and counting the number of distinct animals seen. However, we include it in this section because there are already studies (Eremeev and Reeves, 2002; Reeves and Eremeev, 2004) that have used this method to estimate the number of local optima in COPs. The procedure for estimating the number of local optima is analogous to that for estimating the number of animals.

First, a uniformly distributed random sample $S = \{x_1, \ldots, x_M\}$ of size $M$ is taken from the search space $\Omega$, and a local search algorithm is applied to each solution in $S$, obtaining $r$ different local optima $x_1^*, \ldots, x_r^*$, with $r \leq M$.

Let $R$ be the random variable that represents the number of different local optima found. The distribution function for the variable $R$, when a sample of size $M$ has been taken from $\Omega$ and when it corresponds to the vector of probabilities of the local optima $p$, is:
$$F_p(r) = P(R \leq r) = \sum_{i=1}^{r} P(R = i).$$
If $p = (1/v, \ldots, 1/v)$, then $P(R = i) = \binom{v}{i} \frac{i!\, S(M,i)}{v^M}$, where $S(M,i)$ is the Stirling number of the second kind, that is, the number of all possible partitions of an $M$-element set into $i$ nonempty subsets. This yields
$$F_v(r) = \sum_{i=1}^{r} \binom{v}{i} \frac{i!\, S(M,i)}{v^M}.$$
The estimate for the number of local optima $v_1$ is given by the following formula:
$$v_1 = \min\{v \mid F_v(r^*) \leq 0.95\},$$
where $r^*$ is the number of different local optima observed in the sample.
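The distribution $F_v(r)$ can be evaluated exactly with integer arithmetic, as in this sketch (the Stirling numbers come from the recurrence $S(n,k) = k\,S(n-1,k) + S(n-1,k-1)$; the quadratic table makes this practical only for moderate $M$):

```python
from math import comb, factorial

def stirling2_row(M, r_max):
    """S(M, k) for k = 0..r_max via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    row = [1] + [0] * r_max                  # row for n = 0
    for n in range(1, M + 1):
        new = [0] * (r_max + 1)
        for k in range(1, min(n, r_max) + 1):
            new[k] = k * row[k] + row[k - 1]
        row = new
    return row

def F_schnabel(v, M, r):
    """P(R <= r) under equally sized basins: sum_i C(v,i) i! S(M,i) / v^M."""
    S = stirling2_row(M, r)
    total = sum(comb(v, i) * factorial(i) * S[i]
                for i in range(1, min(r, v) + 1))
    return total / v ** M
```

The lower bound is then found exactly as in the FRT sketch: the smallest v for which F_schnabel(v, M, r_star) drops to 0.95 or below.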

3.1.2  Bias-Correcting Nonparametric Methods

In this section, we describe the application of two commonly used bias-correcting methods to the problem of estimating the number of local optima. These methods are Jckk and Boots, and their specific use in this context was proposed in Reeves (2001), Eremeev and Reeves (2002, 2003), and Reeves and Aupetit-Bélaidouni (2004). While the original papers only provide the mechanics of the algorithms, we also state the assumptions of the methods, as we consider them relevant for our work.

Jckk and Boots are nonparametric methods based on ideas of resampling. Moreover, they use the concept of bias of an estimator (the difference between the estimated value and the real value) to improve an initial estimate. There is an important difference that the authors make between the application of Jckk and the application of Boots. In the Jckk method, no assumptions about the sampling model are made when calculating the initial biased estimate. However, the Boots method has an underlying assumption about the sampling model because it uses the maximum likelihood estimator as an initial biased estimator.

3.1.2.1 Jackknife

The Jckk method starts from a biased estimator $\hat{v}_B$, and assumes that the bias decreases asymptotically as the size of the sample increases. The underlying resampling technique consists of leaving the different points $x_i$ of the initial sample out, one at a time, and finding estimators that reduce the bias of $\hat{v}_B$. The mean of these estimates is taken as the Jckk estimator.

This method, in the same way as the Sch-Cen procedure, was previously proposed for the estimation of population sizes (Burnham and Overton, 1978). In the context of estimating the number of local optima (Eremeev and Reeves, 2002), a uniformly distributed random sample $S = \{x_1, \ldots, x_M\}$ of size $M$ is taken from the search space. After applying a local search algorithm to each solution $x_i$, the set of local optima is obtained, with $r$ different local optima.

Next, one point $x_i$ is left out of the sample $S$, and the subset of local optima corresponding to all the solutions in $S \setminus \{x_i\}$ is considered. If this idea is repeated, leaving each of the $M$ points out of the original sample once, we obtain $M$ subsets, the $i$th of which contains $r_i$ different local optima.

The biased estimator of $v$, which is $r$ itself, is assumed to be of the form
$$E(r) = v + \frac{a}{M} + O\!\left(\frac{1}{M^2}\right). \tag{1}$$
Thus, the bias is of order $1/M$. The purpose is to find an estimator that reduces the bias to order $1/M^2$. Thus, for each $i = 1, \ldots, M$, the following estimator is defined:
$$\hat{v}_i = M r - (M - 1)\, r_i.$$
Since
$$E(r_i) = v + \frac{a}{M - 1} + O\!\left(\frac{1}{M^2}\right), \tag{2}$$
then, from Equations (1) and (2), the terms in $a$ cancel and the bias of $\hat{v}_i$ is of order $1/M^2$. The Jckk estimator for the number of local optima is the mean value of the $\hat{v}_i$. Since removing $x_i$ from the sample removes a local optimum exactly when $x_i$ is the only sample point in its attraction basin, we have $\sum_{i=1}^{M}(r - r_i) = f_1$, and therefore:
$$\hat{v}_{Jckk} = \frac{1}{M}\sum_{i=1}^{M} \hat{v}_i = r + \frac{M-1}{M}\, f_1,$$
where $f_1$ is the number of local optima that have been seen only once.

Note that when the sample size tends to infinity, $f_1$ tends to 0 and $r$ tends to the real number of local optima; therefore, the estimate also tends to the real number of local optima. In this sense, the estimator is asymptotically unbiased.
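The estimator itself is a one-liner; for example, with $M = 1,000$ and $r = 120$ observed local optima of which $f_1 = 30$ are singletons, it returns $120 + 0.999 \cdot 30 \approx 150$ (the numbers are purely illustrative):

```python
def jackknife_estimate(r, f1, M):
    """First-order jackknife: r plus a correction driven by the singletons."""
    return r + (M - 1) / M * f1
```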

3.1.2.2 Bootstrap

The Boots method starts from a biased estimator $\hat{v}_B$ obtained from a sample $S$. Resamples from $S$ are taken, and the biased estimate is calculated in each case. With this information, an estimator of the bias is provided. The result of adding the estimated bias to the initial $\hat{v}_B$ is the Boots estimator.

In the application of this method to the estimation of the number of local optima (Eremeev and Reeves, 2002), we start from the set of $r$ different local optima. Assuming a multinomial model (Reeves and Aupetit-Bélaidouni, 2004) with equally sized attraction basins, the probability distribution of the random variable $R$ that represents the number of different local optima found in the sample $S$ is given by
$$P(R = r) = \binom{v}{r} \frac{r!\, S(M,r)}{v^M},$$
where $S(M,r)$ is again the Stirling number of the second kind. From this, the maximum likelihood estimate $\hat{v}_{ML}$ of $v$ is obtained by maximizing this probability in $v$:
$$\hat{v}_{ML} = \arg\max_{v \geq r}\ \binom{v}{r} \frac{r!\, S(M,r)}{v^M}.$$
If $r/M$ is small (lower than 0.3), the best estimate for the number of local optima (Reeves and Aupetit-Bélaidouni, 2004) is actually $r$, because for such values it is likely that all the local optima have been found. In this case it is considered that $\hat{v}_{ML} = r$.

Afterward, a resample with replacement of the same size $M$ is taken from $S$, obtaining $r_1$ different local optima, and the corresponding maximum likelihood estimate $\hat{v}_{ML}^{(1)}$ is considered. The same procedure is repeated $b$ times, that is, $b$ resamples with replacement are taken from $S$, obtaining $r_1, \ldots, r_b$ different local optima, and the maximum likelihood estimate $\hat{v}_{ML}^{(i)}$ is calculated for each $r_i$.

With the maximum likelihood estimates $\hat{v}_{ML}^{(i)}$ for the resamples and the maximum likelihood estimate $\hat{v}_{ML}$ obtained from the original sample, the bias can be estimated and used as a bias correction for $\hat{v}_{ML}$. The bias is calculated as the difference between the maximum likelihood estimate obtained with the original sample and the average of the maximum likelihood estimates from the resamples:
$$\widehat{bias} = \hat{v}_{ML} - \frac{1}{b}\sum_{i=1}^{b} \hat{v}_{ML}^{(i)}.$$
Hence, the Boots estimator for the number of local optima is $\hat{v}_{ML} + \widehat{bias}$, so:
$$\hat{v}_{Boots} = 2\,\hat{v}_{ML} - \frac{1}{b}\sum_{i=1}^{b} \hat{v}_{ML}^{(i)}.$$

A very important observation is that, as the sample size tends to infinity, the number of local optima found tends to the real number of local optima. In addition, as $r/M$ becomes very small ($M$ very large while $r$ stays bounded), the maximum likelihood estimate is simply the number of local optima found in the sample. Moreover, the estimated bias tends to 0, so the estimate tends to the real number of local optima.
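A sketch of the whole procedure follows. Since the exact likelihood equation is cumbersome, the maximum likelihood step is approximated here by moment matching on $E(R) = v\,(1 - (1 - 1/v)^M)$, the expected number of distinct optima under equally sized basins; this stand-in is our assumption, not necessarily the computation used in the original papers:

```python
import random

def ml_estimate(r, M):
    """Approximate equal-basin ML estimate of v: the smallest v whose expected
    number of distinct optima in M draws reaches r (r itself when r/M < 0.3)."""
    if r / M < 0.3:
        return float(r)
    if r == M:
        return float("inf")              # every draw hit a new local optimum
    v = r
    while v * (1.0 - (1.0 - 1.0 / v) ** M) < r:
        v += 1
    return float(v)

def bootstrap_estimate(optima, b=100):
    """Boots: 2 * v_ML(original sample) minus the mean v_ML over b resamples."""
    M = len(optima)
    v0 = ml_estimate(len(set(optima)), M)
    v_res = [ml_estimate(len(set(random.choices(optima, k=M))), M)
             for _ in range(b)]
    return 2.0 * v0 - sum(v_res) / b
```

Here optima is the list of the M local optima reached from the original sample; resampling it with replacement is equivalent to resampling the initial solutions and descending again.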

3.2  Methods Proposed in the Field of Statistics

In this section, we present four nonparametric methods based on sampling models: Chao1984 (Chao, 1984), ChaoLee1 and ChaoLee2 (Chao and Lee, 1992), and ChaoBunge (Chao and Bunge, 2002). Although they were proposed to estimate the number of species in a population, we explain here their specific application to our problem. An important consideration is that they assume an infinite population, while common COPs have a finite search space. However, we treat the search spaces as if they were infinite because of their large cardinality.

All of the methods presented below start from a sample $S = \{x_1, \ldots, x_M\}$ of size $M$. A local search algorithm is applied to each solution, and $r$ different local optima are obtained.

3.2.1  Chao 1984 

Chao1984 is a nonparametric method proposed in Chao (1984) based on multinomial sampling that has been used to estimate the number of classes in an infinite population.

The estimator given by this method is the result of adding to the number of local optima obtained from the sample a quantity that depends only on the number of local optima seen once or twice in the sample.

This method is based on the estimation of the expected value of the number of unobserved local optima, $E(f_0)$. Harris (1959) proved that if $j^2 = O(M)$, then $E(f_j)$ can be approximated by the mixed Poisson expression $\sum_{i=1}^{v} e^{-M p_i} (M p_i)^j / j!$, where $p_i$ is the relative size of the attraction basin of the local optimum $x_i^*$. The method considers the distribution function $F(x)$ of the relative sizes of the attraction basins and expresses the number of unobserved local optima in terms of it. We want to obtain an estimator of $F(x)$ and thus find an estimator of $v$ that is the sum of $r$ and the estimated number of unobserved local optima. That is,
$$\hat{v} = r + \hat{E}(f_0).$$
Moment estimates were proposed in Chao (1984) to obtain an estimator of $F(x)$, and once attained, a lower bound for $v$ was found. As the sample size $M$ tends to infinity, the lower bound tends to the Chao1984 estimator:
$$\hat{v}_{Chao1984} = r + \frac{f_1^2}{2 f_2}.$$

It is very important to take into account that this estimator works when the information is concentrated in the low-order occupancy numbers, that is, when $f_1$ and $f_2$ carry most of the information. If $f_2 = 0$, that is, if there is no local optimum seen exactly twice in the sample, the method does not work. Moreover, $f_1$ is also very important in this estimator, because if $f_1 = 0$, the estimate is just the number of local optima obtained from the sample. We find these situations, for example, when $M$ is much higher than the real number of local optima; in this case, it is unlikely that local optima will be found only once or twice. Furthermore, we can also find $f_1 = 0$ and $f_2 = 0$ when the attraction basins are close in size, because the different local optima are then probably found roughly the same number of times. Note that only if we force the method to return $r$ as the estimate for the number of local optima when $f_1 = 0$ or $f_2 = 0$ can we ensure that, when $M$ tends to infinity, we obtain the real number of local optima.
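The estimator, with the fallback just discussed, is then simply (a sketch):

```python
def chao1984_estimate(r, f1, f2):
    """Chao (1984): r + f1^2 / (2*f2), falling back to r when f1 or f2 is 0."""
    if f1 == 0 or f2 == 0:
        return float(r)
    return r + f1 ** 2 / (2 * f2)
```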

3.2.2  ChaoLee1, ChaoLee2

Chao and Lee (1992) proposed two new estimators based on those introduced by Esty (1985). The methods are also nonparametric, intended for infinite populations, and based on multinomial sampling.

They were the first methods to incorporate the idea of sample coverage. These estimators are the sum of the number of local optima observed many times and quantities that depend on the number of local optima found few times in the sample.

Given a sample $S$, the sample coverage $C$ is defined as the sum of the probabilities of the observed local optima, that is, the sum of the relative sizes of the attraction basins of the local optima found. Obviously, $C$ is a random variable, and an estimator of $C$ (Good, 1953) is $\hat{C} = 1 - f_1/M$. Note that if all the local optima have the same probability of being chosen, that is, $p = (1/v, \ldots, 1/v)$, then $C = r/v$. In this case, an initial estimator of $v$ is $\hat{v}_0 = r/\hat{C}$. Based on Esty (1985) and using $\hat{v}_0$, Chao and Lee proposed an estimator of $v$ of the following form: $\hat{v} = r/\hat{C} + \frac{M(1-\hat{C})}{\hat{C}}\,\hat{\gamma}^2$, where $\hat{\gamma}$ estimates the coefficient of variation of the basin sizes (with $\gamma^2 = v \sum_{i=1}^{v} p_i^2 - 1$).

Note that $\gamma^2$ is unknown and has to be estimated from the sample. Therefore, they suggested:
$$\hat{\gamma}^2 = \max\left\{\frac{r}{\hat{C}} \cdot \frac{\sum_{j} j(j-1) f_j}{M(M-1)} - 1,\; 0\right\}.$$

Chao and Lee distinguished between what they call abundant species and rare species. Transferring these concepts to our problem, we distinguish between easy-to-find and hard-to-find local optima. We define a local optimum $x_i^*$ as hard-to-find if $\alpha_i \leq \kappa$ for some cutoff value $\kappa$; thus, the easy-to-find local optima are those $x_i^*$ such that $\alpha_i > \kappa$. One can select this cutoff value in advance, but there are studies (Chao and Yang, 1993) that, based on empirical experience, set $\kappa = 10$. We call $r_h$ the number of hard-to-find local optima, and $r_e$ the number of easy-to-find local optima.

Taking this distinction into account, the estimators are of the following form:
$$\hat{v} = r_e + \frac{r_h}{\hat{C}_h} + \frac{M_h (1 - \hat{C}_h)}{\hat{C}_h}\,\hat{\gamma}^2,$$
where $M_h = \sum_{j=1}^{\kappa} j\, f_j$ is the number of sample points falling in the basins of the hard-to-find local optima, and $\hat{C}_h = 1 - f_1/M_h$ is the sample coverage estimated over them. ChaoLee1 uses $\hat{\gamma}^2$ computed from the hard-to-find local optima, while ChaoLee2 uses a bias-adjusted version of $\hat{\gamma}^2$.

It is important to take into account that if $\hat{C}_h = 0$, the method does not work. This occurs when $f_1 = M_h$, that is, when every hard-to-find local optimum has been seen exactly once. Neither does the method work when $M_h = 0$ or $M_h = 1$, that is, when there are no hard-to-find local optima, or when there is a single hard-to-find local optimum seen exactly once. In these cases, we redefine the estimators as $r$, the number of different local optima that appear in the sample. Only under this consideration do we ensure having an asymptotically unbiased estimator and, therefore, obtaining the real number of local optima when $M$ tends to infinity.
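The following sketch implements ChaoLee1 with the easy/hard split and the degenerate-case fallback just described. The precise form of $\hat{\gamma}^2$ follows our reconstruction of Chao and Lee (1992) above; ChaoLee2 would only replace $\hat{\gamma}^2$ by its adjusted version:

```python
def chao_lee1_estimate(f_j, kappa=10):
    """ChaoLee1 coverage-based estimator from the occupancy numbers f_j
    (a dict j -> f_j). Hard-to-find optima are those seen at most kappa times."""
    r = sum(f_j.values())
    f1 = f_j.get(1, 0)
    r_hard = sum(c for j, c in f_j.items() if j <= kappa)
    r_easy = r - r_hard
    M_hard = sum(j * c for j, c in f_j.items() if j <= kappa)
    if M_hard <= 1 or f1 == M_hard:        # C_hat undefined or zero
        return float(r)                    # fall back to r
    C_hat = 1.0 - f1 / M_hard              # sample coverage (Good, 1953)
    s2 = sum(j * (j - 1) * c for j, c in f_j.items() if j <= kappa)
    gamma2 = max(r_hard / C_hat * s2 / (M_hard * (M_hard - 1)) - 1.0, 0.0)
    return r_easy + r_hard / C_hat + M_hard * (1.0 - C_hat) / C_hat * gamma2
```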

3.2.3  Chao and Bunge

ChaoBunge is an estimation technique proposed by Chao and Bunge (2002). This estimator is also nonparametric in form, but it has some optimality properties under a particular parametric model. The method is based on a mixed Poisson sampling model and is closely related to the estimators of Chao and Lee (1992).

This method bases the estimation of the number of unobserved local optima on the local optima found at most $\kappa$ times in the sample, for some cutoff value $\kappa$, and then completes the estimate by adding the number of local optima found more than $\kappa$ times.

First, they showed that, for the Gamma-Poisson case, the expected proportion of duplicates in the sample, denoted by $\theta$, can be estimated from the occupancy numbers $f_j$, yielding an estimator $\hat{\theta}$. Second, based on this estimator, they proposed an estimator for the expected number of unseen optima. Third, with this estimator of the unseen optima, the estimator of the number of local optima was defined as $\hat{v} = r/\hat{\theta}$.

Finally, based on the data and the estimator $\hat{\theta}$, and taking into account the distinction between hard-to-find local optima ($r_h$) and easy-to-find local optima ($r_e$), their final proposed estimator of $v$ is:
$$\hat{v}_{ChaoBunge} = r_e + \frac{r_h}{\hat{\theta}}, \tag{3}$$
where $\hat{\theta}$ only takes into account the hard-to-find local optima, and not all the local optima found in the sample.

Note that, if $\alpha_i \leq \kappa$ for all $i$, then all the local optima are considered as hard-to-find, and therefore $\hat{v}_{ChaoBunge} = r/\hat{\theta}$.

It is important to take into account that if we have a sample in which the hard-to-find local optima have all been seen exactly once, then $\hat{\theta} = 0$, and therefore the method does not work. In these cases, we consider that $\hat{v}_{ChaoBunge} = r$, and so the estimate is just $r$. Considering this point, as $M$ tends to infinity, we obtain the real number of local optima.

4  Experiments

The accuracy of the different estimators presented in the previous section was tested on three different datasets: simulated instances of COPs, instances of the TSP with random distances, and instances of the TSP with real distances between different cities, together with instances of the FSSP obtained from the well-known Taillard's benchmark. Using these datasets, we can first test the methods over a wide set of instances with different characteristics (the dataset with simulated instances) and then check whether the conclusions generalize to artificial and real instances (second and third datasets). In these three scenarios, we work with problems for which we already know the number of local optima, which allows us to evaluate the accuracy of the different estimates. We report a comparison of the different methods, recommending those that provide the best estimates.

4.1  Synthetic Data

4.1.1  Experimental Design

The aim of this section is to study the performance of the methods when they are applied to instances with different numbers of local optima and different distributions of the sizes of the attraction basins. Therefore, we are interested in a set of data that includes instances with attraction basins very similar in size, as well as instances with very different sizes of attraction basins. However, it is not easy to obtain many real or random instances with the desired characteristics. On the one hand, computing the real number of local optima and the sizes of their attraction basins for a given instance of a COP requires a high computation time. On the other hand, it is not easy to find instances in the literature with a high number of local optima that are realistic enough. These are the reasons why we simulated instances of COPs, instead of working with real instances.

As far as the methods for estimating the number of local optima are concerned, an instance of a COP is determined by the number of local optima and the sizes of their attraction basins. Therefore, we can summarize an instance as the pair $(v, p)$, where $v$ denotes the number of local optima and $p = (p_1, \ldots, p_v)$, with $\sum_{i=1}^{v} p_i = 1$, is the vector that gives the relative sizes of the attraction basins. To this end, we create instances by assuming a certain number of local optima and assigning to each local optimum a probability of being chosen (that is, a certain relative size of its attraction basin).

One way to do this is by sampling a Dirichlet distribution. When a Dirichlet distribution with $v$ parameters is sampled, a vector with $v$ values is obtained that fulfills the following two relations:
$$p_i \geq 0,\ i = 1, \ldots, v, \qquad \sum_{i=1}^{v} p_i = 1.$$
We take advantage of this property of the Dirichlet distribution to simulate the relative sizes of the attraction basins of $v$ local optima. We simplify the sampling by assuming that all $v$ parameters take a common value $d$, so that we sample Dirichlet distributions determined by only two parameters: $D(v, d)$.
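In code, generating one such synthetic instance, together with the outcomes of applying the local search to a sample of size M, is direct (a sketch using NumPy):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_instance_and_sample(v, d, M):
    """Draw relative basin sizes from the symmetric Dirichlet D(v, d), then
    simulate M local-search outcomes as one multinomial draw over the basins."""
    p = rng.dirichlet(np.full(v, d))     # p_i >= 0 and sum(p) == 1
    counts = rng.multinomial(M, p)       # how often each optimum is reached
    alphas = counts[counts > 0]          # alpha_i of the r observed optima
    return p, alphas
```

Small values of d yield highly unequal basin sizes (high variance of the p_i), while large values of d concentrate the p_i around 1/v.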

For the parameter $v$, we work with the values 100; 1,000; and 10,000. Regarding $d$, working with many values of this parameter would be unfeasible, so we carried out an initial experiment to choose values of $d$ that simulate very different distributions of the sizes of the attraction basins. In order to choose these values, we sample different $D(v, d)$, for $v = 100$; 1,000; and 10,000, and for a range of values of $d$. For each $v$ and $d$ we take 100 samples and calculate the variance of the $v$ values obtained in each sample. Then, according to the average of the 100 variances of each case, we choose the values of $d$ used to simulate the instances. Figure 1 shows the average of the 100 variances obtained for each combination of $d$ and $v$.

Figure 1: Average variance of the values of the samples obtained from $D(v, d)$, for the different values of $d$, and for $v = 100$; 1,000; and 10,000. The Y axis is on a logarithmic scale.

Observing the plot, we choose the following values for $d$: 0.1, 0.2, and 0.5 (high-to-medium variance), and 2 and 4 (low variance). Once the values of $d$ are decided, we obtain 10,000 samples of a Dirichlet distribution for each combination of $d$ and $v$, which play the role of 10,000 different instances of COPs for each case. This gives us a set of 150,000 instances divided into 15 equally sized groups according to $v$ and $d$. Each method is run 100 times for each of the simulated instances using two different sample sizes: $M = 1,000$ and $M = 10,000$. The results provided are the average of the 100 values. Note that, as FRT does not depend on the sample size, it is only applied 100 times for each instance.

4.1.2  Results

In this section we analyze the performance of the methods and compare them, taking into account the different parameters ($v$, $d$, and $M$). First, we check whether the methods provide useful estimates. Second, we use nonparametric tests to rank the methods and study the significant differences among the observed results. Finally, a more qualitative study is carried out, emphasizing an important characteristic: the stability of the methods.

The closeness of the estimates provided by the methods to the value we want to estimate is the most important factor. Obviously, some methods estimate better than others, but this does not mean that the estimates provided by the best methods are close enough to the real value. In order to check whether we are able to obtain good estimates with these methods, we choose, for each combination of $v$, $d$, and $M$, the method that provides the best average estimate over all the 1,000,000 results (10,000 instances × 100 repetitions). In Figure 2 we represent the average estimate obtained with this best method and the real number of local optima (with a dashed line) for each $v$ and $d$. As expected, the quality of the estimate depends on the parameters of the instances ($v$ and $d$) and the sample size. For small values of $d$ (0.1, 0.2, and 0.5, i.e., high variance of the sizes of the attraction basins) the estimates are really far from the real number. Furthermore, the higher the number of local optima, the worse the estimate (see Figure 2 (a, c, e) and (b, d, f)). For scenarios where the sizes of the attraction basins are quite similar ($d = 2, 4$), the methods provide precise estimates. As regards the sample size, it can be observed that the larger the sample size, the better the estimates provided (see Figure 2 (a-b), (c-d), and (e-f)). However, this improvement is not enough to reach accurate results for low values of $d$.

Figure 2: Boxplots of the estimates provided by the best method (on average over the 10,000 instances × 100 repetitions) for the different values of $d$. The datasets are created assuming 100 (a, b); 1,000 (c, d); and 10,000 (e, f) local optima, indicated in each plot with a dashed line. In (a), (c), and (e), the methods are applied with sample size 1,000, while in (b), (d), and (f), the sample size is 10,000.

Continuing with the study of the methods, we carry out a statistical analysis to compare the estimates obtained by the different methods. We consider three different scenarios for comparison, according to the parameters of the study ($M$, $v$, $d$). In the first scenario, the estimates are grouped into two sets according to $M = 1,000$ and $M = 10,000$. The second scenario considers three sets that contain the estimates for the instances with $v = 100$; 1,000; and 10,000 local optima. In the last scenario, the estimates are grouped into five sets, according to the parameter $d = 0.1, 0.2, 0.5, 2, 4$. A nonparametric Friedman test with a prefixed level of significance is used to test whether there are statistically significant differences between the estimates provided by the nine methods in the different scenarios. It provides a ranking of the methods, giving an average rank value for each method. As we always find statistical differences, we proceed with a post hoc test that carries out all pairwise comparisons; in particular, we use Holm's procedure with the same level of significance. Pairwise significant differences are found between all of the methods in the three scenarios.

Table 2 shows the ranking obtained for the methods with Friedman's test in the first scenario, when using a sample of size $M = 1,000$ (first pair of columns) and $M = 10,000$ (last pair of columns). The lower the rank, the worse the performance of the method. The methods are ordered from best to worst. Therefore, the best methods, when separating the estimates according to the sample size, are ChaoLee2 and ChaoBunge.

Table 2:
Average rankings of the methods according to the sample size M.
         M = 1,000                M = 10,000
Method       Ranking      Method       Ranking
ChaoLee2     7.79         ChaoBunge    7.78
ChaoBunge    7.03         ChaoLee2     6.88
Chao1984     6.87         Chao1984     6.68
ChaoLee1     6.50         Jckk         6.31
Jckk         5.67         ChaoLee1     5.71
Boots        4.59         Boots        4.71
Sch-Cen      3.27         Sch-Cen      3.63
MFRT         1.94         MFRT         1.88
FRT          1.35         FRT          1.41

In Table 3, the ranking of the methods is shown for the second scenario, that is, grouping the estimates for the instances created with $v = 100$ (first pair of columns), $v = 1,000$ (middle pair of columns), and $v = 10,000$ (last pair of columns). In these three cases, Holm's procedure states that significant differences exist between each pair of methods. From Table 3, we observe that the lower the number of local optima, the better the estimates provided by Chao1984. On the contrary, ChaoLee2 improves its performance as the number of local optima grows.

Table 3:
Average rankings of the methods according to the number of local optima v.
v = 100                   v = 1,000                 v = 10,000
Method      Ranking       Method      Ranking       Method      Ranking
Chao1984    7.25          ChaoBunge   8.25          ChaoLee2    8.33
ChaoBunge   7.05          ChaoLee2    7.30          ChaoLee1    7.06
ChaoLee2    6.37          Chao1984    6.85          ChaoBunge   6.91
Jckk        6.35          Jckk        6.03          Chao1984    6.24
ChaoLee1    5.40          ChaoLee1    5.87          Jckk        5.58
Boots       5.21          Boots       4.51          Boots       4.23
Sch-Cen     3.93          Sch-Cen     3.20          Sch-Cen     3.22
MFRT        2.22          MFRT        2.00          FRT         1.92
FRT         1.22          FRT         1.00          MFRT        1.52

Finally, Table 4 shows the ranking obtained for the methods when the estimates are separated according to the Dirichlet parameter $d$ used to create the instances. This ranking gives an idea of which method is preferable according to the variance of the sizes of the attraction basins. For high variance (small values of $d$) the recommended method is ChaoBunge, although ChaoLee2 also provides very good estimates. For instances with quite similar sizes of attraction basins, the best method is ChaoLee2.

Table 4:
Average rankings of the methods according to the Dirichlet parameter d.
d = 0.1                   d = 0.2                   d = 0.5
Method      Ranking       Method      Ranking       Method      Ranking
ChaoBunge   8.37          ChaoBunge   8.02          ChaoBunge   7.39
ChaoLee2    7.52          ChaoLee2    7.31          ChaoLee2    7.13
Chao1984    7.07          Chao1984    7.01          Chao1984    7.08
Jckk        6.30          Jckk        6.58          Jckk        7.01
ChaoLee1    5.51          ChaoLee1    5.82          ChaoLee1    6.05
Boots       4.06          Boots       4.07          Boots       4.13
Sch-Cen     3.06          Sch-Cen     3.07          Sch-Cen     3.07
MFRT        2.05          MFRT        1.90          MFRT        1.73
FRT         1.05          FRT         1.23          FRT         1.40

d = 2                     d = 4
Method      Ranking       Method      Ranking
ChaoLee2    7.17          ChaoLee2    7.53
Chao1984    6.75          ChaoLee1    6.84
ChaoBunge   6.59          ChaoBunge   6.63
ChaoLee1    6.31          Chao1984    5.99
Boots       6.03          Boots       4.96
Jckk        5.16          Jckk        4.90
Sch-Cen     3.72          Sch-Cen     4.33
MFRT        1.80          MFRT        2.07
FRT         1.47          FRT         1.74

From the statistical analysis, we conclude that the worst methods in all scenarios are FRT, MFRT, and Sch-Cen. On the other hand, we cannot conclude that there is a single best method for all scenarios. However, if we consider the three best methods in each scenario, ChaoBunge and ChaoLee2 are always among them. In the absence of information about the number of local optima of the instance, or the sizes of their attraction basins, using both of them will probably yield more accurate estimates than using any other method.

Although the statistical analysis gives a global picture of the performance of the methods, it is also relevant to consider some aspects that are not reflected in the hypothesis tests. One of these aspects is stability. Imagine that we have some instances with similar properties (the same number of local optima and a similar distribution of the sizes of the attraction basins). We are interested in knowing whether a method will provide comparable estimates for the different instances, or extremely different ones. In the first case, we say that the method is stable, while in the second, it is unstable. ChaoBunge is a very unstable method in certain situations, while the rest of the methods are very stable. Under some circumstances, ChaoBunge provides a very good estimate of the number of local optima for most of the instances, but there are instances where the given estimate is very far from $v$; this happens, for example, for $v = 100$, $M = 1,000$, and $d = 0.1$ (Figure 3(a)). In other situations, for example, for $v = 1,000$, $M = 1,000$, and $d = 2$ (Figure 3(b)), it is the method that provides the best estimates of the number of local optima for all instances. In order to compare the performance of this method with ChaoLee2 in these particular scenarios, the estimates provided by ChaoLee2 are also shown in Figure 3(b). Our website provides all the figures that represent the estimates given by each method for each $v$ and each $d$.1

Figure 3: Estimates of the number of local optima provided by (a) ChaoBunge and (b) ChaoBunge and ChaoLee2 for 10,000 synthetic instances created by sampling $D(100, 0.1)$ and $D(1,000, 2)$, respectively. The sample size used in both cases is 1,000. In (a) we see the instability of the ChaoBunge method, while in (b) it is stable.

We conclude from the tables that ChaoBunge is one of the best methods, but we find specific situations where the estimate provided is very far from the real value we want to estimate. In order to know whether the estimate provided by ChaoBunge is valid, we can also apply other methods, such as ChaoLee2, and compare both results. If these estimates are close enough, the one provided by ChaoBunge can be accepted. Otherwise, if the estimates are very far from each other, we can be almost sure that ChaoBunge is giving a useless estimate. Therefore, we consider that a suitable way of estimating the number of local optima of an instance is not to use a single method, but to compare the results of different methods.

4.2  Random Instances of TSP

4.2.1  Experimental Design

In order to contrast our initial conclusions, in this section we work with random instances of the TSP. Given a list of cities and their pairwise distances, the aim of this problem is to find the shortest tour that visits each city exactly once, returning to the initial city. In particular, we work with random instances of the symmetric TSP with 14 and 15 cities; in the symmetric version, the distance from city A to city B is the same as that from B to A. The instances were created by placing 14 and 15 points, respectively, uniformly at random on a square of area 100 in Euclidean space (Gent and Walsh, 1996). Afterward, we calculated the matrix that gives the Euclidean distance between every pair of cities. We randomly created 500 instances with 14 cities and 110 instances with 15 cities.

For the purpose of measuring the accuracy of the estimation methods, we first calculated the exact number of local optima of the instances when using a 2-exchange neighborhood ($N_S$). The 2-exchange neighborhood considers that two solutions are neighbors if one is generated by swapping two elements of the other:
$$N_S(\sigma) = \{\sigma' \mid \sigma'(i) = \sigma(j),\ \sigma'(j) = \sigma(i),\ \sigma'(k) = \sigma(k)\ \forall k \neq i, j,\ \text{for some } i \neq j\}.$$
By applying the deterministic local search algorithm (see Algorithm 1 in Section 2) with the 2-exchange neighborhood to each solution of the search space, the exact number of local optima of the instance and the corresponding sizes of the attraction basins are obtained. Note that in the symmetric TSP there are $2n$ permutations encoding the same solution. Therefore, we only take into account one of all these different representations, and thus we search in a space of size $(n-1)!/2$.
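For reference, the 2-exchange (swap) neighborhood of a tour can be enumerated as follows (a sketch; each tour of n cities has n(n-1)/2 such neighbors):

```python
from itertools import combinations

def two_exchange_neighbors(tour):
    """Yield all neighbors of a tour under N_S: the tours obtained by
    swapping the cities at two positions i < j."""
    for i, j in combinations(range(len(tour)), 2):
        neighbor = list(tour)
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        yield tuple(neighbor)
```

Passing this generator as the neighbors argument of the descent sketch in Section 2 yields a concrete version of the operator $\mathcal{H}$ for this neighborhood.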

The different methods for estimating the number of local optima were applied to all the instances considering two sample sizes: $M = 1,000$ and $M = 10,000$. For each instance, the methods are repeated 100 times, and we evaluate and compare the average estimates of each method.

4.2.2  Results

A first step in the analysis of the methods, when applying them to random instances of the TSP, is the study of the accuracy of the estimates they provide. Second, a parameter $d$ is associated with each instance and, as in the previous section, the performance of the methods is studied again according to $d$, $v$, and $M$.

In order to check whether the methods provide useful estimates, the average errors of the estimates with respect to the real number of local optima are calculated. Table 5 shows the average relative errors and the standard deviations (in parentheses), grouped by the number of cities and the sample size.

Table 5:
Average relative errors and SD (in parentheses) of the estimates provided by the different methods, according to the number of cities n and the sample size M. The range of the real number of local optima of the instances appears in parentheses under the number of cities n. Since FRT does not depend on the sample size, a single value is reported for each n.

n = 14 (…)    FRT: 96.15 (3.48)
  M = 1,000    MFRT 89.20 (17.76)   Sch-Cen 43.61 (76.89)   Jckk 27.95 (71.81)   Boots 36.75 (74.53)   Chao1984 26.08 (57.02)   ChaoBunge 27.15 (5,949.52)    ChaoLee1 28.42 (49.49)   ChaoLee2 22.70 (45.48)
  M = 10,000   MFRT 87.15 (23.55)   Sch-Cen 14.29 (29.03)   Jckk 5.65 (14.61)    Boots 10.28 (21.50)   Chao1984 6.71 (14.77)    ChaoBunge 7.34 (15.49)        ChaoLee1 8.93 (15.99)    ChaoLee2 7.86 (13.83)

n = 15 (…)    FRT: 97.46 (2.11)
  M = 1,000    MFRT 93.04 (11.48)   Sch-Cen 59.38 (66.19)   Jckk 44.42 (83.04)   Boots 52.87 (73.92)   Chao1984 41.42 (62.66)   ChaoBunge 59.46 (19,247.69)   ChaoLee1 41.72 (61.08)   ChaoLee2 33.78 (61.47)
  M = 10,000   MFRT 91.68 (15.19)   Sch-Cen 26.47 (47.48)   Jckk 14.64 (29.51)   Boots 20.91 (39.67)   Chao1984 16.33 (24.74)   ChaoBunge 14.64 (14.20)       ChaoLee1 18.16 (24.32)   ChaoLee2 16.20 (18.40)

A general conclusion deduced from Table 5 is that for $n = 14$ the methods provide better estimates than for $n = 15$. Table 5 also confirms the improvement of the estimates as the sample size grows. It is remarkable that for $n = 15$ cities (whose instances have a higher number of local optima than those with $n = 14$) and sample size $M = 1,000$, the estimates are very far from the real value, and the standard deviations of these estimates are high. In particular, ChaoBunge has a very high standard deviation when the sample size is 1,000. This fact confirms the unstable behavior observed in the previous experiments. The instability of this method is a consequence of the variability in the estimation of the parameter $\hat{\theta}$ (see Equation (3) in Section 3.2). If $\hat{\theta}$ is close to 0, then $r_h/\hat{\theta}$ blows up, and the estimate is very large and very far from the real value. This occurs when we have a sample where a lot of local optima are seen only once, but a small number of local optima are seen twice, three times, and so on. These particularities are commonly found when the sample size is small with respect to the number of local optima, or when the variance of the sizes of the attraction basins of the local optima is high.

To visualize the performance of the methods, in Figure 4 we represent the estimates provided. We first arrange the instances according to the number of local optima and form 10 groups of 11 instances. For each group we calculate the average estimate of each method. We represent the five methods that provide the best results: Jckk, Boots, Chao1984, ChaoLee1, and ChaoLee2; ChaoBunge is removed from the plots because of its instability. Figure 4 shows the average estimates obtained for the instances of the TSP with 15 cities, for sample sizes 1,000 (top) and 10,000 (bottom). We observe from the graphs that when the number of local optima is lower than 400-500, the estimates are close to the real values. However, as the number of local optima grows, the estimates provided by all the methods tend to move away from the real number of local optima. For sample size 1,000, it can be clearly seen that in all groups the best method is ChaoLee2, while ChaoLee1, Chao1984, and Jckk provide similar estimates. For sample size 10,000, we observe that Jckk is the best method. Additional information about the estimates provided by each method for each instance is available.2

Figure 4: Estimates of the number of local optima provided by the different methods for 110 random instances of the TSP with 15 cities. The 110 instances are arranged according to the number of local optima and put into 10 groups of 11 instances each. The average of the 11 estimates is shown by the histogram for each method. The methods use a sample of size 1,000 (top graph) and 10,000 (bottom graph). The solid line indicates the real number of local optima of the instances.

We compare the methods according to different parameters, as in the previous section. First, a parameter $d$ is associated with each instance, supposing that the sizes of the attraction basins were created by sampling a Dirichlet distribution with that particular $d$. Because we know the number of local optima $v$ of the 610 random instances of the TSP, for each value of $v$ we sample Dirichlet distributions $D(v, d)$, for each $d = 0.1, 0.2, 0.5, 2, 4$. We take 100 samples for each $v$ and $d$, and the variance of the $v$ values is calculated for each sample. On the other hand, the variance of the relative sizes of the attraction basins of the local optima of each random instance is calculated. For each instance, we compare the variance of its relative basin sizes with the variances obtained when sampling $D(v, d)$, where $v$ is the number of local optima of that instance. We associate with each instance the value of $d$ for which the average variance of $D(v, d)$ is closest to the variance of the instance.

We classify the instances according to d and v and observe that most of the instances have low values of d, that is, high variances of the sizes of the attraction basins of the local optima. Table 6 shows the number of instances associated with each value of d, broken down by the number of local optima. Next, we study the performance of the methods taking into account d, v, and M, and compare it with the results of the previous section. As only one instance has parameter d=4, we analyze it separately. We saw in the previous section that the three methods that provided the best estimates for d=0.1, 0.2, 0.5, 2, as well as for M=1,000 and 10,000, were Chao1984, ChaoLee2, and ChaoBunge. Table 7 shows the percentage of instances for which the three best estimates are provided by these three methods, according to d and M. We observe that the percentages are lower for M=1,000 than for M=10,000, because ChaoBunge is more unstable for M=1,000. For small values of d (d=0.1, 0.2, 0.5) and sample size 1,000, ChaoBunge is the best method in more than 77% of the estimates, ChaoLee2 is the second best in 379 of the 610 instances, and Chao1984 is the third best in 367 of the 610 instances. These results corroborate the conclusions obtained from the analysis of the methods on the synthetic instances (Table 4). When the sample size is 10,000, for small values of d, Chao1984 provides very good estimates, and in most of the instances it is the best method. The reason is that almost all of the instances with a low value of d also have a low number of local optima and, as seen in Table 3, the best method in those scenarios is Chao1984.

Table 6:
Number of instances that are assigned different values of d according to v.
d      30<v<500    500<v<1,100
0.1    346
0.2    155
0.5    81
2      13          11
Table 7:
Percentages of the number of instances for which the best three estimates obtained are provided by Chao1984, ChaoBunge, and ChaoLee2.
d      M=1,000    M=10,000
0.1    80.64%     97.40%
0.2    78.71%     99.35%
0.5    55.95%     100.00%
2      41.67%     95.83%

We find only one instance with a high value of d, and furthermore, it is the instance with the highest number of local optima (v=1,087). When the methods are applied to this instance with sample size 1,000, the best estimates are provided by ChaoLee2 and ChaoLee1. As we saw in Table 4, these are the best methods for d=4. On the other hand, when the sample size is 10,000, the best methods are ChaoBunge and ChaoLee2. This matches the results shown in Table 2 (last pair of columns) and Table 3 for v=1,000.

4.3  Instances of TSP and FSSP

4.3.1  Experimental Design

This section is devoted to experiments with real instances of COPs, as well as instances taken from Taillard's benchmark. We work with 10 instances of the TSP (with real distances between cities) and another 10 instances of the FSSP. The FSSP can be stated as follows: there are n jobs to be scheduled on m machines. A job consists of m operations, and the jth operation of each job must be processed on machine j for a specific processing time without interruption. We consider the case in which the jobs are processed in the same order on all machines, which is known as the permutation flow shop scheduling problem (PFSP). The objective of the PFSP is to find a permutation that represents the order in which the jobs are scheduled on the machines, minimizing the total flow time.
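
For concreteness, the total flow time of a job permutation can be computed with the standard completion-time recurrence. The sketch below is illustrative, not the benchmark code; p[j][k] denotes the processing time of job j on machine k, an assumed data layout:

def total_flow_time(perm, p):
    # completion[k] holds the completion time of the previous job on
    # machine k. Each job starts on machine k once it has left machine
    # k-1 and machine k has finished the preceding job.
    m = len(p[0])                  # number of machines
    completion = [0.0] * m
    total = 0.0
    for job in perm:
        for k in range(m):
            prev = completion[k - 1] if k > 0 else 0.0
            completion[k] = max(completion[k], prev) + p[job][k]
        total += completion[-1]    # the job finishes on the last machine
    return total

# Example: 3 jobs on 2 machines.
times = [[2, 1], [1, 3], [3, 2]]
print(total_flow_time([0, 1, 2], times))   # 17.0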

For the TSP, we take the real distances between 14 cities spread over Africa, America, Asia, and Europe, and between 14 cities of the United States, Spain, and Australia and the Pacific Islands.3 For the PFSP, we consider instances with 13 jobs and five machines, obtained from the well-known benchmark proposed by Taillard,4 which has been used by numerous authors, such as Taillard (1990), Bierwirth and Mattfeld (1999), and Ceberio et al. (2012). The instances of both the TSP and the PFSP used in this section of the experiments are available online.5

We apply Algorithm 1 to each instance, starting from every solution of the search space. Note that the size of the search space of the TSP instances is 13!/2, while the instances of the PFSP have a search space of size 13!. In this section, we consider two neighborhoods: 2-exchange and Insert. Two solutions are neighbors under the Insert neighborhood ($N_I$) if one is obtained by moving an element of the other to a different position:

$$N_I(\sigma) = \{\sigma' \mid \sigma' = (\sigma_1, \ldots, \sigma_{i-1}, \sigma_{i+1}, \ldots, \sigma_j, \sigma_i, \sigma_{j+1}, \ldots, \sigma_n),\; i \neq j\}$$

(read with the analogous reordering when j < i).
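
One direct way to enumerate this neighborhood is to remove each element and reinsert it at every other position, discarding the duplicates that arise when two moves produce the same permutation. The sketch below is one possible encoding of the definition above (the function name is ours, not from the paper):

def insert_neighbors(perm):
    # Yield every distinct permutation obtained by removing the element
    # at position i and reinserting it at a different position j.
    n = len(perm)
    seen = set()
    for i in range(n):
        rest = perm[:i] + perm[i + 1:]
        for j in range(n):
            if j == i:
                continue           # reinserting in place returns perm itself
            neighbor = rest[:j] + (perm[i],) + rest[j:]
            if neighbor != perm and neighbor not in seen:
                seen.add(neighbor)
                yield neighbor

# Example: the Insert neighbors of (1, 2, 3).
print(sorted(insert_neighbors((1, 2, 3))))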

The different methods for estimating the number of local optima are applied to all the instances using both neighborhoods. We consider these two neighborhoods in this section because they give rise to different situations for the estimators. As the Insert neighborhood explores more solutions at each step than the 2-exchange neighborhood, the number of local optima under the former tends to be lower than under the latter.

4.3.2  Results

In this final section, our aim is to extend the previous analysis by focusing on the accuracy of the methods and its relation to the sample size. For this purpose, we first look for the minimum sample size that allows each method to reach estimates close to the real number of local optima. A method that needs a smaller sample size to provide good estimates is considered more useful. In addition, we analyze the effect of the sample size on the methods using small as well as large sample sizes, in order to find the methods that provide better estimates in more realistic situations, in which the sample size is very small compared with the real number of local optima of the instance.

In order to study the sample sizes needed to obtain good estimates, we choose for each method the minimum sample size for which at least 95 of 100 estimates are closer than 95% to the real number of local optima of each instance under each neighborhood. The algorithm used to obtain the minimum sample sizes starts with M = 100 and doubles the value of M until it succeeds or reaches the maximum sample size considered (6,553,600 = 100 × 2^16). In case of success for a given M, a bisection procedure is applied until the difference between the last accepted sample size and the previously discarded one is 100. Thus, it converges to the desired minimum sample size. We repeat this process 10 times and report the average values.
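
The search can be sketched as follows, where meets_condition(M) is a hypothetical stand-in for the check "95 of 100 estimates with sample size M are close enough to the real number of local optima":

def minimum_sample_size(meets_condition, m_start=100, m_max=6_553_600):
    # Doubling phase: grow M until the condition holds or the cap is hit.
    m = m_start
    while m <= m_max and not meets_condition(m):
        m *= 2
    if m > m_max:
        return None                # the method never succeeded (dash in the tables)
    # Bisection phase: shrink the bracket [last discarded, first accepted]
    # until its width is 100.
    lo, hi = m // 2, m
    while hi - lo > 100:
        mid = (lo + hi) // 2
        if meets_condition(mid):
            hi = mid
        else:
            lo = mid
    return hi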

Tables 8 and 9 show the average sample sizes that the methods need when applied to the TSP and PFSP instances, respectively. In both tables, each row represents an instance and a neighborhood: the first half of each table corresponds to the Insert operator and the second half to the 2-exchange operator. Within each group, instances are arranged in ascending order according to the number of local optima (first column). In most of the instances, the MFRT method is not able to fulfill the stated condition (a dash is shown for these cases). FRT is not considered because it does not depend on the sample size and, as seen in the previous sections, it provides such bad estimates that we excluded it from this part of the study.

Table 8:
Average of the minimum sample sizes obtained for which at least 95 of 100 estimates provided by each of the methods are closer than 95% to the real number of local optima for each TSP instance under Insert and 2-exchange neighborhoods. The minimum sample size obtained for each instance is in bold.
Number of
local optima   MFRT      Sch-Cen    Jckk      Boots     Chao1984   ChaoBunge   ChaoLee1   ChaoLee2
TSP Insert
               100       100        100       100       100        100         100        100
               100       100        100       100       100        100         100        100
               100       100        100       100       100        100         100        100
               650       100        110       100       100        100         100        100
               ——        1,320      2,030     1,220     1,330      1,910       1,870      1,840
               353,940   200        250       200       200        230         200        200
12             ——        1,690      2,750     1,650     1,750      2,520       2,600      2,590
22             ——        2,740      4,330     2,720     3,600      4,030       3,580      4,000
29             ——        2,260      3,360     2,270     2,670      5,240       2,860      5,480
32             ——        2,370      3,490     2,320     2,960      3,110       2,750      2,950
TSP 2-exchange
67             ——        23,270     28,810    22,010    37,130     34,250      24,280     28,590
73             ——        522,490    483,810   515,930   647,950    1,095,310   793,700    1,009,610
90             ——        55,220     44,460    57,090    81,340     174,680     55,320     152,200
92             ——        24,980     30,450    20,020    39,110     23,770      21,740     23,250
103            ——        40,450     26,450    32,110    48,070     35,920      36,390     36,790
117            ——        179,670    132,460   173,490   251,070    228,310     165,270    177,680
188            ——        88,860     47,150    66,520    129,460    75,440      74,640     70,690
201            ——        93,970     45,850    68,870    107,460    69,160      70,890     65,720
393            ——        224,390    91,510    150,410   167,600    168,980     167,560    166,950
455            ——        275,540    89,380    161,850   189,310    189,850     199,250    193,950
Table 9:
Average of the minimum sample sizes obtained for which at least 95 of 100 estimates provided by each of the methods are closer than 95% to the real number of local optima for each FSSP instance under Insert and 2-exchange neighborhoods. The minimum sample size obtained for each instance is in bold.
Number of
local optima   MFRT   Sch-Cen     Jckk      Boots       Chao1984    ChaoBunge   ChaoLee1    ChaoLee2
FSSP Insert
14             ——     3,320       5,540     3,150       3,640       4,720       4,830       4,760
70             ——     17,920      19,380    16,520      25,220      17,680      16,390      18,130
134            ——     25,710      25,730    18,880      45,320      21,540      19,870      20,750
160            ——     47,300      28,170    34,350      63,310      39,850      37,040      35,380
190            ——     19,930      11,630    13,770      32,110      16,050      15,910      15,960
285            ——     23,820      9,840     15,260      19,310      16,620      18,110      17,240
404            ——     29,830      10,670    18,430      19,880      18,790      20,100      19,270
461            ——     51,320      17,550    31,200      35,040      34,770      36,670      33,560
506            ——     77,700      24,780    45,910      52,990      54,730      58,060      56,180
923            ——     137,950     40,850    79,530      88,750      92,460      94,730      93,780
FSSP 2-exchange
192            ——     39,410      19,540    29,300      55,320      31,750      32,290      33,600
1,643          ——     445,780     134,260   254,870     264,950     264,640     285,060     264,130
1,846          ——     628,440     199,600   363,320     338,380     323,990     366,380     338,730
1,997          ——     592,030     177,390   341,020     337,870     343,950     370,630     354,610
2,130          ——     912,200     273,490   527,160     508,690     521,420     576,430     539,570
2,382          ——     763,420     227,140   435,560     411,080     426,840     466,190     431,230
2,386          ——     613,130     179,250   353,810     346,650     357,130     384,600     363,640
5,119          ——     2,149,000   643,230   1,229,450   1,098,740   1,093,250   1,235,370   1,128,300
6,485          ——     1,671,460   456,690   927,350     863,640     875,150     994,270     900,410
8,194          ——     2,052,570   568,480   1,148,250   1,032,760   1,058,350   1,204,950   1,094,970

Looking at the overall results, one could conclude that the best methods are Jckk and Boots, because in almost all instances they need a smaller sample size to provide very good estimates. This seems to conflict with most of the results obtained in the previous sections, where Chao1984, ChaoBunge, ChaoLee1, and ChaoLee2 appeared to be the most promising. However, it agrees with the observation of the previous section that Jckk provided better estimates than the rest of the methods for sample size 10,000. Therefore, in order to obtain additional information about the performance of the methods, we study the estimates they provide when the sample sizes are small. The motivation is that, in practice, we face problems of such high dimension that they have a huge number of local optima, so the sample we can afford is usually tiny compared to the number of local optima. Hence, we are interested in methods that do not need a large sample size to provide a good estimate.

We apply the methods with small sample sizes (compared to the number of local optima) and analyze the estimates they provide. We consider only the instances with more than 100 local optima, without distinguishing between the two neighborhoods. We take sample sizes in the range 50–600 with steps of 50. MFRT and Sch-Cen are removed from the study because, for the instances with the highest number of local optima, the estimates provided by these methods are lower bounds that fall very far from the real values. Consequently, we analyze Jckk, Boots, Chao1984, ChaoBunge, ChaoLee1, and ChaoLee2.

The methods are applied 100 times for each sample size. We carry out the nonparametric Friedman test on the estimates, grouping them according to the sample size. We observe statistically significant differences between the estimates provided by the six methods for all the sample sizes. We continue with Holm's procedure, which carries out all pairwise comparisons at the chosen level of significance. Table 10 shows the best methods obtained from Friedman's test for the TSP and PFSP according to the sample sizes. For the TSP, using sample sizes below 400, the best method is ChaoLee2, with significant differences between ChaoLee2 and the rest of the methods. For sample sizes 400 and 450, the best methods are ChaoLee2 and ChaoBunge: there are no significant differences between them, but there are between them and the rest of the methods. For sample sizes larger than 450, the best method in the ranking is ChaoBunge, with significant differences between this method and the rest. For the PFSP instances, and for all sample sizes, ChaoLee2 is the best performing method, and when we study the pairwise differences with Holm's procedure, we find significant differences between ChaoLee2 and the rest of the methods.
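
The following sketch shows one common way to implement such a pipeline with SciPy. The pairwise Wilcoxon signed-rank test and the level 0.05 are our assumptions; the paper's exact post-hoc statistic and significance level are not reproduced here.

import numpy as np
from scipy import stats

def friedman_holm(estimates, names, alpha=0.05):
    # estimates: (n_runs x n_methods) NumPy array of estimates for one
    # sample size; names: the method names, one per column.
    columns = [estimates[:, j] for j in range(estimates.shape[1])]
    stat, p = stats.friedmanchisquare(*columns)
    print(f"Friedman: statistic={stat:.3f}, p={p:.4f}")
    if p >= alpha:
        return                     # no global differences detected
    # Holm's step-down procedure over all pairwise comparisons.
    pairs = [(a, b) for a in range(len(columns))
             for b in range(a + 1, len(columns))]
    pvals = [stats.wilcoxon(columns[a], columns[b]).pvalue for a, b in pairs]
    order = np.argsort(pvals)
    m = len(pvals)
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)   # alpha/m, alpha/(m-1), ...
        a, b = pairs[idx]
        verdict = "significant" if pvals[idx] <= threshold else "not significant"
        print(f"{names[a]} vs {names[b]}: p={pvals[idx]:.4f} ({verdict})")
        if pvals[idx] > threshold:
            break                  # Holm: stop at the first non-rejection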

Table 10:
Best methods obtained from Friedman's test for the TSP and PFSP according to the sample size. The average relative error of the estimates provided by these methods is also shown.
Sample size    Best method            Average relative error
TSP
50–350         ChaoLee2               22.56
400–450        ChaoLee2, ChaoBunge    18.76, 20.05
500–600        ChaoBunge              11.46
PFSP
50–600         ChaoLee2               49.95

We now study in detail the estimates obtained for the two instances with the highest numbers of local optima. Tables 11 and 12 show the average of 100 estimates provided by each method using small sample sizes (with respect to the number of local optima) for instances 9 and 10 of the PFSP, respectively, under the 2-exchange neighborhood. These tables show that when the sample size is small, Boots and Jckk provide worse estimates than the other methods. The estimates of all methods improve as the sample size grows, except for ChaoBunge: as seen in previous sections, this method is very unstable, and the estimate it provides varies markedly with the sample size. Note that although ChaoLee2, ChaoLee1, and Chao1984 provide the best estimates, these are still far from the real number of local optima.

Table 11:
Average of 100 estimates provided by each method for small sample sizes for instance 9 of PFSP when using the 2-exchange neighborhood.
Method       M=50   M=100   M=150   M=200   M=250   M=300   M=350   M=400   M=450   M=500    M=550   M=600
Instance 9 PFSP. Real number of local optima: 6,485
Jckk         92     171     241     304     363     418     469     517     563     605      646     687
Boots        64     122     173     220     265     307     344     381     416     449      481     511
Chao1984     508    683     741     756     798     835     879     919     968     1,016    1,061   1,099
ChaoBunge    254    390     508     259     425     607     600     533     1,345   53,853   519     393
ChaoLee1     576    682     799     847     876     910     950     988     1,032   1,074    1,129   1,176
ChaoLee2     789    1,009   1,221   1,293   1,290   1,292   1,341   1,378   1,441   1,493    1,584   1,664
Table 12:
Average of 100 estimates provided by each method for small sample sizes for instance 10 of PFSP when using the 2-exchange neighborhood.
Method       M=50    M=100   M=150   M=200   M=250   M=300   M=350   M=400   M=450   M=500   M=550   M=600
Instance 10 PFSP. Real number of local optima: 8,194
Jckk         95      182     262     335     405     470     532     593     648     702     754     803
Boots        66      129     185     241     292     340     386     431     473     512     551     587
Chao1984     1,018   1,050   1,048   1,077   1,150   1,194   1,240   1,278   1,334   1,367   1,405   1,445
ChaoBunge    341     476     753     1,634   5,392   1,144   5,634   1,211   1,165   1,279   4,140   540
ChaoLee1     701     1,035   1,055   1,061   1,140   1,200   1,274   1,323   1,402   1,443   1,492   1,543
ChaoLee2     745     1,195   1,195   1,247   1,396   1,492   1,633   1,718   1,858   1,915   1,992   2,080

If we analyze all the results obtained in this section, we find, on the one hand, that Jckk and Boots need a smaller sample size than the rest of the methods to provide very good estimates. On the other hand, ChaoLee2 and ChaoBunge are the best methods for small sample sizes. Thus, we suspect that there is a sample-size threshold above which the estimates provided by ChaoLee2 and ChaoBunge, or even ChaoLee1 and Chao1984, become worse than those provided by Jckk and Boots.

These suspicions motivate us to plot the estimates obtained by the different methods for the two instances with the highest numbers of local optima as the sample size grows. In Figure 5 we plot the estimates corresponding to Jckk, ChaoLee1, and ChaoLee2 to show the threshold more clearly. More detailed graphs are available online.6 In Figure 5, we consider very small as well as large sample sizes (from M = 50 to M = 200,000). We observe that, when the sample size is small, the best methods are ChaoLee2 and ChaoLee1. There is a threshold (for sample sizes between 20,000 and 60,000) beyond which Jckk gives better estimates than ChaoLee1, and for sample sizes between 80,000 and 200,000 the estimates given by Jckk are better than those provided by ChaoLee2. The reason is that, with a small growth in the sample size, the estimate provided by Jckk improves more than the estimates given by ChaoLee1 or ChaoLee2. Thus, our recommendation is to use the Jckk and Boots methods when we are able to work with large sample sizes. However, if we suspect that our sample is very small compared to the real number of local optima, the best methods to apply are ChaoLee2 and ChaoLee1.

Figure 5:

Average of 100 estimates obtained by the different methods for instance number 9 (top) and instance number 10 (bottom) of PFSP using the 2-exchange neighborhood as the sample size grows.

5  Conclusions and Future Work

In this paper, we have reviewed different methods for estimating the number of local optima of instances of combinatorial optimization problems. Our main contribution is the comparison of methods in the optimization field with some methods previously used for estimating the number of species in a population in the field of statistics.

The methods were applied to three datasets: synthetic instances, instances of the TSP with 14 and 15 cities taken at random, and instances of TSP with real distances between cities and instances of FSSP taken from the well-known Taillard's benchmark. The main conclusions observed for all the methods in the three scenarios are the following:

  1. When the attraction basins are similar in size, the methods provide estimates close to the real number of local optima. Of course, the larger the sample, the more precise the estimates.

  2. The further the sizes of the attraction basins are from uniformity, the worse the estimates. In fact, in the real instances (where the variance of the sizes of the attraction basins is very high) the predictions are quite far from the real number of local optima.

Based on the results observed through the experiments, we provide the following rules of thumb:

  • If we are able to take a sample of large size with respect to the number of local optima, we recommend using Jckk.

  • If we suspect that our sample size is small (with respect to the number of local optima), we recommend using ChaoBunge and ChaoLee2. Due to the instability observed for ChaoBunge, both methods should be executed independently. If the results provided are close, ChaoBunge should usually be chosen. Otherwise, select ChaoLee2.

  • If in analyzing the sample we realize that each (or most) of the initial solutions reaches a different local optimum, that is, r = M and (almost) every local optimum in the sample is seen exactly once, none of the previous methods can be applied. In this case, we base our estimator on the proportion of local optima over the sample (Caruana and Mullin, 1999; Grundel et al., 2007).

We consider two lines of future work. First, we plan to improve the quality of the estimates of some of the presented methods. For example, methods such as ChaoBunge or ChaoLee2 depend on a cutoff value that fixes the border between rare and abundant species. We think that this cutoff value could be tuned for each instance and sample size instead of being a fixed number. Our second line is related to the design of estimation methods specific to COPs. As we have seen, the evaluated methods provide unacceptable estimates for real instances. We conjecture that this is because, on the one hand, the methods were not explicitly designed to estimate the number of local optima but rather the number of species, and these are different problems. On the other hand, they do not use all the information at hand when estimating the number of local optima in a COP. For example, the search space is structured and could therefore be divided according to some criterion, performing estimates for each part. Also, the number of steps (solutions) traversed from the initial solution to the local optimum can provide valuable information about the relative sizes of the attraction basins.

Acknowledgments

This work was partially supported by the Saiotek, Etortek, and Research Groups 2007–2012 (IT-242-07) programs (Basque Government), the TIN2010-14931 and Consolider Ingenio 2010-CSD2007-00018 projects (Spanish Ministry of Science and Innovation), and the COMBIOMED network in computational biomedicine (Carlos III Health Institute). Leticia Hernando holds a grant from the Basque Government.

References

Albrecht, A., Lane, P., and Steinhofel, K. (2008). Combinatorial landscape analysis for k-SAT instances. In Proceedings of the IEEE World Congress on Computational Intelligence, Evolutionary Computation, 2008, CEC 2008, pp. 2498–2504.
Albrecht, A., Lane, P., and Steinhofel, K. (2010). Analysis of local search landscapes for k-SAT instances. Mathematics in Computer Science, 3(4):465–488.
Bierwirth, C., and Mattfeld, D. C. (1999). Production scheduling and rescheduling with genetic algorithms. Evolutionary Computation, 7(1):1–17.
Bunge, J., and Fitzpatrick, M. (1993). Estimating the number of species: A review. Journal of the American Statistical Association, 88(421):364–373.
Burnham, K. P., and Overton, W. S. (1978). Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika, 65(3):625–633.
Caruana, R., and Mullin, M. (1999). Estimating the number of local minima in big, nasty search spaces. Paper presented at the IJCAI-99 Workshop on Statistical Machine Learning for Large-Scale Optimization.
Ceberio, J., Irurozki, E., Mendiburu, A., and Lozano, J. A. (2012). A review on estimation of distribution algorithms in permutation-based combinatorial optimization problems. Progress in Artificial Intelligence, 1(1):103–117.
Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics, 11(4):265–270.
Chao, A., and Bunge, J. (2002). Estimating the number of species in a stochastic abundance model. Biometrics, 58(3):531–539.
Chao, A., and Lee, S.-M. (1992). Estimating the number of classes via sample coverage. Journal of the American Statistical Association, 87(417):210–217.
Chao, A., and Yang, M. C. K. (1993). Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika, 80(1):193–201.
Eremeev, A. V., and Reeves, C. R. (2002). Non-parametric estimation of properties of combinatorial landscapes. In S. Cagnoni, J. Gottlieb, E. Hart, M. Middendorf, and G. Raidl (Eds.), Applications of evolutionary computing. Lecture notes in computer science, Vol. 2279 (pp. 31–40). Berlin: Springer-Verlag.
Eremeev, A. V., and Reeves, C. R. (2003). On confidence intervals for the number of local optima. In Proceedings of EvoWorkshops 2003, pp. 224–235.
Esty, W. W. (1985). Estimation of the number of classes in a population and the coverage of a sample. Mathematical Scientists, 10:41–50.
Fonlupt, C., Robilliard, D., Preux, P., and Talbi, E.-G. (1999). Fitness landscapes and performance of meta-heuristics. In Meta-heuristics: Advances and trends in local search paradigms for optimization (pp. 257–268). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Garnier, J., and Kallel, L. (2001). How to detect all maxima of a function. In Theoretical aspects of evolutionary computing (pp. 343–370). Berlin: Springer-Verlag.
Gent, I. P., and Walsh, T. (1996). The TSP phase transition. Artificial Intelligence, 88(1–2):349–358.
Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika, 40(3–4):237–264.
Grundel, D., Krokhmal, P., Oliveira, C., and Pardalos, P. (2007). On the number of local minima for the multidimensional assignment problem. Journal of Combinatorial Optimization, 13:1–18.
Harris, B. (1959). Determining bounds on integrals with applications to cataloging problems. Annals of Mathematical Statistics, 30(2):521–548.
Hertz, A., Jaumard, B., and de Aragão, M. P. (1994). Local optima topology for the k-coloring problem. Discrete Applied Mathematics, 49(1–3):257–280.
Kirkpatrick, S., and Toulouse, G. (1985). Configuration space analysis of travelling salesman problems. Journal de Physique, 46(8):1277–1292.
Link, W. A. (2003). Nonidentifiability of population size from capture-recapture data with heterogeneous detection probabilities. Biometrics, 59(4):1123–1130.
Mattfeld, D. C., and Bierwirth, C. (1999). A search space analysis of the job shop scheduling problem. Annals of Operations Research, 86:441–453.
Reeves, C. R. (2001). Statistical properties of combinatorial landscapes: An application to scheduling problems. In MIC'2001: Proceedings of the 4th Metaheuristic International Conference, pp. 691–695.
Reeves, C. R. (2002). The "crossover landscape" and the Hamming landscape for binary search spaces. In Foundations of Genetic Algorithms 7 (pp. 81–98). San Mateo, CA: Morgan Kaufmann.
Reeves, C. R., and Aupetit-Bélaidouni, M. (2004). Estimating the number of solutions for SAT problems. In X. Yao, E. Burke, J. Lozano, J. Smith, J. Merelo-Guervós, J. Bullinaria, J. Rowe, P. Tino, A. Kabán, and H.-P. Schwefel (Eds.), Parallel problem solving from nature PPSN VIII. Lecture notes in computer science, Vol. 3242 (pp. 101–110). Berlin: Springer-Verlag.
Reeves, C. R., and Eremeev, A. V. (2004). Statistical analysis of local search landscapes. Journal of the Operational Research Society, 55(7):687–693.
Schwarz, C. J., and Seber, G. A. F. (1999). Estimating animal abundance: Review III. Statistical Science, 14(4):427–456.
Seber, G. A. F. (1986). A review of estimating animal abundance. Biometrics, 42(2):267–292.
Seber, G. A. F. (1992). A review of estimating animal abundance II. International Statistical Review, 60(2):129–166.
Taillard, E. (1990). Some efficient heuristic methods for the flow shop sequencing problem. European Journal of Operational Research, 47(1):65–74.
Tomassini, M., Vérel, S., and Ochoa, G. (2008). Complex-network analysis of combinatorial spaces: The NK landscape case. Physical Review E, 78(6):066114.
Vose, M. D., and Wright, A. H. (1995). Stability of vertex fixed points and applications. In Foundations of Genetic Algorithms 3 (pp. 103–113). San Mateo, CA: Morgan Kaufmann.
Wang, J.-P. (2010). Estimating the species richness by a Poisson-compound gamma model. Biometrika, 97(3):727–740.
Wang, J.-P., and Lindsay, B. G. (2008). An exponential partial prior for improving NPML estimation for mixtures. Statistical Methodology, 5:30–45.
Wang, J.-P. Z., and Lindsay, B. G. (2005). A penalized nonparametric maximum likelihood approach to species richness estimation. Journal of the American Statistical Association, 100(471):942–959.

Notes