Abstract

The fitness landscape of the travelling salesman problem is investigated for 11 different problem types, which differ in how the distances between cities are generated. Many different properties of the landscape are studied, all of them potentially relevant to choosing an appropriate search algorithm. The analysis includes a scaling study of the time to reach a local optimum, the number of local optima, the expected probability of reaching a local optimum as a function of its fitness, the expected fitness found by local search and the best fitness, the probability of reaching a global optimum, the distance between the local optima and the global optimum, the expected fitness as a function of the distance from an optimum, the basins of attraction of the optima, and a principal component analysis of the local optima. The principal component analysis shows the correlation of the local optima in the component space. We show how the properties of the principal components of the local optima change from one problem type to another.

1  Introduction

The travelling salesman problem (TSP) is probably the most famous and best studied combinatorial optimisation problem. In this paper we describe a detailed analysis of the properties of the fitness landscape for 11 different types of problems, ranging from random problems to those motivated by real-world constraints. The problems are artificial in that they are all generated by an instance generator; however, many of them share the constraints that would be expected to arise in real problems. For example, we consider instances that could plausibly occur in a drilling problem, or instances arising from a scheduling problem.

This paper follows two previous papers, where we investigated the fitness landscape properties of the maximum satisfiability (MAX-SAT) problem (Prügel-Bennett and Tayarani, 2011) and graph colouring (Tayarani and Prügel-Bennett, 2015). This current paper expands on those previous papers, not only in looking at a different problem but in studying how properties vary over different types of problems. This is intended to be a self-contained paper and not a comparison with MAX-SAT or graph colouring; however, the previous papers informed the present study about what properties are interesting.

“The relation between the Fourier expansion of a landscape and the computational complexity of the optimization problem on the landscape is of great importance for devising practical optimization heuristics,” Stadler (1996) wrote. “Disappointingly, there seems to be no simple relationship.” We have seen little evidence that it is possible to characterise the complexity of a problem in just a few measures. It is our view that combinatorial optimisation is a tough field requiring an understanding of many different aspects of a problem. This paper examines a large number of properties, as it is our belief that all these properties could be relevant in the design and choice of heuristic search algorithms. Alas, this does not lead to short, punchy papers but requires us to perform large empirical studies. This paper provides an illuminating insight into the structure of the fitness landscape and shows which properties are universal and which depend critically on the details of the instances.

We introduce a new method of analysing the fitness landscape. That is, we use principal component analysis (PCA) to study the structure of the local optima. The motivation for this is that many different local optima often have parts of a tour in common. Thus, the set of local optima does not necessarily span the whole search space but is confined to a low-dimensional subspace. PCA extracts this information. To the best of our knowledge, this is the first time PCA has been used to study the structure of the fitness landscape in this way.

1.1  Previous Landscape Analysis

The analysis of fitness landscapes goes back to the seminal work of Sewall Wright (1932) and has attracted considerable attention. Examples of measures that attempt to capture the ruggedness of the landscape include the autocorrelation (Weinberger, 1990; Angel and Zissimopoulos, 1998) and the fitness distance correlation (Jones and Forrest, 1995; Merz and Freisleben, 2000). But it was soon realised that these measures do not necessarily capture the hardness of a problem, when a counterexample was proposed: an easy problem that showed no relation between the fitness of solutions and their distance to the global optimum (Altenberg, 1997). The question of what makes a fitness landscape hard continues to be investigated (Forrest and Mitchell, 1993; Jones and Forrest, 1995), and new methods of capturing the hardness of problems have been proposed, such as algebraic properties of the solutions in the landscape (Grover, 1992; Stadler, 1995), modality (the number of local optima) (Horn and Goldberg, 1995), and the fractal dimension of the fitness landscape (Hoshino et al., 1998). Some researchers try to explain when a problem becomes hard by studying the area in the landscape called the Olympus, in which the better local optima are located (Verel et al., 2007, 2008). The fitness cloud is another method proposed to visualise the fitness landscape and to represent some of its properties (Collard et al., 2007; Lu et al., 2011; Vanneschi et al., 2007), reflecting problem hardness. Problems can be difficult for many different reasons. The aim of this paper and its sister papers (Prügel-Bennett and Tayarani, 2011; Tayarani and Prügel-Bennett, 2015) is to understand why real-world problems, exemplified by some of the classic combinatorial optimisation problems, become hard. We have not sought a single measure of hardness (we believe there can be many reasons for problems to be hard) but rather attempted to provide a full picture of the fitness landscape.

Explaining why and how problems become hard has not been the only focus of research in landscape analysis. Some researchers use landscape analysis to study parameters of evolutionary algorithms, such as the population size (Alander, 1999), operators like mutation and crossover (Mathias and Whitley, 1992; Suzuki and Iwasa, 1997), recombination operators (Hornby, 1996), or the perturbation operator (Martin et al., 1999). Another line of work studies the role of the representation of solutions in shaping the landscape, with the aim of finding the best representation for each problem (Tavares et al., 2006; Riley and Ciesielski, 2010). Some researchers have used landscape analysis to explain why particular algorithms, like local search algorithms (Fonlupt et al., 1997), memetic algorithms (hybrid genetic algorithms) (Merz, 2004), or metaheuristics based on local search (Watson, 2010), work better on particular landscapes. Other works have tried to categorise problems based on their landscape properties (Merz and Freisleben, 2000).

Several papers exploit the concept of the landscape and landscape analysis in developing new algorithms for different problems. Fitness landscape analysis has been used to propose memetic algorithms for graph bi-partitioning (Merz and Freisleben, 1998), resource allocation (Huang et al., 2009), and maximum satisfiability (Zhang et al., 2003; Zhang, 2004), and to improve the performance of evolutionary algorithms through landscape approximation (Ratle, 1998; Pošík and Franc, 2007; Shen and He, 2010). In more recent work, landscape analysis has been used to propose a new population-based algorithm (Qasem and Prügel-Bennett, 2010). The landscapes of many other problems have been studied, including those of graph drawing (Lehn and Kuntz, 2001), graph colouring (Hertz et al., 1994; Bouziri et al., 2011; Tayarani and Prügel-Bennett, 2015), antenna design (Alander et al., 2002), maximum satisfiability (Prügel-Bennett and Tayarani, 2011), flow-shop scheduling (Czogalla and Fink, 2012), and Bayesian network structure (Wu et al., 2011).

The landscape of the TSP has also been studied in a number of papers. In the first major effort at understanding the landscape of TSP, a big valley structure was observed (Stadler and Schnabl, 1992). A similar big valley structure was subsequently observed in other TSP instances (Boese, 1995), in graph partitioning (Wiles and Tonkes, 2006), and in graph bisection (Boese et al., 1994). The most recent work on the big valley hypothesis studies different problems of different sizes using the Concorde algorithm (Hains et al., 2011). Our paper expands on this previous work in two directions. First, we examine considerably more properties, particularly those related to the structure of the local optima. Second, we study how these properties differ depending on the structure of the distance matrix.

This paper is organised as follows. In the next section we introduce the travelling salesman problem and describe the local search algorithm we use to find the optima in the landscape. Section 3 describes some statistical properties of random solutions, the autocorrelation, and some properties of global and local optima, including the number of steps a local search algorithm takes to reach a local optimum, the number of local and global optima, and the distance between optima. We also examine the expected fitness of configurations in Hamming spheres of different radii around a local optimum, and the probability of returning to a local optimum starting from a randomly chosen configuration in the Hamming sphere. Principal component analysis of the local optima is presented in Section 4. Sections 5 and 6 conclude the paper.

2  Travelling Salesman Problem

In the travelling salesman problem one is given a list of cities and the distances between every pair of them. The task is to find the shortest tour that visits each city exactly once and then returns to the start. Despite its deceptively simple description, the problem is surprisingly difficult to solve. In general it is NP-hard, although for some problem types such as Euclidean TSP, there is a polynomial time approximation scheme (Arora, 1998). In this paper, we investigate 11 different types of instances, many inspired by real-world problems. Our chosen categories have been studied in other papers (Cirasella et al., 2001; Rardin et al., 1993). The problems differ in how the distance matrix is generated.

2.1  Problem Types

There are different ways of representing the TSP mathematically. One method is in terms of finding a bijection $\pi$ that maps each city to its successor city $\pi(i)$, and which minimises the total tour distance
\[
c(\pi) \;=\; \sum_{i=1}^{n} \sum_{j=1}^{n} M_{ij}\,\big[\pi(i) = j\big] \;=\; \sum_{i=1}^{n} M_{i,\pi(i)},
\]
where $M_{ij}$ is the distance between cities i and j and $[\cdot]$ is an indicator function. For a legal tour we require $\pi$ to be a cyclic permutation consisting of a single cycle through all n cities. We consider 11 different problem types, which differ in the construction of the distance matrix $M$. The only common assumption is that the distances, $M_{ij}$, are non-negative.

2.1.1  Random Asymmetric Matrices

In this problem the distance between the cities is chosen randomly,
\[
M_{ij} = \mathrm{Rand}(A, B),
\tag{1}
\]
where $\mathrm{Rand}(A, B)$ is a uniform random number generator that generates integer deviates, $r$, with $A \le r \le B$. Although this set of problems does not correspond to real-world problems, many algorithm designers use this type as a challenge for their algorithms. We refer to this set of problems as the Random problems.
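As an illustration, a minimal generator for this problem type might look as follows (a sketch; the bounds low and high are placeholder values, not necessarily those used in our experiments):

```python
import numpy as np

def random_asymmetric(n, low=1, high=1000, seed=None):
    """Random asymmetric TSP matrix: each off-diagonal entry is an
    independent integer deviate Rand(low, high), as in Eq. (1)."""
    rng = np.random.default_rng(seed)
    M = rng.integers(low, high + 1, size=(n, n))
    np.fill_diagonal(M, 0)  # a city is at distance 0 from itself
    return M
```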

2.1.2  Random Asymmetric Matrices Closed under Shortest Paths

The previous set of problems lacks any correlation between the distances, making it an unrealistic problem type. Taking the random problems generated by the previous method and closing them under shortest-path computation gives a more realistic set of problems. A matrix M is closed under shortest paths if and only if
\[
\forall\, i, j, k:\quad M_{ij} \le M_{ik} + M_{kj}.
\tag{2}
\]
That is, the distances satisfy the triangle inequality. Closing a random matrix can easily be performed by applying the following operator to all the values in the matrix,
\[
M_{ij} \leftarrow \min\!\big( M_{ij},\; M_{ik} + M_{kj} \big).
\tag{3}
\]
This process is applied until all the values in the matrix satisfy Eq. (2). We refer to this set of problems as the Random C problems.
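One way to implement this closure is a Floyd–Warshall-style relaxation, which applies Eq. (3) systematically; sweeping the intermediate city k in the outer loop guarantees Eq. (2) holds after a single pass (a sketch):

```python
import numpy as np

def close_under_shortest_paths(M):
    """Apply Eq. (3) until Eq. (2) holds everywhere. With the
    intermediate city k in the outer loop this is the Floyd-Warshall
    algorithm, so one pass over k suffices."""
    M = M.copy()
    n = M.shape[0]
    for k in range(n):
        # Relax every pair (i, j) through the intermediate city k.
        M = np.minimum(M, M[:, k:k+1] + M[k:k+1, :])
    return M
```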

2.1.3  Random Symmetric Matrices

This class of problems is like the Random problems, except that the problem matrix is symmetric,
\[
M_{ij} = M_{ji}.
\tag{4}
\]
We refer to this set of problems as the Random S problems.

2.1.4  Random Symmetric Matrices Closed under Shortest Paths

In this set of problems, the matrices are both closed and symmetric. We refer to this set of problems as the Random SC problems. In this case, the distances between cities form a proper metric.

2.1.5  Random Two-Dimensional Rectilinear Problems

In this set of problems, the cities are uniformly distributed in a square, and the distance between the cities is computed using the rectilinear metric
\[
M_{ij} = |x_i - x_j| + |y_i - y_j|,
\tag{5}
\]
where $(x_i, y_i)$ are the coordinates of the ith city. We refer to this set of problems as the Rectilinear problems.

2.1.6  Tilted Drilling Machine Instances with Additive Norm

This set of problems is motivated by considering the cost of driving an idealised drilling machine. The task is to drill a collection of holes on a tilted surface, where the drill is moved by two motors. The first motor moves the drill in the x-coordinate, and the second motor moves it in the y-coordinate. Because of the drill's weight, the second motor needs less energy when moving the drill down than when moving it up. The problem generator places the holes uniformly in a square and has three parameters: $u_x$, a coefficient giving the energy the first motor needs to move the drill one unit in the x direction; $u_y^{\downarrow}$, a coefficient giving the energy needed to move the drill one unit down; and $u_y^{\uparrow}$, the energy needed to move it one unit upward. So the problem matrix is generated according to
\[
M_{ij} = u_x\,|x_i - x_j| +
\begin{cases}
u_y^{\uparrow}\,(y_j - y_i) & \text{if } y_j \ge y_i, \\
u_y^{\downarrow}\,(y_i - y_j) & \text{otherwise.}
\end{cases}
\tag{6}
\]
We consider $u_x = 1$, with the downward coefficient $u_y^{\downarrow}$ set to zero (moving the drill down costs no energy; see Section 3.6) and a fixed positive upward coefficient $u_y^{\uparrow}$. We refer to this set of problems as the Additive Drilling problems.

2.1.7  Tilted Drilling Machine Instances with Sup Norm

For many drilling machines, the cost of drilling depends on the maximum time the drill moves in either the x or y direction rather than their sum. For this problem type the holes are placed just as in the previous problem, but the distances are specified as follows,
\[
M_{ij} = \max\!\big( u_x\,|x_i - x_j|,\; u_y\,|y_i - y_j| \big),
\tag{7}
\]
where $u_y$ takes the value $u_y^{\uparrow}$ or $u_y^{\downarrow}$ depending on whether the drill moves up or down. For this problem type we consider $u_x = 2$, with the two vertical coefficients set to 4 and 1. The points are placed randomly and uniformly in a square and take integer coordinates. This problem type is referred to as the Sup Drilling problems.

2.1.8  Random Euclidean Stacker Crane Problem

We consider a crane that has to move a set of n objects. Each object is at a (source) location $s_i$ and has to be moved to a destination location $t_i$. The problem is to choose the order of the tasks so as to minimise the total Euclidean distance between the destination location of one task and the source location of the next. That is, we take the distance to be
\[
M_{ij} = d(t_i, s_j),
\]
where $d$ denotes the Euclidean distance. The sources are picked uniformly from a square,
\[
s_i = \big( \mathrm{Rand}(0, A),\; \mathrm{Rand}(0, A) \big),
\tag{8}
\]
while the destinations are chosen to be close to the sources,
\[
t_i = s_i + \big( \mathrm{Rand}(-B_n, B_n),\; \mathrm{Rand}(-B_n, B_n) \big),
\tag{9}
\]
where the half-width $B_n$ of the offset shrinks with n, the number of source-destination pairs (problem size). We denote this problem by Crane. In Figure 1 we show an example of a possible tour made by the crane.
Figure 1:

Example of the crane problem for five cities. The dashed lines show one possible tour around the cities.
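A sketch of such a generator is given below; the square size A and offset width B are illustrative placeholders rather than the exact values used to generate our instances:

```python
import numpy as np

def crane_instance(n, A=1000.0, B=100.0, seed=None):
    """Stacker crane instance: M[i, j] is the Euclidean distance from
    the destination of task i to the source of task j (Eqs. 8-9).
    A and B are illustrative; B controls how close destinations lie
    to their sources."""
    rng = np.random.default_rng(seed)
    sources = rng.uniform(0.0, A, size=(n, 2))          # Eq. (8)
    dests = sources + rng.uniform(-B, B, size=(n, 2))   # Eq. (9)
    # M[i, j] = d(t_i, s_j)
    diff = dests[:, None, :] - sources[None, :, :]
    return np.linalg.norm(diff, axis=2)
```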

2.1.9  Disk Drive Problem

These instances are generated by an idealised model of scheduling the read head of a computer disk. The task is to extract n records from a disk. This problem is similar to the stacker crane problem in that each record to be read has a start position and an end position on its track. Positions are represented in polar coordinates, with the first coordinate representing the radius and the second coordinate the angle. The destination is at the same radius as the source. This is illustrated in Figure 2, which shows an example of five records stored on a disk drive. The read head has to move from the destination of the current record to the source of the next record. We assume that changing the position of the read head (its radius) is 10 times slower than disk rotation.

Figure 2:

Example of the disk problem with five records.

The source points are generated using Eq. (8), while the corresponding destination is generated using
\[
t_i = \big( s_{i,1},\; (s_{i,2} + \mathrm{Rand}(0, A)) \bmod A \big),
\]
where $s_{i,1}$ is the first coordinate (the radius) of the source and A is the angular period. This models the situation where the record lies on the same track as the source, with circular boundary conditions in the second coordinate. Furthermore, we assume that the head is moved in the first coordinate but must wait until the spinning disk reaches the correct second coordinate. The head motion is assumed to be 10 times slower than the disk motion. The cost of a tour is taken to be the total time to access all the records and return to the starting position.

Although this is clearly idealised (for example, no account is taken of the acceleration and deceleration of the read head), it nevertheless captures much of the structure that a real disk problem would have. We refer to this problem as Disk Drive.

2.1.10  Euclidean Problem

This is the well-known Euclidean TSP, a description of which can be found in Cirasella et al. (2001). To generate instances, we place the cities uniformly at random in a square, and the distance between the cities is the Euclidean distance between them. We refer to this problem type as Euclidean.

2.1.11  No-Wait Job Scheduling

In this problem we assume that we have k processors and a set of n tasks, each of which must use the k processors in order, with no waiting time allowed between ending one subtask and starting the next. The no-waiting condition would be realistic if, for example, the processing required the material to be kept hot, or alternatively, if there were no storage space between two consecutive processors. We denote the time taken to complete the lth process for the ith task by $T_{i,l}$. Thus each task can be represented by a list of subtask times $(T_{i,1}, \ldots, T_{i,k})$. What makes the scheduling problem nontrivial is that one can start the next task before the previous task is finished, provided that none of the subtasks has to wait. Thus the distance between tasks is the wait time
\[
M_{ij} = \max_{1 \le l \le k} \left( \sum_{m=1}^{l} T_{i,m} - \sum_{m=1}^{l-1} T_{j,m} \right).
\]
We illustrate this wait time in Figure 3. We consider a fixed number of processors k, and the subtask times are generated by a uniform random number generator between 1 and 1,000.
Figure 3:

Example of the wait time between processes i and j for a problem consisting of four tasks. If a smaller waiting time, $M_{ij}$, were used, then task 3 in process j would have to wait for processor 3 to complete task 3 for process i, but this is not allowed.
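Under the definition above, the wait time can be computed with a single pass over the processors; a minimal sketch:

```python
def wait_time(T_i, T_j):
    """Minimum start-to-start delay M[i][j] between task i and task j
    so that task j never waits between consecutive processors.
    T_i, T_j are lists of subtask times, one entry per processor."""
    delay = 0
    done_i = 0   # time task i leaves processor l (relative to its own start)
    enter_j = 0  # time task j enters processor l (relative to its own start)
    for l in range(len(T_i)):
        done_i += T_i[l]
        # task j may only enter processor l after task i has left it
        delay = max(delay, done_i - enter_j)
        enter_j += T_j[l]
    return delay
```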

2.2  Local Search Operator

To define the landscape of a problem requires some definition of the neighbourhood of configurations. The neighbourhood is usually defined in terms of the configurations that can be reached by some local search operator. We consider k-opt moves, the move set most commonly used by heuristic algorithms for the TSP. A k-opt move consists of dividing the tour into k segments and then recombining the segments in such a way as to obtain a legal tour. The simplest k-opt move is 2-opt. For most of the analysis we carried out, we found 2-opt impractical, for two reasons. First, for tours of moderate size (n around 50) we found that the number of local optima was so large that we were unable to store them. Second, a 2-opt move reverses one segment of the tour. For asymmetric problems this disrupts a large number of edge distances, so 2-opt cannot be viewed as a minimal move for this set of problems. Thus, we focus on 3-opt moves, which both substantially reduce the number of local optima and include moves that change the tour without changing the direction in which any segment is traversed. We also considered using 4-opt; however, its neighbourhood is so large that it became computationally very expensive to ensure that a local optimum had been reached.

Note that there is a difference in the interpretation of distance between symmetric and asymmetric problems. In symmetric problems the edges (i, j) and (j, i) are treated as the same, while they are treated as different edges for asymmetric problems. If we swap three edges in a 3-opt move, this corresponds to making a move of size 3 in Hamming distance for symmetric problems. For asymmetric problems the distance also depends on whether any of the segments of the tour were reversed: reversing a segment would, on average, change a number of edges proportional to n.

Although it is convenient to consider k-opt neighbourhoods, it is often easier to use the Hamming distance in the space of edges rather than the minimum number of k-opt moves needed to go from one configuration to another. That is, we can represent a tour as a binary vector in the space of edges, with 1 denoting the existence of an edge in the tour. For symmetric problems there are $n(n-1)/2$ possible edges, while for asymmetric problems there are $n(n-1)$. Each tour consists of exactly n nonzero components. The Hamming distance counts the number of edges that differ between two tours.
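In code, this edge-set representation and the resulting distance might look as follows (a sketch; tours are given as sequences of city indices):

```python
def tour_edges(tour, symmetric=True):
    """Edge set of a closed tour given as a sequence of city indices."""
    n = len(tour)
    edges = set()
    for a in range(n):
        i, j = tour[a], tour[(a + 1) % n]
        # symmetric problems identify (i, j) with (j, i)
        edges.add(frozenset((i, j)) if symmetric else (i, j))
    return edges

def hamming_distance(tour_a, tour_b, symmetric=True):
    """Number of edges present in one tour but not in the other."""
    diff = tour_edges(tour_a, symmetric) ^ tour_edges(tour_b, symmetric)
    return len(diff) // 2
```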

2.3  Exhaustive Local Search Algorithm

One way of studying the topology of the local optima is to check every solution in the landscape and find every local optimum. This strategy is feasible for problem sizes up to 20 or 30. However, as the search space of the TSP grows as the factorial of the system size, it is impossible to do such an experiment on larger problems. In order to study the local optima for larger problem sizes, we use a local search algorithm that exhaustively explores the neighbours of the current solution, and any neighbouring solutions at the same cost, to find an improving move. Tayarani and Prügel-Bennett (2015) give a comprehensive description of this exhaustive local search algorithm, which guarantees that the solution it returns is a local optimum and returns a unique ID for each local optimum. The algorithm starts from a random configuration and applies 3-opt local search until a local optimum is reached; it then returns a unique ID for the local optimum it has found.
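The following sketch shows the overall shape of such a descent. For brevity it uses only the segment-relocation subset of 3-opt moves (which changes three edges without reversing any segment) and omits the plateau exploration and optimum-identification machinery described by Tayarani and Prügel-Bennett (2015):

```python
def tour_cost(M, tour):
    """Total length of a closed tour under distance matrix M."""
    n = len(tour)
    return sum(M[tour[a]][tour[(a + 1) % n]] for a in range(n))

def relocate(tour, i, length, p):
    """Cut out the segment of `length` cities starting at position i
    (cyclically) and reinsert it after position p of the remainder."""
    n = len(tour)
    seg = [tour[(i + a) % n] for a in range(length)]
    rest = [tour[(i + length + a) % n] for a in range(n - length)]
    return rest[:p + 1] + seg + rest[p + 1:]

def local_search(M, tour):
    """Greedy descent over segment-relocation moves, a reversal-free
    subset of 3-opt. Returns a tour no relocation can improve."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        cost = tour_cost(M, tour)
        for i in range(n):
            for length in (1, 2, 3):   # short segments, for brevity
                for p in range(n - length - 1):
                    cand = relocate(tour, i, length, p)
                    c = tour_cost(M, cand)
                    if c < cost:
                        tour, cost, improved = cand, c, True
    return tour
```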

3  Landscape Analysis

This section explores many properties of the landscape of TSP, and in particular it focuses on how these properties scale as the problem size increases.

3.1  Density of States

We start our analysis by studying the statistical properties of randomly drawn solutions in the landscape. In our experience these properties do not correlate well with problem hardness, but they provide information on the probability of finding an improving neighbour at a given cost level. This might provide useful information, for example, in deciding the size of the neighbourhood in local search.

The density of states is the number of configurations at each cost level. The average cost of a solution in TSP is n times the average distance between cities. By randomly sampling solutions and computing the histogram of their costs, we can measure the spread of costs around the mean. Figure 4 shows the natural logarithm of the histogram of costs around the average cost, suitably scaled with the problem size, for n = 100, 500, and 1,000 for the Euclidean problem, where n is the size of the problem. The results are similar for every randomly drawn problem instance. The cost of a random solution is approximately normally distributed around the average cost, with a variance that grows approximately as n. We did the same experiment on all 11 problem types, in each case averaging over 50 different problem instances, and observed very similar behaviour for all the problem types.

Figure 4:

Natural logarithm of the histogram of costs for random solutions of particular instances of the Euclidean problem for n = 100, 500, and 1,000.
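A density-of-states estimate of this kind can be obtained by uniform sampling of random tours, for example (a sketch; the sample and bin counts are illustrative):

```python
import numpy as np

def sample_costs(M, samples=100000, seed=None):
    """Costs of uniformly random tours under distance matrix M."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    costs = np.empty(samples)
    for s in range(samples):
        tour = rng.permutation(n)
        # pair each city with its successor (wrapping at the end)
        costs[s] = M[tour, np.roll(tour, -1)].sum()
    return costs

# density of states: histogram of the sampled costs, e.g.
# counts, edges = np.histogram(sample_costs(M), bins=100)
```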

3.1.1  Statistical Measures

The density of states for all problems is a bell-shaped curve resembling a normal distribution. To quantify the difference between the problems, we can measure the cumulants, $\kappa_k$. The first cumulant is the mean and the second cumulant is the variance, while the higher cumulants provide a natural measure of the deviation from normality. The third and fourth cumulants can also be expressed in terms of the central moments
\[
\mu_k = \mathbb{E}\big[ (c - \mathbb{E}[c])^k \big],
\]
where c is the cost of the tour, and the expectation is over all instances of the problem. The third cumulant is equal to the third central moment, $\kappa_3 = \mu_3$, while the fourth cumulant is equal to the fourth central moment minus three times the variance squared, $\kappa_4 = \mu_4 - 3\kappa_2^2$. For a normal distribution the third and fourth cumulants are zero. The higher-order cumulants are invariant to shifting the costs by an additive constant but are not invariant to the scale of the costs. A more useful set of measures for understanding how much a distribution differs from a normal distribution is the scaled cumulants, where we divide the kth cumulant by $\kappa_2^{k/2}$. The scaled cumulants are then invariant to an affine transformation of the costs (for $k \ge 3$). The third scaled cumulant is known as the skewness, $\gamma_1 = \kappa_3/\kappa_2^{3/2}$, while the fourth scaled cumulant is known as the kurtosis, $\gamma_2 = \kappa_4/\kappa_2^{2}$.
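These scaled cumulants are straightforward to estimate from sampled costs; a minimal sketch, usable with the sampling routine above:

```python
import numpy as np

def scaled_cumulants(costs):
    """Skewness and (excess) kurtosis of a sample of tour costs."""
    c = np.asarray(costs, dtype=float)
    mu = c.mean()
    k2 = c.var()                                # second cumulant (variance)
    k3 = ((c - mu) ** 3).mean()                 # third cumulant = mu_3
    k4 = ((c - mu) ** 4).mean() - 3 * k2 ** 2   # fourth cumulant
    return k3 / k2 ** 1.5, k4 / k2 ** 2
```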
In principle we could compute the cumulants ab initio; we did this for graph colouring in a previous paper (Tayarani and Prügel-Bennett, 2015). For example, the mean cost of a random tour is
\[
\mathbb{E}[c] = n\,\bar{M},
\]
where $\bar{M}$ is the mean of the distance values; a random tour consists of n edges, each of which is, to leading order, an independently drawn distance. We can thus compute the mean length of a random tour averaged over all instances by computing the average distance for the 11 problems. For example, for a 2D Euclidean tour with cities randomly placed in a unit square,
\[
\mathbb{E}[c] \approx 0.5214\,n,
\]
since the mean distance between two uniformly random points in the unit square is approximately 0.5214. Such calculations, however, are not very informative. Furthermore, for higher cumulants they rapidly become very complicated. We therefore do not pursue these calculations but rather show empirically obtained results.

In TSP, as the size of the problem grows, the skewness and kurtosis of the cost distribution of random solutions change. Figure 5 shows the relation between the skewness and kurtosis of the distribution of costs for the Euclidean problem for different problem sizes. For small problems the skewness and kurtosis are negative. The negative skewness means that for small problems the bulk of the costs lies to the right of the mean, and the negative kurtosis means that the distribution of costs has lighter tails than a normal distribution. The only problem type with positive skewness is the random problem class. For random symmetric problems the skewness is around zero for all problem sizes.

Figure 5:

The skewness and kurtosis of the distribution of costs for Euclidean problem instances.

The skewness versus the kurtosis of the distribution of costs for the different problem types for n = 10 and n = 1,000 is shown in Figure 6. The results are averaged over 50 different problem instances, with many samples from each. It is clear that for all the problem types the distribution of costs converges toward a normal distribution as the system size increases. This might seem an inevitable consequence of the central limit theorem, but it is not; for example, in graph colouring the skewness and kurtosis remain nonzero as n grows (Tayarani and Prügel-Bennett, 2015).

Figure 6:

The kurtosis of randomly constructed solutions versus skewness for different problem types with different sizes. The smaller symbols grouped at the upper right-hand corner of the figure represent the problems with n = 1,000, and the larger symbols in the rest of the figure represent the problems with n = 10. The results are computed by sampling random configurations for 50 different problem instances.

3.2  Autocorrelation

One way of measuring the ruggedness of the fitness landscape is to compute the autocorrelation of costs along a random walk through the search space (Weinberger, 1990). To compute this, we use a random walk algorithm, starting from a random point and moving to a random neighbour at each step. The autocorrelation of the landscape is given by
\[
\rho(\Delta t) = \frac{\mathbb{E}\big[ (c_t - \bar{c})\,(c_{t + \Delta t} - \bar{c}) \big]}{\sigma^2},
\tag{10}
\]
where $c_t$ is the cost at step t, $\bar{c}$ is the mean cost, and $\sigma^2$ is the variance in the cost for random solutions. We computed the autocorrelation for different sizes of different types of problems. The curves representing the autocorrelation for particular instances of the Euclidean problem for n = 100, 200, and 1,000 for three different walks, 2-opt, 3-opt, and 4-opt, are shown in Figure 7. Note that for large values of $\Delta t$ the estimated value of $\rho(\Delta t)$ is unreliable because of statistical fluctuations, even though we used a large number of steps in our estimate. Under the rescaling used in this figure, the graphs for all the problems are very similar. The autocorrelation function appears to fall off approximately exponentially as
\[
\rho(\Delta t) \approx \mathrm{e}^{-\Delta t / l},
\]
where l is known as the correlation length (Stadler, 1996). The gradient of the curves for 2-opt in Figure 7 is about −1.94; therefore for 2-opt, $l \approx n/1.94$. The correlation length is considered a measure of the landscape ruggedness: the smaller l, the more rugged the landscape. As n increases, the ruggedness decreases; that is, if we compare an instance of size n with one of size 2n, we get roughly the same degree of correlation by making two random steps on the instance of size 2n as we would for a single step on the instance of size n. The correlation length shows that the landscape of TSP is relatively smooth, with long-range correlation. It is also clear that the correlation between the solutions under k-opt is weaker than that under (k−1)-opt. This is because as we increase k, we are making larger steps and thus decorrelating faster.
Figure 7:

Natural logarithm of the autocorrelation for three instances of size n = 100, 200, and 1,000 for the Euclidean problem, plotted against the time difference $\Delta t$.
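A sketch of the estimator follows, using a random segment relocation (a reversal-free 3-opt move) as the walk step; the step and lag counts are illustrative:

```python
import numpy as np

def autocorrelation(M, steps=100000, max_lag=200, seed=None):
    """Estimate rho(dt) of Eq. (10) along a random walk whose step is
    a random segment relocation (a reversal-free 3-opt move)."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    tour = list(rng.permutation(n))
    costs = np.empty(steps)
    for t in range(steps):
        costs[t] = sum(M[tour[a]][tour[(a + 1) % n]] for a in range(n))
        # random move: cut out a short segment and reinsert it elsewhere
        i, length = int(rng.integers(n)), int(rng.integers(1, 4))
        seg = [tour[(i + a) % n] for a in range(length)]
        rest = [tour[(i + length + a) % n] for a in range(n - length)]
        p = int(rng.integers(n - length))
        tour = rest[:p + 1] + seg + rest[p + 1:]
    c = costs - costs.mean()
    var = (c ** 2).mean()
    return np.array([(c[:steps - dt] * c[dt:]).mean() / var
                     for dt in range(max_lag)])
```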

Figure 8 shows the autocorrelation function for all 11 problem types, using a 3-opt walk. We observe that the problems split into two major groups. For the random symmetric, random symmetric closed, rectilinear, additive drilling, crane, disk drive, and Euclidean problems, the autocorrelation is a straight line, while for the random, random closed, sup drilling, and job scheduling problems, the curves have a distinct kink at the beginning. The kink appears at the very first step.

Figure 8:

Natural logarithm of the autocorrelation for the different problem types, plotted against the time difference $\Delta t$. This is for the 3-opt operator.

The kink occurs in those problems with asymmetric weights. The explanation of the kink is that a random 3-opt move will typically reverse the direction in which part of the tour is traversed. For asymmetric problems this causes a rapid decrease in the correlation. However, later moves can restore the original direction in which these cities are visited, thus restoring the correlation.

We can measure the degree of asymmetry of a problem using the normalised measure
\[
\mathrm{asym} = \frac{ \mathbb{E}_{i \ne j}\big[\, |M_{ij} - M_{ji}| \,\big] }{ \mathbb{E}_{i \ne j}\big[\, M_{ij} \,\big] }.
\tag{11}
\]
In Figure 9 we show the correlation length measured over the first steps of the walk against the degree of asymmetry. We note that there are some problems, notably the additive drilling problem and the sup drilling problem, that have a reasonably high asymmetry but do not show a marked decrease in the correlation produced by a single step. The reason for this is that they contain a hidden symmetry. If one reverses a segment of a tour for these problems, the change in cost depends only on the difference in height between the starting and end points of the segment. As a consequence, these tours behave much more like symmetric tours than the degree of asymmetry would indicate.
Figure 9:

The correlation length over the first steps of the walk for different problem types, plotted against the asymmetry of the distance matrix.
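Computed directly from the distance matrix, this measure (in the normalised form of Eq. (11), as reconstructed above) is a one-liner:

```python
import numpy as np

def asymmetry(M):
    """Normalised asymmetry of a distance matrix (Eq. 11): mean
    |M_ij - M_ji| over off-diagonal pairs, divided by the mean distance."""
    off = ~np.eye(M.shape[0], dtype=bool)  # mask out the diagonal
    return np.abs(M - M.T)[off].mean() / M[off].mean()
```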

3.3  Time to Local Optimum

In the study of the landscape of TSP, we focus mainly on the properties of local and global optima. Our analysis begins with the number of steps taken by the local search algorithm to reach a local optimum. We count the number of 3-opt moves performed on the starting configuration until a local optimum is reached. Note that for symmetric problems the 3-opt operator changes three edges at each step, while for asymmetric problems many more edges may change. An exhaustive local search is used to verify that the algorithm has indeed reached a local optimum.

Figure 10 shows the mean time to reach a local optimum versus n for all the problem types. It is clear that for all the problem types the time to reach a local optimum increases approximately linearly with the system size (with the possible exception of job scheduling, which seems to grow slightly faster). From the point of view of time to local optima, the problem therefore becomes only linearly harder as the system size grows. But of course it is not only the time that matters; the number of local optima is another important property that determines how hard the problem is. We discuss this in the next sections.

Figure 10:

Time to local optima versus system size for the different problem types.

Although the average time to reach a local optimum is quite similar for the different problem types, the distribution of runtimes differs markedly between them. This is illustrated in Figure 11, where we show the histogram of runtimes for the sup drilling problem and the Euclidean problem. We have plotted the log-frequency for the sup drilling problem to show the rare events. Although the average number of steps is approximately the same for the two problem types (34.73 for sup drilling and 31.67 for Euclidean), the sup drilling problem shows a long tail to the right of the mean, while the Euclidean problem shows a more symmetric shape with no long tail. The reason for this difference lies in the values of the distance matrices. For the Euclidean problem the distances between the cities are real numbers, so even if truncated, it is exponentially rare for two edges to have the same length. As a result, almost all solutions in the search space have distinct costs, the landscape has essentially no plateaus, and starting from a random solution the local search algorithm descends directly to a local optimum. For the sup drilling problem the edge weights take integer values, so there can be many different tours of the same length. This produces plateau regions that have to be explored, thus increasing the runtime.

Figure 11:

The histogram of number of steps to local optima. The size of the problem is n = 50. The data are for 10 different problem instances and 10 descents on each.

So far we have studied the number of steps to reach a local optimum, but we have not shown how the cost is reduced during the run. In Figure 12 we show the cost versus the number of steps taken, averaged over 1,000 descents, for an instance of the sup drilling problem. We truncate the curves after the optima are reached. We plot the same data as a density plot on a logarithmic scale in Figure 13. The most prominent characteristic is that the improvement in cost decreases roughly exponentially with the number of steps. However, there is clearly a large deviation from this at large numbers of steps, caused by the relatively small number of runs that take a long time. These characteristics are shared by all the problems, although the exceptionally large tail is a feature only of those problems with large plateaus.

Figure 12:

Cost of the solutions the local search algorithm visits at each step, for 1,000 descents, for the sup drilling problem.

Figure 13:

Density graph of the logarithm of the cost of the solutions the local search algorithm visits at each step, for 1,000 descents, for the sup drilling problem.

3.4  Number of Local Optima

What makes many combinatorial optimisation problem instances difficult is the number of local optima. To count the local optima in the landscape we use the exhaustive local search algorithm and store the local optimum it returns. The exhaustive local search is repeated a large number of times from randomly chosen starting configurations. Each local optimum and the number of times it is hit are stored. Of course, there is no guarantee that all the local optima in the landscape are found, particularly those with small basins of attraction. Typically there are many local optima that have a very small probability of being visited.
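In outline, the bookkeeping is a map from an optimum's unique ID to its hit count. A sketch is given below, reusing the local_search routine from Section 2.3's sketch; the simple canonicalisation here (rotating the tour to start at city 0) stands in for the full ID scheme of the exhaustive algorithm:

```python
from collections import Counter
import numpy as np

def canonical_id(tour):
    """Rotate the tour to start at city 0 so that identical cyclic tours
    map to the same hashable ID. (Symmetric problems would also need a
    direction convention.)"""
    i = tour.index(0)
    return tuple(tour[i:] + tour[:i])

def count_local_optima(M, descents=10000, seed=None):
    """Run many descents from random starts; return optimum ID -> hits."""
    rng = np.random.default_rng(seed)
    hits = Counter()
    for _ in range(descents):
        opt = local_search(M, list(rng.permutation(M.shape[0])))
        hits[canonical_id(opt)] += 1
    return hits
```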

The logarithm of the number of times each local optimum is hit versus the cost of the local optimum for the Euclidean problem is shown in Figure 14. The data are based on a single instance and a large number of descents. Of course, there are likely to be local optima that we have not found. However, note that the local optimum with the highest cost (the worst local optimum) is visited 18 times, and the minimum number of times any local optimum is visited is 13. Thus, with high probability, any local optimum that we have not found with a cost higher than that of the worst local optimum would have a basin of attraction 18 times smaller than those that we have found. Although we cannot rule out the existence of such local optima theoretically or practically, we strongly doubt their existence, as we have not found any example of them in a very large number of trials.

Figure 14:

Logarithm of the number of times the local search algorithm hits local optima versus the cost of the local optima, for one instance of the Euclidean problem with n = 50.

We observe that the gradient of the line fitted in Figure 14 is negative, which implies that the probability of ending up in a local optimum of cost c decays exponentially with c. Extrapolating, we find that the expected number of times we would visit a local optimum with a cost greater than that of the worst optimum found would be less than 1 over the full set of descents, if any such optima exist. The probability of finding the global optimum was 16.7%, while the probability of finding the second best optimum was 14.9%.

Although for all the problem types the probability of reaching a local optimum decays exponentially as the cost of the local optimum grows, the decay rate differs between problem types. Since the average cost of the local optima also differs between problem types, the costs have to be normalised before decay rates can be compared. To do so, the cost on the horizontal axis in Figure 14 is divided by the average cost of the local optima. After normalisation, the gradient of the line fitted to the data gives the decay rate of the probability of reaching a local optimum as a function of its cost. The probability of reaching the global optimum versus the decay rate for all the problem types is shown in Figure 15. For the random problems we found the best solution on average only 1.07 times over all descents; thus for this problem type we can have no confidence that we have found the global optimum, and there seems little evidence of a relation between the cost of the optima and the size of their basins of attraction. When the random problems are made symmetric, the probability of reaching the global optimum grows; on average the best solution is hit 20 times. When the random problems are forced to satisfy the triangle inequality (random closed), the correlation between the cost of a local optimum and the probability of its being visited increases further; for this problem type the best found optimum is hit on average 80 times. Going from random problems to random closed problems, the decay rate grows as well. When the random problems are both symmetric and closed, the number of times the global optimum is hit grows to 4,119. For the job scheduling problem, the best observed solution is hit on average only 4 times. In comparison with the Euclidean problem, for which the best observed solution is visited more than 10,000 times, the best observed local optimum of the job scheduling problem thus shows no significant superiority over the other local optima in terms of the size of its basin of attraction. As the job scheduling problem is not a random problem, we might expect to observe a correlation between the cost of a local optimum and the size of its basin of attraction, but Figure 15 shows no such relation.

Figure 15:

The probability of getting to the global optimum solution versus the decay rate for all the problem types, averaged over 10 different instances.

The sup drilling and crane problem types show quite similar behaviour: for the sup drilling problem the global optimum is hit on average 695 times, and for the crane problem 853 times. For the four remaining types the decay rate is similar, but the probability of reaching the global optimum differs. The Euclidean problem shows the highest probability of reaching the global optimum; in the instance of Figure 14, the best observed optimum is hit 16,752 times.

The histograms of the number of local optima at each cost for two different problem types are shown in Figure 16. The histograms show different behaviour for the different problem types: for the sup drilling problem the histogram is close to a normal distribution, while for the Euclidean problem it is closer to a uniform distribution. In some cases, like job scheduling, the distribution has a long tail on the right, meaning that there are high-cost local optima, albeit with a small probability of being visited. The distribution for the Euclidean problem shows no tail on either side; there is quite a large proportion of local optima at both high and low cost levels.

Figure 16:

The histogram of the number of local optima at each cost for two different problem types. The data are for 10 different problem instances and 10 descents on each.

One way of measuring the size of the tails is to consider the distribution of costs of the local optima and to measure the proportion of local optima in the 10% of highest and lowest costs. This is shown in Figure 17. There is clearly a strong correlation indicating some degree of symmetry. The Euclidean problem has the highest proportion of local optima at the extremes of the cost distribution, while the random problems have the lowest proportion.

Figure 17:

The frequency of the local optima at the highest 10% of costs versus the lowest 10% of costs, computed by sampling descents on 50 different problem instances.

3.4.1  Growth in the Number of Local Optima

The number of local optima grows with system size. If we take the logarithm of the number of local optima as a function of the cost and scale by a function of the system size, we obtain very similar profiles. This is illustrated in Figure 18, where we show the natural logarithm of the number of local optima at each cost, divided by a function of the system size, for the Euclidean problem with different problem sizes. The results are averaged over 20 different problem instances with many descents on each. This suggests that the number of local optima at each cost level grows exponentially as
\[
N(c) \propto \mathrm{e}^{\,g(c)\,f(n)},
\]
where $f(n)$ is roughly quadratic for the Euclidean problem, as the scaling in Figure 18 shows. Figure 14 shows that as the cost of a local optimum increases, the probability of the local search algorithm finding it decreases exponentially, so the number of local optima at higher costs (the right part of the graph) is underestimated in both Figure 16 and Figure 18. Figure 18 shows that the scaled histograms are quite similar for different system sizes. Such behaviour is consistent with the hypothesis that the number of local optima grows exponentially with $f(n)$; to make the histograms for different system sizes collapse in the case of the Euclidean problem, the natural logarithm of the number of local optima has to be scaled by a roughly quadratic function of n. Note that the size of the search space also grows superexponentially, as $n!$. Although the number of local optima for all the problem types grows exponentially as a function of the system size, the scaling behaviour changes across the problem types. For example, for the rectilinear, disk drive, and job scheduling problems the scaling is n, while for the sup drilling and crane problems the scaling is a different function of n. For all four random types the vast majority of the local optima are hit only once across all descents, so it is hard to estimate the relation between the system size and the number of local optima, although for all the problem types the number of local optima appears to increase exponentially in some function of the system size.
Figure 18:

Natural logarithm of the number of local optima at each cost divided by n for the Euclidean problem with different problem sizes, averaged over 20 problem instances with many descents on each.

It is interesting to note that the difficulty of a problem is not obviously correlated with the growth in the number of local optima. For example, the Euclidean TSP has a larger probability of finding the global optimum than sup drilling, despite having significantly greater growth in the number of local optima.

3.4.2  Probability of Visiting an Optimum

We have seen that for some problems the probability of finding a local optimum is correlated with its cost, while this is not true for other problems. We illustrate this in Figure 19, which shows the proportion of local optima at each cost level and the probability of finding a local optimum at each cost level. For the sup drilling problem the lower cost solutions tend to have significantly larger basins of attraction. In contrast, in the job scheduling problem the sizes of the basins of attraction are almost uncorrelated with the cost.

Figure 19:

Histograms showing the proportion of local optima at a particular cost and the probability of finding a local optimum at a particular cost in one instance.

Clearly, a problem is easier to solve when the basins of attraction of its lower-cost solutions are larger. To measure how the size of the basins of attraction of the local optima changes with cost, we define the bias to be the difference between the expected cost of all the optima and the expected cost of the optima found by local search, divided by the standard deviation of the cost of all the optima. A large bias indicates that the local search is much more likely to find a fit local optimum than a less fit one. Figure 20 shows the probability of reaching the global optimum versus the bias, averaged over 10 different instances for each problem type. There is clearly some correlation between the probability of finding the global optimum and the bias, but interestingly it is not that strong.

Figure 20:

The probability of getting to the global optimum versus the bias (the average cost of the local optima minus the expected cost of the visited local optima, divided by the standard deviation of the cost of the local optima) for all the problem types, averaged over 10 different problem instances.
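As defined above, the bias is easy to compute from the hit counts gathered earlier; a sketch (costs_all holds the cost of every distinct optimum found, and the visited expectation weights each optimum by its hit count):

```python
import numpy as np

def bias(costs_all, hit_counts):
    """Bias = (mean cost of all optima - hit-weighted mean cost of the
    optima found by local search) / std of the cost of all optima."""
    costs_all = np.asarray(costs_all, dtype=float)
    w = np.asarray(hit_counts, dtype=float)
    visited_mean = (w * costs_all).sum() / w.sum()
    return (costs_all.mean() - visited_mean) / costs_all.std()
```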

3.5  Reaching the Global Optima

As we increase the system size, the gap between the expected cost found by local search and the minimum cost increases. This is illustrated in Figure 21, which shows the expected minimum cost (the cost of the global optimum averaged over 20 problem instances) and the expected cost of the local optima found by the local search algorithm over the same ensemble of randomly drawn instances. We also show the expected cost of the local optima found by local search plus and minus one standard deviation. It is clear that as the system size grows, the gap between the expected cost and the cost of the global optimum grows. For problem sizes n up to 30, the cost of the global optimum lies between the expected cost of the local optima and the expected cost minus one standard deviation; for size n = 80, it lies between the expected cost and the expected cost minus two standard deviations. Clearly, as the system size grows, the global optimum becomes ever less likely to be found.

Figure 21:

Plot of the expected minimum cost and the average cost of the local optima found by 3-opt versus n. We computed the standard deviation in the cost and show the expected cost plus and minus one standard deviation. This is found by performing hill-climbs on 20 randomly generated instances of the problem for the sup drilling problem.

Note that we cannot be sure that we have found the global optimum, so by global optimum we simply mean the best observed solution. However, as shown above, for many of the problem types the better local optima have exponentially larger basins of attraction. As in our experiments the best found local optimum has been visited thousands of times, we believe it is highly unlikely that there exists an optimum with better fitness than the best observed one and yet a basin of attraction thousands of times smaller. Therefore the best observed solution is, with high probability, the global optimum (although this is not true for the random problems or the job scheduling problem).

We can also compute the probability of finding the best found local optimum directly. This is shown in Figure 22, where we plot the log-probability of finding the best optimum versus system size for four different problem types. The data are averaged over 20 different problem instances. The straight-line fit is consistent with the hypothesis that finding global optima becomes exponentially unlikely as the system size grows. Using the straight-line fit in Figure 22 to extrapolate to large n, the probability of the local search algorithm finding a global optimum for an instance of size 1,000 would be vanishingly small for both the Euclidean problem and the sup drilling problem. Although such extrapolation is unlikely to provide a precise value, it nevertheless provides a strong indication that for larger problem instances, using multiple runs of a local search algorithm ceases to be useful. For the random and job scheduling problems the number of times the best visited optimum is hit rapidly shrinks to 1. For these problem types, for n larger than 40, the best visited solution is found just once, so we can be almost sure that for this system size there are better optima that have not been found within this number of descents. The data show that for all the problem types the probability of finding the global optimum decreases exponentially as the system size grows, although at different rates.

Figure 22:

Log-probability of finding the lowest cost local optimum for four different problem types versus system size n, averaged over 20 different random problem instances.

Figure 21 shows that the gap between the expected cost and the minimum cost increases with the system size, and Figure 22 shows that the probability of finding the global optimum decreases exponentially with system size. These two figures show two different aspects of the same property of the fitness landscape, namely how the problem becomes harder as the system size grows. To show this for all the problem types, Figure 23 plots the gradient of the fitted line in Figure 22 against the rate at which the gap in Figure 21 grows with the system size. The data clearly show a correlation between the gap and the probability of finding the global optimum. The job scheduling problem and the random problem can be considered the hardest problems, as the decay rate of the probability of finding a global optimum is the highest among all the problem types. When the random problems are made symmetric or closed, the decay rate decreases, meaning that the random symmetric and random closed problems are easier than completely random problems. The additive drilling and Euclidean problems, which show similar behaviour, are the two easiest problems.

Figure 23:

The increasing rate of the gap between the expected cost and the cost of the global optimum with n versus the decay rate of the probability of getting to the global optimum with n. The data are found for n = 20 to n = 80 in steps of 10, for 20 different problems with many descents on each.

3.6  Distance between Optima

One important property of the fitness landscape is the distance between the local optima, which shows how the local optima are correlated. The distance between two solutions is simply defined as the number of noncommon edges in the two solutions (note that we are using the Hamming distance here rather than the number of 3-opt moves between configurations). For symmetric problems the edges are undirected, so both directions of an edge are considered the same. The histogram of the distances between local optima for two different problem types is shown in Figure 24. The results are averaged over 20 randomly drawn problem instances for n = 50. The interesting feature of the histogram for the additive drilling problem is its double peak: one higher, wider peak on the left and one lower, narrower peak on the right. This peculiarity follows from the nature of the problem. As described in Section 2.1.6, in the definition of the drilling problem the energy needed for the machine to move the drill down is zero. Because of this, although the problem is not symmetric, some subtours in a solution can be reversed without significantly changing the cost of the solution: the change in cost caused by reversing a subtour is just the difference in the y-coordinates of the starting and ending points of the subtour. In this sense the problem can be considered partly symmetric, and this is what produces the double peak in the histogram of the distances between the local optima.
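As an illustration, a minimal sketch of this edge-based distance is given below; the tour representation (a sequence of city indices) is our own assumption.

```python
def tour_edges(tour, symmetric=True):
    """Edge set of a tour given as a sequence of city indices."""
    n = len(tour)
    edges = set()
    for i in range(n):
        a, b = tour[i], tour[(i + 1) % n]
        # For symmetric problems an edge and its reverse count as the same edge.
        edges.add(frozenset((a, b)) if symmetric else (a, b))
    return edges

def tour_distance(t1, t2, symmetric=True):
    """Number of edges of t1 that do not appear in t2."""
    return len(tour_edges(t1, symmetric) - tour_edges(t2, symmetric))
```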

Figure 24:

Histogram of the distances between the local optima for two different problem types. The size of the problem is n = 50, and the results are averaged over 20 different randomly drawn problem instances. The number of hill-climbs is on each problem instance.

The histogram of the distances has a bell shape for the Euclidean problem. This bell shape is seen in all the problem types except the additive drilling problem and, to some degree, the sup drilling problem. In all the problem types the peak of the histogram lies to the left of the expected distance between two randomly drawn solutions. Note that for n = 50 the expected distance between two randomly drawn solutions is around 49.5 (we found this by randomly generating pairs of configurations and calculating the average distance). The histograms of Figure 24 therefore clearly suggest that the local optima are much closer to each other than two random solutions would be.
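The random baseline quoted above is straightforward to reproduce by Monte Carlo sampling; the sketch below reuses the hypothetical tour_distance helper from the previous sketch.

```python
import random

def expected_random_distance(n, trials=100000):
    """Monte Carlo estimate of the mean edge distance between random tours."""
    total = 0
    for _ in range(trials):
        t1 = random.sample(range(n), n)   # a uniformly random tour
        t2 = random.sample(range(n), n)
        total += tour_distance(t1, t2)
    return total / trials
```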

Figure 24 shows the histogram of distances for all local optima irrespective of their cost. It is also interesting to see how the distances between local optima depend on their cost. We show this information for the sup drilling problem in Figure 25, where a three-dimensional histogram is plotted. We observe that the best (lowest cost) local optima are closer to each other than the less fit local optima: half of the best 10% of local optima are within a distance of 5 of each other, whereas the less fit local optima lie much further apart.

Figure 25:

Histogram of the distances between the best local optima for different numbers of local optima for the sup drilling problem. The size of the problem is . The data are averaged over 20 different problem instances and descents on each.

3.7  Distance to Global Optimum

The mean distance to the global optimum from a local optimum of cost c for different sizes of the Euclidean problem is shown in Figure 26. The scaling of the vertical axis demonstrates that the distance from a local optimum to a global optimum scales linearly with the problem size. The scaling behaviour of the curve is the same for all the problem types. The distance of the final step, from a local optimum with cost close to cmin to the global optimum itself, clearly grows with n, although it is difficult to be sure how quickly; answering this depends on extrapolating the curve in Figure 26 to the y-axis.

Figure 26:

Measure of the mean distance to the global optimum from a local optimum of cost c for different sizes of the problem. This is for a Euclidean problem, averaged over 20 problem instances.

Another way of viewing this information is as a density plot of cost versus distance to the global optimum. We show these plots for a Euclidean problem and a random problem in Figure 27. There is clearly a strong fitness distance correlation for the Euclidean problem, but very little for the random problem. Many population-based algorithms attempt to exploit the fitness distance correlation; clearly there is some hope of doing so in the case of Euclidean problems, but much less for random TSP problems.

Figure 27:

Density plot of local optima as a function of their distances from the most frequently visited global optimum. The shading of the point shows the number of local optima at that cost and distance. The size of the problem is for two problem types, Euclidean () and random (). The number of hill climbs is .

3.8  Expected Cost in Hamming Sphere

We noted earlier that all the problems had a relatively slow decline in the autocorrelation function, corresponding to a large correlation length. This is a result of the cost being the sum of all the edges in the tour. If we make a local move that changes only a few edges, then there will be only a relatively small change in the cost. To examine this, we study the mean cost in a Hamming sphere from a configuration, the Hamming sphere of radius h being the set of configurations that differ from the start configuration by exactly h edges.

Figure 28 shows the expected cost of a configuration in a distance sphere of radius h from a global optimum. To construct the graph, we applied a random number of 3-opt moves to the global optimum and then measured the distance from the global optimum. The process was repeated times and the costs at each Hamming distance averaged.
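A minimal sketch of this sampling procedure is given below; the random_3opt_move and cost functions, and the cap on the walk length, are placeholders for the instance-specific details rather than the authors' implementation.

```python
import numpy as np

def hamming_sphere_costs(opt_tour, cost, random_3opt_move,
                         n_samples=100000, max_moves=30, rng=None):
    """Mean tour cost at each edge distance h from a reference tour.

    cost(tour) -> float and random_3opt_move(tour, rng) -> tour are
    assumed to be supplied for the instance at hand.
    """
    rng = rng or np.random.default_rng()
    sums, counts = {}, {}
    for _ in range(n_samples):
        tour = list(opt_tour)
        for _ in range(rng.integers(1, max_moves)):
            tour = random_3opt_move(tour, rng)
        h = tour_distance(opt_tour, tour)   # edge distance, as defined earlier
        sums[h] = sums.get(h, 0.0) + cost(tour)
        counts[h] = counts.get(h, 0) + 1
    return {h: sums[h] / counts[h] for h in sorted(sums)}
```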

Figure 28:

Expected cost of configurations in distance sphere of radius h around a global optimum for . This is for a Euclidean problem. The dotted lines show one standard deviation around the mean. The dashed line shows the expected cost of the random solutions.

Interestingly, we observe that the cost increases linearly with the distance up to a Hamming distance of . We note that the average Hamming distance between random tours is around . We also show the expected cost of a random tour in Figure 28 as a horizontal dashed line. A very simple model for the expected cost in a Hamming sphere of radius h about a tour s is
$$\mathbb{E}[c(h)] \approx c(s) + \frac{h}{n}\bigl(\bar{c} - c(s)\bigr),$$
where $c(s)$ is the cost of the current tour and $\bar{c}$ is the average cost of a random tour.
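This interpolation is trivial to encode; the sketch below simply implements the model above (the endpoint at h = n is our reading of the linear regime described in the text).

```python
def model_cost(h, c_s, c_bar, n):
    # Linear interpolation between the reference tour's cost c_s at h = 0
    # and the mean random-tour cost c_bar at h = n.
    return c_s + (h / n) * (c_bar - c_s)
```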

3.9  Return Probabilities

Another interesting property is the probability of returning to a local optimum starting at a fixed Hamming distance from the optimum. We can empirically measure the “basin of attraction” of a local optimum by repeatedly jumping to a configuration at a certain distance from the local optimum and running the local search algorithm until it reaches a local optimum. We then see whether this is the same local optimum from which we started. We can thus measure the return probability starting from a configuration in a given distance sphere. In Figure 29 we show these return probabilities for four problems starting from the global optimum using 3-opt moves. The curves are not monotonic because 3-opt moves can change some numbers of edges more easily than others; for example, it is easier to change three edges than five. As we would expect, the return probabilities fall off faster as the problems become harder.
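A minimal sketch of this measurement follows; perturb_to_distance and local_search stand in for the instance-specific perturbation and descent routines, which are our own assumptions.

```python
def return_probability(opt_tour, h, perturb_to_distance, local_search,
                       trials=1000):
    """Fraction of perturb-then-descend runs that return to the optimum.

    perturb_to_distance(tour, h) -> a configuration at edge distance ~h;
    local_search(tour) -> a local optimum. Both are assumed supplied.
    """
    returns = 0
    for _ in range(trials):
        start = perturb_to_distance(opt_tour, h)
        found = local_search(start)
        if tour_distance(found, opt_tour) == 0:   # same edge set => same optimum
            returns += 1
    return returns / trials
```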

Figure 29:

The return probability starting from different distances from the global optimum. This is for four different problem types for .

4  Principal Component Analysis

Although we have seen that the number of local optima grows exponentially with the system size, many of these may be formed by combining other optimal tours. For example, we could have a tour consisting of 10 segments. There may be two routes through the cities in each segment, both of which are local optima. In this case there would be local optima that are a consequence of these 10 binary choices. To investigate the underlying variety in the local optima, we use principal component analysis to find the subspaces with a large variation. To do this we consider each tour as a binary vector in the space of edges. That is, for a symmetric problem the tour is represented as a binary vector consisting of n nonzero components in a total of possible components. Similarly, for asymmetric problems the tour is represented as a binary vector consisting of n nonzero components in a total of possible components. To perform PCA we subtract the mean vector of all the optima and then compute the eigenvalues and eigenvectors of the covariance matrix (we can compute this efficiently using singular value decomposition).
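The whole construction can be sketched in a few lines; the indexing of the undirected edges and the use of SVD are our own illustrative choices for the symmetric case.

```python
import numpy as np

def edge_vector(tour, n):
    """Binary indicator over the n(n-1)/2 undirected edges (symmetric case)."""
    v = np.zeros(n * (n - 1) // 2)
    def idx(a, b):
        # Index of undirected edge (a, b), a < b, in the flattened upper triangle.
        a, b = min(a, b), max(a, b)
        return a * (2 * n - a - 1) // 2 + (b - a - 1)
    for i in range(n):
        v[idx(tour[i], tour[(i + 1) % n])] = 1.0
    return v

def pca_of_optima(optima, n):
    """PCA of local optima in edge space via SVD of the centred data matrix."""
    X = np.array([edge_vector(t, n) for t in optima])
    X -= X.mean(axis=0)                        # subtract the mean tour vector
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s**2 / (len(optima) - 1)     # eigenvalues of the covariance matrix
    return eigenvalues, Vt                     # rows of Vt are the "eigentours"
```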

To illustrate this approach we show the average solution of the local optima in Figure 30 for a Euclidean problem with 30 cities. The width of an edge shows the number of times it appears in different local optima. It is clear from the graph that many edges do not appear in any of the local optima, while some appear in all of them. This means that a search algorithm does not need to consider such edges, and a huge proportion of the search space can be ignored.

Figure 30:

The schematic shape of the average of local optima solutions for a Euclidean problem with 30 cities. The width of the edges shows the number of times they appear in different local optima.

The eigenvectors correspond to orthogonal directions in edge space. The eigenvectors with larger eigenvalues correspond to directions in which there is a large variance; that is, the local optima vary significantly in these directions. We can think of these eigenvectors, or eigentours, as showing the most significant ways in which the local optima differ. In Figure 31 we show the eigenvector corresponding to the largest eigenvalue (i.e., the principal component) for the same problem as shown in Figure 30. Some components are positive and others negative. We can (roughly) interpret an eigentour as a choice between including the positive edges and excluding the negative edges, or vice versa.

Figure 31:

The schematic shape of the eigenvector of the local optima solutions with highest eigenvalue for a Euclidean problem with 30 cities. The width of the edges shows the values in the eigenvector. The solid lines represent the positive values, and the dashed lines represent the negative values.

By plotting the eigenvalues against their rank we can see the dimensionality of the subspace in which there is significant variation between local optima. The size of an eigenvalue equals the mean squared reconstruction error of the optima if we project out that dimension. In Figure 32 we plot the spectrum of eigenvalues for four instances from different problem types. For the Euclidean and rectilinear problems there is a relatively small number of nonzero eigenvalues and the variance is small (indicating that the eigentours do not involve many edges). In contrast, the scheduling and sup drilling problems show many more directions of variation and considerably larger variance.

Figure 32:

The eigenvalues of the eigenvectors of the edges in local optima in descending order. This is for four different problem types with .

One final use of PCA is that it allows us to project the local optima into a meaningful low-dimensional subspace, giving some visualisation of their structure. We did this for six problems in Figure 33, where we plotted the local optima in the space spanned by the first two principal eigenvectors (the eigenvectors with the two largest eigenvalues). The size of each dot represents the fitness of the local optimum; better local optima are drawn with larger dots.
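Such a plot can be produced directly from the PCA sketch above; the dot-size scaling below is an illustrative choice, not the authors' exact scheme.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_optima_projection(optima, costs, n):
    """Scatter the local optima in the plane of the two leading eigentours."""
    X = np.array([edge_vector(t, n) for t in optima])
    X -= X.mean(axis=0)
    _, Vt = pca_of_optima(optima, n)
    coords = X @ Vt[:2].T                     # project onto the top two eigentours
    costs = np.asarray(costs, dtype=float)
    spread = (costs.max() - costs.min()) or 1.0
    # Lower cost (better) optima get larger dots.
    sizes = 5.0 + 50.0 * (costs.max() - costs) / spread
    plt.scatter(coords[:, 0], coords[:, 1], s=sizes)
    plt.xlabel("first principal component")
    plt.ylabel("second principal component")
    plt.show()
```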

Figure 33:

The projection of the local optima solutions into two dimensions using the two eigenvectors with the highest eigenvalues for six problem types.

The first graph is for a random problem instance, where there is no particular structure in the local optima. For all the instances of the crane problem we almost always see a square, clustered structure in the local optima. The squares may form any angle with the horizontal axis and may consist of different numbers of clusters for different problem instances, but the feature common to all the problem types (except random problems) and problem instances is that the local optima form clusters. Some clusters consist of better local optima and others of inferior ones. The third graph is for the Euclidean problem, where the projected local optima usually form two parallel, elongated, football-shaped clusters. Not all problem instances show exactly the same behaviour: the angle of the clusters to the horizontal axis differs from one instance to another, the clusters are occasionally not parallel to each other, and sometimes there are more than two clusters. For the additive drilling problem we see similar behaviour to the Euclidean problem, but the number of clusters is almost always 2 (in over 50 problem instances we studied) and the gap between the clusters is much wider. The next graph shows the disk drive problem, where the local optima usually form parallel lines of clusters. In this problem type the local optima sometimes form two or more parallel lines, and in some cases as many as 10. The lines are sometimes close together and mixed so that they form a square shape. Finally, the last graph shows the job scheduling problem, where for almost all the instances we observed (more than 50 problem instances of different sizes) the local optima formed squares, although the squares sometimes break up into several clusters.

5  Discussion

In this paper we used 3-opt for our studies. Clearly, many of the properties we report depend on the neighbourhood structure and the local search operator. This is particularly true for properties, like autocorrelation, that measure the local ruggedness of the landscape via a random walk. However, the degree to which these properties depend on the local search operator varies from one property to another. For example, using an operator that takes longer steps and offers a larger number of neighbours per solution (like 4-opt) would change some of the data presented here. For the time to reach a local optimum and the plateaus, for example, such an operator would take fewer steps to reach a local optimum, and it might escape plateaus more easily, so the size of the plateaus might decrease. Such an operator also decreases the number of local optima in the landscape, as many of the basins of attraction merge to form a single basin. For the same reason, the probability of finding a local optimum or the global optimum would increase, although this does not imply that such an operator is better to use, as each step of the algorithm is more time-consuming.

We believe some properties, however, would not change if different operators were used. These include the shape of the distribution of local optima at each cost level, the property that the probability of finding the global optimum decreases exponentially with the problem size, the property that optima closer to the global optimum have, in expectation, lower cost, and the PCA properties. We believe these properties reflect the nature of the problems and so are operator-independent. We have two major reasons for believing this. First, these properties are also seen in other NP-hard combinatorial problems, including MAX-SAT (Prügel-Bennett and Tayarani, 2011), graph colouring (Tayarani and Prügel-Bennett, 2015) and quadratic assignment (Tayarani and Prügel-Bennett, 2013).

Second, some properties, like the relation between the probability of finding the global optimum and the time to reach a local optimum, should not change if a different operator were used. To explain this more clearly, consider the following example. We showed that the time to reach a local optimum increases linearly with the problem size, while the probability of finding the global optimum decreases exponentially, so solving the problem becomes exponentially more time-consuming as the problem size increases. This should be the case for any operator. Suppose there were an operator for which the probability of finding the global optimum decreased only polynomially while the time to reach a local optimum still increased linearly with problem size. Then the time needed to keep the expected number of times the global optimum is found equal to 1 would grow only polynomially with problem size; in one sense, NP-hard problems would be solved in polynomial time. Although this might conceivably be the case, we believe there is more evidence for the opposite.

The PCA properties, and the property that local optima closer to the global optimum are on average fitter than those further away, are other examples of properties that we believe are independent of the local search operator: they reflect the structure of the problem and the relation between good solutions, and it does not matter how those good solutions are found.

The other issue that can affect the analyses in this paper is the isotropy of the landscape. Greenwood and Hu (1998) showed that studying anisotropic landscapes requires some care. A landscape is anisotropic if the fitness of solutions in some regions of the landscape is considerably different from that in other regions. This happens, for example, when some regions of the landscape do not satisfy some problem constraints and so have markedly lower fitness than other regions. These low-fitness regions are called sink holes. For this type of landscape, an analysis that uses a long random walk to estimate ruggedness or correlation does not provide accurate results, as some walks may hit sink holes. For several reasons, we believe that many of our analyses remain valid with respect to the isotropy of the landscape. First, as discussed earlier, we believe such sink holes do not exist in the landscape of TSP. Second, to study the proposed properties we do not use one single long random walk but a large number of local searches starting from different configurations (although properties like autocorrelation are prone to this problem, which is why we believe these types of properties are not truly representative of problem hardness). Third, even if there were sink hole regions in the fitness landscape, they would not affect many of the results presented in this paper. Sink holes affect random walks because walks that pass over them observe different ruggedness. Our local search, on the other hand, always chooses the best neighbour; thus it never passes over a sink unless it starts in one, and even then it has a good chance of escaping and finding a local optimum outside the sink.

6  Conclusion

Fitness landscapes are complicated objects that are not easily characterised by a small number of properties. In consequence we have to settle for a drawn-out conclusion. For the sake of clarity, we enumerate some of the most striking properties:

  • For TSP the number of steps to reach a minimum appears to grow linearly for most problem types. However, each step of 3-opt naively takes time to compute (as we potentially have to search all edge triples to find one that can be optimised), so this can be time-consuming. This does not in itself make the problems difficult. What makes the problems difficult is that the local optimum we find is unlikely to be the global optimum.

  • For all problem types we observed that the number of local optima appears to grow exponentially with the size of the instance, but the rate of this growth varied considerably between problem types.

  • In most TSP types that we examined, the fitter local optima tended to have a significantly larger probability of being found than the less fit local optima. Thus, the number of local optima does not necessarily determine the probability of finding the global optimum.

  • The probability of reaching a global optimum decreases exponentially with the system size. This property is correlated with a widening gap between the cost of the global optimum and the expected cost of an optimum found by local search, measured in units of the number of standard deviations in the cost of the optima found by local search.

  • There is a correlation between the fitness of a local optimum and its distance to the global optimum, with fitter local optima lying on average closer to the global optimum than less fit ones. The mean distance appears to scale linearly with the problem size. However, this correlation depends on the problem type, and it is especially low for random problems.

  • The difference between the cost of a reference configuration and the average cost of the set of configurations at a fixed Hamming distance h from the reference is almost exactly proportional to h. This is a result of the fact that the objective function is a sum of terms, each of which involves only a few variables; changing a small number of variables therefore produces only a small change in the cost. The relation is almost exactly linear between the cost of the starting tour and the expected cost of a random tour at a Hamming distance of n. This long-range correlation means that there is considerable similarity in the long-range structure of the fitness landscape.

Although it is not the intention of this paper to make a comparison with other problems, it is perhaps worth noting that very similar properties to those described here were also observed in MAX-SAT (Prügel-Bennett and Tayarani, 2011) and graph colouring (Tayarani and Prügel-Bennett, 2015). The current paper adds new texture to our description of fitness landscapes by showing how the properties vary between different problem types. Most of the landscape properties we have analysed are qualitatively similar for all problem types, but they vary significantly quantitatively. In particular, the more random problems tend to have significantly more local optima, with the cost of a local optimum being significantly less correlated with its proximity to the global optimum tour. One important lesson of this analysis is that Euclidean TSP is exceptional among the problem types we have analysed in that it is relatively easy to solve. One should therefore be on guard against extrapolating results from Euclidean TSP to more general TSP problems.

A final contribution of this paper was the introduction of PCA for analysing the structure of the fitness landscape. This technique provides a qualitative understanding of the degrees of variability between local optima in terms of the spectrum of eigenvalues. It clearly shows that although there is an exponential number of local optima, there are far fewer independent ways in which the local optima differ from each other. Again, this is a property that depends on the problem type. We have also seen that PCA provides a means of visualising the structure of the local optima by allowing a meaningful projection onto a low-dimensional subspace. This often reveals considerable clustering in the local optima that is not otherwise apparent. Although we have shown PCA results for small problem instances, the technique is not limited to small instances. The complexity of PCA depends linearly on the number of edges that differ between the local optima and cubically on the number of local optima. Thus, provided the number of local optima found is not too large, it is feasible to perform PCA on large problems. We believe that PCA could be used to direct the search of a new algorithm. The usefulness of PCA is clearly problem-dependent, but as we saw in the crane example in Figure 33, the fittest optima are often confined to particular clusters. PCA does not require the solutions to be optimal and can use good solutions returned by an optimiser. Although it may seem expensive to compute many good solutions, on large problems this may be necessary. Finally, it is worth pointing out that PCA is not limited to TSP but can be applied to almost any problem.

References

Alander, J. T. (1999). Population size, building blocks, fitness landscape and genetic algorithm search efficiency in combinatorial optimization: An empirical study. In L. D. Chambers (Ed.), Practical handbook of genetic algorithms: Complex coding systems, pp. 459–486. Boca Raton, FL: CRC Press.
Alander, J. T., Zinchenko, L., and Sorokin, S. (2002). Analysis of fitness landscape properties for evolutionary antenna design. In Proceedings of the IEEE International Conference on Artificial Intelligence Systems, pp. 363–368.
Altenberg, L. (1997). Fitness distance correlation analysis: An instructive counterexample. In Proceedings of the International Conference on Genetic Algorithms, pp. 57–64.
Angel, E., and Zissimopoulos, V. (1998). Autocorrelation coefficient for the graph bipartitioning problem. Theoretical Computer Science, 191:229–243.
Arora, S. (1998). Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM, 45(5):753–782.
Boese, K. D. (1995). Cost versus distance in the traveling salesman problem. Technical Report TR-950018, University of California Los Angeles, Computer Science Department.
Boese, K. D., Kahng, A., and Muddu, S. (1994). On the big valley and adaptive multi-start for discrete global optimizations. Operations Research Letters, 16(2):101–113.
Bouziri, H., Mellouli, K., and Talbi, E.-G. (2011). The k-coloring fitness landscape. Journal of Combinatorial Optimization, 21:306–329.
Cirasella, J., Johnson, D. S., McGeoch, L. A., and Zhang, W. (2001). The asymmetric traveling salesman problem: Algorithms, instance generators, and tests. In Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation, pp. 32–59.
Collard, P., Verel, S., and Clergue, M. (2007). Local search heuristics: Fitness cloud versus fitness landscape. Retrieved from arXiv:0709.4010.
Czogalla, J., and Fink, A. (2012). Fitness landscape analysis for the no-wait flow-shop scheduling problem. Journal of Heuristics, 18(1):25–51.
Fonlupt, C., Robilliard, D., and Preux, P. (1997). Fitness landscape and the behavior of heuristics. In Proceedings of Evolution Artificielle 1997, pp. 321–329.
Forrest, S., and Mitchell, M. (1993). What makes a problem hard for a genetic algorithm? Some anomalous results and their explanation. Machine Learning, 13(2–3):285–319.
Greenwood, G. W., and Hu, X. S. (1998). On the use of random walks to estimate correlation in fitness landscapes. Computational Statistics and Data Analysis, 28(2):131–137.
Grover, L. K. (1992). Local search and the local structure of NP-complete problems. Operations Research Letters, 12:235–243.
Hains, D. R., Whitley, L. D., and Howe, A. (2011). Revisiting the big valley search space structure in the TSP. Journal of the Operational Research Society, 62:305–312.
Hertz, A., Jaumard, B., and de Aragao, M. P. (1994). Local optima topology for the k-coloring problem. Discrete Applied Mathematics, 49:257–280.
Horn, J., and Goldberg, D. E. (1995). Genetic algorithm difficulty and the modality of fitness landscapes. In Proceedings of the Workshop on Foundations of Genetic Algorithms, pp. 243–270.
Hornby, G. S. (1996). The recombination operator, its correlation to the fitness landscape and search performance. Unpublished master's thesis, University of Alberta, Edmonton, Canada.
Hoshino, T., Mitsumoto, D., and Nagano, T. (1998). Fractal fitness landscape and loss of robustness in evolutionary robot navigation. Autonomous Robots, 5:199–213.
Huang, D., Shen, Z., Miao, C., and Leung, C. (2009). Fitness landscape analysis for resource allocation in multiuser OFDM based cognitive radio systems. ACM SIGMOBILE Mobile Computing and Communications Review, 13:26–36.
Jones, T., and Forrest, S. (1995). Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In Proceedings of the 6th International Conference on Genetic Algorithms, pp. 184–192.
Lehn, R., and Kuntz, P. (2001). A contribution to the study of the fitness landscape for a graph drawing problem. In Proceedings of EvoWorkshops: Applications of Evolutionary Computing, pp. 172–181. Lecture Notes in Computer Science, Vol. 2037.
Lu, G., Li, J., and Yao, X. (2011). Fitness-probability cloud and a measure of problem hardness for evolutionary algorithms. In Proceedings of the 11th European Conference on Evolutionary Computation in Combinatorial Optimization, pp. 108–117.
Martin, W., Barker, A., and Cohoon, J. (1999). Problem perturbation: Implications on the fitness landscape. In Proceedings of the Congress on Evolutionary Computation, Vol. 1.
Mathias, K., and Whitley, D. (1992). Genetic operators, the fitness landscape and the traveling salesman problem. In Parallel Problem Solving from Nature, pp. 219–228.
Merz, P. (2004). Advanced fitness landscape analysis and the performance of memetic algorithms. Evolutionary Computation, 12:303–325.
Merz, P., and Freisleben, B. (1998). Memetic algorithms and the fitness landscape of the graph bipartitioning problem. In Parallel Problem Solving from Nature, pp. 765–774.
Merz, P., and Freisleben, B. (2000). Fitness landscape analysis and memetic algorithms for the quadratic assignment problem. IEEE Transactions on Evolutionary Computation, 4(4):337–352.
Pošík, P., and Franc, V. (2007). Estimation of fitness landscape contours in EAs. In Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO), pp. 562–569.
Prügel-Bennett, A., and Tayarani, M. H. (2011). Maximum satisfiability: Anatomy of the fitness landscape for a hard combinatorial optimisation problem. IEEE Transactions on Evolutionary Computation, 16(3):319–338.
Qasem, M., and Prügel-Bennett, A. (2010). Learning the large-scale structure of the MAX-SAT landscape using populations. IEEE Transactions on Evolutionary Computation, 14(4):518–529.
Rardin, R. L., Tovey, C., Pilcher, M., and Pardalos, P. (1993). Analysis of a random cut test instance generator for the TSP. In P. M. Pardalos (Ed.), Complexity in Numerical Optimization, pp. 388–405. Singapore: World Scientific.
Ratle, A. (1998). Accelerating the convergence of evolutionary algorithms by fitness landscape approximation. In Parallel Problem Solving from Nature, pp. 87–96.
Riley, J., and Ciesielski, V. (2010). Fitness landscape analysis for evolutionary non-photorealistic rendering. In Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–9.
Shen, L., and He, J. (2010). A mixed strategy for evolutionary programming based on local fitness landscape. In Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1–8.
Stadler, P. (1995). Towards a theory of landscapes. In Proceedings of the Conference on Complex Systems and Binary Networks, pp. 78–163.
Stadler, P. (1996). Landscapes and their correlation functions. Journal of Mathematical Chemistry, 20(1):1–45.
Stadler, P. F., and Schnabl, W. (1992). The landscape of the traveling salesman problem. Physics Letters A, 161(4):337–344.
Suzuki, H., and Iwasa, Y. (1997). GA performance in a Babel-like fitness landscape. In Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 357–366.
Tavares, J., Pereira, F. B., and Costa, E. (2006). The role of representation on the multidimensional knapsack problem by means of fitness landscape analysis. In Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2307–2314.
Tayarani, M. H., and Prügel-Bennett, A. (2013). On the landscape of combinatorial optimisation problems. IEEE Transactions on Evolutionary Computation, 18(3):420–434.
Tayarani, M. H., and Prügel-Bennett, A. (2015). Anatomy of the fitness landscape for graph-colouring problem. Swarm and Evolutionary Computation, 22:47–65.
Vanneschi, L., Tomassini, M., Collard, P., Verel, S., Pirola, Y., and Mauri, G. (2007). A comprehensive view of fitness landscapes with neutrality and fitness clouds. In Proceedings of the European Conference on Genetic Programming, pp. 241–250. Lecture Notes in Computer Science, Vol. 4445.
Verel, S., Collard, P., Tomassini, M., and Vanneschi, L. (2007). Fitness landscape of the cellular automata majority problem: View from the Olympus. Theoretical Computer Science, 378:54–77.
Verel, S., Collard, P., Tomassini, M., and Vanneschi, L. (2008). Neutral fitness landscape in the cellular automata majority problem. Retrieved from arXiv:0803.4240.
Watson, J.-P. (2010). An introduction to fitness landscape analysis and cost models for local search. In M. Gendreau and J.-Y. Potvin (Eds.), Handbook of metaheuristics, pp. 599–623.
Weinberger, E. D. (1990). Correlated and uncorrelated fitness landscapes and how to tell the difference. Biological Cybernetics, 63:325–336.
Wiles, J., and Tonkes, B. (2006). Hyperspace geography: Visualizing fitness landscapes beyond 4D. Artificial Life, 12:211–216.
Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In Proceedings of the Sixth International Congress of Genetics, pp. 356–366.
Wu, Y., McCall, J., and Corne, D. (2011). Fitness landscape analysis of Bayesian network structure learning. In Proceedings of the IEEE Congress on Evolutionary Computation, pp. 981–988.
Zhang, W. (2004). Configuration landscape analysis and backbone guided local search: Part 1: Satisfiability and maximum satisfiability. Artificial Intelligence, 158:1–26.
Zhang, W., Rangan, A., and Looks, M. (2003). Backbone guided local search for maximum satisfiability. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1179–1184.