Abstract

The generalized travelling salesperson problem is an important NP-hard combinatorial optimization problem for which metaheuristics, such as local search and evolutionary algorithms, have been used very successfully. Two hierarchical approaches with different neighbourhood structures, namely a cluster-based approach and a node-based approach, have been proposed by Hu and Raidl (2008) for solving this problem. In this article, local search algorithms and simple evolutionary algorithms based on these approaches are investigated from a theoretical perspective. For local search algorithms, we point out the complementary abilities of the two approaches by presenting instances where they mutually outperform each other. Afterwards, we introduce an instance which is hard for both approaches when initialized on a particular point of the search space, but where a variable neighbourhood search combining them finds the optimal solution in polynomial time. Then we turn our attention to analysing the behaviour of simple evolutionary algorithms that use these approaches. We show that the node-based approach solves the hard instance of the cluster-based approach presented in Corus et al. (2016) in polynomial time. Furthermore, we prove an exponential lower bound on the optimization time of the node-based approach for a class of Euclidean instances.

1  Introduction

Evolutionary algorithms and other metaheuristics have been applied to a wide range of combinatorial optimization problems. Understanding the behaviour of metaheuristics on problems from combinatorial optimization is a challenging task due to the large amount of randomness involved in these algorithms.

During the past decade, much progress has been made on the analysis of evolutionary algorithms and ant colony optimization for classical benchmark functions and problems from combinatorial optimization (Auger and Doerr, 2011; Jansen, 2013). Results have been achieved for classical polynomially solvable problems such as sorting, shortest paths, minimum spanning trees, and maximum matching, as well as for some of the best-known NP-hard combinatorial optimization problems such as vertex cover, makespan scheduling, and the travelling salesperson problem (Neumann and Witt, 2010; Theile, 2009).

Furthermore, bio-inspired computing methods have been studied in the context of parameterized complexity (Downey and Fellows, 1999; Kratsch et al., 2010; Kratsch and Neumann, 2013). This approach allows us to study the runtime in dependence of some structural parameters of the given instances and helps us to classify when an instance gets hard for the examined algorithm. Results have been obtained for some of the most prominent NP-hard combinatorial optimization problems such as vertex cover, makespan scheduling (Sutton and Neumann, 2012), and the Euclidean travelling salesperson problem (Sutton et al., 2014). The parameterized analysis has also been used to study the generalized minimum spanning tree problem (GMSTP) and the generalized travelling salesperson problem (GTSP) (Corus et al., 2016). This article aims to investigate the latter problem in more detail.

The GTSP is given by a set of cities with distances between them. The cities are divided into clusters and the goal is to find a tour of minimal cost that visits exactly one city from each cluster. Hu and Raidl (2008) have presented two hierarchical approaches for solving the GTSP: the cluster-based approach, which uses a permutation of the different clusters on the upper level and finds the best node selection for that permutation on the lower level, and the node-based approach, which selects a node for each cluster and then works on finding the best permutation of the chosen nodes. Combining the two hierarchical approaches, they have also presented a variable neighbourhood search algorithm for solving the GTSP. With this article, we contribute to the theoretical understanding of local search methods and simple evolutionary algorithms based on these hierarchical approaches for the GTSP. The analysis of local search methods (based on the conference version, Pourhassan and Neumann, 2015) is presented in Section 3. We investigate the local search methods by presenting instances on which the two approaches mutually outperform each other. We also present a situation where both the cluster-based and the node-based local search approaches get stuck in a local optimum, but the combination of the two approaches solves the problem to optimality.

After investigating local search methods, this article extends the conference version (Pourhassan and Neumann, 2015) by investigating simple evolutionary algorithms in Section 4. A (1 + 1) EA using the cluster-based approach is analysed in Corus et al. (2016) by presenting upper and lower bounds for the optimization time of the algorithm. In this article, we show that the worst case instance presented there for the cluster-based approach can be solved in polynomial time by means of the node-based approach; hence, there are instances of the problem which the latter approach can solve more efficiently. Then we provide a lower bound analysis of this approach for the Euclidean generalized travelling salesperson problem.

Proving lower bounds for the Euclidean travelling salesperson problem has turned out to be quite difficult. Englert et al. (2014) have shown that there are instances of the Euclidean TSP for which a deterministic local search algorithm based on 2-opt needs exponential time to find a locally optimal solution. In this article, we present a Euclidean class of instances where a simple evolutionary algorithm using the node-based approach requires exponential time with respect to the number of clusters. To our knowledge, an exponential lower bound for solving the TSP by a stochastic search algorithm is currently available only for ant colony optimization in the non-Euclidean case (Kötzing et al., 2012). Our instance for the GTSP places nodes on two circles with radii r and r' around a common centre. Exploiting the geometric properties of this instance class, we show by multiplicative drift analysis (Doerr et al., 2012) that the evolutionary algorithm under investigation ends up in a local optimum that has different chosen nodes for almost all clusters. Leaving such a local optimum requires exponential time for many mutation-based evolutionary algorithms and leads to an exponential lower bound with respect to the number of clusters for the investigated algorithm.

The outline of this article is as follows. Section 2 introduces the problem and the algorithms that are subject to our investigations. Our runtime analyses for local search methods and simple evolutionary algorithms are presented in Sections 3 and 4, respectively. Finally, we finish with some concluding remarks in Section 5.

2  Problem and Algorithms

The GTSP is a combinatorial optimization problem with applications in routing, the design of ring networks, the sequencing of computer files, and manufacturing planning (Gutin and Punnen, 2007). The input is given by a complete undirected graph $G=(V,E,c)$ with a cost function $c\colon E \rightarrow \mathbb{R}^+$ on the edges and a partitioning of the node set $V$ into $m$ clusters $V_1, V_2, \ldots, V_m$ such that $V=\bigcup_{i=1}^{m} V_i$ and $V_i \cap V_j = \emptyset$ for $i \neq j$. The aim is to find a tour of minimum cost that contains exactly one node from each cluster.

A candidate solution for this problem consists of two parts: the set of spanning nodes $P=\{p_1,\ldots,p_m\}$ with $p_i \in V_i$, and a permutation of the clusters $\pi=(\pi_1,\ldots,\pi_m)$, which defines a Hamiltonian cycle on $G[P]=(P,\{e \in E \mid e \subseteq P\},c)$. Here, $G[P]$ is the subgraph induced by $P$, consisting of all nodes in $P$ and all edges between them. Following Hu and Raidl (2008), we represent a candidate solution as $S=(P,\pi)$. Let $p_{\pi_i}$ be the chosen node for cluster $V_{\pi_i}$, $1 \le i \le m$. Then the cost of a solution $S=(P,\pi)$ is given by $c(S)=c(p_{\pi_m},p_{\pi_1})+\sum_{i=1}^{m-1} c(p_{\pi_i},p_{\pi_{i+1}})$.
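
To make the representation concrete, the following minimal Python sketch evaluates the cost $c(S)$ of a candidate solution; the list-based encoding and the function name are our own illustrative assumptions, not part of the original formulation.

```python
def tour_cost(P, pi, c):
    """Cost c(S) of a GTSP solution S = (P, pi).

    P  : list of chosen nodes; P[k] is the node picked from cluster V_{k+1}
    pi : permutation of cluster indices (0-based), e.g. [0, 3, 4, 1, 2, 5]
    c  : function c(u, v) returning the cost of the edge {u, v}
    """
    m = len(pi)
    total = 0.0
    for i in range(m):
        u = P[pi[i]]
        v = P[pi[(i + 1) % m]]  # wrap-around edge closes the Hamiltonian cycle
        total += c(u, v)
    return total
```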

There are two hierarchical approaches for solving this problem (Hu and Raidl, 2008): the cluster-based approach and the node-based approach. In the former, an upper-level algorithm searches for the best permutation of clusters, while a lower-level algorithm finds an optimal spanning node set for that permutation. In the node-based approach, these tasks are swapped between the two levels. In the following, we describe four algorithms that make use of these two hierarchical approaches. We analyse these algorithms with respect to the (expected) number of iterations on the upper level until they have found an optimal solution, and call this the (expected) optimization time of the algorithms.

2.1  Cluster-Based Local Search

In the cluster-based approach, constructing the permutation of clusters constitutes the upper level and the node selection is done in the lower level (Hu and Raidl, 2008). Let $\pi=(\pi_1,\ldots,\pi_m)$ be a permutation of the $m$ clusters. The 2-opt neighbourhood of $\pi$ is given by
$N(\pi)=\{\pi' \mid \pi'=(\pi_1,\ldots,\pi_{i-1},\pi_j,\pi_{j-1},\ldots,\pi_i,\pi_{j+1},\ldots,\pi_m),\; 1 \le i < j \le m\}.$
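
As an illustration, the 2-opt neighbourhood $N(\pi)$ can be enumerated as follows (a sketch with our own naming; permutations are represented as Python lists):

```python
def two_opt_neighbourhood(pi):
    """Yield all permutations obtained from pi by reversing one segment pi[i..j]."""
    m = len(pi)
    for i in range(m - 1):
        for j in range(i + 1, m):
            # reverse the segment between positions i and j (inclusive)
            yield pi[:i] + pi[i:j + 1][::-1] + pi[j + 1:]
```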

The cluster-based local search (CBLS) algorithm working with this neighbourhood structure, given in Algorithm 1, starts with an initial permutation of clusters. At each step, a new permutation $\pi'$ is selected from the 2-opt neighbourhood of the current permutation $\pi$. Then the lower level uses a shortest path algorithm to find the best spanning node set. Hu and Raidl (2008) have applied an incremental bidirectional shortest path calculation for this purpose. The shortest path algorithm of Karapetyan and Gutin (2012) is another option; it is an improved version of the dynamic programming algorithm given in Fischetti et al. (1997) and finds an optimal set of spanning nodes for a given permutation in time $O(n^3)$. The new solution $S'=(P',\pi')$ replaces the old one if it is less costly, and the algorithm terminates if no better solution can be found in the 2-opt neighbourhood of $\pi$.

Algorithm 1: Cluster-based local search (CBLS).
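
The CBLS loop described above can be sketched as follows; `optimal_spanning_nodes` stands for the lower-level shortest-path computation and `cost` for the tour cost, both assumed to be given, and the first-improvement scan is our own simplifying choice rather than a transcription of Algorithm 1.

```python
def cbls(pi, optimal_spanning_nodes, cost):
    """Cluster-based local search: 2-opt on the cluster permutation (upper level),
    optimal spanning-node selection on the lower level."""
    P = optimal_spanning_nodes(pi)
    best = cost(P, pi)
    improved = True
    while improved:
        improved = False
        m = len(pi)
        for i in range(m - 1):
            for j in range(i + 1, m):
                pi_new = pi[:i] + pi[i:j + 1][::-1] + pi[j + 1:]  # 2-opt move
                P_new = optimal_spanning_nodes(pi_new)            # lower level
                if cost(P_new, pi_new) < best:                    # strict improvement
                    P, pi, best = P_new, pi_new, cost(P_new, pi_new)
                    improved = True
    # terminates when no 2-opt move yields a strictly cheaper solution
    return P, pi
```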

2.2  Node-Based Local Search

In the node-based approach (Hu and Raidl, 2008), the selection of the spanning nodes is done in the upper level and the lower level consists of finding a shortest tour on the spanning nodes. Given a spanning node set $P$, the upper level of the node-based local search algorithm performs a local search based on the node exchange neighbourhood $N'(P)$, which is defined as
$N'(P)=\{P' \mid P'=\{p_1,\ldots,p_{i-1},p_i',p_{i+1},\ldots,p_m\},\; p_i' \in V_i \setminus \{p_i\},\; 1 \le i \le m\}.$

Note that the lower level involves solving the classical TSP; it therefore poses, in general, an NP-hard problem on its own. For our theoretical investigations, we consider two algorithms: NEN-LS (node exchange neighbourhood local search) and NEN-LS*, presented in Algorithms 2 and 3, respectively. NEN-LS computes a permutation on the lower level using 2-opt local search and is therefore not guaranteed to reach an optimal permutation $\pi$ for a given spanning node set $P$. NEN-LS* uses an exact solver to find an optimal permutation $\pi$ for a given spanning node set $P$. Such a permutation can be obtained in time $O(m^2 2^m)$ by dynamic programming (Held and Karp, 1961), which is practical if the number of clusters is small. We use NEN-LS* to show where the node-based approach gets stuck in local optima even if the travelling salesperson problem on the lower level is solved to optimality.
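
For small $m$, the exact lower-level solver of NEN-LS* can be realised by the Held-Karp dynamic program; the following sketch (with our own bitmask encoding and naming) returns the cost of an optimal Hamiltonian cycle over the chosen nodes in time $O(m^2 2^m)$.

```python
from itertools import combinations

def held_karp_cost(nodes, c):
    """Optimal TSP tour cost over `nodes` via Held-Karp dynamic programming.

    nodes : list of the m chosen spanning nodes
    c     : function c(u, v) giving the edge cost between two nodes
    """
    m = len(nodes)
    # dp[(mask, j)] = cheapest path that starts at node 0, visits exactly the
    # nodes in `mask` (which always contains 0 and j), and ends at node j
    dp = {(1 | (1 << j), j): c(nodes[0], nodes[j]) for j in range(1, m)}
    for size in range(3, m + 1):
        for subset in combinations(range(1, m), size - 1):
            mask = 1
            for j in subset:
                mask |= 1 << j
            for j in subset:
                prev_mask = mask ^ (1 << j)
                dp[(mask, j)] = min(
                    dp[(prev_mask, k)] + c(nodes[k], nodes[j])
                    for k in subset if k != j
                )
    full = (1 << m) - 1
    return min(dp[(full, j)] + c(nodes[j], nodes[0]) for j in range(1, m))
```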

Algorithm 2: Node exchange neighbourhood local search (NEN-LS).

Algorithm 3: NEN-LS* (NEN-LS with an exact lower-level TSP solver).

NEN-LS and NEN-LS* start with a spanning node set $P$ and search for a good or an optimal permutation with respect to $P$, respectively. Then each solution $P' \in N'(P)$ together with its permutation $\pi'$ is considered, and $S'=(P',\pi')$ replaces the current solution $S=(P,\pi)$ if it is of smaller cost. Both algorithms terminate if no improvement is possible in the neighbourhood $N'(P)$ of the current solution $P$.
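
The upper level of both variants can be sketched as one loop over the node exchange neighbourhood; `lower_level` stands for either a 2-opt TSP local search (NEN-LS) or an exact solver such as the Held-Karp routine above (NEN-LS*), and the helper names are our own assumptions.

```python
def node_exchange_neighbourhood(P, clusters):
    """Yield all spanning node sets obtained by exchanging the node of one cluster."""
    for i, cluster in enumerate(clusters):
        for v in cluster:
            if v != P[i]:
                yield P[:i] + [v] + P[i + 1:]

def nen_ls(P, clusters, lower_level, cost):
    """Node-based local search: node exchanges on the upper level,
    a (possibly approximate) TSP solver on the lower level."""
    pi = lower_level(P)
    best = cost(P, pi)
    improved = True
    while improved:
        improved = False
        for P_new in node_exchange_neighbourhood(P, clusters):
            pi_new = lower_level(P_new)
            if cost(P_new, pi_new) < best:   # accept strictly cheaper solutions
                P, pi, best = P_new, pi_new, cost(P_new, pi_new)
                improved = True
                break                        # restart from the improved solution
    return P, pi
```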

2.3  Variable Neighbourhood Search

We now describe the combination of the two approaches into a variable neighbourhood search, as introduced in Hu and Raidl (2008). The two neighbourhood structures of CBLS and NEN-LS are used in this algorithm, where the NEN-LS neighbourhood is used only when the algorithm is in a local optimum with respect to the CBLS neighbourhood.

Let $S=(P,\pi)$ be a solution to the GTSP. We define the two neighbourhoods $N_1$ and $N_2$ based on the 2-opt neighbourhood $N$ and the node exchange neighbourhood $N'$ as

  • $N_1(S)=\{S'=(P',\pi') \mid \pi' \in N(\pi),\ P' = \text{optimal set of nodes with respect to } \pi'\}$,

  • $N_2(S)=\{S'=(P',\pi') \mid P' \in N'(P),\ \pi' = \text{order of clusters obtained by 2-opt from } \pi \text{ on } G[P']\}$.

Combining the two local searches of the cluster-based approach and the node-based approach is done by alternating between $N_1$ and $N_2$. Since the computational complexity of finding $P'$ for solutions in neighbourhood $N_1$ is lower than that of finding $\pi'$ for solutions in neighbourhood $N_2$, the first neighbourhood to search is $N_1$. When a local optimum has been found with respect to that neighbourhood, $N_2$ is searched. The resulting variable neighbourhood search (VNS) algorithm is given in Algorithm 4.

Algorithm 4: Variable neighbourhood search (VNS).
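
A rough sketch of how the alternation between $N_1$ and $N_2$ could be organised, reusing the `cbls` and `nen_ls` sketches above; the restart policy shown here is a simplifying assumption rather than a transcription of Algorithm 4.

```python
def vns(pi, clusters, optimal_spanning_nodes, two_opt_tsp, cost):
    """Variable neighbourhood search alternating between the CBLS neighbourhood N1
    and the node exchange neighbourhood N2."""
    improved = True
    while improved:
        improved = False
        # search N1: 2-opt on the cluster permutation, exact node selection below
        P, pi = cbls(pi, optimal_spanning_nodes, cost)
        # search N2: node exchanges, 2-opt TSP on the chosen nodes below
        P_new, pi_new = nen_ls(P, clusters, two_opt_tsp, cost)
        if cost(P_new, pi_new) < cost(P, pi):
            pi = pi_new       # a better solution escapes the N1 local optimum
            improved = True   # go back to searching N1
    return P, pi
```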

2.4  Node-Based (1 + 1) EA

In the node-based approach, selecting the spanning nodes is done in the upper level and a shortest Hamiltonian cycle on the selected nodes is found in the lower level. The node-based (1 + 1) EA is presented in Algorithm 5. In contrast to the node-based local search algorithm of Section 2.2, the upper level uses a (1 + 1) EA to search for the best spanning node set instead of a local search method; hence, more than one change of the spanning node set is possible in each iteration of the algorithm. The condition for accepting the new solution is a strict improvement.

Algorithm 5: Node-based (1 + 1) EA.
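
The node-based (1 + 1) EA can be sketched as follows; we assume the standard mutation used in our analysis, in which each cluster independently has its node re-chosen uniformly at random with probability $1/m$, and the solver and naming details are our own assumptions (only the strict-improvement acceptance is taken from the description above).

```python
import random

def node_based_one_plus_one_ea(clusters, solve_lower_level, cost, steps):
    """Node-based (1 + 1) EA: mutate the spanning node set on the upper level,
    solve the induced TSP on the lower level, accept strict improvements only."""
    m = len(clusters)
    P = [random.choice(cluster) for cluster in clusters]   # random initial selection
    pi = solve_lower_level(P)
    best = cost(P, pi)
    for _ in range(steps):
        # mutation: each cluster is re-chosen independently with probability 1/m
        P_new = [random.choice(cluster) if random.random() < 1.0 / m else p
                 for p, cluster in zip(P, clusters)]
        pi_new = solve_lower_level(P_new)
        if cost(P_new, pi_new) < best:                     # strict improvement
            P, pi, best = P_new, pi_new, cost(P_new, pi_new)
    return P, pi
```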

Note that the lower level consists of an NP-hard problem; hence, when showing polynomial upper bounds on the expected optimization time of this algorithm, we consider only instances where the lower level can be solved in polynomial time. For the general case, there exist very effective TSP solvers, such as Concorde (Applegate et al., 2013), that can be used on the lower level. Note that the lower level of the cluster-based approach does not need to solve an NP-hard problem. Nevertheless, we prove that there are instances that can be solved in polynomial time by the node-based (1 + 1) EA, while the cluster-based (1 + 1) EA (Corus et al., 2016) needs exponential time to find an optimal solution for them.

3  Local Search Methods

This section presents the analysis of the behaviour of the local search methods, namely the cluster-based and node-based local search algorithms and the variable neighbourhood search algorithm presented in Sections 2.1, 2.2, and 2.3.

3.1  Benefits of NEN-LS

In this section, we present an instance of the problem that cannot be solved by CBLS. In contrast to this, NEN-LS finds an optimal solution in polynomial time.

We consider the undirected complete graph $G_1=(V,E)$ illustrated in Figure 1. The graph has $n$ nodes and 6 clusters $V_i$, $1 \le i \le 6$. Cluster $V_1$ contains $n/12$ white and $n/12$ grey nodes. We denote by $V_1^W$ the subset of white nodes and by $V_1^G$ the subset of grey nodes of cluster $V_1$. Each other cluster $V_j$, $2 \le j \le 6$, consists of $n/6$ white nodes. The node set $V=\bigcup_{i=1}^{6} V_i$ of $G_1$ consists of the nodes of all clusters. For simplicity, Figure 1 shows only one node for each group of similar nodes with similar edges. The edge set $E$ consists of 4 types of edges, which we define in the following.

  • Type A: Edges of this type have a cost of 1. All edges between clusters 2 and 3, between clusters 4 and 5, and between clusters 6 and 1 are of this type:
    $A=\{\{v_i,v_j\} \mid (v_i \in V_1^W \cup V_1^G \wedge v_j \in V_6) \vee (v_i \in V_2 \wedge v_j \in V_3) \vee (v_i \in V_4 \wedge v_j \in V_5)\}.$
  • Type B: Edges of this type have a cost of 3. All edges connecting the nodes of cluster 1 to cluster 2 are of this type, as are the edges connecting the nodes of cluster 3 to cluster 4 and of cluster 5 to cluster 6:
    $B=\{\{v_i,v_j\} \mid (v_i \in V_1^W \cup V_1^G \wedge v_j \in V_2) \vee (v_i \in V_3 \wedge v_j \in V_4) \vee (v_i \in V_5 \wedge v_j \in V_6)\}.$
  • Type C: Edges of this type have a cost of 4. All edges between nodes of clusters 2 and 5 and between clusters 3 and 6 are of this type, as are all edges that connect the white nodes of the first cluster to nodes of the fourth cluster:
    $C=\{\{v_i,v_j\} \mid (v_i \in V_1^W \wedge v_j \in V_4) \vee (v_i \in V_2 \wedge v_j \in V_5) \vee (v_i \in V_3 \wedge v_j \in V_6)\}.$
  • Type D: Edges of this type have a large cost of 100. All edges of this complete graph other than those of type A, B, or C, including the edges between the grey nodes of the first cluster and the nodes of the fourth cluster, are of type D:
    $D=E \setminus (A \cup B \cup C).$

Figure 1: $G_1$, an easy instance for NEN-LS and a hard instance for CBLS.

We say that a permutation $\pi=(\pi(1),\ldots,\pi(n))$ visits the cities in consecutive order iff $\pi(i+1)=(\pi(i) \bmod n)+1$ for $1 \le i \le n$, and that it visits the cities in reverse-consecutive order iff $\pi(i)=(\pi(i+1) \bmod n)+1$ for $1 \le i \le n$, where we set $\pi(n+1):=\pi(1)$.
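
For illustration, the two orderings can be checked as follows (a small helper with our own naming; permutations are given as 1-based Python lists):

```python
def is_consecutive(pi):
    """True iff pi visits 1, 2, ..., n in cyclically increasing order."""
    n = len(pi)
    return all(pi[(i + 1) % n] == (pi[i] % n) + 1 for i in range(n))

def is_reverse_consecutive(pi):
    """True iff pi visits 1, 2, ..., n in cyclically decreasing order."""
    return is_consecutive(pi[::-1])
```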

We now define a property, and then in Theorem 2 we analyse the behaviour of CBLS on G1.

Property 1:

For the instance G1, each solution visiting the clusters in consecutive or reverse-consecutive order is optimal.

Proof:

The graph consists of 6 clusters, which implies that 6 edges are needed for a tour. The least costly edges are of type A, and they are available only between 3 pairs of clusters, so at most 3 of them can be used in a tour. The second least costly type of edge is B, with a weight of 3. This implies that no tour can be less costly than $3 \cdot 1+3 \cdot 3=12$, which is the cost of every solution with a permutation in consecutive or reverse-consecutive order.

Theorem 2:

Starting with the solution consisting of only white nodes and the permutation π=(1,4,5,2,3,6), CBLS is not able to achieve any improvement.

Proof:

Here we analyse the behaviour of CBLS on $G_1$, starting with all white nodes and the permutation π=(1,4,5,2,3,6). The initial solution contains three type-A edges of cost 1 and three type-C edges of cost 4. This implies a total cost of 15, which is not optimal. The edges belonging to this tour are marked solid in Figure 1. We claim that this solution is locally optimal; that is, it cannot be improved by a 2-opt step.

We show that, for every possible choice of edges removed from the current tour, the resulting 2-opt move leads to a tour of cost greater than 15.

Note that all 3 edges of cost 1 are already used in the current permutation which means that no additional edge of cost 1 can be added. We inspect the different 2-opt steps with respect to the edges that are removed.

  • If two edges of type A, which have a cost of 1, are removed, two other edges need to be added, and the least costly edges that can be added have a weight of 3. This makes the total cost of the resulting solution at least $15-2 \cdot 1+2 \cdot 3=19$, which is greater than 15.

  • If one edge of type A (weight 1) and one edge of type C (weight 4) are removed, then even if the two cheapest edges of cost 3 are added, the total cost is at least $15-1-4+2 \cdot 3=16$, which is greater than 15.

  • For removing two edges of type C, there are three options, listed below. In all of them, the operation adds two edges of type D to the solution, making the total cost greater than 15.

    • Remove the edge between clusters 1 and 4 and also the edge between clusters 2 and 5. This 2-opt results in permutation π'=(1,5,4,2,3,6).

    • Remove the edge between clusters 1 and 4 and also the edge between clusters 3 and 6. This 2-opt results in permutation π'=(1,3,2,5,4,6).

    • Remove the edge between clusters 2 and 5 and also the edge between clusters 3 and 6. This 2-opt results in permutation π'=(1,4,5,3,2,6).

We have shown that no 2-opt step is accepted, which completes the proof.

In contrast to the negative result for CBLS, we show that NEN-LS is able to reach an optimal solution when starting with the same solution.

Theorem 3:

Starting with the solution consisting of only white nodes and the permutation π=(1,4,5,2,3,6), NEN-LS finds an optimal solution for the instance G1 in expected time O(n).

Proof:

Starting with a solution with only white nodes and the permutation π=(1,4,5,2,3,6), no improvement can be found by a 2-opt local search (similar to the arguments in the proof of Theorem 2). Therefore, the lower level is already locally optimal and the solution does not change unless a grey node in cluster V1 is selected.

Let $P=\{p_1,\ldots,p_6\}$ be the current set of spanning nodes. Selecting a grey node $p_1'$ for cluster $V_1$ leads to the set of spanning nodes $P'=\{p_1',p_2,\ldots,p_6\}$. $P'$ in combination with the current permutation π=(1,4,5,2,3,6) has a total cost of 111, as there is one edge of type D with cost 100. We now show that starting from this solution and performing a 2-opt local search on the lower level results in an optimal solution.

In order to accept a new permutation on the lower level, a solution of cost at most 111 has to be obtained. We do a case distinction according to the different types of edges that are removed in a 2-opt operation. If we remove only edges of types A and C, we reach a solution with a total cost greater than 111 by the arguments in the proof of Theorem 2. Hence, we need to consider only the case where at least one edge of type D is removed.

  • There are two possibilities for removing one edge of type D and one edge of type C, leading to the permutations π'=(1,5,4,2,3,6) and π''=(1,3,2,5,4,6). Both have two edges of type D, which implies a total cost greater than 111, and are therefore rejected.

  • Considering the case of removing the edge of type D and one of the edges of type A, the only applicable 2-opt move leading to a different permutation results in the permutation π'=(1,2,5,4,3,6). The resulting solution has cost 16 and is therefore accepted.

After reaching permutation π'=(1,2,5,4,3,6), the only acceptable 2-opt move leads to the global optimum πopt=(1,2,3,4,5,6).

The 2-opt neighbourhood for this instance has a constant size, as the number of clusters is constant. Moreover, all permutations that were investigated in the lower level were either locally optimal with respect to the spanning nodes, or were improved only twice. Therefore, each lower-level optimization is done in constant time. Furthermore, it takes expected time O(n) on the upper level to select a grey node for the first cluster. As a result, the expected optimization time is bounded by O(n).

3.2  Benefits of CBLS

We now introduce an instance for which NEN-LS* with a random initial solution fails, with high probability, to obtain an optimal solution, while CBLS with an arbitrary starting solution obtains an optimum in polynomial time. The instance $G_2=(V,E)$ is illustrated in Figure 2. There are $m>2$ clusters, and each cluster contains exactly 2 nodes: one white and one black. We refer to the white and black nodes of cluster $i$, $1 \le i \le m$, by $v_i^W$ and $v_i^B$, respectively. We call cluster $V_1$ the costly cluster, as edges connecting this cluster to others are more costly than edges connecting other clusters to each other. The edge set $E$ of this complete graph is partitioned into 4 different types (a cost-function sketch follows the list below).

  • Type A: Edges of this type have a weight of 1. All connections between white nodes of different clusters except cluster $V_1$ are of this type:
    $A=\{\{v_i^W,v_j^W\} \mid 2 \le i,j \le m,\; i \neq j\}.$
  • Type B: Edges of this type have a weight of 2. All connections between black nodes of different clusters are of this type:
    $B=\{\{v_i^B,v_j^B\} \mid 1 \le i,j \le m,\; i \neq j\}.$
  • Type C: Edges of this type have a weight of $m$. All edges between the white node of the costly cluster and white nodes of other clusters are of this type:
    $C=\{\{v_1^W,v_i^W\} \mid 2 \le i \le m\}.$
  • Type D: Edges of this type have a weight of $m^2$. All edges between a white and a black node are of this type:
    $D=E \setminus (A \cup B \cup C)=\{\{v_i^W,v_j^B\} \mid 1 \le i,j \le m\}.$
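
The cost structure of $G_2$ is simple enough to state directly; a minimal sketch of an edge cost function for this instance (the node encoding and the function name are our own assumptions):

```python
def g2_cost(u, v, m):
    """Edge cost in G2 between nodes of different clusters.
    A node is a pair (cluster_index, colour) with cluster_index in 1..m
    and colour in {'W', 'B'}."""
    (i, ci), (j, cj) = u, v
    if ci != cj:              # type D: white-black connection
        return m * m
    if ci == 'B':             # type B: black-black
        return 2
    if i == 1 or j == 1:      # type C: the white node of the costly cluster
        return m
    return 1                  # type A: white-white between clusters 2..m
```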

Figure 2: Graph $G_2$.

We first claim that the optimal solution consists of only black nodes. Then we present our main theorems on the runtime behaviour of the two approaches on this instance.

Property 4:

For the graph G2 any solution containing all black nodes is optimal.

Proof:

A solution that contains only black nodes has $m$ edges of type B and therefore a total cost of $2m$.

Choosing a combination of black and white nodes implies a connection of type D and therefore a solution of cost at least $m^2$. Choosing all white nodes implies 2 edges of cost $m$ connected to cluster $V_1$ and $m-2$ edges of cost 1. Hence, the total cost of such a solution is $2m+(m-2)$, which implies that a solution selecting all black nodes is optimal.

We now show that CBLS always finds an optimal solution in time $O(n^3)$, since the lower level selects an optimal spanning node set for any permutation.

Theorem 5:

Starting with an arbitrary permutation π, CBLS finds an optimal solution for G2 in time $O(n^3)$.

Proof:

As stated in Property 4, visiting the black nodes of the graph in any order gives a globally optimal solution. For each permutation π, the optimal set of nodes is given by all black nodes and is found when constructing the first spanning node set. This set is constructed in time $O(n^3)$ by the shortest path algorithm given in Karapetyan and Gutin (2012).

In contrast to the positive result for CBLS, NEN-LS* is extremely likely to get stuck in a local optimum if the initial spanning node set is chosen uniformly at random, even though NEN-LS* uses an exact solver for the lower level.

Theorem 6:

Starting with a spanning node set P chosen uniformly at random, NEN-LS* gets stuck in a local optimum of G2 with probability $1-e^{-\Omega(n)}$.

Proof:

When $P=\{p_1,\ldots,p_m\}$ is chosen uniformly at random, the expected number of white nodes is $m/2$. Using Chernoff bounds, the number of white nodes is at least $m/4$ with probability $1-e^{-\Omega(m)}=1-e^{-\Omega(n)}$, since $n=2m$. The same applies to the number of black nodes.

Since connecting white nodes to black nodes is costly, the lower level selects a permutation which forms a chain of white nodes and a chain of black nodes connected to each other by only two edges of type D to form a cycle.

Let p1 be the selected node of the costly cluster V1. If p1 is initially white, the lower level places it at one border between the black chain and the white chain to avoid using one of the edges of type C. This situation is illustrated in Figure 3a. If p1 is initially black, then the initial solution would look like Figure 3b, in which the costly cluster is placed somewhere in the black chain. Here we present two auxiliary claims, which will be used in the rest of the proof of Theorem 6.

Claim 7: Starting with a random initial solution, with probability $1-e^{-\Omega(n)}$ the following holds for all clusters $V_i$, $2 \le i \le m$: a change from black to white is improving, while no change from white to black is improving.

Proof:

As mentioned earlier, a random initial node set contains both kinds of nodes with probability $1-e^{-\Omega(n)}$; therefore, the exact solver of the lower level forms a chain of black nodes and a chain of white nodes. Changing a black node $p_i$, $i \neq 1$, to white shortens the chain of black nodes by removing an edge of type B and cost 2, while the chain of white nodes gets longer by adding an edge of type A and cost 1. The new solution is hence improved in terms of fitness and accepted by the algorithm. On the other hand, the opposite move increases the cost of the solution; therefore, in a cluster $V_i$, $i \neq 1$, a change from white to black is not accepted.

The number of selected white nodes for clusters $V_i$, $i \neq 1$, never decreases; therefore, at all times during the run of the algorithm we have both a chain of black nodes and a chain of white nodes, until all the black nodes have changed to white.

Claim 8: As long as there is at least one cluster $V_i$, $i \neq 1$, for which the black node is selected, a change from white to black is accepted for cluster $V_1$ and the opposite change is rejected.

Proof:

Since there is at least one cluster $V_i$, $i \neq 1$, for which the black node is selected, we know that the current solution and the new solution both have a chain of black nodes and a chain of white nodes. If the white node of cluster $V_1$ is selected in the current solution, changing it to black shortens the chain of white nodes by removing the edge of type C and increases the number of black nodes by adding an edge of type B. This move is accepted because the new solution is improved in terms of cost. The result is illustrated in Figure 3b. Using similar arguments, if the black node of cluster $V_1$ is selected in the current solution, changing it to white is rejected because it increases the cost.

Using Claim 7, we conclude that all nodes $p_i$, $i \neq 1$, are gradually changed to white by NEN-LS*. As long as at least one node $p_i$, $i \neq 1$, is black, a mutation from white to black for $p_1$ is accepted, and this node remains black. When all other nodes have changed to white, if $p_1$ is black at this point, it is connected to two white nodes by edges of type D and cost $m^2$, as illustrated in Figure 4a. If it changes to white, these two edges are removed and two edges of type C and cost $m$ are added to the solution (Figure 4b). This change is accepted because two edges of cost $m$ are less costly than two edges of cost $m^2$.

This eventually results in a local optimum with all white nodes selected. The algorithm needs to traverse the clusters on the upper level only twice, which gives $O(m)$ iterations on the upper level for the algorithm to get stuck in a local optimum. In the first traversal, the white node is selected for all clusters except the costly cluster $V_1$. In the second traversal, the white node is selected for $V_1$ as well (Figure 4b). This completes the proof of Theorem 6.

Figure 3: The initial solution for $G_2$ when (a) a white node is selected for the costly cluster, and (b) a black node is selected for the costly cluster.

Figure 4: (a) All other clusters change to white one by one. (b) The local optimum for $G_2$.

3.3  Benefits of VNS

In this section we introduce an instance of the problem for which both of the mentioned neighbourhood search algorithms fail to find the optimal solution. Nevertheless, the combination of these approaches as described in Algorithm 4 finds the global optimum.

We consider the undirected complete graph $G_3$ shown in Figure 5, which has 6 clusters, each containing $n/6$ nodes. There are three kinds of nodes in this graph: white, grey, and black. The first cluster consists of $n/12$ black, $n/24$ white, and $n/24$ grey nodes. All other clusters contain $n/12$ white and $n/12$ black nodes. We refer to the sets of white, black, and grey nodes of cluster $V_i$ by $V_i^W$, $V_i^B$, and $V_i^G$, respectively.

Figure 5: Graph $G_3$, showing one node of each type for each cluster and omitting edges of cost 100.

There are 5 types of edges in this graph, 4 of which are quite similar to the 4 types of the instance in Section 3.1. The remaining type, named type D below, consists of the edges between black nodes of consecutive clusters and has a cost of 1.5.

  • Type A: Edges of this type have a cost of 1:
    $A=\{\{v_i,v_j\} \mid (v_i \in V_1^W \cup V_1^G \wedge v_j \in V_6^W) \vee (v_i \in V_2^W \wedge v_j \in V_3^W) \vee (v_i \in V_4^W \wedge v_j \in V_5^W)\}.$
  • Type B: Edges of this type have a cost of 3:
    $B=\{\{v_i,v_j\} \mid (v_i \in V_1^W \cup V_1^G \wedge v_j \in V_2^W) \vee (v_i \in V_3^W \wedge v_j \in V_4^W) \vee (v_i \in V_5^W \wedge v_j \in V_6^W)\}.$
  • Type C: Edges of this type have a cost of 4:
    $C=\{\{v_i,v_j\} \mid (v_i \in V_1^W \wedge v_j \in V_4^W) \vee (v_i \in V_2^W \wedge v_j \in V_5^W) \vee (v_i \in V_3^W \wedge v_j \in V_6^W)\}.$
  • Type D: Edges of this type have a cost of 1.5:
    $D=\{\{v_i,v_j\} \mid (v_i \in V_k^B \wedge v_j \in V_{k+1}^B,\ 1 \le k \le 5) \vee (v_i \in V_6^B \wedge v_j \in V_1^B)\}.$
  • Type F: Edges of this type have a large cost of 100. All edges of this complete graph other than those of type A, B, C, or D are of type F. Note that the edges between the grey nodes of the first cluster and the white nodes of the fourth cluster are also of this type:
    $F=E \setminus (A \cup B \cup C \cup D).$

We now show that an optimal solution visits a black node from each cluster in consecutive or reverse-consecutive order. Then, in Theorem 10, we show that the algorithms CBLS and NEN-LS may get stuck in local optima.

Property 9:

An optimal solution for the graph $G_3$ visits all black nodes in consecutive or reverse-consecutive order.

Proof:

There are three kinds of nodes in this graph: white, grey, and black. Any solution that contains black nodes and at least one node of another kind has at least two edges of type F and weight 100, which makes its total cost more than 200. A solution that visits all black nodes in consecutive or reverse-consecutive order has 6 edges of type D and a total cost of 9. On the other hand, if we consider only white and grey nodes, our graph is the same as the instance of Section 3.1, whose optimal solutions have cost 12. Therefore, visiting all black nodes at a cost of 9 is optimal.

Theorem 10:

Starting with a spanning node set P consisting of only white nodes and the permutation π=(1,4,5,2,3,6), CBLS and NEN-LS get stuck in a local optimum of G3.

Proof:

We first show that the mentioned initial solution is a local optimum for CBLS. The cost of this solution is 15, which is less than the cost of any edge between a black node and a white or grey node. Therefore, no solution consisting of black nodes together with another kind of node can be accepted. If we do not consider the black nodes and their edges, then $G_3$ is similar to $G_1$, and according to Theorem 2, starting with the initial permutation, no improvement can be achieved with Algorithm 1. In particular, the permutation π'=(1,2,3,4,5,6) is not reachable by searching the 2-opt neighbourhood of the initial solution. A solution consisting of black nodes is less costly only if they are visited in the optimal order π'=(1,2,3,4,5,6), which, as shown, is not reachable by CBLS.

Now we investigate the behaviour of NEN-LS, which performs a local search based on the node-based approach, on this instance. We show that this algorithm finds another locally optimal solution. Starting with the initial solution specified in the theorem, all black nodes cannot be selected in one step, and selecting any single black node is rejected because two edges of type F become inevitable, which makes the solution worse than the initial one. The only improving spanning node set left in the node exchange neighbourhood selects the grey node of the first cluster. For this selection of nodes, the 2-opt TSP solver of the lower level finds the optimal order of clusters, similar to what we described in the proof of Theorem 3 in Section 3.1, which yields a solution of cost 12. From this point, no further node-exchange-neighbourhood search finds a better solution.

Using a variable neighbourhood search that combines the two hierarchical approaches, we are able to escape these local optima. In the following, we show that VNS obtains an optimal solution when starting with the same solution as investigated in Theorem 10.

Theorem 11:

Starting with a spanning node set P consisting only of white nodes and the permutation π=(1,4,5,2,3,6), VNS obtains an optimal solution in time $O(n^3)$.

Proof:

The VNS algorithm starts with the cluster-based approach and switches to the other neighbourhood whenever CBLS is stuck in a locally optimal solution. As shown above, Algorithm 1 cannot find any better solution, because the initial solution is a local optimum for that algorithm. Detecting this requires searching the whole 2-opt neighbourhood, which can be done in constant time because the number of clusters is fixed. Then NEN-LS finds another solution with the permutation π'=(1,2,3,4,5,6). This can also be done in polynomial time, as described in the proof of Theorem 3 in Section 3.1. Then CBLS uses this solution as a starting point. As π'=(1,2,3,4,5,6) is an optimal permutation, the optimal set of nodes P consisting of all black nodes is found in time $O(n^3)$ on the lower level.

The investigations of this section have pointed out that combining the two hierarchical approaches into a variable neighbourhood search is beneficial, because each approach helps to escape the local optima of the other.

4  Simple Evolutionary Algorithms

A simple evolutionary algorithm using the cluster-based approach for solving the GTSP has been studied in Corus et al. (2016), where a hard instance is presented to prove an exponential lower bound, holding with high probability, on the runtime of that algorithm. In this section, we analyse the behaviour of the node-based (1 + 1) EA presented in Algorithm 5 on that instance (Section 4.1). Moreover, we prove a lower bound on the optimization time of the node-based (1 + 1) EA in Section 4.2. Our analysis gives an exponential lower bound on the optimization time of the upper level, which implies exponential time even if the lower level is solved efficiently.

4.1  Behaviour of Node-Based (1 + 1) EA on the Hard Instance of Cluster-Based (1 + 1) EA

In this section, we show that the hard instance for cluster-based (1 + 1) EA introduced in Corus et al. (2016) can be solved in polynomial time by the node-based approach. Moreover, we perform experiments in Section 4.1.2, which confirm the theoretical results of this section.

The hard instance for the cluster-based (1 + 1) EA (Corus et al., 2016) is illustrated in Figure 6. In this instance, there are $m$ clusters, and each of them comprises two nodes: a white node, which represents the suboptimal node, and a black node, which is the optimal node. All white nodes are connected to each other by edges of cost 1, except for the white nodes of consecutive clusters (shown in the figure), which are connected by edges of cost 2. All edges between a black node and a white node have a cost of $m^2$. All edges between black nodes also have a cost of $m^2$, except those connecting consecutive clusters (shown in the figure), which have a small cost of $1/m$.

Figure 6: Illustration of GG, the hard instance of the cluster-based (1 + 1) EA (Corus et al., 2016).

The optimal node selection is to select all black nodes, and the optimal permutation of clusters is a clockwise or anti-clockwise order of them. In this permutation, the edges between black nodes cost $1/m$ and the edges between white nodes cost 2. Therefore, the optimal solution consists of all edges of cost $1/m$, and the local optimum is to select all white nodes in an order which does not contain any of the edges of cost 2. For this instance of the problem, it is proved in Corus et al. (2016) that, with overwhelmingly high probability, the proposed cluster-based (1 + 1) EA needs exponential time to find the optimal solution.

4.1.1  Theoretical Analysis of Node-Based (1 + 1) EA on GG

Here we prove that, with probability $1-o(1)$, GG can be solved in polynomial time by the node-based approach. We call this a high probability since, by definition, $o(1)$ approaches 0 as the input size approaches infinity. In order to prove this, we first need to analyse how an optimal TSP tour can be found on the lower level of this approach. Although solving the TSP is NP-hard in general, it can be solved in polynomial time for the instances induced by picking one node from each cluster of the graph GG. Algorithm 6 provides such a method. In Step 3 of this algorithm, if the number of white nodes is at most 3, finding the shortest path can be done by checking all configurations. If the number of white nodes is more than 3, only edges of cost 1 are used in the shortest path, since each white node is connected to $m-2$ other white nodes with a cost of 1. Such a path can be found by a depth-first search combined with checking all configurations of connecting the last 4 nodes of the path. Therefore, Step 3 needs time $O(m)$ to find the shortest path on the white nodes. Since the time required for the other steps of the algorithm is also at most $O(m)$, we conclude that Algorithm 6 runs in time $O(m)$.

Algorithm 6: Constructing an optimal lower-level tour for the instance GG.

To prove that Algorithm 6 finds an optimal tour with respect to the spanning node set fixed on the upper level, we first present two properties of the solutions of the lower level. Then, in Lemma 14, we show that Algorithm 6 finds an optimal tour.

Property 12:

Let $w$ be the number of white nodes selected on the upper level. If $2 \le w \le 3$ and all the selected white nodes are from consecutive clusters, then Step 3 of Algorithm 6 uses one edge of cost 2 (and one edge of cost 1 in case $w=3$). Otherwise, it uses only edges of cost 1.

Property 13:

Let $C(S)$ denote the total cost of a solution $S$. Also, let $Y$ and $X$ be two solutions with $r$ and $s$ edges of weight $m^2$, respectively. If $r>s$, then $C(Y)>C(X)$.

Lemma 14:

Let $w>0$ and $r=m-w$ be the numbers of white and black nodes selected on the upper level, respectively. Moreover, let $s$ be the number of black nodes for which the selected node of the succeeding cluster, with respect to the optimal permutation, is also black. Algorithm 6 finds an optimal tour with total cost

  • $s \cdot \frac{1}{m}+(m-r-1)+(r-s+1) \cdot m^2$, if the conditions of Property 12 do not hold;

  • $s \cdot \frac{1}{m}+(m-r)+(r-s+1) \cdot m^2$, if the conditions of Property 12 hold.

Proof:

There are $r$ black nodes in the spanning node set; therefore, in order to form a Hamiltonian cycle, at least $r+1$ edges that are connected to these nodes are required. Since all edges connected to black nodes, except for $s$ edges of cost $1/m$, are of cost $m^2$, at least $r+1-s$ edges of cost $m^2$ are needed, and refusing to select any of the edges of cost $1/m$ increases this number. Moreover, according to Property 13, the optimal solution of the lower level has a minimum number of $m^2$-edges. Therefore, the lower level has to select all edges of cost $1/m$, which is done in Step 2 of the algorithm.

On the other hand, in order to minimise the number of white-black connections, which are of cost $m^2$, all white nodes need to form one chain, which is done in Step 3 of the algorithm. This chain is connected to two black nodes at its two ends. If the conditions of Property 12 do not hold, then only edges of cost 1 are used in forming the white chain. Otherwise, one edge of cost 2 is also required. Therefore, the cost of forming the white chain is $m-r-1$ in the former case, and $m-r$ in the latter case.

So far, we have formed some chains of black nodes and one chain of white nodes. In order to connect these chains, we have to use $r+1-s$ edges of weight $m^2$, which is done in Step 4 of the algorithm. Summing up, the optimal tour on the selected set of nodes consists of $s$ edges of weight $1/m$ and $r-s+1$ edges of weight $m^2$. Furthermore, if the conditions of Property 12 hold, it contains $m-r-2$ edges of cost 1 and one edge of cost 2; otherwise, it contains $m-r-1$ edges of cost 1. Altogether, these edges give the total cost stated in the lemma.
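
As a sanity check, the optimal lower-level cost given by Lemma 14 can be computed directly from the upper-level selection; the sketch below assumes the clusters are indexed $0,\ldots,m-1$ along the optimal cyclic order and that both colours occur in the selection (the encoding is our own).

```python
def optimal_lower_level_cost(black):
    """Optimal lower-level tour cost on GG for a given node selection (Lemma 14).

    black : booleans along the optimal cyclic cluster order;
            black[i] is True iff the black node of cluster i is selected.
    Assumes both colours occur, i.e. 0 < w < m.
    """
    m = len(black)
    w = black.count(False)                      # selected white nodes
    r = m - w                                   # selected black nodes
    # s = number of 1/m-edges available between cyclically consecutive blacks
    s = sum(1 for i in range(m) if black[i] and black[(i + 1) % m])
    # the white nodes form a single cyclic run iff exactly one white node
    # is followed by a black node
    white_runs = sum(1 for i in range(m) if not black[i] and black[(i + 1) % m])
    if 2 <= w <= 3 and white_runs == 1:         # conditions of Property 12
        return s / m + (m - r) + (r - s + 1) * m ** 2
    return s / m + (m - r - 1) + (r - s + 1) * m ** 2
```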

From now on, we consider only the number of iterations on the upper level. Note that the lower level uses Algorithm 6, which adds only a factor of $O(m)$ to our analysis. We start analysing the behaviour of the node-based (1 + 1) EA on GG with a couple of definitions that help us describe the TSP tour that the lower level forms. In the following, $w$ denotes the number of white nodes in the solution.

Definition 15:

A black block of size $l$, where $l>0$ is an integer, is a path on exactly $l$ consecutive black nodes which consists of $l-1$ edges of cost $1/m$.

The two end nodes of a black block are connected to edges of cost $m^2$. Black blocks of sizes 1, 2, and 3 are illustrated in Figure 7.
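
Under the same encoding as above, the number of black blocks of a solution, which drives the analysis below, can be read off directly (an illustrative helper of our own):

```python
def count_black_blocks(black):
    """Number of maximal runs of cyclically consecutive selected black nodes."""
    m = len(black)
    if all(black):
        return 1  # all black nodes selected: one closed block
    # each block ends at a position whose cyclic successor is white
    return sum(1 for i in range(m) if black[i] and not black[(i + 1) % m])
```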

Figure 7: Blocks of black nodes.

Definition 16:

A solution is critical if $3 \le w \le 4$ and all the selected white nodes are from consecutive clusters.

Note that a one-bit flip of a white node of a critical solution results in either a solution with a greater number of black blocks, or a solution that fulfils the conditions of Property 12. In the rest of this section, we prove that, with high probability, in time $O(m^2)$ the algorithm either finds the optimal solution or reaches a critical solution. From a critical solution, we prove that a 2-bit flip can make an improvement and that, with high probability, the optimal solution is found in time $O(m^2 \log m)$. Lemmata 19 and 22 prove the upper bound if we do not face a critical solution, and Lemma 23 investigates the behaviour of the algorithm otherwise. Lemmata 17 and 18 help us with the proof of Lemma 19.

Lemma 17:

Except in the situation where $w=3$ and all selected white nodes are from consecutive clusters, $w$ can only increase in a step in which the number of $m^2$-edges decreases.

Proof:

Having $r=m-w$ black nodes, Lemma 14 gives the total cost of a solution as $s \cdot \frac{1}{m}+(m-r-1)+(r-s+1) \cdot m^2$ when the conditions of Property 12 do not hold. Here $s$ is the number of edges of weight $\frac{1}{m}$, $m-r-1$ is the number of edges of weight 1 (which connect white nodes), and $r-s+1$ is the number of edges of weight $m^2$. When $w$ increases, the number of edges of weight 1 increases, and since the total number of edges stays the same, either $s$ or $r-s+1$ has to decrease. Decreasing $s$ cannot compensate for the increase in the total cost caused by adding new edges of weight 1. Therefore, in order to prevent an increase in the total cost, $r-s+1$, which is the number of $m^2$-edges, has to decrease.

For the situation where $w=2$ and the selected white nodes are from consecutive clusters, Lemma 14 gives a total cost of $\frac{m-3}{m}+2+2 \cdot m^2$. Observe that all solutions with $w \ge 4$ have a larger cost. For $w=3$, the solution either needs more than two $m^2$-edges, which is clearly more costly, or all three white nodes need to be from consecutive clusters. In this situation, Lemma 14 gives a total cost of $\frac{m-4}{m}+3+2 \cdot m^2$, which is also more costly, so the solution is rejected by the algorithm.

Lemma 18:

In a phase of $Cm^2$ steps, where $C$ is a constant, if we do not face a critical solution, then with probability $1-o(1)$ the sum of all increments of the number of white nodes is at most $5m$.

Proof:

From Lemma 17 we know that the number of white nodes can increase only when the number of black blocks is reduced. Since the number of blocks is bounded by $m$, this can happen in at most $m$ steps. In each of those steps, either two black blocks are merged or a black block mutates to white, and some additional nodes may also mutate. We prove below that, with high probability, no block of size larger than 3 mutates to white in this phase, which results in at most $3m$ black-to-white mutations from whole blocks. We also prove that the number of additional nodes that mutate in the same steps is, with high probability, bounded by $2m$. Therefore, in this phase the sum of all increments of the number of white nodes is at most $5m$.

At each step, each cluster is selected for mutation with probability $\frac{1}{m}$, and its white node is then selected with probability $\frac{1}{2}$. Therefore, the probability that a block of size at least 4 mutates to white in one step is at most $\frac{1}{(2m)^4}$. Since the number of blocks is bounded by $m$, the probability that at least one block of size at least 4 mutates to white in one step is at most $\frac{1}{16m^3}$. Hence, the probability that at least one of them mutates to white in a phase of $C \cdot m^2$ steps is $O(\frac{1}{m})$. Therefore, with probability at least $1-o(1)$, no black block of size 4 or more mutates to white; in other words, all blocks that mutate to white in a phase of $C \cdot m^2$ steps are of size at most 3. This implies that at most $3m$ nodes can belong to the blocks that mutate from black to white in a phase of $C \cdot m^2$ steps.

However, in each step in which the number of blocks is reduced, some additional nodes may also mutate to white. Let $X_{ij}$ be a random variable such that $X_{ij}=1$ if node $j$ is selected for mutation at step $i$, and $X_{ij}=0$ otherwise. Note that we need to consider only the steps in which the number of black blocks is reduced, because according to Lemma 17 a mutation from black to white is not accepted in other steps. Since the number of blocks is bounded by $m$, there are at most $m$ steps in which the number of blocks is reduced. The expected value of $X=\sum_{i=1}^{m}\sum_{j=1}^{m} X_{ij}$ is $E[X]=\sum_{i=1}^{m}\sum_{j=1}^{m} \frac{1}{m}=m$, and by Chernoff bounds we get $\mathrm{Prob}(X \ge 2m) \le e^{-\Omega(m)}$. Therefore, with probability $1-e^{-\Omega(m)}$, at most $2m$ additional nodes mutate during the steps in which the number of black blocks is reduced, and with probability $(1-o(1))(1-e^{-\Omega(m)})=1-o(1)$ at most $2m$ additional nodes mutate during the considered phase. As a result, together with at most $3m$ black-to-white mutations from whole blocks, we find that with probability $1-o(1)$ at most $3m+2m=5m$ black nodes mutate to white in a phase of $C \cdot m^2$ steps.

Lemma 19:

If we do not face a solution with no black nodes or a critical solution, then with probability $1-o(1)$ a solution with $w=0$ is found in time $24em^2$.

Proof:

According to Lemma 18, with probability $1-o(1)$, during a phase of $C \cdot m^2$ steps, where $C$ is a constant, at most $5m$ black nodes turn white. Since the number of white nodes in the initial solution is at most $m$, at most $6m$ steps that increase the number of black nodes are sufficient for reaching a situation with $w=0$.

While the number of black nodes is at least one and we have not yet reached $w=0$ or a critical solution, there is always at least one white node whose mutation to black increases the length of a black block. This move is accepted by the algorithm, because it shortens the white path by removing an edge of cost 1 while adding one edge of cost $\frac{1}{m}$ to the black block. At each step, the node of each cluster is replaced by the other node of that cluster with probability $\frac{1}{2m}$. Therefore, the probability that only the mentioned mutation happens in one step is at least $\frac{1}{2m} \cdot (1-\frac{1}{m})^{m-1} \ge \frac{1}{2em}$, where $(1-\frac{1}{m})^{m-1}$ is the probability that no other mutation happens in that step.

Let $X=\sum_{i=1}^{T} X_i$, where $X_i$ is a random variable such that $X_i=1$ if only a single white node is mutated to black while no other node changes at step $i$, and $X_i=0$ otherwise. At each step $i$ before reaching $w=0$ or a critical solution, $\mathrm{Prob}(X_i=1) \ge \frac{1}{2em}$. Considering a phase of $T=24em^2$ steps, by linearity of expectation we get $E[X] \ge 24em^2 \cdot \frac{1}{2em}=12m$. Using Chernoff bounds we get $\mathrm{Prob}(X \le (1-\frac{1}{2}) \cdot 12m) \le e^{-\Omega(m)}$. As a result, in a phase of $24em^2$ steps, we either find a solution with $w=0$, or with probability $1-e^{-\Omega(m)}$ at least $6m$ white nodes mutate to black, which results in a situation with $w=0$ because, as argued above, $6m$ white-to-black mutations suffice. Overall, with probability $1-o(1)$, a solution with $w=0$ is reached in time $24em^2$.

Lemma 20:

The initial solution, chosen uniformly at random, has at least $\frac{m}{48}$ single black nodes with probability $1-e^{-\Omega(m)}$.

Proof:

Considering the clusters in their optimal (consecutive) order, for any specific cluster, the black (or white) node is selected with probability $1/2$, independently of the other clusters. As a result, any specific selection of nodes in 3 consecutive clusters occurs with probability $(1/2)^3$. There are at least $\lfloor m/3 \rfloor$ disjoint sets of 3 consecutive clusters; therefore, the expected number of single black nodes is at least $\frac{m}{3 \cdot 8}$. Using Chernoff bounds and letting $X$ denote the number of single black nodes in the initial solution, we have $\mathrm{Prob}\left(X<(1-1/2)\frac{m}{3 \cdot 8}\right) \le e^{-\frac{m}{3 \cdot 8} \cdot \frac{1}{8}}$.

As a result, with probability $1-e^{-\Omega(m)}$, the initial solution has at least $\frac{m}{48}$ single black nodes.

In the proof of the next lemma, we use the Simplified Drift Theorem (Theorem 21) presented in Oliveto and Witt (2011, 2012). Consider a random variable $X_t$, $t \ge 0$, with positive values that is changed in a stochastic process, and an interval $[a,b]$ with $a \ge 0$. The simplified drift theorem shows that, with high probability, the lower limit of the interval is not reached by $X_t$ within a certain number of steps if the starting point is above $b$, the average drift of the random variable is positive, and the probability of large changes is small. In this theorem, $\mathcal{F}_t$ denotes a filtration on the states of the process. In the proof of Lemma 22, we analyse the changes of the size of a large black block, and the filtration is defined according to the steps in which an accepted change happens to the size of that block.

Theorem 21:

(Simplified Drift Theorem, Oliveto and Witt, 2012). Let $X_t$, $t \ge 0$, be real-valued random variables describing a stochastic process over some state space. Suppose there exist an interval $[a,b] \subseteq \mathbb{R}$, two constants $\delta,\varepsilon>0$ and, possibly depending on $l:=b-a$, a function $r(l)$ satisfying $1 \le r(l)=o(l/\log(l))$ such that for all $t \ge 0$ the following two conditions hold:

  1. $E[X_{t+1}-X_t \mid \mathcal{F}_t \,;\, a<X_t<b] \ge \varepsilon$,

  2. $\mathrm{Prob}(|X_{t+1}-X_t| \ge j \mid \mathcal{F}_t \,;\, a<X_t) \le \frac{r(l)}{(1+\delta)^j}$ for $j \in \mathbb{N}_0$.

Then there is a constant $c^*>0$ such that for $T^*:=\min\{t \ge 0: X_t \le a \mid \mathcal{F}_0\,;\, X_0 \ge b\}$ it holds that $\mathrm{Prob}(T^* \le 2^{c^* l/r(l)})=2^{-\Omega(l/r(l))}$.

Lemma 22:

With probability $1-o(1)$, the number of black nodes is at least one during $24e \cdot m^2$ steps of the node-based (1 + 1) EA.

Proof:

Let $r$ be the number of black blocks in the solution. From Lemma 20 we know that, with high probability, the initial solution consists of at least $\frac{m}{48}$ single black nodes. As a result, in the initial solution $r=\Omega(m)$.

In order to reach a solution in which all nodes are white, the number of black blocks needs to decrease. Consider the step at which, for the first time, $r \le m^{\varepsilon}$, where $0<\varepsilon<1$ is a small constant. At this step, $r \ge \frac{m^{\varepsilon}}{2}$; otherwise, at least $\frac{m^{\varepsilon}}{2}$ mutations would have had to happen in one step, which is exponentially unlikely.

We first show that either we already have a block of size greater than one at this stage, or we will reach such a situation. Let us assume that all blocks at this stage are of size one. For any single black node, there exist two adjacent white nodes whose mutation to black would extend the size of that block. The probability that a specific white node is selected and mutated to black, while nothing else changes, is at least $\frac{1}{2m} \cdot (1-\frac{1}{m})^{m-1} \ge \frac{1}{2em}$; therefore, with probability $P_1^+ \ge \frac{2}{2 \cdot e \cdot m}$ the size of that block is extended. On the other hand, the probability that this single black node mutates to white is $P_1^- \le \frac{1}{2m}$. Therefore, if a change happens to the size of this block, it is an increase with probability at least
$\frac{P_1^+}{P_1^+ + P_1^-} \ge \frac{\frac{1}{em}}{\frac{1}{em}+\frac{1}{2m}} \ge \frac{2}{2+e}.$
Therefore, the probability that none of these blocks experiences an increase in size when it changes for the first time is at most $(1-\frac{2}{2+e})^{r} \le (1-\frac{2}{2+e})^{\frac{m^{\varepsilon}}{2}}=e^{-\Omega(m^{\varepsilon})}$. As a result, with probability $1-e^{-\Omega(m^{\varepsilon})}$, we reach a stage at which there are $r \le m^{\varepsilon}$ blocks and one of the blocks is of size at least 2. We refer to this block as the large block.

Now we show that we reach this stage within a phase of $m^{1+\varepsilon}$ steps. Since each single black node has a probability of $P_1^+ + P_1^- \ge \frac{1}{em}$ of changing at each step, the expected number of steps required until a given single black node changes is at most $em$. Therefore, by Markov's inequality, the probability that a given single black node does not change in a phase of $2em$ steps is at most $\frac{1}{2}$. Considering a phase of $m^{1+\varepsilon}$ steps, the probability that a given single black node does not change is at most $e^{-\Omega(m^{\varepsilon})}$. There are at most $m^{\varepsilon}$ single black nodes, and by a union bound, the probability that at least one of them does not change is at most $m^{\varepsilon} \cdot e^{-\Omega(m^{\varepsilon})}=e^{-\Omega(m^{\varepsilon})}$. Therefore, with probability $1-e^{-\Omega(m^{\varepsilon})}$, all these nodes experience a change in the mentioned phase.

For a black block of size $l \ge 2$, there is a probability of $P_l^+ \ge \frac{2}{2 \cdot e \cdot m}$ that a white node mutates to black and extends the size of that block. But to decrease the size of the block, either the whole block needs to mutate in one step (with probability at most $\frac{1}{(2m)^l}$), or one improving move needs to happen somewhere else in the same step in which a black node at either end of the large block mutates to white (which has probability at most $\frac{2}{2m}$). An improving move can be a mutation of a white node that extends a black block, which happens with probability at most $\frac{2}{2m}$ for each block, or a mutation of all black nodes of a block, the probability of which is upper bounded by $\frac{1}{2m}$ for each block. Since the number of blocks is at most $m^{\varepsilon}$, the probability that an improving move happens is at most $2 \cdot \frac{m^{\varepsilon}}{2m}+\frac{m^{\varepsilon}}{2m}$. Overall, the probability of decreasing the size of the large block is
$P_l^- \le \frac{1}{(2m)^l}+\frac{2}{2m} \cdot \left(2 \cdot \frac{m^{\varepsilon}}{2m}+\frac{m^{\varepsilon}}{2m}\right) \le \frac{1}{(2m)^l}+\frac{1}{m} \cdot \frac{3m^{\varepsilon}}{2m} \le \frac{4m^{\varepsilon}}{2m^2}.$

Now consider a phase of $m^{\frac{3}{2}}$ steps. With probability at most $\frac{4m^{\varepsilon}}{2m^2} \cdot m^{\frac{3}{2}}=\frac{2m^{\varepsilon}}{\sqrt{m}}$, the size of the large block is decreased at least once. Therefore, with probability $1-O(m^{\varepsilon-1/2})=1-o(1)$, its size is not decreased in the mentioned phase.

On the other hand, in each step there is a probability of at least $P_l^+ \ge \frac{1}{e \cdot m}$ that the size of the block is increased. Let $X_i$ be a random variable such that $X_i=1$ if the size of the large block is increased at step $i$, and $X_i=0$ otherwise. The expected number of increases of the size of that block in a phase of $m^{\frac{3}{2}}$ steps is $E[\sum_{i=1}^{m^{3/2}} X_i] \ge \frac{\sqrt{m}}{e}$. Moreover, by Chernoff bounds, we have $\sum_{i=1}^{m^{3/2}} X_i \ge \frac{\sqrt{m}}{2e}$ with probability at least $1-e^{-\Omega(\sqrt{m})}$, which means that with probability $1-o(1)$ the size of the large block is at least $\frac{\sqrt{m}}{2e}$ after a phase of $m^{\frac{3}{2}}$ steps.

After this phase, we consider a phase of $24e \cdot m^2$ steps and show that, with high probability, the large black block does not lose more than half of its nodes. In order to show this, we use the Simplified Drift Theorem (Oliveto and Witt, 2011, 2012) presented in Theorem 21. Let $t_0$ be the first step after the previous phase has finished and let $L$ be the largest block at that time. We define $X_t$, $t \ge 0$, as
$X_t := \big(\text{size of } L \text{ at time } t_0\big) + \big(\text{number of steps increasing the size of } L \text{ from } t_0 \text{ until } t_0+t\big) - \big(\text{number of nodes removed from } L \text{ from } t_0 \text{ until } t_0+t\big).$

Note that $X_t$ always represents a lower bound on the size of $L$ at time $t_0+t$. We filter the steps and consider only the relevant steps, that is, the steps in which a change happens to the size of $L$. Moreover, we set $a=\frac{X_0}{2}$, $b=X_0$, $r(l)=1$, $\varepsilon=\frac{1}{4e}$ and $\delta=1$.

Earlier, we found an upper bound on $P_l^-$ and a lower bound on $P_l^+$. An upper bound on the latter is $P_l^+ \le \frac{2}{2m}$, because in order to increase the size of a black block, at least one of its two white neighbours needs to mutate to black. Using these bounds, we obtain upper and lower bounds on $P_{rel}=P_l^+ + P_l^-$, the probability of a step being a relevant step:
$\frac{1}{e \cdot m} \le P_{rel} \le \frac{1}{m}+\frac{4m^{\varepsilon}}{2m^2} \le \frac{2}{m}.$
The last inequality holds for sufficiently large $m$, because $\varepsilon<1$. In each step, with probability at least $\frac{1}{em}$, an increase of the size of $L$ happens; hence, the positive drift of $X_t$ over all steps is at least $\frac{1}{em}$. Considering conditional probabilities, in each relevant step the probability of an increase of the size of $L$ is at least $\frac{1/(em)}{P_{rel}}$. Therefore, the positive drift of $X_t$ in the relevant steps is
$\Delta^+ \ge \frac{1}{em} \cdot \frac{1}{P_{rel}} \ge \frac{1}{em} \cdot \frac{m}{2} = \frac{1}{2e}.$
Similarly, the expected decrease in the number of black nodes of that block in the relevant steps is
$\Delta^{-}\leq\left(\frac{l}{(2m)^{l}}+\sum_{k=1}^{m}k\cdot\frac{k+1}{(2m)^{k}}\cdot\left(\frac{2\cdot m^{\varepsilon}}{2m}+\frac{m^{\varepsilon}}{2m}\right)\right)\cdot\frac{1}{P_{rel}}\leq\left(\frac{l}{(2m)^{l}}+\frac{2}{m}\cdot\frac{3\cdot m^{\varepsilon}}{2m}\right)\cdot\frac{1}{P_{rel}}\leq\frac{4m^{\varepsilon}}{m^{2}}\cdot\frac{1}{P_{rel}}\leq\frac{4m^{\varepsilon}}{m^{2}}\cdot em=\frac{4em^{\varepsilon}}{m},$
where $k$ is the number of black nodes that are removed from the large block, and $\frac{k+1}{(2m)^{k}}$ is an upper bound on the probability of such a mutation happening in one step. Here, $k+1$ is the number of possible ways in which $L$ can lose $k$ nodes, since all these nodes have to be taken from the two ends of the block, and $\frac{1}{(2m)^{k}}$ is the probability that those nodes mutate to white. Moreover,
$\sum_{k=1}^{m}k\cdot\frac{k+1}{(2m)^{k}}=\frac{1}{m}+\frac{2\times 3}{(2m)^{2}}+\frac{3\times 4}{(2m)^{3}}+\cdots+\frac{m\times(m+1)}{(2m)^{m}}\leq\frac{1}{m}+\frac{1}{2m}+\frac{1}{2^{2}m}+\cdots+\frac{1}{2^{k-1}m}+\cdots+\frac{1}{2^{m-1}m}\leq\frac{2}{m}$
holds for $m\geq 3$. Using $\Delta^{-}$ we find the total expected difference of
$E\left[X_{t+1}-X_{t}\mid\mathcal{F}_{t};\,a<X_{t}<b\right]=\Delta^{+}-\Delta^{-}\geq\frac{1}{2e}-\frac{4em^{\varepsilon}}{m}.$
Therefore, the first condition of the simplified drift theorem holds for an appropriate choice of ɛ. The second condition also holds because at each step, Xt can be increased by at most 1 and the probability of decreasing it by j is
$Prob\left(X_{t}-X_{t+1}\geq j\mid\mathcal{F}_{t};\,a<X_{t}\right)\leq\frac{j+1}{(2m)^{j}}\cdot\frac{1}{P_{rel}}\leq\frac{1}{2^{j}}.$
Therefore, the conditions of the simplified drift theorem hold and we get
$Prob\left(T^{*}\leq 2^{c^{*}\cdot X_{0}/2}\right)=2^{-\Omega(X_{0}/2)}.$
As a result, with probability $1-2^{-\Omega(\sqrt{m})}$, the size of the large block does not decrease to less than $a$ in a phase of $c\cdot m^{2}$ steps, since $X_{0}\geq\frac{\sqrt{m}}{2e}$ and $2^{c^{*}\cdot X_{0}/2}$ exceeds $c\cdot m^{2}$ for sufficiently large $m$. Overall, with probability $1-o(1)$, the number of black blocks is at least one during the mentioned phase.
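As a side remark, the series bound $\sum_{k=1}^{m}k\cdot\frac{k+1}{(2m)^{k}}\leq\frac{2}{m}$ used in the estimate of $\Delta^{-}$ above is easy to confirm numerically; the following check is only an illustration, with illustrative values of $m$.

```python
def series_and_bound(m):
    """Compare sum_{k=1}^{m} k*(k+1)/(2m)^k with the claimed bound 2/m."""
    total = sum(k * (k + 1) / (2 * m) ** k for k in range(1, m + 1))
    return total, 2 / m

for m in [3, 10, 100]:
    total, bound = series_and_bound(m)
    print(m, total, bound, total <= bound)
```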
Lemma 23:

From a critical solution, with probability $1-o(1)$, the optimal solution is reached in time $O(m^{2}\log m)$.

Proof:

The cost of a critical solution with $w=3$ is $(m-4)\cdot\frac{1}{m}+3+2\cdot m^{2}$, and it can be observed from Lemma 15 that all solutions with $w\geq 5$ have a larger cost and cannot replace this solution. Therefore, only a solution with $w\leq 2$, or a critical solution with $w=4$, which can be obtained by a 1-bit flip, can replace this solution.

From a critical solution with $w=4$, the number of white nodes does not increase, because there are only two $m^{2}$-edges connecting the black and white chains in this situation, and according to Lemma 15, in order to increase the number of white nodes, all black nodes would need to mutate to white in one step, which is exponentially unlikely. Moreover, a noncritical solution with the same number of white nodes is not accepted after a critical solution either, because if the selected white nodes are not from consecutive clusters, more than two $m^{2}$-edges are required in the tour.

Here we show that there exists a 2-bit flip in a critical solution that reduces the number of white nodes by two and results in a solution with one chain of $m-2$ black nodes and one chain of 2 white nodes. From that solution, similar to our argument in the previous paragraph, increasing the number of white nodes is exponentially unlikely, and according to Lemma 17, with probability $1-o(1)$, the optimal solution is found in time $O(m^{2})$.

From Lemma 12 we know that the cost of a critical solution is $(m-5)\cdot\frac{1}{m}+3+2\cdot m^{2}$. By flipping two white nodes at one end of the white chain, the conditions of Property 12 hold and the cost of the new solution is $(m-3)\cdot\frac{1}{m}+2+2\cdot m^{2}$, which is better than the cost of the critical solution with respect to the fitness function. Therefore, this solution is accepted by the algorithm. This 2-bit flip has a probability of $\Omega\left(\frac{1}{m^{2}}\right)$, so the expected time until it happens is $O(m^{2})$, and by Markov's inequality, with probability at least $\frac{1}{2}$ it happens within a phase of twice that expected length. Considering $\log m$ such phases, we get that with probability $1-\left(\frac{1}{2}\right)^{\log m}=1-\frac{1}{m}$ this 2-bit flip happens in time $O(m^{2}\log m)$, which completes the proof.

Theorem 24:

Starting from an initial solution chosen uniformly at random, the node-based (1 + 1) EA finds the optimal solution of GG in time $O(m^{2}\log m)$ with probability $1-o(1)$.

Proof:

Lemma 20 shows that in a phase of $c\cdot m^{2}$ steps, with $c=24e$, the number of black nodes does not decrease to 0 with probability $1-o(1)$. Therefore, due to Lemma 17, if we do not face a critical solution, the optimal solution is found in time $O(m^{2})$ with probability $1-o(1)$. Moreover, if we face a critical solution, according to Lemma 23, with probability $1-o(1)$ it takes $O(m^{2}\log m)$ additional steps to find the optimal solution. Overall, with probability $1-o(1)$, the node-based (1 + 1) EA finds the optimal solution in time $O(m^{2}\log m)$.

4.1.2  Experimental Results

In this section, we present experimental results that confirm our theoretical analysis of the behaviour of the node-based (1 + 1) EA optimizing GG. We have run the algorithm on instances of different sizes, 30 times per instance with a maximum of $10^{7}$ iterations. The results are summarised in Table 1. The first and second columns indicate the input size and the percentage of runs that result in the optimal solution. The average and the maximum number of iterations until finding this solution are presented in the third and fourth columns. The observed maximum runtime is consistent with the asymptotic bound that we found in our theoretical analysis. A sketch of the experimental harness is given after the table.

Table 1:
Experimental results of node-based (1 + 1) EA on GG.
Input Size (m) | % Optimum | Average Runtime | Maximum Runtime
20             | 100       | 495             | 3954
50             | 100       | 1873            | 10800
100            | 100       | 4076            | 19252
200            | 100       | 27817           | 190882
500            | 100       | 33280           | 280873
1000           | 100       | 110186          | 2518061
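A minimal sketch of the experimental harness behind these numbers is given below; the GG-specific operators (`make_random_solution`, `mutate`, `cost`, `is_optimal`) are placeholders that we assume to be available, and are not part of the analysis above.

```python
def run_one_trial(make_random_solution, mutate, cost, is_optimal, max_iters=10**7):
    """Generic (1 + 1) EA loop: accept the offspring if it is not worse."""
    current = make_random_solution()
    for t in range(1, max_iters + 1):
        offspring = mutate(current)
        if cost(offspring) <= cost(current):
            current = offspring
        if is_optimal(current):
            return True, t                 # solved after t iterations
    return False, max_iters                # budget exhausted

def run_experiment(make_random_solution, mutate, cost, is_optimal, runs=30):
    """Repeat the trial, as for Table 1, and report the summary statistics."""
    outcomes = [run_one_trial(make_random_solution, mutate, cost, is_optimal)
                for _ in range(runs)]
    solved = [t for ok, t in outcomes if ok]
    return {"%optimum": 100 * len(solved) / runs,
            "average": sum(solved) / len(solved) if solved else None,
            "maximum": max(solved) if solved else None}
```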

In Figure 8 we also include a graphical plot comparing the observed running times to a quadratic curve, more precisely $10n^{2}$. This plot suggests that the $O(m^{2})$ bound for the worst-case running time of this algorithm on this problem may not be tight.

Figure 8:

Comparing observed runtime with a quadratic curve.


4.2  Lower-Bound Analysis for Node-Based (1 + 1) EA

In this section, we prove an exponential lower bound on the optimization time of the node-based (1 + 1) EA. In Section 4.2.1, we introduce an instance of the Euclidean GTSP, GS, that is difficult to solve by means of our algorithm and discuss some of its geometric properties. In Section 4.2.2, we show how the algorithm reaches a local optimum of this instance and discuss how it can reach the global optimum afterwards. Consequently, we derive a lower bound on the optimization time of the algorithm. In Section 4.2.3, using an efficient algorithm that solves the lower-level problem of this instance, we present experimental results that confirm the obtained lower bound.

4.2.1  A Hard Instance and Its Geometric Properties

The hard instance presented in this section, which is partly illustrated in Figure 9, is composed of $m$ clusters. Let $a>1$ be a constant. Only $\frac{m}{a}$ of these clusters have one node. The other clusters contain $m$ nodes each, which makes the total number of nodes $n=m\left(m-\frac{m}{a}\right)+\frac{m}{a}$. All nodes are connected to each other and the cost of travelling between them is their Euclidean distance.

Figure 9:

Euclidean hard instance, GS, for node-based (1 + 1) EA.


In the clusters that have $m$ nodes, $m-1$ nodes are placed on the small circle and are shown by a star in the figure. We refer to them as white nodes or inner nodes. For simplicity we assume that the inner nodes of each cluster all lie at the same position; the same result can be obtained by placing the nodes within a small circle of arbitrarily small radius $\varepsilon$. The remaining node of each cluster, shown in black in the figure, is placed on the larger circle. The other $\frac{m}{a}$ clusters do not have any nodes on the small circle and have only one black node on the larger circle. The figure demonstrates how the clusters are distributed on the two circles. The arc between the black nodes of two consecutive clusters subtends an angle of $\frac{2\pi}{m}$, while the arc between two consecutive one-node clusters subtends an angle of $a\cdot\frac{2\pi}{m}$.

If we denote the radii of the inner and outer circles by $r$ and $r'$, respectively, then a black node and a white node are at distance at least $r'-r$, and the length of an edge between two adjacent black nodes is $2r'\sin\left(\frac{\pi}{m}\right)$. The minimum length of an edge between the black nodes of two consecutive one-node clusters is given by the same formula with a larger angle: $2r'\sin\left(\frac{\pi a}{m}\right)$.
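As a cross-check of these quantities, the following sketch constructs node coordinates for GS from the description above. The exact angular position of the collapsed inner nodes, the assumption that every $a$-th cluster is a one-node cluster, and the concrete values of $m$, $a$, $r$ and $r'$ are assumptions of the sketch, not part of the instance definition.

```python
import math

def build_gs_instance(m, a, r, r_prime):
    """Place the clusters of GS on the two circles.

    Every cluster gets one black node on the outer circle (radius r_prime);
    the clusters that are not one-node clusters additionally get their white
    nodes collapsed onto a single point of the inner circle (radius r), as
    assumed in the text.  Placing the white point at the same angle as the
    cluster's black node is an assumption of this sketch.
    """
    assert m % a == 0, "illustrative assumption: m/a is an integer"
    clusters = []
    for i in range(m):
        angle = i * 2 * math.pi / m            # black nodes are 2*pi/m apart
        black = (r_prime * math.cos(angle), r_prime * math.sin(angle))
        one_node = (i % a == 0)                # every a-th cluster has no white nodes
        white = None if one_node else (r * math.cos(angle), r * math.sin(angle))
        clusters.append({"black": black, "white": white})
    return clusters

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

m, a, r, r_prime = 40, 4, 1.0, 1e4
clusters = build_gs_instance(m, a, r, r_prime)
# Chord between adjacent black nodes equals 2*r'*sin(pi/m).
print(dist(clusters[0]["black"], clusters[1]["black"]), 2 * r_prime * math.sin(math.pi / m))
# Chord between consecutive one-node clusters equals 2*r'*sin(pi*a/m).
print(dist(clusters[0]["black"], clusters[a]["black"]), 2 * r_prime * math.sin(math.pi * a / m))
```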

Here we define a characteristic of the introduced instance, which is required for proving the exponential lower bound on the optimization time of the node-based (1 + 1) EA. This characteristic constrains the ratio of $r$ to $r'$ and is defined by the following inequality:
$r<\frac{1}{2}\left(2\sin\frac{\pi}{m}-\sin\frac{2\pi}{m}\right)r'.$
(1)

We now prove that if the introduced instance has this characteristic, then for $m\geq 8a$ the best tour on any spanning set that contains at least one white node uses only 2 edges between the outer and inner circles. We also show that the optimal solution consists of all the black nodes, but that, with high probability, the node-based (1 + 1) EA reaches a plateau of local optima with $\frac{m}{a}$ black nodes and $m-\frac{m}{a}$ white nodes. Note that in such local optima, selecting the $\frac{m}{a}$ black nodes of the one-node clusters is forced, since there is no other choice for those clusters. The local optimum of this instance has a large basin of attraction, since the distances between white nodes are smaller than the distances between black nodes.

In the proof of the following property, we have used a couple of theorems from Quintas and Supnick (1965). Given a set of vertices, it is stated in Theorem 1 of that paper that the shortest spherical or planar polygon does not intersect itself. Moreover, Theorem 2 of that paper proves that the shortest polygon contains the vertices on the boundary of its convex hull in their cyclic order.

Property 25:

The best tour on a spanning set that contains at least one white node uses only two edges between nodes on the inner and outer circles, for $m\geq 8a$.

Proof:

We first consider the tour on the node set consisting of only the black nodes of the one-node clusters; there is no choice about selecting those nodes, because their clusters have no other node. For such a node set, due to Theorem 2 of Quintas and Supnick (1965), the optimal tour visits all the nodes in the order they appear on the convex hull. This order is respected in an optimal tour even if there are some inner nodes to visit as well, because according to Theorem 1 of Quintas and Supnick (1965) the optimal solution cannot intersect itself. In other words, if some white nodes are selected in the upper level, then while visiting the outer nodes in their convex hull order, a solution occasionally travels the distance between the outer circle and the inner circle to visit some inner nodes, and then travels roughly the same distance back to the outer circle to continue visiting the remaining outer nodes. As illustrated in Figure 10, this can generally be done in two ways:

  1. Case 1: Leaving the outer circle only once and visiting all inner nodes together.

  2. Case 2: Leaving the outer circle more than once and visiting some of the inner nodes each time.

We now show that there exists a solution of Case 1 that is less costly than all solutions of Case 2. As a result, the best tour on a spanning set with at least one white node travels the distance between the two circles only twice.

If we denote by $k$ the number of times a tour leaves the outer circle to visit some nodes on the inner circle, then for the solutions of Case 1, $k=1$, and for solutions of Case 2, $k\geq 2$. In both cases the number of edges connecting the two circles is $2k$. The picture on the left side of Figure 10 illustrates a solution with $k=1$, for which we find an upper bound on the tour cost as follows:
$C(1)<2\pi r'+2\pi r+2(r'-r).$
(2)

The last term of this formula is twice the length of edge AB, which is a straight line from the inner circle to the outer circle along their common radius. The lengths of edges A'B' and A''B'' are actually larger than that, because their endpoints are not from the same cluster. Nevertheless, Formula 2 is an upper bound on the total cost of the tour because we account for the complete circumference of both circles. In other words, the distance between A and A' is included in the circumference of the large circle, the distance between B and B' is included in that of the small circle, and by the quadrilateral inequality $|A'B'|<|A'A|+|AB|+|BB'|$.

On the other hand, a lower bound of the tour cost in all solutions of Case 2 is:
$C(k)>\left(\frac{m}{a}-k\right)2\sin\left(\frac{\pi}{m/a}\right)r'+2k(r'-r).$
(3)

In Formula 3, $2\sin\left(\frac{\pi}{m/a}\right)r'$ is the length of the edges connecting two consecutive one-node clusters. These are the longest edges that can be removed from the tour when we add two edges connecting the inner and outer circles. There are initially at least $\frac{m}{a}$ of these edges, and in this formula we have omitted $k$ of them from the tour.

We can rewrite the right-hand side of Inequality 3 as
$C(k)>\frac{m}{a}\cdot 2\sin\left(\frac{\pi}{m/a}\right)r'-k\cdot 2\sin\left(\frac{\pi}{m/a}\right)r'+2k(r'-r).$
Since for $m\geq 8a$ we have $\sin\left(\frac{\pi}{m/a}\right)<0.39$ and $\frac{m}{a}\sin\left(\frac{\pi}{m/a}\right)>3.06$, the above expression is at least
$2(3.06)r'-2k\cdot 0.39r'+2k(r'-r).$
This expression is monotone increasing in $k$ when $r\leq 0.61r'$; therefore, setting $k=2$ we get the smallest lower bound on $C(k)$ for $k\geq 2$:
$C(k)>4.56r'+4(r'-r).$
Now, if we prove that the upper bound on $C(1)$ from Inequality 2 is less than the above expression, we can conclude that $C(1)<C(k)$ for $k\geq 2$. Therefore, we should prove that
$2\pi r'+2\pi r+2(r'-r)\leq 4.56r'+4(r'-r)\;\Leftrightarrow\;2\pi r'+2\pi r\leq 4.56r'+2r'-2r\;\Leftrightarrow\;(\pi+1)r\leq(-\pi+3.28)r'\;\Leftrightarrow\;r\leq\frac{-\pi+3.28}{\pi+1}r'\approx 0.033r'.$

The last inequality holds because the constraint on the value of $r$ introduced in Equation 1 is quite tight: for $m\geq 8$ it gives $r<0.03r'$, which is a tighter bound on $r$ than the one given by the right-hand side of the last inequality.
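The following short computation verifies these constants numerically; the spot-checked values of $m$ are only illustrative (the bound for all $m\geq 8$ follows from monotonicity in $m$).

```python
import math

def eq1_coefficient(m):
    """Coefficient of r' in Equation 1: (1/2) * (2*sin(pi/m) - sin(2*pi/m))."""
    return 0.5 * (2 * math.sin(math.pi / m) - math.sin(2 * math.pi / m))

required = (3.28 - math.pi) / (math.pi + 1)   # approx 0.033, needed for C(1) < C(k)
for m in [8, 16, 100, 1000]:
    c = eq1_coefficient(m)
    print(m, round(c, 6), c < 0.03, c < required)
```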

Figure 10:

Left side: Case 1; right side: Case 2.


Property 26:

An optimal solution chooses all black nodes and visits them in clockwise or anti-clockwise order when $m\geq 7a$.

Proof:
The tour comprising all black nodes has a cost strictly less than $2\pi r'$, the circumference of the circle with radius $r'$. Therefore, $2\pi r'$ is an upper bound on the cost of the optimal solution. Besides, in Property 25 we saw that the best tour in which at least one white node is selected has only two edges connecting the two circles. Therefore, as a lower bound on the cost of a solution with any spanning set other than all black nodes, we can use Formula 3 with $k=1$ and get
$C(1)\geq\left(\frac{m}{a}-1\right)2\sin\left(\frac{\pi}{m/a}\right)r'+2(r'-r).$
We now show that under our assumption on the value of $r$, this lower bound is greater than the upper bound on the cost of the optimal solution. By replacing $r$ with its maximum value from Equation 1 we have
$C(1)\geq\left(2\left(\frac{m}{a}-1\right)\sin\left(\frac{\pi}{m/a}\right)+2-\left(2\sin\frac{\pi}{m}-\sin\frac{2\pi}{m}\right)\right)r'.$
Since for $m\geq 7a$
$\left(\frac{m}{a}-1\right)\sin\left(\frac{\pi}{m/a}\right)>2.60,$
and for $m\geq 4$
$2\sin\frac{\pi}{m}-\sin\frac{2\pi}{m}\leq 0.42,$
we can conclude that for $m\geq\max\{4,7a\}$
$C(1)\geq(2(2.60)+2-(0.42))r'=6.78r'>2\pi r'.$
As a result, for $m\geq 7a$ the minimum cost of such tours is greater than $2\pi r'$, which is an upper bound on the cost when all black nodes are selected. Hence, the tour consisting of all black nodes is the optimal solution, and since these nodes comprise the convex hull, the optimal Hamiltonian cycle visits them in the order they appear on the convex hull.
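These numeric constants can be reproduced with a few lines of Python; this is only a sanity check of the stated inequalities, with illustrative values of $a$.

```python
import math

def lower_bound_coefficient(m, a):
    """Coefficient of r' in the lower bound on C(1) after substituting r from Equation 1."""
    return (2 * (m / a - 1) * math.sin(math.pi * a / m)
            + 2 - (2 * math.sin(math.pi / m) - math.sin(2 * math.pi / m)))

# At the boundary m = 7a the two ingredients of the bound evaluate to:
print((7 - 1) * math.sin(math.pi / 7))                      # > 2.60
print(2 * math.sin(math.pi / 4) - math.sin(math.pi / 2))    # <= 0.42 for m >= 4
# The resulting coefficient exceeds 2*pi, so C(1) > 2*pi*r' in these cases:
for a in [2, 4, 10]:
    m = 7 * a
    print(a, lower_bound_coefficient(m, a), 2 * math.pi)
```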
Property 27:

Let $P$ and $P'$ be nonoptimal spanning sets and let $P_{out}\subseteq P$ and $P'_{out}\subseteq P'$ be their subsets of outer nodes. Moreover, let $S$ and $S'$ be optimal solutions with respect to $P$ and $P'$, respectively. For $m\geq 8a$, if $P_{out}\subset P'_{out}$ and $|P'_{out}|=|P_{out}|+1$, then $C(S)<C(S')$.

Proof:

The main idea behind this property is that distances on the inner circle are significantly shorter than distances on the large circle: if $r$ is sufficiently smaller than $r'$ (Inequality 1), any single mutation that replaces an inner node with the outer node of the same cluster increases the cost of the whole tour.

According to Property 25, the permutation chosen in the lower level has all the inner nodes listed between two black nodes. If one inner node is removed and one outer node is added, the part of the tour that includes all inner nodes gets shorter and the part that connects black nodes gets longer. The edges connecting the inner nodes are at most as long as the diameter of the inner circle, and at most two such edges are affected when an inner node leaves the tour; therefore, the maximum decrease for removing an inner node is upper bounded by $4r$. In the following, we find the minimum increase for adding a black node.

We analyse the increase in two cases. The first case is illustrated on the left side of Figure 11, in which the new black node is placed between two black nodes in the tour. $N$ is the new node and $M$ and $O$ are its neighbours. The edge connecting $M$ and $O$ is removed from the tour and the two other edges of the triangle are added. If we denote the lengths of these edges by $C$, $A$ and $B$, respectively, and the cost of the tour before and after this change by $C_{old}$ and $C_{new}$, then
$C_{new}=C_{old}-C+A+B.$
So the increase caused by this change is
$d=C_{new}-C_{old}=A+B-C.$
By splitting $C$ with an orthogonal line from $N$ we can write $d$ as
$d=(A-C_{1})+(B-C_{2}).$
(4)
We claim that when $A$ and $B$ have their smallest values, $d$ also attains its smallest value. We assume $A'\geq A$ and $B'\geq B$ and show that the corresponding $d'$ is at least $d$. When $A'\geq A$, the arc between $O'$ and $N'$ is also greater than or equal to the arc between $O$ and $N$. Therefore, the angle $y'$ facing that arc is also greater than or equal to $y$. Besides, $C_{2}=B\cdot\cos(y)$, and what we said about $C_{2}$ and $y$ also holds for $C_{1}$ and $x$. Altogether, we can write $d'$ as
$d'=A'-A'\cdot\cos(x')+B'-B'\cdot\cos(y').$
Since $y'\geq y$ and $x'\geq x$, and all of them are acute angles,
$d'\geq A'-A'\cdot\cos(x)+B'-B'\cdot\cos(y).$
If we write $A'=\alpha A$ and $B'=\beta B$, where $\alpha$ and $\beta$ are real numbers at least one, then we have
$d'\geq\alpha\left(A-A\cdot\cos(x)\right)+\beta\left(B-B\cdot\cos(y)\right)=\alpha(A-C_{1})+\beta(B-C_{2}).$
By comparing $d'$ with the value of $d$ in Equation 4, it holds that $d'\geq d$.
The shortest edges on the outer circle are between two consecutive clusters and have length $A=B=2\sin\left(\frac{\pi}{m}\right)r'$. Similarly, $C$ has the value $2\sin\left(\frac{2\pi}{m}\right)r'$. As a result, the minimum increase in the convex tour is
$d=A+B-C=4\sin\frac{\pi}{m}r'-2\sin\frac{2\pi}{m}r'.$
(5)

The second case is when the new node is added just before or after visiting the inner nodes, as illustrated on the right side of Figure 11. In this case, compared to the previous case, edge $B$ is longer and the angle between $A$ and $B$ is closer to a right angle. The minimum length of $A$ is the same as in the previous case. Altogether, by an argument quite similar to that of Case 1, the minimum increase in this case is larger than that of Case 1. Therefore, the minimum increase $d$ of the convex tour found in Equation 5 is also a lower bound for Case 2 and can be used for both cases.

On the other hand, as mentioned earlier, the maximum decrease caused by removing an inner node is $4r$. Therefore, the total increase of the tour cost is at least
$4\sin\frac{\pi}{m}r'-2\sin\frac{2\pi}{m}r'-4r.$
From our assumption on the value of $r$ in Equation 1, the above expression is positive; therefore, $C(S)<C(S')$.
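A quick numeric check of this final expression confirms that the net change is positive. In the sketch below, $r$ is set to half of the bound allowed by Equation 1; the choice of that fraction and of the values of $m$ is ours and only illustrative.

```python
import math

def net_increase(m, r_prime=1.0, fraction=0.5):
    """4*sin(pi/m)*r' - 2*sin(2*pi/m)*r' - 4*r, with r a fraction of the Eq. 1 bound."""
    r = fraction * 0.5 * (2 * math.sin(math.pi / m) - math.sin(2 * math.pi / m)) * r_prime
    return 4 * math.sin(math.pi / m) * r_prime - 2 * math.sin(2 * math.pi / m) * r_prime - 4 * r

for m in [8, 20, 100, 1000]:
    print(m, net_increase(m))   # positive in every case, as the proof requires
```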
Figure 11:

Left: Case 1, adding a new outer node between two outer nodes. Right: Case 2, adding a new outer node just before inner nodes.


4.2.2  Runtime Analysis

In this section, we give a lower bound on the runtime of the node-based (1 + 1) EA. We start by presenting a lemma about the initial solution, which is chosen uniformly at random. Then we state the Multiplicative Drift Theorem (Doerr et al., 2012), which is used in the analysis of Lemma 30 to upper bound the time of reaching a locally optimal solution. Afterwards, we discuss the main theorem of this section.

Lemma 28:

The initial solution, with a spanning set that is chosen uniformly at random, has at least $0.9\left(1-\frac{1}{a}\right)(m-1)$ white nodes with probability $1-e^{-\Omega(m)}$.

Proof:
For the $m-\frac{m}{a}$ clusters that have $m$ nodes, the probability of selecting one of the white nodes is $\frac{m-1}{m}$. Therefore, the expected number of selected white nodes is
$E[X]=\left(m-\frac{m}{a}\right)\frac{m-1}{m}=\left(1-\frac{1}{a}\right)(m-1).$
By Chernoff bounds we have
$Prob\left[X<(0.9)\left(1-\frac{1}{a}\right)(m-1)\right]\leq e^{-0.005\left(1-\frac{1}{a}\right)(m-1)}=e^{-\Omega(m)}.$
Therefore, the probability that the initial solution has at least $(0.9)\left(1-\frac{1}{a}\right)(m-1)$ white nodes is $1-e^{-\Omega(m)}$.
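The concentration claimed here is easy to observe empirically; the sketch below samples random spanning sets under the assumptions of the lemma (each $m$-node cluster picks one of its $m$ nodes uniformly at random), with illustrative choices of $m$, $a$ and the number of trials.

```python
import random

def count_white_in_random_spanning_set(m, a):
    """Sample one uniformly random spanning set of GS and count its white nodes.

    Only the m - m/a clusters with m nodes can contribute a white node; each of
    them selects its single black node with probability 1/m.
    """
    many_node_clusters = m - m // a
    return sum(1 for _ in range(many_node_clusters) if random.random() < (m - 1) / m)

m, a, trials = 200, 4, 1000
threshold = 0.9 * (1 - 1 / a) * (m - 1)
samples = [count_white_in_random_spanning_set(m, a) for _ in range(trials)]
print(min(samples), threshold)   # the minimum over all samples stays above the threshold
```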
Theorem 29:
(Multiplicative Drift, Doerr et al., 2012). Let $S\subseteq\mathbb{R}$ be a finite set of positive numbers with minimum $s_{min}$. Let $\{X^{(t)}\}_{t\in\mathbb{N}}$ be a sequence of random variables over $S\cup\{0\}$. Let $T$ be the random variable that denotes the first point in time $t\in\mathbb{N}$ for which $X^{(t)}=0$. Suppose that there exists a real number $\delta>0$ such that
$E\left[X^{(t)}-X^{(t+1)}\mid X^{(t)}=s\right]\geq\delta s$
holds for all $s\in S$ with $Prob[X^{(t)}=s]>0$. Then for all $s_{0}\in S$ with $Prob[X^{(0)}=s_{0}]>0$, we have
$E\left[T\mid X^{(0)}=s_{0}\right]\leq\frac{1+\ln(s_{0}/s_{min})}{\delta}.$
Lemma 30:

Starting with an initial solution chosen uniformly at random, with probability $1-e^{-\Omega(m)}$ the node-based (1 + 1) EA reaches a local optimum of GS in expected time $O(m\ln m)$.

Proof:

For a solution $x^{(t)}$ at time $t$, we define $X^{(t)}$ to be the number of $m$-node clusters for which the outer node is selected. Note that this function, as required in Theorem 29, maps the local optimum to zero and all other solutions to positive numbers.

If we assume that the number of $m$-node clusters whose outer node is chosen in solution $x^{(t)}$ is $k$, we can bound the expected change towards $x^{(t+1)}$ as follows.

As mentioned in Property 27, if only one mutation operation happens and it increases the number of outer nodes, it increases the cost and the algorithm rejects it. Therefore, if only one mutation happens and is accepted by the algorithm, it has to change a node from the outer circle to the inner circle and decrease $X^{(t)}$ by 1. The probability of this event is at least
$p_{1}=k\cdot\frac{1}{m}\cdot\frac{m-1}{m}\left(1-\frac{1}{m}\right)^{m-1}\geq k\cdot\frac{1}{m}\cdot\frac{m-1}{m}\cdot\frac{1}{e}.$
In the above formula, $\frac{1}{m}$ is the probability of a mutation for any of the nodes in the spanning set and $\frac{m-1}{m}$ is the probability that the newly selected node of the mutated cluster is a white node. We need one of the $k$ clusters to mutate and all others to stay unchanged; in other words, $\left(1-\frac{1}{m}\right)^{m-1}$ is the probability that the other $m-1$ clusters stay unchanged.
On the other hand, in some situations, one or more mutations in the opposite direction can happen besides a mutation from the outer circle to the inner circle. For $X^{(t)}$ to increase by one, at least two mutations must happen that change a node from the inner circle to the outer circle. The probability of this event is at most
$p_{-1}=\frac{k}{m}\cdot\frac{m-1}{m}\cdot\binom{m-\frac{m}{a}-k}{2}\left(\frac{1}{m}\right)^{2}\left(\frac{1}{m}\right)^{2}\leq\frac{k}{m}\cdot\frac{1}{2!\,m^{2}}.$

In the above formula, $\frac{k}{m}\cdot\frac{m-1}{m}$ is the probability that one node changes from the outer circle to the inner circle. Then we have the number of different ways to select two clusters whose selected nodes lie on the inner circle. The first $\left(\frac{1}{m}\right)^{2}$ is the probability that the two selected clusters mutate, and the second $\left(\frac{1}{m}\right)^{2}$ is the probability that after the mutation the node on the outer circle is selected in those two clusters.

More generally, for $X^{(t)}$ to increase by $q$, we need at least one mutation from the outer circle to the inner circle and $q+1$ mutations in the opposite direction, and the probability of this event is at most
$p_{-q}=\frac{k}{m}\cdot\frac{m-1}{m}\cdot\binom{m-\frac{m}{a}-k}{q+1}\left(\frac{1}{m}\right)^{q+1}\left(\frac{1}{m}\right)^{q+1}\leq\frac{k}{m}\cdot\frac{1}{(q+1)!\,m^{q+1}}.$
As a result, the expected change of $X^{(t)}$ in the next step is at least
$E\left[X^{(t)}-X^{(t+1)}\mid X^{(t)}=k\right]\geq p_{1}-\sum_{q=1}^{m}q\cdot p_{-q}.$
By substituting the lower bound on $p_{1}$ and the upper bounds on $p_{-q}$, we get
$E\left[X^{(t)}-X^{(t+1)}\mid X^{(t)}=k\right]\geq\frac{k}{m}\cdot\frac{m-1}{m}\cdot\frac{1}{e}-\frac{k}{m}\cdot\frac{1}{2!\,m^{2}}-\cdots-m\cdot\frac{k}{m}\cdot\frac{1}{(m+1)!\,m^{m+1}}\geq\frac{k}{m}\left(\frac{m-1}{em}-\frac{1}{m^{2}}-\cdots-\frac{1}{m^{m+1}}\right)\geq\frac{k}{m}\left(\frac{m-1}{em}-m\cdot\frac{1}{m^{2}}\right)\geq\frac{k}{m}\cdot\frac{m-1-e}{em}.$
For $m\geq 4$ the expression $\frac{m-1-e}{em}$ is at least $\frac{3-e}{4e}$. So setting $\delta=\frac{3-e}{4em}$ and using the Multiplicative Drift Theorem, we find the expected time of reaching the local optimum as
$E\left[T\mid X^{(0)}\leq 0.1m\right]\leq\frac{1+\ln(0.1m/1)}{\frac{3-e}{4em}}=O(m\ln m).$
In the above formula we have assumed $X^{(0)}\leq 0.1m$, because from Lemma 28 we know that, with probability $1-e^{-\Omega(m)}$, the initial solution has fewer than $0.1m$ black nodes other than the fixed black nodes.
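To make the bound concrete, the following sketch evaluates the multiplicative drift bound with the parameters chosen in this proof; it is only a numeric illustration of the $O(m\ln m)$ statement, not a simulation of the algorithm.

```python
import math

def drift_time_bound(m):
    """Multiplicative drift bound E[T] <= (1 + ln(s0/smin)) / delta
    with smin = 1, s0 = 0.1*m and delta = (3 - e) / (4*e*m)."""
    delta = (3 - math.e) / (4 * math.e * m)
    s0 = 0.1 * m
    return (1 + math.log(s0 / 1)) / delta

for m in [20, 100, 1000]:
    print(m, round(drift_time_bound(m)))
```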
Theorem 31:

Starting with an initial solution chosen uniformly at random, if $m\geq 8a$, then the optimization time of the node-based (1 + 1) EA presented in Algorithm 5 on GS is $\Omega\left(\left(\frac{n}{2}\right)^{m-\frac{m}{a}}\right)$ with probability $1-e^{-\Omega(m^{\delta})}$, for a constant $\delta>0$.

Proof:

In order to prove this theorem, we introduce a phase P in which

  1. The algorithm reaches a local optimum with high probability

  2. The algorithm does not reach the global optimum with high probability

Then we show that after this phase, only a direct jump from the local optimum to the global optimum can help the algorithm improve the result, and the probability of such a jump is $\left(\frac{1}{m^{2}}\right)^{m-\frac{m}{a}}$.

As we saw in Lemma 30, the expected time for the node-based (1 + 1) EA to reach the local optimum is $O(m\ln m)$. Let $c$ be an appropriate constant such that $c\cdot m\ln m$ is an upper bound on the expected time for reaching that local optimum. Now consider a phase of $2c\cdot m\ln m$ steps. If $T$ is the actual time at which the local optimum is reached, by Markov's inequality we have $Prob(T>2c\cdot m\ln m)\leq\frac{1}{2}$. If we repeat this phase $\frac{m^{\varepsilon}}{\ln m}$ times, with $\varepsilon>0$ a constant, then we get a phase of $P=2c\cdot m^{1+\varepsilon}$ steps in which the probability of not reaching the local optimum is
$Prob\left(T>2c\cdot m^{1+\varepsilon}\right)\leq\left(\frac{1}{2}\right)^{\frac{m^{\varepsilon}}{\ln m}}=e^{-\Omega(m^{\delta})},$
where $0<\delta<\varepsilon$. As a result, the algorithm reaches the local optimum in phase P with probability $1-e^{-\Omega(m^{\delta})}$. We now prove that in this phase, the algorithm does not reach the global optimum with probability $1-e^{-\Omega(m^{\varepsilon})}$.
From Lemma 28 we know that, with high probability, the initial solution does not have too many black nodes other than the fixed black nodes. Here we show that, with high probability, the number of these nodes does not increase significantly during the phase of $P$ steps; hence, the global optimum is not reached. The probability of selecting a given cluster for a mutation is $\frac{1}{m}$ and, for clusters with $m$ nodes, the probability of changing the selected node to the black node is $\frac{1}{m}$; therefore, at each step, the probability that a given cluster's node is changed from one of its inner nodes to its outer node is $\frac{1}{m^{2}}$. Over the $m-\frac{m}{a}$ such clusters, at each step the expected number of clusters that face such a mutation is at most $\frac{1}{m}$, and over a phase of $2cm^{1+\varepsilon}$ steps it is at most $2cm^{\varepsilon}$. If we define $X$ as the number of such mutations during the phase, then by Chernoff bounds we have
$Prob\left(X\geq 3cm^{\varepsilon}\right)\leq e^{-2cm^{\varepsilon}(0.5)^{2}/3}=e^{-\Omega(m^{\varepsilon})}.$
Therefore, with high probability, during the mentioned phase at most $3cm^{\varepsilon}$ clusters have a mutation that results in selecting their black node. Besides, from Lemma 28 we know that, with probability $1-e^{-\Omega(m)}$, the initial solution has at least $0.9\left(1-\frac{1}{a}\right)(m-1)$ white nodes. Hence, with probability $1-e^{-\Omega(m^{\varepsilon})}$, the algorithm does not reach a state with fewer than $0.9\left(1-\frac{1}{a}\right)(m-1)-3cm^{\varepsilon}$ white nodes during this phase. As a result, the probability of a direct jump to the global optimum during the phase is, by a union bound, at most
$2c\cdot m^{1+\varepsilon}\left(\frac{1}{m^{2}}\right)^{0.9\left(1-\frac{1}{a}\right)(m-1)-3cm^{\varepsilon}}=m^{-\Omega(m)}.$

Consequently, with high probability, the global optimum is not reached during phase P. According to Property 27, no mutation from the inner circle to the outer circle can decrease the tour cost when the resulting solution is not the optimal solution. Hence, such a change may be accepted by the algorithm only when another mutation in the opposite direction happens in the same step. At the local optimum there is no black node other than the fixed black nodes, so no mutation from the outer circle to the inner circle can happen; therefore, an accepted mutation from the inner circle to the outer circle cannot happen either. As a result, after reaching the local optimum, only a direct jump to the global optimum can make progress towards the global optimum, and the probability of such a jump is $\left(\frac{1}{m^{2}}\right)^{m-\frac{m}{a}}$. We now consider $\left(\frac{m^{2}}{2}\right)^{m-\frac{m}{a}}$ steps following phase P. The probability of reaching the optimal solution within these steps is, by a union bound, at most $\left(\frac{m^{2}}{2}\right)^{m-\frac{m}{a}}\left(\frac{1}{m^{2}}\right)^{m-\frac{m}{a}}=\left(\frac{1}{2}\right)^{m-\frac{m}{a}}$. Hence the probability of not reaching the global optimum in the mentioned phase is $1-\left(\frac{1}{2}\right)^{m-\frac{m}{a}}=1-e^{-\Omega(m)}$. Altogether, with probability $1-e^{-\Omega(m^{\delta})}$, the optimization time is at least $\left(\frac{m^{2}}{2}\right)^{m-\frac{m}{a}}$.
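To illustrate how unlikely this direct jump is, the following sketch computes the order of magnitude of $\left(\frac{1}{m^{2}}\right)^{m-\frac{m}{a}}$ for a few illustrative values of $m$ and $a$.

```python
import math

def log10_jump_probability(m, a):
    """log10 of (1/m^2)^(m - m/a), the probability of the direct jump
    from the local optimum to the global optimum."""
    return -(m - m / a) * 2 * math.log10(m)

for m, a in [(20, 4), (100, 4), (1000, 4)]:
    print(m, a, log10_jump_probability(m, a))   # already about -39 for m = 20
```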

4.2.3  Experimental Results

In this section, we include experimental results that are consistent with the exponential lower bound proved in Section 4.2.2 for the optimization time on the studied instance. We run the algorithm with a maximum of $10^{6}$ iterations on instances of 6 different input sizes and show that all runs stick to the local optimum. For the lower-level optimization we have implemented an algorithm that visits all the selected black nodes in the order they appear on the large circle, then visits all selected white nodes in the order they appear on the small circle, and goes back to the first black node to form a Hamiltonian cycle (a sketch of this construction is given after Table 2). This construction respects Property 25 of the optimal lower-level solution, which was important in our theoretical analysis. We have set $r=1$ and $r'=10^{8}$ so that the inequality of Equation 1 holds for the maximum input size that we run the algorithm with. The results, based on 30 runs of the algorithm, are summarised in Table 2. The first and second columns indicate the input size and the percentage of runs that stick to the local optimum, respectively. The average and maximum numbers of iterations until finding the local optimum are presented in the third and fourth columns, respectively. As the table shows, 100% of the runs for all input sizes find the local optimum and stay there until the maximum number of iterations is reached. This is consistent with the theoretical results of Section 4.2.2.

Table 2:
Experimental results of node-based (1 + 1) EA on GS.
Input Size (m) | % LO | Average Runtime to Reach Local Optimum | Maximum Runtime to Reach Local Optimum
20             | 100  | 15                                     | 69
50             | 100  | 36                                     | 275
100            | 100  | 66                                     | 346
200            | 100  | 82                                     | 770
500            | 100  | 197                                    | 1300
1000           | 100  | 680                                    | 3120
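A minimal sketch of the lower-level tour construction used in these experiments is given below; the representation of a spanning set as a map from cluster index to a (colour, angle) pair is our own convention for the sketch, not part of the algorithm description above.

```python
import math

def lower_level_tour(selected):
    """Build the lower-level tour used in the experiments.

    `selected` maps each cluster index to a pair (colour, angle), where colour
    is "black" (outer circle) or "white" (inner circle).  The tour visits all
    selected black nodes in the order they appear on the outer circle, then all
    selected white nodes in the order they appear on the inner circle, and
    implicitly returns to the first black node.  Whenever at least one white
    node is selected, this uses exactly two edges between the circles, matching
    the structure of Property 25.
    """
    blacks = sorted((i for i, (c, _) in selected.items() if c == "black"),
                    key=lambda i: selected[i][1])
    whites = sorted((i for i, (c, _) in selected.items() if c == "white"),
                    key=lambda i: selected[i][1])
    return blacks + whites

def tour_cost(tour, selected, r, r_prime):
    """Euclidean length of the cyclic tour for nodes placed on the two circles."""
    def coords(i):
        colour, angle = selected[i]
        radius = r_prime if colour == "black" else r
        return radius * math.cos(angle), radius * math.sin(angle)
    total = 0.0
    for u, v in zip(tour, tour[1:] + tour[:1]):
        (x1, y1), (x2, y2) = coords(u), coords(v)
        total += math.hypot(x1 - x2, y1 - y2)
    return total
```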

5  Conclusion

Evolutionary algorithms and local search approaches have been shown to be very successful for solving the generalized travelling salesperson problem. We have investigated two common hierarchical representations together with local search algorithms and simple evolutionary algorithms from a theoretical perspective. In the first part of this article, which is based on the conference version (Pourhassan and Neumann, 2015), the focus is on local search approaches. By presenting instances where they mutually outperform each other, we have gained new insights into the complementary abilities of the two approaches. Furthermore, we have presented and analysed a class of instances where combining the two approaches into a variable-neighbourhood search helps to escape from local optima of the single approaches.

In the second part, we have investigated the behaviour of hierarchical evolutionary algorithms for this problem. We have proved that there are instances which the node-based (1 + 1) EA solves to optimality in polynomial time, while the cluster-based (1 + 1) EA needs exponential time to find an optimal solution for them. Then we have presented a Euclidean instance of the GTSP and proved an exponential lower bound on the optimization time of the node-based algorithm on it. Our lower-bound analysis for this geometric instance shows that the Euclidean case can be hard to solve even if we assume that the lower-level TSP is solved to optimality at no computational cost.

References

Applegate, D., Bixby, R. E., Chvatal, V., and Cook, W. J. (2013). Concorde TSP solver. Retrieved from http://www.math.uwaterloo.ca/tsp/concorde/index.html

Auger, A., and Doerr, B. (2011). Theory of randomized search heuristics: Foundations and recent developments. Singapore: World Scientific.

Corus, D., Lehre, P. K., Neumann, F., and Pourhassan, M. (2016). A parameterised complexity analysis of bi-level optimisation with evolutionary algorithms. Evolutionary Computation, 24(1):183-203.

Doerr, B., Johannsen, D., and Winzen, C. (2012). Multiplicative drift analysis. Algorithmica, 64(4):673-697.

Downey, R. G., and Fellows, M. R. (1999). Parameterized complexity. New York: Springer.

Englert, M., Röglin, H., and Vöcking, B. (2014). Worst case and probabilistic analysis of the 2-opt algorithm for the TSP. Algorithmica, 68(1):190-264.

Fischetti, M., González, J. J. S., and Toth, P. (1997). A branch-and-cut algorithm for the symmetric generalized traveling salesman problem. Operations Research, 45(3):378.

Gutin, G., and Punnen, A. (2007). The traveling salesman problem and its variations. New York: Springer.

Held, M., and Karp, R. M. (1961). A dynamic programming approach to sequencing problems. In Proceedings of the 1961 16th ACM National Meeting, pp. 71.201-71.204.

Hu, B., and Raidl, G. R. (2008). Effective neighborhood structures for the generalized traveling salesman problem. In EvoCOP, pp. 36-47. Lecture Notes in Computer Science, Vol. 4972.

Jansen, T. (2013). Analyzing evolutionary algorithms—The computer science perspective. Natural Computing Series. New York: Springer.

Karapetyan, D., and Gutin, G. (2012). Efficient local search algorithms for known and new neighborhoods for the generalized traveling salesman problem. European Journal of Operational Research, 219(2):234-251.

Kötzing, T., Neumann, F., Röglin, H., and Witt, C. (2012). Theoretical analysis of two ACO approaches for the traveling salesman problem. Swarm Intelligence, 6(1):1-21.

Kratsch, S., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2010). Fixed parameter evolutionary algorithms and maximum leaf spanning trees: A matter of mutation. In Proceedings of Parallel Problem Solving from Nature, pp. 204-213.

Kratsch, S., and Neumann, F. (2013). Fixed-parameter evolutionary algorithms and the vertex cover problem. Algorithmica, 65(4):754-771.

Neumann, F., and Witt, C. (2010). Bioinspired computation in combinatorial optimization: Algorithms and their computational complexity. 1st ed. New York: Springer.

Oliveto, P., and Witt, C. (2011). Simplified drift analysis for proving lower bounds in evolutionary computation. Algorithmica, 59(3):369-386.

Oliveto, P., and Witt, C. (2012). Simplified drift analysis for proving lower bounds in evolutionary computation. Technical Report. Retrieved from http://arxiv.org/abs/1211.7184

Pourhassan, M., and Neumann, F. (2015). On the impact of local search operators and variable neighbourhood search for the generalized travelling salesperson problem. In Proceedings of the 2015 Genetic and Evolutionary Computation Conference (GECCO), pp. 465-472.

Quintas, L. V., and Supnick, F. (1965). On some properties of shortest Hamiltonian circuits. The American Mathematical Monthly, 72(9):977-980.

Sutton, A. M., and Neumann, F. (2012). A parameterized runtime analysis of simple evolutionary algorithms for makespan scheduling. In Proceedings of the Twelfth Conference on Parallel Problem Solving from Nature, pp. 52-61.

Sutton, A. M., Neumann, F., and Nallaperuma, S. (2014). Parameterized runtime analyses of evolutionary algorithms for the planar Euclidean traveling salesperson problem. Evolutionary Computation, 22(4):595-628.

Theile, M. (2009). Exact solutions to the traveling salesperson problem by a population-based evolutionary algorithm. In Evolutionary Computation in Combinatorial Optimization, pp. 145-155. Lecture Notes in Computer Science, Vol. 5482.