## Abstract

The generalized travelling salesperson problem is an important NP-hard combinatorial optimization problem for which metaheuristics, such as local search and evolutionary algorithms, have been used very successfully. Two hierarchical approaches with different neighbourhood structures, namely a cluster-based approach and a node-based approach, have been proposed by Hu and Raidl (2008) for solving this problem. In this article, local search algorithms and simple evolutionary algorithms based on these approaches are investigated from a theoretical perspective. For local search algorithms, we point out the complementary abilities of the two approaches by presenting instances where they mutually outperform each other. Afterwards, we introduce an instance which is hard for both approaches when initialized on a particular point of the search space, but where a variable neighbourhood search combining them finds the optimal solution in polynomial time. Then we turn our attention to analysing the behaviour of simple evolutionary algorithms that use these approaches. We show that the node-based approach solves the hard instance of the cluster-based approach presented in Corus et al. (2016) in polynomial time. Furthermore, we prove an exponential lower bound on the optimization time of the node-based approach for a class of Euclidean instances.

## 1 Introduction

Evolutionary algorithms and other metaheuristics have been applied to a wide range of combinatorial optimization problems. Understanding the behaviour of metaheuristics on problems from combinatorial optimization is a challenging task due to the large amount of randomness involved in these algorithms.

During the past decade, much progress has been made in the analysis of evolutionary algorithms and ant colony optimization, both on classical benchmark functions and on problems from combinatorial optimization (Auger and Doerr, 2011; Jansen, 2013). Results have been achieved for classical polynomially solvable problems such as sorting, shortest paths, minimum spanning trees, and maximum matching, as well as for some of the best-known NP-hard combinatorial optimization problems such as vertex cover, makespan scheduling, and the travelling salesperson problem (Neumann and Witt, 2010; Theile, 2009).

Furthermore, bio-inspired computing methods have been studied in the context of parameterized complexity (Downey and Fellows, 1999; Kratsch et al., 2010; Kratsch and Neumann, 2013). This approach allows us to study the runtime as a function of structural parameters of the given instances and helps us to classify when an instance becomes hard for the examined algorithm. Results have been obtained for some of the most prominent NP-hard combinatorial optimization problems such as vertex cover, makespan scheduling (Sutton and Neumann, 2012), and the Euclidean travelling salesperson problem (Sutton et al., 2014). Parameterized analysis has also been used to study the generalized minimum spanning tree problem (GMSTP) and the generalized travelling salesperson problem (GTSP) (Corus et al., 2016). This article aims to investigate the latter problem in more detail.

The GTSP is given by a set of cities with distances between them. The cities are divided into clusters and the goal is to find a tour of minimal cost that visits one city from each cluster exactly once. Hu and Raidl (2008) have presented two hierarchical approaches for solving the GTSP: the *cluster-based* approach, which uses a permutation on the different clusters in the upper level and finds the best node selection for that permutation on the lower level, and the *node-based* approach, which selects a node for each cluster and then works on finding the best permutation of the chosen nodes. Combining the two hierarchical approaches, they have also presented a variable neighbourhood search algorithm for solving the GTSP. With this article, we contribute to the theoretical understanding of local search methods and simple evolutionary algorithms based on these hierarchical approaches for GTSP. The analysis on local search methods (based on the conference version, Pourhassan and Neumann, 2015) is presented in Section 3. We investigate the local search methods by presenting instances for which the two approaches mutually outperform each other. We also present a situation where both cluster-based and node-based local search approaches stick to a local optimum, but the combination of the two approaches solves the problem to optimality.

After investigating local search methods, this article extends the conference version (Pourhassan and Neumann, 2015) by investigating simple evolutionary algorithms in Section 4. A (1 + 1) EA using the cluster-based approach is analysed in Corus et al. (2016) by presenting upper and lower bounds for the optimization time of the algorithm. In this article, we show that the worst case instance presented there for the cluster-based approach can be solved in polynomial time by means of the node-based approach; hence, there are instances of the problem which the latter approach can solve more efficiently. Then we provide a lower bound analysis of this approach for the Euclidean generalized travelling salesperson problem.

Proving lower bounds for the Euclidean travelling salesperson problem has turned out to be quite difficult. Englert et al. (2014) have shown that there are instances of the Euclidean TSP for which a deterministic local search algorithm based on 2-opt needs exponential time to find a locally optimal solution. In this article, we present a class of Euclidean instances on which a simple evolutionary algorithm using the node-based approach requires exponential time with respect to the number of clusters. To our knowledge, an exponential lower bound for solving the TSP by a stochastic search algorithm is currently available only for ant colony optimization in the non-Euclidean case (Kötzing et al., 2012). Our instance for the GTSP places nodes on two circles with radii $r$ and $r'$ around a common centre. Exploiting the geometric properties of this instance class, we show by multiplicative drift analysis (Doerr et al., 2012) that the evolutionary algorithm under investigation ends up in a local optimum which has different chosen nodes for almost all clusters. Leaving such a local optimum requires exponential time for many mutation-based evolutionary algorithms and leads to an exponential lower bound with respect to the number of clusters for the investigated algorithm.

The outline of this article is as follows. Section 2 introduces the problem and the algorithms that are subject to our investigations. Our runtime analysis for local search methods and simple evolutionary algorithms are presented in Sections 3 and 4, respectively. Finally, we finish with some concluding remarks in Section 5.

## 2 Problem and Algorithms

The GTSP is a combinatorial optimization problem with applications in routing, design of ring networks, sequencing of computer files, and manufacturing planning (Gutin and Punnen, 2007). The input is given by a complete undirected graph $G=(V,E,c)$ with a cost function $c:E\rightarrow \mathbb{R}^{+}$ on the edges and a partitioning of the node set $V$ into $m$ clusters $V_1,V_2,\ldots,V_m$ such that $V=\bigcup_{i=1}^{m}V_i$ and $V_i\cap V_j=\emptyset$ for $i\neq j$. The aim is to find a tour of minimum cost that contains exactly one node from each cluster.

A candidate solution for this problem consists of two parts: the set of *spanning nodes* $P=\{p_1,\ldots,p_m\}$ with $p_i\in V_i$, and the *permutation* of the clusters $\pi=(\pi_1,\ldots,\pi_m)$, which defines a Hamiltonian cycle on $G[P]=(P,\{e\in E\mid e\subseteq P\},c)$. Here, $G[P]$ is the subgraph induced by $P$, consisting of all nodes in $P$ and all edges between them. Following Hu and Raidl (2008), we represent a candidate solution as $S=(P,\pi)$. Let $p_{\pi_i}$ be the chosen node for cluster $V_{\pi_i}$, $1\le i\le m$. Then the cost of a solution $S=(P,\pi)$ is given by $c(S)=c(p_{\pi_m},p_{\pi_1})+\sum_{i=1}^{m-1}c(p_{\pi_i},p_{\pi_{i+1}}).$
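The cost function above can be sketched directly in code. The following helper is our own illustrative implementation of $c(S)$, not code from Hu and Raidl (2008); the representation of edges as frozensets is an assumption for this sketch.

```python
def tour_cost(cost, spanning, perm):
    """Cost c(S) of a candidate solution S = (P, pi): the sum of edge
    costs along the cycle through the chosen nodes, in the order given
    by the cluster permutation, closing back to the first node.
    `cost` maps a frozenset {u, v} to c({u, v}), `spanning[k]` is the
    chosen node p_k of cluster k, and `perm` lists the cluster indices."""
    nodes = [spanning[k] for k in perm]
    m = len(nodes)
    return sum(cost[frozenset((nodes[i], nodes[(i + 1) % m]))]
               for i in range(m))
```

For example, with three clusters whose chosen nodes form a triangle with edge costs 1, 2, and 3, any permutation yields a tour of cost 6, since all 3-cycles traverse the same edge set.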

There are two hierarchical approaches for solving this problem (Hu and Raidl, 2008): the cluster-based approach and the node-based approach. In the former, an upper-level algorithm searches for the best permutation of the clusters, while a lower-level algorithm finds the optimal spanning node set for it. In the node-based approach, these tasks are swapped between the two levels. In the following, we describe four algorithms that make use of these two hierarchical approaches. We analyse these algorithms with respect to the (expected) number of iterations on the upper level until they have found an optimal solution, and call this the (expected) optimization time of the algorithms.

### 2.1 Cluster-Based Local Search

The cluster-based local search (CBLS) algorithm working with this neighbourhood structure, given in Algorithm 1, starts with an initial permutation of the clusters. At each step, a new permutation $\pi'$ is selected from the 2-opt neighbourhood of $\pi$, the current permutation of clusters. Then the lower level uses a shortest path algorithm to find the best spanning node set. Hu and Raidl (2008) have applied an incremental bidirectional shortest path calculation for this purpose. The shortest path algorithm of Karapetyan and Gutin (2012) is another option; it is an improved version of the dynamic programming algorithm given in Fischetti et al. (1997) for finding an optimal set of spanning nodes for a given permutation in time $O(n^3)$. The new solution $S'=(P',\pi')$ replaces the old one if it is less costly, and the algorithm terminates if no better solution can be found in the 2-opt neighbourhood of $\pi$.
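For a fixed cluster permutation, the lower level can be sketched as a layered shortest-path dynamic program in the spirit of the algorithms cited above. This is our own simplified sketch (it is not the incremental bidirectional method of Hu and Raidl, 2008, and all names are hypothetical):

```python
def best_spanning_nodes(clusters, perm, cost):
    """Find a minimum-cost node selection for a fixed cluster permutation
    (lower level of the cluster-based approach): fix each node of the
    first cluster in turn as start of the cycle, sweep through the
    remaining clusters in the order given by `perm`, and close the cycle
    back to the start node. `cost[u][v]` is a symmetric cost matrix."""
    order = [clusters[k] for k in perm]
    best_cost, best_nodes = float('inf'), None
    for start in order[0]:
        dist, parent = {start: 0.0}, {start: None}
        for layer in order[1:]:
            new_dist = {}
            for v in layer:
                # cheapest predecessor in the previous layer
                u = min(dist, key=lambda u: dist[u] + cost[u][v])
                new_dist[v] = dist[u] + cost[u][v]
                parent[v] = u
            dist = new_dist
        last = min(dist, key=lambda v: dist[v] + cost[v][start])
        total = dist[last] + cost[last][start]
        if total < best_cost:
            path = [last]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            best_cost, best_nodes = total, path[::-1]
    return best_cost, best_nodes
```

Trying each start node of the first cluster and relaxing layer by layer gives a polynomial running time overall, consistent with the $O(n^3)$ bound mentioned above.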

### 2.2 Node-Based Local Search

Note that the lower level involves solving the classical TSP; it therefore poses, in general, an NP-hard problem on its own. For our theoretical investigations, we consider two algorithms: NEN-LS (node exchange neighbourhood local search) and NEN-LS*, presented in Algorithms 2 and 3, respectively. NEN-LS computes a permutation on the lower level using 2-opt local search and is therefore not guaranteed to reach an optimal permutation $\pi$ for a given spanning node set $P$. NEN-LS* uses an exact solver to find an optimal permutation $\pi$ for a given spanning node set $P$. Such a permutation can be obtained in time $O(m^2 2^m)$ using dynamic programming (Held and Karp, 1961), which is practical if the number of clusters is small. We use NEN-LS* to show that this approach can get stuck in local optima even if the travelling salesperson problem on the lower level is solved to optimality.

NEN-LS and NEN-LS* start with a spanning node set $P$ and search for a good or optimal permutation with respect to $P$. Then each solution $P'\in N'(P)$ together with its permutation $\pi'$ is considered, and $S'=(P',\pi')$ replaces the current solution $S=(P,\pi)$ if it is of smaller cost. Both algorithms terminate if no improvement is possible in the neighbourhood $N'(P)$ of the current solution $P$.
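A compact sketch of NEN-LS under the representation introduced above may help to make the two levels concrete. This is our own illustrative code (the helper names `two_opt`, `tour_len`, and `nen_ls` are ours, not from the algorithm listings); the 2-opt routine stands in for the lower level and, as noted above, does not guarantee an optimal permutation.

```python
def two_opt(nodes, cost):
    """2-opt local search on a fixed node set: repeatedly reverse a
    segment of the tour while doing so shortens it."""
    tour = list(nodes)
    m = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(m - 1):
            for j in range(i + 2, m if i > 0 else m - 1):
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % m]
                if cost[a][c] + cost[b][d] < cost[a][b] + cost[c][d]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

def tour_len(tour, cost):
    return sum(cost[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def nen_ls(clusters, spanning, cost):
    """Node exchange neighbourhood local search (sketch of NEN-LS):
    the upper level exchanges the chosen node of a single cluster,
    the lower level re-optimizes the permutation by 2-opt."""
    spanning = dict(spanning)
    tour = two_opt(list(spanning.values()), cost)
    best = tour_len(tour, cost)
    improved = True
    while improved:
        improved = False
        for k, nodes in clusters.items():
            for v in nodes:
                if v == spanning[k]:
                    continue
                cand = dict(spanning)
                cand[k] = v  # exchange the node of cluster k
                cand_tour = two_opt(list(cand.values()), cost)
                cand_len = tour_len(cand_tour, cost)
                if cand_len < best:
                    spanning, tour, best = cand, cand_tour, cand_len
                    improved = True
    return spanning, tour, best
```

On a toy instance with three 2-node clusters, the sketch walks to a local optimum reachable by single node exchanges, which need not be the global optimum, mirroring the behaviour analysed in Section 3.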

### 2.3 Variable Neighbourhood Search

Now we describe the combination of the two approaches into a variable neighbourhood search, as introduced in Hu and Raidl (2008). The neighbourhood structures of both CBLS and NEN-LS are used in this algorithm, where the NEN-LS neighbourhood is used only when the algorithm is in a local optimum with respect to the CBLS neighbourhood.

Let $S=(P,\pi)$ be a solution to the GTSP. We define the two neighbourhoods $N_1$ and $N_2$ based on the 2-opt neighbourhood $N$ and the node exchange neighbourhood $N'$ as

$N_1(S)=\{S'=(P',\pi')\mid \pi'\in N(\pi),\ P'=$ optimal set of nodes with respect to $\pi'\}$,

$N_2(S)=\{S'=(P',\pi')\mid P'\in N'(P),\ \pi'=$ order of clusters obtained by 2-opt from $\pi$ on $G[P']\}$.

Combining the two local searches of the cluster-based approach and the node-based approach is done by alternating between $N_1$ and $N_2$. Since the computational complexity of finding $P'$ for solutions in neighbourhood $N_1$ is lower than that of finding $\pi'$ for solutions in neighbourhood $N_2$, the first neighbourhood to search is $N_1$. When a local optimum has been found with respect to that neighbourhood, $N_2$ is searched. The resulting variable neighbourhood search (VNS) algorithm is given in Algorithm 4.
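The alternation between the two neighbourhoods can be sketched as a small control loop. This skeleton is our own rendering of the idea behind Algorithm 4, with the two improvement steps passed in as hypothetical callables:

```python
def vns(solution, improve_n1, improve_n2):
    """Variable neighbourhood search skeleton: search N1 (cluster-based)
    first; only when no N1 improvement exists, try a single N2
    (node-based) step, then return to N1.
    `improve_n1` / `improve_n2` return an improved solution or None."""
    while True:
        step = improve_n1(solution)
        if step is not None:
            solution = step          # improving step in N1
            continue
        step = improve_n2(solution)  # local optimum w.r.t. N1: try N2
        if step is None:
            return solution          # locally optimal for both neighbourhoods
        solution = step
```

The loop terminates only when the current solution is locally optimal with respect to both neighbourhoods, which is exactly the stopping condition exploited in Section 3.3.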

### 2.4 Node-Based (1 + 1) EA

In the node-based approach, the spanning nodes are selected on the upper level and the corresponding shortest Hamiltonian cycle is found on the lower level. The node-based (1 + 1) EA is presented in Algorithm 5. In contrast to the node-based local search algorithms of Section 2.2, the upper level uses a (1 + 1) EA instead of a local search method to search for the best spanning node set; hence, more than one change to the spanning set is possible in each iteration of the algorithm. The condition for accepting the new solution is a strict improvement.

Note that the lower level consists of an NP-hard problem; hence, when showing polynomial upper bounds on the expected optimization time of this algorithm, we consider only instances where the lower level can be solved in polynomial time. For the general case, there exist very effective solvers for the TSP, such as Concorde (Applegate et al., 2013), that can be used on the lower level. Note that the lower level of the cluster-based approach does not need to solve an NP-hard problem. Nevertheless, we prove that there are instances that can be solved in polynomial time with the node-based (1 + 1) EA, while the cluster-based (1 + 1) EA (Corus et al., 2016) needs exponential time to find an optimal solution for them.
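A minimal sketch of such an algorithm is given below. It is our own illustration, not a reproduction of Algorithm 5: we assume the standard mutation that resamples each cluster's chosen node independently with probability $1/m$, a brute-force exact solver on the lower level (adequate only for small $m$), and acceptance on strict improvement as stated above.

```python
import itertools
import random

def exact_tour_cost(nodes, cost):
    """Lower level: cost of an optimal Hamiltonian cycle on the chosen
    nodes, by brute force over permutations (small m only)."""
    first, rest = nodes[0], list(nodes[1:])
    return min(
        sum(cost[t[i]][t[(i + 1) % len(t)]] for i in range(len(t)))
        for p in itertools.permutations(rest)
        for t in [(first,) + p]
    )

def one_plus_one_ea(clusters, cost, steps, rng):
    """Node-based (1 + 1) EA sketch: each cluster's chosen node is
    replaced by a uniformly random node of that cluster with
    probability 1/m; the offspring is kept only on strict improvement."""
    m = len(clusters)
    spanning = {k: rng.choice(v) for k, v in clusters.items()}
    best = exact_tour_cost(list(spanning.values()), cost)
    for _ in range(steps):
        child = {k: (rng.choice(clusters[k]) if rng.random() < 1.0 / m else v)
                 for k, v in spanning.items()}
        c = exact_tour_cost(list(child.values()), cost)
        if c < best:  # strict improvement only
            spanning, best = child, c
    return spanning, best
```

Because every cluster may mutate in the same iteration, the EA can leave points that are locally optimal for the node exchange neighbourhood, which is the feature exploited in Section 4.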

## 3 Local Search Methods

### 3.1 Benefits of NEN-LS

In this section, we present an instance of the problem that cannot be solved by CBLS. In contrast to this, NEN-LS finds an optimal solution in polynomial time.

We consider the undirected complete graph $G_1=(V,E)$ illustrated in Figure 1. The graph has $n$ nodes and 6 clusters $V_i$, $1\le i\le 6$. Cluster $V_1$ contains $n/12$ white and $n/12$ grey nodes. We denote by $V_1^W$ the subset of white nodes and by $V_1^G$ the subset of grey nodes of cluster $V_1$. Each other cluster $V_j$, $2\le j\le 6$, consists of $n/6$ white nodes. The node set $V=\bigcup_{i=1}^{6}V_i$ of $G_1$ consists of the nodes of all clusters. For simplicity, Figure 1 shows only one node for each group of similar nodes with similar edges. The edge set $E$ consists of 4 types of edges, which we define in the following.

- Type $A$: Edges of this type have a cost of 1. All edges between clusters 2 and 3, between clusters 4 and 5, and between clusters 6 and 1 are of this type. $A=\{\{v_i,v_j\}\mid (v_i\in V_1^W\cup V_1^G \wedge v_j\in V_6)\vee (v_i\in V_2\wedge v_j\in V_3)\vee (v_i\in V_4\wedge v_j\in V_5)\}.$
- Type $B$: Edges of this type have a cost of 3. All edges connecting the nodes of cluster 1 to cluster 2 are of this type, as are the edges that connect the nodes of cluster 3 to cluster 4 and of cluster 5 to cluster 6. $B=\{\{v_i,v_j\}\mid (v_i\in V_1^W\cup V_1^G \wedge v_j\in V_2)\vee (v_i\in V_3\wedge v_j\in V_4)\vee (v_i\in V_5\wedge v_j\in V_6)\}.$
- Type $C$: Edges of this type have a cost of 4. All edges between the nodes of clusters 2 and 5 and between clusters 3 and 6 are of this type, as are all edges that connect white nodes of the first cluster to nodes of the fourth cluster. $C=\{\{v_i,v_j\}\mid (v_i\in V_1^W\wedge v_j\in V_4)\vee (v_i\in V_2\wedge v_j\in V_5)\vee (v_i\in V_3\wedge v_j\in V_6)\}.$
- Type $D$: Edges of this type have a large cost of 100. All edges of this complete graph other than those of type $A$, $B$, or $C$, including the edges between the grey nodes of the first cluster and the nodes of the fourth cluster, are of type $D$. $D=E\setminus (A\cup B\cup C).$
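The edge types of $G_1$ can be encoded directly; the following check (our own illustrative code, with nodes written as (cluster, colour) pairs) reproduces the tour costs that appear in the analysis below: 12 for the consecutive order, 15 for the all-white tour with $\pi=(1,4,5,2,3,6)$, and 111 when the grey node of cluster 1 is chosen instead.

```python
def g1_cost(u, v):
    """Edge cost in G_1. A node is a pair (cluster, colour), where
    colour is 'W' (white) or 'G' (grey, only in cluster 1)."""
    pair = (min(u[0], v[0]), max(u[0], v[0]))
    if pair in {(1, 6), (2, 3), (4, 5)}:
        return 1      # type A
    if pair in {(1, 2), (3, 4), (5, 6)}:
        return 3      # type B
    if pair in {(2, 5), (3, 6)}:
        return 4      # type C
    if pair == (1, 4) and (1, 'W') in (u, v):
        return 4      # type C: white nodes of cluster 1 to cluster 4
    return 100        # type D (includes grey of cluster 1 to cluster 4)

def g1_tour_cost(perm, grey_first=False):
    """Cost of the tour visiting white nodes in the order `perm`;
    optionally use the grey node for cluster 1."""
    nodes = [(k, 'G' if grey_first and k == 1 else 'W') for k in perm]
    return sum(g1_cost(nodes[i], nodes[(i + 1) % 6]) for i in range(6))
```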

We say that a permutation $\pi=(\pi(1),\ldots,\pi(n))$ visits the cities in consecutive order iff $\pi(i+1)=(\pi(i)\bmod n)+1$ for $1\le i\le n$, and that $\pi$ visits the cities in reverse-consecutive order iff $\pi(i)=(\pi(i+1)\bmod n)+1$ for $1\le i\le n$, where indices into $\pi$ are taken cyclically, that is, $\pi(n+1)=\pi(1)$.
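These two orders can be checked with a small predicate (our own illustrative code, mirroring the definition above):

```python
def consecutive(perm):
    """True iff perm visits the cities in consecutive cyclic order,
    i.e. perm[i+1] == (perm[i] mod n) + 1, wrapping around."""
    n = len(perm)
    return all(perm[(i + 1) % n] == (perm[i] % n) + 1 for i in range(n))

def reverse_consecutive(perm):
    """True iff perm visits the cities in reverse-consecutive order,
    i.e. the reversed permutation is consecutive."""
    return consecutive(perm[::-1])
```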

We now define a property, and then in Theorem 2 we analyse the behaviour of CBLS on $G_1$.

Property 1: *For the instance $G_1$, each solution visiting the clusters in consecutive or reverse-consecutive order is optimal.*

The graph consists of 6 clusters, which implies that 6 edges are needed for a tour. The least costly edges are of type $A$, and they are available only between 3 pairs of clusters. The second least costly type of edge is $B$, with a weight of 3. This implies that no tour can be less costly than $3\cdot 1+3\cdot 3=12$, which is the cost of every solution with a permutation in consecutive or reverse-consecutive order. $\Box$

Theorem 2: *Starting with the solution consisting of only white nodes and the permutation $\pi=(1,4,5,2,3,6)$, CBLS is not able to achieve any improvement.*

Here we analyse the behaviour of CBLS on $G_1$, starting with all white nodes and the permutation $\pi=(1,4,5,2,3,6)$. The initial solution contains three type-$A$ edges of cost 1 and three type-$C$ edges of cost 4. This implies a total cost of 15, which is not optimal. The edges belonging to this tour are marked solid in Figure 1. We claim that this solution is locally optimal; that is, it cannot be improved by a 2-opt step.

We show that, depending on which types of edges are removed from the current tour, every 2-opt move results in a tour of cost greater than 15.

Note that all 3 edges of cost 1 are already used in the current permutation which means that no additional edge of cost 1 can be added. We inspect the different 2-opt steps with respect to the edges that are removed.

If two edges of type $A$, which have a cost of 1, are removed, two other edges need to be added, and the least costly edges that can be added have a weight of 3. This makes the total cost of the resulting solution at least $15-2\cdot 1+2\cdot 3=19$, which is greater than 15.

If one edge of type $A$ (weight 1) and one edge of type $C$ (weight 4) are removed, then even with the minimum of two added edges of cost 3, the total cost is at least $15-1-4+2\cdot 3=16$, which is greater than 15.

For removing two edges of type $C$, there are three options, listed below. In all of them, the operation adds two edges of type $D$ to the solution, making the total cost greater than 15.

- Remove the edge between clusters 1 and 4 and the edge between clusters 2 and 5. This 2-opt move results in the permutation $\pi'=(1,5,4,2,3,6)$.
- Remove the edge between clusters 1 and 4 and the edge between clusters 3 and 6. This 2-opt move results in the permutation $\pi'=(1,3,2,5,4,6)$.
- Remove the edge between clusters 2 and 5 and the edge between clusters 3 and 6. This 2-opt move results in the permutation $\pi'=(1,4,5,3,2,6)$.

We have shown that no 2-opt step is accepted, which completes the proof. $\Box$

In contrast to the negative result for CBLS, we show that NEN-LS is able to reach an optimal solution when starting with the same solution.

Theorem 3: *Starting with $\pi=(1,4,5,2,3,6)$, NEN-LS finds an optimal solution for the instance $G_1$ in expected time $O(n)$.*

Starting with a solution consisting of only white nodes and the permutation $\pi=(1,4,5,2,3,6)$, no improvement can be found by a 2-opt local search (by arguments similar to those in the proof of Theorem 2). Therefore, the lower level is already locally optimal, and the solution does not change unless a grey node in cluster $V_1$ is selected.

Let $P=\{p_1,\ldots,p_6\}$ be the current set of spanning nodes. Selecting a grey node $p_1'$ for cluster $V_1$ leads to the set of spanning nodes $P'=\{p_1',p_2,\ldots,p_6\}$. $P'$ in combination with the current permutation $\pi=(1,4,5,2,3,6)$ has a total cost of 111, as there is one edge of type $D$ with cost 100. We now show that starting from this solution and performing a 2-opt local search on the lower level results in an optimal solution.

In order for a new permutation to be accepted on the lower level, a solution of cost at most 111 has to be obtained. We make a case distinction according to the different types of edges that are removed in a 2-opt operation. If we remove only edges of types $A$ and $C$, we reach a solution with a total cost greater than 111 by the arguments in the proof of Theorem 2. Hence, we only need to consider the case where at least one edge of type $D$ is removed.

There are two possibilities of removing one edge of type $D$ and one edge of type $C$, leading to the permutations $\pi'=(1,5,4,2,3,6)$ and $\pi''=(1,3,2,5,4,6)$. Both have two edges of type $D$, which implies a total cost greater than 111; they are therefore rejected.

Considering the case of removing the edge of type $D$ and one of the edges of type $A$, the only applicable 2-opt move leading to a different permutation results in the permutation $\pi'=(1,2,5,4,3,6)$. The resulting solution has cost 16 and is therefore accepted.

After reaching the permutation $\pi'=(1,2,5,4,3,6)$, the only acceptable 2-opt move leads to the global optimum $\pi_{\text{opt}}=(1,2,3,4,5,6)$.

The 2-opt neighbourhood for this instance has a constant size, as the number of clusters is constant. Moreover, all permutations that were investigated on the lower level were either locally optimal with respect to the spanning nodes or were improved only twice. Therefore, each lower-level optimization is done in constant time. Furthermore, it takes expected time $O(n)$ on the upper level to select a grey node for the first cluster. As a result, the expected optimization time is bounded by $O(n)$. $\Box$

### 3.2 Benefits of CBLS

We now introduce an instance on which NEN-LS* with a random initial solution finds it hard to obtain an optimal solution, while CBLS with an arbitrary starting solution obtains an optimum in polynomial time. The instance $G_2=(V,E)$ is illustrated in Figure 2. There are $m>2$ clusters, and each cluster contains exactly 2 nodes: one white and one black. We refer to the white and black nodes of cluster $i$, $1\le i\le m$, by $v_i^W$ and $v_i^B$, respectively. We call cluster $V_1$ the costly cluster, as the edges connecting this cluster to others are more costly than the edges connecting the other clusters together. The edge set $E$ of this complete graph is partitioned into 4 different types.

- Type $A$: Edges of this type have a weight of 1. All connections between white nodes of different clusters except cluster $V_1$ are of this type. $A=\{\{v_i^W,v_j^W\}\mid 2\le i<j\le m\}.$
- Type $B$: Edges of this type have a weight of 2. All connections between black nodes of different clusters are of this type. $B=\{\{v_i^B,v_j^B\}\mid 1\le i<j\le m\}.$
- Type $C$: Edges of this type have a weight of $m$. All edges between the white node of the costly cluster and the white nodes of the other clusters are of this type. $C=\{\{v_1^W,v_i^W\}\mid 2\le i\le m\}.$
- Type $D$: Edges of this type have a weight of $m^2$. All edges between a white and a black node are of this type. $D=E\setminus (A\cup B\cup C)=\{\{v_i^W,v_j^B\}\mid 1\le i,j\le m\}.$
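The cost structure of $G_2$ is simple enough to encode in a few lines; this check (our own illustrative code) reproduces the tour costs used in the proofs below: $2m$ for all black nodes, $2m+(m-2)$ for all white nodes, and at least $m^2$ for any mixed selection.

```python
def g2_cost(u, v, m):
    """Edge cost in G_2. A node is (cluster, colour) with colour 'W' or
    'B'; cluster 1 is the costly cluster."""
    if u[1] != v[1]:
        return m * m      # type D: any white-black edge
    if u[1] == 'B':
        return 2          # type B: black-black
    if 1 in (u[0], v[0]):
        return m          # type C: white of cluster 1 to another white
    return 1              # type A: white-white, cluster 1 not involved

def g2_tour_cost(colours, m):
    """Tour visiting clusters 1..m in order, with the given colours."""
    nodes = [(k + 1, colours[k]) for k in range(m)]
    return sum(g2_cost(nodes[i], nodes[(i + 1) % m], m) for i in range(m))
```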

We first claim that the optimal solution consists of only black nodes. Then we present our main theorems on the runtime behaviour of the two approaches on this instance.

Property 4: *For the graph $G_2$, any solution containing all black nodes is optimal.*

A solution that contains only black nodes has $m$ edges of type $B$ and therefore a total cost of $2m$.

Choosing a combination of black and white nodes implies a connection of type $D$ and therefore a solution of cost at least $m^2$. Choosing all white nodes implies 2 edges of cost $m$ connected to cluster $V_1$ and $m-2$ edges of cost 1. Hence, the total cost of such a solution is $2m+(m-2)$, which implies that a solution selecting all black nodes is optimal. $\Box$

We now show that CBLS always finds an optimal solution, as the lower level selects an optimal spanning node set, in time $O(n^3)$.

Theorem 5: *Starting with an arbitrary permutation $\pi$, CBLS finds an optimal solution for $G_2$ in time $O(n^3)$.*

As mentioned in Property 4, visiting the black nodes of the graph in any order is a globally optimal solution. For each permutation $\pi$, the optimal set of nodes is given by all black nodes and is found when constructing the first spanning node set. This set is constructed in time $O(n^3)$ by the shortest path algorithm given in Karapetyan and Gutin (2012). $\Box$

In contrast to the positive result for CBLS, NEN-LS* is extremely likely to get stuck in a local optimum if the initial spanning node set is chosen uniformly at random, even though NEN-LS* uses an exact solver for the lower level.

Theorem 6: *Starting with a spanning node set $P$ chosen uniformly at random, NEN-LS\* gets stuck in a local optimum of $G_2$ with probability $1-e^{-\Omega(n)}$.*

Selecting $P=\{p_1,\ldots,p_m\}$ uniformly at random, the expected number of white nodes is $m/2$. Using Chernoff bounds, the number of white nodes is at least $m/4$ with probability $1-e^{-\Omega(m)}=1-e^{-\Omega(n)}$, as $n=2m$. The same applies to the number of black nodes.

Since connecting white nodes to black nodes is costly, the lower level selects a permutation which forms a chain of white nodes and a chain of black nodes, connected to each other by only two edges of type $D$ to form a cycle.

Let $p_1$ be the selected node of the costly cluster $V_1$. If $p_1$ is initially white, the lower level places it at one border between the black chain and the white chain to avoid using one of the edges of type $C$. This situation is illustrated in Figure 3a. If $p_1$ is initially black, the initial solution looks like Figure 3b, in which the costly cluster is placed somewhere in the black chain. We now present two auxiliary claims, which will be used in the rest of the proof of Theorem 6.

Claim 7: *Starting with a random initial solution, with probability $1-e^{-\Omega(n)}$, for all clusters $V_i$, $2\le i\le m$, a change from black to white is improving, while no change from white to black is improving.*

As mentioned earlier, a random initial node set contains both kinds of nodes with probability $1-e^{-\Omega(n)}$; therefore, the exact solver of the lower level forms a chain of black nodes and a chain of white nodes. Changing a black node $p_i$, $i\neq 1$, to white shortens the chain of black nodes by removing an edge of type $B$ and cost 2, while the chain of white nodes gets longer by adding an edge of type $A$ and cost 1. The new solution is hence improved in terms of fitness and is accepted by the algorithm. On the other hand, the opposite move increases the cost of the solution; therefore, in a cluster $V_i$, $i\neq 1$, a change from white to black is not accepted.

The number of selected white nodes for clusters $V_i$, $i\neq 1$, never decreases; therefore, at all times during the run of the algorithm we have both a chain of black nodes and a chain of white nodes, until all black nodes have changed to white. $\Box$

Claim 8: *As long as there is at least one cluster $V_i$, $i\neq 1$, for which the black node is selected, a change from white to black is accepted for cluster $V_1$, and the opposite change is rejected.*

Since there is at least one cluster $V_i$, $i\neq 1$, for which the black node is selected, the current solution and the new solution both have a chain of black nodes and a chain of white nodes. If the white node of cluster $V_1$ is selected in the current solution, changing it to black shortens the chain of white nodes by removing the edge of type $C$ and lengthens the chain of black nodes by adding an edge of type $B$. This move is accepted because the new solution is improved in terms of cost. The result is illustrated in Figure 3b. Using similar arguments, if the black node of cluster $V_1$ is selected in the current solution, changing it to white is rejected because it increases the cost. $\Box$

Using Claim 7, we can conclude that all nodes $p_i$, $i\neq 1$, are gradually changed to white by NEN-LS*. As long as at least one node $p_i$, $i\neq 1$, is black, a mutation from white to black for $p_1$ is accepted, and this node remains black. When all other nodes have changed to white, if $p_1$ is black at this point, it is connected to two white nodes by edges of type $D$ and cost $m^2$, as illustrated in Figure 4a. If it changes to white, these two edges are removed and two edges of type $C$ and cost $m$ are added to the solution (Figure 4b). This change is accepted because two edges of cost $m$ are less costly than two edges of cost $m^2$.

This eventually results in a local optimum with all white nodes selected. The algorithm needs to traverse the clusters on the upper level only twice, which gives $O(m)$ iterations on the upper level for the algorithm to get stuck in a local optimum. In the first traversal, the white node is selected for all clusters except the costly cluster $V_1$. In the second traversal, the white node is selected for $V_1$ as well (Figure 4b). This completes the proof of Theorem 6.

### 3.3 Benefits of VNS

In this section, we introduce an instance of the problem for which both of the neighbourhood search algorithms discussed above fail to find the optimal solution. Nevertheless, the combination of these approaches as described in Algorithm 4 finds the global optimum.

We consider the undirected complete graph $G_3$ shown in Figure 5, which has 6 clusters, each containing $n/6$ nodes. There are three kinds of nodes in this graph: white, grey, and black. The first cluster consists of $n/12$ black, $n/24$ white, and $n/24$ grey nodes. All other clusters contain $n/12$ white and $n/12$ black nodes. We refer to the sets of white, black, and grey nodes of cluster $V_i$ by $V_i^W$, $V_i^B$, and $V_i^G$, respectively.

There are 5 types of edges in this graph, 4 of which are quite similar to the 4 types of the instance in Section 3.1. The remaining type, named type $D$ below, consists of the edges between black nodes of consecutive clusters and has a cost of 1.5.

- Type $A$: Edges of this type have a cost of 1. $A=\{\{v_i,v_j\}\mid (v_i\in V_1^W\cup V_1^G \wedge v_j\in V_6^W)\vee (v_i\in V_2^W\wedge v_j\in V_3^W)\vee (v_i\in V_4^W\wedge v_j\in V_5^W)\}.$
- Type $B$: Edges of this type have a cost of 3. $B=\{\{v_i,v_j\}\mid (v_i\in V_1^W\cup V_1^G \wedge v_j\in V_2^W)\vee (v_i\in V_3^W\wedge v_j\in V_4^W)\vee (v_i\in V_5^W\wedge v_j\in V_6^W)\}.$
- Type $C$: Edges of this type have a cost of 4. $C=\{\{v_i,v_j\}\mid (v_i\in V_1^W\wedge v_j\in V_4^W)\vee (v_i\in V_2^W\wedge v_j\in V_5^W)\vee (v_i\in V_3^W\wedge v_j\in V_6^W)\}.$
- Type $D$: Edges of this type have a cost of 1.5. $D=\{\{v_i,v_j\}\mid (v_i\in V_k^B\wedge v_j\in V_{k+1}^B,\ 1\le k\le 5)\vee (v_i\in V_6^B\wedge v_j\in V_1^B)\}.$
- Type $F$: Edges of this type have a large cost of 100. All edges of this complete graph other than those of type $A$, $B$, $C$, or $D$ are of type $F$. Note that the edges between the grey nodes of the first cluster and the white nodes of the fourth cluster are also of this type. $F=E\setminus (A\cup B\cup C\cup D).$
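As for the previous instances, the edge types of $G_3$ can be encoded and checked directly (our own illustrative code): the black ring in consecutive order costs 9, the white tours behave exactly as in $G_1$ (12 in consecutive order, 15 for $\pi=(1,4,5,2,3,6)$), and the grey solution in consecutive order costs 12.

```python
def g3_cost(u, v):
    """Edge cost in G_3. A node is (cluster, colour) with colour 'W',
    'G' (grey, only in cluster 1), or 'B'."""
    (cu, su), (cv, sv) = u, v
    pair = (min(cu, cv), max(cu, cv))
    ring = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)}
    if su == 'B' and sv == 'B':
        return 1.5 if pair in ring else 100   # type D on the black ring
    if su == 'B' or sv == 'B':
        return 100                            # type F: black to non-black
    if pair in {(1, 6), (2, 3), (4, 5)}:
        return 1                              # type A
    if pair in {(1, 2), (3, 4), (5, 6)}:
        return 3                              # type B
    if pair in {(2, 5), (3, 6)}:
        return 4                              # type C
    if pair == (1, 4) and 'G' not in (su, sv):
        return 4                              # type C: white 1 to white 4
    return 100                                # type F (incl. grey 1 to 4)

def g3_tour_cost(perm, colour):
    """Tour through the clusters in order `perm`; colour(k) gives the
    colour of the node chosen in cluster k."""
    nodes = [(k, colour(k)) for k in perm]
    return sum(g3_cost(nodes[i], nodes[(i + 1) % 6]) for i in range(6))
```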

We now show that an optimal solution visits a black node from each cluster in consecutive or reverse-consecutive order. Then, in Theorem 8, we show that the algorithms CBLS and NEN-LS may get stuck in local optima.

For the graph $G_3$, an optimal solution visits all black nodes in consecutive or reverse-consecutive order.

There are three kinds of nodes in this graph: white, grey, and black. Any solution that contains black nodes and one other kind of node has at least two edges of type $F$ and weight 100, which makes the total cost of that solution more than 200. A solution that visits all black nodes in consecutive or reverse-consecutive order has 6 edges of type $D$ and a total cost of 9. On the other hand, if we consider only white and grey nodes, our graph is the same as the instance of Section 3.1, with an optimal solution of cost 12. Therefore, visiting all black nodes in consecutive or reverse-consecutive order, with a cost of 9, is optimal. $\Box$

Starting with a spanning node set $P$ consisting of only white nodes and the permutation $\pi=(1,4,5,2,3,6)$, CBLS and NEN-LS get stuck in a local optimum of $G_3$.

We first show that the mentioned initial solution is a local optimum for CBLS. The cost of this solution is 15, which is less than the cost of any of the edges between black nodes and white or grey nodes. Therefore, no solution consisting of black nodes and another kind of node can be accepted. If we do not consider the black nodes and their edges, then $G_3$ is similar to $G_1$, and according to Theorem 2, starting with the initial permutation, no improvement can be achieved with Algorithm 1. In particular, the permutation $\pi'=(1,2,3,4,5,6)$ is not reachable by searching the 2-opt neighbourhood of the initial solution. A solution consisting of black nodes is less costly only if they are visited in the optimal order $\pi'=(1,2,3,4,5,6)$, which, as shown, is not reachable by CBLS.

Now we investigate the behaviour of NEN-LS, which performs a local search based on the node-based approach, on this instance. We show that this algorithm finds another locally optimal solution. Starting with the initial solution specified in the theorem, all black nodes cannot be selected in one step, and switching to any single black node is rejected because using two edges of type $F$ is then inevitable, which makes the solution worse than the initial one. The only spanning node set left in the node-exchange neighbourhood contains the grey node of the first cluster. For this selection of nodes, the 2-opt TSP solver of the lower level finds the optimal order of clusters as described in the proof of Theorem 3 of Section 3.1, which forms a solution of cost 12. From this point, no node-exchange-neighbourhood search finds a better solution.$\square$

Using a variable-neighbourhood search that combines the two hierarchical approaches, we are able to escape these local optima. In the following, we show that VNS obtains an optimal solution when starting with the same solution as investigated in Theorem 8.

Starting with a spanning node set $P$ consisting only of white nodes and the permutation $\pi=(1,4,5,2,3,6)$, VNS obtains an optimal solution in time $O(n^3)$.

This approach starts with the cluster-based algorithm and switches between the two algorithms whenever CBLS is stuck in a locally optimal solution. As we saw, starting from the initial solution, Algorithm 1 cannot find any better solution because the initial solution is a local optimum for that algorithm. Establishing this requires searching the whole 2-opt neighbourhood, which can be done in constant time because the number of clusters is fixed. Then NEN-LS manages to find another solution with the permutation $\pi'=(1,2,3,4,5,6)$. This can also be done in polynomial time, as described in Theorem 3 of Section 3.1. Then CBLS uses this as a starting solution. As $\pi'=(1,2,3,4,5,6)$ is an optimal permutation, the optimal set of nodes $P$ consisting of all black nodes is found in time $O(n^3)$ on the lower level.$\square$
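The alternation used in this proof can be sketched as follows. Here `cluster_based_ls` and `node_based_ls` stand in for the CBLS and NEN-LS routines; they are assumptions of this sketch, each mapping a solution to an improved one, or returning it unchanged when it is a local optimum of the corresponding neighbourhood.

```python
def variable_neighbourhood_search(solution, cluster_based_ls, node_based_ls):
    """Alternate between the two hierarchical local searches until the
    current solution is locally optimal for both neighbourhoods."""
    while True:
        improved = cluster_based_ls(solution)  # cluster-based phase (CBLS)
        improved = node_based_ls(improved)     # node-exchange phase (NEN-LS)
        if improved == solution:               # neither phase made progress
            return solution
        solution = improved
```

The loop terminates exactly when one full round of both searches yields no improvement, which is the stopping criterion implicit in the proof above.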

The investigations of this section have pointed out that combining the two hierarchical approaches into a variable neighbourhood search is beneficial because each approach helps escape the local optima of the other.

## 4 Simple Evolutionary Algorithms

A simple evolutionary algorithm with the cluster-based approach for solving GTSP has been studied in Corus et al. (2016), where a hard instance is presented to prove an exponential lower bound, holding with high probability, on the runtime of that algorithm. In this section, we analyse the behaviour of the node-based (1 + 1) EA presented in Algorithm 5 on that instance (Section 4.1). Moreover, we prove a lower bound on the optimization time of the node-based (1 + 1) EA in Section 4.2. Our analysis gives an exponential lower bound on the optimization time of the upper level, which implies exponential time even if the lower level is solved efficiently.

### 4.1 Behaviour of Node-Based (1 + 1) EA on the Hard Instance of Cluster-Based (1 + 1) EA

In this section, we show that the hard instance for cluster-based (1 + 1) EA introduced in Corus et al. (2016) can be solved in polynomial time by the node-based approach. Moreover, we perform experiments in Section 4.1.2, which confirm the theoretical results of this section.

The hard instance of the cluster-based (1 + 1) EA (Corus et al., 2016) is illustrated in Figure 6. In this instance, there are $m$ clusters, each comprising two nodes: a white node, which represents the suboptimal node, and a black node, which is the optimal node. All white nodes are connected to each other with edges of cost 1, except for the white nodes of consecutive clusters (shown in the figure), which are connected with edges of cost 2. All edges between a black node and a white node have a cost of $m^2$. All edges between black nodes also have a cost of $m^2$, except the ones that connect consecutive clusters (shown in the figure), which have a small cost of $1/m$.

The optimal node selection is to select all black nodes, and the optimal permutation of clusters is a clockwise or anti-clockwise order of them. In this permutation, the costs of edges between black nodes and between white nodes are $1/m$ and 2, respectively. Therefore, the optimal solution consists of all $1/m$ edges, while the local optimum selects all white nodes in an order that does not use any of the 2-weighted edges. For this instance of the problem, it is proved in Corus et al. (2016) that with an overwhelmingly high probability, the proposed cluster-based (1 + 1) EA needs exponential time to find the optimal solution.
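For concreteness, the edge costs of this instance can be written down directly. The node encoding below (cluster index plus a colour bit) is our own choice for illustration, and we assume the consecutive-cluster relation is cyclic, as the clockwise optimal order suggests.

```python
def build_gg_cost(m):
    """Edge costs of the hard instance with m clusters.

    A node is a pair (i, colour) with cluster index i and
    colour 0 = white (suboptimal) or 1 = black (optimal).
    """
    def cost(u, v):
        (i, cu), (j, cv) = u, v
        consecutive = j == (i + 1) % m or i == (j + 1) % m
        if cu == 0 and cv == 0:        # white-white: 2 if consecutive, else 1
            return 2 if consecutive else 1
        if cu == 1 and cv == 1:        # black-black: 1/m if consecutive, else m^2
            return 1 / m if consecutive else m ** 2
        return m ** 2                  # every black-white edge costs m^2
    return cost
```

Under this encoding, the optimal tour uses the $m$ consecutive black-black edges and has total cost $m\cdot\frac{1}{m}=1$.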

#### 4.1.1 Theoretical Analysis of Node-Based (1 + 1) EA on $G_G$

Here we prove that, with probability $1-o(1)$, $G_G$ can be solved in polynomial time by the node-based approach. We call this a high probability since, by definition, $o(1)$ approaches 0 as the input size approaches infinity. In order to prove this, we first need to analyse how an optimal TSP tour can be found on the lower level of this approach. Although solving TSP is NP-hard in general, it can be solved in polynomial time for the instances induced by picking one node from each cluster of the graph $G_G$. Algorithm 6 provides such a method. In Step 3 of this algorithm, if the number of white nodes is at most 3, the shortest path can be found by checking all configurations. If the number of white nodes is more than 3, only edges of cost 1 are used in the shortest path, since every white node is connected to $m-2$ other white nodes with a cost of 1. Such a path can be found by a depth-first search combined with checking all configurations of connecting the last 4 nodes of the path. Therefore, Step 3 needs time $O(m)$ to find the shortest path on white nodes. Since the time required for the other steps of the algorithm is also at most $O(m)$, we conclude that Algorithm 6 runs in time $O(m)$.

To prove that Algorithm 6 finds the optimal tour with respect to the spanning set fixed on the upper level, we first present two properties of the solutions of the lower level. Then, in Lemma 12, we show that Algorithm 6 finds the optimal tour.

Let $w$ be the number of white nodes selected on the upper level. If $2\le w\le 3$ and all the selected white nodes are from consecutive clusters, then Step 3 of Algorithm 6 uses one edge of cost 2 (and one edge of cost 1 in case $w=3$). Otherwise, it uses only edges of cost 1.

Let $C(S)$ denote the total cost of a solution $S$. Also, let $Y$ and $X$ be two solutions with $r$ and $s$ edges of weight $m^2$, respectively. If $r>s$, then $C(Y)>C(X)$.

Let $w>0$ and $r=m-w$ be the numbers of white and black nodes selected on the upper level, respectively. Moreover, let $s$ be the number of black nodes for which the selected node in the preceding cluster, with respect to the optimal solution, is also black. Algorithm 6 finds an optimal tour with a total cost of
$$s\cdot\frac{1}{m}+(m-r)+(r-s+1)\cdot m^2$$
if the conditions of Property 12 hold, and
$$s\cdot\frac{1}{m}+(m-r-1)+(r-s+1)\cdot m^2$$
otherwise.

There are $r$ black nodes in the spanning set; therefore, in order to form a Hamiltonian cycle, at least $r+1$ edges connected to these nodes are required. Since all edges connected to black nodes, except for $s$ edges of cost $1/m$, are of cost $m^2$, at least $r+1-s$ edges of cost $m^2$ are needed, and failing to select any of the edges of cost $1/m$ increases this number. Moreover, according to Property 13, the optimal solution of the lower level has a minimum number of $m^2$-edges. Therefore, the lower level has to select all edges of cost $1/m$, which is done in Step 2 of the algorithm.

On the other hand, in order to minimise the number of white-black connections, which are of cost $m^2$, all white nodes need to form one chain, which is done in Step 3 of the algorithm. This chain is connected to two black nodes at its two ends. If the conditions of Property 12 do not hold, then only edges of cost 1 are used in forming the white chain; otherwise, one edge of cost 2 is also required. Therefore, the cost of forming the white chain is $m-r-1$ in the former case and $m-r$ in the latter case.

So far, we have formed some chains of black nodes and one chain of white nodes. In order to connect these chains together, we have to use $r+1-s$ edges of weight $m^2$, which is done in Step 4 of the algorithm. Summing up, the optimal tour on the selected set of nodes consists of $s$ edges of weight $1/m$ and $r-s+1$ edges of weight $m^2$. Furthermore, if the conditions of Property 12 hold, it contains $m-r-2$ edges of cost 1 and one edge of cost 2; otherwise, it contains $m-r-1$ edges of cost 1. Altogether, these edges give the total cost as stated in the lemma.$\square$
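The case analysis of this proof can be summarised in a small helper that evaluates the cost formula of Lemma 12; this is a sketch, where `property_holds` indicates whether the conditions of Property 12 are met.

```python
def lower_level_cost(m, w, s, property_holds):
    """Cost of the optimal lower-level tour (Lemma 12, for w > 0):
    s edges of weight 1/m, (r - s + 1) edges of weight m^2, and a white
    chain of cost m - r (one cost-2 edge) or m - r - 1 (cost-1 edges
    only), where r = m - w is the number of black nodes."""
    r = m - w
    white_chain = (m - r) if property_holds else (m - r - 1)
    return s * (1 / m) + white_chain + (r - s + 1) * m ** 2
```

For example, with $m=10$, $w=2$ and both white nodes in consecutive clusters ($s=7$), this gives $\frac{m-3}{m}+2+2m^2=202.7$, matching the case analysis used later in the section.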

From now on, we consider only the number of iterations on the upper level. Note that the lower level uses Algorithm 6, which adds only a factor of $O(m)$ to our analysis. We start analysing the behaviour of the node-based (1 + 1) EA on $G_G$ with a couple of definitions that help us describe the TSP tour that the lower level forms. In the following, $w$ denotes the number of white nodes in the solution.

A **black block of size $l$**, $l>0$ an integer, is a path on exactly $l$ consecutive black nodes, which consists of $l-1$ edges of cost $1/m$.

The two end nodes of a black block are connected to edges of cost $m^2$. Black blocks of sizes 1, 2, and 3 are illustrated in Figure 7.

A solution is critical if $3\le w\le 4$ and all the selected white nodes are from consecutive clusters.

Note that a one-bit flip on a white node of a critical solution results in either a solution with a greater number of black blocks, or a solution that fulfils the conditions of Property 12. In the rest of this section, we prove that with high probability, in time $O(m^2)$, the algorithm either finds the optimal solution or reaches a critical solution. From a critical solution, we prove that a 2-bit flip can make an improvement and that, with high probability, the optimal solution is found in time $O(m^2\log m)$. Lemmata 17 and 20 prove the upper bound if we do not face a critical solution, and Lemma 21 investigates the behaviour of the algorithm otherwise. Lemmata 15 and 16 help us with the proof of Lemma 17.

Other than in a situation where $w=3$ and all selected white nodes are from consecutive clusters, $w$ can only increase in a step in which the number of $m^2$-edges decreases.

Having $r=m-w$ black nodes, Lemma 12 gives the total cost of a solution as $s\cdot\frac{1}{m}+(m-r-1)+(r-s+1)\cdot m^2$ when the conditions of Property 12 do not hold. Here $s$ is the number of edges of weight $1/m$, $m-r-1$ is the number of edges of weight 1 (which connect white nodes), and $r-s+1$ is the number of edges of weight $m^2$. When $w$ increases, the number of edges of weight 1 increases, and since the total number of edges stays the same, either $s$ or $r-s+1$ has to decrease. Decreasing $s$ cannot compensate for the increase in the total cost that is caused by adding new edges of weight 1. Therefore, in order to prevent an increase in the total cost, $r-s+1$, the number of $m^2$-edges, has to decrease.

For the situation where $w=2$ and the selected white nodes are from consecutive clusters, according to Lemma 12 the total cost is $\frac{m-3}{m}+2+2\cdot m^2$. Observe that all solutions with $w\ge 4$ have a larger cost. For $w=3$, the solution either needs more than two $m^2$-edges, which is clearly more costly, or all three white nodes need to be from consecutive clusters. In this situation, Lemma 12 gives the total cost as $\frac{m-4}{m}+3+2\cdot m^2$, which is also more costly and rejected by the algorithm.$\square$

In a phase of $C\cdot m^2$ steps, $C$ a constant, if we do not face a critical solution, then with probability $1-o(1)$ the sum of all increments on the number of white nodes is at most $5m$.

From Lemma 15 we know that the number of white nodes can increase only when the number of black blocks is reduced. Since the number of blocks is bounded by $m$, this can happen in at most $m$ steps. At each of those steps, either two black blocks are merged or a black block mutates to white, and some additional nodes may also mutate. We prove that, with high probability, no block of size larger than 3 mutates to white in this phase, which results in at most $3m$ black-to-white mutations from blocks. We also prove that the number of additional nodes that mutate at the same steps is, with high probability, bounded by $2m$. Therefore, in this phase, the sum of all increments on the number of white nodes is at most $5m$.

At each step, each cluster is selected for mutation with probability $\frac{1}{m}$, and its white node is then selected with probability $\frac{1}{2}$. Therefore, the probability that a given block of size at least 4 mutates to white in one step is at most $\frac{1}{(2m)^4}$. Since the number of blocks is bounded by $m$, the probability that at least one block of size at least 4 mutates to white in one step is at most $\frac{1}{16m^3}$. Hence, the probability that at least one of them mutates to white in a phase of $C\cdot m^2$ steps is $O(\frac{1}{m})$. Therefore, with probability at least $1-o(1)$, no black block of size 4 or more mutates to white. In other words, all blocks that mutate to white in a phase of $C\cdot m^2$ steps are of size at most 3. This implies that at most $3m$ nodes can belong to the blocks that mutate from black to white in a phase of $C\cdot m^2$ steps.
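The union-bound computation of this step, written out:

```latex
% one step, one fixed block of size at least 4:
\left(\tfrac{1}{2m}\right)^{4} = \tfrac{1}{16m^{4}},
% union bound over at most m blocks:
m \cdot \tfrac{1}{16m^{4}} = \tfrac{1}{16m^{3}},
% union bound over C\cdot m^2 steps:
C\,m^{2} \cdot \tfrac{1}{16m^{3}} = O\!\left(\tfrac{1}{m}\right).
```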

However, at each step in which the number of blocks is reduced, some additional nodes may also mutate to white. Let $X_{ij}$ be a random variable such that $X_{ij}=1$ if node $j$ is selected for mutation at step $i$. Note that we need to consider only the steps in which the number of black blocks is reduced, because according to Lemma 15 a mutation from black to white is not accepted in other steps. Since the number of blocks is bounded by $m$, there are at most $m$ steps in which the number of blocks is reduced. The expected value of $X=\sum_{i=1}^{m}\sum_{j=1}^{m}X_{ij}$ is $E[X]=\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{1}{m}=m$, and by Chernoff bounds we get $\mathrm{Prob}(X\ge 2m)\le e^{-\Omega(m)}$. Therefore, with probability $1-e^{-\Omega(m)}$ at most $2m$ additional nodes mutate during the steps at which the number of black blocks is reduced, and with probability $(1-o(1))(1-e^{-\Omega(m)})=1-o(1)$ at most $2m$ additional nodes mutate during the considered phase. As a result, together with at most $3m$ black-to-white mutations from blocks, we find that with probability $1-o(1)$ at most $3m+2m=5m$ black nodes mutate to white in a phase of $C\cdot m^2$ steps.$\square$

If we do not face a solution with no black nodes or a critical solution, then with probability $1-o(1)$, a solution with $w=0$ is found in time $24e\cdot m^2$.

According to Lemma 16, with probability $1-o(1)$, during a phase of $C\cdot m^2$ steps, $C$ a constant, at most $5m$ black nodes turn white. Since the number of white nodes in the initial solution is at most $m$, at most $6m$ steps that increase the number of black nodes are sufficient for reaching a situation with $w=0$.

While the number of black nodes is at least one and we have not yet reached $w=0$ or a critical solution, there is always at least one white node whose mutation to black increases the length of a black block. This move is accepted by the algorithm because it shortens the white path by removing an edge of cost 1, while adding one edge of cost $1/m$ to the black block. At each step, the selected node of each cluster is exchanged with probability $\frac{1}{2m}$. Therefore, the probability that only the mentioned mutation happens at one step is at least $\frac{1}{2m}\cdot\left(1-\frac{1}{m}\right)^{m-1}\ge\frac{1}{2em}$, where $\left(1-\frac{1}{m}\right)^{m-1}$ is the probability that no other mutation happens at that step.
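The estimate on the single-mutation probability uses the standard bound $\left(1-\frac{1}{m}\right)^{m-1}\ge \frac{1}{e}$:

```latex
\Pr[\text{only the chosen node flips}]
  = \frac{1}{2m}\left(1-\frac{1}{m}\right)^{m-1}
  \ge \frac{1}{2m}\cdot\frac{1}{e}
  = \frac{1}{2em}.
```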

Let $X=\sum_{i=1}^{T}X_i$, where $X_i$ is a random variable such that $X_i=1$ if only a single white node is mutated to black while other nodes are not changed at step $i$, and $X_i=0$ otherwise. At each step $i$ before reaching $w=0$ or a critical solution, $\mathrm{Prob}(X_i=1)\ge\frac{1}{2em}$. Considering a phase of $T=24e\cdot m^2$ steps, by linearity of expectation we get $E[X]\ge 24e\cdot m^2\cdot\frac{1}{2em}=12m$. Using Chernoff bounds we get $\mathrm{Prob}\left(X\le\left(1-\frac{1}{2}\right)12m\right)\le e^{-\Omega(m)}$. As a result, in a phase of $24e\cdot m^2$ steps, we either find a solution with $w=0$, or with probability $1-e^{-\Omega(m)}$ at least $6m$ white nodes mutate to black, which results in a situation with $w=0$ because $6m$ is an upper bound on the number of white-to-black mutations needed. Overall, with probability $1-o(1)$, a solution with $w=0$ is reached in time $24e\cdot m^2$.$\square$

The initial solution, chosen uniformly at random, has at least $\frac{m}{48}$ single black nodes with probability $1-e^{-\Omega(m)}$.

Considering the consecutive clusters with respect to their optimal permutation, for any specific cluster, the black (or white) node is selected in the following cluster with probability $1/2$. As a result, any particular selection of nodes in 3 consecutive clusters happens with probability $(1/2)^3$. There are at least $m/3$ disjoint sets of 3 consecutive clusters; therefore, the expected number of single black nodes is at least $\frac{m}{3\cdot 8}$. Using Chernoff bounds and letting $X$ be the number of single black nodes in the initial solution, we have $P\left(X<\left(1-\frac{1}{2}\right)\frac{m}{3\cdot 8}\right)\le e^{-\frac{m}{3\cdot 8}\cdot\frac{1}{8}}$.
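Spelled out, this Chernoff step uses $\mu=E[X]\ge\frac{m}{24}$ and $\delta=\frac{1}{2}$ in the bound $P(X<(1-\delta)\mu)\le e^{-\mu\delta^{2}/2}$:

```latex
P\!\left(X<\frac{m}{48}\right)
  \le e^{-\frac{m}{24}\cdot\frac{(1/2)^{2}}{2}}
  = e^{-\frac{m}{192}}
  = e^{-\Omega(m)}.
```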

As a result, with probability $1-e^{-\Omega(m)}$ the initial solution has at least $\frac{m}{48}$ single black nodes as described.$\square$

In the proof of the next lemma, we use the Simplified Drift Theorem (Theorem 19) presented in Oliveto and Witt (2011, 2012). Consider a random variable $X_t$, $t\ge 0$, with positive values that is changed in a stochastic process, and an interval $[a,b]$, $a\ge 0$. The Simplified Drift Theorem shows that, with high probability, the lower limit of the interval is not reached by $X_t$ if the starting point is above $b$, the average drift of the value of the random variable is positive, and the probability of large changes is small. In this theorem, $F_t$ denotes a filtration on states. In the proof of Lemma 20, we analyse the changes in the size of a large black block, and the filtration is defined according to the steps in which an accepted change happens on the size of that block.

(Simplified Drift Theorem, Oliveto and Witt, 2012). Let $X_t$, $t\ge 0$, be real-valued random variables describing a stochastic process over some state space. Suppose there exist an interval $[a,b]\subseteq\mathbb{R}$, two constants $\delta,\varepsilon>0$ and, possibly depending on $l:=b-a$, a function $r(l)$ satisfying $1\le r(l)=o(l/\log(l))$ such that for all $t\ge 0$ the following two conditions hold:

$E[X_{t+1}-X_t \mid F_t \wedge a<X_t<b]\ge\varepsilon$,

$\mathrm{Prob}(|X_{t+1}-X_t|\ge j \mid F_t \wedge a<X_t)\le \frac{r(l)}{(1+\delta)^{j}}$ for $j\in\mathbb{N}$.

Then there is a constant $c^*>0$ such that for $T^*:=\min\{t\ge 0: X_t\le a \mid F_t \wedge X_0\ge b\}$ it holds that $\mathrm{Prob}(T^*\le 2^{c^*l/r(l)})=2^{-\Omega(l/r(l))}$.

With probability $1-o(1)$, the number of black nodes is at least one during $24e\cdot m^2$ steps of the node-based (1 + 1) EA.

Let $r$ be the number of black blocks in the solution. From Lemma 18 we know that, with high probability, the initial solution contains at least $\frac{m}{48}$ single black nodes. As a result, in the initial solution $r=\Omega(m)$.

In order to reach a solution in which all nodes are white, the number of black blocks needs to be reduced. Consider the step at which $r\le m^{\epsilon}$ holds for the first time, where $0<\epsilon<1$ is a small constant. At this step, $r\ge\frac{m^{\epsilon}}{2}$; otherwise, at least $\frac{m^{\epsilon}}{2}$ mutations would have had to happen in one step, which is exponentially unlikely.

We now show that this stage is reached in a phase of $m^{1+\epsilon}$ steps. Since each single black node has a probability of $P_1^{+}+P_1^{-}\ge\frac{1}{em}$ of changing at each step, the expected number of steps required to change a given single black node is at most $em$. Therefore, by Markov's inequality, the probability of not changing a given single black node in a phase of $2em$ steps is at most $\frac{1}{2}$. Considering a phase of $m^{1+\epsilon}$ steps, we see that the probability of not changing a given single black node is at most $e^{-\Omega(m^{\epsilon})}$. There are at most $m^{\epsilon}$ single black nodes, and by a union bound, with probability at most $m^{\epsilon}\cdot e^{-\Omega(m^{\epsilon})}=e^{-\Omega(m^{\epsilon})}$ at least one of them does not change. Therefore, with probability $1-e^{-\Omega(m^{\epsilon})}$, all these nodes face a change in the mentioned phase.

Now consider a phase of $m^{3/2}$ steps. With probability at most $\frac{4m^{\epsilon}}{2m^2}\cdot m^{3/2}=\frac{2m^{\epsilon}}{\sqrt{m}}$, the size of the large block is decreased at least once. Therefore, with probability $1-O(m^{\epsilon-1/2})=1-o(1)$, its size is not decreased in the mentioned phase.

On the other hand, at each step there is a probability of at least $P_l^{+}\ge\frac{1}{e\cdot m}$ that the size of the block is increased. Let $X_i$ be a random variable such that $X_i=1$ if the size of the large block is increased at step $i$, and $X_i=0$ otherwise. The expected number of increases in the size of that block in a phase of $m^{3/2}$ steps is $E\left[\sum_{i=0}^{m^{3/2}}X_i\right]\ge\frac{\sqrt{m}}{e}$. Moreover, by Chernoff bounds, we have $\sum_{i=0}^{m^{3/2}}X_i\ge\frac{\sqrt{m}}{2e}$ with probability at least $1-e^{-\Omega(\sqrt{m})}$, which means that with probability $1-o(1)$ the size of the large block is at least $\frac{\sqrt{m}}{2e}$ after a phase of $m^{3/2}$ steps.

We now apply the Simplified Drift Theorem (Theorem 19). Let $t_0$ be the first step after the previous phase has finished and let $L$ be the largest block at that time. We define $X_t$, $t\ge 0$, as

Note that $X_t$ always represents a lower bound on the size of $L$ at time $t+t_0$. We filter the steps and consider only the relevant steps, that is, the steps in which a change happens on the size of $L$. Moreover, we set $a=\frac{X_0}{2}$, $b=X_0$, $r=1$, $\varepsilon=\frac{1}{4e}$, and $\delta=1$.

From a critical solution, with probability $1-o(1)$, the optimal solution is reached in time $O(m^2\log m)$.

The cost of a critical solution with $w=3$ is $\frac{m-4}{m}+3+2\cdot m^2$, and it can be observed from Lemma 15 that all solutions with $w\ge 5$ have a larger cost and cannot replace this solution. Therefore, only a solution with $w\le 2$, or a critical solution with $w=4$, which can be obtained by a 1-bit flip, can replace this solution.

From a critical solution with $w=4$, the number of white nodes does not increase, because in this situation there are only two $m^2$-edges, which connect the black and white chains, and according to Lemma 15, in order to increase the number of white nodes, all black nodes would need to mutate to white in one step, which is exponentially unlikely. Moreover, a noncritical solution with the same number of white nodes is not accepted after a critical solution either, because if the selected white nodes are not from consecutive clusters, more than two $m^2$-edges are required in the tour.

Here we show that there exists a 2-bit flip in a critical solution that reduces the number of white nodes by two and results in a solution with one chain of $m-2$ black nodes and one chain of 2 white nodes. From that solution, similar to our argument in the previous paragraph, increasing the number of white nodes is exponentially unlikely, and according to Lemma 17, with probability $1-o(1)$, the optimal solution is found in time $O(m^2)$.

From Lemma 12 we know that the cost of a critical solution with $w=4$ is $(m-5)\cdot\frac{1}{m}+3+2\cdot m^2$. By flipping the two white nodes at one end of the white chain, the conditions of Property 12 hold and the cost of the new solution is $(m-3)\cdot\frac{1}{m}+2+2\cdot m^2$, which is better than the cost of the critical solution with respect to the fitness function. Therefore, this solution is accepted by the algorithm. This move has a probability of $\frac{1}{m^2}$. Therefore, the expected time until it happens is $m^2$, and by Markov's inequality, with probability at least $\frac{1}{2}$ it happens within $2\cdot m^2$ steps. Considering $\log m$ phases of $2\cdot m^2$ steps, we get that with probability $1-(\frac{1}{2})^{\log m}=1-\frac{1}{m}$, this 2-bit flip happens in time $O(m^2\log m)$, which completes the proof.$\square$
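The phase-repetition argument at the end of this proof is the standard amplification of Markov's inequality:

```latex
\Pr[\text{the 2-bit flip fails in all } \log m \text{ phases of } 2m^{2} \text{ steps}]
  \le \left(\frac{1}{2}\right)^{\log m} = \frac{1}{m}.
```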

Starting from an initial solution chosen uniformly at random, the node-based (1 + 1) EA finds the optimal solution of $G_G$ in time $O(m^2\log m)$ with probability $1-o(1)$.

Lemma 20 shows that in a phase of $c\cdot m^2$ steps, $c=24e$, the number of black nodes does not decrease to 0 with probability $1-o(1)$. Therefore, due to Lemma 17, if we do not face a critical solution, the optimal solution is found in time $O(m^2)$ with probability $1-o(1)$. Moreover, if we face a critical solution, according to Lemma 21, with probability $1-o(1)$ it takes $O(m^2\log m)$ additional steps to find the optimal solution. Overall, with probability $1-o(1)$, the optimal solution is found by the node-based (1 + 1) EA in time $O(m^2\log m)$.$\square$

#### 4.1.2 Experimental Results

In this section, we present experimental results that confirm our theoretical analysis of the behaviour of the node-based (1 + 1) EA optimizing $G_G$. We have run the algorithm on instances of different sizes, 30 times per instance, with a maximum of $10^7$ iterations. The results are summarised in Table 1. The first and second columns indicate the input size and the percentage of runs that found the optimal solution. The average and maximum numbers of iterations until finding this solution are presented in the third and fourth columns. The observed maximum runtime is consistent with the asymptotic bound that we found in our theoretical analysis.

| Input Size ($m$) | % Optimum | Average Runtime | Maximum Runtime |
|---|---|---|---|
| 20 | 100 | 495 | 3954 |
| 50 | 100 | 1873 | 10800 |
| 100 | 100 | 4076 | 19252 |
| 200 | 100 | 27817 | 190882 |
| 500 | 100 | 33280 | 280873 |
| 1000 | 100 | 110186 | 2518061 |


In Figure 8, we also include a plot comparing the observed running times to a quadratic curve, more precisely $10m^2$. This plot suggests that the $O(m^2)$ bound for the worst-case running time of this algorithm on this problem may not be tight.

### 4.2 Lower-Bound Analysis for Node-Based (1 + 1) EA

In this section, we prove an exponential lower bound on the optimization time of the node-based (1 + 1) EA. In Section 4.2.1, we introduce an instance of the Euclidean GTSP, $G_S$, that is difficult for our algorithm and discuss some of its geometric properties. In Section 4.2.2, we show how the algorithm reaches a local optimum on this instance and discuss how it can reach the global optimum afterwards. Consequently, we obtain a lower bound on the optimization time of the algorithm. In Section 4.2.3, using an efficient algorithm that solves the lower-level problems of this instance, we present experimental results that confirm the obtained lower bound.

#### 4.2.1 A Hard Instance and Its Geometric Properties

The hard instance presented in this section, which is partly illustrated in Figure 9, is composed of $m$ clusters. Let $a>1$ be a constant. Only $\frac{m}{a}$ of these clusters have one node; the other clusters contain $m$ nodes, which makes the total number of nodes $n=m\left(m-\frac{m}{a}\right)+\frac{m}{a}$. All nodes are connected to each other, and the cost of travelling between them is their Euclidean distance.

In the clusters that have $m$ nodes, $m-1$ nodes are placed on the small circle and are shown by a star in the figure. We refer to them as white nodes or inner nodes. For simplicity, we assume that the inner nodes of each cluster all lie at the same position. The same result can be obtained by placing the nodes within a small circle of an arbitrarily small radius $\epsilon$. The remaining node of each cluster, shown black in the figure, is placed on the larger circle. The other $\frac{m}{a}$ clusters do not have any nodes on the small circle and have only one black node on the larger circle. The figure demonstrates how the clusters are distributed on the two circles. The arc between the black nodes of two consecutive clusters subtends an angle of $\frac{2\pi}{m}$, while the arc between two consecutive one-node clusters subtends an angle of $a\cdot\frac{2\pi}{m}$.

If we denote the radii of the inner and outer circles by $r$ and $r'$, respectively, then a black node and a white node have a distance of at least $r'-r$, and the length of the edges between two adjacent black nodes is $2r'\sin\left(\frac{\pi}{m}\right)$. The minimum length of the edges between the black nodes of two consecutive one-node clusters is given by a similar formula with a larger angle: $2r'\sin\left(\frac{a\pi}{m}\right)$.
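These chord lengths follow from elementary trigonometry. The following sketch (with radii and parameters chosen arbitrarily for illustration) checks that consecutive one-node clusters are further apart than consecutive ordinary clusters.

```python
import math

def chord_length(radius, angle):
    """Length of the chord subtending `angle` at the centre of a circle."""
    return 2 * radius * math.sin(angle / 2)

m, a = 16, 2          # number of clusters and the constant a > 1
r_outer = 1.0         # outer-circle radius r' (illustrative value)

# Adjacent black nodes subtend an angle of 2*pi/m on the outer circle,
# so their distance is 2 * r' * sin(pi/m); consecutive one-node clusters
# subtend a * 2*pi/m, giving the larger distance 2 * r' * sin(a*pi/m).
d_black = chord_length(r_outer, 2 * math.pi / m)
d_one_node = chord_length(r_outer, a * 2 * math.pi / m)
assert d_one_node > d_black
```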

We now prove that if the introduced instance has this characteristic, then for $m\ge 8a$, the best tour on any spanning set that has at least one white node contains only 2 edges between the outer and inner circles. We also show that the optimal solution consists of all the black nodes but that, with high probability, the node-based (1 + 1) EA reaches a plateau of local optima with $\frac{m}{a}$ black nodes and $m-\frac{m}{a}$ white nodes. Note that in such local optima, selecting the $\frac{m}{a}$ black nodes of the one-node clusters is forced, since there is no other choice for those clusters. The local optima of this instance form a basin of attraction, since the distances between white nodes are smaller than the distances between black nodes.

In the proof of the following property, we use two theorems from Quintas and Supnick (1965). Theorem 1 of that paper states that the shortest spherical or planar polygon through a given set of vertices does not intersect itself. Moreover, Theorem 2 of that paper proves that the shortest polygon contains the vertices on the boundary of its convex hull in their cyclic order.

For $m\ge 8a$, the best tour on a spanning set that has at least one white node contains only two edges between nodes on the inner and outer circles.

We first consider the tour on a node set consisting of only the black nodes of the one-node clusters. There is no choice other than selecting those nodes because their clusters have no other node. For such a node set, due to Theorem 2 of Quintas and Supnick (1965), the optimal tour visits all the nodes in the order they appear on the convex hull. This order is respected in an optimal tour even if there are some inner nodes to visit as well, because according to Theorem 1 of Quintas and Supnick (1965), the optimal solution cannot intersect itself. In other words, if some white nodes are selected on the upper level, then while visiting the outer nodes in their convex hull order, a solution occasionally travels the distance between the outer circle and the inner circle to visit some inner nodes, and then travels roughly the same distance back to the outer circle to continue visiting the remaining outer nodes. As illustrated in Figure 10, this can generally be done in two ways:

Case 1: Leaving the outer circle only once and visiting all inner nodes together.

Case 2: Leaving the outer circle more than once and visiting some of the inner nodes each time.

We now show that there exists a solution for Case 1 that is less costly than all solutions of Case 2. As a result, the best tour on a spanning set with at least one white node travels the distance between the two circles only twice.

The last part of this formula is twice the length of the edge $AB$, which is a straight line from the inner circle to the outer circle along their common radius. The lengths of the edges $A'B'$ and $A''B''$ are actually larger, because their endpoints are not from the same clusters. Nevertheless, Formula 2 gives an upper bound on the total cost of the tour because we are considering the complete circumference of both circles. In other words, the distance between $A$ and $A'$ is included in the circumference of the large circle and the distance between $B$ and $B'$ is included in that of the small circle, and according to the quadrilateral inequality, $|A'B'|<|A'A|+|AB|+|BB'|$.

In Formula 3, $2r'\sin\left(\frac{a\pi}{m}\right)$ is the length of the edges connecting two consecutive one-node clusters. These edges are the longest edges that can be removed from the tour when we add two edges connecting the inner and outer circles. There are initially at least $m/a$ of these edges, and in this formula we have omitted $k$ of them from the tour.

The last inequality holds because the constraint we introduced on the value of $r$ in Equation 1 is quite tight: for $m\geq 8$ it gives $r<0.03r'$, which is a tighter bound on $r$ than the one given by the right-hand side of the last equation.$\square$

An optimal solution chooses all black nodes and visits them in clockwise or anti-clockwise order when $m\geq 7a$.

Let $P$ and $P'$ be nonoptimal spanning sets and $P_{out}\subset P$ and $P'_{out}\subset P'$ be their subsets of outer nodes. Moreover, let $S$ and $S'$ be optimal solutions with respect to $P$ and $P'$, respectively. For $m\geq 8a$, if $P_{out}\subset P'_{out}$ and $|P'_{out}|=|P_{out}|+1$, then $C(S)<C(S')$.

The main idea behind this property is that distances on the inner circle are significantly shorter than distances on the large circle, and if $r$ is sufficiently smaller than $r'$ (Inequality 1), any single mutation that replaces an inner node with the outer node of the same cluster increases the cost of the whole tour.

According to Property 25, the permutation chosen in the lower level has all the inner nodes listed between two black nodes. If one inner node is removed and one outer node is added, the part of the tour that includes all inner nodes gets shorter and the part that connects black nodes gets longer. The edges connecting the inner nodes are at most as long as the diameter of the inner circle, and removing an inner node removes two such edges of length at most $2r$ each; therefore, the maximum decrease from removing an inner node is upper bounded by $4r$. In the following, we find the minimum increase from adding a black node.

The second case is when the new node is added just before or after visiting the inner nodes, as illustrated on the right side of Figure 11. In this case, compared to the previous case, edge $B$ is longer and the angle between $A$ and $B$ is closer to a right angle. The minimum length of $A$ is also the same as in the previous case. Altogether, by an argument quite similar to that for Case 1, the minimum increase in this case is larger than in Case 1. Therefore the minimum increase in the convex tour, $d$, found in (5), is also less than the minimum increase in Case 2 and can be used for both cases.

#### 4.2.2 Runtime Analysis

In this section, we give a lower bound on the runtime of the node-based (1 + 1) EA. We start by presenting a lemma about the initial solution, which is chosen uniformly at random. Then we introduce the Multiplicative Drift Theorem (Doerr et al., 2012), which is used in our analysis of Lemma 28 to upper bound the time of reaching a locally optimal solution. Then we discuss the main theorem of this section.
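For reference, the drift theorem can be stated as follows, in the form commonly cited from Doerr et al. (2012):

```latex
\textbf{Theorem (Multiplicative Drift).}
Let $(X^{(t)})_{t\ge 0}$ be random variables taking values in a finite set
$S \subseteq \{0\} \cup [s_{\min}, \infty)$ with $s_{\min} > 0$, and let
$T = \min\{t \ge 0 : X^{(t)} = 0\}$. If there exists $\delta > 0$ such that
\[
  E\!\left[X^{(t)} - X^{(t+1)} \mid X^{(t)} = s\right] \ge \delta s
  \quad \text{for all } s \in S \setminus \{0\},
\]
then
\[
  E\!\left[T \mid X^{(0)}\right] \le \frac{1 + \ln\!\left(X^{(0)}/s_{\min}\right)}{\delta}.
\]
```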

The initial solution, with a spanning set that is chosen uniformly at random, has at least $0.9\left(1-\frac{1}{a}\right)(m-1)$ white nodes with probability $1-e^{-\Omega(m)}$.
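As a sanity check, this bound can be probed by simulation under our reading of the instance: each of the $m-m/a$ big clusters has one black (outer) node and $m-1$ white (inner) nodes, so a uniformly chosen spanning set picks a white node in each big cluster independently with probability $(m-1)/m$. The parameter values below are illustrative:

```python
import random

def white_count(m, a, rng):
    """Number of white nodes in a uniformly random spanning set: each of
    the m - m//a big clusters picks one of its m nodes uniformly, and all
    but one of those nodes are white."""
    big = m - m // a
    return sum(1 for _ in range(big) if rng.randrange(m) != 0)

rng = random.Random(0)
m, a = 200, 4
bound = 0.9 * (1 - 1 / a) * (m - 1)        # the lower bound from the lemma
samples = [white_count(m, a, rng) for _ in range(1000)]
assert all(s >= bound for s in samples)    # failure probability is e^(-Omega(m))
```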

Starting with an initial solution chosen uniformly at random, with probability $1-e^{-\Omega(m)}$, the node-based (1 + 1) EA reaches a local optimum on $G_S$ in expected time $O(m\ln m)$.

For a solution $x(t)$ at time $t$, we define $X(t)$ to be the number of $m$-node clusters for which the outer node is selected. Note that this function, as required in Theorem 27, maps the local optimum to zero and all other solutions to positive numbers.

If the number of $m$-node clusters whose outer node is chosen in solution $x(t)$ is $k$, we can find the expected value of this number for $x(t+1)$ as follows.

In the above formula, $\frac{k}{m}\cdot\frac{m-1}{m}$ is the probability that the node selected in one of the $k$ clusters changes from the outer circle to the inner circle. Then we have the number of different ways to select two clusters whose selected nodes lie on the inner circle. The first $\left(\frac{1}{m}\right)^2$ is the probability that the two selected clusters mutate, and the second $\left(\frac{1}{m}\right)^2$ is the probability that after the mutation the node on the outer circle is selected in those two clusters.

From Lemma 26 we know that with probability $1-e^{-\Omega(m)}$, the initial solution has fewer than $0.1m$ black nodes other than the fixed black nodes.$\square$
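The mutation-level probabilities used in this proof can be checked by a small Monte Carlo experiment. The sketch below simulates only the mutation step (each cluster is mutated with probability $1/m$ and then re-selects its node uniformly) and ignores the acceptance step, so it verifies the stated per-step probabilities rather than the full algorithm; all parameter values are illustrative:

```python
import random

def step(k, M, m, rng):
    """One mutation step: each of the M big clusters is mutated with
    probability 1/m and then re-selects its node uniformly at random
    (the outer node with probability 1/m). Returns the new number of
    clusters whose outer node is selected."""
    new_k = k
    for i in range(M):
        if rng.randrange(m) == 0:          # this cluster is mutated
            was_outer = i < k              # first k clusters start on the outer circle
            now_outer = rng.randrange(m) == 0
            new_k += int(now_outer) - int(was_outer)
    return new_k

rng = random.Random(1)
m, a, k = 50, 5, 20
M = m - m // a                             # number of big clusters
trials = 100_000
est = sum(step(k, M, m, rng) - k for _ in range(trials)) / trials

# Expected one-step change: -(k/m)*((m-1)/m) for outer -> inner moves,
# plus ((M-k)/m)*(1/m) for inner -> outer moves.
exact = -k * (1 / m) * ((m - 1) / m) + (M - k) * (1 / m) * (1 / m)
assert abs(est - exact) < 0.02
```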

Starting with an initial solution chosen uniformly at random, if $m\geq 8a$, then the optimization time of the node-based (1 + 1) EA presented in Algorithm 5 on $G_S$ is $\Omega\left(\left(\frac{n}{2}\right)^{m-\frac{m}{a}}\right)$ with probability $1-e^{-\Omega(m^{\delta})}$, $\delta>0$.

In order to prove this theorem, we introduce a phase $P$ in which

- the algorithm reaches a local optimum with high probability, and
- the algorithm does not reach the global optimum with high probability.

Then we show that after this phase, only a direct jump from the local optimum to the global optimum can improve the solution, the probability of which is $\left(\frac{1}{m^2}\right)^{m-\frac{m}{a}}$.

By Lemma 28, the expected time for the node-based (1 + 1) EA to reach the local optimum is $O(m\ln m)$. Let $c$ be an appropriate constant such that $c\cdot m\ln m$ is an upper bound on the expected time for reaching that local optimum. Now consider a phase of $2c\cdot m\ln m$ steps. If $T$ is the actual time at which the local optimum is reached, by Markov's inequality we have $\mathrm{Prob}(T>2c\cdot m\ln m)\leq \frac{1}{2}$. If we repeat this phase $\frac{m^{\epsilon}}{\ln m}$ times, $\epsilon>0$ a constant, then we get a phase of $P=2c\cdot m^{1+\epsilon}$ steps in which the probability of not reaching the local optimum is:

From Lemma 26 we know that with high probability the initial solution does not have too many black nodes other than the fixed black nodes. Here we show that with high probability the number of these nodes does not increase significantly during the phase $P$; hence, the global optimum will not be reached. The probability of selecting each cluster for a mutation is $1/m$ and, for clusters with $m$ nodes, the probability of changing the selected node to the black node is $\frac{1}{m}$; therefore, at each step, the probability that a cluster's selected node changes from one of its inner nodes to its outer node is $\frac{1}{m^2}$. For $m-\frac{m}{a}$ clusters, at each step the expected number of clusters that face such a mutation is at most $\frac{1}{m}$, and in a phase of $2cm^{1+\epsilon}$ steps, it is at most $2cm^{\epsilon}$. If we define $X$ as the number of clusters that undergo such a mutation, then by a Chernoff bound we have

From Lemma 26 we know that with probability $1-e^{-\Omega(m)}$, the initial solution has at least $0.9\left(1-\frac{1}{a}\right)(m-1)$ white nodes. Hence, with probability $1-e^{-\Omega(m^{\epsilon})}$, the algorithm will not reach a state with fewer than $0.9\left(1-\frac{1}{a}\right)(m-1)-3cm^{\epsilon}$ white nodes during phase $P$. As a result, the probability of a direct jump to the global optimum in phase $P$ is at most

Consequently, with high probability, the global optimum will not be reached during phase $P$. According to Property 27, no mutation from the inner circle to the outer circle can decrease the tour cost when the resulting solution is not the optimal solution. Hence, such a change may be accepted by the algorithm only when another mutation in the opposite direction happens at the same step. At the local optimum, there is no black node other than the fixed black nodes, so no mutation from the outer circle to the inner circle can happen; therefore, a mutation from the inner circle to the outer circle cannot be accepted either. As a result, after reaching a local optimum, only a direct jump to the global optimum can move the algorithm towards the global optimum, and the probability of such a jump is $\left(\frac{1}{m^2}\right)^{m-\frac{m}{a}}$. We now consider $\left(\frac{m^2}{2}\right)^{m-\frac{m}{a}}$ steps following phase $P$. By a union bound, the probability of reaching the optimal solution in these steps is at most $\left(\frac{m^2}{2}\right)^{m-\frac{m}{a}}\cdot\left(\frac{1}{m^2}\right)^{m-\frac{m}{a}}=\left(\frac{1}{2}\right)^{m-\frac{m}{a}}$. Hence the probability of not reaching the global optimum in this phase is $1-\left(\frac{1}{2}\right)^{m-\frac{m}{a}}=1-e^{-\Omega(m)}$. Altogether, with probability $1-e^{-\Omega(m^{\delta})}$, the optimization time is at least $\left(\frac{m^2}{2}\right)^{m-\frac{m}{a}}$.$\square$
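The union-bound identity at the end of this proof can be verified exactly with rational arithmetic; the values of $m$ and $a$ below are illustrative:

```python
from fractions import Fraction

m, a = 16, 4
K = m - m // a                        # the exponent m - m/a, here 12

jump  = Fraction(1, m * m) ** K       # probability of a direct local -> global jump
steps = Fraction(m * m, 2) ** K       # number of steps considered after phase P

# The union bound collapses exactly to (1/2)^K.
assert steps * jump == Fraction(1, 2) ** K
```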

#### 4.2.3 Experimental Results

In this section, we include experimental results that confirm the exponential lower bound proved in Section 4.2.2 for the optimization time on the studied instance. We run the algorithm with a maximum of $10^6$ iterations on instances of 6 different input sizes, and we show that all runs stick to the local optimum. For the lower-level optimization we have developed an algorithm that visits all the selected black nodes in the order they appear on the large circle, then visits all selected white nodes in the order they appear on the small circle, and goes back to the first black node to form a Hamiltonian circuit. This algorithm ensures Property 25 of the optimal lower-level solution, which was important in our theoretical analysis. We have set $r=1$ and $r'=10^8$ so that the inequality of Equation 1 holds for the maximum input size that we run the algorithm with. The results, based on 30 runs of the algorithm, are summarised in Table 2. The first and second columns indicate the input size and the percentage of runs that stick to the local optimum, respectively. The average and maximum number of iterations until finding the local optimum are presented in the third and fourth columns, respectively. As the table suggests, 100% of the runs for all input sizes find the local optimum and stick to it until the maximum iteration number is reached. This confirms the theoretical results of Section 4.2.2.
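A minimal sketch of the described lower-level routine, assuming a hypothetical polar-coordinate representation of the selected nodes (the function names and the small example instance are made up for illustration):

```python
import math

def lower_level_tour(selected):
    """Visit the selected black nodes in the order they appear on the
    large circle, then the selected white nodes in the order they appear
    on the small circle; the circuit implicitly closes back to the first
    black node. Nodes are (radius, angle) pairs, one per cluster."""
    big_r = max(r for r, _ in selected)
    black = sorted((p for p in selected if p[0] == big_r), key=lambda p: p[1])
    white = sorted((p for p in selected if p[0] != big_r), key=lambda p: p[1])
    return black + white

def tour_cost(tour):
    pts = [(r * math.cos(t), r * math.sin(t)) for r, t in tour]
    return sum(math.dist(pts[i], pts[(i + 1) % len(pts)]) for i in range(len(pts)))

# Tiny example: four black nodes on radius 1e8, two white nodes on radius 1.
sel = [(1e8, 0.0), (1e8, 1.5), (1e8, 3.0), (1e8, 4.5), (1.0, 0.5), (1.0, 2.0)]
tour = lower_level_tour(sel)
assert len(tour) == len(sel)                  # Hamiltonian over the spanning set
assert all(r == 1e8 for r, _ in tour[:4])     # black nodes first, in angular order
```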

| Input Size ($m$) | %LO | Average Runtime to Reach Local Optimum | Maximum Runtime to Reach Local Optimum |
|---|---|---|---|
| 20 | 100 | 15 | 69 |
| 50 | 100 | 36 | 275 |
| 100 | 100 | 66 | 346 |
| 200 | 100 | 82 | 770 |
| 500 | 100 | 197 | 1300 |
| 1000 | 100 | 680 | 3120 |


## 5 Conclusion

Evolutionary algorithms and local search approaches have been shown to be very successful for solving the generalized travelling salesperson problem. We have investigated two common hierarchical representations together with local search algorithms and simple evolutionary algorithms from a theoretical perspective. In the first part of this article, which is based on the conference version (Pourhassan and Neumann, 2015), the focus is on local search approaches. By presenting instances where they mutually outperform each other, we have gained new insights into the complementary abilities of the two approaches. Furthermore, we have presented and analysed a class of instances where combining the two approaches into a variable-neighbourhood search helps to escape from local optima of the single approaches.

In the second part, we have investigated the behaviour of hierarchical evolutionary algorithms for this problem. We have proved that there are instances which the node-based (1 + 1) EA solves to optimality in polynomial time, while the cluster-based (1 + 1) EA needs exponential time to find an optimal solution for them. Then we have presented a Euclidean instance of the GTSP for which we proved an exponential lower bound on the optimization time of the node-based approach. Our lower bound analysis for a geometric instance shows that the Euclidean case is hard to solve even if we assume that the lower-level TSP is solved to optimality in no time.