## Abstract

Bi-level optimisation problems have gained increasing interest in the field of combinatorial optimisation in recent years. In this paper, we analyse the runtime of some evolutionary algorithms for bi-level optimisation problems. We examine two NP-hard problems, the generalised minimum spanning tree problem and the generalised travelling salesperson problem, in the context of parameterised complexity. For the generalised minimum spanning tree problem, we analyse the two approaches presented by Hu and Raidl (2012), which differ in the representation they choose for possible solutions, with respect to the number of clusters. Our results show that a (1+1) evolutionary algorithm working with the spanning nodes representation is not a fixed-parameter evolutionary algorithm for the problem, whereas the problem can be solved in fixed-parameter time with the global structure representation. We present hard instances for each approach and show that the two approaches are highly complementary by proving that each of them solves the other's hard instances very efficiently. For the generalised travelling salesperson problem, we analyse the problem with respect to the number of clusters in the problem instance. Our results show that a (1+1) evolutionary algorithm working with the global structure representation is a fixed-parameter evolutionary algorithm for the problem.

## 1  Introduction

Many interesting combinatorial optimisation problems are hard to solve, and metaheuristic approaches such as local search, simulated annealing, evolutionary algorithms (EAs), and ant colony optimisation have been used for a wide range of these problems.

In recent years, researchers have become very interested in bi-level optimisation for single-objective (Koh, 2007; Legillon et al., 2012) and multiobjective (Deb and Sinha, 2009, 2010) problems. Such problems can be split into an upper-level and a lower-level problem, which depend on each other. By fixing a possible solution for the upper-level problem, the lower-level problem is optimised with respect to the given objective and the constraints imposed by the chosen upper-level solution.

Sinha et al. (2014) give the following general definition of a bi-level optimisation problem.

Definition 1 (Sinha et al., 2014).
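Let $X = X_U \times X_L$ denote the product of the upper-level decision space $X_U$ and the lower-level decision space $X_L$, i.e., $x = (x_u, x_l) \in X$ if $x_u \in X_U$ and $x_l \in X_L$. For an upper-level objective function $F\colon X \to \mathbb{R}$ and a lower-level objective function $f\colon X \to \mathbb{R}$, a general bi-level optimisation problem is given by

$$
\begin{aligned}
\min_{x_u \in X_U,\; x_l \in X_L}\quad & F(x_u, x_l)\\
\text{subject to}\quad & x_l \in \arg\min_{x_l \in X_L} \bigl\{ f(x_u, x_l) : g_j(x_u, x_l) \le 0,\ j = 1, \dots, J \bigr\},\\
& G_k(x_u, x_l) \le 0,\quad k = 1, \dots, K,
\end{aligned}
$$

where the functions $g_j$, $j = 1, \dots, J$, represent lower-level constraints and $G_k$, $k = 1, \dots, K$, is the collection of upper-level constraints.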

A vector $x = (x_u, x_l)$ is called feasible on the upper level if it fulfils all upper-level constraints. For a given $x_u$, the lower-level solution $x_l$ has to be optimal. A comprehensive benchmark set in the context of continuous optimisation was introduced by Sinha et al. (2014).
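For finite decision spaces, the nested structure of Definition 1 can be made concrete by brute force: every upper-level choice is paired with an optimal lower-level response, and only those pairs compete on the upper level. The sketch below is illustrative only. The function name, the constraint callables `lower_ok` and `upper_ok` (standing in for the $g_j$ and $G_k$), and the optimistic tie-breaking among several lower-level optima are assumptions that Definition 1 does not fix.

```python
def solve_bilevel(X_U, X_L, F, f,
                  lower_ok=lambda xu, xl: True,
                  upper_ok=lambda xu, xl: True):
    """Brute-force a bi-level problem over finite decision spaces.

    For every upper-level choice xu, the lower level is solved exactly
    (minimise f over the lower-feasible xl); only lower-level optimal xl
    are then compared with respect to F and the upper-level constraints.
    Ties among lower-level optima are broken in favour of F (optimistic
    convention, an assumption of this sketch).
    Returns (F value, xu, xl), or None if nothing is feasible.
    """
    best = None
    for xu in X_U:
        candidates = [xl for xl in X_L if lower_ok(xu, xl)]
        if not candidates:
            continue
        f_min = min(f(xu, xl) for xl in candidates)
        for xl in candidates:
            if f(xu, xl) == f_min and upper_ok(xu, xl):
                value = F(xu, xl)
                if best is None or value < best[0]:
                    best = (value, xu, xl)
    return best


# Toy instance: the lower level copies xu, the upper level trades off both.
print(solve_bilevel(range(3), range(3),
                    F=lambda xu, xl: (xu - 2) ** 2 + xl,
                    f=lambda xu, xl: (xl - xu) ** 2))  # prints (2, 1, 1)
```

The lower level is never optimised for F directly: the upper level can only steer it through the choice of `xu`, which is exactly the dependency described in the text.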

According to Definition 1, the upper-level objective function F and the lower-level objective function f may constitute arbitrary optimisation problems, which implies that both could be difficult, for example NP-hard. Bi-level problems can be found in a variety of domains such as transportation or economics. The toll-setting problem (Brotcorne et al., 2001) is one such problem from the transportation domain, in which the government that operates the highways tries to maximise its revenue by placing toll gates in a road network. Drivers have different objectives and avoid tolls by choosing paths that minimise their travelling costs. This example shows that the upper- and lower-level problems can work against each other: the government tries to maximise revenue, whereas the drivers try to minimise their costs.

To initiate the runtime analysis of evolutionary bi-level optimisation, we examine settings for the NP-hard generalised minimum spanning tree problem (GMSTP) where the upper- and lower-level problems are cooperative. In our case, the upper-level function constitutes an NP-hard optimisation problem, whereas the lower-level problem can be solved in polynomial time. We examine two approaches introduced by Hu and Raidl (2011, 2012) for the GMSTP. Both approaches work with an upper-layer and a lower-layer solution. The upper-layer solution xu is evolved by an evolutionary algorithm, whereas the optimal solution xl of the lower-layer problem corresponding to a particular search point xu of the upper layer can be found in polynomial time using deterministic algorithms.

Our goal is to understand the two different approaches by parameterised computational complexity analysis (Downey and Fellows, 1999). The computational complexity analysis of metaheuristics plays a major role in the theoretical analysis of such algorithms and studies their runtime behaviour with respect to the size of the given input. We refer the reader to Auger and Doerr (2011) and Neumann and Witt (2010) for a comprehensive presentation. Parameterised complexity analysis takes into account the runtime of algorithms as a function of an additional parameter that measures the hardness of a given instance. This allows us to understand which parameters of a given NP-hard optimisation problem make it hard or easy to optimise by heuristic search methods. In the context of evolutionary algorithms, the term fixed-parameter evolutionary algorithm has been defined by Kratsch and Neumann (2013). An evolutionary algorithm is called a fixed-parameter evolutionary algorithm for a given parameter k iff its expected runtime is bounded by $f(k) \cdot \mathrm{poly}(n)$, where $f(k)$ is a function only depending on k, and $\mathrm{poly}(n)$ is a polynomial with respect to the input size n. Parameterised computational complexity analysis of evolutionary algorithms has been carried out for the vertex cover problem (Kratsch and Neumann, 2013), the computation of maximum leaf spanning trees (Kratsch et al., 2010), makespan scheduling (Sutton and Neumann, 2012b), and the travelling salesperson problem (Sutton and Neumann, 2012a).

We push forward the parameterised analysis of evolutionary algorithms and present an analysis in the context of bi-level optimisation. In our investigations, we consider two NP-hard problems, the GMSTP and the generalised travelling salesperson problem (GTSP), which share the parameter m, the number of clusters. We consider two different bi-level representations for the GMSTP, both of which have a polynomially solvable lower-level part. For the spanning nodes representation, we present worst-case examples showing that there are instances leading to an optimisation time of $\Omega(n^m)$. For the global structure representation, we show that it leads to a fixed-parameter evolutionary algorithm with respect to the number of clusters m. Furthermore, we present an instance class where the algorithm using the global structure representation encounters an optimisation time of . Analysing both approaches on each other's worst-case instances, we show that each of them solves the other's worst-case instances very efficiently. This shows the complementary abilities of these two representations for the GMSTP. Then we extend our results for the global structure representation to the GTSP to show that a similar algorithm has an expected optimisation time of for this problem as well.

The paper is divided into two main parts according to the two different problems. The first part, presented in Section 2, investigates the GMSTP and is based on the conference version (Corus et al., 2013). We show hard instances for the spanning nodes representation in Section 2.2 and show that a simple evolutionary algorithm needs exponential time even if the number of clusters is small. In Section 2.3 we examine the global structure representation and show that it leads to fixed-parameter evolutionary algorithms for the GMSTP. We point out complementary abilities in Section 2.4. This paper extends the conference version by investigations of the GTSP and some generalisations. We examine the GTSP with the corresponding global structure representation in Section 3 and provide upper and lower bounds on the optimisation time of the considered algorithm. In Section 4 we point out general characteristics that allow this fixed-parameter result to be extended to other problems.

## 2  Generalised Minimum Spanning Tree Problem

In this section, we consider the GMSTP and provide the runtime analysis with respect to bi-level representations given by Hu and Raidl (2011, 2012).

### 2.1  Preliminaries

We consider the GMSTP introduced by Myung et al. (1995). The input is given by an undirected complete graph $G = (V, E, c)$ on n nodes with a cost function $c\colon E \to \mathbb{R}^+$ that assigns positive costs to the edges. Furthermore, a partitioning of the node set V into m pairwise disjoint clusters $V_1, \dots, V_m$ is given such that $V = V_1 \cup \dots \cup V_m$.

A solution to the GMSTP consists of two components: the m chosen nodes P, called the spanned nodes, one in each of the m clusters, and a minimum spanning tree T on the graph induced by the spanned nodes. More precisely, a solution consists of a node set $P = \{p_1, \dots, p_m\}$, where $p_i \in V_i$, $1 \le i \le m$, and a spanning tree $T \subseteq E$ on the subgraph $G[P]$ induced by P. The cost of T is the cost of the edges in T, namely,

$$c(T) = \sum_{(u, v) \in T} c(u, v).$$

The goal is to compute a solution $(P^*, T^*)$ that has minimal cost among all possible solutions $(P, T)$. For an easier presentation, we assume in some cases that edge costs can be $\infty$. In this case, we restrict our investigations to solutions that do not include edges with cost $\infty$. Alternatively, one might view this as the GMSTP defined on a graph that is not necessarily complete.

The GMSTP is NP-hard (Myung et al., 1995), and two different bi-level evolutionary approaches have been examined by Hu and Raidl (2012). The first approach uses the spanned nodes representation. On the upper level, it selects a node for each cluster, and on the lower level it computes a minimum spanning tree on the induced subgraph (using, for example, Kruskal's algorithm in time $O(m^2 \log m)$). Tabu search and variable neighbourhood search approaches using this representation can be found in Ghosh (2003).
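A minimal sketch of the lower level of the spanned nodes approach: given the upper-level choice P of one node per cluster, Kruskal's algorithm computes a minimum spanning tree on the complete graph induced by P. The names `kruskal_mst_cost` and `spanned_nodes_fitness`, and the convention that a symmetric function `cost(u, v)` returns `float('inf')` for missing edges, are hypothetical choices for illustration.

```python
import itertools

INF = float("inf")


def kruskal_mst_cost(nodes, cost):
    """Kruskal's algorithm on the complete graph over `nodes`.

    `cost(u, v)` must be symmetric; INF marks an edge that does not exist.
    Returns the MST cost, or INF if the induced subgraph is disconnected.
    Sorting the O(m^2) candidate edges dominates: O(m^2 log m) for m nodes.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted(
        (cost(u, v), u, v) for u, v in itertools.combinations(nodes, 2)
    )
    total, used = 0.0, 0
    for w, u, v in edges:
        if w == INF:
            break  # all remaining edges are missing as well
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            used += 1
    return total if used == len(nodes) - 1 else INF


def spanned_nodes_fitness(P, cost):
    """Upper-level fitness of P = (p_1, ..., p_m), one chosen node per cluster."""
    return kruskal_mst_cost(list(P), cost)
```

In a cluster-based (1+1) EA built on this sketch, only P is mutated; the tree itself is recomputed from scratch for every offspring.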

The second approach uses the global structure representation. It constructs a complete graph $H = (V', E')$ from the given input graph $G = (V, E)$ and the set of pairwise disjoint clusters $V_1, \dots, V_m$. The node $v'_i \in V'$, $1 \le i \le m$, corresponds to the cluster $V_i$ in G. The search space for the upper level consists of all spanning trees of H, and the spanned nodes of the different clusters are selected in time $O(n^2)$ using the dynamic programming approach of Pop (2004).

For our theoretical investigations, we measure the runtime of the algorithms by the number of fitness evaluations required to obtain an optimal solution. We call this the optimisation time of the examined algorithm. The expected optimisation time refers to the expected number of fitness evaluations until an optimal solution has been obtained for the first time.

### 2.2  Spanned Nodes Representation

We analyse the cluster-based (1+1) EA in this section (see Algorithm 1). Our first theorem shows that, when the number of clusters m is chosen as the parameter, this algorithm is an XP-algorithm (Downey and Fellows, 1999), that is, an algorithm that runs in time $O(n^{g(m)})$, where $g$ is a computable function only depending on m.

Theorem 1:

For any instance of the GMSTP, the expected time until the cluster-based (1+1) EA reaches the optimal solution is $O(n^m)$.

Proof:
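For any search point x, let $r(x)$ denote the number of clusters where the spanned nodes representation includes a suboptimal node. If the algorithm chooses exactly the suboptimal clusters for mutation and selects the optimal node in each of them, then the optimal solution is obtained. Since $r(x) \le m$, the probability that exactly the suboptimal clusters are mutated in a single step is

$$\left(\frac{1}{m}\right)^{r(x)}\left(1 - \frac{1}{m}\right)^{m - r(x)} \ge \left(\frac{1}{m}\right)^{m},$$

where the inequality uses $m^{m-r(x)}(1 - 1/m)^{m-r(x)} = (m-1)^{m-r(x)} \ge 1$. The probability of choosing the optimal node in cluster i is $1/|V_i|$, and the product of these probabilities over the suboptimal clusters is at least the product over all clusters. Thus, the probability of jumping to the optimal solution from any search point is at least

$$\left(\frac{1}{m}\right)^{m} \prod_{i=1}^{m} \frac{1}{|V_i|}.$$

Since $\sum_{i=1}^{m} |V_i| = n$, the inequality of arithmetic and geometric means gives $\prod_{i=1}^{m} |V_i| \le (n/m)^m$, so that

$$\left(\frac{1}{m}\right)^{m} \prod_{i=1}^{m} \frac{1}{|V_i|} \ge \left(\frac{1}{m}\right)^{m} \left(\frac{m}{n}\right)^{m} = n^{-m}.$$

Therefore, the probability of reaching the optimal solution in one step is at least $n^{-m}$, and the expected time to reach the optimal solution is bounded from above by $n^m$.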
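The inequality chain in this proof can be checked numerically. The sketch assumes the mutation model used in the argument (each cluster is selected independently with probability 1/m, and a selected cluster draws its new node uniformly from the cluster); the function name and the cluster sizes are illustrative.

```python
from itertools import combinations


def one_step_jump_probability(cluster_sizes, suboptimal):
    """Probability of the event used in the proof of Theorem 1: exactly the
    clusters in `suboptimal` are selected (each with probability 1/m) and
    each of them draws its optimal node uniformly from its cluster."""
    m = len(cluster_sizes)
    r = len(suboptimal)
    p = (1 / m) ** r * (1 - 1 / m) ** (m - r)
    for i in suboptimal:
        p /= cluster_sizes[i]
    return p


sizes = [3, 5, 2, 7]  # arbitrary cluster sizes, n = 17
n, m = sum(sizes), len(sizes)
worst = min(one_step_jump_probability(sizes, s)
            for r in range(1, m + 1) for s in combinations(range(m), r))
print(worst >= n ** (-m))  # True
```

With equal cluster sizes and all clusters suboptimal, both inequalities are tight and the probability equals $n^{-m}$ exactly.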

We now consider an instance of GMSTP that is difficult for the cluster-based (1+1) EA. The hard instance GS for the spanning nodes representation is illustrated in Figure 1. It consists of m clusters, where one cluster is called the central cluster and the other clusters are called peripheral clusters. Each cluster contains $n/m$ nodes, and we assume that holds. The nodes in the peripheral clusters are called peripheral nodes, and the nodes in the central cluster are called central nodes. Within each cluster, one of the nodes is called optimal and is marked black in the figure. The remaining nodes are called suboptimal nodes and are marked white in the figure. The instance is a bipartite graph, where edges connect peripheral nodes to central nodes. The cost of any edge between two optimal nodes is 1, and the cost of any edge between two suboptimal nodes is 2. The cost of any edge between a suboptimal peripheral node and the optimal central node is $n^2$, and the cost of any edge between an optimal peripheral node and a suboptimal central node is n. A cluster is called optimal in a solution if the solution has chosen the optimal node in that cluster.
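To see why GS traps the algorithm, one can tabulate the cost of the (unique) star tree for every search point. The encoding below, one flag for the central cluster and one flag per peripheral cluster, is a hypothetical simplification that keeps only the information the edge costs depend on; `gs_cost` is an illustrative name.

```python
from itertools import product


def gs_cost(n, central_optimal, peripheral_optimal):
    """Cost of a search point on GS.

    The only edges run between the central cluster and the peripheral
    clusters, so the spanning tree on the chosen nodes is the star centred
    at the chosen central node. `peripheral_optimal[i]` is True if
    peripheral cluster i uses its optimal node.
    """
    total = 0
    for p_opt in peripheral_optimal:
        if central_optimal:
            total += 1 if p_opt else n ** 2   # optimal central node
        else:
            total += n if p_opt else 2        # suboptimal central node
    return total


n, m = 20, 5
all_sub = gs_cost(n, False, [False] * (m - 1))     # local optimum, cost 2(m-1)
optimum = gs_cost(n, True, [True] * (m - 1))       # global optimum, cost m-1
one_flip = gs_cost(n, False, [True] + [False] * (m - 2))
print(all_sub, optimum, one_flip - all_sub)        # 8 4 18

# Exhaustive check: from the all-suboptimal point, only the global optimum
# is strictly better, so every cluster has to change in the same step.
better = [(c, ps) for c in (False, True)
          for ps in product((False, True), repeat=m - 1)
          if gs_cost(n, c, ps) < all_sub]
print(better)  # [(True, (True, True, True, True))]
```

The difference of 18 is the extra cost $n - 2$ of making a single peripheral cluster optimal while the central cluster is suboptimal.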

Figure 1:

Hard instance GS for spanning node representation.

Theorem 2:

Starting with an initial solution chosen uniformly at random, the expected optimisation time of the cluster-based (1+1) EA on GS is $\Omega(n^m)$.

Furthermore, for any constant , the probability of having obtained an optimal solution after at most iterations is .

Proof:

We define two phases for the run of the (1+1) EA. The first phase consists of the first iterations, and the second phase starts at the end of the first phase and continues for iterations. Four distinct events are considered failures during the run of the (1+1) EA for the instance just described:

• The first failure occurs if during the first phase of the run, the algorithm obtains a search point with less than suboptimal peripheral clusters.

• The second type of failure occurs when the central cluster fails to switch to a suboptimal node at least once during the first phase.

• The third type of failure corresponds to a direct jump to the optimal solution during the second phase.

• The fourth failure occurs when the algorithm does not switch all the optimal peripheral clusters to suboptimal clusters during the second phase.

We first show that the probability of the first failure event is at most . This implies that with overwhelmingly high probability, a constant fraction of peripheral clusters is always suboptimal during the first iterations. For and , let be a random variable such that if cluster Vi is always suboptimal in iteration 0 through iteration t, and otherwise. The probability that a suboptimal node is selected in the initial solution is . In the following iterations, the probability that a cluster is selected for mutation and that its new spanned node is optimal is . So it is clear that
By linearity of expectation,
Considering a phase length of , and assuming that m and n are sufficiently large and holds, we get
Finally, a Chernoff bound (Motwani and Raghavan, 1995) implies that
We then show that the probability of the second failure event is . In each iteration the probability to switch the central cluster to a suboptimal node is at least
The probability that this event does not occur in steps is
Now, we show that the probability of the third failure event is less than , assuming that the first two failure events do not occur. As long as the central cluster remains suboptimal, switching a suboptimal node in a peripheral cluster to an optimal node will result in an extra cost of $n - 2$. Conversely, switching an optimal peripheral cluster into a suboptimal cluster will decrease the cost by $n - 2$. As long as there is at least one suboptimal peripheral cluster, making the central cluster optimal will incur an extra cost of at least $n^2 - 2 - (m-2)(n-1) > 0$. So, during phase two, the algorithm cannot make any suboptimal cluster optimal unless all suboptimal clusters are made optimal in the same iteration. The probability of making at least suboptimal peripheral clusters optimal simultaneously is at most
Since the probability to jump to the optimal solution is at most in each iteration, it holds by the union bound that the probability of failure event three is at most
Finally, we show that the probability of failure event four is . For sufficiently large n, the probability that an optimal peripheral cluster is made suboptimal by the (1+1) EA is at least
The expected time until all peripheral clusters have become suboptimal is therefore at most . Considering a phase of length and taking into account , it holds by Markov’s inequality that the probability of a type four failure is

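By the union bound, the probability that any type of failure occurs is at most the sum of the individual failure probabilities, which is . Hence, with overwhelmingly high probability, after the second phase, the algorithm has obtained a locally optimal solution where all peripheral clusters are suboptimal (the central cluster has already become suboptimal during the first phase). After that iteration, the probability to jump directly to the optimal solution is $n^{-m}$, because every cluster has to be selected for mutation (probability $1/m$ each) and receive its optimal node (probability $m/n$ each), and the expected time for this event to occur is $n^m$.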

Let E be the event that no failure occurs. Then, the first statement of the theorem follows by the law of total probability,
Furthermore, by the union bound, it holds that
Hence, the second statement of the theorem follows by the law of total probability

Our results for the spanned nodes representation show that the cluster-based (1+1) EA obtains an optimal solution in expected time $O(n^m)$, and our analysis for the hard instance GS shows that this bound is tight.

### 2.3  Global Structure Representation

The second approach examined by Hu and Raidl (2012) uses the global structure representation. It works on the complete graph $H = (V', E')$ obtained from the input graph $G = (V, E)$. The node $v'_i \in V'$, $1 \le i \le m$, represents the cluster $V_i$ of G.

The upper-level solution in the global structure representation is a spanning tree T of H, and the lower-level solution is a set of nodes $P = \{p_1, \dots, p_m\}$ with $p_i \in V_i$, $1 \le i \le m$, that minimises the cost of a spanning tree connecting the clusters in the same way as T. Given a spanning tree T of H, the set of nodes P can be computed in time $O(n^2)$ using dynamic programming (Pop, 2004).
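A sketch of such a lower-level dynamic programme: root T, compute bottom-up, for every cluster and every candidate node, the cheapest cost of the subtree below it, then read the choices back top-down. It follows the general idea attributed to Pop (2004) above, but the data layout (`tree_adj`, `clusters`, `cost`) and the function name are assumptions. The work is proportional to the sum of $|V_i||V_j|$ over the edges $\{i, j\}$ of T, which is $O(n^2)$.

```python
def optimal_spanned_nodes(tree_adj, clusters, cost, root=0):
    """Choose one node per cluster minimising the cost of the tree induced by T.

    tree_adj : dict, cluster index -> list of neighbouring cluster indices in T
    clusters : list of node lists, clusters[i] = V_i
    cost     : symmetric edge-cost function on the nodes of G
    Returns (minimum cost, {cluster index: chosen node}).
    """
    # Orient T away from the root; `order` lists every cluster after its parent.
    parent, order, stack = {root: None}, [root], [root]
    while stack:
        i = stack.pop()
        for j in tree_adj[i]:
            if j not in parent:
                parent[j] = i
                order.append(j)
                stack.append(j)

    # best[i][v]: cheapest cost of the subtree below cluster i if v in V_i is chosen.
    best, choice = {}, {}
    for i in reversed(order):  # children are processed before their parents
        best[i] = {}
        for v in clusters[i]:
            total = 0.0
            for j in tree_adj[i]:
                if j == parent[i]:
                    continue
                u = min(clusters[j], key=lambda x: cost(v, x) + best[j][x])
                choice[(j, v)] = u  # best node of child j given node v in its parent
                total += cost(v, u) + best[j][u]
            best[i][v] = total

    # Read the optimal node choice back from the root downwards.
    chosen = {root: min(clusters[root], key=lambda v: best[root][v])}
    for i in order[1:]:
        chosen[i] = choice[(i, chosen[parent[i]])]
    return best[root][chosen[root]], chosen
```

For example, with clusters `[[0, 1], [2], [3, 4]]` connected as a path 0-1-2 in T, the programme picks the node of each outer cluster that is cheapest to attach to node 2.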

We consider the tree-based (1+1) EA outlined in Algorithm 2. It starts with a spanning tree T of H that is chosen uniformly at random. In each iteration, a new solution $T'$ of the upper layer is obtained by performing K edge swaps on T. Here the parameter K is chosen according to , where $\mathrm{Pois}(1)$ is the Poisson distribution with expectation 1. In one edge swap, an edge e currently not present in the solution is introduced, and an edge from the resulting cycle is removed such that a new spanning tree of H is obtained. After the offspring $T'$ has been produced, the corresponding set of nodes $P'$ is computed using dynamic programming. P and T are replaced by $P'$ and $T'$ if the cost of the new solution is not worse than the cost of the old one.
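The upper-layer mutation can be sketched as follows. Because the distribution of K is not reproduced above, the sketch assumes $K = 1 + \mathrm{Pois}(1)$, so that every step performs at least one swap; it also assumes that the inserted edge and the removed cycle edge are chosen uniformly at random. All names are hypothetical.

```python
import math
import random


def sample_pois1(rng):
    """Sample from Pois(1) with Knuth's multiplication method."""
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1


def tree_path(tree, a, b):
    """Edges of the unique a-b path in a tree given as a set of frozenset edges."""
    adj = {}
    for e in tree:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    prev, stack = {a: None}, [a]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in prev:
                prev[y] = x
                stack.append(y)
    path, x = [], b
    while prev[x] is not None:
        path.append(frozenset((x, prev[x])))
        x = prev[x]
    return path


def edge_swap_mutation(tree, m, rng):
    """Apply K random edge swaps to a spanning tree of the complete graph H on
    clusters 0..m-1 (m >= 3), with K = 1 + Pois(1) assumed here."""
    tree = set(tree)
    for _ in range(1 + sample_pois1(rng)):
        missing = [frozenset((i, j)) for i in range(m) for j in range(i + 1, m)
                   if frozenset((i, j)) not in tree]
        u, v = tuple(rng.choice(missing))   # edge to introduce
        cycle = tree_path(tree, u, v)       # closes a cycle together with {u, v}
        tree.remove(rng.choice(cycle))      # drop one edge of that cycle
        tree.add(frozenset((u, v)))
    return tree
```

Removing any edge of the cycle closed by the new edge keeps the graph connected with $m - 1$ edges, so every offspring is again a spanning tree of H.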

In the following, we show that the tree-based (1+1) EA is a fixed-parameter evolutionary algorithm for the GMSTP when considering the number of clusters m as the parameter. We do this by transferring the result of Pop (2004) to the tree-based (1+1) EA.

Theorem 3:

The expected time of the tree-based (1+1) EA to find the optimal solution for any instance of the GMSTP is . Furthermore, for any , the probability that an optimal solution is not found within steps is less than .

Proof:

An upper-layer solution is a tree T of H. Let $T^*$ be any tree of H for which there exists a set of spanned nodes $P^*$ such that $T^*$ and $P^*$ form an optimal solution. For any nonoptimal solution T, define $d(T)$ as the number of edges in $T^*$ that are missing in T.

The mutation operator can convert a nonoptimal solution T into the optimal solution with a sequence of edge exchange operations. The probability that the mutation operator exchanges edges in one mutation step is at least
In each exchange operation, if there are i optimal edges missing, then the probability that one of the missing optimal edges is inserted is at least . After the addition of an optimal edge, the probability of excluding a nonoptimal edge is at least , since the largest cycle cannot be longer than m. At most nonoptimal edges must be exchanged in this manner. So the probability that the nonoptimal solution T will be converted to the optimal solution in one mutation step is at least
So, the expected time to achieve an optimal solution is . Furthermore, the probability that the optimal solution has not been created after iterations is

We now present an instance that is hard for the tree-based (1+1) EA. The instance GG, illustrated in Figure 2, consists of n nodes and m clusters. There are two central clusters, denoted by V1 and V2. The cluster V1 contains the two nodes v11 and v12. The remaining clusters $V_2, \dots, V_m$ contain a single node each. The edges that connect the node v11 to the peripheral cluster nodes have cost 1. The edges that connect v21 to the peripheral clusters have weight . The edge that connects v12 and v21 has weight . All other edges have cost $\infty$. Hence, if the tree-based (1+1) EA connects clusters V1 and V2, then the dynamic programming algorithm will choose node v12.

Figure 2:

Hard instance GG for global structure representation. Edges not shown have weight $\infty$.

In our analysis, we will use the following lemma on basic properties of the Poisson distribution with expectation 1.

Lemma 1:

If , then

Proof:
Using Stirling’s approximation of the factorial,
we obtain the simple bound

Using Lemma 1, we show that the tree-based (1+1) EA finds it hard to optimise GG when the initial spanning tree is chosen uniformly at random among all spanning trees having cost less than $\infty$.

Theorem 4:

Starting with a spanning tree chosen uniformly at random among all spanning trees that have cost less than $\infty$, the expected optimisation time of the tree-based (1+1) EA on GG is .

Proof:

Consider the instance in Figure 2. In the following, e denotes the edge of H connecting the two central clusters. The optimal solution corresponds to the spanning tree that includes edge e and where all other clusters are connected to cluster V2. The solution where all peripheral clusters are connected to V1 and cluster V2 is connected to one of the peripheral clusters is a local optimum.

We define four failure events that can occur during a run of the (1+1) EA on this instance:

• The first type of failure occurs when the initial solution includes edge e.

• The second type of failure occurs when less than of the peripheral clusters are connected to cluster V1 in the initial solution.

• The third type of failure occurs when the algorithm jumps directly to the optimal solution during the first iterations.

• The fourth type of failure occurs if after iteration , there exists a peripheral cluster that is not connected to cluster V1.

There are $m - 2$ peripheral clusters that must be connected to either V1 or V2. Additionally, clusters V1 and V2 must be connected. This connection can be established either by adding edge e or by connecting a peripheral cluster to both V1 and V2. There are spanning trees that contain edge e, and spanning trees that do not contain edge e, since one of the peripheral clusters will be connected to both central clusters and the others will be connected to only one. So, the probability that a uniformly chosen spanning tree includes edge e is , which is the probability of the first type of failure.

Now, we show that the probability of the second type of failure is at most . Considering that the probability that a specific cluster is adjacent to V1 in the initial solution is at least , a Chernoff bound shows that the probability that fewer than clusters are connected to cluster V1 in the initial solution is bounded by .

Assuming that type one and type two failures did not occur, the algorithm cannot accept new search points where a cluster originally connected to V1 is instead connected to V2, since this would create an extra cost of . The only exception is a type three failure, that is, the algorithm jumps directly to the optimal solution where all the peripheral clusters are connected to V2. For a type three failure to occur, at least clusters have to be modified simultaneously. Therefore, using Lemma 1, the probability of jumping directly to the optimal solution in a single step is bounded from above by
Taking a phase length of into account, the probability of a type three failure can be bounded from above using the union bound, as
Now, we show that the probability of a type four failure is . The probability that a single peripheral cluster connected to V2 is switched to V1 is bounded from below by
Thus, the expected time between two such events is , and the expected time until all of the at most $m - 2$ peripheral clusters are connected to V1 is . By Markov's inequality, it holds for any non-negative random variable X and any $a > 0$ that

$$\Pr[X \ge a] \le \frac{\mathrm{E}[X]}{a}.$$
The probability that it takes longer than
iterations is therefore no more than

This proves our claim about the probability of failure event four.

If none of the above-mentioned failures occurs, we reach the local optimum where all the peripheral clusters are connected to cluster V1. From this point on, the probability to jump to the optimal solution is, by Lemma 1, no more than
because it is necessary to make at least edge exchanges to reach the optimum. The expected time to reach the optimal solution conditional on no failure is therefore more than .
Let R be the event that no failure occurs. By the law of total probability, it follows that the expected time to reach the global optimum is

The previous theorem shows that there are instances for the tree-based (1+1) EA where the optimisation time grows exponentially with the number of clusters. In the next section, we compare the two different representations for the GMSTP and show that they have complementary capabilities.

### 2.4  Complementary Capabilities

The two representations examined in the previous sections differ significantly from each other. Both rely on the fact that there is a deterministic algorithm that solves the lower-level problem in polynomial time. In this section, we examine the differences between the two approaches. We show that the two representations have complementary abilities by examining each algorithm on the other's hard instance. Surprisingly, the hard instance for one algorithm becomes easy to solve when it is given as input to the other algorithm.

In Section 2.2 we showed a lower bound of $\Omega(n^m)$ for the cluster-based (1+1) EA using the spanning nodes representation. The hard instance GS for the cluster-based (1+1) EA given in Figure 1 consists of a central cluster to which all the other clusters are connected. There are no other connections between the clusters. Hence, there is only one spanning tree when working with the global structure representation. The dynamic programming algorithm that runs on the lower layer of the tree-based (1+1) EA therefore solves the problem in its first iteration.

The following theorem shows that these instances are easy for the tree-based (1+1) EA to optimise.

Theorem 5:

The tree-based (1+1) EA solves the instance GS in expected constant time.

Proof:

There is only a single tree over the cluster graph. Hence, the algorithm selects the optimal tree in the initial iteration.

We showed that the tree-based (1+1) EA, working with the global structure representation, finds the instance GG given in Figure 2 hard to solve. With the spanning nodes representation, there is only one cluster that consists of two nodes, while all the other clusters contain exactly one node. Hence, an optimal solution is obtained by computing a minimum spanning tree on the lower level as soon as the right node in the two-node cluster is chosen. The following theorem summarises this and shows that this instance becomes easy when working with the cluster-based (1+1) EA.

Theorem 6:

The cluster-based (1+1) EA solves the instance GG in expected time $O(m)$.

Proof:

Cluster V1 contains two nodes, and all other clusters contain a single node. If the initial solution is not already the optimal solution, the correct node of V1 has to be selected by mutation. Cluster V1 is selected for mutation with probability $1/m$, and in such a step the correct node is chosen with probability at least $1/2$. Hence, the probability of a mutation leading to an optimal solution is at least $1/(2m)$, and the expected waiting time for this event is at most $2m = O(m)$.

The investigations show that the two examined representations have complementary abilities. Switching from one representation to the other one can significantly reduce the runtime.

## 3  Generalised Travelling Salesperson Problem

We now turn our attention to the NP-hard generalised travelling salesperson problem. Given a complete graph $G = (V, E, c)$ with a cost function $c\colon E \to \mathbb{R}^+$ and a partitioning of the node set V into m clusters $V_i$, $1 \le i \le m$, the goal is to find a cycle of minimal cost that contains exactly one node from each cluster.

The bi-level approach that we study is similar to the one discussed in the previous section. We investigate the global structure representation, which works on the complete graph $H = (V', E')$ obtained from the input graph $G = (V, E)$. The node $v'_i \in V'$, $1 \le i \le m$, represents the cluster $V_i$ of G.

The upper-level solution in the global structure representation is a Hamiltonian tour $\pi$ on H, and the lower-level solution is a set of nodes P, containing one node from each cluster, that minimises the cost of a Hamiltonian tour which connects the clusters in the same way as $\pi$. Given the restriction imposed by the Hamiltonian tour of H, finding the optimal set of nodes P can be done in time by using any shortest path algorithm. One such algorithm is cluster optimisation, proposed initially by Fischetti et al. (1997) and widely used in the literature. Let $\pi = (\pi_1, \dots, \pi_m)$ be a permutation on the m clusters and $p_i \in V_{\pi_i}$ be the chosen node for cluster $V_{\pi_i}$, $1 \le i \le m$. Then the cost of the tour is

$$c(\pi, P) = c(p_m, p_1) + \sum_{i=1}^{m-1} c(p_i, p_{i+1}).$$
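A sketch of cluster optimisation for a fixed cluster order: for every candidate start node in the first cluster, a layered shortest-path recursion walks through the clusters in tour order and then closes the cycle back to the start node. The function name and the `cost(u, v)` convention are assumptions, and the sketch is not claimed to reproduce the exact procedure of Fischetti et al. (1997).

```python
def cluster_optimisation(order, clusters, cost):
    """Best node choice for a fixed cluster order (m >= 2 clusters).

    order    : permutation of cluster indices (the upper-level tour pi)
    clusters : list of node lists, clusters[i] = V_i
    cost     : symmetric edge-cost function on the nodes of G
    Returns (tour cost, nodes) with nodes[i] in clusters[order[i]].
    """
    best_cost, best_nodes = float("inf"), None
    for s in clusters[order[0]]:
        # dist[v]: cheapest path from s to v using one node per cluster so far.
        dist = {s: 0.0}
        pred = [{s: None}]  # pred[k][v]: predecessor of v in layer k - 1
        for idx in order[1:]:
            new_dist, new_pred = {}, {}
            for v in clusters[idx]:
                u = min(dist, key=lambda x: dist[x] + cost(x, v))
                new_dist[v] = dist[u] + cost(u, v)
                new_pred[v] = u
            dist = new_dist
            pred.append(new_pred)
        # Close the cycle back to the start node s.
        last = min(dist, key=lambda x: dist[x] + cost(x, s))
        total = dist[last] + cost(last, s)
        if total < best_cost:
            nodes = [last]
            for layer in range(len(order) - 1, 0, -1):
                nodes.append(pred[layer][nodes[-1]])
            best_cost, best_nodes = total, nodes[::-1]
    return best_cost, best_nodes
```

Fixing the start node is what turns the cyclic problem into a shortest-path problem on a layered graph; trying every node of the first cluster then recovers the best cycle.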

Our proposed algorithm (Algorithm 3) starts with a random permutation of clusters $\pi$, which always represents a Hamiltonian tour in the complete graph H. In each iteration, a new solution of the upper layer is obtained by the commonly used jump operator, which picks a node and moves it to a random position in the permutation. The number of jump operations carried out in a mutation step is chosen according to , where $\mathrm{Pois}(1)$ denotes the Poisson distribution with expectation 1. Although we use the jump operator in these investigations, similar results can be obtained for other popular mutation operators such as exchange and inversion.
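The upper-layer mutation of the tour-based (1+1) EA might look as follows. The jump operator is as described above; the number of jumps is assumed here to be $1 + \mathrm{Pois}(1)$, and both the moved cluster and its new position are drawn uniformly at random. All names are hypothetical.

```python
import math
import random


def sample_pois1(rng):
    """Sample from Pois(1) with Knuth's multiplication method."""
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p < threshold:
            return k
        k += 1


def jump(perm, i, j):
    """Jump operator: remove the element at position i and reinsert it at position j."""
    perm = list(perm)
    perm.insert(j, perm.pop(i))
    return perm


def jump_mutation(perm, rng):
    """Offspring for the upper layer: K uniformly random jumps, K = 1 + Pois(1) assumed."""
    m = len(perm)
    for _ in range(1 + sample_pois1(rng)):
        perm = jump(perm, rng.randrange(m), rng.randrange(m))
    return perm


rng = random.Random(7)
print(jump_mutation(list(range(6)), rng))
```

Since every permutation of the clusters is a Hamiltonian tour of the complete graph H, the offspring never needs a repair step.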

Theorem 7:

The expected optimisation time of the tour-based (1+1) EA is .

Proof:
We consider the probability of obtaining the optimal tour $\pi^*$ on the global graph H from an arbitrary tour $\pi$. The number of jump operations required is at most m (the number of clusters). The probability of picking the right node and moving it to the right position in each of those m operations is at least $1/m^2$. We can obtain an optimal solution by carrying out a sequence of m jump operations where the ith operation jumps element $\pi^*_i$ in $\pi$ to position i. Since the probability of is , the probability of a specific sequence of m jump operations to occur is bounded below by
Therefore, the expected waiting time for such a mutation is
which proves the upper bound on the expected optimisation time.
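The construction in the proof can be replayed directly: the k-th jump moves the k-th cluster of the optimal tour to position k, so after m jumps the permutation equals the target. Positions are 0-based here, and the helper name is hypothetical.

```python
def jumps_to_target(perm, target):
    """Replay the construction from the proof of Theorem 7: the k-th jump
    moves target[k] to position k. Returns the jumps (i, k) and the result."""
    perm = list(perm)
    jumps = []
    for k, x in enumerate(target):
        i = perm.index(x)            # current position of the k-th target cluster
        jumps.append((i, k))
        perm.insert(k, perm.pop(i))  # prefix target[0..k-1] stays untouched
    return jumps, perm


ops, result = jumps_to_target([3, 0, 4, 1, 2], [0, 1, 2, 3, 4])
print(ops, result)  # [(1, 0), (3, 1), (4, 2), (3, 3), (4, 4)] [0, 1, 2, 3, 4]
```

Every recorded jump $(i, k)$ satisfies $i \ge k$, because the prefix built in earlier steps is never disturbed; jumps with $i = k$ leave the permutation unchanged.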

Note that this upper bound depends only on the number of clusters. Since the computational effort required to solve the lower-level problem is polynomial in the input size, , this implies that the proposed algorithm is a fixed-parameter evolutionary algorithm for the GTSP with respect to the parameter m, the number of clusters.

So far we have found an upper bound on the expected time of finding an optimal solution using the presented algorithm. Next, we find a lower bound on the optimisation time. Figure 3 illustrates an instance GG of GTSP for which finding the optimal solution is difficult for the presented bi-level evolutionary algorithm with the global structure representation. In this graph, each cluster has two nodes. On the upper layer a tour of the clusters is found by the EA, and on the lower layer the best node for that tour is found within each cluster. All white nodes (which represent suboptimal nodes) are connected to each other, so that any permutation of clusters yields a Hamiltonian tour even if the black nodes are not used. All such connections have a weight of 1, except for those that are shown in the figure, which have a weight of 2. All edges between a black node and a white node and also all edges between black nodes have weight $m^2$, except the ones presented in the figure, which have weight $1/m$. An optimal solution of cost 1 uses only edges of cost $1/m$, whereas locally optimal solutions use only edges of cost 1. The tour comprising all black nodes in the same order as illustrated in Figure 3 is the optimal solution. Note that there are many locally optimal solutions of cost m. For our analysis it is only important that they do not share any edge with the globally optimal solution.

Figure 3:

Hard instance GG for GTSP with global structure representation.

The clusters are numbered in the figure, and a measure S for evaluating cluster orders is based on this numbering: Let represent the permutation of clusters in the upper layer; then indicates the similarity of the permutation with the optimal permutation. A large value of means that many clusters in are in the same order as in the optimal solution. Note that for an optimal solution . A solution with is locally optimal in the sense that there is no strictly better solution in the neighbourhood induced by the jump operator. The solutions with form a plateau where all solutions differ from the optimal solution by m edges.

We first introduce a lemma that will later help us with the proof of the lower bound on the optimisation time.

Lemma 2:

Let two nonoptimal cluster permutations for the instance GG be given. If the first has a smaller value of S than the second, then it also has a strictly smaller cost.

Proof:

In the given instance, all white nodes are connected to each other by edges of weight at most 2. These connections ensure that any permutation of the clusters yields a Hamiltonian tour of cost at most $2m$. Moreover, all connections between white nodes and black nodes have weight $m^2$. Hence the lower level never chooses a combination of white and black nodes, because such a tour costs more than $m^2$, whereas selecting all white nodes costs at most $2m$. Furthermore, for any permutation of the clusters other than the global optimum, the lower level does not choose all black nodes either, because such a tour cannot consist only of edges of weight $1/m$ and must therefore use at least one edge of weight $m^2$. Let $a$ be the number of clusters in a solution that are correctly followed on the right, that is, that have the same right-hand neighbour as in the global optimum, and let $b = m - a$ be the number of clusters that have a different right-hand neighbour. If the solution is not optimal, the lower level chooses all white nodes. As a result, $a$ edges of weight 2 and $b$ edges of weight 1 are used, so the total cost of the solution is $2a + b = m + a$. Since $a$ is the value of S for the solution, a nonoptimal solution with a larger value of S has a strictly larger cost, which completes the proof.
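The cost structure used in this proof can be reproduced by brute force on a small, hypothetical encoding of GG. The encoding is our reading of the description of Figure 3, not data taken from the figure. Clusters are labelled 0, ..., m-1 in their optimal cyclic order. Each cluster has a white node (0) and a black node (1). The weight-2 white edges and the weight-1/m black edges are assumed to join clusters that are neighbours in that order. Because the weights are symmetric, `correct_adjacencies` counts such adjacencies in either direction. The lower level is solved by enumerating all 2^m node choices, which is only feasible for small m.

```python
from fractions import Fraction
from itertools import permutations, product


def neighbours_in_optimum(m, ci, cj):
    """True if clusters ci and cj are adjacent in the optimal cyclic order."""
    return (cj - ci) % m in (1, m - 1)


def edge_weight(m, ci, bi, cj, bj):
    """Weight between node bi of cluster ci and node bj of cluster cj
    (0 = white, 1 = black) in the assumed encoding of GG."""
    adjacent = neighbours_in_optimum(m, ci, cj)
    if bi == 0 and bj == 0:               # white-white edges
        return 2 if adjacent else 1
    if bi == 1 and bj == 1 and adjacent:  # black edges of the optimal tour
        return Fraction(1, m)
    return m * m                          # black-white and remaining black-black edges


def lower_level_cost(m, perm):
    """Lower level: for the fixed cluster order perm, pick one node per
    cluster so that the resulting tour is as cheap as possible."""
    return min(
        sum(edge_weight(m, perm[i], choice[i], perm[(i + 1) % m], choice[(i + 1) % m])
            for i in range(m))
        for choice in product((0, 1), repeat=m)
    )


def correct_adjacencies(m, perm):
    """The quantity a: tour edges joining clusters adjacent in the optimum."""
    return sum(neighbours_in_optimum(m, perm[i], perm[(i + 1) % m]) for i in range(m))


if __name__ == "__main__":
    m = 5
    for perm in permutations(range(m)):
        a = correct_adjacencies(m, perm)
        expected = 1 if a == m else m + a  # optimum costs 1, otherwise 2a + (m - a)
        assert lower_level_cost(m, perm) == expected
    print("cost = m + a holds for every nonoptimal permutation with m =", m)
```

`Fraction` keeps the optimal cost exactly equal to 1. The permutation (0, 2, 4, 1, 3) has no correct adjacency and costs m, matching the locally optimal solutions of cost m described in the text.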

Lemma 2 shows that a nonoptimal offspring is not accepted if it is closer to an optimal solution than its parent. This means that the algorithm finds it hard to obtain an optimal solution for GG, which leads to an exponential lower bound on the optimisation time, as shown in the following theorem.

Theorem 8:

Starting with a permutation of clusters chosen uniformly at random, the optimisation time of the tour-based (1+1) EA on GG is with probability .

Proof:

Consider the instance GG illustrated in Figure 3; its optimal solution is the tour comprising all edges of weight $1/m$. We consider a typical run of the algorithm consisting of a phase of steps, where C is an appropriate constant, and show the following for this typical run:

• A local optimum with is reached with probability .

• The global optimal solution is not obtained with probability .

We then show that only a direct jump from the local optimum to the global optimum is possible, and that the probability of this event is .

First we show that with high probability holds for the initial solution , where is a small positive constant.

We count the permutations in which at least , a small constant, of the cluster neighbourhoods are correct. We select of the clusters to be followed by their specific neighbour and count the number of different permutations of clusters:
1
Some solutions are counted more than once in this expression, so the actual number of different solutions with is at most (1). Therefore, the probability of having more than clusters followed by their specific cluster is at most

Hence, with probability , holds, and the initial solution has at most correctly ordered clusters.
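The claim that a uniformly random initial permutation rarely has many correctly ordered clusters can be checked exhaustively for small m. The sketch assumes that S counts positions whose cyclic right-hand neighbour is the next cluster in the optimal order 0, 1, ..., m-1. This definition is our reading of the text, not a formula taken from it. The sketch computes the exact distribution of S over all m! permutations. The mean is m/(m-1), because each position has the correct successor with probability 1/(m-1).

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def good_orderings(perm):
    """Assumed measure S: positions whose cyclic right-hand neighbour is the
    next cluster in the optimal order 0, 1, ..., m-1."""
    m = len(perm)
    return sum(perm[(i + 1) % m] == (perm[i] + 1) % m for i in range(m))


def distribution_of_s(m):
    """Exact distribution of S when the initial permutation is uniform."""
    return Counter(good_orderings(p) for p in permutations(range(m)))


def tail_probability(dist, k):
    """Exact Pr[S >= k] under the distribution dist."""
    total = sum(dist.values())
    return Fraction(sum(c for s, c in dist.items() if s >= k), total)


if __name__ == "__main__":
    m = 7
    dist = distribution_of_s(m)
    total = sum(dist.values())
    mean = Fraction(sum(s * c for s, c in dist.items()), total)
    print("mean S:", mean)
    for k in range(1, m + 1):
        print(f"Pr[S >= {k}] = {float(tail_probability(dist, k)):.5f}")
```

Exhaustive enumeration is only practical for small m. It illustrates the concentration of S near small values but does not replace the counting bound for general m.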

Now we analyse the expected time to reach a solution with . To this end, we first consider the exchange operation and lower-bound the number of distinct exchanges at each step that reduce the number of good orderings in a solution.

If we denote the permutation of clusters of the current solution by , then there are clusters in this permutation that are followed by their consecutive cluster. Note that, for any solution other than the local optimum, holds. Let j be one of these clusters, which is followed by cluster $j+1$. To destroy this good ordering, cluster j must be exchanged with a cluster r that fulfils the following requirements:

• Cluster r cannot be a consecutive cluster of j’s current neighbours; that is, if we call j’s neighbours i and k, then r cannot be , , , because these clusters would introduce a new good ordering into the solution if they replaced j.

• The current position of cluster r in the permutation must not be directly before or after clusters or , because exchanging r with j would introduce new good orderings in that case.
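The exchanges that destroy a good ordering can also be enumerated directly. The sketch below lists every position pair whose exchange strictly lowers S. It assumes the cyclic definition of S (positions followed by the next cluster in the optimal order 0, 1, ..., m-1), and `reducing_exchanges` is a hypothetical helper. It is a brute-force check of the case analysis above, not the counting argument itself.

```python
from itertools import combinations


def good_orderings(perm):
    """Assumed measure S: positions whose cyclic right-hand neighbour is the
    next cluster in the optimal order 0, 1, ..., m-1."""
    m = len(perm)
    return sum(perm[(i + 1) % m] == (perm[i] + 1) % m for i in range(m))


def exchange(perm, i, j):
    """Swap the clusters at positions i and j (returns a new list)."""
    p = list(perm)
    p[i], p[j] = p[j], p[i]
    return p


def reducing_exchanges(perm):
    """All position pairs (i, j), i < j, whose exchange strictly lowers S."""
    s = good_orderings(perm)
    return [(i, j) for i, j in combinations(range(len(perm)), 2)
            if good_orderings(exchange(perm, i, j)) < s]


if __name__ == "__main__":
    perm = [0, 1, 2, 7, 3, 4, 9, 5, 6, 8]  # arbitrary example with several good orderings
    print("S =", good_orderings(perm), "reducing exchanges:", len(reducing_exchanges(perm)))
```

For the optimal order itself, every one of the m(m-1)/2 exchanges lowers S. A permutation with S = 0 admits none, which matches its role as a local optimum.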

Therefore, the total number of positions in the permutation that should not be selected as r is at most 8, meaning that there are choices for r that result in reducing the number of good orderings. Since there are l choices for j and choices for r, the total number of possible exchange operations to reach a permutation with is
On the other hand, each exchange operation can be simulated by two jumps. For any j and r, exchange(j,r) can be implemented by performing jump(j,r) and . The first jump places j before r, and the second one places r before . We now bound the probability that two jumps occur in one step and simulate one of the possible exchange operations as
In this formula, is the number of different choices for the exchange operation, is the probability of performing two mutation operations in one step, and each is the probability of selecting the two specific nodes for a jump operation. Using this probability, the expected time until l decreases by one is
where is an appropriate constant. The maximum value of l is , which we have already proved to be at most . Therefore, summing up the expected times for reducing l step by step from its maximum value down to 1, we obtain the expected time to reach the local optimum with as
If denotes the constant such that , then by Markov’s inequality we have
If we repeat phases of iterations times, the probability that the local optimum is not reached in any of them is at most

As a result, with probability the algorithm reaches a local optimum within a phase of steps, which, if we consider , is the same as the phase of iterations mentioned previously.
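The simulation of an exchange by two jumps can be checked mechanically. The sketch assumes a jump semantics in which jump(i, j) removes the element at position i and reinserts it at position j. Under that assumption, exchanging positions i < j corresponds to jump(i, j) followed by jump(j - 1, i). The first jump moves x_i behind x_j, which shifts x_j to position j - 1, and the second jump moves x_j into position i. The index arithmetic depends on this convention and may differ by one from the notation used in the text.

```python
from itertools import product


def jump(perm, i, j):
    """Assumed jump semantics: move the element at position i to position j."""
    p = list(perm)
    p.insert(j, p.pop(i))
    return p


def exchange(perm, i, j):
    """Swap the elements at positions i and j."""
    p = list(perm)
    p[i], p[j] = p[j], p[i]
    return p


def exchange_via_jumps(perm, i, j):
    """Simulate exchange(i, j) with two jumps under the assumed semantics."""
    if i == j:
        return list(perm)
    i, j = min(i, j), max(i, j)
    # jump(i, j) puts x_i at position j and shifts x_j to position j - 1;
    # jump(j - 1, i) then moves x_j into position i.
    return jump(jump(perm, i, j), j - 1, i)


if __name__ == "__main__":
    perm = [3, 1, 4, 0, 5, 2]
    ok = all(exchange_via_jumps(perm, i, j) == exchange(perm, i, j)
             for i, j in product(range(len(perm)), repeat=2))
    print("two jumps reproduce every exchange:", ok)
```

The guard for i == j matters. Without it, the second jump would move an unrelated element.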

To prove that, with high probability, the global optimum is not reached during the considered phase, first note that, by Lemma 2, any jump to a solution closer to the optimum, other than a jump directly to the global optimum, is rejected. Furthermore, for the initial solution . Therefore, only nonoptimal solutions with are accepted by the algorithm. In order to obtain an optimal solution, the algorithm has to produce the optimal solution from a solution with in a single mutation step. We now upper-bound the probability of such a direct jump, which changes at least clusters to their correct order. Such a move needs at least operations in the same iteration, because each jump can change at most three edges. Taking into account that these jump operations may be applied in any order, the probability of a direct jump is at most
2

So, by the union bound, the probability of such a direct jump within a phase of iterations is at most .
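The fact used above, that a single jump changes at most three edges of a tour, can be confirmed by exhaustive enumeration for a small tour. The sketch treats a tour as a cyclic sequence with undirected edges. It assumes that jump(i, j) removes the element at position i and reinserts it at position j. `edges_removed` is a hypothetical helper.

```python
from itertools import product


def jump(perm, i, j):
    """Assumed jump semantics: move the element at position i to position j."""
    p = list(perm)
    p.insert(j, p.pop(i))
    return p


def tour_edges(perm):
    """Undirected edges of the cyclic tour visiting the clusters in order perm."""
    m = len(perm)
    return {frozenset((perm[k], perm[(k + 1) % m])) for k in range(m)}


def edges_removed(perm, i, j):
    """Number of tour edges that are no longer present after jump(i, j)."""
    return len(tour_edges(perm) - tour_edges(jump(perm, i, j)))


if __name__ == "__main__":
    perm = [3, 1, 4, 0, 5, 2, 7, 6]
    worst = max(edges_removed(perm, i, j) for i, j in product(range(len(perm)), repeat=2))
    print("largest number of edges changed by one jump:", worst)
```

A jump removes the two edges around the moved element and the edge at its insertion point, so at most three edges change. Some jumps, such as moving the first element to the end, leave the cyclic tour unchanged.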

So far we have shown that a local optimum with is reached with probability within the first iterations. The probability of obtaining an optimal solution from a solution with is at most
We now consider an additional phase of steps after having obtained a local optimum. Using the union bound, the probability of reaching the global optimum in this phase is at most

As a result, the probability of not reaching the optimal solution in these iterations is . Altogether, the optimisation time is at least with probability .

## 4  Discussion of Generalisations

The problems we have examined in this work are bi-level optimisation problems in which the upper-level problem (the leader) and the lower-level problem (the follower) share an objective function. The general bi-level optimisation problem also includes the setting in which the leader and the follower have different objectives. Given the leader’s decision, the follower makes a decision according to its own objective function, which might conflict with the objective function of the leader. An example of such a problem is one in which the leader places toll booths across a road network and the followers try to find the cheapest way from a point A to a point B, which leads them to avoid toll booths wherever a toll-free detour is cheaper (Brotcorne et al., 2001). Here, the leader can only learn the objective function value of its decision after the follower has picked its optimal path. Unlike in the GMSTP and GTSP, the objective functions of the upper- and lower-level problems in this toll booth problem are distinct and conflicting.
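A toy version of the toll booth setting shows how the two levels interact when their objectives conflict. The network, the admissible toll levels and the optimistic tie-breaking rule below are all hypothetical choices made for illustration. The sketch does not follow any specific published model. The follower minimises its travel cost. The leader maximises toll revenue, which it can only evaluate after the follower has chosen its path.

```python
from itertools import product

# Hypothetical network: directed road -> base travel cost.
ROADS = {("A", "C"): 1, ("C", "B"): 2, ("A", "B"): 5}
TOLLABLE = [("A", "C"), ("C", "B")]  # roads on which the leader may levy a toll
TOLL_LEVELS = [0, 1, 2]              # hypothetical admissible toll values


def simple_paths(node, target, path):
    """Enumerate simple directed paths from node to target."""
    if node == target:
        yield path
        return
    for u, v in ROADS:
        if u == node and v not in path:
            yield from simple_paths(v, target, path + [v])


def path_edges(path):
    return list(zip(path, path[1:]))


def follower(tolls):
    """Follower: cheapest A-B path for the given tolls. Ties are broken in the
    leader's favour (an optimistic bi-level assumption)."""
    def travel_cost(p):
        return sum(ROADS[e] + tolls.get(e, 0) for e in path_edges(p))

    def revenue(p):
        return sum(tolls.get(e, 0) for e in path_edges(p))

    return min(simple_paths("A", "B", ["A"]), key=lambda p: (travel_cost(p), -revenue(p)))


def leader():
    """Leader: try every toll assignment and keep the most profitable one;
    its revenue is only known after the follower has responded."""
    best = None
    for levels in product(TOLL_LEVELS, repeat=len(TOLLABLE)):
        tolls = dict(zip(TOLLABLE, levels))
        path = follower(tolls)
        revenue = sum(tolls.get(e, 0) for e in path_edges(path))
        if best is None or revenue > best[0]:
            best = (revenue, tolls, path)
    return best


if __name__ == "__main__":
    print(leader())
```

In this instance, the leader cannot raise the total toll on A-C-B above 2 without the follower switching to the toll-free road A-B. The leader's objective is therefore determined by the follower's reaction rather than by the leader's own choice alone.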

For a given solution visited in the upper-level problem, the evaluation cost is, in the worst case, the computational complexity of the lower-level problem. If the lower-level problem can be solved in polynomial time, then a fixed-parameter bound on the size of the upper-level solution suffices for the overall problem to be fixed-parameter tractable. This is because, when the size of the upper-level solution is bounded by a function that depends only on a parameter k of the original problem, the search in the upper level takes place in a space of size , and a simple heuristic such as uniform random search finds the optimal upper-level solution in iterations and basic operations in expectation.
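Uniform random search over the upper level, as invoked in the argument above, can be sketched in a few lines. The function is a hypothetical illustration. `evaluate` stands in for the polynomial-time lower-level solver, and `target_value` stands for the optimal cost, which is known here only for the purpose of analysis. With a unique optimum among 2^b bit strings, the number of samples is geometrically distributed with mean 2^b. So when b is bounded by a function of k, the expected number of lower-level evaluations depends only on k.

```python
import random


def uniform_random_search(num_bits, evaluate, target_value, rng):
    """Sample upper-level bit strings uniformly at random until one reaches
    target_value. evaluate(x) stands for solving the lower level for x.
    Returns the solution found and the number of samples used."""
    samples = 0
    while True:
        samples += 1
        x = tuple(rng.randrange(2) for _ in range(num_bits))
        if evaluate(x) == target_value:
            return x, samples


if __name__ == "__main__":
    hidden = (1, 0, 1, 1, 0, 0)  # hypothetical unique optimum

    def placeholder_cost(x):
        # Placeholder lower level: Hamming distance to the hidden optimum.
        return sum(a != b for a, b in zip(x, hidden))

    rng = random.Random(1)
    runs = [uniform_random_search(len(hidden), placeholder_cost, 0, rng)[1] for _ in range(2000)]
    print("mean samples:", sum(runs) / len(runs), "vs 2**num_bits =", 2 ** len(hidden))
```

The empirical mean over the repeated runs should be close to 2^6 = 64, the expectation of the geometric distribution.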

In our case, with the global structure representation of the GMSTP and GTSP, the size of an upper-level solution is bounded above by $m^2$ bits, since a single bit per pair of clusters, indicating whether the two clusters are connected, suffices to define a solution precisely. By contrast, with the spanned nodes representation of the GMSTP, an upper-level solution must indicate which node is selected in each cluster. By the inequality of arithmetic and geometric means, the required number of bits in the upper-level solution is maximised when there are $n/m$ nodes in each cluster, and the maximum number of bits required is . With the global structure representation, if we pick solutions uniformly at random, the probability of picking a unique optimal solution is , so it is found in trials in expectation. Therefore, uniform random search is a fixed-parameter algorithm for the problem. With the spanned nodes representation, however, the corresponding upper bound is , which still depends on the number of nodes n and therefore does not establish fixed-parameter tractability. It is nevertheless noteworthy that the upper bound for uniform random search is asymptotically better than the lower bound shown for the hard instance of the proposed algorithms. Still, the performance of uniform random search () would fall into the category of XP-algorithms, in accordance with the FPT-XP distinction between the representations.
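The size comparison can be made concrete by counting search spaces for small instances. The sketch assumes that the global structure representation uses one bit per unordered pair of clusters, that is, m(m-1)/2 ≤ m^2 bits. It also assumes that a spanned-nodes solution picks one node per cluster, so that its search space is the product of the cluster sizes. Enumerating all cluster-size profiles for fixed n and m confirms that balanced clusters of size n/m maximise this product, as the AM-GM argument states.

```python
import math


def global_structure_bits(m):
    """One bit per unordered pair of clusters: m(m-1)/2 <= m^2 bits."""
    return m * (m - 1) // 2


def spanned_nodes_space(sizes):
    """Number of ways to pick one node in each cluster."""
    return math.prod(sizes)


def cluster_profiles(n, m, smallest=1):
    """Non-decreasing tuples of m positive cluster sizes summing to n."""
    if m == 1:
        if n >= smallest:
            yield (n,)
        return
    for first in range(smallest, n // m + 1):
        for rest in cluster_profiles(n - first, m - 1, first):
            yield (first,) + rest


if __name__ == "__main__":
    n, m = 12, 3
    best = max(cluster_profiles(n, m), key=spanned_nodes_space)
    print("largest spanned-nodes space:", best, spanned_nodes_space(best))
    print("global structure space: 2 **", global_structure_bits(m))
```

The global structure space depends only on m. The worst-case spanned-nodes space, (n/m)^m, grows with n, which is the distinction drawn in the paragraph above.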

## 5  Conclusions

Evolutionary bi-level optimisation has attracted increasing interest in recent years. With this article, we have contributed to its theoretical understanding by considering two classical NP-hard combinatorial optimisation problems, namely the generalised minimum spanning tree problem and the generalised travelling salesperson problem. We studied evolutionary algorithms for these problems in the parameterised setting. Using parameterised computational complexity analysis of evolutionary algorithms for the GMSTP, we examined two representations for the upper-layer solutions and their corresponding deterministic algorithms for the lower layer. Our results show that the global structure representation leads to fixed-parameter evolutionary algorithms. By presenting hard instances for each of the two approaches, we pointed out where they run into difficulties. Furthermore, we showed that the two representations for the GMSTP are highly complementary by proving that each of them is highly efficient on the hard instance of the other. After having achieved these results for the GMSTP, we turned our attention to the GTSP. We showed that using the global structure representation leads to fixed-parameter evolutionary algorithms with respect to the number of clusters. Furthermore, we presented a worst-case instance on which the optimisation time grows exponentially with respect to the number of clusters, and we discussed generalisations of the results.

## Acknowledgments

The research leading to these results received funding from the Australian Research Council (ARC) under grant agreements DP130104395 and DP140103400, and from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 618091 (SAGE).

## References

Auger, A., and Doerr, B. (Eds.) (2011). Theory of randomized search heuristics: Foundations and recent developments. Singapore: World Scientific.

Brotcorne, L., Labbé, M., Marcotte, P., and Savard, G. (2001). A bilevel model for toll optimization on a multicommodity transportation network. Transportation Science, 35(4):345–358.

Corus, D., Lehre, P. K., and Neumann, F. (2013). The generalized minimum spanning tree problem: A parameterized complexity analysis of bi-level optimisation. In Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO), pp. 519–526.

Deb, K., and Sinha, A. (2009). Solving bilevel multi-objective optimization problems using evolutionary algorithms. In Proceedings of the International Conference on Evolutionary Multi-Criterion Optimization, pp. 110–124. Lecture Notes in Computer Science, Vol. 5467.

Deb, K., and Sinha, A. (2010). An efficient and accurate solution methodology for bilevel multi-objective programming problems using a hybrid evolutionary local-search algorithm. Evolutionary Computation, 18(3):403–449.

Downey, R. G., and Fellows, M. R. (1999). Parameterized complexity. New York: Springer.

Fischetti, M., Salazar González, J. J., and Toth, P. (1997). A branch-and-cut algorithm for the symmetric generalized traveling salesman problem. Operations Research, 45(3):378–394.

Ghosh, D. (2003). Solving medium to large sized Euclidean generalized minimum spanning tree problems. Technical Report, Indian Institute of Management, Research and Publication Department, Ahmedabad, India.

Hu, B., and Raidl, G. R. (2011). An evolutionary algorithm with solution archive for the generalized minimum spanning tree problem. In Proceedings of the International Conference on Computer Aided Systems Theory, pp. 287–294. Lecture Notes in Computer Science, Vol. 6927.

Hu, B., and Raidl, G. R. (2012). An evolutionary algorithm with solution archives and bounding extension for the generalized minimum spanning tree problem. In Proceedings of the Conference on Genetic and Evolutionary Computation (GECCO), pp. 393–400.

Koh, A. (2007). Solving transportation bi-level programs with differential evolution. In IEEE Congress on Evolutionary Computation, pp. 2243–2250.

Kratsch, S., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2010). Fixed parameter evolutionary algorithms and maximum leaf spanning trees: A matter of mutation. In Parallel Problem Solving from Nature, PPSN XI, pp. 204–213.

Kratsch, S., and Neumann, F. (2013). Fixed-parameter evolutionary algorithms and the vertex cover problem. Algorithmica, 65(4):754–771.

Legillon, F., Liefooghe, A., and Talbi, E.-G. (2012). COBRA: A cooperative coevolutionary algorithm for bi-level optimization. In Proceedings of IEEE Congress on Evolutionary Computation, pp. 1–8.

Motwani, R., and Raghavan, P. (1995). Randomized algorithms. Cambridge: Cambridge University Press.

Myung, Y.-S., Lee, C.-H., and Tcha, D.-W. (1995). On the generalized minimum spanning tree problem. Networks, 26(4):231–241.

Neumann, F., and Witt, C. (2010). Bioinspired computation in combinatorial optimization: Algorithms and their computational complexity. New York: Springer.

Pop, P. C. (2004). New models of the generalized minimum spanning tree problem. Journal of Mathematical Modelling and Algorithms, 3(2):153–166.

Sinha, A., Malo, P., and Deb, K. (2014). Test problem construction for single-objective bilevel optimization. Evolutionary Computation, 22(3):439–477.

Sutton, A. M., and Neumann, F. (2012a). A parameterized runtime analysis of evolutionary algorithms for the Euclidean traveling salesperson problem.

Sutton, A. M., and Neumann, F. (2012b). A parameterized runtime analysis of simple evolutionary algorithms for makespan scheduling. In Parallel Problem Solving from Nature, PPSN XII, pp. 52–61.