## Abstract

Bi-level optimisation problems have gained increasing interest in the field of combinatorial optimisation in recent years. In this paper, we analyse the runtime of some evolutionary algorithms for bi-level optimisation problems. We examine two NP-hard problems, the generalised minimum spanning tree problem and the generalised travelling salesperson problem, in the context of parameterised complexity. For the generalised minimum spanning tree problem, we analyse the two approaches presented by Hu and Raidl (2012), which differ in the chosen representation of possible solutions, with respect to the number of clusters. Our results show that a (1+1) evolutionary algorithm working with the spanning nodes representation is not a fixed-parameter evolutionary algorithm for the problem, whereas the problem can be solved in fixed-parameter time with the global structure representation. We present hard instances for each approach and show that the two approaches are highly complementary by proving that they solve each other’s hard instances very efficiently. For the generalised travelling salesperson problem, we analyse the problem with respect to the number of clusters in the problem instance. Our results show that a (1+1) evolutionary algorithm working with the global structure representation is a fixed-parameter evolutionary algorithm for the problem.

## 1 Introduction

Many interesting combinatorial optimisation problems are hard to solve, and metaheuristic approaches such as local search, simulated annealing, evolutionary algorithms (EAs), and ant colony optimisation have been used for a wide range of these problems.

In recent years, researchers have become very interested in bi-level optimisation for single-objective (Koh, 2007; Legillon et al., 2012) and multiobjective (Deb and Sinha, 2009, 2010) problems. Such problems can be split into an upper-level and a lower-level problem, which depend on each other. By fixing a possible solution for the upper-level problem, the lower-level problem is optimised with respect to the given objective and the constraints imposed by the choice of the upper-level problem.

Sinha et al. (2014) give the following general definition of a bi-level optimisation problem.

Let *X _{U}* denote the upper-level decision space and *X _{L}* the lower-level decision space, i.e., (x_{u}, x_{l}) ∈ X_{U} × X_{L} if x_{u} ∈ X_{U} and x_{l} ∈ X_{L}. For an upper-level objective function F: X_{U} × X_{L} → ℝ and a lower-level objective function f: X_{U} × X_{L} → ℝ, a general bi-level optimisation problem is given by

min F(x_{u}, x_{l})
s.t. x_{l} ∈ argmin {f(x_{u}, x_{l}) | g_{j}(x_{u}, x_{l}) ≤ 0, 1 ≤ j ≤ J}
G_{k}(x_{u}, x_{l}) ≤ 0, 1 ≤ k ≤ K,

where the functions g_{j}: X_{U} × X_{L} → ℝ, 1 ≤ j ≤ J, represent lower-level constraints and G_{k}: X_{U} × X_{L} → ℝ, 1 ≤ k ≤ K, is the collection of upper-level constraints.

A vector x_{u} is called feasible on the upper level if it fulfils all upper-level constraints. For a given *x _{u}*, the lower-level solution x_{l} has to be optimal. A comprehensive benchmark set in the context of continuous optimisation was introduced by Sinha et al. (2014).

According to Definition 1, the upper-level objective function *F* and the lower-level objective function *f* may constitute an arbitrary optimisation problem, which implies that they could both be difficult, that is, NP-hard problems. Bi-level problems can be found in a variety of domains such as transportation or economics. The toll-setting problem (Brotcorne et al., 2001) is one such problem from the transportation domain, in which the government that operates the highways tries to maximise its revenues by placing toll gates in a road network. Drivers have different objectives and avoid tolls by choosing paths that minimise their travelling costs. This example shows that the upper- and lower-level problems can work against each other, that is, the government tries to maximise revenue, whereas the drivers try to minimise their costs.

To initiate runtime analysis of evolutionary bi-level optimisation, we examine settings for the NP-hard generalised minimum spanning tree problem (GMSTP) where the upper- and lower-level problems are cooperative. In our case, the upper-level function constitutes an NP-hard optimisation problem, whereas the lower-level problem can be solved in polynomial time. We examine two approaches introduced by Hu and Raidl (2011, 2012) for the GMSTP. Both approaches work with an upper-layer and a lower-layer solution. The upper-layer solution x_{u} is evolved by an evolutionary algorithm, whereas the optimal solution x_{l} of the lower-layer problem corresponding to a particular search point x_{u} of the upper layer can be found in polynomial time using deterministic algorithms.

Our goal is to understand the two different approaches by parameterised computational complexity analysis (Downey and Fellows, 1999). The computational complexity analysis of metaheuristics plays a major role in the theoretical analysis of this type of algorithm and studies the runtime behaviour with respect to the size of the given input. We refer the reader to Auger and Doerr (2011) and Neumann and Witt (2010) for a comprehensive presentation. Parameterised complexity analysis takes into account the runtime of algorithms in dependence of an additional parameter that measures the hardness of a given instance. This allows us to understand which parameters of a given NP-hard optimisation problem make it hard or easy to be optimised by heuristic search methods. In the context of evolutionary algorithms, the term fixed-parameter evolutionary algorithms has been defined by Kratsch and Neumann (2013). An evolutionary algorithm is called a fixed-parameter evolutionary algorithm for a given parameter *k* iff its expected runtime is bounded by f(k) · poly(n), where f is a computable function only depending on *k*, and poly(n) is a polynomial with respect to the input size *n*. Parameterised computational complexity analysis of evolutionary algorithms has been carried out for the vertex cover problem (Kratsch and Neumann, 2013), the computation of maximum leaf spanning trees (Kratsch et al., 2010), makespan scheduling (Sutton and Neumann, 2012b), and the travelling salesperson problem (Sutton and Neumann, 2012a).

We push forward the parameterised analysis of evolutionary algorithms and present an analysis in the context of bi-level optimisation. In our investigations, we take into account two NP-hard problems: the GMSTP and the generalised travelling salesperson problem (GTSP), which share the parameter, number of clusters *m*. We consider two different bi-level representations for the GMSTP that both have a polynomially solvable lower-level part. For the *spanning nodes representation*, we present worst-case examples showing that there are instances leading to an optimisation time of Ω(n^{m}). For the *global structure representation*, we show that it leads to a fixed-parameter evolutionary algorithm with respect to the number of clusters *m*. Furthermore, we present an instance class where the algorithm using the global structure representation encounters an optimisation time of m^{Ω(m)}. Analysing both approaches on each other’s worst-case instances, we show that they solve them very efficiently. This shows the complementary abilities of these two representations for the GMSTP. Then we extend our results for the global structure representation to the GTSP to show that a similar algorithm has an expected optimisation time of m^{O(m)} for this problem as well.

The paper is divided into two main parts according to the two different problems. The first part, which is based on the conference version (Corus et al., 2013) and investigates the GMSTP, is presented in Section 2. We present hard instances for the spanned nodes representation in Section 2.2 and show that a simple evolutionary algorithm needs exponential time even if the number of clusters is small. In Section 2.3 we examine the global structure representation and show that it leads to fixed-parameter evolutionary algorithms for the GMSTP. We point out complementary abilities in Section 2.4. This paper extends the conference version by investigations of the GTSP and some generalisations. We examine the GTSP with the corresponding global structure representation in Section 3 and provide upper and lower bounds on the optimisation time of the considered algorithm. In Section 4 we point out general characteristics that allow this fixed-parameter result to be extended to other problems.

## 2 Generalised Minimum Spanning Tree Problem

### 2.1 Preliminaries

We consider the GMSTP introduced by Myung et al. (1995). The input is given by an undirected complete graph G = (V, E) on *n* nodes with a cost function c: E → ℝ^{+} that assigns positive costs to the edges. Furthermore, a partitioning of the node set *V* into *m* pairwise disjoint clusters V_{1}, V_{2}, …, V_{m} is given such that V = V_{1} ∪ V_{2} ∪ ⋯ ∪ V_{m}.

A feasible solution consists of *m* chosen nodes *P*, called *the spanning nodes*, one from each of the *m* clusters, and a minimum spanning tree *T* on the graph induced by the spanning nodes. More precisely, a solution S = (P, T) consists of a node set P = {p_{1}, …, p_{m}}, where p_{i} ∈ V_{i} for 1 ≤ i ≤ m, and a spanning tree T on the subgraph induced by *P*. The cost of a solution S is the cost of the edges in *T*, namely,

c(S) = Σ_{e ∈ T} c(e).

The goal is to compute a solution that has minimal cost among all possible solutions. For an easier presentation, we assume in some cases that edge costs can be ∞. In this case, we restrict our investigations to solutions that do not include edges with cost ∞. Alternatively, one might view this as the GMSTP defined on a graph that is not necessarily complete.

The GMSTP is NP-hard (Myung et al., 1995), and two different bi-level evolutionary approaches have been examined by Hu and Raidl (2012). The first approach they present uses the spanned nodes representation. It selects in the upper-level problem a node for each cluster and computes on the lower level a minimum spanning tree (using, for example, Kruskal’s algorithm in time O(m^{2} log m)) on the induced subgraph. Tabu search and variable neighbourhood approaches using this representation can be found in Ghosh (2003).
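The division of labour just described can be made concrete with a short sketch (our own illustration, not Hu and Raidl’s implementation; the function name and interface are assumptions): once the upper level has fixed one spanned node per cluster, the lower level only has to compute a minimum spanning tree on the induced complete subgraph, here via Kruskal’s algorithm with a union-find structure.

```python
def mst_cost(nodes, cost):
    """Cost of a minimum spanning tree on the complete subgraph
    induced by `nodes`; `cost(u, v)` returns the edge cost."""
    parent = {v: v for v in nodes}

    def find(v):  # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted((cost(u, v), u, v)
                   for i, u in enumerate(nodes) for v in nodes[i + 1:])
    total, used = 0.0, 0
    for c, u, v in edges:          # Kruskal: scan edges by increasing cost
        ru, rv = find(u), find(v)
        if ru != rv:               # the edge joins two components: keep it
            parent[ru] = rv
            total += c
            used += 1
    return total if used == len(nodes) - 1 else float("inf")
```

On *m* spanned nodes the induced subgraph has O(m^{2}) edges, so sorting dominates and this lower-level step runs in time O(m^{2} log m).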

The second approach uses the global structure representation. It constructs a complete graph H = (V′, E′) from the given input graph G = (V, E) and the set of pairwise disjoint clusters V_{1}, …, V_{m}. The node v′_{i} ∈ V′, 1 ≤ i ≤ m, corresponds to the cluster *V _{i}* in *G*. The search space for the upper level consists of all spanning trees of *H*, and the spanned nodes of the different clusters are selected in time O(n^{2}) using the dynamic programming approach of Pop (2004).

For our theoretical investigations, we measure the runtime of the algorithms by the number of fitness evaluations required to obtain an optimal solution. We call this the *optimisation time* of the examined algorithm. The *expected optimisation time* refers to the expected number of fitness evaluations until an optimal solution has been obtained for the first time.

### 2.2 Spanned Nodes Representation

We analyse the cluster-based (1+1) EA in this section (see Algorithm 1). Our first theorem shows that this algorithm is an XP-algorithm (Downey and Fellows, 1999), that is, an algorithm that runs in time O(n^{g(m)}), where g is a computable function only depending on *m*, when choosing the number of clusters *m* as a parameter.
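As a hedged sketch (our reading of the cluster-based (1+1) EA, with illustrative names, not the authors’ code): the algorithm keeps one spanned node per cluster, mutates each cluster independently with probability 1/m, lets the lower level evaluate the resulting solution, and accepts the offspring if it is not worse. The cost oracle `lower_level_cost` stands in for the minimum spanning tree computation.

```python
import random

def cluster_based_ea(clusters, lower_level_cost, steps, rng=random):
    """(1+1) EA on the spanned nodes representation (illustrative sketch)."""
    m = len(clusters)
    current = [rng.choice(c) for c in clusters]   # random initial solution
    cur_cost = lower_level_cost(current)
    for _ in range(steps):
        # each cluster is mutated independently with probability 1/m
        offspring = [rng.choice(c) if rng.random() < 1.0 / m else p
                     for c, p in zip(clusters, current)]
        off_cost = lower_level_cost(offspring)    # lower level solved exactly
        if off_cost <= cur_cost:                  # accept if not worse
            current, cur_cost = offspring, off_cost
    return current, cur_cost

# toy run: two clusters, and a stand-in cost oracle that sums the node labels
sol, val = cluster_based_ea([[0, 1], [2, 3]],
                            lambda p: p[0] + p[1],
                            steps=3000, rng=random.Random(1))
```

With two clusters of two nodes each, a mutation proposing the optimum has probability at least 1/16 per step, so 3000 elitist steps locate the cheapest pair of nodes.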

For any instance of the GMSTP, the expected time until the cluster-based (1+1) EA reaches the optimal solution is O(n^{m}).

For a solution *x*, let k(x) denote the number of clusters where the spanned node representation includes a suboptimal node. If the algorithm chooses all suboptimal clusters for mutation and selects the optimal node in each of them, then the optimal solution is obtained. Since k(x) ≤ m, the probability that exactly the suboptimal clusters are mutated in a single step is at least (1/m)^{k(x)} · (1 − 1/m)^{m−k(x)}. The probability of choosing the optimal node in cluster *i* is 1/|V_{i}|. Thus, the probability of jumping to the optimal solution from any search point is at least

(1/m)^{k(x)} · (1 − 1/m)^{m−k(x)} · ∏ 1/|V_{i}| ≥ (1/4) · (1/m)^{m} · (m/n)^{m} = Ω(n^{−m}),

where the product runs over the suboptimal clusters and we use ∏_{i=1}^{m} |V_{i}| ≤ (n/m)^{m}. Hence, the expected optimisation time is O(n^{m}).

We now consider an instance of the GMSTP that is difficult for the cluster-based (1+1) EA. The hard instance *G _{S}* for the spanning nodes representation is illustrated in Figure 1. It consists of *m* clusters, where one cluster is called the *central cluster*, and the other m − 1 clusters are called *peripheral clusters*. Each cluster contains n/m nodes, and we assume that n ≥ 2m holds. The nodes in the peripheral clusters are called *peripheral nodes*, and the nodes in the central cluster are called *central nodes*. Within each cluster, one of the nodes is called *optimal* and is marked black in the figure. The remaining nodes are called *suboptimal* nodes and are marked white in the figure. The instance is a bipartite graph, where edges connect peripheral nodes to central nodes. The cost of any edge between two optimal nodes is 1, and the cost of any edge between two suboptimal nodes is 2. The cost of any edge between a suboptimal peripheral node and the optimal central node is *n*^{2}, and the cost of any edge between an optimal peripheral node and a suboptimal central node is *n*. A cluster is called *optimal* in a solution if the solution has chosen the optimal node in that cluster.

Starting with an initial solution chosen uniformly at random, the expected optimisation time of the cluster-based (1+1) EA on *G _{S}* is Ω(n^{m}).

Furthermore, for any constant δ > 0, the probability of having obtained an optimal solution after at most n^{(1−δ)m} iterations is o(1).

We define two phases for the run of the (1+1) EA. The first phase consists of the first n iterations, and the second phase starts at the end of the first phase and continues for n^{2} iterations. Four distinct events are considered failures during the run of the (1+1) EA for the instance just described:

- •
The first failure occurs if during the first phase of the run, the algorithm obtains a search point with less than εm suboptimal peripheral clusters, for a sufficiently small constant ε > 0.

- •
The second type of failure occurs when the central cluster fails to switch to a suboptimal node at least once during the first phase.

- •
The third type of failure corresponds to a direct jump to the optimal solution during the second phase.

- •
The fourth failure occurs when the algorithm does not switch all the optimal peripheral clusters to suboptimal clusters during the second phase.

For a peripheral cluster *V _{i}*, let X_{i}^{(t)} = 1 if the spanned node of *V _{i}* is always suboptimal in iteration 0 through iteration *t*, and X_{i}^{(t)} = 0 otherwise. The probability that a suboptimal node is selected in the initial solution is 1 − m/n. In the following iterations, the probability that a cluster is selected for mutation and that its new spanned node is optimal is (1/m) · (m/n) = 1/n. So it is clear that

Pr(X_{i}^{(t)} = 1) ≥ (1 − m/n)(1 − 1/n)^{t}.

By linearity of expectation,

E[Σ_{i} X_{i}^{(t)}] ≥ (m − 1)(1 − m/n)(1 − 1/n)^{t}.

Considering a phase length of t = n, and assuming that *m* and *n* are sufficiently large and n ≥ 2m holds, we get

E[Σ_{i} X_{i}^{(n)}] ≥ (m − 1) · (1/2) · (1/4) ≥ m/16.

Finally, a Chernoff bound (Motwani and Raghavan, 1995) implies that with probability 1 − e^{−Ω(m)} at least εm peripheral clusters remain suboptimal throughout the first phase, for a sufficiently small constant ε > 0. Hence, the probability of a type one failure is e^{−Ω(m)}.

A type two failure requires that the central cluster never switches to a suboptimal node during the first phase. As long as at least one peripheral cluster is suboptimal, switching the central cluster to a suboptimal node removes an edge of cost n^{2} and is therefore accepted. The probability of such a step is at least (1/m)(1 − m/n)(1 − 1/m)^{m−1} ≥ 1/(2em), so the probability that it does not happen within the n iterations of the first phase is at most (1 − 1/(2em))^{n} = e^{−Ω(n/m)}.

Once the central cluster is suboptimal, a solution in which some but not all clusters are optimal contains edges of cost at least *n* and is therefore worse than any solution in which all clusters are suboptimal. Hence, a direct jump to the optimal solution during the second phase requires that all of the at least εm suboptimal peripheral clusters become optimal in a single mutation, which has probability at most n^{−εm}. By a union bound over the n^{2} iterations of the second phase, the probability of a type three failure is at most n^{2−εm} = o(1).

For sufficiently large *n*, the probability that an optimal peripheral cluster is made suboptimal by the (1+1) EA is at least (1/m)(1 − m/n) ≥ 1/(2m). The expected time until all peripheral clusters have become suboptimal is therefore at most 2m(1 + ln m). Considering a phase of length n^{2} and taking into account n ≥ 2m, it holds by Markov’s inequality that the probability of a type four failure is at most 2m(1 + ln m)/n^{2} = o(1).

By a union bound, the probability that any type of failure occurs is less than the sum of their respective probabilities, which is o(1). Hence, with overwhelmingly high probability, after the second phase, the algorithm has obtained a locally optimal solution where all peripheral clusters are suboptimal. After that iteration, the probability to jump directly to the optimal solution is n^{−m}, and the expected time for this event to occur is *n ^{m}*.

Let *E* be the event that no failure occurs. Then, the first statement of the theorem follows by the law of the total probability, since the expected optimisation time is at least Pr(E) · n^{m} ≥ (1 − o(1)) · n^{m} = Ω(n^{m}). The second statement follows by a union bound over the iterations: conditional on *E*, the probability that the optimum is reached within n^{(1−δ)m} iterations is at most n^{(1−δ)m} · n^{−εm·(1/(1−δ))}⋅…, and in particular at most n^{(1−δ)m} · n^{−m} + o(1) = o(1) once all peripheral clusters are suboptimal.

Our results for the spanned nodes representation show that the cluster-based (1+1) EA obtains an optimal solution in time O(n^{m}), and our analysis for the hard instance *G _{S}* shows that this bound is tight.

### 2.3 Global Structure Representation

The second approach examined by Hu and Raidl (2012) uses the global structure representation. It works on the complete graph H = (V′, E′) obtained from the input graph G = (V, E). The node v′_{i} ∈ V′, 1 ≤ i ≤ m, represents the cluster *V _{i}* of *G*.

The upper-level solution in the global structure representation is a spanning tree *T* of *H*, and the lower-level solution is a set of nodes P = {p_{1}, …, p_{m}}, with p_{i} ∈ V_{i}, that minimises the cost of a spanning tree connecting the clusters in the same way as *T*. Given a spanning tree *T* of *H*, the set of nodes *P* can be computed in time O(n^{2}) using dynamic programming (Pop, 2004).
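The lower-level computation can be sketched as follows (our own reconstruction of the idea credited to Pop (2004); the names and interface are assumptions). The global tree is rooted at an arbitrary cluster, and a table stores, for each cluster, the cheapest cost of its subtree for every possible choice of node in that cluster:

```python
def best_spanned_nodes_cost(tree_edges, clusters, cost):
    """Cheapest node selection for a fixed global spanning tree.

    `tree_edges` are pairs of cluster indices, `clusters[i]` lists the
    nodes of cluster i, and `cost(u, v)` is the edge cost between nodes.
    """
    adj = {i: [] for i in range(len(clusters))}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)

    def dp(i, parent):
        # table[v] = cheapest cost of cluster i's subtree if node v is chosen
        table = {v: 0.0 for v in clusters[i]}
        for j in adj[i]:
            if j != parent:
                child = dp(j, i)
                for v in table:
                    table[v] += min(child[w] + cost(v, w)
                                    for w in clusters[j])
        return table

    return min(dp(0, None).values())   # root the tree at cluster 0
```

Every tree edge between clusters V_{i} and V_{j} contributes |V_{i}| · |V_{j}| comparisons, so the total effort is O(Σ |V_{i}||V_{j}|) = O(n^{2}), which is why the lower level stays polynomial.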

We consider the tree-based (1+1) EA outlined in Algorithm 2. It starts with a spanning tree *T* of *H* that is chosen uniformly at random. In each iteration, a new solution T′ of the upper layer is obtained by performing *K* edge swaps to *T*. Here the parameter *K* is chosen according to K = 1 + Pois(1), where Pois(1) is the Poisson distribution with expectation 1. In one edge swap, an edge *e* currently not present in the solution is introduced and an edge from the resulting cycle is removed such that a new spanning tree of *H* is obtained. After having produced the offspring T′, the corresponding set of nodes P′ is computed using dynamic programming. *P* and *T* are replaced by P′ and T′ if the cost of the new solution is not worse than the cost of the old one.
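The mutation step can be sketched as follows (an illustrative reconstruction, assuming m ≥ 3 so that a non-tree cluster edge exists; the Poisson sampler uses Knuth’s multiplication method). One swap inserts a cluster edge that is currently not in the tree and removes a random edge of the cycle this creates, so the result is again a spanning tree:

```python
import math
import random

def poisson_one(rng):
    """Sample Pois(1) by Knuth's multiplication method."""
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def mutate_tree(tree_edges, m, rng=random):
    """Apply K = 1 + Pois(1) random edge swaps to a spanning tree
    on m >= 3 clusters (clusters are labelled 0, ..., m - 1)."""
    edges = set(map(frozenset, tree_edges))
    for _ in range(1 + poisson_one(rng)):
        while True:                               # pick a non-tree cluster edge
            i, j = rng.sample(range(m), 2)
            if frozenset((i, j)) not in edges:
                break
        adj = {}
        for e in edges:
            u, v = tuple(e)
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
        parent, stack = {i: None}, [i]
        while stack:                              # DFS for the i-j tree path
            u = stack.pop()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    stack.append(v)
        path, u = [], j
        while parent[u] is not None:              # edges of the created cycle
            path.append(frozenset((u, parent[u])))
            u = parent[u]
        edges.remove(rng.choice(path))            # drop one cycle edge ...
        edges.add(frozenset((i, j)))              # ... and insert the new one
    return [tuple(e) for e in edges]
```

Because every swap exchanges one cycle edge for the inserted edge, the invariant that the edge set forms a spanning tree of the cluster graph is preserved.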

In the following, we show that the tree-based (1+1) EA is a fixed-parameter evolutionary algorithm for the GMSTP when considering the number of clusters *m* as the parameter. We do this by transferring the result of Pop (2004) to the tree-based (1+1) EA.

The expected time of the tree-based (1+1) EA to find the optimal solution for any instance of the GMSTP is at most e(m − 1)! · m^{3m} = m^{O(m)}. Furthermore, for any c > 0, the probability that an optimal solution is not found within c · e(m − 1)! · m^{3m} steps is less than e^{−c}.

An upper-layer solution is a tree *T* of *H*. Let T* be any tree of *H* for which there exists a set of spanning nodes P* such that T* and P* form an optimal solution. For any nonoptimal solution *T*, define *i* as the number of edges in T* that are missing in *T*.

We show how the mutation operator can turn *T* into the optimal solution with a sequence of *i* edge exchange operations. The probability that the mutation operator exchanges exactly *i* edges in one mutation step is at least 1/(e(i − 1)!), since K = 1 + Pois(1). If there are *i* optimal edges missing, then the probability that one of the missing optimal edges is inserted is at least 1/m^{2}, since *H* has fewer than m^{2} edges. After the addition of an optimal edge, the probability of excluding a nonoptimal edge is at least 1/m, since the largest cycle cannot be longer than *m*. At most m − 1 nonoptimal edges must be exchanged in this manner. So the probability that the nonoptimal solution *T* will be converted to the optimal solution in one mutation step is at least

(1/(e(i − 1)!)) · (1/m^{3})^{i} ≥ 1/(e(m − 1)! · m^{3m}).

Hence, the expected optimisation time is at most e(m − 1)! · m^{3m}, and the probability that the optimum is not found within c · e(m − 1)! · m^{3m} steps is at most (1 − p)^{c/p} < e^{−c}, where p denotes the preceding lower bound on the probability of reaching the optimum in a single step.

We now present an instance that is hard to be solved by the tree-based (1+1) EA. The instance *G _{G}*, illustrated in Figure 2, consists of *n* nodes and *m* clusters. There are two central clusters, denoted by *V*_{1} and *V*_{2}. The cluster *V*_{1} contains the two nodes *v*_{11} and *v*_{12}. The remaining clusters V_{i}, 2 ≤ i ≤ m, contain a single node each; in particular, V_{2} = {*v*_{21}}. The edges that connect the node *v*_{11} to the peripheral cluster nodes have cost 1. The edges that connect *v*_{21} to the peripheral clusters have weight 1 + 1/m. The edge that connects *v*_{12} and *v*_{21} has weight 1/m. All other edges have cost m^{2}. Hence, if the tree-based (1+1) EA connects cluster *V*_{1} and *V*_{2} directly, then the dynamic programming algorithm will choose node *v*_{12}.

In our analysis, we will use the following lemma on basic properties of the Poisson distribution with expectation 1.

If K is a random variable with Poisson distribution Pois(1) with expectation 1, then Pr(K ≥ k) ≤ 2/(e · k!) holds for every k ≥ 1.

Using the previous lemma (Lemma 5), we are able to show that the tree-based (1+1) EA finds it hard to optimise *G _{G}* when the initial spanning tree is chosen uniformly at random among all spanning trees having weight less than m^{2}.

Starting with a spanning tree chosen uniformly at random among all spanning trees that have cost less than m^{2}, the expected optimisation time of the tree-based (1+1) EA on *G _{G}* is m^{Ω(m)}.

Consider the instance in Figure 2. In the following, *e* = {*V*_{1}, *V*_{2}} denotes the edge connecting the two central clusters. The optimal solution corresponds to the spanning tree that includes edge *e*, and where all other clusters are connected to cluster *V*_{2}. The solution where all peripheral clusters are connected to *V*_{1}, and where cluster *V*_{2} is connected to one of the peripheral clusters, is a local optimum.

We define four failure events that can occur during a run of the (1+1) EA on this instance:

- •
The first type of failure occurs when the initial solution includes edge *e*.

- •
The second type of failure occurs when less than (m − 2)/4 of the peripheral clusters are connected to cluster *V*_{1} in the initial solution.

- •
The third type of failure occurs when the algorithm jumps directly to the optimal solution during the first m^{5} iterations.

- •
The fourth type of failure occurs if after iteration m^{5}, there exists a peripheral cluster that is not connected to cluster *V*_{1}.

There are m − 2 peripheral clusters that must be connected to either *V*_{1} or *V*_{2}. Additionally, clusters *V*_{1} and *V*_{2} must be connected. This connection can be established either by adding edge *e* or by connecting a peripheral cluster to both *V*_{1} and *V*_{2}. There are 2^{m−2} spanning trees that contain edge *e*, and (m − 2) · 2^{m−3} spanning trees that do not contain edge *e*, since one of the peripheral clusters will be connected to both central clusters and the others will be connected to only one. So, the probability that a uniformly chosen spanning tree includes edge *e* is 2^{m−2}/(2^{m−2} + (m − 2) · 2^{m−3}) = 2/m, which is the probability of the first type of failure.

Now, we show that the probability of the second type of failure is at most e^{−Ω(m)}. Considering that the probability that a specific peripheral cluster is adjacent to *V*_{1} in the initial solution is at least 1/2, the probability that less than (m − 2)/4 clusters are connected to cluster *V*_{1} in the initial solution is bounded by e^{−Ω(m)} using a Chernoff bound.

Given that no failure of the first two types occurs, the initial solution does not contain edge *e*, and at least (m − 2)/4 peripheral clusters are connected to *V*_{1}. A solution gets worse if a peripheral cluster connected to *V*_{1} is instead connected to *V*_{2}, since it will create an extra cost of 1/m. The only exception is if a type three failure occurs, that is, the algorithm jumps directly to the optimal solution where all the peripheral clusters are connected to *V*_{2}. For a type three failure to occur, at least (m − 2)/4 clusters have to be modified simultaneously. Therefore, using Lemma 5, the probability of jumping directly to the optimal solution in a single step is bounded from above by 2/(e · ⌈(m − 2)/4⌉!). By a union bound over the first m^{5} iterations, the probability of a type three failure is at most 2m^{5}/(e · ⌈(m − 2)/4⌉!) = m^{−Ω(m)}.

On the other hand, the cost of a solution decreases by 1/m whenever a peripheral cluster connected to *V*_{2} is switched to *V*_{1}. The probability that a single mutation step performs one specific edge exchange of this kind is bounded from below by 1/(e · m^{3}), since with probability 1/e exactly one edge swap is performed, the inserted edge is chosen among fewer than m^{2} candidates, and the removed edge is chosen among at most *m* edges of the created cycle. The expected time until all peripheral clusters are connected to *V*_{1} is therefore at most (m − 2) · e · m^{3} ≤ e · m^{4}. By Markov’s inequality, it holds for any non-negative random variable *X* and any t > 0 that Pr(X ≥ t) ≤ E[X]/t. The probability that it takes longer than m^{5} iterations until all peripheral clusters are connected to *V*_{1} is therefore no more than e · m^{4}/m^{5} = e/m.

This proves our claim about the probability of failure event four.

If no failure occurs, after at most m^{5} iterations the algorithm has reached the local optimum in which all peripheral clusters are connected to *V*_{1}. From this point on, the probability to jump to the optimal solution is by Lemma 5 no more than 2/(e · (m − 2)!) because it is necessary to make at least m − 2 edge exchanges to reach the optimum. The expected time to reach the optimal solution conditional on no failure is therefore more than (e/2) · (m − 2)! = m^{Ω(m)}.

The previous theorem shows that there are instances for the tree-based (1+1) EA where the optimisation time grows exponentially with the number of clusters. In the next section, we compare the two different representations for the GMSTP and show that they have complementary capabilities.

### 2.4 Complementary Capabilities

The two representations examined in the previous sections differ significantly from each other. They both rely on the fact that there is a deterministic algorithm that solves the lower-level problem in polynomial time. In this section, we want to examine the differences between the two approaches. We show that both representations have complementary abilities and do this by examining the algorithms on each other’s hard instance. Surprisingly, we find that the hard instance for one algorithm becomes easy to solve when giving it as an input to the other algorithm.

In Section 2.2 we showed a lower bound of Ω(n^{m}) for the cluster-based (1+1) EA using the spanned nodes representation. The hard instance *G _{S}* for the cluster-based (1+1) EA given in Figure 1 consists of a central cluster to which all the other clusters are connected. There are no other connections between the clusters. Hence, there is only one spanning tree when working with the global structure representation. The dynamic programming algorithm that runs on the lower layer of the tree-based (1+1) EA therefore solves the problem in its first iteration.

The following theorem shows that these instances are easy to be optimised by the tree-based (1+1) EA.

The tree-based (1+1) EA solves the instance *G _{S}* in expected constant time.

There is only a single tree over the cluster graph. Hence, the algorithm selects the optimal tree in the initial iteration.

For the tree-based (1+1) EA, working with the global structure representation, we showed that it finds the instance *G _{G}* given in Figure 2 hard to solve. Working with the spanned nodes representation, there is only one cluster that consists of two nodes, while all the other clusters contain exactly one node. Hence, an optimal solution is obtained by computing a minimum spanning tree on the lower level if the right node in the cluster of two nodes is chosen. The following theorem summarises this and shows that this instance becomes easy when working with the cluster-based (1+1) EA.

The cluster-based (1+1) EA solves the instance *G _{G}* in expected time O(m).

Cluster *V*_{1} contains two nodes, and all other clusters contain a single node, so mutations of the remaining clusters cannot change their spanned node. If the initial solution is not already the optimal solution, the correct node of *V*_{1} has to be selected using mutation. The node for the cluster *V*_{1} is changed with probability 1/m, and in such a step the correct node is selected with probability 1/2. Hence, the probability of a mutation leading to an optimal solution is at least 1/(2m), and the expected waiting time for this event is at most 2m = O(m).

The investigations show that the two examined representations have complementary abilities. Switching from one representation to the other one can significantly reduce the runtime.

## 3 Generalised Travelling Salesperson Problem

We now turn our attention to the NP-hard generalised travelling salesperson problem. Given a complete graph G = (V, E) with a cost function c: E → ℝ^{+} and a partitioning of the node set *V* into *m* clusters *V _{i}*, 1 ≤ i ≤ m, the goal is to find a cycle of minimal cost that contains exactly one node from each cluster.

The bi-level approach that we are studying is similar to the one discussed in the previous section. We investigate the global structure representation, which works on the complete graph H = (V′, E′) obtained from the input graph G = (V, E). The node v′_{i} ∈ V′, 1 ≤ i ≤ m, represents the cluster *V _{i}* of *G*.

The upper-level solution is a Hamiltonian tour of *H*, and the lower-level solution is a set of nodes P = {p_{1}, …, p_{m}} with p_{i} ∈ V_{i} that minimises the cost of a Hamiltonian tour which connects the clusters in the same way as the upper-level tour. Given the restriction imposed by the Hamiltonian tour of *H*, finding the optimal set of nodes *P* can be done in time O(n^{3}) by using any shortest path algorithm. One such algorithm is *cluster optimisation*, proposed initially by Fischetti et al. (1997) and widely used in the literature. Let π be a permutation of the *m* clusters and *p _{i}* be the chosen node for cluster V_{π(i)}, 1 ≤ i ≤ m. Then the cost of the tour is

c(p_{1}, …, p_{m}) = Σ_{i=1}^{m−1} c({p_{i}, p_{i+1}}) + c({p_{m}, p_{1}}).

_{i}Our proposed algorithm (Algorithm 3) starts with a random permutation of clusters, which is always a Hamiltonian tour , in a complete graph *H*. In each iteration, a new solution of the upper layer is obtained by the commonly used *jump* operator, which picks a node and moves it to a random position in the permutation. The number of jump operations carried out in a mutation step is chosen according to , where denotes the Poisson distribution with expectation 1. Although we are using the jump operator in these investigations, similar results can be obtained for other popular mutation operators such as *exchange* and *inversion*.

The expected optimisation time of the tour-based (1+1) EA is O((m − 1)! · m^{2m}) = m^{O(m)}.

We show how the mutation operator can obtain an optimal Hamiltonian tour of *H* from an arbitrary tour σ. The number of jump operations required is at most *m* (the number of clusters). The probability of picking the right node and moving it to the right position in each of those *m* operations is at least 1/m^{2}. We can obtain an optimal solution by carrying out a sequence of *m* jump operations where the *i*th operation jumps the element that occupies position *i* in the optimal permutation to position *i* in σ. Since the probability of K = m is 1/(e(m − 1)!), the probability of a specific sequence of *m* jump operations to occur is bounded below by

(1/(e(m − 1)!)) · (1/m^{2})^{m} = 1/(e(m − 1)! · m^{2m}).

Hence, the expected optimisation time is at most e(m − 1)! · m^{2m}.

Note that this upper bound depends only on the number of clusters. Since the computational effort required to solve the lower-level problem is polynomial in the input size, the expected number of elementary operations is bounded by m^{O(m)} · poly(n). This implies that the proposed algorithm is a fixed-parameter evolutionary algorithm for the GTSP with respect to the parameter *m*, the number of clusters.

So far we have found an upper bound for the expected time of finding an optimal solution using the presented algorithm. Next, we find a lower bound for the optimisation time. Figure 3 illustrates an instance *G _{G}* of the GTSP for which finding the optimal solution is difficult by means of the presented bi-level evolutionary algorithm with the global structure representation. In this graph, each cluster has two nodes. On the upper layer a tour of the clusters is found by the EA, and on the lower layer the best node for that tour is found within each cluster. All white nodes (which represent suboptimal nodes) are connected to each other, making any permutation of clusters a Hamiltonian tour even if the black nodes are not used. All such connections have a weight of 1, except for those that are shown in the figure and have a weight of 2. All edges between a black node and a white node and also all edges between black nodes have weight *m*^{2}, except the ones presented in the figure that have weight 1 − 1/m. An optimal solution of cost m − 1 uses only edges of cost 1 − 1/m, whereas locally optimal solutions use only edges of cost 1. The tour comprising all black nodes in the same order as illustrated in Figure 3 is the optimal solution. Note that there are many locally optimal solutions of cost *m*. For our analysis it is just important that they do not share any edge with the global optimal solution.

The clusters are numbered in the figure, and a measure *S* for evaluating cluster orders is based on this numbering: Let π represent the permutation of clusters in the upper layer; then S(π), defined as the number of clusters that are followed in π by their successor in the optimal cluster order, indicates the similarity of the permutation with the optimal permutation. A large value of S(π) means that many clusters in π are in the same order as in the optimal solution. Note that for an optimal solution S(π) = m. A solution with S(π) = 0 is locally optimal in the sense that there is no strictly better solution in the neighbourhood induced by the jump operator. The solutions with S(π) = 0 form a plateau where all solutions differ from the optimal solution by *m* edges.
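Assuming the clusters are numbered so that the optimal tour visits them in the order 0, 1, …, m − 1 (as in Figure 3), the measure *S* can be sketched as the number of positions whose cyclic successor is the correct next cluster:

```python
def similarity(perm):
    """S(perm): clusters followed (cyclically) by their optimal successor."""
    m = len(perm)
    return sum((perm[(i + 1) % m] - perm[i]) % m == 1 for i in range(m))
```

The optimal order is the only permutation with S = m, while permutations with S = 0 form the plateau of locally optimal solutions.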

We first introduce a lemma that will later help us with the proof of the lower bound on the optimisation time.

Let π_{1} and π_{2} be two nonoptimal cluster permutations for the instance *G _{G}*. If S(π_{1}) > S(π_{2}), then c(π_{1}) > c(π_{2}).

In the given instance, all white nodes are connected to each other with a maximum weight of 2. These connections ensure that any permutation of the clusters can result in a Hamiltonian tour with a cost of at most 2m. Moreover, all connections between white nodes and black nodes have a weight of *m*^{2}. So the lower level will never choose a combination of white and black nodes, because the cost would be more than *m*^{2}, while there is the option of selecting all white nodes with a cost of at most 2m. On the other hand, for any permutation of clusters other than the global optimum, the lower level will not choose all black nodes either, because it will not be possible to use only the edges of weight 1 − 1/m, and some *m*^{2}-weighted edges would have to be used. Let *a* be the number of clusters adjacent to each other correctly from the right side (having the same right-side neighbour as in the global optimum) in a solution π; note that a = S(π). Then b = m − a is the number of clusters that have a different neighbour on their right. If π is not the optimal solution, then the lower level will choose all white nodes. As a result, *a* edges with weight 2 and *b* edges with weight 1 will be used in that solution; therefore, the total cost of the solution will be 2a + b = m + a. Consider two nonoptimal solutions π_{1} and π_{2} with S(π_{1}) > S(π_{2}). We have c(π_{1}) − c(π_{2}) = S(π_{1}) − S(π_{2}) > 0, which completes the proof.

Lemma 10 shows that a nonoptimal offspring is not accepted if it is closer to the optimal solution than its parent. This means that the algorithm finds it hard to obtain an optimal solution for *G _{G}*, which leads to an exponential lower bound on the optimisation time, as shown in the following theorem.

Starting with a permutation of clusters chosen uniformly at random, the optimisation time of the tour-based (1+1) EA on *G _{G}* is 2^{Ω(*m*)} with probability 1 − *e*^{−Ω(*m*)}.

Considering *G _{G}* illustrated in Figure 3, the optimal solution is the tour on the black nodes comprising all the cheap edges between consecutive clusters of the optimal order. We consider a typical run of the algorithm consisting of a phase of 2^{*Cm*} steps, where *C* is an appropriate constant. For the typical run we show the following:

- A local optimum with *S*(π) = 0 is reached with probability 1 − *e*^{−Ω(*m*)}.

- The global optimal solution is not obtained with probability 1 − *e*^{−Ω(*m*)}.

Then we argue that only a direct jump from the local optimum to the global optimum remains possible, and the probability of this event during the phase is *e*^{−Ω(*m*)}.

First we show that with high probability *S*(π_{0}) ≤ ε*m* holds for the initial solution π_{0}, where ε is a small positive constant.

For a permutation chosen uniformly at random, each cluster is followed by its successor in the optimal order with probability 1/(*m* − 1), so the expected number of correctly ordered clusters is *m*/(*m* − 1) < 2, and standard concentration arguments show that *S*(π_{0}) exceeds ε*m* only with probability *e*^{−Ω(*m*)}. Hence, with probability 1 − *e*^{−Ω(*m*)}, *S*(π_{0}) ≤ ε*m* holds, and the initial solution has at most ε*m* correctly ordered clusters.
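The claim that a random initial permutation has few correctly ordered clusters can be checked empirically. This Monte Carlo sketch (hypothetical code, with a cyclic successor convention assumed) estimates the distribution of *S* for uniform random permutations:

```python
import random

def similarity(perm):
    """Number of positions followed (cyclically) by their successor."""
    m = len(perm)
    return sum(1 for i in range(m)
               if perm[(i + 1) % m] == (perm[i] + 1) % m)

random.seed(0)
m, trials = 100, 2000
samples = [similarity(random.sample(range(m), m)) for _ in range(trials)]
print(sum(samples) / trials)  # close to E[S] = m / (m - 1), i.e. about 1
print(max(samples))           # far below m: large S is extremely unlikely
```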

Now we analyse the expected time to reach a solution with *S*(π) = 0. For this purpose, we first consider the exchange operation and count the number of different exchanges at each step that reduce the number of good orderings in a solution.

If we denote the permutation of clusters for the current solution by π, then there are *l* = *S*(π) clusters in this permutation that are followed by their consecutive cluster. Note that for any solution other than the local optimum, *l* ≥ 1 holds. Let *j* be one of these clusters, which is followed by cluster *j* + 1. In order to destroy this good ordering, cluster *j* should be exchanged with a cluster *r* that fulfils the following requirements:

- Cluster *r* cannot be a consecutive cluster of *j*'s current neighbours; that is, if we name *j*'s neighbours *i* and *k*, then *r* cannot be *i* − 1, *i* + 1, *k* − 1, or *k* + 1, because these clusters would introduce a new good ordering to the solution if one of them replaced *j*.

- The current position of cluster *r* in the permutation should not be before or after cluster *j* − 1 or *j* + 1, because replacing *r* with *j* would introduce new good orderings in that case.

Hence, the number of excluded choices for *r* is at most 8, meaning that there are at least *m* − 9 choices for *r* (also excluding *j* itself) that result in reducing the number of good orderings. Since there are *l* choices for *j* and at least *m* − 9 choices for *r*, the total number of possible exchange operations that lead to a permutation with *l* − 1 good orderings is at least *l*(*m* − 9).

The tour-based (1+1) EA can simulate an *exchange* operation with two jumps. For any *j* and *r*, *exchange*(*j*, *r*) can be implemented by performing *jump*(*j*, *r*) and *jump*(*r*, *j*): the first jump will place *j* before *r*, and the second one will place *r* before *k*. Now we find the probability that two jumps happen in one step and simulate one of the possible exchange operations described above, so that *l* decreases by one. The mutation operator performs exactly two jump operations with constant probability, and each specific pair of jumps is chosen with probability Ω(1/*m*^{4}); since at least *l*(*m* − 9) exchanges reduce the number of good orderings, the probability that *l* decreases by one in a single step is at least *cl*/*m*^{3}, where *c* is an appropriate constant. The maximum value of *l* is *S*(π_{0}), which we have already proved is at most ε*m*. Therefore, summing up the expected times of reducing *l* gradually from its maximum value to 1, we find that the expected time to reach the local optimum with *S*(π) = 0 is at most Σ_{*l*=1}^{ε*m*} *m*^{3}/(*cl*) = O(*m*^{3} log *m*). If *C*′ denotes an appropriate constant such that the expected time is at most *C*′*m*^{3} log *m*, then by Markov's inequality the probability that the local optimum is not reached within a phase of 2*C*′*m*^{3} log *m* iterations is at most 1/2. If we repeat such phases *m* times, the probability that the local optimum is not reached in any of them is at most 2^{−*m*}.

As a result, with probability 1 − 2^{−*m*} the algorithm reaches a local optimum within 2*C*′*m*^{4} log *m* steps, which, since 2^{*Cm*} ≥ 2*C*′*m*^{4} log *m* for sufficiently large *m*, is covered by the phase of 2^{*Cm*} iterations that we mentioned previously.
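The simulation of an exchange by two jump operations described above can be illustrated with a minimal sketch; here *jump*(*i*, *j*) is interpreted as moving the element at position *i* to position *j*, and the exact index conventions are our assumption:

```python
def jump(perm, i, j):
    """The jump mutation operator: move the element at position i to
    position j, shifting the elements in between."""
    p = list(perm)
    p.insert(j, p.pop(i))
    return p

def exchange(perm, i, j):
    """Swap the elements at positions i and j using two jumps."""
    p = jump(perm, i, j)
    # After the first jump, the element that was at position j has
    # shifted by one place; a second jump moves it to position i.
    return jump(p, j - 1 if i < j else j + 1, i)

print(jump([0, 1, 2, 3], 0, 2))      # [1, 2, 0, 3]
print(exchange([0, 1, 2, 3], 1, 3))  # [0, 3, 2, 1]
```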

By Lemma 10, any jump to a solution closer to the optimum, other than a jump directly to the global optimum, will be rejected. Furthermore, *S*(π_{0}) ≤ ε*m* holds for the initial solution. Therefore, only nonoptimal solutions with *S*(π) ≤ ε*m* are accepted by the algorithm. In order to obtain an optimal solution, the algorithm has to produce the optimal solution from a solution with *S*(π) ≤ ε*m* in a single mutation step. We now upper-bound the probability of such a direct jump, which has to change at least (1 − ε)*m* clusters to their correct order. Such a move needs at least (1 − ε)*m*/3 jump operations in the same iteration, because each jump can change at most three edges. Taking into account that these jump operations may be performed in any order, the probability of a direct jump is at most *e*^{−Ω(*m* log *m*)}.

So in a phase of 2^{*Cm*} iterations, the probability of having such a direct jump is, by the union bound, at most 2^{*Cm*} · *e*^{−Ω(*m* log *m*)} = *e*^{−Ω(*m*)}.

As a result, the probability of not reaching the optimal solution in these iterations is 1 − *e*^{−Ω(*m*)}. Altogether, the optimisation time is at least 2^{*Cm*} = 2^{Ω(*m*)} with probability 1 − *e*^{−Ω(*m*)}.

## 4 Discussion of Generalisations

The problems we have examined in this work are bi-level optimisation problems where the upper-level problem, namely the *leader*, and the lower-level problem, the *follower*, share an objective function. The general bi-level optimisation problem also includes the setting where the leader and the follower have different objectives. Given the decision of the leader, the follower makes a decision according to its objective function, which might conflict with the objective function of the leader. An example of such a problem is one where the leader places toll booths across a road network and the followers try to find the cheapest way from a point *A* to a point *B* by finding a path that avoids as many toll booths as possible. Here, the leader can only learn the objective function value of its decision after the follower picks the optimum path. Unlike the GMSTP and GTSP, the objective functions of the upper- and lower-level problems are distinct and conflicting in this toll booth problem.

For a given solution visited in the upper-level problem, the evaluation cost is, in the worst case, the computational complexity of the lower-level problem. If the lower-level problem can be solved in polynomial time, then a fixed-parameter bound on the size of the upper-level solutions is sufficient for the overall problem to be fixed-parameter tractable: when the size of the upper-level solutions is bounded by a function *g*(*k*) that depends only on a parameter *k* of the original problem, the search process in the upper level takes place in a space of size at most 2^{*g*(*k*)}, and a simple heuristic like uniform random search will find the optimal upper-level solution in O(2^{*g*(*k*)}) iterations, and hence O(2^{*g*(*k*)} · poly(*n*)) basic operations, in expectation.
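The argument above can be sketched with a toy uniform random search. The function names, the bit-string encoding, and the stand-in lower-level evaluation are all hypothetical:

```python
import random

def uniform_random_search(evaluate, num_bits, budget):
    """Uniform random search over bit strings of length num_bits.
    With a search space of size 2**num_bits, the optimum is found
    after O(2**num_bits) samples in expectation."""
    best, best_cost = None, float('inf')
    for _ in range(budget):
        x = tuple(random.getrandbits(1) for _ in range(num_bits))
        cost = evaluate(x)  # stand-in for solving the lower-level problem
        if cost < best_cost:
            best, best_cost = x, cost
    return best, best_cost

# Toy lower level: the cost is simply the number of ones in the bit string.
random.seed(1)
best, cost = uniform_random_search(lambda x: sum(x), num_bits=8, budget=2000)
print(cost)
```

If `num_bits` is bounded by a function of the parameter *k* alone, the budget needed in expectation depends only on *k*, which is the essence of the FPT argument.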

In our case, with the global structure representation of the GMSTP and the GTSP, the size of an upper-level solution is bounded above by *m*^{2}, since it is enough to indicate with a single bit whether any two clusters are connected to precisely define a solution. On the other hand, the spanned nodes representation of the GMSTP needs an upper-level solution to indicate *which node* is selected in *each cluster*. The required number of bits in the upper-level solution is maximised when there are *n*/*m* nodes in each cluster, because of the inequality of arithmetic and geometric means, and the maximum number of bits required is *m* log(*n*/*m*). With the global structure representation, if we pick our solutions uniformly at random, the probability of picking a unique optimal solution is at least 2^{−*m*^{2}}, so the optimum is found within 2^{*m*^{2}} trials in expectation. Therefore uniform random search is fixed-parameter tractable for the problem. With the spanned nodes representation, however, the corresponding upper bound is (*n*/*m*)^{*m*}, which still depends on the input size *n* and does not provide information about the fixed-parameter tractability of the problem. It is nevertheless noteworthy that this upper bound for uniform random search is asymptotically better than the lower bound proved for the hard instance presented for the proposed algorithms. The performance of uniform random search, O((*n*/*m*)^{*m*}), would still fall into the category of XP algorithms, in accordance with the FPT-XP distinction between the representations.
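As a rough illustration of the two bit-count bounds (a hypothetical calculation; equal cluster sizes are assumed for the maximising case):

```python
import math

def bits_global_structure(m):
    """One bit per cluster pair suffices: at most m * m bits,
    depending only on the parameter m."""
    return m * m

def bits_spanned_nodes(cluster_sizes):
    """Choosing one node per cluster needs sum(log2(|V_i|)) bits."""
    return sum(math.log2(s) for s in cluster_sizes)

m, n = 10, 1000
print(bits_global_structure(m))          # 100, independent of n
print(bits_spanned_nodes([n // m] * m))  # about m * log2(n / m), grows with n
# By the AM-GM inequality, equal cluster sizes maximise the bit count:
print(bits_spanned_nodes([50, 150] + [100] * 8)
      <= bits_spanned_nodes([100] * 10))  # True
```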

## 5 Conclusions

Evolutionary bi-level optimisation has attracted increasing interest in recent years. With this article we have contributed to the theoretical understanding by considering two classical NP-hard combinatorial optimisation problems, namely, the generalised minimum spanning tree and the generalised travelling salesperson. We studied evolutionary algorithms for these problems in the parameterised setting. Using parameterised computational complexity analysis of evolutionary algorithms for the GMSTP, we examined two representations for the upper-layer solutions and their corresponding deterministic algorithms for the lower layer. Our results show that the global structure representation leads to fixed-parameter evolutionary algorithms. By presenting hard instances for each of the two approaches, we pointed out where they run into difficulties. Furthermore, we showed that the two representations for the GMSTP are highly complementary by proving that each is highly efficient on the hard instance of the other. After having achieved these results for the GMSTP, we turned our attention to the GTSP. We showed that using the global structure representation leads to fixed-parameter evolutionary algorithms with respect to the number of clusters. Furthermore, we pointed out a worst-case instance on which the optimisation time grows exponentially with respect to the number of clusters, and we discussed generalisations of the results.

## Acknowledgments

The research leading to these results received funding from the Australian Research Council (ARC) under grant agreements DP130104395 and DP140103400, and from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 618091 (SAGE).