## Abstract

Real-world optimization problems have been studied in the past, but the work resulted in approaches tailored to individual problems that could not be easily generalized. The reason for this limitation was the lack of appropriate models for the systematic study of salient aspects of real-world problems. The aim of this article is to study one such aspect: multi-hardness. We propose a variety of decomposition-based algorithms for an abstract multi-hard problem and compare them against the most promising heuristics.

## 1 Introduction

Mathematical modeling has been the basis of many natural sciences, as well as operations research, for decades. Yet, even as many advances have been made, over the years the phenomenon of “unreasonable ineffectiveness of mathematics” in computer engineering (Gunawardena, 1998), cognitive science (Poli, 1999), economics (Velupillai, 2005), and biology (Borovik, 2009) has been noticed. In Michalewicz (2012) and Michalewicz and Fogel (2000) the authors argue that the same phenomenon occurs in real-world optimization. They divided hard optimization problems into two categories: designed and real-world problems. “Designed problems” are “mathematical”—they have a simple logical formulation, are surgically precise, and the objective function clearly indicates which of two potential solutions is better. This category includes the Traveling Salesman Problem (TSP), Graph Colouring Problem (GCP), Job Shop Scheduling Problem (JSSP), and Knapsack Problem (KP), to name a few. Real-world optimization problems have not been designed by anyone, but occur in real business processes. They usually have very complex formulations. To solve such problems, first we have to build their models, and the quality of the obtained solution will depend on the quality of the model.

The level of difficulty of designed and real-world optimization problems differs in practice, even if they may be equivalent from the point of view of complexity theory. In this article, the distinction between these levels of difficulty is made by referring to “single-hard,” “double-hard,” and more generally “multi-hard” problems. A single-hard problem means a designed problem of high computational complexity. A “multi-hard” problem can be described as a nontrivial combination of “single-hard” problems: solving subproblems of a multi-hard problem in isolation does not lead to a good solution of the multi-hard problem.

Through a better understanding of multi-hardness, the “ineffectiveness of mathematics” for solving real-world optimization problems may be reduced. Often, in our attempts to reduce the complexity of multi-hard problems, we create models that use known, single-hard problems that are combined by additional interdependencies like joint criteria or joint constraints.

The aim of this article is to develop foundations for solving multi-hard problems. The starting point is a formulation of an abstract double-hard problem, called the Traveling Thief Problem (TTP), introduced in Bonyadi et al. (2013), which is a nontrivial composition of two well-studied classical problems: the Traveling Salesman Problem and the Knapsack Problem.

In this work, TTP is studied further, with the goal of obtaining insights into the difficulty of multi-hard problems in general through an evaluation of algorithms for solving TTP. The goal is to compare known heuristics against algorithms that aim to exploit the structure of a multi-hard problem.

Specialized algorithms have been developed for many (perhaps most) well-known single-hard problems. Unfortunately, such algorithms are often very sensitive to problem modifications, such as new constraints. Moreover, such algorithms do not exist for multi-hard problems, and it would be hard to develop them as multi-hard problems are defined by special combinations of single-hard problems of various types. The way these single-hard problems are combined differs from one multi-hard problem to another, as well.

However, instead of throwing out our knowledge, and instead of building new algorithms from scratch, it may be possible to use existing algorithms as building blocks for solving multi-hard problems. The first candidates would be known metaheuristics. After all, multi-hard problems are in the same computational complexity class as single-hard problems. However, this approach does not take into account the structure and type of combination of a multi-hard problem.

In our previous work in Bonyadi et al. (2014) we have developed the idea of CoSolver and applied it to the Traveling Thief Problem obtaining some promising results. The main idea behind CoSolver is to decompose a multi-hard problem into subproblems, solve the subproblems separately with some communication between the algorithms solving subproblems, and then combine the solutions back to obtain a solution to the initial problem.

This article makes the following contributions. CoSolver is compared against the metaheuristics that we consider most promising for multi-hard problems: a Monte-Carlo Tree Search algorithm and Ant Colony Optimization. The algorithms are also compared to exact solutions for a variety of instances of TTP, differing in difficulty and structure. Further, CoSolver is extended by incorporating heuristics instead of exact solvers for the TSP and KP components of TTP. This extension greatly improves the scalability of CoSolver without compromising quality.

The structure of the article is as follows. In the next section, we discuss related work. Section 3 describes the real-world inspiration for multi-hard problems. Section 4 formally defines the Traveling Thief Problem. Section 5 introduces the concept of decomposition algorithms for multi-hard problems, the CoSolver algorithm, and the Monte-Carlo Tree Search and Ant Colony Optimization algorithms for TTP. Section 6 describes the benchmark instances of TTP and presents the results of experiments with solving these benchmarks using the proposed algorithms. The final section concludes the article.

## 2 Related Work

In 2013, the Traveling Thief Problem (TTP) was introduced (Bonyadi et al., 2013) as an example of a multilevel optimization problem. The problem was presented as a combination of two well-known subproblems: the Traveling Salesman Problem (TSP) and the Knapsack Problem (KP). The authors showed that optimal solutions for each subproblem do not guarantee an optimal global solution, because the interdependency between the two problems affects the optimal solution for the whole problem. Although some extensions to the Traveling Salesman Problem had been studied before, they consisted of one hard problem (i.e., the core problem, which was usually the Vehicle Routing Problem; see Braekers et al., 2015 and Toth and Vigo, 2001) equipped with additional constraints, and were solved as single monolithic problems. The variants of the Vehicle Routing Problem that are closest to the Traveling Thief Problem are the Time Dependent Vehicle Routing Problem (Malandraki and Daskin, 1992), where the cost of traveling between the cities varies over time, and the Capacitated Vehicle Routing Problem (Ralphs et al., 2003), where the travelers (vehicles) have additional constraints on the total weight of items that can be carried.

### 2.1 State of the Art

As noted in Bonyadi et al. (2013), most of the current research in algorithm design focuses on problems with a single hard component, such as the Traveling Salesman Problem, the Knapsack Problem, the Job Shop Scheduling Problem (Cheng et al., 1996; Davis, 1985; van Laarhoven et al., 1992), and the Vehicle Routing Problem (Toth and Vigo, 2001), whilst most real-world problems are multi-hard. It has also been shown in that paper that interdependencies between components of multi-hard problems play a central role in the complexity of these problems. Such interdependencies do not occur in designed, single-hard problems (see, for example, Bonyadi et al., 2014). In order to present the complexity that results from interdependencies in multi-hard problems, the Traveling Thief Problem was introduced.

Bonyadi et al. (2014) introduced a new algorithm (called CoSolver) for solving multi-hard problems, which focuses on the aspect of communication and negotiation between partial subproblems. The main idea behind CoSolver is to decompose TTP into subproblems, solve the subproblems separately, with some communication between the algorithms solving them, and then combine such partial solutions back into an overall solution for TTP. The article also proposed a simple heuristic (called the Density-Based Heuristic) as a second approach and compared it to CoSolver. This heuristic first generates a solution for the TSP component of a given TTP instance and then solves the generalized KP problem on the resulting route so that the objective value is maximized. It is worth noting that the Density-Based Heuristic ignores all of the interdependencies between the subproblems. These two algorithms were compared on a series of benchmark instances. The results showed that CoSolver outperformed the Density-Based Heuristic, suggesting that taking the interdependencies between the subproblems into consideration is beneficial.

It was argued in Przybylek et al. (2016) that multi-hardness is not the only crucial aspect of real-world optimization problems. Another important characteristic is that real-world problems usually have to operate in an uncertain and dynamically changing environment. This observation resulted in a formulation of a probabilistic variant of TTP. The authors also showed how the decomposition-based approach (namely, CoSolver) can be incorporated in this new setting.

The first attempts to solve multi-hard problems were based on methods for large-scale optimization. Typical methods used in such approaches are Newton's method and the conjugate gradient method (Faires and Burden, 2003), the partitioned quasi-Newton method (Griewank and Toint, 1982), and linear programming (Bertsimas and Tsitsiklis, 1997). The main disadvantage of these methods is, however, their dependency on an algebraic formalization of the problem and on the availability of gradient information. For many real-world problems, an algebraic formalization is simply impossible. Therefore, simulation has to be used to evaluate potential solutions, by providing an output value for a given set of input decision values. This kind of optimization (i.e., black-box optimization) is widely used in mechanical engineering and many other disciplines. In black-box optimization, metaheuristics such as evolutionary algorithms (EAs) have considerable advantages over conventional single-point derivative-based methods of optimization. Metaheuristics do not rely on gradient information and are less likely to get stuck in local optima, due to the use of a population of possible solutions. In addition, recent advances in metaheuristics show that cooperative coevolutionary algorithms hold great promise for such problems, as shown in Yang et al. (2008) and Li and Yao (2009, 2012). Nonetheless, major challenges remain. Finally, there is also modern research on multilevel optimization, where the optimization problems consist of multiple subcomponents that are subject to certain hierarchical relationships (Colson et al., 2007 and Talbi, 2013). In such a setting, components lower in the hierarchy do not include the solutions of higher-level components in their optimization process.

Human computational potential can also be used to solve complex problems (Kearns, 2012). While this is a completely new approach in the context of multi-hard problems, it is a promising and interesting direction of research. Teams of human decision makers and new heuristic algorithms could together improve solutions to these problems.

## 3 Real-World Inspiration

Real, multi-hard optimization problems are solved in practice every day by human decision makers. A good example of such a multi-hard problem is the optimization of supply chain operations from the mines to the ports (Bonyadi and Michalewicz, 2016 and Bonyadi et al., 2016). These operations include mine planning and scheduling, stockpile management and blending, train scheduling, and port operations, with an overall objective to satisfy customer orders by providing a predefined quantity of products by a specified date.

Let's look at some of these operations in more detail:

- *Train scheduling*. To operate the trains, there is a railway network, usually hired by the mining company, so that trains can travel between the mines and the ports. The owner of the railway network sets restrictions on the operation of trains for each mining company; for example, the number of trains per day through each junction in the network is fixed (set by the owner of the railway network) for each mining company. There are a number of self-unloading stations, which are scheduled to unload products from the trains arriving at the port. The mining company schedules trains, loads them at the mines with the requested material, and then sends them to the port, while respecting all constraints (that is, the train scheduling procedure).
- *Train unloading*. The mining company also has to plan train dumpers to unload the trains and place the unloaded products at a port. A port encloses a huge area called the stockyard, several places to berth ships (called berths), and a waiting area for ships. The stockyard contains stockpiles, which store individual products (mixing products in a stockpile is not allowed) and have limited capacities.
- *Ship scheduling*. Ships arriving at ports (the time of arrival is often approximate, due to weather conditions) have to wait in the waiting area until the port manager assigns them to a particular berth, where they take on specific products to be delivered to the customers. Ships are subject to penalty costs, called demurrage: the penalty is applied for each unit of time the ship waits in the port after its arrival. The mining company's goal is to plan the ships and fill them with the requested products so that the total demurrage over all ships is kept to a minimum.
- *Loading ships*. There are several ship loaders, which are assigned to each berthed ship to load it with the requested products. The ship loaders take products from appropriate stockpiles and load them into ships.
It should be noted that different ships may have different requirements for the products, and each product can be taken from a different stockpile, so that scheduling the various ship loaders and choosing among stockpiles to meet the ships' demands may result in different amounts of time needed to finish the loading. It is the task of the owner of the mine to ensure sufficient quantities of each type of product in the stockyard before the ships arrive.

Each of the above-mentioned procedures (train scheduling, train unloading, ship scheduling, and ship loading) is a component of the optimization problem. Of course, each of these components is a problem on its own, which is difficult to solve. In addition to the complexity of the components, solving the components in isolation does not lead to an overall solution to the whole problem. As an example, the optimal solution to the problem of train scheduling (carrying as much material as possible from mines to ports) may result in an insufficient amount of available stockyard capacity or even the lack of suitable products for the ships that arrive on schedule. Also, the best plan for dumping products from the trains and keeping them in the stockyard can lead to a poor plan for the ship loaders, which would have to move too many times to load a ship.

While TTP is an abstract model of a multi-hard problem, it is also inspired by a real multi-hard problem of optimizing supply-chain operations of a mining company. The KP component of TTP models train loading, while the TSP component models scheduling a train that has to visit several mines. It is clear from this analogy how TTP could be extended to create new multi-hard problems (possibly with more than two components).

However, this complex, real multi-hard problem is solved in practice by mining corporations using two basic approaches: specialization, and collaboration or negotiation (Derrouiche et al., 2008). Teams of decision makers work on each component of the problem, such as scheduling trains and ships, separately. These teams are specialized and experienced in solving their particular problem. Such a unit in a corporation is also referred to as a “silo,” since it is responsible only for a selected part of operations and does not need to deal with other issues. Silos can collaborate and negotiate with each other: a solution proposed by one silo must be reconciled with the solutions of the other silos, since independent solutions frequently interfere with or even disturb one another. This collaboration is usually crucial to the success of the management of the whole supply chain, and hence it is usually the responsibility of higher-level management to carry out or oversee negotiations among silos.

The concept of decomposition algorithms and CoSolver is inspired by this social or managerial solution of real multi-hard problems. Decomposition algorithms use solvers of subproblems of the multi-hard problem instead of silos of human decision makers. Similar decomposition methods are used in online collaborative knowledge communities, such as Wikipedia (Turek et al., 2011 and Wierzbicki et al., 2010). Moreover, a decomposition algorithm needs a method of “negotiation” of the solutions found by its subproblem solvers. This negotiation method can be crucial to the algorithm's success. The general idea is shown in Figure 1. The initial problem is decomposed into two, possibly overlapping, subproblems: $X_A$ and $X_B$. Each of the subproblems $X_A, X_B$ (optionally with some knowledge $Y_B, Y_A$ about the solution to the other subproblem from the previous iteration) is passed to a domain-specific solver, giving a partial solution, the $A$-*solution* (respectively, the $B$-*solution*). Then, the negotiation protocol starts modifying the $A$-*solution* to respect the $B$-*solution* and modifying the $B$-*solution* to respect the $A$-*solution*. Finally, the solutions are composed together to obtain a solution to the initial problem.

## 4 TTP: A Model Multi-Hard Problem

In this section, we provide a formal definition of the Traveling Thief Problem (Bonyadi et al., 2013). Given:

a weighted graph $G=\langle V,E\rangle$, whose nodes $m,n\in V$ are called cities, and whose edges $m\xrightarrow{d(m,n)} n\in E$ from $m$ to $n$ are called distances

an initial city $s\in V$

a list of pairs of natural numbers $\langle w_i,p_i\rangle_{1\le i\le I}$, called the list of items; each item $\langle w_i,p_i\rangle$ has its weight $w_i$ and its profit $p_i$

a relation $a$, called the availability of the items, between the cities $V$ and the set $\{1,2,\ldots,I\}$; the $i$-th item is available in city $n\in V$ iff $a(n,i)$ is satisfied

a natural number $C$, called the capacity of the knapsack

a real number $R\ge 0$, called the rent ratio

two positive real numbers $v_{min}\le v_{max}$, called the minimal and maximal speed

The Traveling Thief Problem (TTP) asks for the most profitable picking plan on the best route that starts and ends in the initial city $s$ and visits every other city exactly once. In more detail, let:

$\pi=\langle\pi_1,\pi_2,\ldots,\pi_{|V|},\pi_{|V|+1}\rangle$ be a Hamiltonian cycle in $G$ such that $\pi_1=\pi_{|V|+1}=s$ is the starting node

$\sigma:\{1,2,\ldots,I\}\to V$ be a *partial* function (“the picking plan”) such that every item $i$ which belongs to the domain of $\sigma$ is available in city $\sigma(i)$ (that is, $i\in\mathrm{dom}(\sigma)\Rightarrow a(\sigma(i),i)$) and the capacity of the knapsack is never exceeded (that is, $\sum_{i\in\mathrm{dom}(\sigma)}w_i\le C$)
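The objective of TTP, as defined in Bonyadi et al. (2013), is the total profit of the picked items minus the rent $R$ times the total travel time, where the speed of the traveler drops linearly from $v_{max}$ (empty knapsack) to $v_{min}$ (full knapsack). A minimal sketch of evaluating a candidate pair $(\pi,\sigma)$ in Python (the dict-based encoding of instances is ours, not part of the formal definition):

```python
def ttp_objective(tour, picking, d, items, C, R, vmin, vmax):
    """Evaluate a TTP solution (profit of picked items minus rent for travel time).

    tour:    list of cities with tour[0] == tour[-1] == s
    picking: dict mapping item index -> city where the item is picked
    d:       dict mapping (m, n) -> distance from m to n
    items:   list of (weight, profit) pairs
    """
    nu = (vmax - vmin) / C                     # speed lost per unit of weight
    weight_at = {}                             # total weight picked per city
    for i, city in picking.items():
        weight_at[city] = weight_at.get(city, 0) + items[i][0]
    profit = sum(items[i][1] for i in picking)
    time, w = 0.0, 0.0
    for m, n in zip(tour, tour[1:]):
        w += weight_at.get(m, 0)               # items are collected on reaching m
        time += d[m, n] / (vmax - nu * w)      # heavier knapsack, slower travel
    return profit - R * time
```

Note how every leg after an item is picked becomes more expensive: this is exactly the interdependency between the KP and TSP components discussed above.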

## 5 Algorithms

In our previous work (Bonyadi et al., 2014) we developed the idea of a decomposition-based algorithm and applied it to TTP, obtaining promising results (see also Section 2). Instances of TTP were decomposed into two components^{1}: TSKP (the Traveling Salesman with Knapsack Problem) and KRP (the Knapsack on the Route Problem). In this section, we develop two additional variants of this algorithm: one using a heuristic approach for the TSKP component and an exact solver for the KRP component, and another using heuristics for both components. We also describe two heuristic solvers: one based on Monte-Carlo Tree Search and another based on Ant Colony Optimization. In Section 6, we compare these algorithms against each other.

### 5.1 Decomposition Algorithms

We have identified the following subproblems of TTP—one corresponding to a generalization of TSP, which we shall call the Traveling Salesman with Knapsack Problem (TSKP), and another corresponding to a generalization of KP, which we shall call the Knapsack on the Route Problem (KRP).

Let us start with the definition of TSKP. Given:

a weighted graph $G=\langle V,E\rangle$, whose nodes $m,n\in V$ are called cities, and whose edges $m\xrightarrow{d(m,n)} n\in E$ from $m$ to $n$ are called distances

an initial city $s\in V$

a function $\omega:V\to\mathbb{R}$ assigning to every node $n\in V$ a non-negative real number $\omega(n)$, which may be thought of as the total weight of the items picked at city $n$

a natural number $C$, called the capacity of the knapsack

a real number $R\ge 0$, called the rent ratio

two positive real numbers $v_{min}\le v_{max}$, called the minimal and maximal speed

The Knapsack on the Route Problem is the counterpart of the above problem. Given:

a set $V=\{1,2,\ldots,N\}$, whose elements are called cities,

a function $\delta:V\to\mathbb{R}$ that assigns to each city $n$ a non-negative real number $\delta(n)$, which may be thought of as the distance from city $n$ to the “next” city on some route

a list of pairs of natural numbers $\langle w_i,p_i\rangle_{1\le i\le I}$, called the list of items; each item $\langle w_i,p_i\rangle$ has its weight $w_i$ and its profit $p_i$

a relation $a$, called the availability of the items, between the cities $V$ and the set $\{1,2,\ldots,I\}$; the $i$-th item is available in city $n\in V$ iff $a(n,i)$ is satisfied

a natural number $C$, called the capacity of the knapsack

a real number $R\ge 0$, called the rent ratio

two positive real numbers $v_{min}\le v_{max}$, called the minimal and maximal speed

Observe that our decomposition preserves the relative difficulties of the original components. First of all, because there is a trivial gap-preserving reduction from TSP to TSKP, we obtain the following theorem.

There is no polynomial constant-factor approximation algorithm for the Traveling Salesman with Knapsack Problem unless $P=NP$.

On the other hand, in this section we construct an algorithm for KRP that is polynomial under unary encoding of profits of items (Algorithm 2), which may be turned into a fully polynomial approximation scheme for KRP in the usual way.

There is a fully polynomial approximation scheme for the Knapsack on the Route Problem.

Therefore TSP is computationally equivalent to TSKP and KP is computationally equivalent to KRP.

At this point, one may wonder why we have identified the TSKP and KRP subcomponents of TTP, instead of the obvious TSP and KP. As mentioned earlier, the two key factors of the decomposition-based approach are:

identification of subcomponents of the problem

development of a communication protocol for the subcomponents

These factors are, of course, not completely independent of each other, and there are many important aspects that we have to take into consideration when making such choices:

there should be efficient approximation algorithms for subcomponents

the algorithms for subcomponents should be “stable,” by which we mean that, whenever possible, similar instances of the problem should lead to similar solutions

subcomponents have to be chosen in such a way that makes it possible to develop an effective and efficient negotiation protocol

a good solution to the problem has to be found in as small a number of executions of the approximation algorithms for the subcomponents as possible

the computational overhead of the communication protocol should be reasonably small

Having the above in mind, we can now better understand our choice of subcomponents of TTP. One could naively think that since TTP has been designed as a generalisation of both the TSP and KP problems, the natural choices for subcomponents are exactly TSP and KP. Nonetheless, the highly nonlinear interdependencies between the TSP and KP parts of TTP make it difficult to develop an efficient and effective negotiation protocol for them.

The negotiation protocol between TSKP and KRP components is presented as Algorithm 1.

Given an instance of TTP, CoSolver starts by creating an instance of KRP that consists of all items of TTP and distances $\delta(k)$ equal to zero. After finding a solution $\sigma$ for this instance, it creates an instance of TSKP by assigning to each city a weight equal to the total weight of the items picked at that city according to $\sigma$. A solution for TTP at the initial step consists of a pair $(\sigma,\pi)$, where $\pi$ is the route found as a solution to the instance of TSKP. Then the profit $P$ of the solution is calculated. If profit $P$ is better than the best profit $P^*$ found so far, the process repeats with the distances between nodes adjusted along tour $\pi$.
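The loop described above can be sketched as follows; this is a schematic paraphrase of Algorithm 1, with `solve_krp` and `solve_tskp` standing for any of the component solvers discussed in the remainder of this section (the helper names and the dict-based instance encoding are ours):

```python
def cosolver(instance, solve_krp, solve_tskp, evaluate):
    """Skeleton of the CoSolver negotiation protocol.

    solve_krp(instance, delta)  -> picking plan sigma (dict item -> city)
    solve_tskp(instance, omega) -> route pi (list of cities, closed tour)
    evaluate(instance, sigma, pi) -> profit of the combined solution
    """
    n = instance["n_cities"]
    delta = {k: 0.0 for k in range(n)}        # initial KRP distances: all zero
    best, best_profit = None, float("-inf")
    while True:
        sigma = solve_krp(instance, delta)    # pick items given current distances
        omega = weights_per_city(instance, sigma)
        pi = solve_tskp(instance, omega)      # route given per-city weights
        profit = evaluate(instance, sigma, pi)
        if profit <= best_profit:             # no improvement: stop negotiating
            return best
        best, best_profit = (sigma, pi), profit
        delta = distances_along(instance, pi) # re-derive KRP distances from pi

def weights_per_city(instance, sigma):
    # omega(k) = total weight of items picked at city k according to sigma
    omega = {k: 0.0 for k in range(instance["n_cities"])}
    for item, city in sigma.items():
        omega[city] += instance["items"][item][0]
    return omega

def distances_along(instance, pi):
    # delta(k) = distance from city k to the next city on route pi
    return {pi[j]: instance["d"][pi[j], pi[j + 1]] for j in range(len(pi) - 1)}
```

The loop terminates as soon as a round of negotiation fails to improve the profit, so each component solver is executed only a small number of times.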

We may obtain various variants of the CoSolver algorithm by plugging different KRP and TSKP solvers into the negotiation protocol.

We have implemented the following algorithms for the KRP component.

Exact solver for KRP (Algorithm 2). The algorithm runs in time and space polynomial under unary encoding of the profits of items. It inductively builds a two-dimensional array $P$ such that $P[n][w]$ stores the best profit that can be obtained by transporting items of total weight $w$ through the cities up to $n$. The initial values $P[1][w]$ are set to zero for every $0\le w\le C$. Assuming that we have computed $P[n-1][w]$, the values $P[n][w]$ can be obtained by running the usual dynamic-programming routine for the Knapsack Problem (Algorithm 3) on the items available at city $n$, and then subtracting the difference in cost between traveling from city $n-1$ to city $n$ with an empty knapsack and with a knapsack that weighs $w$.
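A compact sketch of this dynamic program in Python (our own rendering, collapsing the two-dimensional array to one row that is updated city by city; `delta[n]` is the distance from city `n` to the next city and `avail[n]` lists the items available there):

```python
def krp_exact(delta, items, avail, C, R, vmin, vmax):
    """DP sketch for KRP: P[w] holds, after processing city n, the best
    profit of a plan of total weight w, where each plan's profit is
    amortized by the extra transport cost its weight causes."""
    nu = (vmax - vmin) / C
    P = [0.0] * (C + 1)                      # initial row: P[1][w] = 0 for all w
    for n in range(len(delta)):
        # 0/1-knapsack step over the items available at city n
        for i in avail.get(n, []):
            wi, pi = items[i]
            for w in range(C, wi - 1, -1):
                P[w] = max(P[w], P[w - wi] + pi)
        # extra cost of carrying weight w (vs. empty knapsack) over delta[n]
        for w in range(C + 1):
            P[w] -= R * delta[n] * (1.0 / (vmax - nu * w) - 1.0 / vmax)
    return max(P)
```

The runtime is proportional to $C$ times the number of items and cities, which is why the bound is stated for unary encoding.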

Heuristic reduction from KRP to KP and exact solver for KP.

Let $W(i)$ denote the total weight of items picked at cities $\{1,2,\ldots,i\}$ according to some picking plan. We create an instance of KP with “relaxed profits” in the following way:

$$\bar{p}_i = p_i - R\,(t(i) - t'(i))$$

where $t(i)$ and $t'(i)$ are given by:

$$t(i)=\frac{L(i)}{v_{max}-(W(i-1)+w_i)\frac{v_{max}-v_{min}}{C}},\qquad t'(i)=\frac{L(i)}{v_{max}-W(i-1)\frac{v_{max}-v_{min}}{C}}$$

and:

$$L(i)=\begin{cases}0 & \text{for } i=1\\ \sum_{n=i}^{N}\delta(n) & \text{otherwise}\end{cases}$$

The items whose relaxed profit is not strictly positive are not taken into consideration when forming an instance of KP. Instances of KP are solved exactly by the dynamic programming approach (Algorithm 3).

Heuristic reduction from KRP to KP and weighted greedy approach to KP.

The reduction proceeds as above. To solve an instance of KP we use a variant of the greedy approach: the items are sorted according to the ratio $\bar{p}_i/w_i^{\Theta}$, where $\Theta\ge 0$ is a weighting parameter, and then greedily packed into the knapsack (Algorithm 4). We use a fixed set of weighting parameters $\Theta\in\{0,\frac{1}{e},1,e\}$ and return the picking plan for the best parameter $\Theta$. Observe that for $\Theta=0$ we get the usual naive algorithm (“best value first”), and for $\Theta=1$ we get the usual greedy algorithm (“best ratio first”). It may be shown that by using these two parameters only, we get a 1.5-approximation scheme.
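The weighted greedy step can be sketched as follows (a Python rendering of the idea behind Algorithm 4, under our own input encoding):

```python
import math

def greedy_kp(weights, profits, C, thetas=(0.0, 1 / math.e, 1.0, math.e)):
    """Weighted greedy for KP: sort items by profits[i] / weights[i]**theta,
    pack greedily, and keep the best plan over all weighting parameters."""
    best_plan, best_profit = [], 0
    for theta in thetas:
        order = sorted(range(len(weights)),
                       key=lambda i: profits[i] / weights[i] ** theta,
                       reverse=True)
        plan, total_w, total_p = [], 0, 0
        for i in order:                      # greedily pack in sorted order
            if total_w + weights[i] <= C:
                plan.append(i)
                total_w += weights[i]
                total_p += profits[i]
        if total_p > best_profit:            # keep the best parameter's plan
            best_plan, best_profit = plan, total_p
    return best_plan, best_profit
```

With `theta = 0.0` the sort key degenerates to the raw profit (“best value first”), and with `theta = 1.0` to the profit-to-weight ratio (“best ratio first”), matching the two special cases noted above.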

We have implemented the following algorithms for the TSKP component.

Exact solver for TSKP implemented by the usual branch-and-bound technique.

Heuristic reduction from TSKP to TSP and exact solver for TSP.

Given an instance of TSKP, we create an instance of TSP with the same nodes, but whose distances are substituted by the time needed for the travel:

$$\bar{d}(\pi_n,\pi_{n+1})=\frac{d(\pi_n,\pi_{n+1})}{v_{max}-W(n)\frac{v_{max}-v_{min}}{C}}$$

where $W(n)$ denotes the total weight $\sum_{k\le n}\omega(\pi_k)$ accumulated at the first $n$ cities of the route. The instances of TSP obtained in this way are solved exactly by the branch-and-bound technique.

Heuristic reduction from TSKP to TSP and heuristic TSP.

The reduction proceeds as above. Instances of TSP are solved by a state-of-the-art solver for TSP (that is, Concorde; Cook, 1995).
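The time-substituted distances used by both reductions can be computed as follows (a sketch under our own encoding; `omega[m]` is the weight picked at city `m`, and building the actual TSP instance from these values is omitted):

```python
def time_distances(route, d, omega, C, vmin, vmax):
    """Travel times along a route: each distance d(m, n) is divided by the
    speed attained with the weight W accumulated up to and including m."""
    nu = (vmax - vmin) / C
    W, dbar = 0.0, []
    for m, n in zip(route, route[1:]):
        W += omega[m]                        # weight picked at the departure city
        dbar.append(d[m, n] / (vmax - nu * W))
    return dbar
```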

### 5.2 Monte-Carlo Tree Search

Monte-Carlo Tree Search (Abramson, 1991) is a metaheuristic for decision processes. Originally, it was proposed for playing board games such as Hex (Arneson et al., 2010), Othello (Robles et al., 2011), and, most notably, Go (Coulom, 2009). It has also been successfully applied to some optimization problems, including a variant of TSP (Perez et al., 2012) and VRP (Takes, 2010).

The idea behind Monte-Carlo Tree Search is to sample random solutions and, based on their quality, make the most promising local decision. Here we apply this idea to solve TTP. Starting from the initial city and the empty knapsack, we alternately perform the following two steps:

(TSP Phase) extend the current route by a node $n$ and run a number of random simulations with the extended route; calculate the best profit $p_n$ from all simulations; add to the route the node $n$ having maximal profit $p_n$

(KP Phase) for every set of items $J\subseteq I_n$ available at the current city $n$, extend the knapsack by $J$ and run a number of random simulations with the extended knapsack; calculate the best profit $p_J$ from all simulations; add to the knapsack the set of items $J$ having maximal profit $p_J$

until a complete tour is constructed (Algorithm 5).
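The two-phase loop above can be sketched as follows (the rollout policy behind `simulate` is left abstract, and all names are ours; Algorithm 5 additionally bounds the knapsack capacity during the KP phase):

```python
import itertools

def mcts_ttp(cities, items_at, simulate, n_sims=20):
    """Sketch of the two-phase Monte-Carlo sampling scheme for TTP.

    cities:   list of cities; cities[0] is the start city
    items_at: dict city -> list of item indices available there
    simulate(route, picking) -> profit of one random completion of the
    partial solution (route, picking)
    """
    route, picking = [cities[0]], frozenset()
    remaining = list(cities[1:])
    while remaining:
        # TSP phase: pick the next city with the best sampled profit
        def city_score(n):
            return max(simulate(route + [n], picking) for _ in range(n_sims))
        n = max(remaining, key=city_score)
        route.append(n)
        remaining.remove(n)
        # KP phase: pick the subset of items at n with the best sampled profit
        avail = items_at.get(n, [])
        subsets = [frozenset(c) for r in range(len(avail) + 1)
                   for c in itertools.combinations(avail, r)]
        def subset_score(J):
            return max(simulate(route, picking | J) for _ in range(n_sims))
        picking |= max(subsets, key=subset_score)
    return route + [cities[0]], picking
```

Note that the KP phase enumerates all subsets of the items available at the current city, so it is practical only when few items share a city.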

### 5.3 Ant Colony Optimization

Methods based on Ant Colony Optimization (ACO) were proposed in the early 1990s to solve the Traveling Salesman Problem (Dorigo and Blum, 2005), and were later extended to problems such as the Scheduling Problem (Martens et al., 2007), the Assignment Problem (Ramalhinho Lourenço and Serra, 2002), the Vehicle Routing Problem (Toth and Vigo, 2002), the Set Cover Problem (Leguizamon and Michalewicz, 1999), and many more.

The general idea behind ACO is to iteratively perform the following steps:

construct a number of random solutions; the solutions are constructed incrementally by making local choices with some probabilities $\rho $

evaluate solutions and adjust probabilities $\rho $ of local choices—increase the probabilities of choices that have led to better solutions

Algorithm 6 uses ideas from Alaya et al. (2004) and Dorigo and Blum (2005) to solve TTP. Because there are nontrivial interactions between TSP and KP components in TTP, we had to apply several modifications:

- Because the cost of traveling between cities depends on the current weight of the knapsack, we first build a random solution to the KP part of the problem, and then extend it with a random tour. The probability of picking item $i$ is given by:

  $$\bar{\rho}(i)=\frac{\bar{\tau}^{\alpha}(i)\,q(i)^{-\beta}}{\sum_{1\le k\le n}\bar{\tau}^{\alpha}(k)\,q(k)^{-\beta}}$$

  but the probability of moving from city $i$ to city $j$ is defined according to the time of the travel instead of its distance:

  $$\rho(i,j)=\frac{\tau^{\alpha}(i,j)\,t(i,j)^{-\beta}}{\sum_{(i,k)\in G}\tau^{\alpha}(i,k)\,t(i,k)^{-\beta}}$$
- Contrary to KP, in TTP an optimal solution may consist of fewer items than is allowed by the capacity of the knapsack (i.e., because the weight of the knapsack impacts the speed of the thief, dropping an item from a solution may lead to a better solution). Therefore, ACO has to discover an upper bound on the total weight of items in the knapsack. If $W$ is the capacity of the knapsack, then we use $\lfloor\log(W)\rfloor+1$ bits to encode the upper bound on the weight of items. The probability that the $i$-th bit of the upper bound is $x$ is:

  $$\bar{\bar{\rho}}(i{=}x)=\frac{\bar{\bar{\tau}}(i{=}x)}{\bar{\bar{\tau}}(i{=}1)+\bar{\bar{\tau}}(i{=}0)}$$

  and the pheromone $\bar{\bar{\tau}}(i{=}x)$ is updated during each iteration of the algorithm for each random solution $\sigma_x^r$ whose upper bound has the $i$-th bit set to $x$:

  $$\bar{\bar{\tau}}'(i{=}x)=(1-\gamma)\,\bar{\bar{\tau}}(i{=}x)+\sum_{\sigma_x^r}P(\sigma_x^r)$$
The initial pheromone is uniformly distributed across cities and items.
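The bit-encoded upper bound can be sampled as follows. This is our own illustrative sketch of the mechanism described above; the data layout (one pheromone pair per bit) and all names are assumptions.

```python
import random

def sample_weight_bound(tau, rng=random):
    """Sample an upper bound on the total item weight, bit by bit.

    tau[i] = (pheromone for bit i being 0, pheromone for bit i being 1);
    bit i is set to 1 with probability tau[i][1] / (tau[i][0] + tau[i][1]).
    """
    bound = 0
    for i, (t0, t1) in enumerate(tau):
        if rng.random() < t1 / (t0 + t1):
            bound |= 1 << i
    return bound

# A knapsack of capacity W needs floor(log2(W)) + 1 bit pheromones,
# all initialised uniformly.
W = 100
tau = [(1.0, 1.0)] * W.bit_length()   # W.bit_length() == floor(log2(W)) + 1 for W >= 1
```

With uniform pheromone every bound in $[0, 2^{\lfloor \log_2 W \rfloor + 1})$ is equally likely; as pheromone accumulates on individual bits, the distribution concentrates on bounds that produced good solutions.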

### 5.4 Exact Solver for TTP

The exact solver implements the branch-and-bound technique to solve TTP. For each Hamiltonian cycle, an instance of the Knapsack Problem is considered, in which the value of an item is amortized by the minimal cost required for its transport. Branch-and-bound is used both in generating Hamiltonian cycles and in solving the Knapsack Problems. The exact solver guarantees optimality of the produced solutions and serves as the benchmark against which the other algorithms are compared.
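To illustrate the knapsack side of this scheme, here is a basic branch-and-bound for the 0/1 Knapsack Problem (our illustrative code, not the authors' implementation): a branch is pruned when an optimistic bound, obtained from the fractional relaxation, cannot beat the best profit found so far.

```python
def knapsack_bb(items, capacity):
    """0/1 knapsack by branch-and-bound; items = [(weight, profit), ...], weights > 0."""
    # sort by profit density so the fractional relaxation is easy to compute
    order = sorted(range(len(items)), key=lambda i: items[i][1] / items[i][0], reverse=True)

    def bound(k, cap, profit):
        # optimistic bound: fill the remaining capacity fractionally
        for i in order[k:]:
            w, p = items[i]
            if w <= cap:
                cap -= w
                profit += p
            else:
                return profit + p * cap / w
        return profit

    best = 0
    def branch(k, cap, profit):
        nonlocal best
        if profit > best:
            best = profit
        if k == len(order) or bound(k, cap, profit) <= best:
            return  # prune: even the relaxation cannot improve on the incumbent
        w, p = items[order[k]]
        if w <= cap:
            branch(k + 1, cap - w, profit + p)  # branch: take the item
        branch(k + 1, cap, profit)              # branch: skip the item

    branch(0, capacity, 0)
    return best
```

The same prune-on-bound principle applies when enumerating partial Hamiltonian cycles: a partial tour is abandoned once an optimistic estimate of its best completion falls below the incumbent.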

## 6 Benchmarks

To compare the performance of algorithms for TTP, we prepared in Bonyadi et al. (2014) a generic framework for generating classes of TTP instances. Each class is composed of three independent components: *meta*, *TSP*, and *KP*. We recall the explanation of these components from Bonyadi et al. (2014) below. Depending on the configuration parameters of the components, one can create separate classes of TTP instances. In addition to Bonyadi et al. (2014), the *TSP* component includes well-known benchmark instances from a public database.

(Meta) This component describes the parameters of the Traveling Thief Problem that are independent of the graph and items: a natural number $C$ describing the capacity of the knapsack, a non-negative real number $R$ indicating the rent ratio, and two positive real numbers $v_{min} \le v_{max}$ corresponding to the minimal and maximal speed of the traveler. These parameters let us adjust the coupling between the subcomponents of TTP (if $v_{max}=v_{min}$, the subproblems are “fully sequential”: there is no interaction between the subproblems) and their relative importance (if the rent rate $R=0$, the solution to the TSP part may be completely ignored, as it has no impact on the objective function of TTP).

(TSP) This component describes the graph of the Traveling Thief Problem. Such a graph is a pair $\langle V, E \rangle$ consisting of a set of nodes $m, n \in V$ (called cities) and a set of edges $m \xrightarrow{d(m,n)} n \in E$ from $m$ to $n$ (called distances). We used four sources of graphs (for more details, see Bonyadi et al., 2014):

- *random graphs* (Solomonoff and Rapoport, 1951): distances are independently assigned according to some a priori distribution;
- *Euclidean graphs*: the nodes of the underlying graph are embedded in some low-dimensional Euclidean space;
- *Hamiltonian-dense graphs*: the number of paths that can be extended to a full Hamiltonian cycle is relatively high; the main motivation behind this class of graphs is to make the problem of finding Hamiltonian cycles easy;
- a class based on a well-known set of benchmarks for TSP (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95).

(KP) This component describes the set of items together with the availability relation of the Traveling Thief Problem. An item $i$ is a pair of natural numbers $\langle w_i, p_i \rangle$, where $w_i$ is called the weight of the item, and $p_i$ is called the profit of the item. The availability relation between the cities $V$ and the set of items $\{1, 2, \ldots, I\}$ says that the $i$-th item is available at city $n \in V$ iff $a(n,i)$ is true. We used two classes of KP instances:

- *uncorrelated weights and values*: weights, values, and availability of the items are independently sampled from some a priori distributions;
- *greedy-proof*: because a KP instance in which values and weights of items are uncorrelated can easily be solved to a high-quality solution by the greedy algorithm (see Pisinger, 2005), we generated KP-component instances that are resistant to such approaches.
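A generator for the first class might look as follows. This is a minimal sketch under our own assumptions (uniform distributions, one city per item); it is not the framework of Bonyadi et al. (2014), and all names are illustrative.

```python
import random

def uncorrelated_kp(n_items, n_cities, max_w=1000, max_p=1000, rng=random):
    """Sample an 'uncorrelated' KP component: weights, profits, and availability
    are drawn independently of one another (illustrative sketch only)."""
    items = [(rng.randint(1, max_w), rng.randint(1, max_p)) for _ in range(n_items)]
    # availability relation a(n, i): here each item is placed at one random city
    availability = {i: rng.randrange(n_cities) for i in range(n_items)}
    return items, availability
```

A greedy-proof generator would instead correlate weights and profits so that the profit-density ordering used by the greedy algorithm is misleading.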

The parameters of the benchmark instances are set to the same values as in Bonyadi et al. (2014).

## 7 Experimental Results

In order to compare algorithms for TTP, we generated various instances with different numbers of cities (from 3 to 76) and items (from 10 to 146) with the parameters listed in Section 6. The full set of benchmark instances together with the scripts to generate them is available at the website (Przybylek, 2015).

Observe that for any positive constant $K$, one may rescale the values $p_i$ and $R$ in an instance by $K$, obtaining another instance whose solutions are exactly the same, but whose profits are rescaled by $K$ (thus, the qualities of the solutions are preserved by the rescaling operation). Moreover, the instances are not *localised*, which means that for any non-negative constant $D$ and any instance of TTP, one may build an equivalent instance whose solutions have profits translated by $D$; it suffices to substitute the starting node by a pair of nodes with a single edge of an appropriate distance (see Figure 2).
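To see why rescaling preserves solutions, note that with the usual TTP objective (total profit minus rent proportional to travel time; we assume this standard form here), rescaling acts linearly:

```latex
P(\sigma) \;=\; \sum_{i \in \sigma} p_i \;-\; R \cdot t(\sigma)
\qquad\Longrightarrow\qquad
\sum_{i \in \sigma} (K p_i) \;-\; (K R)\, t(\sigma) \;=\; K \cdot P(\sigma),
```

so the ranking of solutions, and hence the set of optimal solutions, is unchanged.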

Any reasonable algorithm should return a solution having quality between 0% and 100%. Here 0% means that the algorithm produced an average (i.e., random) solution, and 100% means that the algorithm produced an optimal solution. One may actually think of this quality as the “smartness” of an algorithm, where 0% does not require any work (i.e., statistically, it suffices to construct a random solution), negative values indicate that the algorithm has been misled (i.e., it has produced solutions worse than solutions that require no computation), and values between 0% and 100% measure the real effectiveness of the algorithm. Nonetheless, there is one problem here: the average profit $P^{\#}$ depends, of course, on the probability distribution on the space of possible solutions; and since a solution to TTP comprises a solution to the Hamiltonian Cycle Problem, which is NP-complete, one should not expect that a random solution to TTP generated according to any polynomial distribution would be feasible. We were forced to use a different strategy: first we generated a random Hamiltonian cycle, and then supplied it with randomly chosen items. Therefore, one has to remember that a “random solution” is not that easy to obtain; there is a highly nontrivial problem underlying the random samples.
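One natural way to formalize the quality measure implied by this paragraph (our reading of the text, writing $P^{*}$ for the optimal profit and $P^{\#}$ for the average profit) is:

```latex
Q(\sigma) \;=\; \frac{P(\sigma) - P^{\#}}{P^{*} - P^{\#}} \cdot 100\%,
\qquad\text{so that}\qquad
Q = 0\% \iff P(\sigma) = P^{\#},
\qquad
Q = 100\% \iff P(\sigma) = P^{*}.
```

This measure is invariant under both the rescaling and the translation operations discussed above.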

Our algorithms were applied to the benchmark problems and their results compared. The methods that use any kind of nondeterminism (Ant Colony Optimization, Monte-Carlo Tree Search) were run 16 times, and the average solutions were taken for the final results. In addition, for the main set of benchmark problems, where we could obtain exact results, we present a graph of the performance of the nondeterministic algorithms with error bars indicating the best and worst solutions and the 95% confidence interval for every instance (see Figures 3 and 4). Figure 4 shows that the confidence intervals are usually quite narrow, allowing for a clear comparison of the performance of the various methods, which justifies our choice of the number of runs for the nondeterministic heuristics.

We also designed an exhaustive search algorithm that solves the main benchmark set to optimality, and estimated the average solution of each of the benchmark problems. The benchmarks are divided into three classes. The full set of results is available at the website (Przybylek, 2015).

### 7.1 Typical

This class of benchmarks contains typical instances of TTP as described in the previous section. The results are presented in Table 1, where Average is an estimated average profit, CoSolver is the original CoSolver algorithm as introduced in Bonyadi et al. (2014), CoSolver Exact is a variant of CoSolver based on a heuristic method for the TSKP component and an exact solver for the KRP component (Algorithm 2), CoSolver Heuristic is a variant of CoSolver based on heuristic methods for both of its components, MCTS is a heuristic based on Monte-Carlo Tree Search (Algorithm 5), and ACO is a method based on Ant Colony Optimization (Algorithm 6). Table 1 shows that CoSolver Heuristic and MCTS never produce bad solutions: the worst of their solutions are almost twice as good as the average solution. Moreover, MCTS outperforms ACO in most cases.

Table 1: Results for the typical TTP benchmark classes (Q is the quality of the solution).

| Benchmark | Exact Profit | Average Profit | CoSolver Profit | CoSolver Q | CoSolver E Profit | CoSolver E Q | CoSolver H Profit | CoSolver H Q | MCTS Profit | MCTS Q | ACO Profit | ACO Q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Euclidean | -230563 | -317652 | -241667 | 87% | -241667 | 87% | -230585 | 100% | -248449 | 79% | -244484 | 84% |
|  | -18210 | -30219 | -18210 | 100% | -18210 | 100% | -19918 | 86% | -21683 | 71% | -22087 | 68% |
|  | -38782 | -48162 | -57438 | -99% | -66482 | -195% | -38833 | 99% | -39560 | 92% | -39136 | 96% |
|  | -155161 | -205688 | -155161 | 100% | -155815 | 99% | -155334 | 100% | -155334 | 100% | -156253 | 98% |
|  | 5038 | 323 | 5009 | 99% | 5009 | 99% | 4981 | 99% | 2838 | 53% | 2688 | 50% |
|  | -36042 | -51009 | -36042 | 100% | -36042 | 100% | -38696 | 82% | -39101 | 80% | -40688 | 69% |
|  | 1289 | -854 | 1289 | 100% | 788 | 77% | 538 | 65% | 486 | 63% | 228 | 50% |
|  | -122329 | -202694 | -143547 | 74% | -159672 | 54% | -122329 | 100% | -134387 | 85% | -151206 | 64% |
| Euclidean avg | -74345 | -106994 | -80721 | 70% | -84012 | 53% | -75022 | 91% | -79399 | 78% | -81367 | 72% |
| Dense | -88984 | -116443 | -93952 | 82% | -93952 | 82% | -89299 | 99% | -96257 | 74% | -98218 | 66% |
|  | -32662 | -82552 | -32921 | 99% | -32860 | 100% | -59606 | 46% | -63856 | 37% | -59136 | 47% |
|  | -25346 | -69790 | -36360 | 75% | -36360 | 75% | -49236 | 46% | -53125 | 37% | -48176 | 49% |
| Dense avg | -48997 | -89595 | -54411 | 86% | -54391 | 86% | -66047 | 64% | -71079 | 49% | -68510 | 54% |
| Small | 17274 | -19338 | 17024 | 99% | 17024 | 99% | 11590 | 84% | -4283 | 41% | -1482 | 49% |
|  | -38181 | -63763 | -38181 | 100% | -38659 | 98% | -38181 | 100% | -44329 | 76% | -44446 | 76% |
|  | -17695 | -24310 | -19277 | 76% | -19277 | 76% | -17771 | 99% | -18549 | 87% | -18561 | 87% |
|  | -30616 | -38015 | -34807 | 43% | -35259 | 37% | -30796 | 98% | -31832 | 84% | -30796 | 98% |
|  | -63706 | -72850 | -75158 | -25% | -75291 | -27% | -63706 | 100% | -65071 | 85% | -63706 | 100% |
|  | -58489 | -77757 | -63518 | 74% | -63641 | 73% | -59258 | 96% | -61949 | 82% | -63230 | 75% |
|  | -32946 | -74510 | -34105 | 97% | -34817 | 95% | -32946 | 100% | -42821 | 76% | -50334 | 58% |
| Small avg | -32051 | -52935 | -35432 | 66% | -35703 | 65% | -33010 | 97% | -38405 | 76% | -38936 | 77% |
| Random | -19428 | -34468 | -22100 | 82% | -21945 | 83% | -19591 | 99% | -23082 | 76% | -24401 | 67% |
|  | -20176 | -38693 | -28872 | 53% | -28872 | 53% | -20482 | 98% | -22229 | 89% | -23556 | 82% |
|  | 7369 | 1696 | 7351 | 100% | 7351 | 100% | 7108 | 95% | 3731 | 36% | 4222 | 45% |
|  | 5521 | 1632 | 5521 | 100% | 5521 | 100% | 5507 | 100% | 2603 | 25% | 3043 | 36% |
|  | 2104 | 470 | 2104 | 100% | 2104 | 100% | 2085 | 99% | 670 | 12% | 740 | 17% |
|  | 9969 | 1661 | 9955 | 100% | 9964 | 100% | 8221 | 79% | 2741 | 13% | 3105 | 17% |
|  | 8834 | 1023 | 8834 | 100% | 8830 | 100% | 8833 | 100% | 3406 | 30% | 3331 | 30% |
| Random avg | -829 | -9525 | -2458 | 91% | -2435 | 91% | -1189 | 96% | -4594 | 40% | -4788 | 42% |
| KP Centric | 39937 | 2459 | 39935 | 100% | 39937 | 100% | 29281 | 72% | 28917 | 71% | 17926 | 41% |
|  | 69336 | 1532 | 69335 | 100% | 69331 | 100% | 36542 | 52% | 38272 | 54% | 17117 | 23% |
|  | 90025 | 1828 | 89992 | 100% | 90019 | 100% | 81479 | 90% | 55366 | 61% | 18346 | 19% |
|  | 69484 | 1906 | 69478 | 100% | 69484 | 100% | 62682 | 90% | 40603 | 57% | 17778 | 23% |
|  | 74234 | 1824 | 74233 | 100% | 74234 | 100% | 60684 | 81% | 46817 | 62% | 21952 | 28% |
|  | 65531 | 1850 | 65524 | 100% | 65524 | 100% | 65043 | 99% | 34558 | 51% | 17983 | 25% |
|  | 80049 | 2629 | 80046 | 100% | 80046 | 100% | 56291 | 69% | 52480 | 64% | 19881 | 22% |
|  | 59604 | 1611 | 59597 | 100% | 59604 | 100% | 51167 | 85% | 30234 | 49% | 17478 | 27% |
|  | 50766 | 2498 | 50743 | 100% | 50757 | 100% | 27678 | 52% | 36377 | 70% | 18115 | 32% |
|  | 27358 | 886 | 27358 | 100% | 27358 | 100% | 17082 | 61% | 21651 | 78% | 11272 | 39% |
| KP cent. avg | 62632 | 1902 | 62624 | 100% | 62629 | 100% | 48793 | 75% | 38527 | 62% | 17785 | 28% |
| Greedy | 1178064 | 49008 | 1178064 | 100% | 1178064 | 100% | 1002796 | 84% | 1168390 | 99% | 1008409 | 85% |
|  | 1551296 | 61677 | 1551296 | 100% | 1551296 | 100% | 1342664 | 86% | 1447922 | 93% | 1353361 | 87% |
|  | 659186 | 29237 | 659186 | 100% | 659186 | 100% | 576326 | 87% | 650737 | 99% | 580318 | 87% |
|  | 1384486 | 79465 | 1384486 | 100% | 1384486 | 100% | 1186825 | 85% | 1331286 | 96% | 1195859 | 86% |
|  | 1401970 | 58205 | 1401970 | 100% | 1401970 | 100% | 1224622 | 87% | 1340589 | 95% | 1233473 | 87% |
|  | 1188129 | 54522 | 1188129 | 100% | 1188129 | 100% | 1020886 | 85% | 1131986 | 95% | 1038696 | 87% |
|  | 1143685 | 40462 | 1143685 | 100% | 1143685 | 100% | 997422 | 87% | 1131531 | 99% | 1006294 | 88% |
|  | 2099247 | 106191 | 2099247 | 100% | 2099247 | 100% | 1812908 | 86% | 2056908 | 98% | 1843659 | 87% |
|  | 771036 | 36070 | 771036 | 100% | 771036 | 100% | 649061 | 83% | 721852 | 93% | 662999 | 85% |
|  | 1425409 | 87466 | 1425409 | 100% | 1425409 | 100% | 1225759 | 85% | 1367081 | 96% | 1241604 | 86% |
| Greedy avg | 1264122 | 57204 | 1264122 | 100% | 1264122 | 100% | 1090390 | 86% | 1220133 | 96% | 1102563 | 87% |
| Average | 276820 | -20903 | 274545 | 87% | 273924 | 84% | 233100 | 86% | 257425 | 70% | 226222 | 61% |

The last two sets of benchmarks were tuned to mislead “greedy heuristics” for the KP subcomponent. Notice that the solutions generated by the algorithms that are sensitive to greedy-proof instances (that is, CoSolver Heuristic and Monte Carlo Tree Search) are still of reasonable quality.

Figure 3 shows the performance of CoSolver Heuristic against ACO and MCTS, with error bars indicating the best and worst solutions for every instance, and Figure 4 shows the 95% confidence intervals.

It is worth noticing that for many classes of problems, the version of CoSolver that is based on purely heuristic components (CoSolver Heuristic) performs better on average than the original CoSolver. Moreover, Table 1 shows that both the original CoSolver and CoSolver Exact may get misled and produce worse-than-random solutions (the entries with negative quality in Table 1). One may explain this phenomenon by the fact that although heuristic components give partial solutions that are locally worse than optimal ones, those solutions are also less sensitive to further changes and, therefore, can potentially lead to better global solutions. Additionally, CoSolver Heuristic greatly improves the scalability of CoSolver without compromising quality. Observe also that, while not as good as the methods based on pre-existing components, Monte-Carlo Tree Search may provide an interesting alternative when there is little knowledge about the subcomponents of the initial problem, or when the coupling between the subcomponents is high, making negotiation protocols infeasible.

### 7.2 Known TSP Benchmarks

To build a competitive set of benchmarks for TTP, we decided to use a well-known public database of symmetric and asymmetric TSP instances (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95) and extend them with randomly generated items. These instances, however, are too big to be solved to optimality by the exact solver, or even by the CoSolvers with exact components. Therefore, we produced results for CoSolver Heuristic, MCTS, and ACO only. The results are presented in Table 2. The *Profit* columns describe the profit obtained by a given algorithm, and the *Gain* columns describe the “gain” obtained by an algorithm with respect to the average solution.

Table 2: Results on TTP instances built from TSPLIB graphs.

| Benchmark | Average Profit | CoSolver H Profit | CoSolver H Gain | MCTS Profit | MCTS Gain | ACO Profit | ACO Gain |
|---|---|---|---|---|---|---|---|
| att48 | -1189 | -165 | 1024 | -682 | 507 | -929 | 260 |
| bayg29 | -199 | -67 | 132 | -113 | 87 | -155 | 44 |
| bays29 | -200 | -67 | 132 | -119 | 80 | -159 | 41 |
| berlin52 | -585 | -57 | 528 | -366 | 219 | -482 | 103 |
| br17 | -289 | 19 | 308 | 19 | 308 | -98 | 191 |
| brazil58 | -1205 | -160 | 1045 | -721 | 484 | -967 | 237 |
| burma14 | -79 | -40 | 40 | -45 | 34 | -56 | 23 |
| dantzig42 | -874 | -126 | 748 | -443 | 431 | -695 | 179 |
| eil51 | -503 | 4 | 507 | -217 | 286 | -381 | 122 |
| eil76 | -2039 | -208 | 1831 | -1306 | 734 | -1688 | 351 |
| fri26 | -237 | -82 | 155 | -112 | 125 | -183 | 54 |
| ft53 | -615 | -162 | 453 | -434 | 181 | -531 | 84 |
| ft70 | -973 | -383 | 590 | -625 | 347 | -841 | 132 |
| ftv33 | -239 | -70 | 169 | -125 | 114 | -191 | 48 |
| ftv35 | -721 | -214 | 506 | -453 | 267 | -588 | 133 |
| ftv38 | -122 | -36 | 86 | -64 | 59 | -94 | 28 |
| ftv44 | -986 | -166 | 820 | -630 | 357 | -800 | 186 |
| ftv47 | -975 | -167 | 807 | -628 | 346 | -800 | 175 |
| ftv55 | -1671 | -203 | 1468 | -1070 | 600 | -1403 | 268 |
| ftv64 | -1128 | -223 | 904 | -764 | 364 | -958 | 170 |
| ftv70 | -2304 | -404 | 1900 | -1523 | 782 | -1929 | 375 |
| gr17 | -100 | 40 | 140 | 18 | 118 | -11 | 89 |
| gr21 | -173 | -61 | 112 | -94 | 79 | -132 | 41 |
| gr24 | -172 | -63 | 109 | -52 | 120 | -112 | 60 |
| gr48 | -1053 | -221 | 833 | -620 | 434 | -846 | 208 |
| hk48 | -1081 | -171 | 910 | -673 | 408 | -865 | 215 |
| p43 | -955 | -168 | 787 | -190 | 765 | -522 | 432 |
| pr76 | -2350 | -244 | 2106 | -1483 | 868 | -1927 | 423 |
| ry48p | -960 | -166 | 794 | -577 | 382 | -747 | 212 |
| swiss42 | -748 | -117 | 631 | -440 | 308 | -588 | 160 |
| ulysses16 | -93 | -52 | 42 | -18 | 76 | -42 | 51 |
| ulysses22 | -220 | -93 | 127 | -125 | 94 | -168 | 52 |
| Average | -782 | -134 | 648 | -459 | 324 | -622 | 161 |

Table 2 confirms that CoSolver Heuristic outperforms MCTS and ACO, and that MCTS is consistently better than ACO.

### 7.3 Coupling Based

We have also tested the performance of our algorithms with respect to the coupling between subcomponents. An instance of TTP is “sequential” if a good solution can be obtained by independently solving its first component and, on top of it, solving its second component. We prepared sets of instances with increasing levels of sequentiality and applied the CoSolver Heuristic, MCTS, and ACO algorithms to them. The normalised results are shown in Table 3.

Table 3: Normalised results for instance sets with increasing levels of sequentiality.

| Benchmark | Average Profit | CoSolver H Profit | CoSolver H Gain | MCTS Profit | MCTS Gain | ACO Profit | ACO Gain |
|---|---|---|---|---|---|---|---|
| Highly dependent | 120211 | 335000 | 214789 | 389007 | 268796 | 314353 | 194142 |
| Dependent | 121727 | 425000 | 303273 | 401650 | 279923 | 336261 | 214534 |
| Balanced | 121482 | 455000 | 333518 | 405873 | 284391 | 330917 | 209435 |
| Moderately sequential | 130486 | 465000 | 334514 | 418095 | 287609 | 349277 | 218791 |
| Fully sequential | 137186 | 470000 | 332814 | 420003 | 282817 | 355598 | 218411 |

The impact of “coupling” of TTP components on the difficulty of obtaining good results using our heuristics is clear. TTP instances in which components are more dependent on each other are more difficult to solve well. This result gives insight into the difficulty of other multi-hard problems. Even though multi-hard problems may be in the same complexity class as their components, they can be more difficult than each of the components and this difficulty increases with increasing component interdependence.

Also, for most of the considered instance types, CoSolver H (the algorithm that aims to exploit the multi-hard problem structure) does a better job than MCTS and ACO. However, results also suggest that MCTS-based algorithms perform better on problems that have large cohesion between their subcomponents (highly dependent). This shows that further work is needed to design algorithms that can better exploit problem structure. Recall that CoSolver's design relies on good methods to “negotiate” a solution between solvers for the components of a multi-hard problem. A very high coupling (dependency) between the components of a multi-hard problem seems to make this “negotiation” less effective.

## 8 Conclusions and Further Work

In this article, we have introduced the concept of multi-hardness, that is, of problems that are nontrivial combinations of classical hard problems. We have studied algorithms that exploit the structure of multi-hard problems through an evaluation of such algorithms on TTP, a model multi-hard problem. We have extended the idea of CoSolver by incorporating heuristics instead of exact solvers for the Traveling Salesman Problem and Knapsack Problem components of the Traveling Thief Problem. Moreover, we have introduced a promising new heuristic for multi-hard problems that is based on Monte-Carlo Tree Search. We also examined a heuristic based on Ant Colony Optimisation. We have developed a set of publicly available benchmarks for TTP and have used it to compare the heuristics against each other.

Our experiments show that, when it comes to partial solutions, heuristic components may lead to better global solutions, because the results produced by such components are generally “more stable”, that is, less sensitive to further changes. In the experiments, the version of CoSolver that is based on purely heuristic components (CoSolver Heuristic) performs better on average than the original CoSolver. Moreover, CoSolver Heuristic and MCTS never produce bad solutions. We also note that Monte-Carlo Tree Search may provide an interesting alternative to CoSolver-based heuristics in cases where there is little knowledge about the subcomponents of the initial problem, or where the coupling between the subcomponents is high enough to make any negotiation protocol between subcomponents ineffective. Our results confirm the effectiveness of a decomposition-negotiation approach to multi-hard problems.

The coupling between TTP components has a great impact on the difficulty of obtaining good results. TTP instances in which components are more dependent on each other are more difficult for our decomposition-based algorithms to solve well. This gives insight into the difficulty of other multi-hard problems. Even though multi-hard problems may be in the same complexity class as their components, they can be more difficult than each of the components, and this difficulty increases with increasing component interdependence. Also, for most of the considered instance types, CoSolver Heuristic (the algorithm that aims to exploit multi-hard problem structure) does a better job than MCTS and ACO. However, the results also suggest that MCTS-based algorithms perform better on problems that have high cohesion between their subcomponents (i.e., are highly dependent). This shows that further work is needed to design algorithms that can better exploit problem structure. Recall that CoSolver's design relies on good methods to “negotiate” a solution between the solvers for the components of a multi-hard problem. A very high coupling (dependency) between the components of a multi-hard problem seems to make this “negotiation” less effective. We believe that better methods for such “negotiation” may still be discovered.

Our long-term goal is to provide a broad new methodology for the integration of multi-hard problems, progressing from simpler couplings (silos and sequences) to heterogeneous, highly connected models. In future work we will be interested in extending our model Traveling Thief Problem with additional subcomponents and with various aspects that may be found in real-world systems (such as incompleteness and uncertainty of information, or information that changes over time), and in developing new decomposition-based methodologies for such extensions. We will also be interested in validating our methods in an industrial environment.

## Note

^{1} See Section 5.1 for definitions.

## References

## Author notes

Z. Michalewicz is also with Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland, and the Polish Japanese Academy of Information Technology, Warsaw, Poland.