Abstract
Connection patterns among Local Optima Networks (LONs) can inform heuristic design for optimisation. LON research has predominantly required complete enumeration of a fitness landscape, restricting analysis to problems far smaller than those arising in practice. LON sampling algorithms are therefore important. In this article, we study LON construction algorithms for the Quadratic Assignment Problem (QAP). Using machine learning, we use estimated LON features to predict search performance for competitive heuristics used in the QAP domain. The results show that, using random forest regression, LON construction algorithms produce fitness landscape features which can explain almost all search variance. We find that LON samples relate better to search than enumerated LONs do. The importance of the fitness levels of sampled LONs in search prediction is made clear. Features from LONs produced by different algorithms are combined in predictions for the first time, with promising results for this “super-sampling”: a model to predict tabu search success explained 99% of variance. Arguments are made for the use-case of each LON algorithm and for combining the exploitative process of one with the exploratory optimisation of the other.
1 Introduction
A Local Optima Network (LON) (Ochoa et al., 2008) is a fitness landscape model used to understand or predict interaction between configuration spaces and search algorithms. LONs have local optima as their nodes; the edges are search trajectories between local optima. Studying LON objects has brought colour and clarity to our understanding of how optimisation problems and search algorithms interact (Daolio et al., 2011; Ochoa and Veerapen, 2016; Herrmann, 2016; Hernando et al., 2017).
Principally, studies have enumerated the fitness landscape to build a comprehensive LON (Ochoa et al., 2008; Tomassini et al., 2008; Daolio et al., 2011; Verel et al., 2011; Herrmann et al., 2016; Veerapen et al., 2016; Verel et al., 2018). Their focus was on smaller problems and every candidate solution was mapped to a local optimum within its basin of attraction.
The baseline and the proof-of-concepts were necessary to establish LONs as promising tools. LONs have now attracted attention (Ochoa and Veerapen, 2018; Bozejko et al., 2018; Herrmann et al., 2018; Fieldsend, 2018; Chicano et al., 2018; Simoncini et al., 2018; Liu et al., 2019), and will likely be applied to increasing numbers of real (and much larger) problems. In anticipation of these requirements, refined LON sampling methods are needed. The literature concerning how to sample LONs is in its embryonic stages.
A few LON construction algorithms have been proposed, but have not been extensively tested. We describe here those proposed for the Quadratic Assignment Problem (QAP). Note that in lieu of formal naming by the various authors, we have labelled them as we see fit.
Two-Phase LON Sampling. Iclanzan et al. (2014) proposed a LON sampling algorithm for QAP instances which had separate phases for recording nodes and edges. Initially, local optima are found by hill-climbing from starting solutions; thereafter, each local optimum has a kick or perturbation applied, followed by hill-climbing. If the obtained local optimum is also in the node set, an edge is added (or, if the edge exists already, its weight is incremented). This algorithm is not used in our study as it appears to be a particular case of the two more recent proposals.
Markov-Chain LON Sampling. Ochoa and Herrmann (2018) introduced a LON algorithm for the QAP built with a competitive heuristic, Iterated Local Search (ILS) (Stützle, 2006). For the remainder of this study, this is labelled Markov-Chain LON Sampling, to avoid confusion with the ILS optimisation heuristic itself. The sampling comprises multiple ILS runs, each being an adaptive walk on the local optima space. Every local optimum and every connection between optima is recorded. The same framework has been used with some success for Travelling Salesman Problem LONs (Ochoa and Veerapen, 2018); there, it is also instrumented on top of a competitive heuristic in the domain—Chained Lin-Kernighan (Applegate et al., 2003).
Snowball LON Sampling. The same year, Verel et al. (2018) presented a Snowball LON Sampling algorithm. A random walk takes place on the local optima level. From each node in the walk, a recursive branching or snowballing samples the direct (locally optimal) neighbours. The local optima are found using kicks followed by hill-climbing, as in Markov-Chain LON Sampling.
The literature has not asked which method is more representative of the LON being sampled; that is, are the sampled landscapes similar to those induced by search heuristics during optimisation? There is also vagueness about whether particular LON construction algorithms align more closely with particular search heuristics. A further open issue is that computational cost has not necessarily been considered with regard to LON construction algorithms. Can their efficiency therefore be compared? And finally, is sampling bias inherent within the methods themselves?
This work has a heavy focus on the relevance of sampled LONs to empirical search difficulty, which manifests as performance prediction using LON features as predictors. We argue this is fruitful in contrast to the analysis of LON features (without aligning them with search variance)—this seems arbitrary and counterintuitive to the purpose of LONs. Search variance prediction is important because it helps ascribe algorithm proficiency to particular topological features and produces insight about how algorithmic operators interact with a problem.
The results and experiments of this study come in four parts. First, LON samples are compared with fully enumerated LONs; next, the LON construction algorithms are given a fixed computational budget; then, the LON methods are run with various sets of input parameters; and finally, the best features from each LON type are combined in regression models.
The first comparison of sampled versus enumerated LONs in terms of predictive power for search variance is given. For the first time, LON construction algorithms are given an equal computational budget and the effect is studied. We create a new feature set which combines features obtained with the two different sampling methods, as opposed to sets of features coming from a single method.
The remainder of the article is structured as follows. Section 2 provides the baseline definitions to keep the work self-contained; our methods are in Section 3. Experimental setup is detailed in Section 4, followed by the results in Section 5, and discussion and conclusions in Sections 6 and 7, respectively.
2 Definitions
2.1 Fitness Landscapes
A fitness landscape is both a mathematical object and a metaphor for understanding optimisation processes. For evolutionary computation, a landscape is the triplet $(S, N, f)$ where $S$ is the set of all potential solutions; $N : S \rightarrow 2^{S}$ is a function that defines the adjacency between solutions, that is, $N(s)$ is the set of neighbouring solutions of the solution $s$; and $f : S \rightarrow \mathbb{R}$ is a fitness function which assigns a fitness value $f(s)$ to each solution $s$ (Stadler, 2002). When we consider that search heuristics typically use multiple operators (the operator corresponds to the fitness landscape component $N$ as a notion of connectivity between solutions), it becomes clear that search algorithms induce empirical fitness landscapes that are much more complex than the basic single-neighbourhood model. Two different search algorithms moving on the same configuration space can manufacture completely dissimilar topological landscape features.
2.2 Local Optima Networks
A subspace of the fitness landscape is considered, reducing the solution set exclusively to local optima. The neighbourhood now defines the accessibility between local optima, often based on the original neighbourhood relation of the fitness landscape. This is a Local Optima Network, as per Ochoa et al. (2008).
2.2.1 Nodes
The set of vertices, $V$, consists of local optima (each labelled with an index and its corresponding fitness). For a minimisation problem, we define each such local optimum $lo_i$ to satisfy the condition of superior (that is, lower or equal) fitness over all other solutions, $s$, in its neighbourhood: $f(lo_i) \leq f(s),\ \forall s \in N(lo_i)$, where $N(lo_i)$ is the neighbourhood of $lo_i$.
2.2.2 Edges
The network edges, $E$, are weighted and oriented. For Markov-Chain Sampling and Snowball Sampling LONs, we have an edge from $lo_i$ to $lo_j$ if combining perturbation and hill-climbing can transform the source into the destination. The edge weight $w_{ij}$ is the probability of that transformation occurring. Formally, local optima $lo_i$ and $lo_j$ form the source and destination of an edge iff $w_{ij} > 0$.
Joining the nodes and edges induces a LON: a directed graph $LON = (V, E)$ with nodes $lo_i \in V$, where there exists an edge $e_{ij} \in E$, with weight $w_{ij}$, between two nodes $lo_i$ and $lo_j$ iff $w_{ij} > 0$. Note that, in most cases, $w_{ij} \neq w_{ji}$.
2.3 Funnels
A funnel is a coarse-grained, basin-like structure at the local optima level: a set of monotonically improving LON paths that converge on a shared sink node, the funnel-floor. A funnel-floor is a local optimum with no outgoing improving edges; the funnel whose floor is the global optimum is the primary funnel, and all others are suboptimal funnels.

Figure: Abstracted from high-dimensional space, a suboptimal funnel (left) and the primary (optimal) funnel (right), for a minimisation problem.
3 Methodology
3.1 The Quadratic Assignment Problem
A Quadratic Assignment Problem (QAP) (Lawler, 1963) requires the allocation of facilities to available locations. A permutation of facilities assigned to the indexed locations is the solution representation. Instance specification for a QAP is formed by two matrices: one for pairwise distances between locations, the other for pairwise flows between facilities. Mathematically, with the problem dimension being $n$, the QAP is an assignment of $n$ facilities to $n$ locations where each pair of locations $i$ and $j$ has a distance separating them, $d_{ij}$, and each pair of facilities has a flow value, $f_{ij}$. These are encoded in the distance and flow matrices. A solution is a permutation $\pi$ of length $n$, whose fitness is the sum, over all pairs, of the flows multiplied by the distances between the assigned locations. The objective function, $g$, is to be minimised:

$$\min_{\pi} g(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{\pi(i)\pi(j)}$$

The QAP is NP-Hard and manifests in many real-world problems; it remains a testbed for new fitness landscape analysis (Tayarani-N and Prügel-Bennett, 2016; Verel et al., 2018; Ochoa and Herrmann, 2018). This, combined with the availability of QAP LON construction algorithms, led us to select the QAP for this study.
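To make the objective concrete, the following is a minimal sketch in Python of evaluating a QAP solution; the three-facility instance is invented for illustration and is not QAPLIB data.

```python
import numpy as np

def qap_fitness(perm, dist, flow):
    """Objective g(pi) = sum_ij flow[i][j] * dist[perm[i]][perm[j]],
    where perm[i] is the location assigned to facility i."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(n) for j in range(n))

# Invented n = 3 instance: symmetric distances between locations,
# symmetric flows between facilities.
dist = np.array([[0, 2, 5],
                 [2, 0, 3],
                 [5, 3, 0]])
flow = np.array([[0, 4, 1],
                 [4, 0, 2],
                 [1, 2, 0]])
print(qap_fitness((0, 2, 1), dist, flow))  # fitness of one permutation
```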
3.2 Instances
In our analysis, we consider the widely used benchmark library, the Quadratic Assignment Problem Library (QAPLIB) (Burkard et al., 1997). The library contains 128 instances, which are provided with optimal or best-known fitness values. We use 124 of these 128, omitting the three largest due to computational cost (tho150, tai150b, and tai256c), and also esc16f, because all entries in its distance matrix are zero. The resulting set spans problem dimensions from 12 to 128. A set of sixty additional instances ($n = 11$) not in QAPLIB also plays a part in the results. Their size enables complete enumeration of the LONs, which is required for comparison with sampled LONs of the same instances. As there is an insufficient number of small instances in the benchmark QAPLIB, this additional set is needed to perform statistical analysis.
QAP instances usually conform to four categories (Stützle, 2006), differentiated by distributions in the distance and flow matrices. The four categories are: uniform random distances and flows, random flows on grids, real-world problems, and random real-world like problems. Below we describe our chosen instances in relation to these categories. Note that in QAPLIB, the problem dimension is given in the instance name.
Uniform random distances and flows. Distance and flow values for instances in this category are sampled uniformly at random. The spread of locations on the plane is random. From QAPLIB we use tai{12a, 15a, 17a, 20a, 25a, 30a, 35a, 40a, 50a, 60a, 64c, 80a, 100a} alongside rou{12, 15, 20}. Within this class are 30 of our size-11 generated instances.
Random flows on grids. Distance values are organised—locations lie in a defined square on a grid. Flow entries are random. From this category, the QAPLIB instances had{12, 14, 16, 18, 20}, nug{12, 14, 16a-b, 17, 18, 20, 21, 22, 24, 30}, scr{12, 15, 20}, sko{42, 56, 64, 72, 81, 90, 100a-f}, tho{30, 40}, and wil{50, 100} are considered.
Real-world. Instances of the QAP can be formulated from real-world problems. This is the case for some QAPLIB instances. In use in this study are the bur instances, which are built from stenotypist typing data; the kra instances, formulated to plan a hospital layout; the esc instances, for the testing of sequential circuits; the ste set, for fixing units on a grid; and els19, which deals with flows of hospital patients moving between departments. Our analysis includes the QAPLIB sets bur26{a-h}, els19, esc{16a-e, 16g-j, 32e, 32g, 128}, kra{30a-b, 32}, lipa{20a-b, 30a-b, 40a-b, 50a-b, 60a-b, 70a-b, 80a-b, 90a-b}, and ste36{a-c}.
Random Real-World Like. Although not strictly “real-world,” these simulate real-world conditions by placing locations in clusters on the plane. In this way, distance values are either small (intra-cluster) or large (inter-cluster). Flows are random. From QAPLIB we use tai{12b, 15b, 20b, 25b, 30b, 35b, 40b, 50b, 60b, 80b, 100b}. As above, within this class are 30 of our size-11 generated instances.
Miscellaneous. The QAPLIB chr set does not seem to fit neatly into a category. The only constraint is that the flows form a mathematical tree. The instances chr{12a-c, 15a-c, 18a-b, 20a-c, 22a-b, 25a} are used.
3.3 LON Algorithm 0: Exhaustive Enumeration
The method for exhaustively enumerating a local optima network was introduced alongside the model itself (Ochoa et al., 2008) and then adapted for the QAP (Daolio et al., 2011). LONs are enumerated using a best-improvement algorithm with the elementary operator for the QAP—a pairwise exchange of facilities. The local optimum for each solution is found by the best-improvement pivot rule, and in this way the nodes of the network are added. The escape edges are defined according to a distance function $D$ (the number of pairwise swaps between solutions) and a distance threshold $d$ for the LON. An edge is traced between $lo_i$ and $lo_j$ if a solution $s$ exists such that $D(s, lo_i) \leq d$ and $h(s) = lo_j$, where $h(s)$ denotes the local optimum obtained by hill-climbing from $s$. The weight of this edge is $w_{ij} = |\{s \in S \mid D(s, lo_i) \leq d \wedge h(s) = lo_j\}|$. This weight can be normalised by the number of solutions, $|\{s \in S \mid D(s, lo_i) \leq d\}|$, within reach at distance $d$. In the present study, we set $d = 2$. The best-improvement algorithm is described in Algorithm 1.
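As an illustration of the node-identification step, here is a minimal Python sketch of best-improvement hill-climbing under pairwise exchange, mapping every permutation to its local optimum (edge tracing via the distance threshold is omitted); `fitness` is any callable over permutations, and full enumeration is of course only feasible for small $n$.

```python
from itertools import combinations, permutations

def best_improvement(perm, fitness):
    """Repeatedly move to the best pairwise-exchange neighbour
    (best-improvement pivot) until no neighbour improves."""
    perm = list(perm)
    while True:
        best, best_fit = None, fitness(perm)
        for i, j in combinations(range(len(perm)), 2):
            neigh = perm[:]
            neigh[i], neigh[j] = neigh[j], neigh[i]
            f = fitness(neigh)
            if f < best_fit:  # minimisation
                best, best_fit = neigh, f
        if best is None:      # no improving neighbour: a local optimum
            return tuple(perm)
        perm = best

def enumerate_lon_nodes(n, fitness):
    """Map every solution to its local optimum; the set of distinct
    images is the LON node set, and basin sizes follow directly."""
    return {p: best_improvement(p, fitness) for p in permutations(range(n))}
```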
3.4 LON Algorithm 1: Markov-Chain LON Sampling
We label the first LON sampling candidate (Ochoa and Herrmann, 2018) Markov-Chain LON Sampling for the duration of this study. LON nodes and edges are logged during multiple runs of an iterated local search. Each run starts at a random solution and hill-climbs to a local optimum, before "kicking" or perturbing it with a large mutation. Hill-climbing is applied to the perturbed solution to obtain a local optimum, which becomes the input for the next perturbation and hill-climb cycle. The ILS used (Stützle, 2006) is competitive for the QAP. To build a LON, 200 runs of ILS track each local optimum and each movement between local optima. These traces are combined to form a single LON.
The algorithmic process is given in Algorithm 2. Several parameters affect the sample: the number of ILS runs (runs), the termination condition (iterations without improvement, t), the pivot rule of hill-climbing (best improvement or first improvement, hc-type), and the strength of the kick or perturbation (k). The sample can be regarded as a joined group of adaptive walks on the local optima level. The multiple starting points of the individual runs should—in principle—negate any issue of anisotropy (directional dependency) in the fitness landscape.
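A condensed Python sketch of this sampling loop is shown below; `hill_climb` is an assumed helper returning a hashable local optimum, and the acceptance and edge-logging details are simplified relative to Algorithm 2.

```python
import random
from collections import Counter

def markov_chain_lon(fitness, hill_climb, n, runs=200, t=1000, k=2):
    """Sketch of Markov-Chain LON sampling: multiple ILS runs, logging
    every local optimum (node) and every perturbation + hill-climb
    transition (weighted, directed edge). Parameter values are
    placeholders, not the paper's configurations."""
    nodes, edges = set(), Counter()
    for _ in range(runs):
        current = hill_climb(random.sample(range(n), n))
        nodes.add(current)
        stagnation = 0
        while stagnation < t:
            kicked = list(current)
            for _ in range(k):                 # the "kick": k random swaps
                i, j = random.sample(range(n), 2)
                kicked[i], kicked[j] = kicked[j], kicked[i]
            candidate = hill_climb(kicked)
            nodes.add(candidate)
            edges[(current, candidate)] += 1   # increment edge weight
            if fitness(candidate) < fitness(current):  # simplified acceptance
                current, stagnation = candidate, 0
            else:
                stagnation += 1
    return nodes, edges
```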
3.5 LON Algorithm 2: Snowball LON Sampling
The second LON algorithm candidate is called Snowball LON Sampling. Put concisely, it is a branching random walk at the local optima level. The process of snowballing has been used for decades in social science (Goodman, 1961) but was not considered for LON sampling until late 2018 (Verel et al., 2018). Snowball LON Sampling starts at a random solution and hill-climbs to a local optimum (LO), which is taken to be the first node in a random walk over LOs. From this node, a recursive expansion of LO neighbours begins. Specifically, perturbations followed by hill-climbs are applied from the node to find LO neighbours, all of which are added to the growing sample. The process is then repeated for the neighbours, to find their own neighbours. Thereafter, Snowball LON Sampling returns to the first node on the main walk path and moves to a random LO neighbour (again by perturbation and hill-climbing), which becomes node two of the walk. The expansion of node two then begins. This continues until the random walk is complete.
The pseudocode is shown in Algorithm 3. Parameters exist which can be used to tune the sample: the length of the walk (l), the number of neighbours included in a LO expansion (m), and the depth of the expansion (the number of steps away from the original path, d).
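The following Python sketch mirrors this description; `hill_climb`, `perturb`, and `random_solution` are assumed helpers returning hashable solutions, and details such as revisit handling are simplified relative to Algorithm 3.

```python
def snowball_lon(hill_climb, perturb, random_solution, l=100, m=30, d=2):
    """Sketch of Snowball LON sampling: a length-l random walk on the
    local optima level, with a depth-d, breadth-m recursive expansion
    ("snowball") of LO neighbours around every walk node."""
    nodes, edges = set(), set()

    def expand(lo, depth):
        if depth == 0:
            return
        for _ in range(m):                     # sample m LO neighbours
            neighbour = hill_climb(perturb(lo))
            nodes.add(neighbour)
            edges.add((lo, neighbour))
            expand(neighbour, depth - 1)

    current = hill_climb(random_solution())    # first node of the walk
    for _ in range(l):
        nodes.add(current)
        expand(current, d)
        nxt = hill_climb(perturb(current))     # next step of the walk
        nodes.add(nxt)
        edges.add((current, nxt))
        current = nxt
    return nodes, edges
```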
3.6 LON Features
LON features are chosen with care for our study—a critical component of our analysis utilises them as predictors in regression for performance prediction.
Standard features taken from Markov-Chain LON Sampling LONs are the number of LON nodes sampled (denoted as number optima); edges sampled (edges); mean sampled fitness (mean fitness); mean out-degree (mean outgoing edges from nodes, out-degree); and the diameter of the LON (longest path between nodes), diameter.
Funnel-based LON features are included. Our choices are the proportion of LON edges pointing at the global optimum funnel-floor (incoming global); and mean funnel-floor fitness (funnel-floor fitness).
All features mentioned thus far are useful for Markov-Chain LON Sampling LONs but not for Snowball LONs. Snowball samples do not fit well with funnel metrics, as heuristic trajectories are uniformly restricted by the nature of the sampling. In fact, the sampling would induce a consistent and increasingly large number of apparent "funnels" (according to our definition in Section 2.3). The short branching paths during node expansion would also lead to numerous nodes with no outgoing edges, which are technically defined as funnel-floors. In reality, improving transformations from these nodes are likely possible. Similarly, standard features of the samples such as the quantities of nodes, edges, and out-degrees in Snowball Sampling LONs are redundant as predictors; they are artefacts of the sampling parameters (length of the random walk, number of neighbours per expansion, and depth of snowball expansion).
Figure: Three LONs extracted from the same $n = 11$ QAP instance (from the class "random real-like"), each produced with a different algorithm as indicated in the subcaptions. Node size is proportional to fitness (larger is fitter). The global optimum is shown in red; all other nodes are grey.
For Snowball LON Sampling, features are based on the attributes encoded in the nodes and edges: the fitness distribution and the edge weight distribution. These are more appropriate to what the sample has to offer. Metrics based on the density and connection pattern of edges do not give meaningful information, because these are an artefact of the chosen sampling parameters rather than an inherent structure. In particular, funnel metrics calculated on Snowball LONs identified almost all nodes as funnel-floors. Instead, the chosen metric set includes the mean weight of self-loops (weight loops); the mean weight disparity of outgoing edges (weight disparity); and the fitness-fitness correlation between neighbours (fitness correlation). Statistics collected during snowballing are included too, namely: the mean length of a hill-climb to a local optimum (mean HC length); the maximum length of a hill-climb to a local optimum (maximum HC length); and the maximum number of paths to a local optimum (maximum HC paths).
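As an illustration, here is a sketch of computing three of these features from a LON stored as a `networkx` digraph; the node attribute `fitness` and edge attribute `weight` are assumptions of this sketch, and the paper's exact formulas may differ in detail.

```python
import networkx as nx
import numpy as np

def snowball_features(G):
    """Example feature extraction from a sampled LON (nx.DiGraph)."""
    # weight loops: mean weight of self-loop edges, i.e. failed escapes
    # that return to the same local optimum.
    loops = [G[u][v]["weight"] for u, v in nx.selfloop_edges(G)]
    weight_loops = float(np.mean(loops)) if loops else 0.0

    # weight disparity: per node, the sum of squared normalised out-weights
    # (1 means one dominant exit; 1/k means k equally likely exits).
    disparities = []
    for u in G.nodes:
        w = np.array([G[u][v]["weight"] for v in G.successors(u)], float)
        if w.sum() > 0:
            disparities.append(np.sum((w / w.sum()) ** 2))
    weight_disparity = float(np.mean(disparities)) if disparities else 0.0

    # fitness correlation: Pearson correlation between the fitnesses of
    # connected local optima, taken over all edges.
    src = [G.nodes[u]["fitness"] for u, v in G.edges]
    dst = [G.nodes[v]["fitness"] for u, v in G.edges]
    fitness_correlation = float(np.corrcoef(src, dst)[0, 1])
    return weight_loops, weight_disparity, fitness_correlation
```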
4 Experimental Setup
In this section, we detail the setup of the LON construction algorithms, the predictive models used, and the QAP search algorithms.
4.1 Markov-Chain LON Sampling: Details
For the main QAPLIB study, seven parameter configurations are given to the Markov-Chain LON Sampling algorithm, producing seven Markov-Chain LONs for each QAPLIB instance. Parameter choices are amongst those provided and suggested by the author of the base ILS heuristic (Stützle, 2006), each taking the form {perturbation strength, local search pivot rule}: five configurations use first-improvement and two use best-improvement hill-climbing, paired with perturbation strengths from that work. For all, the 200 runs are each started from a random solution, so that diverse regions of the local optima space are sampled. Each run terminates when there has been no improvement for 1000 iterations. During preliminary runs, this condition was found to be sufficiently large that local optima are not prematurely considered to be inescapable funnel-floors (recall Section 2.3), but low enough that the computational cost remains reasonable. Concerning the sensitivity of the results to the parameter choices: if the number of runs were lower, the relation from the constructed LONs to optimiser performance would be proportionally weaker. A lower value for the termination condition would produce LONs riddled with pseudo-funnel-floors, presenting an image of the local optima space which is more complex than the empirical reality. The full process is given in Section 3 and Algorithm 2.
Seven sampling configurations for 124 problem instances give us 868 LONs for QAPLIB instances as produced by Markov-Chain LON Sampling. In addition, Markov-Chain LON Sampling LONs for the 60 synthetic size-11 instances and for the fixed-evaluations QAPLIB LONs were constructed using a single first-improvement configuration.
4.2 Snowball LON Sampling: Details
Parameter configurations for Snowball LON Sampling take the form {l, m, d} and are as follows: {100, 20, 2}; {100, 30, 2}; {100, 50, 2}; {100, 60, 2}; {200, 30, 2}; {400, 30, 2}; and {400, 50, 2}. The choices are based on those suggested by the algorithm's authors (Verel et al., 2018). To reiterate, we start once from a random solution and hill-climb to a local optimum, which is the first node in the random walk. The algorithm terminates when the walk and its node expansions are complete, as in Section 3 and Algorithm 3. As with Markov-Chain LON Sampling, there are 868 LONs (seven parameter sets × 124 instances) for QAPLIB produced by this algorithm. Snowball LON Sampling LONs for the synthetic size-11 instances and for the fixed-evaluations QAPLIB LONs were constructed using {100, 60, 2}.
4.3 Predictive Models: Details
Linear and random forest regression are used for algorithm performance prediction. Linear regression models describe the linear effect of predictors on the response variable; random forest is known for capturing nonlinear interactions between variables. For each, repeated random subsampling (Monte Carlo) cross-validation is conducted for 100 iterations, each time shuffling the observations randomly, with an 80-20 training-test split. The predictors are standardised to mean zero and standard deviation one: $x' = (x - \bar{x}) / \sigma$. The random forest regressions use 500 decision trees; the number of features sampled as candidates at each split is $p/3$, where $p$ is the number of features.
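The protocol can be sketched as follows; scikit-learn is assumed purely for illustration, as the paper does not specify the implementation used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def evaluate_rf(X, y, iterations=100, seed=0):
    """100 random 80-20 splits, standardised predictors, and a 500-tree
    random forest with max_features = p/3, reporting mean R^2 and MSE."""
    rng = np.random.RandomState(seed)
    p = X.shape[1]
    r2s, mses = [], []
    for _ in range(iterations):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=rng)
        scaler = StandardScaler().fit(X_tr)   # mean zero, sd one
        model = RandomForestRegressor(
            n_estimators=500, max_features=max(1, p // 3),
            random_state=rng).fit(scaler.transform(X_tr), y_tr)
        pred = model.predict(scaler.transform(X_te))
        r2s.append(r2_score(y_te, pred))
        mses.append(mean_squared_error(y_te, pred))
    return np.mean(r2s), np.mean(mses)
```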
For the bulk of our models (results Sections 5.2, 5.3, and 5.4), the $R^2$ is a reported statistic. This quantifies the amount of variance in the response variable—which is the obtained fitness (after 1000 iterations) as a proportion of the desired fitness—that is explainable using the set of predictors. Also in use is the mean-squared error (MSE), a notion of how accurate the predictions from the model are. For the random forest regressions, the predictor importance rankings are reported. A portion of our study compares LON enumerations with LON samples, and for this we have a different modelling environment. Because the LON must be fully enumerable, the instances are bounded at $n = 11$ and are synthetic. There are 60 observations (feature vectors for 60 LONs), arising from two QAP classes (random "real-like" and uniformly random, respectively). Accounting for effects caused by QAP class is an objective for this set of experiments (the results are in Section 5.1), in particular because of the limited number of observations. These "random effects" can be controlled for in a hierarchical linear mixed model, also known as a mixed-effects model.
The summary statistics used for this subset of regression models (Section 5.1) are the conditional $R^2$, the marginal $R^2$, and the Root Mean Squared Error (RMSE), the latter given as a proportion of the response variable's range. The conditional $R^2$ is the variance explained by the complete model. The marginal $R^2$ is the proportion of variance explained by the fixed effects alone (Nakagawa and Schielzeth, 2013), negating or controlling for the random effects from QAP instance class; the fixed effects are the LON features. RMSE captures the standard deviation of the model's prediction errors (the residuals) and is therefore useful for estimating the variability of predictions. RMSE is in the same units as the response variable; for ease of interpretation, it is reported as a proportion of the total range of the response variable (search performance).
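Under the Nakagawa and Schielzeth (2013) decomposition, these quantities can be estimated as in the sketch below (Python with `statsmodels`); the column names `performance`, `mean_fitness`, `edges`, `number_optima`, and `qap_class` are illustrative placeholders rather than the paper's exact feature set.

```python
import numpy as np
import statsmodels.formula.api as smf

def mixed_model_r2(df):
    """Fit a random-intercept model (QAP class as the grouping factor)
    and return marginal and conditional R^2."""
    fit = smf.mixedlm("performance ~ mean_fitness + edges + number_optima",
                      df, groups=df["qap_class"]).fit()
    var_fixed = np.var(fit.predict(df))          # fixed effects only
    var_random = float(fit.cov_re.iloc[0, 0])    # random-intercept variance
    var_resid = fit.scale                        # residual variance
    total = var_fixed + var_random + var_resid
    marginal = var_fixed / total                    # fixed effects alone
    conditional = (var_fixed + var_random) / total  # complete model
    return marginal, conditional
```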
4.4 QAP Search Algorithms
For our regression response variables, search algorithm performance on the QAP instances is used. The search algorithms are two competitive heuristics for the QAP: an improved Iterated Local Search (ILS; Stützle, 2006) and Robust Taboo Search (TS; Taillard, 1991). These are run 100 times on every QAP instance, and after 1000 iterations the obtained fitness is taken as a proportion of the desired fitness. This is our measure of how difficult the instance was for ILS or TS, and it is the response variable in the models of results Sections 5.2, 5.3, and 5.4. This metric does not discriminate well on the small synthetic instances, however: they are easy for ILS and TS, meaning the obtained fitness is frequently the desired fitness. For those experiments (models reported in results Section 5.1), the number of iterations needed to reach the global optimum is instead used as the response variable.
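For clarity, the main response variable can be written as a one-line computation; averaging over the 100 runs is an assumption of this sketch.

```python
def search_performance(obtained_fitnesses, desired_fitness):
    """Obtained fitness after 1000 iterations as a proportion of the
    desired fitness, averaged over runs. The QAP is a minimisation
    problem, so values are >= 1.0; 1.0 means the target was reached."""
    return sum(obtained_fitnesses) / (len(obtained_fitnesses) * desired_fitness)
```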
ILS comes with a wealth of viable parameter configurations. The choices for this study are first-improvement hill-climbing, in combination with a perturbation strength (a number of pairwise exchanges of facilities) known to perform well (Stützle, 2006) and chosen to generate a sufficiently different solution from the previous local optimum.
5 Results
This section reports the experimental results, mostly in the form of regression model summaries. The primary aim is analysing the relationship between the LONs and search algorithm performance.
5.1 Approximating the LON Enumeration
5.1.1 Features
Figure: Scatterplots for features of fully enumerated LONs ($x$-axis) versus features of sampled Markov-Chain LONs ($y$-axis).

Figure: Scatterplots for features of fully enumerated LONs ($x$-axis) versus features of sampled Snowball LONs ($y$-axis).
5.1.2 Predictive Models
Consider now Table 1 for performance prediction models using features from different types of LON sampling. In terms of explanation of variance in the responses (see the Marginal $R^2$ column), the models are somewhat weak after controlling for the random effects (allowing for differences between problem types). This can be seen by comparing the smaller Marginal $R^2$ values with the Conditional ones. The noise in the response variable is likely due to the small number of observations (60), and the smaller number still belonging to a particular class (30 each). The purpose of the models, though, is comparison between LON construction algorithms. The models do have low relative RMSE—typically one-fifth of the range of the response variable. Taking the Marginal $R^2$ as an indication of quality, the two strongest models use sampled (Markov-Chain and Snowball) LONs to predict the ILS response on the problem. Notice that the Marginal and Conditional $R^2$ are equal for the former, meaning there were no detectable random effects coming from differences in problem class. The result that sampled LONs explain more variance than enumerated ones is notable—intuition would suggest that the enumerated LON (rows one and two in Table 1) should give superior information about the problem. Markov-Chain LON Sampling and Snowball LON Sampling are, however, based on operator combinations found in search algorithms used in practice on the QAP. Enumeration of LONs simply runs a local search from every possible solution to assign a local optimum; this may not mirror actual algorithm-problem interactions. This result is interesting given that most LON work has analysed enumerated LONs (Ochoa et al., 2008; Daolio et al., 2011; Herrmann et al., 2016).
Table 1: Mixed-effects model summaries for performance prediction using enumerated and sampled LON features (Section 5.1).

| LON Method | Response Variable | Marginal $R^2$ | Conditional $R^2$ | RMSE |
|---|---|---|---|---|
| Exhaustive | ILS | 0.159 | 0.832 | 0.21 |
| Exhaustive | TS | 0.148 | 0.719 | 0.18 |
| Markov-Chain | ILS | 0.406 | 0.406 | 0.20 |
| Markov-Chain | TS | 0.220 | 0.816 | 0.20 |
| Snowball | ILS | 0.370 | 0.374 | 0.20 |
| Snowball | TS | 0.147 | 0.745 | 0.23 |
5.2 LON Sampling: Equal Computational Threshold
For this section, LONs were produced using a comparable computational budget. Each LON algorithm was allowed 50,000 fitness function evaluations. Now we investigate which provides more predictive information about search difficulties under these restricted conditions.
Table 2 gives model summaries, including the LON algorithm used to produce the samples, the type of regression, the chosen response variable (ILS or TS, meaning the normalised fitness each obtained on the problem), the $R^2$, and the mean-squared error (MSE). Tables 3 and 4 then record the random forest predictor rankings for Markov-Chain LON Sampling and Snowball LON Sampling, respectively. Looking over the ILS rows in Table 2, it can be seen that the MSE rates are comparable for the two LON construction algorithms; however, the $R^2$ values are higher for both types of regression for Markov-Chain over Snowball. That being said, the Snowball LON features predicting TS using random forest regression give the strongest model shown, with around 52% of variance explained. This is especially interesting considering that the $R^2$ values here are typically smaller for random forest than for linear regression—this model is the only exception. Snowball LON Sampling is superior to its competitor when taking tabu search (TS) as the response variable. This can be seen by comparing rows 2 and 4 with rows 6 and 8: the $R^2$ values are higher and the error rates are smaller (by an order of magnitude) in the Snowball models compared to those of Markov-Chain.
Table 2: Performance prediction model summaries for LONs built under an equal computational budget.

| LON Method | Regression | Response Variable | $R^2$ | MSE |
|---|---|---|---|---|
| Markov-Chain | Linear | ILS | 0.471 | 0.002 |
| Markov-Chain | Linear | TS | 0.336 | 0.132 |
| Markov-Chain | Random Forest | ILS | 0.245 | 0.002 |
| Markov-Chain | Random Forest | TS | 0.154 | 0.144 |
| Snowball | Linear | ILS | 0.303 | 0.002 |
| Snowball | Linear | TS | 0.418 | 0.024 |
| Snowball | Random Forest | ILS | 0.230 | 0.002 |
| Snowball | Random Forest | TS | 0.521 | 0.013 |
Table 3: Random forest predictor importance for Markov-Chain LON Sampling (equal-budget LONs).

| Predicting ILS | Importance Value | Predicting TS | Importance Value |
|---|---|---|---|
| incoming global | 0.058 | mean fitness | 2.130 |
| out-degree | 0.023 | funnel-floor fitness | 2.127 |
| number optima | 0.023 | edges | 1.742 |
| mean fitness | 0.022 | number optima | 1.736 |
| edges | 0.021 | out-degree | 0.349 |
| funnel-floor fitness | 0.019 | incoming global | 0.304 |
Table 4: Random forest predictor importance for Snowball LON Sampling (equal-budget LONs).

| Predicting ILS | Importance Value | Predicting TS | Importance Value |
|---|---|---|---|
| maximum HC length | 0.043 | maximum HC length | 0.656 |
| weight disparity | 0.028 | mean fitness | 0.320 |
| mean fitness | 0.025 | mean HC length | 0.310 |
| maximum HC paths | 0.017 | fitness correlation | 0.273 |
| fitness correlation | 0.015 | weight disparity | 0.193 |
| weight loops | 0.014 | weight loops | 0.156 |
| mean HC length | 0.014 | maximum HC paths | 0.141 |
5.3 LONs to Predict QAPLIB Problem Difficulty
In this section, the set of 868 LON samples per LON algorithm (1,736 in total) for QAPLIB is used. Recall that these are associated with 124 QAPLIB instances, each having seven LONs produced per algorithm (differentiated by sampling parameter configuration). Table 5 shows the algorithmic performance prediction model summaries, indicating the strength of the models in terms of $R^2$ and MSE. Tables 6 and 7 record predictor rankings for the random forest models for Markov-Chain LON samples and Snowball LON samples, respectively. Rows 1 and 5 of Table 5 (linear regression to predict the ILS response) show that neither Markov-Chain nor Snowball LON features build a good model; the $R^2$ values are simply too weak. However, when we look at the equivalent random forest models (rows 3 and 7), we have strong models with around 64% (Markov-Chain) and 80% (Snowball) of variance explained, and very small relative MSE values. This could reflect the capability of regression trees to capture nonlinearity and complex interactions between predictors. The same trend is seen in the tabu search prediction models in Table 5. The linear models (rows 2 and 6) are weak, with small $R^2$ and comparably larger MSE. The random forest results (rows 4 and 8) are strong, with 90% or more of variance explained by features of LONs produced by both LON construction algorithms.
Table 5: Performance prediction model summaries for the main QAPLIB LON samples.

| LON Method | Regression | Response Variable | $R^2$ | MSE |
|---|---|---|---|---|
| Markov-Chain | Linear | ILS | 0.043 | 0.002 |
| Markov-Chain | Linear | TS | 0.180 | 0.081 |
| Markov-Chain | Random Forest | ILS | 0.645 | 0.000 |
| Markov-Chain | Random Forest | TS | 0.925 | 0.008 |
| Snowball | Linear | ILS | 0.057 | 0.003 |
| Snowball | Linear | TS | 0.252 | 0.029 |
| Snowball | Random Forest | ILS | 0.804 | 0.000 |
| Snowball | Random Forest | TS | 0.922 | 0.003 |
Table 6: Random forest predictor rankings (best to worst) for Markov-Chain LON Sampling on QAPLIB.

| Predicting ILS | Predicting TS |
|---|---|
| mean fitness | mean fitness |
| funnel-floor fitness | funnel-floor fitness |
| number optima | edges |
| incoming global | diameter |
| edges | number optima |
| diameter | out-degree |
| out-degree | incoming global |
Table 7: Random forest predictor rankings (best to worst) for Snowball LON Sampling on QAPLIB.

| Predicting ILS | Predicting TS |
|---|---|
| mean fitness | mean fitness |
| maximum HC paths | fitness correlation |
| fitness correlation | maximum HC paths |
| mean HC length | weight disparity |
| weight disparity | maximum HC length |
| maximum HC length | mean HC length |
| weight loops | weight loops |
Taking Markov-Chain and Snowball as rivals for predicting the ILS response, Snowball is slightly superior; for TS, they are roughly indistinguishable in predictive power—Markov-Chain has slightly higher $R^2$ but also higher error.
Now let us consider the contributions of the Markov-Chain Sampling predictors in our models from Table 6. These variable importances are calculated as follows: for each tree, prediction accuracy is computed on the subsample of observations not used in building that tree (the out-of-bag sample). To get the importance of a variable (predictor), a random shuffle is applied to that variable's values in the subsample, with everything else kept constant, and prediction accuracy is measured again on the permuted data. The resulting accuracy decrease, averaged over the trees, is the variable importance for the predictor.
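For illustration, scikit-learn provides a closely related computation; it permutes features on held-out data rather than on each tree's out-of-bag sample, so the sketch below approximates rather than reproduces the procedure above. `X_train`, `y_train`, `X_test`, `y_test`, and `feature_names` are assumed to come from the split described in Section 4.3.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and record the mean
# drop in prediction accuracy; larger drops mean more important features.
result = permutation_importance(rf, X_test, y_test,
                                n_repeats=10, random_state=0)
for name, imp in sorted(zip(feature_names, result.importances_mean),
                        key=lambda pair: -pair[1]):
    print(f"{name}: {imp:.3f}")
```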
Each column of Table 6 corresponds to a random forest model, with predictors ranked from best (top) to worst (bottom). Immediately apparent is the importance of the sampled fitness distribution: mean fitness is the top predictor for all four models. Funnel-floor fitness is the second-best predictor in both Markov-Chain models, again underlining the role of sampled fitness levels. Out-degree is not an important Markov-Chain LON feature for algorithm performance prediction, ranking lowest and second-lowest in the models.
Table 7 reports the rankings for the Snowball LON Sampling random forest models. Just as in Table 6, mean fitness is top-ranked in each. Also important for predicting TS is fitness correlation, appearing second. In the middle are the local search metrics collected during the snowballing process, together with the weight disparity. Ranking lowest is weight loops, which pertains to unsuccessful escape attempts from local optima. These findings are unusual: the bulk of the LON literature focuses heavily on edge connection patterns and densities in the LON (Herrmann et al., 2016; Thomson et al., 2017; McMenemy et al., 2018), rather than simply the fitness distribution the sample reflects.
5.4 Best of Both Sampling Methods?
Markov-Chain LON Sampling and Snowball LON Sampling produce LONs which capture different fitness landscape information. Features to do with edge and node density, as well as those relating to path lengths, are redundant as predictors when calculated on a Snowball LON: they are prescribed by the sampling algorithm parameters. We now put forward the speculation that the union of features from different LON types, for the same problem instances, might predict more accurately than using one LON type alone. The three best-ranked predictors (according to Section 5.3) for each response variable (ILS or TS) and each LON method (Markov-Chain or Snowball) are bundled together into single models. The results are in Table 8; rankings are in Table 9. Table 8 further illustrates that linear regression may not suit LON performance prediction models, with rows 1 and 2 showing weak $R^2$ values, suggesting these models miss nonlinear interactions between variables. This is in contrast to the random forest counterparts, rows 3 and 4, which are strikingly stronger than the linear models: they account for 97% (ILS) and 99% (TS) of search variance. The error rates (see the MSE column) are low, and the $R^2$ values surpass those seen when using features from one type of LON only.
Table 8: Model summaries for combined (Markov-Chain plus Snowball) LON features.

| Regression | Response Variable | $R^2$ | MSE |
|---|---|---|---|
| Linear | ILS | 0.076 | 0.003 |
| Linear | TS | 0.254 | 0.026 |
| Random Forest | ILS | 0.972 | 0.000 |
| Random Forest | TS | 0.991 | 0.000 |
Table 9: Random forest predictor rankings (best to worst) for the combined-feature models.

| Predicting ILS | Predicting TS |
|---|---|
| mean fitness (Snowball LONs) | mean fitness (Snowball LONs) |
| funnel-floor fitness | funnel-floor fitness |
| mean fitness (Markov LONs) | mean fitness (Markov LONs) |
| number optima | maximum HC paths |
| maximum HC length | fitness correlation |
| mean HC length | edges |
In Table 9, we show the six features ranked by importance in the random forest models. As in Section 5.3, the prominence of the fitness features is obvious. Both models share the same top three features: the mean sampled Snowball LON fitness (mean fitness (Snowball LONs)) is top-ranked; in second place is funnel-floor fitness, calculated on Markov-Chain LONs only; and in third place is the Markov-Chain LON mean fitness. As in Section 5.3, the role of the LON connectivity variable (edges) is weak, ranking last in its model.
6 Discussion
Markov-Chain LON Sampling had not previously been validated against "ground truth" enumerated LONs. Section 5.1 showed that for small ($n = 11$) QAP instances, both Markov-Chain LON Sampling and Snowball LON Sampling produced accurate approximations of the number of nodes (local optima) and edges (transformations between optima) when compared to the "ground-truth" fully enumerated LON. Markov-Chain LON Sampling did not find all of them, but the quantities correlated with the enumerated LON values; Snowball LON Sampling found almost the same numbers of nodes and edges as the enumerated LON.
Although this is encouraging, caution is needed when tempted by extrapolations to larger problems. Both LON construction algorithms are designed for large problems and use a correspondingly large amount of computation. There may be a particular consideration when scaling Snowball LON Sampling to larger problems: the process uses short, restricted paths through local optima and is not based on optimisation. It is therefore argued that Snowball LON Sampling may be highly dependent on the random starting solution, since the rest of the sample is built around it. In a large fitness landscape, the obtained sample may consist of rather low-quality local optima, far from the promising regions near the global optimum. In future work, the intention is to examine the relationship between LON sampling parameter choices and problem dimension for both algorithms.
Another engaging observation from the sampling validation (Section 5.1) was that, generally, sampled LON features explained more search variance than those of the enumerated LONs. This is important because the vast majority of LON research (Ochoa et al., 2008; Daolio et al., 2011; Verel et al., 2011; Herrmann et al., 2016) dissected fully enumerated LONs, even using their features for algorithm performance prediction (Daolio et al., 2012; Herrmann et al., 2018; Thomson et al., 2018). Our results suggest that LON construction algorithms may approximate or infer yet-unseen fitness landscapes better than best-improvement exhaustive enumeration of the local optima level. This does, intuitively, conform to logic: LON sampling algorithms use search operators—and sequences of operators—which overlap with the operators of the search algorithms used to optimise QAP instances.
In Section 5.2, we saw that capping the LON construction algorithms at 50,000 fitness function evaluations produces LONs whose features can explain up to around 50% of search variance. Snowball LON Sampling seemed to infer the tabu search response better than its competitor; Markov-Chain Sampling LONs did slightly better on the ILS response.
The main QAPLIB LONs demonstrated solid predictive power for search algorithm performance in Section 5.3. Both Markov-Chain LON Sampling and Snowball LON Sampling seem to produce samples that infer prospective fitness landscapes, with much of the variance in search being explained by the features.
Sections 5.3 and 5.4 showed that random forests yield more promising regression models than linear regression. This suggests the use of random forests in future algorithm performance prediction models using fitness landscape features, in pursuit of capturing complex variable interactions.
Another aspect of the results was the apparent importance of the fitness levels sampled by the LON construction algorithms, seen in Sections 5.3 and 5.4. Metrics describing the fitness distribution were repeatedly present in the top two predictors of the random forest models.
The strongest models we saw combined the best features from Markov-Chain LONs with the best features from Snowball LONs in Section 5.4, accounting for 97% (ILS) and 99% (TS) of search variance. The suggestion is therefore to combine features from both LON construction algorithms as a "super-sample" in predictive models. In the future, the intention is to implement a combined LON algorithm which draws on both the exploitation of the Snowball algorithm and the exploration of the Markov-Chain algorithm.
There are certainly limitations to the approach and the results presented. The results are valid but currently hold only for our particular search operator choices, configurations of LON construction algorithms, choices of QAP heuristic search algorithms, choice of metric for quantifying search success, chosen neighbourhood function, and the features taken from the LONs. There is also the issue of nondeterminism in the sampling algorithms.
7 Conclusions
Local Optima Network (LON) construction algorithms have been scrutinised to gauge their ability to infer future fitness landscapes. The two most recent LON sampling algorithms were studied, here named Markov-Chain LON Sampling and Snowball LON Sampling. The aim was to compare them and assess the quality of the resultant LON samples for predicting search performance on the underlying problems. We also used an algorithm for exhaustively enumerating LONs. All were applied to the Quadratic Assignment Problem, but the frameworks could easily be adapted to a different problem domain, in the manner of a metaheuristic framework. The QAP instance set included both benchmark and generated problems.
The study began with validation of LON samples by comparison with "ground truth" (the enumerated LON for the same instance). Then Markov-Chain LON Sampling and Snowball LON Sampling were each given 50,000 fitness evaluations to produce LONs of QAPLIB instances. After that, Markov-Chain LON Sampling and Snowball LON Sampling were given their default computational budgets (according to the algorithm specifications of the LON algorithm authors). A large set of QAPLIB instances, up to dimension 128, had fourteen LONs mapped in this way. Finally, the three most important features from each type of LON sample were taken and joined into one model.
The results suggested that for optimisation-performance predictions made with LON features, random forests better capture the nonlinear effects between variables in the fitness landscape. Repeatedly, the apparently key role of sampled fitness levels in explaining search algorithm variance was seen. Much of the LON literature focuses on edge connectivity patterns and not the fitness distribution amongst local optima—is this misguided? A surprising result was that the enumerated LON, when compared to the sampled LONs, had less predictive power. An intuitive thought is that better prediction comes from more extensive fitness landscape information; this was not the case here. An argument is made that this stems from LON sampling algorithms sharing similar operators, or indeed sequences of operators, with heuristics used in QAP research.
This study ends with a note that there is an abundance of work waiting to be done in this area, and we are excited to explore it. In particular, we acknowledge that our results are of course dependent on our algorithm configurations, neighbourhood function choice, choices of search algorithms, choice of problem domain, and so on. The pursuit of different choices is the next endeavour.