## Abstract

One of the major challenges in the field of evolutionary algorithms (EAs) is to characterise which kinds of problems are easy and which are not. Researchers have therefore sought to predict the behaviour of EAs in different domains. We introduce fitness landscape networks (FLNs) that are formed using operators satisfying specific conditions and define a new predictive measure for comparison-based EAs that we call motif difficulty (MD). Because it is impractical to exhaustively search the whole network, we propose a sampling technique for calculating an approximate MD measure. Extensive experiments on binary search spaces are conducted to show both the advantages and limitations of MD. Multidimensional knapsack problems (MKPs) are also used to validate the performance of approximate MD on FLNs with different topologies. The effect of two representations, namely binary and permutation, on the difficulty of MKPs is analysed.

## 1. Introduction

Intrinsically, evolutionary algorithms (EAs; Fogel, 1999; Bäck et al., 1997; Goldberg, 1989) are stochastic algorithms. Thus, one of the major challenges in this field is to characterise which kinds of problems are easy for a given algorithm to solve and which are not. In this direction, researchers have sought to predict the behaviour of EAs in different domains and have proposed predictive measures to quantify problem difficulty. The primary methodology used in the available predictive measures is fitness landscape (FL) analysis. The concept of an FL was introduced in theoretical genetics (Wright, 1932) as a way to visualise evolutionary dynamics. FLs were connected to EAs via a neighbourhood structure based on the operators used in EAs, which highlights the association between search spaces and fitness spaces. With this neighbourhood structure in mind and at some level of granularity, each FL can form a network. In such a network, each node corresponds to a point in the search space, and each edge connects one node to one of its neighbours. The fitness value can be viewed as the weight of each node. Therefore, different problems correspond to different networks, and the process of an EA solving a problem is equivalent to navigating through the corresponding network. From this viewpoint, problem difficulty can be predicted by analysing the features of FL networks (FLNs). Although it is well known that FLs are related to networks, no predictive measure has been proposed based on the features of FLNs.

A number of global features of complex networks, such as the small-world, clustering, and scale-free properties, have been studied thoroughly. In addition to these global features, network motifs (Milo et al., 2002) were proposed to uncover structural design principles and have resulted in a deeper understanding of complex networks. Motifs are connected subgraphs that occur in complex networks in numbers significantly higher than those in random networks. It has been shown that network motifs exist widely in various complex networks, and that different complex networks have different types of motifs. Thus, network motifs are an intrinsic property of complex networks and can be used to differentiate networks. For example, Ghoneim et al. (2008) used network motifs to discriminate two-player strategy games. For EAs, FLNs corresponding to various problems are expected to differ in some intrinsic properties. Therefore, we propose a predictive difficulty measure for EAs, namely motif difficulty (MD), by extracting motif properties from directed FLNs. We define MD by synthesising the effect of different classes of distance motifs on the searching process. Our experimental results show that MD can quantify the difficulty of different problems on a scale from −1.0 (easiest) to 1.0 (most difficult), and that it performs especially well on some counterexamples for other measures.

This paper is organised as follows. Section 2 discusses related work. Preliminaries on fitness landscapes and network motifs are given in Section 3. Section 4 presents the definition of motifs in FLNs, and Section 5 presents a quantification of problem difficulty based on the motifs defined in Section 4. Experiments on exact and approximate MD are given in Sections 6 and 7, respectively. Finally, Section 8 summarises the work in this paper and discusses both the advantages and disadvantages of MD.

## 2. Related Work

In general, the study of factors affecting the performance of EAs can be divided into two classes (Borenstein and Poli, 2005a). The first class focuses on the properties of a particular algorithm, while the second class focuses on the problem itself, and particularly on FLs. In the first class, the building block (BB) hypothesis (Goldberg, 1989), which is well known in the genetic algorithm (GA) community, states that a GA tries to combine low-order and highly fit schemata. Following the BB hypothesis, the notion of deception has been defined (Goldberg, 1989; Forrest and Mitchell, 1993). Epistasis variance (Davidor, 1991) and epistasis correlation (Naudts, 1998) try to assess the GA-hardness of problems from the perspective of theoretical genetics.

In the second class, methods focus on using statistical properties of FLs to characterise problem difficulty. The first studies in this class proposed isolation (needle in a haystack; Forrest and Mitchell, 1993) and multimodality (Davidor, 1991). Another popular method is fitness distance correlation (FDC; Jones and Forrest, 1995), which measures the hardness of a landscape according to the correlation between the distance from the optimum and the fitness value of the solution. However, none of the available measures has been fully successful. Isolation might be sufficient, but it is not a necessary condition for a landscape to be difficult to search. Multimodality is neither necessary nor sufficient for a landscape to be difficult to search (Kallel et al., 2001). FDC has achieved some success, but is still not able to predict performance in some scenarios (Naudts and Kallel, 2000; Jansen, 2001).
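As an illustration of FDC, the following minimal sketch computes the exact correlation over a small binary search space. It assumes Hamming distance as the distance measure, a single known optimum, and a maximisation problem; the helper names `hamming` and `fdc` are hypothetical and not taken from the cited work.

```python
from itertools import product
from math import sqrt


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def fdc(fitness, optimum, n):
    """Pearson correlation between fitness and distance to the optimum,
    computed exhaustively over all 2^n binary strings."""
    points = list(product((0, 1), repeat=n))
    f = [fitness(s) for s in points]
    d = [hamming(s, optimum) for s in points]
    mean_f, mean_d = sum(f) / len(f), sum(d) / len(d)
    cov = sum((x - mean_f) * (y - mean_d) for x, y in zip(f, d))
    std_f = sqrt(sum((x - mean_f) ** 2 for x in f))
    std_d = sqrt(sum((y - mean_d) ** 2 for y in d))
    # Undefined (division by zero) when the fitness is constant.
    return cov / (std_f * std_d)


n = 8
# ONEMAX: the fitness is the number of 1s, the optimum is the all-1s string.
onemax_fdc = fdc(sum, (1,) * n, n)
```

For ONEMAX the distance to the optimum is exactly *n* minus the fitness, so the correlation is −1 up to floating-point error, which is the ideal value for an easy maximisation problem under this convention.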

In addition to FLs, Borenstein and Poli (2005a) pointed out that a limitation of the original FL approach is that it does not provide a way to quantify the amount of information available in a landscape nor to assess its quality. Thus, they proposed information landscapes (ILs) based on tournament selection in GAs. Using ILs, they proposed a method to predict GA hardness and a theoretical model to study search algorithms (Borenstein and Poli, 2005b, 2005c). Based on the observation that FLs can actually form networks, Ochoa et al. (2008) proposed a network characterisation of combinatorial FLs, using the well-known family of NK landscapes as an example, and exhaustively extracted local optima networks on NK landscape instances. This work was the first attempt at using network analysis techniques in connection with the study of FLs and problem difficulty. However, they did not propose predictive measures.

He et al. (2007) gave a rigorous definition of difficulty measures in black box optimisation, and proved that in general predictive difficulty measures that run in polynomial time do not exist unless certain complexity-theoretical assumptions are wrong. However, there are still some successful applications of using predictive difficulty measures to guide the design of new algorithms. For example, Merz and Freisleben (2000) and Tavares et al. (2008) conducted an FL analysis for quadratic assignment problems and multidimensional knapsack problems (MKPs), respectively; Yang et al. (2006) presented an attempt at characterising the search space difficulties in red teaming by using fitness landscapes.

## 3. Preliminaries

### 3.1. Fitness Landscapes

Here, *P*(*e*) denotes the occurrence probability of event *e*, and the neighbourhood of a configuration under an operator is the set of configurations that can be obtained by applying the operator to that configuration, namely its set of neighbours. Since EAs are determined by various operators, the above neighbourhood structure defined over operators reflects the features of different EAs, and correspondingly, these features can be reflected in the resultant FLs.

### 3.2. Network Motifs

Network motifs are patterns of interconnections, which can be reflected by connected subgraphs (Milo et al., 2002). Milo et al. (2002) showed that there are 13 types of three-node connected subgraphs (Figure 1) for all directed networks. In a directed network, a pattern of interconnections can be viewed as a network motif only when its number of occurrences in this network is significantly higher than that in the corresponding random network. More specifically, network motifs are those connected subgraphs for which the probability of appearing in a random network an equal or greater number of times than in the real network is lower than a cutoff value 0.01 (Milo et al., 2002).

To detect *n*-node network motifs in a network, the network was scanned for all of the possible *n*-node subgraphs, and the number of occurrences of each subgraph was recorded (Milo et al., 2002). Milo et al. (2002) showed that several networks (i.e., gene regulation, neurons, food webs, electronic circuits, and the World Wide Web) exhibit different types of network motifs, and frequencies of different network motifs vary from one network to another. Motifs may thus define universal classes of networks, and are basic building blocks of most networks. Therefore, network motifs have been widely used in studying complex systems and in characterising features on the system level by analysing locally how the substructures are formed.

## 4. Motifs in Directed Fitness Landscape Networks

In EAs, when an individual moves from one node to its neighbour under the effect of an operator, it is equivalent to exploring the local structures or subgraphs. Thus, subgraphs, which can be reflected by motifs, that an EA has visited during the whole evolutionary process have a close relationship with its performance. Since network motifs are just connected subgraphs, obviously, the number of possible motifs in undirected networks is much lower than that in directed networks. Moreover, the topology of FLNs built by the same operator is identical if the fitness value of each node is ignored. Thus, we first convert undirected FLNs to directed FLNs (DFLNs) by making use of fitness values so that more different types of network motifs can be extracted. Second, we propose another way to consider the 13 types of possible three-node motifs presented in Figure 1, and define six types of basic motifs. Finally, based on these basic motifs, a new type of motif, namely distance motifs, is designed by taking global optima as references.

### 4.1. Directed Fitness Landscape Networks and Basic Motifs

If the probability of transforming one configuration into another under the operator is larger than 0, then the edge exists in the corresponding network. In this way, a complete graph will be generated when using the mutation operator in which each bit is flipped with probability 1/*n* for a binary string of length *n* (labelled as 1/*n* mutation in the following text), and the probabilities of transforming this string to different neighbours are not always the same. This causes two problems in detecting network motifs. First, if an operator leads to a complete graph, the topology of the fitness landscapes for all problems is the same. Second, if the probabilities of transforming a string to different neighbours are not always the same, which is equivalent to edges having different weights, the edges in a motif may have different importance, and we cannot just count the number of each type of motif. Therefore, in the following text, for the sake of simplicity, the operator used to build DFLNs must satisfy the conditions in Equations (6) and (7). That is to say, the number of neighbours of each node must be much smaller than the number of nodes in the network, and the probabilities of converting one node to different neighbours must be identical. Edges in DFLNs use only the relative difference between the fitness values of two nodes rather than the absolute fitness values. Thus, DFLNs are suitable for analysing comparison-based search algorithms, such as (1+1)EA (Droste et al., 2002).
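A minimal sketch of how such a network could be built for binary strings with the 1-bit flip operator is given below. The edge rule is an assumption inferred from the text that follows (an edge points from the node with lower fitness to the node with higher fitness, and neighbours with equal fitness are linked in both directions); the names `one_bit_neighbours` and `build_dfln` are hypothetical.

```python
from itertools import product


def one_bit_neighbours(s):
    """All bit tuples at Hamming distance one from s."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def build_dfln(fitness, n):
    """Map every node to the set of neighbours it points to."""
    edges = {}
    for s in product((0, 1), repeat=n):
        # Assumed rule: s -> t exists when t is not worse than s, so
        # equal-fitness neighbours are connected in both directions.
        edges[s] = {t for t in one_bit_neighbours(s)
                    if fitness(t) >= fitness(s)}
    return edges


# ONEMAX on 3 bits: every edge points toward the string with more 1s.
dfln = build_dfln(sum, 3)
```

Each node has exactly *n* neighbours, each reached with the same probability, so the 1-bit flip operator meets both requirements stated above: few neighbours relative to the 2^{n} nodes, and identical transition probabilities.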

The original network motifs are defined by their difference in frequency from random networks (Milo et al., 2002). However, random FLNs are built from a random fitness function. The difficulty of this random fitness function for EAs is itself part of the study, so we cannot take it as a reference to detect network motifs in DFLNs. In fact, if we further check the 13 types of possible three-node motifs listed in Figure 1, we find that motifs of types 1, 2, 3, 4, 7, and 8 are subsets of motifs of types 5, 6, 9, 10, 11, 12, or 13. Therefore, in this study, for a three-node subgraph, we consider only the edges between two pairs of nodes, namely motifs of types 1, 2, 3, 4, 7, and 8 in Figure 1, which we call Basic Motifs.

*Given a directed fitness landscape network, a Basic Motif in it, labelled as M^{b}, is a connected three-node subgraph in which only the edges between n_{1} and n_{2} and between n_{2} and n_{3} are considered.*

Clearly, based on the edges between these two pairs of nodes, there are six types of basic motifs in total for all DFLNs, which are shown in the first two columns of Table 1. These six types of basic motifs can be viewed as basic building blocks of a DFLN.

| Basic motif number | Basic motif | Condition | Class of distance motif | Condition | Subclass of distance motif |
|---|---|---|---|---|---|
| 1 | n_{1}↔n_{2}, n_{2}↔n_{3} | | Neutral | | |
| 2 | n_{1}↔n_{2}, n_{3}→n_{2} | d_{3}≥d_{2} | Guide | | Core guide |
| | | d_{3}<d_{2} | Deceptive | d_{3}>d_{1} | Core deceptive |
| 3 | n_{1}↔n_{2}, n_{2}→n_{3} | d_{2}≥d_{3} | Guide | | Core guide |
| | | d_{2}<d_{3} | Deceptive | d_{1}>d_{3} | Core deceptive |
| 4 | n_{1}→n_{2}, n_{2}→n_{3} | d_{1}≥d_{3} | Guide | and | Core guide |
| | | d_{1}<d_{3} | Deceptive | d_{1}>d_{2} and d_{2}>d_{3} | Core deceptive |
| 5 | n_{2}→n_{1}, n_{2}→n_{3} | d_{2}≥d_{1} and d_{2}≥d_{3} | Guide | | |
| | | d_{2}<d_{1} or d_{2}<d_{3} | Deceptive | | |
| 6 | n_{1}→n_{2}, n_{3}→n_{2} | d_{1}≥d_{2} and d_{3}≥d_{2} | Guide | | |
| | | d_{1}<d_{2} or d_{3}<d_{2} | Deceptive | | |

### 4.2. Distance Motifs

A selection mechanism in EAs, such as binary tournament selection, causes the search heuristic to prefer high fitness regions. Thus, the problems would be easy if high fitness regions are close to global optima. On the contrary, if high fitness regions are far from global optima, the selection mechanism may lead the search heuristic in the wrong direction. This indicates that the distance between candidate solutions and global optima is another important factor that impacts EAs’ performance. Thus, we use the distance information to refine basic motifs. There are a number of ways to define the distance between a candidate solution and a global optimum. Naturally, the one that is most suitable for the searching process on fitness landscapes is the shortest path length. To calculate the shortest path length, we need to identify the nodes in a network that correspond to the global optima. However, for indirect encoding methods, like the permutation encoding for MKPs, the computational cost to match the points in the search space to the global optima may be quite high. For binary encoding, since the expected number of bits that are flipped in the 1/*n* mutation is 1, we use the 1-bit flip operator to build the DFLN, and then the Hamming distance is used as the distance measure, which is equivalent to the shortest path length in this case. For other encoding methods, if the computational cost to identify the nodes corresponding to global optima and calculate the shortest path length is feasible, the shortest path length can be used as the distance measure; otherwise, an approximate way should be designed.
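The claim that Hamming distance equals the shortest path length in the 1-bit flip network can be checked directly for a small search space. The sketch below runs a breadth-first search from an arbitrarily chosen optimum (here the all-1s string, an illustrative assumption) and compares the result with the Hamming distance; the function names are hypothetical.

```python
from collections import deque
from itertools import product


def one_bit_neighbours(s):
    """All bit tuples at Hamming distance one from s."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def shortest_path_lengths(source):
    """Breadth-first search over the undirected 1-bit flip network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for t in one_bit_neighbours(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


n = 5
optimum = (1,) * n
bfs = shortest_path_lengths(optimum)
# True when BFS distance and Hamming distance coincide for every node.
agree = all(bfs[s] == sum(a != b for a, b in zip(s, optimum))
            for s in product((0, 1), repeat=n))
```

The search ignores edge directions because the distance is defined on the topology of the network, which is identical for all fitness functions built with the same operator.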

*Given a basic motif, M^{b}, the corresponding Distance Motif, labelled as M^{d}, is obtained by attaching the distances d_{1}, d_{2}, and d_{3} to its nodes n_{1}, n_{2}, and n_{3}, respectively*.

When the searching process visits the nodes of a basic motif in turn, we can check *d*_{1}, *d*_{2}, and *d*_{3} to see whether the searching process is heading in the right direction or not. Clearly, when we check whether a direction is correct or not, we should take into account both the information about the fitness and the distance. Since the information about the fitness is reflected by the edges in DFLNs, we first define paths in a distance motif. Then, based on their contributions to the searching process, distance motifs are divided into three classes.

*Given a distance motif, M^{d}. For two nodes n_{i} and n_{j} in M^{d}, if n_{i} can reach n_{j} by only visiting edges in M^{d}, then there is a Path between n_{i} and n_{j}. Furthermore, for each edge on a path in M^{d}, if the edge with inverse direction does not exist, then this path is an Effective Path*.

*Given a distance motif, M^{d}. If there are no effective paths in M^{d}, then M^{d} is a Neutral Motif. If all effective paths with the largest length (the number of edges in the path) in M^{d} satisfy the condition that the distance of the start node is not less than that of the end node, then M^{d} is a Guide Motif; otherwise, it is a Deceptive Motif*.

In an effective path, the node with the lower fitness value always points to the node with the higher fitness value. When the distance of the end node of such a path is not larger than that of the start node, the distance does not increase as the fitness value increases. In such a case, the motif has a positive effect on the searching process, and thus it is a guide motif. Table 1 illustrates how the three classes of distance motifs are related to the six types of basic motifs. For the first type, since the set of effective paths is empty, all motifs are neutral motifs. For the second type, the only effective path is between *n*_{3} and *n*_{2}. Thus, if *d*_{3}≥*d*_{2}, then they are guide motifs; otherwise, deceptive motifs. The case for the third type is similar. For the fourth type, there are three effective paths, namely the paths between *n*_{1} and *n*_{2}, *n*_{2} and *n*_{3}, and *n*_{1} and *n*_{3}, whose lengths are 1, 1, and 2, respectively, and only the path with the largest length needs to be considered. Thus, if *d*_{1}≥*d*_{3}, then they are guide motifs; otherwise, deceptive motifs. For the fifth type, there are two effective paths, namely the paths between *n*_{2} and *n*_{1}, and between *n*_{2} and *n*_{3}, whose lengths are both 1, and both need to be considered. Thus, if both *d*_{2}≥*d*_{1} and *d*_{2}≥*d*_{3}, then they are guide motifs; otherwise, deceptive motifs. The case for the sixth type is similar.
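The classification into neutral, guide, and deceptive motifs can be sketched as a small function. This is an illustrative reading of the definitions above, assuming that *n*_{2} is the middle node, that an edge is one-directional exactly when the two fitness values differ, and that equal fitness values give edges in both directions; the function name `classify_distance_motif` is hypothetical.

```python
def classify_distance_motif(f, d):
    """Classify a distance motif from fitness triple f and distance triple d.

    Index 0, 1, 2 stands for n1, n2, n3; n2 (index 1) is the middle node.
    """
    # One-directional edges, stored as (start, end), point to higher fitness.
    strict = []
    for a, b in ((0, 1), (1, 2)):
        if f[a] < f[b]:
            strict.append((a, b))
        elif f[a] > f[b]:
            strict.append((b, a))
    if not strict:
        return "neutral"  # no effective path exists

    # By default the longest effective paths are the single strict edges.
    paths = strict
    if len(strict) == 2:
        (a, b), (c, e) = strict
        if b == c:        # n1 -> n2 -> n3 chains into one path of length two
            paths = [(a, e)]
        elif e == a:      # n3 -> n2 -> n1 chains into one path of length two
            paths = [(c, b)]

    # Guide: on every longest effective path the start is not closer
    # to the global optimum than the end.
    if all(d[start] >= d[end] for start, end in paths):
        return "guide"
    return "deceptive"
```

For example, `classify_distance_motif((2, 1, 2), (1, 2, 1))` describes a motif of the fifth type in which both end nodes are fitter and closer to the optimum than the middle node, so it is a guide motif.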

If we further analyse guide and deceptive motifs, paths with length two exist in some cases. On such paths, if the difference in distances and fitness values of each pair of nodes connected by an edge is consistent with that of the two end nodes, then the corresponding motifs have a more explicit impact on the searching process. Thus, we further define two subclasses of guide and deceptive motifs.

*Given a guide motif, . If there exists a path with length two, labelled as , where and , satisfies the conditions that when and when , then M^{G} is a Core Guide Motif*.

*Given a deceptive motif, . If there exists a path with length two, labelled as , where and d_{i}<d_{j}, satisfies the conditions that d_{i}<d_{k} when and d_{k}<d_{j} when , then M^{d} is a Core Deceptive Motif*.

According to Definitions 6 and 7, none of the motifs of the fifth and sixth types is a core guide or a core deceptive motif because no path with length two exists. The relationships for the other types of basic motifs are also shown in Table 1.

## 5. Predictive Difficulty Measures based on Distance Motifs

Guide motifs can mostly help the searching process head toward the right direction. This is because when the searching process goes from nodes with lower fitness values to nodes with higher fitness values, it naturally and smoothly approaches global optima. On the contrary, deceptive motifs mostly lead the searching process in the wrong direction since when the searching process goes from nodes with lower fitness values to nodes with higher fitness values, it deviates away from global optima. For neutral motifs, fitness values of the three nodes are the same, thus, neither deceptive nor guide information is provided. Next, we first analyse DFLNs of some well-studied fitness functions to show how the motifs composing a DFLN are related to problem difficulty. Then, based on the analyses, predictive difficulty measures are proposed.

### 5.1. DFLNs of Well-Studied Fitness Functions

#### 5.1.1. Effect of the Amount of Different Types of Motifs on Problem Difficulty

*If a fitness landscape, , where S is composed of binary strings, f, and the 1-bit flip operator is used to construct N, then distance motifs in the corresponding directed fitness landscape network are guide motifs*.

Let a distance motif in the corresponding DFLN be . If an edge , then according to the 1-bit flip operator and , we have

*M*^{d} belongs to .

*M*^{d} belongs to .

Then, we have *d*_{2}>*d*_{1} and *d*_{2}>*d*_{3}; that is, *M*^{d} is a guide motif.

Before we give Proposition 2, we first prove some properties of DFLNs built by the 1-bit flip operator.

In a fitness landscape network built by the 1-bit flip operator, (1) the number of edges connecting any three nodes is less than three, (2) each node belongs to 3*n*(*n*−1)/2 basic motifs, where *n* is the dimension of the search space, and (3) the number of all basic motifs is 2^{n-1}*n*(*n*−1).

Clearly, Equations (17) and (18) contradict each other. Thus, the number of edges connecting any three nodes is less than three.

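In a fitness landscape network built by the 1-bit flip operator, the number of neighbours of each node is *n*. According to Lemma 1(1), any three nodes can form only one basic motif at most. Thus, all basic motifs that a node belongs to can be divided into two cases; that is, (a) two other nodes both connect to this node, so that it is the middle node, and (b) one node connects to this node and a third node connects to that neighbour, so that this node is an end node. The number of basic motifs that a node belongs to is

$$\binom{n}{2} + n(n-1) = \frac{3n(n-1)}{2}.$$

Since each basic motif contains three nodes and the number of nodes is 2^{n}, the number of all basic motifs is

$$\frac{2^{n}}{3}\cdot\frac{3n(n-1)}{2} = 2^{n-1}n(n-1).$$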
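Parts (1) and (3) of Lemma 1 can be checked by brute force for small *n*. The sketch below enumerates every set of three nodes in the 1-bit flip network, ignoring edge directions, and counts the sets linked by exactly two edges (basic motifs) and by three edges (which the lemma rules out); the function names are hypothetical.

```python
from itertools import combinations, product


def adjacent(a, b):
    """True when two bit tuples differ in exactly one position."""
    return sum(x != y for x, y in zip(a, b)) == 1


def count_basic_motifs(n):
    """Return (number of basic motifs, number of fully linked node triples)."""
    nodes = list(product((0, 1), repeat=n))
    motifs = triangles = 0
    for trio in combinations(nodes, 3):
        links = sum(adjacent(a, b) for a, b in combinations(trio, 2))
        motifs += links == 2
        triangles += links == 3
    return motifs, triangles


def lemma_count(n):
    """Closed form from Lemma 1(3): 2^(n-1) * n * (n-1)."""
    return 2 ** (n - 1) * n * (n - 1)
```

For *n* = 3 the enumeration finds 24 basic motifs and no fully linked triple, matching 2^{2}·3·2 = 24.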

*Given a fitness landscape, , where S is composed of binary strings with length n, f, and the 1-bit flip operator is used to construct N. Then, the fraction of neutral motifs in the corresponding directed fitness landscape network is equal to or greater than *.

Given a distance motif *M*^{d}, if none of the configurations corresponding to its three nodes is the global optimum, then the fitness values of these three nodes are all 0. Thus, *M*^{d} is a neutral motif.

*Given a fitness landscape, , where is composed of binary strings with length n, , and the 1-bit flip operator is used to construct N; then, the fraction of deceptive motifs in the corresponding directed fitness landscape network is equal to or greater than *.

In fact, this is equivalent to , and according to the proof for Proposition 1, we have that *M*^{d} belongs to , , or .

*M*^{d} belongs to the fourth type of basic motif. According to Equation (14), the number of 1s in the configuration of *n*_{1} is smaller than that of *n*_{2}, and that of *n*_{2} is smaller than that of *n*_{3}. Since the global optimum is the string of all 0s, we have *d*_{1}<*d*_{2}<*d*_{3}; that is, *M*^{d} is a core deceptive motif.

*M*^{d} belongs to the fifth type of basic motif. According to Equation (15), the number of 1s in the configuration of *n*_{2} is smaller than those of *n*_{1} and *n*_{3}. Thus, we have *d*_{2}<*d*_{1} and *d*_{2}<*d*_{3}; that is, *M*^{d} is a deceptive motif.

*M*^{d} belongs to the sixth type of basic motif. According to Equation (16), the number of 1s in the configuration of *n*_{2} is larger than those of both *n*_{1} and *n*_{3}. Thus, we have *d*_{2}>*d*_{1} and *d*_{2}>*d*_{3}; that is, *M*^{d} is a deceptive motif.

That is to say, only when a motif includes a node corresponding to the global optimum can it be a guide or neutral motif. Then, being similar to the proof for Proposition 2, we have that the number of deceptive motifs is equal to or greater than .

The above propositions validate our knowledge that the greater the number of guide motifs is, the easier the problem is, and the greater the number of deceptive motifs is, the more difficult the problem is. Although most motifs in the needle-in-a-haystack landscape are neutral (that is, neither guide nor deceptive information exists), this landscape still presents a level of difficulty for black box algorithms (Droste et al., 2006).

#### 5.1.2. Effect of the Spatial Distribution of Motifs on Problem Difficulty

Here, 1^{(i)} denotes the string 11⋅⋅⋅1 of length *i*, and 0^{(i)} is defined similarly.

For LEADING_ONES, if *s*_{1}=0, then no matter what the values of the later *n*−1 bits are, the fitness value is 0. For the search space with dimension *n*, the number of such nodes is 2^{n-1}, and all distance motifs composed of such nodes are neutral motifs. However, guide motifs also exist and connect to most nodes.

*Given a fitness landscape, , where S is composed of binary strings with length , f=, and the operator used in N is the 1-bit flip operator; then, each node whose Hamming distance to the global optimum is no less than two belongs to at least one core guide motif*.

*s*_{1i}=*s*_{1j}=0, , *i*<*j*, and *s*_{1k}=1 for *k*<*i* and *i*<*k*<*j*. Let and . Then, we have

Let , , and be the nodes corresponding to , , and , respectively, and *d*_{1}, *d*_{2}, and *d*_{3} be the distance of the three nodes to the global optimum. We have *d*_{2}=*d*_{1}+1, *d*_{3}=*d*_{2}+1; that is, *d*_{3}>*d*_{2}>*d*_{1}. Therefore, these three nodes form a core guide motif.

For RIDGE, the situation is different. Both deceptive and guide motifs exist, but only a small number of nodes can form guide motifs.

*Given a fitness landscape, , where is composed of binary strings with length n, , and the 1-bit flip operator is used to construct N. Let be the set of configurations that satisfy the first condition in Equation (24). (1) Only distance motifs containing at least one node corresponding to the configuration in can be guide motifs. (2) Each node whose corresponding configuration is in belongs to at least one core guide motif.*

(1) Let *M*^{d} be a distance motif where the configurations corresponding to the three nodes are not in . Then, the fitness value of each node decreases as the number of 1s increases. According to the proof for Proposition 3, *M*^{d} is a deceptive motif. This proves that only when at least one node's configuration is in , can *M*^{d} be any type of motif except a deceptive type.

Therefore, each group of three connected nodes forms a core guide motif.

According to previous research (Jansen, 2001), LEADING_ONES is easy. Although Jansen also showed that RIDGE is an easy problem for a hill climber, Borenstein and Poli (2005b) showed that a simple GA with uniform crossover and mutation was unable to find the solution quickly. Propositions 4–5 show that although core guide motifs exist in both cases, their spatial distributions are completely different. For LEADING_ONES, core guide motifs spread over nearly the whole network, while for RIDGE, they are confined to a narrow region. This topological difference is consistent, to some extent, with what is known about the difficulty of these two functions.

### 5.2. Motif Difficulty

The most straightforward way to predict problem difficulty by motifs is to count the number of distance motifs. This also reflects the first aspect of the relationship between motifs and problem difficulty. Since this kind of difficulty measure is equivalent to making a statistic over the whole network, we call it **Network Level Difficulty**.

*For a directed fitness landscape network, , its Network Level Difficulty, labelled as , is defined as*

The value of network level difficulty is in the range of [−1.0, 1.0]. When it is equal to −1.0, all distance motifs are guide motifs, so the problem is the easiest. On the contrary, when it is equal to 1.0, all distance motifs are deceptive motifs, so the problem is the most difficult. To reflect the spatial distribution of each class of distance motifs in a predictive measure, we need to analyse how the nodes are involved in each class of distance motifs. Thus, we classify nodes into two different types.

*For a node, if one of the distance motifs that it is a part of is a core guide motif and, moreover, it is not the middle node and its fitness value is the smallest in that motif, then it is a Core Guide Node. If one of the distance motifs that it is a part of is a core deceptive motif and, moreover, it is not the middle node and its fitness value is the largest in that motif, then it is a Core Deceptive Node*.

Intuitively, any node that is a part of guide motifs is helpful for leading the searching process in the right direction, and any node that is a part of deceptive motifs is harmful. However, after further analysing motifs, we find that the three nodes play different roles. For example, in a core guide motif, only the end node with the smallest fitness value is a core guide node. This is because when the searching process visits this motif from that node to the other end node, it is heading in the right direction. But when the searching process reaches the other end node, other motifs that this node is a part of may be deceptive motifs, and the search process cannot continue heading in the right direction. Thus, we cannot determine whether the contribution of the other end node is positive or negative. Based on core guide and core deceptive nodes, we design another measure, namely **Node Level Difficulty**.

*For a directed fitness landscape network, , is the set of core guide nodes, and is the set of core deceptive nodes. Then, Node Level Difficulty, labelled as , is defined as follows.*

Here, the deceptive term counts only the core deceptive nodes which are not core guide nodes. According to Definition 9, a node can be both a core guide and a core deceptive node simultaneously. However, the most important characteristic of the searching process of EAs is that it works under selection pressure. How selection pressure works is reflected by which kinds of paths are chosen. Thus, under selection pressure, guide motifs always have priority since the high fitness region is always preferred. Even if a node is a part of both guide and deceptive motifs, the search process may not visit the deceptive motifs at all. This trend is reflected in node level difficulty by giving core guide nodes priority. That is, once a node is a core guide node, its effect is taken into account, and only when a node is a core deceptive node but not a core guide node is its deceptive effect taken into account. The value of node level difficulty is still in the range of [−1.0, 1.0].

Since network level difficulty is a statistic over the whole network, it can predict problem difficulty in general, but cannot reflect the detailed differences between nodes. These detailed differences can be reflected by node level difficulty, since it takes into account the situation of each node. Therefore, we integrate network level difficulty and node level difficulty to form a new measure, namely **Motif Difficulty**, whose value is still in the range of [−1.0, 1.0], where −1.0 corresponds to the easiest problems and 1.0 to the most difficult ones.

## 6. Experiments on Exact Motif Difficulty

If a measure is calculated on the whole search space, it is said to be exact (Naudts and Kallel, 2000). Thus, the value of motif difficulty is calculated exhaustively on the whole search space in this section. First, a collection of reasonably well-studied artificial fitness functions of known difficulty with different characteristics and a nonartificial problem are used to validate the MD prediction under various situations. Then, MD is used to estimate the difficulty of three counterexamples for other difficulty measures. Finally, counterexamples for MD are discussed to show the limitations of MD. The problems' search spaces are binary strings with length 16. The FLNs are built using the 1-bit flip operator.

To make a fair comparison with other measures, we also implemented FDC (Jones and Forrest, 1995) and information landscape (IL; Borenstein and Poli, 2005b) under the same conditions. Furthermore, to validate the MD prediction of EA behaviour, three types of EAs and a local search algorithm (LS) are used to observe the hardness of these fitness functions. The first EA is (1+1)EA (Droste et al., 2002), which is widely used in analysing the behaviour of EAs theoretically. The second EA is a GA with the 1/*n* mutation only, labelled as GA_M. The third EA is a GA with both uniform crossover and the 1/*n* mutation, labelled as GA_MC. Both GA_M and GA_MC use binary tournament selection and a population size of 100. LS is the same as (1+1)EA except that the 1-bit flip operator is used instead of the 1/*n* mutation. For these four search algorithms, the maximum number of fitness function evaluations (NFFEs) is set to 10^{5}. The NFFEs used to find the global optimum are taken as the performance measure. If the global optimum is not found within the maximum NFFEs, then the performance is set to the maximum NFFEs. All results are averaged over 1000 independent runs.
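The performance measure can be illustrated with a minimal (1+1)EA that counts fitness evaluations. This is a sketch under stated assumptions rather than the implementation used in the experiments: the run stops when a known optimal fitness value `target` is reached, offspring that are at least as fit replace the parent, and the function name and parameters are hypothetical.

```python
import random


def one_plus_one_ea(fitness, n, target, max_nffe=10**5, rng=random):
    """Return the number of fitness evaluations used, capped at max_nffe."""
    parent = [rng.randint(0, 1) for _ in range(n)]
    parent_fit = fitness(parent)
    nffe = 1
    while parent_fit < target and nffe < max_nffe:
        # 1/n mutation: flip each bit independently with probability 1/n.
        child = [bit ^ (rng.random() < 1.0 / n) for bit in parent]
        child_fit = fitness(child)
        nffe += 1
        # Comparison-based acceptance: only the relative order is used.
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return nffe


# One run on ONEMAX with n = 16, where the optimal fitness is 16.
evaluations = one_plus_one_ea(sum, 16, 16, rng=random.Random(0))
```

Replacing the mutation line with a single uniformly chosen bit flip would give the LS variant described above; averaging the returned value over many independent runs gives the reported performance figure.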

### 6.1. Well Studied Fitness Functions

The second function (Borenstein, 2008) is a multimodal function with a varying number of local optima. For this function, the global optimum is first selected uniformly at random from the search space, and its fitness value is set to 2*n*. Then *i* different local optima are selected uniformly at random from the search space, and their fitness values are set to *n*. Finally, the fitness values of all other configurations are calculated as *n* minus the distance to the closest local or global optimum.
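The construction just described can be sketched as follows. Configurations are encoded as integers, the distance is taken to be the Hamming distance (which matches the 1-bit flip neighbourhood), and the optima are drawn without replacement so that they are all distinct; the function name `make_multimodal` and these choices are illustrative assumptions.

```python
import random

def hamming(a, b):
    """Hamming distance between two bit strings encoded as integers."""
    return bin(a ^ b).count("1")

def make_multimodal(n, num_local, rng=None):
    """Build a multimodal fitness function on n-bit strings.

    Returns (fitness, global_optimum, local_optima).
    """
    rng = rng or random.Random()
    # One global optimum plus num_local local optima, all distinct.
    points = rng.sample(range(2 ** n), num_local + 1)
    global_opt, local_opts = points[0], points[1:]

    def fitness(x):
        if x == global_opt:
            return 2 * n
        # Local optima are at distance 0 from themselves, so they get n;
        # every other point gets n minus the distance to the closest optimum.
        return n - min(hamming(x, p) for p in points)

    return fitness, global_opt, local_opts
```

With `fitness, g, locs = make_multimodal(16, 5)`, the global optimum has fitness 32, each local optimum has fitness 16, and a non-optimal neighbour of any optimum has fitness 15.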

Experimental results for ONEMAX and LEADING_ONES are given in Table 2. As can be seen, the performance of the four algorithms is consistent with the predictions of the three measures. Both the performance of the algorithms and the predictive measures show that ONEMAX is extremely easy and that LEADING_ONES is more difficult than ONEMAX but still belongs to the easy problems, since its MD is smaller than −0.5.
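As a point of reference for the FDC values in Table 2, the following sketch computes an exact fitness distance correlation by enumerating all 16-bit strings. It assumes the usual definition of FDC as the Pearson correlation between fitness and Hamming distance to the global optimum, takes the all-ones string as the optimum of both functions, and counts leading ones from the most significant bit; the function names are illustrative.

```python
import math

def onemax(x, n):
    """Number of 1s in the n-bit string encoded by the integer x."""
    return bin(x).count("1")

def leading_ones(x, n):
    """Number of consecutive 1s, counted from the most significant bit."""
    count = 0
    for i in range(n - 1, -1, -1):
        if (x >> i) & 1:
            count += 1
        else:
            break
    return count

def exact_fdc(fitness, n, optimum):
    """Pearson correlation between fitness and distance to the optimum,
    computed over the whole search space of n-bit strings."""
    size = 2 ** n
    fs = [fitness(x, n) for x in range(size)]
    ds = [bin(x ^ optimum).count("1") for x in range(size)]
    mean_f = sum(fs) / size
    mean_d = sum(ds) / size
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fs, ds)) / size
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fs) / size)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in ds) / size)
    return cov / (std_f * std_d)

if __name__ == "__main__":
    n = 16
    all_ones = 2 ** n - 1
    print("ONEMAX      ", round(exact_fdc(onemax, n, all_ones), 4))
    print("LEADING_ONES", round(exact_fdc(leading_ones, n, all_ones), 4))
```

Under these assumptions the two values agree with the FDC row of Table 2 for ONEMAX and LEADING_ONES to the printed precision.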

Table 2: Predictive measures and average NFFEs of the four search algorithms.

| Fitness functions | ONEMAX | LEADING_ONES | RIDGE | NIAH | RAND |
|---|---|---|---|---|---|
| FDC | −1.0000 | −0.3535 | 0.9877 | −0.0156 | 0.0038 |
| IL | 0 | 0.3813 | 0.8599 | 0.4300 | 0.5012 |
| MD | −0.9999 | −0.6167 | 0.9971 | −0.0009 | −0.1110 |
| (1+1)EA | 9 | 215 | 668 | 64,153 | 98,880 |
| GA_M | 1,643 | 4,401 | 98,228 | 61,125 | 57,695 |
| GA_MC | 826 | 3,820 | 99,700 | 50,796 | 51,577 |
| LS | 43 | 129 | 295 | 53,140 | 100,000 |

The multimodal function is selected to test how MD reflects the effect of multimodality on problem difficulty. The number of local optima varies from 1 to 30. Values of the three predictive measures are given in Figure 2(a). FDC, IL, and MD show no clear decreasing or increasing trend with the number of local optima. The performance of the four search algorithms is given in Figure 2(d), in which we can see that the computational costs of GA_M and GA_MC have no explicit relationship with the number of local optima. Therefore, MD confirms the previous observations regarding GA hardness; that is, multimodality is neither necessary nor sufficient for a landscape to be difficult to search (Kallel et al., 2001).

To further quantitatively evaluate the predictive capability of MD, we calculate the correlation coefficient between the values of the three measures and the average NFFEs of the four search algorithms for the multimodal function, which is shown in Table 3. For (1+1)EA and LS, MD shows a high correlation, which reaches 0.85 and is higher than those of FDC and IL. For GA_M, although the correlation coefficient of MD is lower than those of FDC and IL, it is still larger than 0.7 and shows a positive correlation. For GA_MC, all three measures show a low correlation. This is mainly because none of them takes into account the effect of the crossover operator.

Table 3: Correlation coefficients between the values of the three measures and the average NFFEs of the four search algorithms.

| Problems | Measures | (1+1)EA | GA_M | GA_MC | LS |
|---|---|---|---|---|---|
| Multimodal | FDC | 0.8139 | 0.7758 | 0.2921 | 0.7959 |
| | IL | 0.8481 | 0.7353 | 0.2154 | 0.8395 |
| | MD | 0.8580 | 0.7133 | 0.1800 | 0.8516 |
| Trap | FDC | 0.9261 | 0.8440 | 0.8447 | 0.9171 |
| | IL | 0.9369 | 0.8185 | 0.8756 | 0.9336 |
| | MD | 0.9345 | 0.8257 | 0.8738 | 0.9356 |
| MAXSAT | FDC | 0.4075 | 0.6204 | 0.6653 | 0.4059 |
| | IL | 0.4271 | 0.6195 | 0.6613 | 0.4178 |
| | MD | 0.3357 | 0.5571 | 0.6202 | 0.3992 |
| Scaling | FDC | −0.1501 | −0.1632 | 0.2688 | 0.3439 |
| | IL | 0 | 0 | 0 | 0 |
| | MD | 0 | 0 | 0 | 0 |
| Constantness | FDC | 0.5363 | 0.6151 | 0.8113 | 0.5313 |
| | IL | 0.3914 | 0.4690 | 0.6795 | 0.3866 |
| | MD | 0.5393 | 0.6225 | 0.8498 | 0.5339 |
| Irrelevant deceptive | FDC | −0.6855 | −0.7809 | −0.8200 | −0.7412 |
| | IL | 0.4556 | 0.5227 | 0.5588 | 0.5063 |
| | MD | 0.5835 | 0.6697 | 0.7103 | 0.6397 |

The trap function is selected to test whether MD can reflect gradual changes in problem difficulty. When *i* varies from 1 to *n*, problem difficulty decreases gradually. In fact, this reflects the effect of isolation on problem difficulty; that is, the more the global optimum is isolated from the other high-fitness solutions, the more difficult the problem is. Values of the three predictive measures are given in Figure 2(b). Clearly, all three measures reflect gradual changes in problem difficulty. Therefore, MD not only confirms the previous results about the effect of isolation on GA hardness, but also shows an ability to reflect gradual changes in problem difficulty. The performance of the four search algorithms is given in Figure 2(e), which shows the same phenomenon. The correlation coefficient between the values of the three measures and the average NFFEs of the four search algorithms is shown in Table 3. For all four search algorithms, the three measures show a relatively high correlation, and MD reaches 0.93 for (1+1)EA and LS.

### 6.2. MAXSAT Problems

To test MD on landscapes of nonartificial problems, 10 random MAXSAT instances with three literals per clause are considered. Each instance has 16 literals and 68 clauses and is generated by choosing, for each clause, three different literals uniformly at random and then negating each of them with probability 0.5. Only one global optimum is considered; that is, if an instance has more than one global optimum, one is selected uniformly at random. Note that the ratio of clauses to literals is 4.25, which means that the generated instances lie in the phase transition region. In such a case, the probability that an instance is satisfiable is neither near one nor near zero. Values of the three predictive measures are given in Figure 2(c).
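The instance generation described above can be sketched as below. The clause encoding as `(variable index, negated)` pairs and the use of the number of satisfied clauses as the fitness value are assumptions made for illustration; the text does not state how the fitness of an assignment is defined.

```python
import random

def random_instance(num_vars=16, num_clauses=68, rng=None):
    """Generate a random instance with three distinct variables per clause,
    each negated independently with probability 0.5."""
    rng = rng or random.Random()
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(num_vars), 3)  # three different variables
        clauses.append([(v, rng.random() < 0.5) for v in variables])
    return clauses

def satisfied_clauses(assignment, clauses):
    """Count clauses with at least one true literal (assumed fitness).

    A literal (v, negated) is true when assignment[v] is 1 and the literal
    is not negated, or assignment[v] is 0 and the literal is negated.
    """
    return sum(
        any(bool(assignment[v]) != negated for v, negated in clause)
        for clause in clauses
    )

if __name__ == "__main__":
    instance = random_instance(rng=random.Random(0))
    print(len(instance) / 16)  # clause-to-variable ratio of the instance
    print(satisfied_clauses([0] * 16, instance))
```

An exhaustive pass over all 2^16 assignments with `satisfied_clauses` would then give the landscape on which an exact measure could be evaluated.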

Although the 10 instances are generated with the same number of literals and clauses, and hence the same clause-to-literal ratio, their difficulty varies from −0.5976 to −0.4520 on MD, from −0.4499 to −0.1247 on FDC, and from 0.3458 to 0.4573 on IL. The three measures show that instances 6 and 7 are more difficult than the others. This also confirms the previous results (Jansen, 2001) that different instances of the same problem might have different degrees of difficulty in black box scenarios. The performance of the four search algorithms is given in Figure 2(f), in which we can see that the performance of all four algorithms fluctuates from one instance to another, and all show a higher computational cost on instances 6 and 7. The correlation coefficient between the values of the three measures and the average NFFEs of the four search algorithms is shown in Table 3. For GA_M and GA_MC, the results of FDC, IL, and MD are all above 0.55, and for (1+1)EA and LS, the results are smaller but still larger than 0.33. Thus, a positive correlation is shown for all four algorithms.

The above experimental results show that MD is consistent with FDC and IL on both artificial and nonartificial problems, and confirm previous results on the problem difficulty, including isolation, deception, and multimodality.

### 6.3. Counterexamples for Other Difficulty Measures

Three classes of problems, namely scaling, constantness, and irrelevant deception problems, are used here. It has been shown that other difficulty measures fail to predict the difficulty of these three classes of problems. Epistasis measures, including epistasis variance and epistasis correlation, cannot correctly predict the difficulty of constantness and scaling problems. Although FDC can detect the presence of constantness in landscapes, FDC can be blinded by the presence of a large proportion of irrelevant deception since its construction is based on averaging (Naudts and Kallel, 2000).

#### 6.3.1. Scaling Problems

Clearly, the larger the value of *m* is, the more the fitness value of the global optimum, namely the string of all 1s, is isolated from the mass of fitness values. In fact, this class of functions is a modified version of ONEMAX. For a GA with comparison-based selection, the difficulty of this class of functions does not change with *m*. The values of the three predictive measures for this function are given in Figure 3(a). Both MD and IL are stable, and equal to −0.9999 and 0, respectively. However, FDC increases from −1.0000 to −0.0738, which confirms the results in Naudts and Kallel (2000); that is, FDC goes to 0 as both *m* and *n* go to infinity. The performance of the four search algorithms is given in Figure 3(d), in which all algorithms show a stable performance similar to that on ONEMAX for all *m*. Clearly, the values of MD and IL are consistent with the performance of the four search algorithms. Therefore, MD has the advantage of being insensitive to nonlinear scaling.
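The argument about comparison-based selection can be illustrated with a small sketch. The scaling used here, raising the number of 1s to the power *m*, is a hypothetical stand-in chosen only because it is monotone and isolates the all-ones string as *m* grows; it is not claimed to be the function used in the experiments. The sketch checks that every pairwise fitness comparison is unchanged by the scaling, while the correlation-based FDC moves away from −1.

```python
import math

def onemax(x):
    """Number of 1s in the bit string encoded by the integer x."""
    return bin(x).count("1")

def scaled_onemax(x, m):
    """Hypothetical nonlinear scaling of ONEMAX (an assumption)."""
    return onemax(x) ** m

def same_comparisons(f, g, n):
    """True if f and g order every pair of n-bit strings identically."""
    points = range(2 ** n)
    return all((f(a) > f(b)) == (g(a) > g(b)) for a in points for b in points)

def exact_fdc(fitness, n):
    """Pearson correlation between fitness and Hamming distance to all 1s."""
    size = 2 ** n
    optimum = size - 1
    fs = [fitness(x) for x in range(size)]
    ds = [bin(x ^ optimum).count("1") for x in range(size)]
    mean_f, mean_d = sum(fs) / size, sum(ds) / size
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fs, ds))
    var_f = sum((f - mean_f) ** 2 for f in fs)
    var_d = sum((d - mean_d) ** 2 for d in ds)
    return cov / math.sqrt(var_f * var_d)

if __name__ == "__main__":
    n = 8
    plain = lambda x: scaled_onemax(x, 1)
    steep = lambda x: scaled_onemax(x, 5)
    print(same_comparisons(plain, steep, n))  # tournament outcomes are identical
    print(exact_fdc(plain, n), exact_fdc(steep, n))
```

Any measure built only from the relative order of neighbouring fitness values is therefore unaffected by such a scaling, which is the property the text attributes to MD and IL.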

The correlation coefficient between the values of the three measures and the average NFFEs of the four algorithms is shown in Table 3. Since there are no clear changes in either the values of IL and MD or the performance of the four algorithms, the correlation coefficient of IL and MD is 0, while that of FDC fluctuates from negative to positive, which shows a failure in predicting the difficulty of this problem.

#### 6.3.2. Constantness Problems

Each function in this class is identified by a pair of parameters whose first element is *m*, and functions with the same *m* have the same difficulty. Here, since log *n* = 4, experiments on all 15 parameter pairs are conducted, and the values of the three predictive measures are given in Figure 3(b), where the 15 pairs are ordered from easy to difficult (1–15) on the *x* axis as follows: (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4).

As can be seen, MD perfectly reflects the difficulty changes; that is, for *m* = 0, 1, 2, 3, and 4, the value of MD is constant within each level. Thus, MD can detect the presence of constantness in landscapes. FDC and IL also show this change in general. However, for *m* = 0 and 1, FDC and IL clearly increase with the second parameter, while MD is stable within each level. The performance of the four algorithms is given in Figure 3(e), which shows that the functions with the first 12 pairs of parameters are relatively easy. MD is −0.8749 for *m* = 1 and −0.6025 for *m* = 2, which also indicates that the first 12 functions are easy.

The correlation coefficient between the values of the three measures and the average NFFEs of the four algorithms is shown in Table 3. The correlation coefficients of FDC and MD for the four search algorithms are similar and lie in the range [0.5, 0.85]. Moreover, that of MD is always higher than that of FDC. The correlation coefficient of IL is lower: it is only about 0.4 for (1+1)EA, GA_M, and LS, and 0.6795 for GA_MC, where those of both FDC and MD are higher than 0.8.

#### 6.3.3. Irrelevant Deception Problems

The higher the mixture coefficient *m* is, the harder the problem is. However, both FDC and the sitewise optimisation measure (a generalisation of FDC and epistasis; Naudts and Kallel, 2000) failed to correctly predict the performance for this function. Here, experiments on *m* = 1 (easy), 2, …, *n*−1 (difficult) are conducted, and the values of the three predictive measures are given in Figure 3(c). As can be seen, the value of FDC decreases with *m*, which contradicts the characteristics of this class of functions. The values of both IL and MD increase with *m*. The values of MD vary from −0.5311 to −0.0749; they are always smaller than 0 and indicate that this class of functions is not very difficult for EAs. The performance of the four search algorithms is given in Figure 3(f), in which all results show that although the difficulty of this class of functions increases with *m*, the functions are easy in general, and EAs can solve them easily. This confirms the analysis in Naudts and Kallel (2000) that the deceptive information in this class of functions is irrelevant to the search process of EAs. Thus, MD can distinguish irrelevant information existing in landscapes.

The correlation coefficient between the values of the three measures and the average NFFEs of the four algorithms is shown in Table 3. FDC shows a negative correlation for all four algorithms. Although the above results show that both IL and MD correctly predict the changing trend of the difficulty, the correlation coefficients show that MD is more accurate than IL: the values for MD are in the range [0.58, 0.71] and are consistently larger than those of IL by roughly 0.13 to 0.15.

The above experimental results show that FDC fails to detect irrelevant deceptive information and is sensitive to nonlinear scaling. Although IL can detect irrelevant deceptive information and is insensitive to nonlinear scaling, IL is not as stable as MD in detecting the presence of constantness, and the correlation coefficient between the values of IL and the performance of the search algorithms is much lower than that of MD. Moreover, in predicting irrelevant deceptive problems, MD is always more accurate than IL. In general, MD performs very well on the three classes of counterexamples for other difficulty measures: it is insensitive to nonlinear scaling, which is consistent with the behaviour of GAs with comparison-based selection; it can detect the presence of constantness; and it is robust to irrelevant deceptive information.

### 6.4. Counterexamples for Motif Difficulty

In spite of the success that MD achieved in the above experiments, MD still has limitations. In this section, three counterexamples for MD, namely RIDGE, NIAH, and RAND, are presented. The global optima of NIAH and RAND are selected uniformly at random from the search space. For RAND, the fitness value of the global optimum is set to 2^{n} while those of all other configurations are chosen uniformly at random from [1, 2^{n}−1]. The values of the three predictive measures and the performance of the four search algorithms are given in Table 2.

The performance of the four algorithms on RIDGE confirms previous discussions (Jansen, 2001; Borenstein, 2008); that is, RIDGE is an easy problem for a hill climber, but a simple GA with uniform crossover and mutation has difficulty solving it. However, MD (0.9971) indicates that RIDGE is an extremely difficult problem, and thus fails to predict the performance of (1+1)EA and LS. As discussed in Section 5.1, the guide motifs in the DFLN for RIDGE are relatively few and are distributed only in a narrow region, so RIDGE appears difficult at both the node level and the network level, and consequently in MD, which combines the two statistics. The GA fails to solve RIDGE mostly because binary tournament selection regularly loses individuals from the ridge. Thus, the correct prediction of MD for GA_M and GA_MC is mainly due to the large population size used here. RIDGE clearly exposes a limitation of MD: when a class of distance motifs in a DFLN is small in number and narrow in spatial distribution, MD may not predict the difficulty correctly for (1+1)EA, local search, or GAs with small populations, since MD is based on two statistics over the whole network.

For NIAH and RAND, MD shows that they have similar difficulty, being neither easy nor difficult. However, Droste et al. (2006) already showed that NIAH is extremely difficult for black box algorithms. As discussed in Section 5.1, most motifs in NIAH are neutral; that is, neither guide nor deceptive information exists. In fact, when the search space is large, the lack of guide information presents a fatal problem for black box algorithms. Although the number of guide motifs is similar to the number of deceptive motifs for RAND, they are distributed randomly, and no explicit guide or deceptive information is present at all. Thus, another limitation of MD concerns situations in which neither guide nor deceptive information is explicitly present.

## 7. Experiments on Approximate Motif Difficulty

If a measure is computed on a sample of the search space, its values are called approximate (Naudts and Kallel, 2000). Since exhaustive computation on whole networks quickly becomes impractical, good sampling techniques on networks are required. In this section, we first propose a sampling technique for calculating the approximate MD, and then give the approximate MD for problems in Section 6. Additionally, since the search spaces of all problems in Section 6 are binary, MKPs are also used to validate the performance of MD on FLNs with different topologies.

### 7.1. Sampling Technique for Approximate MD

Clearly, the computational cost lies in finding the sequence of nodes that connects any given pair of nodes. Since this sequence depends on the neighbourhood structure, the method to find it depends on the operator. For the 1-bit flip operator, the simplest method is to first find the bits that differ between the two nodes and then change their values one by one. For the swap operator under permutation encoding, one method is to perform the following operation for each position at which the two nodes differ, until the two nodes are the same. Suppose *x* = (*x_1*, …, *x_n*) and *y* = (*y_1*, …, *y_n*) are the two nodes, and *x_i* ≠ *y_i*. If *x_j* = *y_i*, then swap *x_i* and *x_j*. Note also that this sequence may not be unique for a given pair of nodes; the sampling technique needs only one.
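The two path-finding procedures described above can be sketched as follows. Nodes are represented as Python lists, and the function names `bit_flip_path` and `swap_path` are illustrative; the order in which differing positions are processed (left to right) is an assumption, since any order yields a valid sequence.

```python
def bit_flip_path(x, y):
    """Sequence of nodes from x to y where consecutive nodes differ in one bit."""
    current = list(x)
    path = [current[:]]
    for i in range(len(current)):
        if current[i] != y[i]:
            current[i] = y[i]  # flip the differing bit
            path.append(current[:])
    return path

def swap_path(x, y):
    """Sequence of permutations from x to y where consecutive nodes differ
    by one swap of two items."""
    current = list(x)
    path = [current[:]]
    for i in range(len(current)):
        if current[i] != y[i]:
            # Find the position j holding the item that y has at position i.
            j = current.index(y[i])
            current[i], current[j] = current[j], current[i]
            path.append(current[:])
    return path

if __name__ == "__main__":
    for node in bit_flip_path([0, 1, 1, 0], [1, 1, 0, 0]):
        print(node)
    for node in swap_path([0, 1, 2, 3], [3, 0, 1, 2]):
        print(node)
```

Processing the differing positions in a different order gives another valid sequence between the same pair of nodes, which is why the sequence is not unique.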

### 7.2. Experiments on Approximate MD

In the following experiments, the search space consists of binary strings of length 32, the sample size is 1,000, and the values of approximate MD are averaged over 10 independent samplings. Table 4 lists the average approximate MD and standard deviations for the five fitness functions in Section 6. As can be seen, approximate MD is consistent with MD for these five functions. Moreover, the standard deviations are quite small.

Table 4: Exact MD, average approximate MD, and standard deviations for the five fitness functions.

| Fitness functions | ONEMAX | LEADING_ONES | NIAH | RAND | RIDGE |
|---|---|---|---|---|---|
| MD | −0.9999 | −0.6167 | −0.0009 | −0.1110 | 0.9971 |
| Approximate MD | −0.9997 | −0.5198 | 0 | −0.2303 | 0.9998 |
| Standard deviation | | | | | |

To further illustrate the performance of approximate MD in testing gradual changes in difficulty, the trap function and the constantness problems are used, where *i* varies from 1 to 32, and the *x* axis of the constantness problems covers all 21 parameter pairs; that is, (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 4), (4, 5), and (5, 5). The experimental results are shown in Figure 4, which confirms that the sampling technique not only has the capability to reflect gradual changes in difficulty, but can also detect the presence of constantness in landscapes. Moreover, the standard deviations are very small, which illustrates that approximate MD is stable.

### 7.3. Multidimensional Knapsack Problems

To validate the performance of MD on FLNs with different topologies, we analyse the effect of two representations on the difficulty of MKPs. The first is the binary representation, where each candidate solution is represented by a characteristic bit vector and each bit is mapped to an item. A bit set to 1 indicates that the corresponding item is packed into the knapsack. For this representation, we use the 1-bit flip operator to build FLNs, and the fitness of each configuration is calculated by the penalty function () in Gottlieb (2001).

The second is the permutation representation, where each configuration is represented by a permutation of all items. For this representation, we use the swap operator to build the FLNs; that is, two different items are first selected from the permutation and then swapped. Since the global optimum is represented as a binary string and the computational cost of identifying the permutations corresponding to the global optimum is high, the shortest path length cannot be used as the distance. However, the fitness of each configuration needs to be calculated by decoding the permutation into a binary string using a first-fit heuristic (Tavares et al., 2008), so we use the Hamming distance between the corresponding decoded binary strings to form distance motifs. This is also the method used to calculate the distance in Tavares et al. (2008). The MKPs are selected from OR-Library.^{1} The experimental results are shown in Table 5.
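A minimal sketch of the decoding step is given below. It assumes that the first-fit heuristic scans the items in permutation order and packs each item that still fits within every capacity constraint; the data layout (`weights[k][item]` for constraint `k`) and all function names are illustrative assumptions rather than the exact procedure of Tavares et al. (2008).

```python
def first_fit_decode(permutation, weights, capacities):
    """Decode a permutation of item indices into a characteristic bit vector.

    weights[k][item] is the weight of `item` in constraint k, and
    capacities[k] is the capacity of constraint k.
    """
    packed = [0] * len(permutation)
    load = [0] * len(capacities)
    for item in permutation:
        fits = all(load[k] + weights[k][item] <= capacities[k]
                   for k in range(len(capacities)))
        if fits:
            packed[item] = 1
            for k in range(len(capacities)):
                load[k] += weights[k][item]
    return packed

def profit(packed, profits):
    """Total profit of the packed items."""
    return sum(p for p, bit in zip(profits, packed) if bit)

def hamming_distance(a, b):
    """Hamming distance between two decoded bit vectors."""
    return sum(u != v for u, v in zip(a, b))

if __name__ == "__main__":
    weights = [[2, 3, 4], [1, 1, 1]]  # two constraints, three items
    capacities = [5, 2]
    profits = [10, 7, 12]
    a = first_fit_decode([0, 1, 2], weights, capacities)
    b = first_fit_decode([2, 0, 1], weights, capacities)
    print(a, profit(a, profits))
    print(b, profit(b, profits))
    print(hamming_distance(a, b))
```

Because every decoded vector is feasible by construction, no penalty term is needed for this representation, and the distance between two permutations is measured on their decoded bit vectors.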

Table 5: Approximate MD and average NFFEs of (1+1)EA and GA_M for the two representations on MKPs.

| Instance number | *n* | *m* | Representation | Approximate MD | (1+1)EA | GA_M |
|---|---|---|---|---|---|---|
| 1 | 28 | 10 | Binary | | 219,093 | 391,745 |
| | | | Permutation | | 1,011 | 3,147 |
| 2 | 39 | 5 | Binary | | 499,631 | 499,917 |
| | | | Permutation | | 265,997 | 247,055 |
| 3 | 50 | 5 | Binary | | 497,970 | 500,000 |
| | | | Permutation | | 72,116 | 188,585 |
| 4 | 60 | 5 | Binary | | 225,695 | 500,000 |
| | | | Permutation | | 145,309 | 398,969 |
| 5 | 60 | 30 | Binary | | 484,465 | 500,000 |
| | | | Permutation | | 147,402 | 148,265 |

As can be seen, the values of approximate MD for the permutation representation are always smaller than those for the binary representation. This confirms the results in Tavares et al. (2008); that is, the permutation representation with a first-fit heuristic is better than the binary representation. To further illustrate the difference between these two representations, (1+1)EA and GA_M are used to solve these problems. For the permutation representation, the 1/*n* mutation operator is replaced by the swap operator. The maximum NFFEs is set to 500,000, and other parameters are the same as those of the above experiments. The results are given in Table 5. For the five problems, the average NFFEs for the permutation representation is always smaller than that for the binary representation, which is consistent with the results of approximate MD.

## 8. Discussion and Conclusion

In this study, we proposed a new predictive difficulty measure using network motifs. Based on properties of DFLNs, distance motifs were first designed, and three classes of distance motifs were proposed based on their contributions to the search process of EAs. The new difficulty measure, namely MD, was defined by synthesising the features of these three classes, namely guide, deceptive, and neutral motifs. A sampling technique for computing approximate MD was also designed. Extensive experiments were conducted to validate the performance of both exact and approximate MD. The experimental results showed that MD is consistent with previous results on problem difficulty in terms of isolation, deception, multimodality, gradual changes of problem difficulty, and the effect of different representations on the difficulty of MKPs, and that it also works very well on three counterexamples for other difficulty measures. MD has the advantages of being insensitive to nonlinear scaling, detecting the presence of constantness, and being robust to irrelevant deceptive information.

The comparison with two other measures, namely FDC and IL, also illustrated the good performance of MD. First, FDC failed on scaling problems and irrelevant deceptive problems, while both MD and IL succeeded. Second, although the performance of MD and IL is competitive, MD is more accurate than IL for constantness problems and irrelevant deceptive problems, since its correlation coefficient is much higher than that of IL. The limitations of MD were also discussed. First, because MD is based on two statistics over the whole network, it may fail to predict the performance of (1+1)EA, local search, or GAs with small populations when a class of distance motifs in a DFLN is small in number and narrow in spatial distribution. Second, MD has difficulty with situations in which neither guide nor deceptive information is explicitly present.

It is well known that fitness landscapes can be related to networks, but few studies have been done to design predictive difficulty measures using network properties. Since network motifs are basic building blocks of complex networks, problem difficulty is expected to be related to these building blocks, and our work confirmed this. In fact, network motifs can be seen as the features of problems, and problem difficulty is determined by these features. Thus, to predict problem difficulty is equivalent to learning the relationship between these features and problem difficulty, which can be seen as a machine learning problem. This study represents our attempt toward finding such relationships in a simple way.

More work remains to be done. First, taking global optima as references is impractical for real applications; thus, other reference points need to be investigated. Second, the current version of MD places constraints on the operator used to build FLNs. That is, the operator should be unary and have a small neighbourhood, and the probability of converting one node to each of its neighbours should be the same. Since edges in DFLNs are created using only the relative difference between the fitness values of the two end nodes, MD is suitable only for analysing comparison-based search algorithms that use an operator satisfying the above conditions. Another drawback of MD is that the ideal distance, namely the shortest path length, may not work for some encoding methods, and we had to find other ways to calculate the distance. Moraglio and Poli (2004) and Moraglio (2007) assigned distances to search operators following the geometric framework, in which both mask-based crossovers and bitwise mutations for binary strings are associated with the Hamming distance. That work could help MD overcome its current limitations on operators and distance, and exploring it is part of our future work. Studying problem difficulty is only a step toward the design of more efficient algorithms. We thus hope that MD will be useful in the design of efficient search techniques or operators.

## Acknowledgments

This work is partially funded by an Australian ARC Discovery Project Grant. The authors would like to thank the reviewers for their helpful comments and valuable suggestions.