Abstract

This article investigates Gray Box Optimization for pseudo-Boolean optimization problems composed of M subfunctions, where each subfunction accepts at most k variables. We refer to these as Mk Landscapes. In Gray Box Optimization, the optimizer is given access to the set of M subfunctions. We prove that Gray Box Optimization can efficiently compute hyperplane averages and use them to solve non-deceptive problems in polynomial time. Bounded separable problems are also solved in polynomial time, using only a single evaluation. As a result, Gray Box Optimization is able to solve many commonly used problems from the evolutionary computation literature in polynomial time. We also introduce a more general class of Mk Landscapes that can be solved using dynamic programming and discuss properties of these functions. For certain types of problems, Gray Box Optimization makes it possible to enumerate all local optima faster than brute force methods. We also provide evidence that randomly generated test problems are far less structured than those found in real-world problems.

1  Introduction

Pseudo-Boolean optimization problems correspond to a class of functions that use bit strings as the natural representation and can output any real value. The class of k-bounded pseudo-Boolean optimization problems refers to problems that restrict the nonlinearity of the function to at most order k interactions. Let X denote the set of binary strings of length n which constitutes the search space. These k-bounded pseudo-Boolean functions can be expressed in the following form:
f(x) = \sum_{i=1}^{M} f_i(mask_i(x))        (1)
where mask_i selects the k bits that are extracted from the bit string; these k bits are then used to evaluate the subfunction f_i. In practice, the mask can be dropped inasmuch as the mask is inherent in the definition of subfunction f_i. This class of problems includes MAX-kSAT problems, NK Landscapes (from theoretical biology), and spin glass problems (from physics). Thus, this represents an important and widely studied class of optimization problems. Furthermore, just as every SAT problem can be expressed as a MAX-kSAT problem, every pseudo-Boolean function with an algebraic evaluation function can be expressed as a quadratic pseudo-Boolean function (Boros and Hammer, 2002). This means that virtually every benchmark problem found in the optimization literature that uses a bit representation can be expressed as a k-bounded optimization problem.
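Equation (1) can be sketched directly in code. The following is a minimal illustration and not from the article; names such as `evaluate_mk`, `masks`, and `subfns` are assumptions of the sketch.

```python
# Minimal sketch of Equation (1): an Mk Landscape is a sum of M subfunctions,
# each reading at most k bits selected by its mask. Illustrative names only.

def evaluate_mk(x, masks, subfns):
    """x: list of 0/1 bits; masks[i]: the indices of the (at most k) bits
    read by subfunction i; subfns[i]: maps that bit tuple to a real value."""
    return sum(f(tuple(x[j] for j in mask)) for mask, f in zip(masks, subfns))

# Toy instance with n = 4, M = 3, k = 2:
masks = [(0, 1), (1, 2), (2, 3)]
subfns = [lambda b: b[0] * b[1],      # nonlinear pair
          lambda b: b[0] + b[1],      # purely linear
          lambda b: 1 - b[0] * b[1]]
value = evaluate_mk([1, 1, 0, 1], masks, subfns)  # 1 + 1 + 1
```

A Gray Box optimizer is given exactly these masks and subfunctions; a Black Box optimizer sees only the summed return value.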

This article will use the term “Mk Landscape” to refer to any k-bounded pseudo-Boolean optimization problem. M refers to the number of subfunctions in Equation (1), and k is a constant that provides an upper bound on the number of variables that can be passed as input to any one subfunction. M is polynomial in n, the number of variables, since any k-bounded pseudo-Boolean function can be written such that M = O(n^k). In many cases, M = O(n). In some ways, this is a new term for an old concept. The class of spin glass problems is also general enough to refer to any k-bounded pseudo-Boolean optimization problem (Young, 1984). Furthermore, Heckendorn (2002) has already used the label “embedded landscapes” for the same class of problems.

So why introduce a new label? One reason is that while NK Landscapes have become popular as test functions, they impose unnecessary restrictions on the structure of the functions. In an NK Landscape, M = N and k = K + 1 (which is confusing), and variable x_i must appear in subfunction f_i. None of these requirements turns out to be important to the most fundamental theoretical properties of NK Landscapes. Furthermore, the Mk notation is much closer in form to the range of problems seen in MAX-kSAT. For example, random MAX-3SAT problems are characterized by a phase transition at a clause-to-variable ratio of M/n ≈ 4.27; below this phase transition, most random MAX-3SAT problems are satisfiable, and above this phase transition, most problems are unsatisfiable (Cheeseman et al., 1991). Furthermore, every NK Landscape can be converted into an Mk Landscape where M = N and k = K + 1, but the resulting Mk Landscape does not always meet the definition of an NK Landscape. By transferring what we know about NK Landscapes to Mk Landscapes, we remove unnecessary restrictions and create new connections to problems such as MAX-kSAT and spin glass problems.

Another theme of this article is that Gray Box Optimization methods can be used to analyze problem structure.

Definition 1 (Gray Box Optimization for Mk Landscapes).

A Gray Box Optimization algorithm for Mk Landscapes is one that is given access to the individual subfunctions that are summed in Equation (1).

The optimizer does not need to know the specific application. The application could be an NK Landscape or a MAXSAT problem for encryption or AI Planning. But problem structure can still be extracted from Equation (1). For example, it becomes immediately clear if the problem is separable. From a Gray Box optimization point of view, an enormous amount of useful information can be gained with little or no loss of generality.

Understanding problem structure is also closely related to a fundamental question that should be of paramount importance to all researchers in the evolutionary algorithms community: What problems should we try to solve with Evolutionary Algorithms and what problems should be solved with other methods?

Black Box complexity has become a popular line of research in the Evolutionary Algorithms community. But much of this work focuses on the runtime analysis of relatively simple algorithms such as, for example, the (1 + 1) Evolution Strategy. Furthermore, often this analysis is done on problems in the complexity class P. Obviously, in most cases random stochastic search algorithms should not be applied to problems in the complexity class P. The runtime analysis might be relevant if it is modeling some natural process that is inherently stochastic. But then we should also expect to know details of that natural process.

This article makes several fundamental contributions. First, in Section 3 we show that the concept of deception is not just relevant to the search behavior of a simple genetic algorithm (Whitley and Das, 1991). It is also a provable requirement for the class of Mk Landscape problems (and the class of MAX-kSAT problems) to be NP Hard. The concept of deception is closely related to hyperplane statistics; basic statistics about hyperplanes for every Mk Landscape can be computed in polynomial time. Therefore, deception is not related to any specific algorithm. In fact, statistical summaries over the order 1 hyperplanes have long been used in the SAT community as a heuristic to guide search.

Second, in Section 4 we generalize the idea that certain Mk Landscapes can be solved using dynamic programming, and thus fall into the complexity class P. Currently, dynamic programming is used to find the global optimum of “Adjacent NK Landscapes.” We prove that every Adjacent NK Landscape with the normal wrapping interactions, bounded by k = K + 1 variables per subfunction, can always be converted into another Adjacent NK Landscape that does not wrap and is bounded by at most 2k − 1 variable interactions. This is important because Adjacent NK Landscapes are still being used to test the effectiveness of a wide range of optimization methods, including Estimation of Distribution Algorithms (EDAs).

We introduce a new kind of polynomial-time solvable Mk Landscape, the Tree Decomposition Mk Landscape. This represents a more general class of Mk Landscapes that can be solved in polynomial time by the dynamic programming algorithm developed by Hammer et al. (1963). In Section 4.3 we also show that even “easy” problems that have a polynomial time solution can display significant amounts of deception.

In Section 5, results are presented analyzing new move selection methods for local search. For Mk Landscapes, we never need to use mutation operators to find improving moves. The location of improving moves can be calculated deterministically. Section 5.2 presents a proof that improving moves in the Hamming distance 1 neighborhood can be found in constant time on average. Section 5 also discusses the number of local optima found in certain Mk Landscapes, as well as methods that can enumerate all of the local optima in larger problems (e.g., N = 60).

Finally, Section 6 shows that for application based MAX-kSAT problems there can be a great deal of problem structure, including a large amount of problem symmetry for certain problems. MAX-kSAT problems can be expressed as CNF clauses, or as DNF functions. For industrial problems, the DNF functions can be an order of magnitude more compact than the CNF clause representation. This should also be taken as a warning. Mk Landscapes (e.g., randomly generated NK Landscapes) can also be too random. Real-world problems typically display a significant amount of problem structure and the most successful search algorithms for real-world problems are likely to be those that effectively exploit that problem structure.

2  Pseudo-Boolean Functions

Note that N = n in this article; either notation will be used due to the use of different notations in different communities. However, k = K + 1. Our notation in this section follows the Boros and Hammer (2002) tutorial on pseudo-Boolean functions.

Let n denote a positive integer, and let V = {1, 2, ..., n} denote a set of variable indices. All pseudo-Boolean functions can be uniquely represented as multi-linear polynomials:

f(x) = \sum_{S \subseteq V} c_S \prod_{i \in S} x_i

where c_S is a constant weight associated with subset S ⊆ V.
Every pseudo-Boolean function minimization problem with an algebraic form can be converted into a quadratic pseudo-Boolean function. A key insight to this result can be seen by making the following observation (Boros and Hammer, 2002). Assume x, y, and z are Boolean variables.
xy - 2xz - 2yz + 3z \geq 0, \quad \text{with equality if and only if } z = xy

The new variable z can act as an auxiliary variable that replaces the product xy. This means pairs of nonlinearly interacting variables can be replaced by a single auxiliary variable, along with additional quadratic expressions (subfunctions) that act as constraints to guarantee the new problem has the same global optima as the original problem. The reduction is computed in polynomial time. This is analogous to the widely known result that every SAT instance can be reduced to a MAX-kSAT problem (Cormen et al., 1990).
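The role of the penalty term behind this reduction can be checked by enumeration. The sketch below assumes the standard Boros–Hammer penalty p(x, y, z) = xy − 2xz − 2yz + 3z; the function names are illustrative.

```python
# The penalty p(x, y, z) = xy - 2xz - 2yz + 3z vanishes exactly when the
# auxiliary variable z equals the product xy, and is strictly positive
# otherwise, so minimizing f + C * p (for large C) forces z = xy.
from itertools import product

def penalty(x, y, z):
    return x * y - 2 * x * z - 2 * y * z + 3 * z

checks = {(x, y, z): penalty(x, y, z) for x, y, z in product((0, 1), repeat=3)}
# The penalty is zero if and only if z == x * y, and never negative:
consistent = all((p == 0) == (z == x * y) for (x, y, z), p in checks.items())
nonnegative = all(p >= 0 for p in checks.values())
```

Because the penalty is nonnegative and vanishes only on consistent assignments, adding it to a minimization objective cannot create new global optima.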

It should be noted that while the new k-bounded optimization problem is guaranteed to have the same global optima as the original problem, the original problem and the transformed problem typically do not induce exactly the same search space. This may also impact what it means for a solution to be “near optimal” or “good enough” to be useful.

3  A Proof that the Only Challenging Mk Landscapes Are Deceptive

Some problems are inherently easy to solve under Gray Box Optimization. And yet, we often see problems that are easy to solve being used as benchmark functions to evaluate new search algorithms. This almost always means that problem structure is being ignored or that the search method is exploiting problem structure found in simpler problems that may not be present in more difficult NP-hard problems. In this section we begin by reconsidering the concept of “deception.”

3.1  Walsh Analysis and Hyperplanes

We can transform the evaluation function of any Mk Landscape into a polynomial form by applying a discrete Fourier transform, which is also known as a Walsh transform and denoted here by W.
f(x) = \sum_{i=0}^{2^n - 1} w_i \, \psi_i(x), \quad \text{where } \psi_i(x) = (-1)^{bc(i \wedge x)}        (2)

The Walsh transform can be normalized in various ways, or used unnormalized when subfunctions have exactly k variables. If the nonlinearity of a pseudo-Boolean function is unbounded, the Walsh transform has exponential cost (Goldberg, 1989a). However, for an Mk Landscape there are at most 2^k coefficients per subfunction and at most M 2^k coefficients in the Walsh polynomial. In general, because M is a polynomial function of n, the Walsh analysis can be done in polynomial time for Mk Landscapes.
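A brute-force sketch of this bookkeeping, with illustrative names: each k-variable subfunction has at most 2^k Walsh coefficients, computable by enumerating its 2^k inputs. The normalization by 2^k used here is one common convention among several.

```python
# Brute-force Walsh transform of one k-variable subfunction: O(4^k) work,
# at most 2^k coefficients. Normalization by 2^k is one common convention.
from itertools import product

def walsh_coefficients(f, k):
    """Return {i: w_i} using w_i = 2^{-k} * sum_x f(x) * (-1)^{bc(i AND x)},
    where x runs over all k-bit inputs and bc counts shared 1 bits."""
    coeffs = {}
    for i in range(2 ** k):
        total = 0.0
        for bits in product((0, 1), repeat=k):
            x = sum(b << p for p, b in enumerate(bits))
            total += f(bits) * (-1) ** bin(i & x).count("1")
        coeffs[i] = total / 2 ** k
    return coeffs

# Example: f(x0, x1) = x0 * x1 has four nonzero coefficients.
w = walsh_coefficients(lambda b: b[0] * b[1], 2)
```

For constant k this per-subfunction cost is constant, which is why the full Mk Landscape transform is polynomial in n.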

The Walsh coefficients can be used to efficiently compute the average evaluation of solutions contained in any (n − j)-dimensional hyperplane; an (n − j)-dimensional hyperplane is also known as an order j hyperplane (Rana et al., 1998; Goldberg, 1989a). Let h denote an order j hyperplane where j variables have preassigned bit values, and let m_h be a mask with 1 bits marking the locations where the j variables appear in the problem encoding; m_h has 0 bits elsewhere. Let solution x assign values to the j variables. Then the mean fitness of hyperplane h is

\overline{f}(h) = \sum_{y : y \wedge m_h = y} w_y \, \psi_y(x)
The notation bc(y \wedge x) is a concise way to count the number of 1 bits that bit strings y and x share in common; \psi_y(x) = (-1)^{bc(y \wedge x)} is the “Walsh function,” also sometimes written \psi(y, x), that determines the sign of the Walsh coefficient. The mean for a single order 1 hyperplane therefore can be computed in constant time:

\overline{f}(h) = w_0 + \psi_b(x) \, w_b        (3)

where b is the index with a single 1 bit marking the position associated with hyperplane h, and \psi_b(x) determines the sign according to the bit value that hyperplane h assigns in that position.
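Given the Walsh coefficients, Equation (3) is a constant-time lookup. A minimal sketch with illustrative names; the coefficients below correspond to the two-bit function f(x_0, x_1) = x_0 x_1, computed by hand.

```python
# Equation (3) as a constant-time lookup: the mean of an order 1 hyperplane
# is w_0 plus the signed order 1 coefficient. Illustrative names only.

def order1_mean(w, bit_index, bit_value):
    """Mean fitness of the hyperplane fixing bit `bit_index` to `bit_value`.
    w maps Walsh indices to coefficients; the order 1 index for the bit is
    1 << bit_index, and psi = (-1)^{bit_value} supplies the sign."""
    psi = -1 if bit_value == 1 else 1
    return w[0] + psi * w[1 << bit_index]

# Hand-computed coefficients for the two-bit function f(x0, x1) = x0 * x1:
w = {0: 0.25, 1: -0.25, 2: -0.25, 3: 0.25}
m1 = order1_mean(w, 0, 1)  # mean over solutions with x0 = 1
m0 = order1_mean(w, 0, 0)  # mean over solutions with x0 = 0
```

Here m1 = 0.5 and m0 = 0.0, matching the direct averages over the two solutions in each hyperplane.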

Assume all of the means for all hyperplanes up to order j have been computed, where j is a constant. Let best(j, mask) return the hyperplanes with the best mean over all of the order j hyperplanes that can be defined using the mask; note that best(j, mask) can return more than one hyperplane in case of ties. The j bit values returned by hyperplanes in best(j, mask) represent the bits that match those in the hyperplanes with the maximal mean over the order j set defined over the positions marked by the mask.

Definition 2 (Order j Deceptive).

Let bits(mask, x) return the bit assignment found in string x in the positions selected by mask. A pseudo-Boolean function f is order j deceptive if at least one of the following conditions occurs:

  • there is a hyperplane h of order j′ ≤ j such that best(j′, mask_h) contains more than one hyperplane (there is a tie), or

  • for every global optimum x* of f, there exists at least one hyperplane h of order j such that bits(mask_h, x*) does not match the bit assignment of the hyperplanes in best(j, mask_h).

Under this definition, if a function is not deceptive at any order, there exists a specific global optimum, x*, such that for all hyperplanes h, bits(mask_h, x*) matches the assignment returned by best. Note that there can be more than one global optimum, but all hyperplanes point to only one specific global optimum.

Theorem 1.

Every Mk Landscape that is not deceptive at any order can be exactly solved in polynomial time by a Gray Box Optimizer.

Proof.

If an Mk Landscape is not deceptive at any order, then there exists a specific global optimum x* that matches the best hyperplane in every hyperplane competition. The Walsh decomposition in Equation (2) yields a Walsh polynomial with at most M 2^k coefficients when k is a constant. M is polynomial in size with respect to n. It is sufficient to compute the means of the n order 1 hyperplanes to identify x*; this can be done using Equation (3). Equation (3) executes in constant time for a single order 1 hyperplane, and in O(n) time for the n order 1 hyperplanes.

In general, we do not know if a problem is deceptive or not. However, computing the solution that is consistent with the n best order 1 hyperplanes is inexpensive and the solution can provide a very good initial solution even if the problem is deceptive.

We can, of course, reverse engineer problems that are not deceptive. One easy way to construct a problem that is not deceptive is to pick a global optimum (e.g., a string of 1 bits of length n) and then construct an Mk Landscape such that each subfunction is optimal using a bit pattern that matches the global optimum (e.g., a string of 1 bits of length k). Finally, by construction, we can build each subfunction by selecting Walsh coefficients so that the bits returned by best match the bits found in a target pattern corresponding to a global optimum.

Holland (2000) has similarly suggested how to construct “Hyperplane Defined Functions.” But Hyperplane Defined Functions are also easy to solve if they are Mk Landscapes and we explicitly know they are not deceptive above some order j; as long as j is reasonably small, we can exploit the hyperplane mean calculations to identify a global optimum.

There are also simple problems that are not k-bounded, but nevertheless are still not deceptive. The “Leading Ones” problem is an example (Jansen, 2013). It is easy to prove mathematically that best will always return a 1 bit for all of the order 1 hyperplanes of the maximization form of the Leading Ones problem. A proof that “Leading Ones” is not deceptive is given by Whitley (2015). In this case, the order 1 hyperplanes can be sorted from those that display the greatest difference in means to those that display the least difference. None are deceptive, but it is easier to detect (using probabilistic methods) the differences in those hyperplanes where the differences are greatest. This creates a kind of “staircase” effect for stochastic search methods: if they are able to make a decision about one variable at a time, the path to the global optimum becomes clearer after each variable assignment.

3.2  MAX-SAT and Deception

Since classic NP-hard problems such as MAX-3SAT are a form of Mk Landscape, hyperplane statistics that can be computed in polynomial time cannot provide a reliable tool for locating globally optimal solutions unless P = NP.

Håstad (1997) has shown that in the worst case one cannot guarantee even an above average solution to a MAX-SAT instance unless P = NP. By extension, in the worst case, hyperplane statistics cannot be used to guarantee even an above average solution to an arbitrary Mk Landscape. Thus, while order j hyperplane statistics are relatively inexpensive to compute, they cannot be used to reliably guide search in the worst case. This is of course bad news, but not new news (Rana et al., 1998; Rana and Whitley, 1998a), for theories from before 1999 that characterized genetic algorithms as hyperplane sampling algorithms. It should be pointed out that the worst case behavior outlined by Håstad applies equally to methods such as WalkSAT, which are quite good in practice.

All nontrivial MAX-kSAT problems are deceptive. The “clause count” compares the number of times a variable appears as a positive literal to the number of times it appears as a negated literal across all Conjunctive Normal Form (CNF) clauses. Modern exact solvers, such as MiniSat, use a “polarity vector” that guides the initial assignment of variables (Een and Sorensson, 2015). One way the polarity vector can be set is using the clause count. Observations from Rana et al. (1998) can be used to compute order 1 Walsh coefficients from the clause counts.

The following represents an example of 3 SAT clauses in CNF form cast as a MAX-3SAT problem:

(x_1 \lor \neg x_2 \lor x_3) \wedge (\neg x_1 \lor x_2 \lor x_3) \wedge (x_1 \lor x_2 \lor \neg x_3)

For MAX-SAT, the clauses operate as functions that return 1 if they are satisfied, or 0 if they are not satisfied. Thus, a satisfying assignment over all clauses (if one exists) will return a value of M. Note that for each clause there are 8 possible assignments to its 3 variables, and all but 1 of these 8 assignments will satisfy the clause.

Let neg(c_i) return a k-bit string with 1 bits indicating which variables in clause c_i are negated. For example, for clause c_i = (x_1 \lor \neg x_2 \lor x_3), neg(c_i) = 010.

Then the Walsh coefficients for fi are:
w_0 = 1 - \frac{1}{2^k}, \qquad w_j = -\frac{1}{2^k} \, \psi_j(neg(c_i)) \;\; \text{for } j \neq 0

Thus, if neg(c_i) returns a 1 bit in the same position as the single 1 bit of index j of an order 1 hyperplane, then \psi_j(neg(c_i)) returns a negative sign and the order 1 Walsh coefficient is positive; otherwise the order 1 Walsh coefficient is negative.

If all clauses have the same number of variables k, then every order 1 Walsh coefficient contribution has the same magnitude, 1/2^k, across all variables. In that case, each time a variable is negated it contributes +1/2^k to that variable's order 1 Walsh coefficient, and each time it appears positive it contributes −1/2^k. Thus, hyperplane averages can be computed directly from the clause count, and the sign of the hyperplane average directly correlates with whether the variable appears more often in negated form or in positive form.
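A small sketch of the clause-count heuristic described above, assuming a DIMACS-style encoding (a positive integer v stands for x_v, a negative integer for its negation), which is an assumption of the sketch rather than notation from the article:

```python
# Clause-count polarity sketch. Literals use DIMACS-style signed integers
# (an assumption of this sketch): v stands for x_v, -v for its negation.

def polarity_from_clause_counts(clauses, n):
    """Return a polarity vector: entry v-1 is 1 if variable v appears more
    often positive than negated, and 0 otherwise (ties broken toward 0)."""
    count = [0] * (n + 1)
    for clause in clauses:
        for lit in clause:
            count[abs(lit)] += 1 if lit > 0 else -1
    return [1 if count[v] > 0 else 0 for v in range(1, n + 1)]

# x1 and x3 appear mostly positive, x2 mostly negated:
clauses = [[1, -2, 3], [-1, -2, 3], [1, -2, -3]]
pol = polarity_from_clause_counts(clauses, 3)  # [1, 0, 1]
```

With fixed clause length, this vector identifies exactly the assignments favored by the best order 1 hyperplanes.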

Therefore, if a MAX-kSAT problem with fixed k across all clauses is not order 1 deceptive, then initializing the polarity vector using the clause count will guarantee an optimal solution in polynomial time because the clause count is a proxy for the best order 1 hyperplanes.

3.3  Bounded Separable Functions = Bounded Complexity

Definition 3.

An Mk Landscape is order k separable if every variable is restricted so that it appears in one and only one subfunction.

Theorem 2.

A global optimum of an order k separable Mk Landscape is found in one evaluation, in O(n) time, by a Gray Box Optimizer.

Proof.

Since each variable appears in only one subfunction, M ≤ n. Since each subfunction is independent from every other subfunction, each can be optimized in O(2^k) time by brute force enumeration; with k constant, this is O(n) time overall. The full evaluation function is executed only once.
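The proof of Theorem 2 translates directly into a short procedure; all names below are illustrative, and the toy subfunctions are a stand-in for separable trap-like functions:

```python
# Order k separable sketch: subfunctions have disjoint variables, so each
# is optimized independently by enumerating its 2^k settings (O(M * 2^k)).
from itertools import product

def solve_separable(masks, subfns, n):
    """masks[i]: variable indices of subfunction i (disjoint across i);
    subfns[i]: maps a bit tuple to a value. Returns a global optimum."""
    x = [0] * n
    for mask, f in zip(masks, subfns):
        best = max(product((0, 1), repeat=len(mask)), key=f)
        for j, b in zip(mask, best):
            x[j] = b
    return x

# Toy trap-like subfunctions: all 1s is optimal, the gradient favors 0s.
masks = [(0, 1), (2, 3)]
subfns = [lambda b: 2 if b == (1, 1) else 1 - sum(b)] * 2
x = solve_separable(masks, subfns, 4)  # [1, 1, 1, 1]
```

A Black Box optimizer must discover the variable partition; the Gray Box optimizer reads it directly from the masks.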

Trap functions (Deb and Goldberg, 1993) are often expressed as a form of order k separable Mk Landscape. In their separable form, they are often made up of M = n/k subfunctions, where each subfunction is defined over k variables. What makes these problems “challenging” is that each order k subfunction is constructed so that the lower order nonlinear interactions lead away from the order k solution that optimizes that particular subfunction. The construction is usually deceptive, such that (without loss of generality) the global optimum is the string of all 1 bits, but the gradient from all other points in the search space leads toward the next best local optimum at the string of all 0 bits. This induces low order hyperplane averages where solutions with 0 bits are better on average than solutions with 1 bits.

Under Black Box optimization, the optimizer does not automatically know which bits belong to a particular subfunction. However, for a Gray Box optimizer that has access to the definitions of the subfunctions, order k separable trap problems become easy to solve.

4  Solving Mk Landscapes Using Dynamic Programming

In this section, we consider Mk Landscapes where the global optimum can be found in polynomial time using Dynamic Programming. A higher level concern is whether Mk Landscapes that have a polynomial time solution can be reliably used to evaluate search algorithms that are designed to generate good solutions to problems that are NP-hard.

4.1  Adjacent Landscapes and Tree Decomposition

In a standard NK Landscape, subfunction f_i includes binary variable x_i as well as K other variables; thus there are k = K + 1 variables per function. Weinberger (1996) first showed that Kauffman's NK Landscape model is NP-complete when K ≥ 3.

However, in special cases, an NK Landscape can be optimized in polynomial time. In particular, Adjacent NK Landscapes are constructed so that the variables passed to each subfunction are contiguous. Thus, subfunction f_i takes as input the variables x_i to x_{i+K}. The standard definition of the Adjacent NK Landscape therefore assumes that the interactions wrap, so that bit x_N interacts with variables x_1 to x_K.

We will refer to the standard Adjacent NK Landscape as a Cyclic Adjacent NK Landscape. We can also define an Acyclic Adjacent NK Landscape where the variables do not wrap. We then relate both Cyclic and Acyclic Adjacent NK Landscapes and the concept of tree-width and tree decomposition. We first define a variable interaction graph for Mk Landscapes.

Definition 4.

A Variable Interaction Graph is a graph G(V, E) where the vertex set V corresponds to the set of variables in the Mk Landscape, and edge (x_i, x_j) ∈ E if and only if variables x_i and x_j have a nonlinear interaction.

We will use the following subfunctions to illustrate the concept of a variable interaction graph. In this case, n = 5, M = 3, and k = 3. The subfunctions are as follows:

f(x) = f_1(x_1, x_2, x_4) + f_2(x_2, x_3, x_5) + f_3(x_1, x_4, x_5)

This set of subfunctions results in the Variable Interaction Graph shown in Figure 1. This figure assumes that every subfunction induces a nonlinear relationship between all of the variables in the subfunction.

Figure 1:

The Variable Interaction Graph (VIG) tracks sources of nonlinearity.


Gao and Culberson (2003) use the name interaction graph for this concept, except they only require that two variables co-occur in some subfunction. This is also similar to the definition of a Variable Incident Graph used by the MAX-SAT community (Ansótegui et al., 2012).

Tinós et al. (2015) consider that two variables interact when there exists a nonzero Walsh coefficient that depends on both variables; that definition is used here inasmuch as the Walsh coefficient represents a nonlinear interaction. For randomly generated NK Landscapes and random MAX-SAT problems, often there is no difference between these definitions. However, one can construct problems where the difference can be dramatic; the difference can also be significant when a function with fewer variables is embedded in a function with a larger number of variables. An example of this kind of embedding is used to convert a Cyclic NK Landscape into an Acyclic NK Landscape (see below). But this kind of embedding should not change the Variable Interaction Graph.
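The difference between the two definitions can be made concrete. The sketch below (illustrative names, brute-force Walsh test) shows a linear two-bit subfunction whose variables co-occur but do not interact nonlinearly, so they gain an edge only under the co-occurrence definition:

```python
# Distinguishing co-occurrence from nonlinear interaction: variables i and j
# interact (Walsh sense) only if some nonzero coefficient depends on both.
from itertools import product

def nonlinear(f, k, i, j):
    """Brute-force test: does any nonzero Walsh coefficient of the k-bit
    function f have 1 bits in both positions i and j?"""
    for idx in range(2 ** k):
        if not (idx >> i & 1 and idx >> j & 1):
            continue
        w = sum(f(bits) * (-1) ** bin(idx & sum(b << p for p, b in enumerate(bits))).count("1")
                for bits in product((0, 1), repeat=k))
        if w != 0:
            return True
    return False

f_linear = lambda b: b[0] + b[1]     # co-occurrence, but no nonlinearity
f_product = lambda b: b[0] * b[1]    # genuine nonlinear interaction
edge_linear = nonlinear(f_linear, 2, 0, 1)    # no VIG edge
edge_product = nonlinear(f_product, 2, 0, 1)  # VIG edge
```

Only the product function induces an edge in the Variable Interaction Graph, while a co-occurrence graph would connect both pairs.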

We next define tree-width and tree decomposition. This definition is provided by Gao and Culberson (2003), who in turn cite the work of Kloks (1994).

Definition 5 (Kloks, 1994).

A tree decomposition of a graph G(V, E) is a pair (S, T), where S = {X_1, ..., X_m} is a collection of subsets of the vertices of G, and T is a tree with one node for each subset in S, such that:

  1. \bigcup_{i} X_i = V,

  2. for every edge (u, w) ∈ E there exists a subset X_i ∈ S such that both u and w are in X_i, and

  3. for each vertex v, the set of nodes of T whose subsets contain v forms a subtree of T.

The width of the tree decomposition is max_i |X_i| − 1. And the tree-width of a graph is the minimum width over all tree decompositions of the graph.

Hammer et al. (1963) proposed a dynamic programming algorithm to find the global optimum of pseudo-Boolean optimization problems in time that is exponential in t but linear in the number of variables, where t is the tree-width of the variable interaction graph of the optimization problem.

For the Acyclic Adjacent NK Landscape, the set of subfunctions forms a tree decomposition of tree-width K. This follows directly from the definition of Acyclic Adjacent NK Landscapes and the definition of tree decomposition. Because each subfunction becomes a node in the tree decomposition, each variable appears in a contiguous set of nodes (i.e., subfunctions) and thus forms a subtree of the tree decomposition; each node (i.e., subfunction) is of size K + 1, and the tree-width is K.

For example, let f be the sum of subfunctions that form an Acyclic Adjacent NK Landscape:
f(x) = f_1(x_1, x_2, x_3) + f_2(x_2, x_3, x_4) + f_3(x_3, x_4, x_5) + f_4(x_4, x_5, x_6)

The Variable Interaction Graph of f connects variable x_i with x_{i+1} and x_{i+2} when they exist (we assume nonlinearities between the pairs of variables in each subfunction). A tree decomposition of f would be (S, T), where S contains X_1 = {x_1, x_2, x_3}, X_2 = {x_2, x_3, x_4}, X_3 = {x_3, x_4, x_5}, and X_4 = {x_4, x_5, x_6}, and the edges in T are (X_1, X_2), (X_2, X_3), and (X_3, X_4). The tree decomposition is illustrated graphically in Figure 2.
Figure 2:

Example of tree decomposition for an Acyclic Adjacent NK Landscape. The tree-width is 2.


Lemma 1.

Every Cyclic Adjacent NK Landscape can be converted into an Acyclic Adjacent NK Landscape over the same N variables in which each subfunction is defined over at most 2k − 1 = 2K + 1 adjacent variables.

Proof.
We present a constructive proof. We collapse the cycle of variables into a sequence of variables that is then used to construct an equivalent Acyclic Adjacent NK Landscape. We assume (without loss of generality) that N is even. The sequence of variables in the new Acyclic Adjacent NK Landscape is given by

x_1, x_N, x_2, x_{N-1}, x_3, x_{N-2}, \ldots, x_{N/2}, x_{N/2+1}

This sequence can be covered by contiguous subsequences (windows) of length 2k − 1. Clearly every original subfunction of the form f_i(x_i, ..., x_{i+K}) lies within a subsequence of length 2k − 1 in the new variable ordering, since exactly one variable lies between each pair of (originally) adjacent variables: k variables with one interleaved variable between each adjacent pair span 2k − 1 positions. Each subsequence becomes a subfunction in the Acyclic Adjacent NK Landscape. Except at the two extreme ends of the sequence, each new subfunction in the new Acyclic Adjacent NK Landscape includes only the variables that map to exactly one of the original subfunctions. Enumerating the cases at the extreme ends of the sequence (where the wrapping occurs) completes the proof.

As an example, one can enumerate the subfunctions of a Cyclic Adjacent NK Landscape where N = 8 and K = 2 (so k = 3):

f(x) = f_1(x_1, x_2, x_3) + f_2(x_2, x_3, x_4) + f_3(x_3, x_4, x_5) + f_4(x_4, x_5, x_6) + f_5(x_5, x_6, x_7) + f_6(x_6, x_7, x_8) + f_7(x_7, x_8, x_1) + f_8(x_8, x_1, x_2)
These subfunctions can then be converted into the following Acyclic Adjacent NK Landscape over the ordering x_1, x_8, x_2, x_7, x_3, x_6, x_4, x_5. Note that this construction embeds subfunctions of 3 variables into subfunctions of 5 variables. However, no new nonlinearity is introduced and the Variable Interaction Graph remains the same under Definition 4.

g_1(x_1, x_8, x_2, x_7, x_3) = f_1(x_1, x_2, x_3) + f_7(x_7, x_8, x_1) + f_8(x_8, x_1, x_2)
g_2(x_8, x_2, x_7, x_3, x_6) = f_6(x_6, x_7, x_8)
g_3(x_2, x_7, x_3, x_6, x_4) = f_2(x_2, x_3, x_4)
g_4(x_7, x_3, x_6, x_4, x_5) = f_3(x_3, x_4, x_5) + f_4(x_4, x_5, x_6) + f_5(x_5, x_6, x_7)
The variables can be renamed so that adjacent variables have adjacent integer indices. For example, let z_1, ..., z_8 denote the renamed variables:

z_1 = x_1, \; z_2 = x_8, \; z_3 = x_2, \; z_4 = x_7, \; z_5 = x_3, \; z_6 = x_6, \; z_7 = x_4, \; z_8 = x_5

This construction has two useful implications. The subfunctions of an Acyclic Adjacent NK Landscape form a tree decomposition of width K (where K may be larger relative to some related Cyclic NK Landscape). From this result, we know that dynamic programming can be used to find the optimal solution of any Acyclic Adjacent NK Landscape in time linear in N and exponential only in K, and of any Cyclic Adjacent NK Landscape, after the conversion of Lemma 1, in time linear in N and exponential only in 2K + 1. Other tree decompositions exist which yield the same result.
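A minimal dynamic programming sketch for the acyclic case, sweeping left to right over the K-bit overlap between consecutive subfunctions; this is a generic chain DP under the stated adjacency assumptions, not the specific algorithm of Hammer et al. (1963):

```python
# Chain DP for an Acyclic Adjacent NK Landscape: subfunction i reads
# x_i..x_{i+K}; sweep left to right, keeping the best partial score per
# setting of the current K-bit overlap. O(N * 2^{K+1}) time overall.
from itertools import product

def dp_acyclic_adjacent_nk(subfns, n, K):
    """subfns[i]: maps a (K+1)-bit tuple (x_i, ..., x_{i+K}) to a value,
    for i = 0..n-K-1. Returns the maximum achievable sum."""
    assert len(subfns) == n - K
    # best partial score for each setting of the K most recent variables
    states = {bits: 0.0 for bits in product((0, 1), repeat=K)}
    for f in subfns:
        new_states = {}
        for prefix, score in states.items():
            for b in (0, 1):
                window = prefix + (b,)
                key, val = window[1:], score + f(window)
                if val > new_states.get(key, float("-inf")):
                    new_states[key] = val
        states = new_states
    return max(states.values())

# Maximizing three XOR subfunctions over 4 bits: alternating bits score 3.
best = dp_acyclic_adjacent_nk([lambda w: w[0] ^ w[1]] * 3, 4, 1)
```

For fixed K the state table has constant size, so the sweep is linear in N, matching the tree-width argument above.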

Wright et al. (2000) proposed a dynamic programming algorithm to find the global optimal solution in (cyclic) Adjacent NK Landscapes. Their algorithm also runs in time linear in N; for high values of K, however, it is slower than Hammer's algorithm. Others have previously observed that the Cyclic Adjacent NK Landscape has a tree-width of at most 2K (Gao and Culberson, 2003; Weinberger, 1996). Gao and Culberson (2003) also address the following question: What is the expected tree-width of a tree decomposition for Random NK Landscapes? Recall that in this case, the K additional variables that appear in subfunction f_i are randomly chosen. The variable interaction graph of Random NK Landscapes has bounded tree-width when K = 1 and, thus, can also be solved in polynomial time. In the case K ≥ 2, Gao and Culberson prove the following theorem:

Theorem 3 (Gao and Culberson, 2003).
Let w(N, K) be the tree-width of the interaction graph of the NK Landscape with random neighborhoods. For K ≥ 2, there is a fixed constant c > 0 such that

\lim_{N \to \infty} \Pr[\, w(N, K) > cN \,] = 1

This suggests that NK Landscapes with randomly generated neighborhoods are very unlikely to be solved by dynamic programming methods.

4.2  Tree Decomposition Mk Landscapes

Can we use dynamic programming on a general Mk Landscape? Computing the tree-width of a general graph is an NP-hard problem (Crama et al., 1990). A constant upper bound on the tree-width of a variable interaction graph is enough to prove that the problem can be solved in polynomial time using Hammer's dynamic programming algorithm. When the tree-width is unknown, greedy algorithms can be used to obtain an upper bound on the tree-width of the Variable Interaction Graph. This upper bound would then bound the cost of running dynamic programming.

Adjacent NK Landscapes control tree-width by localizing variable interactions. This idea can also be generalized to create “Tree Decomposition Mk Landscapes” which can be solved in polynomial time using dynamic programming. Whitley (2015) recently introduced the following definition:

Definition 6.

A Localized Mk Landscape is an Mk Landscape where each subfunction f_i is defined over a window of variables from x_i to x_{i+k-1}; however, the number of variables used in each subfunction can be less than k.

Consider the following function:
f(x) = f_1(x_1, x_3) + f_2(x_3, x_4) + f_3(x_4, x_6)

The subfunctions are not part of an Adjacent NK Landscape. However, an adjacent relationship can be created between subfunctions by adding additional dummy variables:

f(x) = f_1(x_1, x_2, x_3) + f_2(x_3, x_4) + f_3(x_4, x_5, x_6)

where x_2 and x_5 are dummy variables that have no effect on the values returned by f_1 and f_3.
In this article we propose a more general result.
Definition 7.

A Tree Decomposition Mk Landscape (TD Mk Landscape) is any Mk Landscape whose Variable Interaction Graph has a known and bounded tree-width.

As is the case with Adjacent NK Landscapes, TD Mk Landscapes can be constructed. We first construct an M × k table. Each of the M rows of the table corresponds to one subfunction; each row enumerates the variables that will be used by that subfunction. In total there are n variables, and each variable appears in at least one row of the table. In order to be a TD Mk Landscape, every variable that is used must appear in a contiguous block of subfunctions (i.e., a contiguous set of rows in the lookup table). When constructed in this way, the set of subfunctions for the corresponding TD Mk Landscape is also a tree decomposition of width k − 1 under Definition 5. The set of subfunctions (the rows of the table) corresponds to the set S. T is a tree with one node for each row of the table. The set of nodes (rows) that contain variable v forms a subtree of T because all references to variable v occur in a contiguous set of rows in the table. We illustrate the TD Mk Landscape with an example in Figure 3, where M = 10 and k = 6.
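The contiguity constraint on the table can be checked mechanically. A small sketch with illustrative names; the tables below are toy examples, not the table of Figure 3:

```python
# Checking the TD construction constraint: every variable must occupy a
# contiguous block of rows (subfunctions) of the M x k table.

def rows_are_contiguous(table, n):
    """table[i]: variable indices used by subfunction i. True if every
    variable's occurrences form one contiguous block of rows."""
    for v in range(n):
        rows = [i for i, row in enumerate(table) if v in row]
        if rows and rows != list(range(rows[0], rows[-1] + 1)):
            return False
    return True

ok = rows_are_contiguous([(0, 1, 2), (1, 2, 3), (3, 4)], 5)   # contiguous
bad = rows_are_contiguous([(0, 1), (2, 3), (0, 4)], 5)        # 0 skips a row
```

When the check passes, the rows themselves serve as the subsets S of a tree decomposition whose tree T is a simple chain over the rows.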

Figure 3:

Example of the lookup table of variables of a TD Mk Landscape. In the example, k = 6 and M = 10. Each row of the table can become a subfunction in an Mk Landscape, with variables V1 to V6. The table also corresponds to a Tree Decomposition of that same set of functions.


As is the case with NK Landscapes, the codomain values can be generated randomly as a second lookup table. We should also note that there are (at least) two ways to generate subfunctions for Mk Landscapes. We can generate random codomain values for each subfunction, which is how NK Landscapes have been traditionally generated. But we could also randomly generate Walsh coefficients for each subfunction. If we wish to use the Walsh coefficients to compute information about the Mk Landscape, then generating random subfunctions directly as Walsh polynomials makes sense. The subfunctions also do not have to be random.

To capture the idea of “Localized Mk Landscapes” a subfunction corresponding to one row of the table can be sparse, either by making the corresponding Walsh coefficients sparse for the subfunction corresponding to that row, or by making the individual subfunctions in one row of the table a composition of smaller, sparse subfunctions. For example (as we have already seen) we might define a sparse function as follows:
formula

In this case, x9 and x8 may be necessary to have a tree decomposition, but not every variable needs to play an active role in every subfunction. In Figure 4 we show the Variable Interaction Graph of a hypothetical pseudo-Boolean function f whose subfunctions could depend on the variables in the way shown in Figure 3. One can represent the function in Figure 4 as a Localized Mk Landscape where k = 3 and M = 23. However, the function in Figure 4 can also be represented as a TD Mk Landscape where k = 6 and M = 10 by embedding the function into the tree decomposition in Figure 3.

Figure 4:

Example of Variable Interaction Graph for a Localized Mk Landscape. This corresponding function can be embedded into the tree decomposition in Figure 3.


The importance of the TD Mk Landscape construction is two-fold: first, researchers interested in polynomial-time solvable pseudo-Boolean functions should not restrict their attention to Adjacent NK Landscapes; second, real-world problems can also be transformed into a TD Mk Landscape if a tree decomposition of some fixed tree-width is already known. Again, Adjacent NK Landscapes do not allow us the same flexibility.

4.3  Localized Mk Landscapes and Deception

One idea that long motivated research in the genetic algorithm community is that, at a sufficiently high order, hyperplane statistics should be useful for guiding search (Holland, 1975; Goldberg, 1989b). Are Localized Mk Landscapes that have a polynomial-time solution less deceptive than other problems?

Table 1 presents an Adjacent NK Landscape with 10 variables. It is also an “NKq” landscape (Geard et al., 2002), because the evaluation of each subfunction returns only 0 or 1; the “q” in the “NKq” label refers to the fact that the codomain values are discrete and can take only one of q values, from 0 to q − 1. Thus, q = 2 in this case. This yields a search space with plateaus containing neighboring moves with the same evaluation.

Table 1:
This is a simple Adjacent NK Landscape. Each subfunction has four local optima with evaluation 1; all other points have evaluation 0. The global optimum is the string of all 1 bits, and each subfunction has 111 as a local optimum.
Function  Local Optima         Codomain vector
f0        000  001  100  111   1,1,0,0,1,0,0,1
f1        011  001  100  111   0,1,0,1,1,0,0,1
f2        001  010  101  111   0,1,1,0,0,1,0,1
f3        110  100  011  111   0,0,0,1,1,0,1,1
f4        001  100  110  111   0,1,0,0,1,0,1,1
f5        000  100  010  111   1,0,1,0,1,0,0,1
f6        001  010  101  111   0,1,1,0,0,1,0,1
f7        000  001  100  111   1,1,0,0,1,0,0,1
f8        000  010  011  111   1,0,1,1,0,0,0,1
f9        000  101  110  111   1,0,0,0,0,1,1,1

The problem in Table 1 has a very simple solution. Every subfunction has a local optimum at the substring 111. And indeed the global optimum is the string 1111111111. So, this is not a difficult problem.
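This claim can be verified by brute force. The sketch below assumes the standard circular adjacent structure (subfunction fi reads bits i, i+1, i+2, indices modulo n) and transcribes the codomain vectors from Table 1:

```python
from itertools import product

# Codomain vectors transcribed from Table 1, one row per subfunction f0..f9.
CODOMAIN = [
    [1,1,0,0,1,0,0,1], [0,1,0,1,1,0,0,1], [0,1,1,0,0,1,0,1],
    [0,0,0,1,1,0,1,1], [0,1,0,0,1,0,1,1], [1,0,1,0,1,0,0,1],
    [0,1,1,0,0,1,0,1], [1,1,0,0,1,0,0,1], [1,0,1,1,0,0,0,1],
    [1,0,0,0,0,1,1,1],
]
N = 10

def evaluate(x):
    # Subfunction i reads bits i, i+1, i+2 (mod N); the three bits index
    # the codomain vector of that subfunction.
    total = 0
    for i, table in enumerate(CODOMAIN):
        idx = (x[i] << 2) | (x[(i + 1) % N] << 1) | x[(i + 2) % N]
        total += table[idx]
    return total

best = max(evaluate(x) for x in product((0, 1), repeat=N))
print(evaluate((1,) * N), best)  # 10 10: the all-ones string is globally optimal
```

Every codomain vector ends in 1, so the all-ones string scores the maximum possible value of 10.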

One might expect that order 3 statistics matching the 3 variables that appear in each subfunction would be useful. However, this does not appear to be the case: computing the order 3 hyperplane averages for the problem in Table 1 revealed no pattern that might suggest the location of the global optimum.

Statistics for order 5 hyperplanes were also computed. All of the hyperplanes for the bit combinations of the variables xi−1 through xi+3 were considered. This span of variables captures all of the nonlinear Walsh coefficients that touch the middle variable xi+1. This is sufficient to capture the nonlinearity of a subfunction fi as well as the nonlinearity of the two subfunctions that are adjacent: fi−1 and fi+1. When the hyperplane statistics were computed over these sets of variables, a general pattern emerged that displayed a bias for 1 bits, but the pattern was imperfect and still too weak to guide search. The hyperplanes with the highest averages are shown next.
formula

Despite the fact that this is a simple Adjacent NK Landscape (with localized variable interactions and localized nonlinearity) the order 5 hyperplane statistics do not provide any recognizable mechanism for identifying the global optimum. This simple function is still deceptive at order 5.

One of the things that makes this simple function a difficult optimization problem is its plateaus. Locally, for any reasonably good solution, almost all Hamming distance 1 neighbors might look equally good. The function in Table 1 is, in fact, also a MAX-SAT problem; the evaluation has simply been restricted to 1 or 0 (true or false).

5  When Mk Landscapes Are NP-Hard

For general classes of problems that are NP-hard, exact algorithms that are guaranteed to return a globally optimal solution will eventually fail to scale. When exact methods can no longer be applied, heuristic methods must be used.

5.1  Mk Landscapes and Local Search

The evolutionary computation community has long used pseudo-Boolean optimization problems as test functions. There was a bias toward using bit representations in the genetic algorithm community starting in the 1970s and continuing until the turn of the century (Goldberg, 1989b; Holland, 1975). Of course, random mutation and recombination are considered to be key elements of almost any evolutionary algorithm.

However, random mutation is completely unnecessary to find improving moves in any Mk Landscape problem. All improving moves can be found deterministically, and usually this can be done in constant time per move (Whitley and Chen, 2012; Whitley et al., 2013; Chicano et al., 2014).

In general, we will consider the search space to be a hypercube. A local search neighborhood can be defined over the Hamming distance 1 “bit flip” neighborhood. When a bit flips, it only changes the evaluation of those subfunctions in which that bit appears. Furthermore, new improving moves can only occur among those bits that feed into the affected subfunctions. This is illustrated in Figure 5. By clever bookkeeping (using either derivative calculations, or tracking the contribution of Walsh coefficients), the location of improving moves can be tracked in constant time per move on average (Whitley and Chen, 2012). This means that we never need to use “mutation operators” to locate improving moves in the Hamming distance 1 neighborhood.

Figure 5:

The location of potential improving moves is known deterministically. When one bit is flipped, it affects only a subset of subfunctions. The only new improving moves must be among the bits that feed into the affected subfunctions.


The MAX-kSAT community has long exploited a specialized, fast form of local search based on this idea. For example, the GSAT algorithm exploits this insight (Mitchell et al., 1992; Hoos and Stützle, 2005). However, it can be shown that a generalization of this fast form of update holds for all Mk Landscapes (Whitley and Chen, 2012; Whitley et al., 2013).

There is a one-time cost of enumerating the neighborhood to determine the initial location of improving moves. A bit to flip is then selected. After this, if the most recent bit flip was applied to bit xi, then a new improving move can be created at location xj only if there is a nonlinear interaction between bits xi and xj. Two variables can have a nonlinear interaction if and only if they are adjacent in the Variable Interaction Graph.

Let us use the Variable Interaction Graph in Figure 1 for an example. If the most recent variable to flip was variable x5, the graph tells us that the only possible new improving moves will be at x2 and the other bits adjacent to x5 in the graph. Furthermore, if there already exist improving moves at any of these bit locations, it is also possible that an improving move may have been destroyed; both the creation and destruction of improving moves must be tracked. But, on average, tracking improving moves can be done in constant time per move for the Hamming distance 1 neighborhood.
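The candidate set can be derived directly from the subfunction masks. A sketch, using a small hypothetical set of masks rather than the function of Figure 1:

```python
from collections import defaultdict

def variable_interaction_graph(masks):
    # Two variables share an edge iff they co-occur in some subfunction mask.
    vig = defaultdict(set)
    for mask in masks:
        for a in mask:
            vig[a] |= set(mask) - {a}
    return vig

masks = [(0, 1, 2), (2, 3, 4), (4, 5, 0)]   # three hypothetical k = 3 masks
vig = variable_interaction_graph(masks)

flipped = 2
# New improving moves can appear only at the flipped bit or its VIG neighbors.
candidates = vig[flipped] | {flipped}
print(sorted(candidates))  # [0, 1, 2, 3, 4]
```

Bit 5 is absent from the candidate set because it shares no subfunction with bit 2, so flipping bit 2 cannot change the Score of bit 5.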

5.2  Runtime Analysis for Move Selection

We will begin by proving conditions under which improving moves can be selected in constant time. We start with Hamming distance 1 local search using next improving moves. We will assume that M = O(n), where M is the number of subfunctions; in particular, we will assume M ≤ cn, where n is the number of variables and c is a positive integer. We will define a “critical” variable as a variable that appears in more than a constant threshold of subfunctions.

We can use a Score vector to track the potential contribution of each bit. Score(i) stores partial updates to the evaluation function such that if the current solution is x and a neighbor yi is obtained by flipping bit i, then
f(yi) = f(x) + Score(i)

Creating the Score vector has a one-time cost: it can be constructed by enumerating a starting neighborhood once.
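A minimal sketch of this bookkeeping (hypothetical helper names, not code from the article): score[i] holds f(x with bit i flipped) − f(x), and after a flip only the entries of variables that share a subfunction with the flipped bit are recomputed.

```python
# Sketch of Score vector bookkeeping for an Mk Landscape given as
# (mask, subfunction) pairs. score[i] = f(x with bit i flipped) - f(x).

def make_score(x, masks, subfns):
    # One-time construction by enumerating the starting neighborhood.
    def delta(i):
        y = list(x); y[i] ^= 1
        return sum(f(tuple(y[v] for v in m)) - f(tuple(x[v] for v in m))
                   for m, f in zip(masks, subfns) if i in m)
    return [delta(i) for i in range(len(x))]

def flip(x, b, score, masks, subfns):
    x[b] ^= 1
    # Only variables sharing a subfunction with b can change their Score.
    affected = {v for m in masks if b in m for v in m}
    for v in affected:
        y = list(x); y[v] ^= 1
        score[v] = sum(f(tuple(y[u] for u in m)) - f(tuple(x[u] for u in m))
                       for m, f in zip(masks, subfns) if v in m)

masks = [(0, 1), (1, 2)]
subfns = [lambda z: z[0] + z[1], lambda z: z[0] * z[1]]
x = [0, 0, 0]
score = make_score(x, masks, subfns)
flip(x, 1, score, masks, subfns)   # take the improving move at bit 1
print(x, score)  # [0, 1, 0] [1, -1, 1]
```

For clarity, the affected entries are recomputed from scratch here; an efficient implementation would apply incremental deltas, but the set of touched entries — the quantity analyzed below — is the same.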

The following results hold for all local search algorithms for Mk Landscapes that maintain a Score vector. Let Ub be the cost of updating the Score vector after flipping variable b, and let Ū be the average cost of updating the Score vector assuming that all n variables have been flipped exactly once. These will be used to create a foundation for an amortized cost analysis. Let α be a constant that bounds the cost of updating the Score vector for a single variable in a single subfunction after one bit is flipped. We next compute the average cost of an update to the Score vector after every variable is flipped exactly one time.

Lemma 2.

Assume k is bounded by a constant, and each of the n variables is flipped once. Then Ū = (1/n) Σ Ub = O(1).

Proof.
Assume all subfunctions reference exactly k variables; this provides a bounding case for subfunctions that reference fewer than k variables. The cost needed to update the Score vector in just one location can be bounded by the constant α. When one variable is flipped, if a subfunction is affected (because it references that variable), that subfunction causes k locations to be updated in the Score vector, resulting in at most αk work. After all n variables are flipped, each subfunction will have updated the Score vector exactly k times, in k locations each time. Thus the total number of updates to the Score vector associated with exactly one subfunction is always k². We can therefore bound the total amount of work done when all n bits are flipped exactly one time:
Σ Ub ≤ αk²M
Since M ≤ cn, we obtain the bound Σ Ub ≤ αck²n = O(n), and thus Ū = O(1).

This provides the basis for an amortized runtime analysis. While any one update associated with one bit flip (as represented by Ub) might take more than constant time, because a single variable might appear in a large number of subfunctions, it is clear that after all of the variables are flipped exactly one time, on average each update takes constant time. In the next theorem, we relax the requirement that every bit must be flipped exactly one time.

Theorem 4.

Select a constant threshold that depends on c and k, and select a second constant that bounds the number of flips. Assume that any variable appearing in more than the threshold number of subfunctions is flipped only a constant number of times during any sequence of n moves. Then the amortized cost per bit-flip move associated with the updates to the Score vector is O(1), and is bounded by a constant over any sequence of n improving moves.

Proof.
Let C be the set of variables that appear in more than the threshold number of subfunctions. These variables can each be flipped only a constant number of times. Collectively (by Lemma 2), the associated runtime cost is bounded as follows:
formula

For the remainder of the variables (not in set C), we allow any variable to be flipped at any time (to satisfy the proof, until n total moves occur). Since all other variables appear in at most the threshold number of subfunctions, and updating one subfunction causes at most k updates to the Score vector, the total number of updates to the Score vector caused by all of the other variables (not in set C) that are flipped over any sequence of n flips is bounded by a constant times n.

Therefore the total amount of work is bounded by
formula
since M ≤ cn. Thus, over any sequence of n moves, the amortized number of updates to the Score vector is bounded by a constant per move.

This result is sufficient to implement “next improving” move local search where any improving move can be taken next (Ochoa et al., 2010). But it is also relatively easy to implement an approximate form of best improving move local search. If the number of improving moves is less than some constant, then the moves can simply be stored in a buffer and the best improving move can be returned in constant time. However, occasionally there might be many improving moves. In this case there needs to be an estimate of the expected range of the improving moves; improving moves can then be stored in multiple buffers. Whitley et al. (2013) show how this can be done for MAX-kSAT and prove that approximate forms of best improving move selection can still be done in constant time.

Recently, Chicano et al. (2014) have shown that under relatively general conditions, improving moves can also be identified in constant time in a Hamming radius r neighborhood; they present results for 10,000 variable NK Landscape problems using up to radius r = 3. This idea can also be explained using the Variable Interaction Graph. Consider a variable that shares no edge with x6 in Figure 1, and assume there are no improving moves in the Hamming distance 1 neighborhood. Then flipping that variable and x6 simultaneously can never yield an improving move, because there are no nonlinear interactions between these variables. Assuming improving moves in radius less than r have already been exploited, all Hamming distance r improving moves must utilize bit flips using combinations of variables that are connected in the Variable Interaction Graph.

This also means that for most Mk Landscape problems, evolutionary algorithms using mutation operators that randomly change a small number of bits are unnecessary. Rather than randomly searching for improving moves, we can rapidly compute the location of improving moves.

5.3  Mk Landscapes and Local Optima

One thing that is fundamentally different between NK Landscapes and MAX-kSAT problems is how local optima occur in the search space. MAX-kSAT problems are dominated by large plateaus where all of the points on a plateau have the same evaluation. Plateaus are a major factor in the performance of SAT solvers. Hampson and Kibler (1993) and Frank et al. (1997) found that as search approaches the global optimum, plateau size grows exponentially with n, while the density of improving moves decreases. Local search heuristics for MAX-SAT have focused on how to force search into more promising regions in the presence of these plateaus (see Kautz et al. (2008) for an overview).

Nevertheless, it has been observed empirically that entire neighborhoods are never flat in MAX-kSAT problems. Sutton et al. (2009) proved that there is a zone around the average function evaluation over the search space such that the neighborhoods of all points with evaluations below this zone are never flat, and the neighborhoods of all points with evaluations above this zone are never flat. Thus, unless a point's evaluation is sufficiently close to the average, its neighborhood includes at least one improving move or one disimproving move.

NK Landscapes (and some classes of Mk Landscapes) display very different behavior than MAX-kSAT because the evaluation of every point in the search space can be unique. When this happens, local optima are well defined in the sense that every local extremum is better (or worse) than all of its neighbors.

Assume we are given unique co-domain values and a random permutation is generated to create a function. Kauffman and Levin (1987) use M1 to denote the expected number of local optima under the standard Hamming distance 1 neighborhood. Then
M1 = 2^n / (n + 1)
Kauffman and Levin (1987) derive this result by pointing out that, given a point x and its n neighbors, there is a probability of 1/(n + 1) that x is better than (or worse than) all of its neighbors. They then sum this probability over all 2^n vertices of the hypercube that constitutes the entire search space.
Rana and Whitley (1998b) derive the same equation as a more general result. Assume the search space is any arbitrary size S and the neighborhood is any arbitrary size d. If one generates a single injective function and then looks at all transforms in representation, the set of all possible representations corresponds to the set of functions that makes up the permutation closure of f. Assume every candidate solution x has d neighbors under every function in the permutation closure. The probability that a candidate solution is a local optimum is again 1/(d + 1). Multiplying this probability by the number of solutions in the search space, S, we get the expected number of local optima:
S / (d + 1)
4

If the functions are not injective (repeated co-domain values are allowed), the probability of a solution being locally optimal is higher than 1/(d + 1). Thus, S/(d + 1) is, in general, a lower bound on the expected number of local optima for a family of functions closed under permutation.
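This expectation is easy to check empirically for small n. The sketch below draws random permutations of unique co-domain values and counts strict local maxima under the Hamming distance 1 neighborhood:

```python
import random

def count_local_optima(values, n):
    # A point is a strict local maximum if it beats all n Hamming-1 neighbors.
    return sum(
        all(values[x] > values[x ^ (1 << b)] for b in range(n))
        for x in range(1 << n)
    )

n, trials = 8, 500
random.seed(1)
total = 0
for _ in range(trials):
    values = list(range(1 << n))   # unique co-domain values
    random.shuffle(values)         # a random member of the permutation closure
    total += count_local_optima(values, n)

print(total / trials, 2 ** n / (n + 1))  # empirical mean vs. 2^n/(n+1)
```

For n = 8 the predicted expectation is 256/9 ≈ 28.4, and the empirical mean converges to it as the number of trials grows.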

The derivation of Rana and Whitley makes it clear that this calculation represents the expectation for any randomly chosen function. But it is based on the exact count of all optima for the set of functions (or the set of representations) that corresponds to the set of functions closed under permutation, given any one function of this set as a seed. For sets closed under permutation, Sharpened No Free Lunch applies (Schumacher et al., 2001). This means that no search algorithm is better than another, and no search algorithm is better than random enumeration when averaged over a set of functions closed under permutation. This also means that a global optimum can never be guaranteed in polynomial time across all (or even most) problems contained in the set of functions closed under permutation; if this were not true, then some algorithm would be better than random enumeration. It should also be noted, however, that almost all of the functions closed under permutation are random and incompressible if all (or most) of the co-domain values are unique. Then, on average, they do not have an algebraic representation that is fundamentally more compact than a table look-up that enumerates the function co-domain values (Schumacher et al., 2001).

This has two implications for NK Landscapes (and similar Mk Landscapes). First, if each subfunction is generated randomly, then the expected number of local optima in each subfunction (independent of the combined function) is 2^k/(k + 1). Second, if we allow k (or K) to grow so that k = n, then the expected number of local optima for the entire NK Landscape is again 2^n/(n + 1), because the function is again randomly generated. This critical level of randomness may also be induced by values of K that are close to N but still less than N. It should also be noted that NK Landscape problems where k grows with n are also incompressible. This is because each randomly generated subfunction table would have size 2^k, which is obviously exponentially large for truly random functions. This also means that the set of NK Landscapes where k grows with n are no longer in the complexity class P or NP; even if an exponentially large table could be stored, it would take exponential time to convert the NK Landscape into SAT.

5.4  Enumerating All Local Optima for Mk Landscapes

An interesting question is whether we can do better than brute force enumeration to find all of the local optima in an Mk Landscape. Goldman and Punch (2016) have shown that for sufficiently small k and M the answer is yes. In an Mk Landscape it is possible to identify an improving move by examining only a bounded number of bit values. If a move is improving, those bits define a hyperplane which cannot contain any local optima. Algorithm 1 provides an example of how this property can be leveraged.1 We present a formal proof of the correctness of their algorithm in Appendix A.

formula

When we enumerate all of the local optima in both Adjacent NK Landscapes (see Table 2) and in Random NK Landscapes (see Table 3), we find two interesting results. First, there are more local optima in Adjacent NK Landscapes than in Random NK Landscapes. This was also recently reported by Ochoa et al. (2015). This is at first surprising. However, Adjacent NK Landscapes have more local constraints with more localized effects. Two subfunctions fi and fj where |i − j| ≥ k are too far apart for the subfunctions to interact. Thus, local minima in fi are not affected by the variable assignments in subfunction fj and vice versa. We conjecture that this localization of interactions results in a larger number of local optima in Adjacent NK Landscapes.

Table 2:
Average number of local optima for the Adjacent NKq Landscape (rounded to an integer value).
   K = 3   K = 4   K = 5   K = 6   K = 7   K = 8
N = 20 277 654 1243 2204 3559 5084 
N = 25 1097 3174 7724 14793 26326 42064 
N = 30 4557 16232 46806 106387 201052 357121 
N = 35 15951 84447 270426 716400 1539169 3025417 
N = 40 71018 402816 1691727 4967117 11858318 25869554 
N = 45 290492 2114984 9996098 33634633 91777539 N/A 
N = 50 1125154 10296342 60214183 235535065 N/A N/A 
N = 55 4949555 53341143 332894062 N/A N/A N/A 
N = 60 21185775 N/A N/A N/A N/A N/A 
Table 3:
Average number of local optima for the Random NKq Landscape (rounded to an integer value).
  K = 3  K = 4  K = 5  K = 6  K = 7  K = 8
N = 20 181 409 875 1519 2490 3758 
N = 25 668 1980 4708 9006 16200 27336 
N = 30 2359 8981 23096 55291 112701 201899 
N = 35 9386 37258 N/A N/A N/A N/A 
N = 40 31829 160539 N/A N/A N/A N/A 

The second thing that is notable is that Goldman’s algorithm is able to enumerate larger Adjacent NK Landscapes, for both N and K, compared to the Random NK Landscapes. In fact, for larger values of K, Goldman’s algorithm does not appear to be dramatically better than brute force enumeration of the search space for Random NK Landscapes. This again points to fundamental differences between Adjacent NK Landscapes and Random NK Landscapes.

Clearly, Random NK Landscapes and Adjacent NK Landscapes are very different problems. One is NP-hard; the other has a polynomial time solution for bounded k. The fact that Adjacent NK Landscapes display more local optima than Random NK Landscapes implies that Adjacent NK Landscapes are in some sense more “rugged” than Random NK Landscapes. Ironically, this also suggests that Adjacent NK Landscapes may actually be more difficult than Random NK Landscapes for algorithms that rely only on hill climbing, such as simple local search with restarts or a simple (1 + 1) Evolution Strategy. This is surprising given that, from a problem complexity point of view, these are extremely different problems.

We next turn to the issue of finding problem structure in variants of MAX-SAT problems, and look at new ways of representing MAX-SAT problems as Mk Landscapes.

6  MAX-kSAT, Mk Landscapes, and Problem Structure

By definition, MAX-kSAT problems that are in Conjunctive Normal Form (CNF) are also Mk Landscapes. In this case, each CNF clause is a subfunction and the evaluation function f sums over the (0 or 1) evaluation of each individual clause. However, there are other compact representations for SAT that are also Mk Landscapes that might more concisely capture variable interactions.

When unrestricted SAT expressions are converted into CNF-SAT, the SAT formula is broken into groups of k or fewer variables where each group of variables is represented in an intermediate form such as the following expression:
formula
This intermediate form can be directly converted into CNF-SAT. Following the DIMACS format, we will represent variables by their integer index, and a negative index (e.g., −38) indicates a negated variable. The corresponding CNF 3SAT expression is as follows:
formula
However, we can also directly convert the intermediate form into a Disjunctive Normal Form (DNF) formula:
formula
Disjunctive Normal Form formulas are awkward, however, because one cannot use logical “and” to sum over and combine DNF formulas. Still there are distinct advantages to DNF formulas: the formulas indicate exactly which combinations of variables must be satisfied in order to achieve an overall satisfying assignment. As a compromise, we will introduce “DNF Functions.”
Definition 8.

A DNF function is a Boolean function over k variables which evaluates a DNF formula over the same k variables. For any given assignment of the variables, the DNF function returns true if and only if the corresponding DNF formula is true.

In effect, the DNF function is just a truth table. We will represent the output of a DNF function as a binary vector of the evaluation of the truth table. For example,
formula
where the binary vector indicates exactly which variable assignments will satisfy the DNF function. To improve readability, we will represent a DNF function and the co-domain vector in the following form
formula
where the co-domain vector is a binary string broken into blocks of four bits.
To evaluate the DNF function, the assignment of the variables is converted into an integer from 0 to 2^k − 1, and the integer then indexes the corresponding truth value in the co-domain vector. Thus, if the co-domain vector is 1001 0101, then F(0,1,1) = F(3) = 1. The positions 0, 3, 5, 7 in the co-domain vector directly correspond to the DNF clauses in the DNF formula:
formula

Every DNF function must be true for the overall SAT instance to evaluate as true. Thus, a constraint that must hold for one DNF function is also a constraint over the entire SAT instance. We can easily “and” over DNF functions, or we can sum DNF functions to create a form of MAX-SAT.
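A minimal sketch of this lookup (hypothetical helper name): the co-domain vector 1001 0101 from the example above is stored as a list, and an assignment is converted to an index, high-order bit first:

```python
# Sketch: a DNF function evaluated as a truth-table lookup.

def dnf_function(codomain):
    def evaluate(*bits):
        # Convert the assignment to an integer index, high-order bit first.
        idx = 0
        for b in bits:
            idx = (idx << 1) | b
        return codomain[idx]
    return evaluate

F = dnf_function([1, 0, 0, 1, 0, 1, 0, 1])   # the co-domain vector 1001 0101
print(F(0, 1, 1))   # index 3 -> 1
print(F(0, 0, 1))   # index 1 -> 0
```

Summing such functions over all blocks of a formula gives a MAX-SAT style evaluation, while taking their logical "and" recovers the original SAT instance.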

6.1  Structure in Real-World Problems

By examining many large CNF-SAT applications from recent SAT competitions, we have discovered that these problems display “natural blocks.” A “natural block” is a set of clauses that are generated consecutively (by algorithms that convert SAT into CNF) and which utilize the same variables.

The following is an example of a “natural block” of CNF clauses:
formula
This “natural block” can be converted into the following DNF function:
formula

Figure 6 is the DNF function representation of an actual CNF-SAT problem.2 This SAT problem is a prime factorization problem from the Tough SAT Project website (https://toughsat.appspot.com). The generator asks for two prime numbers as input. When two primes are multiplied together it results in a number that is a semiprime. Large semiprimes result in some of the most difficult factorization problems. Many encryption methods rely on the fact that factoring large composite integers is difficult. For example, the RSA public-key cryptosystem uses a large semiprime as part of the encryption.

Figure 6:

The prime factoring SAT problem after converting into DNF functions; the factors are 13 and 23. The vector V holds variables where the truth assignment is already constrained after exactly one step of unit propagation.


The prime factoring problem in Figure 6 used the factors 13 and 23. This resulted in a CNF-SAT instance with 83 variables and 369 clauses. We have done nothing to simplify this problem at this point. Only one step of unit propagation was applied. By combining the “natural blocks” of adjacent clauses, the 83 variables and 369 clauses can be compressed into 36 DNF functions, which is a 10-fold compression of the representation. All of the DNF functions have either 4 or 5 variables. What is even more surprising is that all of the functions are one of only 5 types, which means that all of the co-domain vectors are of just 5 types. Most of these are of just 3 types: F2, F3, and F4.

These same DNF functions reoccur across other prime factoring problems of different sizes. For the problem with factors 15485863 and 15485867, the corresponding CNF-SAT instance generated by Tough SAT has 1824 variables with 9673 clauses after one step of unit propagation. When converted into DNF functions, there are 526 of type F2 (accounting for 7364 clauses), 68 of type F3 (accounting for 544 clauses), and 288 of type F4 (accounting for 1728 clauses). This means that over 99 percent of all clauses (9636/9673) can be assigned to DNF functions of type F2, F3, or F4.

Our preliminary findings also indicate that this kind of symmetry and repetition is extremely common in very large real-world applications with millions of variables. There is a very powerful take-home message here: real problems are not trivial, but they are also not random. There is structure, and usually that structure has modularity and repetition. Some variable interactions are very localized; others are more global. If we really want to solve real problems more effectively, we need to break open the Black Box and examine problem structure.

7  Conclusions

This article has introduced Mk Landscapes as a way of unifying what we know about NK Landscapes and other k-bounded pseudo-Boolean problems such as MAX-kSAT. The concept of “Tree Decomposition Mk Landscapes” was introduced. These problems are solvable in polynomial time using dynamic programming. Separable Mk Landscape instances are also trivial to solve. The article has also shown that if an Mk Landscape is not deceptive, that problem can also be solved in polynomial time.

Insights about Mk Landscapes that have a polynomial time solution probably tell us nothing about solving difficult Mk Landscape problems that are NP-hard. However, studying random NK Landscapes and random MAX-kSAT problems may be equally misleading. Random problems do not have the kind of structure and repetition that is found in real-world applications.

We also noted that some sets of NK Landscapes are no longer in the complexity classes P or NP because, on average, these problems no longer have a compressible representation and cannot be transformed into SAT in polynomial time. Problems can simply be too random to be useful for evaluating search algorithms.

An overarching theme that emerges from this work is that exploiting problem structure is critical to solving real-world problems. The concept of Gray Box Optimization is introduced to allow the utilization of more powerful and more realistic optimization tools. If we know that a problem is an Mk Landscape, we can use a number of tools to efficiently solve trivial problems that are not NP-hard. Trivial problems such as ONEMAX and “Leading Ones” can be solved in 1 evaluation under Gray Box Optimization. For Mk Landscapes, there is no need to use a random mutation operator; the location of improving moves can be efficiently computed.
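The key observation behind computing improving moves is that flipping bit i changes only the subfunctions whose masks contain i. The sketch below illustrates this on a small hypothetical Mk Landscape (the instance and names are ours, chosen for illustration):

```python
# A tiny Mk Landscape: (mask, subfunction) pairs, n = 4 variables.
# These subfunctions are illustrative, not from the article.
n = 4
subfunctions = [
    ((0, 1), lambda a, b: a + b),
    ((1, 2), lambda a, b: a * b),
    ((2, 3), lambda a, b: 1 - (a ^ b)),
]

def evaluate(x):
    # Full evaluation: sum all subfunctions over their masked bits.
    return sum(f(*(x[i] for i in mask)) for mask, f in subfunctions)

def delta(x, i):
    # Fitness change from flipping bit i: re-evaluate only the
    # subfunctions whose mask contains i, not the whole function.
    y = list(x)
    y[i] ^= 1
    return sum(
        f(*(y[j] for j in mask)) - f(*(x[j] for j in mask))
        for mask, f in subfunctions
        if i in mask
    )

x = [0, 0, 0, 0]
improving = [i for i in range(n) if delta(x, i) > 0]
print(improving)  # [0, 1]
```

Because each bit appears in a bounded number of subfunctions, every `delta` costs O(1) for fixed k, so the set of improving moves can be maintained exactly instead of being sampled by random mutation.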

In general, an algorithm with access to more information about problem structure has major advantages over a blind Black Box Optimizer. The best way to make real progress on real applications is to focus on exploiting problem structure, and that requires the use of Gray Box Optimization.

References

Ansótegui, C., Giráldez-Cru, J., and Levy, J. (2012). The community structure of SAT formulas. In Theory and Applications of Satisfiability Testing, SAT 2012, pp. 410–423.
Boros, E., and Hammer, P. (2002). Pseudo-Boolean optimization. Discrete Applied Mathematics, 123(1):155–225.
Cheeseman, P., Kanefsky, B., and Taylor, W. (1991). Where the really hard problems are. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 331–337.
Chicano, F., Whitley, D., and Sutton, A. (2014). Efficient identification of improving moves in a ball for pseudo-Boolean problems. In Genetic and Evolutionary Computation Conference, pp. 437–444.
Cormen, T., Leiserson, C., Rivest, R., and Stein, C. (1990). Introduction to algorithms, 2nd edition. New York: McGraw-Hill.
Crama, Y., Hansen, P., and Jaumard, B. (1990). The basic algorithm for pseudo-Boolean programming revisited. Discrete Applied Mathematics, 29(2–3):171–185.
Deb, K., and Goldberg, D. (1993). Analyzing deception in trap functions. In Foundations of Genetic Algorithms 2 (FOGA 2), pp. 93–108.
Een, N., and Sorensson, N. (2015). The MiniSat page. http://www.minisat.se
Frank, J., Cheeseman, P., and Stutz, J. (1997). When gravity fails: Local search topology. Journal of Artificial Intelligence Research, 7:249–281.
Gao, Y., and Culberson, J. (2003). On the tree width of NK landscapes. In Genetic and Evolutionary Computation Conference GECCO 2003, pp. 948–954.
Geard, N., Wiles, J., Hallinan, J., Tonkes, B., and Skellett, B. (2002). A comparison of neutral landscapes: NK, NKp and NKq. In The IEEE Congress on Evolutionary Computation (CEC 2002), pp. 205–210.
Goldberg, D. (1989a). Genetic algorithms and Walsh functions: Part I, A gentle introduction. Complex Systems, 3:129–152.
Goldberg, D. (1989b). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley.
Goldman, B. W., and Punch, W. F. (2016). Hyperplane elimination for quickly enumerating local optima. In European Conference on Evolutionary Computation in Combinatorial Optimisation, pp. 154–169.
Hammer, P. L., Rosenberg, I., and Rudeanu, S. (1963). On the determination of the minima of pseudo-Boolean functions. Studii si Cercetari Matematice, 14:359–364.
Hampson, S., and Kibler, D. (1993). Plateaus and plateau search in Boolean satisfiability problems: When to give up searching and start again. In Cliques, Coloring and Satisfiability: Proceedings of the Second DIMACS Implementation Challenge (American Mathematical Society, 1996), pp. 437–456.
Håstad, J. (1997). Some optimal inapproximability results. In Proceedings of the 29th ACM Symposium on Theory of Computation, pp. 1–10.
Heckendorn, R. B. (2002). Embedded landscapes. Evolutionary Computation, 10(4):345–369.
Holland, J. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.
Holland, J. (2000). Building blocks, cohort genetic algorithms and hyperplane defined functions. Evolutionary Computation, 8(4):373–391.
Hoos, H. H., and Stützle, T. (2005). Stochastic local search: Foundations and applications. San Francisco: Morgan Kaufmann.
Jansen, T. (2013). Analyzing evolutionary algorithms. Berlin: Springer.
Kauffman, S., and Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128:11–45.
Kautz, H., Sabharwal, A., and Selman, B. (2008). Incomplete algorithms. In A. Biere, M. Heule, H. van Maaren, and T. Walsh (Eds.), Handbook of Satisfiability, pp. 185–203. Amsterdam: IOS Press.
Kloks, T. (1994). Treewidth: Computations and approximations. Berlin: Springer-Verlag.
Mitchell, D., Selman, B., and Levesque, H. (1992). Hard and easy distributions of SAT problems. In Proceedings of the 10th National Conference on Artificial Intelligence, pp. 459–465.
Ochoa, G., Verel, S., and Tomassini, M. (2010). First-improvement vs. best-improvement local optima networks of NK landscapes. In Parallel Problem Solving from Nature (PPSN XI), pp. 104–113.
Ochoa, G., Whitley, D., Tinós, R., and Chicano, F. (2015). Tunneling local optima networks. In Genetic and Evolutionary Computation Conference GECCO-2015, pp. 449–456.
Rana, S., Heckendorn, R. B., and Whitley, L. D. (1998). A tractable Walsh analysis of SAT and its implications for genetic algorithms. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI’98), pp. 392–397.
Rana, S., and Whitley, D. (1998a). Genetic algorithm behavior in the MAXSAT domain. In Proceedings of Parallel Problem Solving from Nature V, pp. 785–794.
Rana, S., and Whitley, D. (1998b). Search, binary representations, and counting optima. In Institute for Mathematics and its Applications Workshop on Evolutionary Algorithms, pp. 177–189.
Schumacher, C., Vose, M., and Whitley, L. (2001). The no free lunch and problem description length. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 565–570.
Sutton, A., Howe, A., and Whitley, D. (2009). A theoretical analysis of the k-satisfiability search space. In Engineering Stochastic Local Search Algorithms, pp. 46–60.
Tinós, R., Whitley, D., and Chicano, F. (2015). Partition crossover for pseudo-Boolean optimization. In Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII (FOGA ’15), pp. 137–149.
Weinberger, E. (1996). NP completeness of Kauffman’s NK model, a tuneably rugged fitness landscape. Santa Fe Institute Working Paper 1996-02-003.
Whitley, D. (2015). Mk landscapes (NK landscapes and MAX-kSAT): A proof that the only challenging problems are deceptive. In Genetic and Evolutionary Computation Conference GECCO-2015.
Whitley, D., and Chen, W. (2012). Constant time steepest descent local search with lookahead for NK-landscapes and MAX-kSAT. In Genetic and Evolutionary Computation Conference GECCO-2012, pp. 1357–1364.
Whitley, D., and Das, R. (1991). The only challenging problems are deceptive: Global search by order-1 hyperplane sampling. In International Conference on Genetic Algorithms, pp. 166–173.
Whitley, D., Howe, A., and Hains, D. (2013). Greedy or not? Best improving versus first improving stochastic local search for MAXSAT. In Proceedings of AAAI-2013, pp. 940–946.
Wright, A. H., Thompson, R. K., and Zhang, J. (2000). The computational complexity of N-K fitness functions. IEEE Transactions on Evolutionary Computation, 4(4):373–379.
Young, A. (1984). Spin glasses. Journal of Statistical Physics, 34:871–881.

Appendix  Correctness of the Efficient Local Optima Enumeration Algorithm

In this appendix we prove that Algorithm 1 correctly enumerates all the local optima of an Mk Landscape.

Remark 1.

For convenience, and abusing notation, we will use “solution” to refer both to the Boolean vector and to the value of the binary word it represents. Which meaning is intended will be clear from the context.

Remark 2.

The score associated with each move depends on a set of variables. This set is defined as all variables in the move itself as well as any variables adjacent to those in the Variable Interaction Graph. Position i of move_bin counts how many moves that depend only on variable i and higher are currently fitness improvements. As a consequence, flipping variable i can affect only the scores stored in move_bin[j] with j ≤ i. It does not affect the scores stored in move_bin[j] with j > i.

The proof of correctness is based on the next two lemmas.

Lemma 3.

Let P be the predicate: for all j with 0 ≤ j < index, it holds that solution[j] = 0. Then, P holds after any line in the outer loop of Algorithm 1.

Proof.

Observe that P is always true when index ≤ 0, because the domain of the quantifier is empty in that case. We will analyze the base case and the induction case.

  • Base case. P is obviously true the first time the flow reaches Line 3, since solution is initialized with all zeroes and it has not been changed after Line 1.

  • Induction case. Let us assume that P is true any time the flow reaches Line 3 and let us prove that it will remain true after executing the body of the loop. The first inner loop on Line 4 can only reduce index; thus, P is obviously true after it terminates. If the program flow does not enter the conditional statement of Line 6, the predicate remains true. If it enters the conditional statement, P will be true because solution is unchanged and index will be 0 (and P is always true when index ≤ 0). The loop starting on Line 9 can increase the value of index. However, observe that it increases the value of index by 1 only when solution[index] = 1, after flipping the value of solution[index] (from 1 to 0). This means that at any time inside the loop it holds that solution[j] = 0 for all j < index. It is clear that this loop finishes (this can be proved using index as a counter), and the predicate holds after the loop. The conditional statement of Line 12 does not affect the truth value of P because it flips (at most) solution[index], which does not affect the value of the predicate. Finally, the condition of the outer loop does not have any side effect and does not change the value of P. As a consequence, we can claim that P is a loop invariant.3

Lemma 4.

Let Q be the predicate: either solution < 2^n and found contains all the local optima from 0 to solution − 1, or the whole search space has been explored and found contains all the local optima of the search space. The predicate Q is a loop invariant of the outer loop.

Proof.

We will analyze the base case and the induction case.

  • Base case. When the flow reaches Line 3 for the first time, solution is 0 and found is empty, so the predicate trivially holds.

  • Induction case. Let us prove that if Q is true at the beginning of the loop, it will be true at the end. The loop of Line 4 can only reduce the value of index and cannot change solution. Thus, Q is true after its completion. Now we distinguish two cases depending on the value of index after this loop.

    • If index is −1, the current solution is a local optimum, as no move is fitness improving. The conditional of Line 6 will add this solution to the found set, and Q is temporarily not true (since found now contains all the local optima up to solution, instead of solution − 1). In that case, the remaining lines of the outer loop will increase the value of solution by exactly 1 if the condition in Line 12 holds (they implement a binary adder). Thus, at the end of the outer loop, Q is again true in this scenario. It could happen that solution = 2^n − 1 and the condition in Line 12 is not true because the carry propagates past the last variable. In this case, since found contained all the local optima in the search space from 0 to solution − 1 and the last local optimum (in solution = 2^n − 1) has been added, we can claim that all the local optima of the search space are in found, and Q is true.

    • If index is a non-negative value, the current solution is not a local optimum. Furthermore, there is no solution in the search space that is a local optimum and has the same values as solution in the positions index or higher. The reason is that there is a move counted in move_bin[index] that prevents the current solution from being locally optimal and depends only on variables that are index or higher (see Remark 2). If these variables do not change, the score will not change and the solution cannot be locally optimal. Thus, we can safely discard (not explore) all the solutions having the same suffix from position index upwards and any prefix in positions 0 to index − 1. Fortunately, Lemma 3 claims that P is true at this point of the program, which means that all the bits in solution from 0 to index − 1 are zero. Thus, in order to discard the previous range of non-locally-optimal solutions it is only necessary to add 2^index to solution. This is precisely what Line 9 and the following lines do. In Line 12 we again have two cases depending on its condition. If the condition holds, we can claim that all the local optima up to solution − 1 have been identified and stored in found (since we only “jumped” over non-locally-optimal solutions). If it does not hold, we can claim that we have already explored the whole search space. Thus, Q is true.

    Finally, since the condition of the outer loop does not have any side effect, Q will be true at the beginning of the next iteration of the outer loop.

Theorem 5.

Algorithm 1 finishes in finite time and stores in found all the local optima of the search space.

Proof.

We can prove that found contains all the local optima at the end of the outer loop just by using the invariant Q proved in Lemma 4 together with the negation of the loop condition, which holds on exit.

In order to prove that the procedure finishes, we first show that the two inner loops finish. The loop on Line 4 decreases index in every iteration and terminates, at the latest, when index reaches −1. The outer loop ensures index ≤ n − 1 at the start of this loop, so it iterates at most n times. The loop on Line 9 increases index in every iteration and terminates, at the latest, when index reaches n. As index starts as a non-negative number, this loop terminates after at most n iterations.

We will use the following counter function to show that the outer loop terminates:
c(solution, index) = solution if index ≤ n − 1, and c(solution, index) = 2^n if index = n.
5

Since solution ≤ 2^n − 1, it holds that c ≤ 2^n. We can also prove that c = solution at the beginning of the outer loop, because index ≤ n − 1 in that case. Now, we will prove that the counter increases by at least one unit in each iteration of the outer loop. We just need to observe that index ≥ 0 at Line 9, since the loop on Line 4 can only decrease the value of index, and if index is −1, Line 6 will change it to 0. The remaining lines add 2^index to solution. Thus, if the condition index ≤ n − 1 holds in Line 12, then solution will be increased by 2^index ≥ 1, and the counter will increase its value by the same amount (since c = solution in that case). If index = n in Line 12, then the counter takes the value 2^n, and since solution ≤ 2^n − 1 (for any solution), there is an increase of at least 1 in the counter.

Since the counter increases by at least one unit in each iteration, it eventually reaches 2^n, at which point the loop condition is false and the algorithm finishes.
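To make the jumping argument concrete, the following Python sketch implements the enumeration strategy the proof describes. It is a simplified reconstruction for illustration, not the article’s Algorithm 1: instead of maintaining move_bin incrementally, it recomputes the improving moves at each step, and it skips a block by rounding solution up past the shared suffix (equivalent to adding 2^index when the low bits are zero, as the invariant P guarantees in the real algorithm).

```python
def enumerate_local_optima(n, subfunctions):
    """Enumerate all local optima (w.r.t. single-bit flips) of an Mk
    Landscape by skipping blocks of provably non-optimal solutions."""
    # Variable Interaction Graph adjacency: variables sharing a subfunction.
    adj = {i: {i} for i in range(n)}
    for mask, _ in subfunctions:
        for a in mask:
            adj[a].update(mask)

    def fitness(x):
        return sum(f(*(x[j] for j in mask)) for mask, f in subfunctions)

    found, sol = [], 0
    while sol < 2 ** n:
        x = [(sol >> j) & 1 for j in range(n)]
        base = fitness(x)
        # The score of a flip at p depends only on adj[p]; its "bin" is
        # min(adj[p]).  Take the largest bin over all improving flips.
        index = -1
        for p in range(n):
            y = list(x)
            y[p] ^= 1
            if fitness(y) > base:
                index = max(index, min(adj[p]))
        if index == -1:
            found.append(sol)   # no improving move: a local optimum
            sol += 1
        else:
            # Every solution sharing bits index..n-1 with sol keeps that
            # improving move intact, so none of them is locally optimal.
            sol = ((sol >> index) + 1) << index
    return found

# Two separable 2-bit "trap" subfunctions (an illustrative instance):
trap = lambda a, b: 3 if a + b == 2 else 2 - (a + b)
subs = [((0, 1), trap), ((2, 3), trap)]
print(enumerate_local_optima(4, subs))  # [0, 3, 12, 15]
```

On this separable instance the four local optima are the four combinations of per-block optima, and the jumps let the loop visit far fewer than 2^n candidate solutions.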

Notes

1

In the algorithm we assume that the first variable is indexed by 0, in contrast with the rest of the article, where the first variable is x1.

2

As with standard DIMACS notation for clauses, we don’t really need to name different functions, for example, F1, F2, but the labels in Figure 6 will aid the reader.

3

It is an invariant of the outer loop and both inner loops.