Abstract

The frequency distribution of a fitness function over regions of its domain is an important quantity for understanding the behavior of algorithms that employ randomized sampling to search the function. In general, exactly characterizing this distribution is at least as hard as the search problem, since the solutions typically live in the tails of the distribution. However, in some cases it is possible to efficiently retrieve a collection of quantities (called moments) that describe the distribution. In this paper, we consider functions of bounded epistasis that are defined over length-n strings from a finite alphabet of cardinality q. Many problems in combinatorial optimization can be specified as search problems over functions of this type. Employing Fourier analysis of functions over finite groups, we derive an efficient method for computing the exact moments of the frequency distribution of fitness functions over Hamming regions of the q-ary hypercube. We then use this approach to derive equations that describe the expected fitness of the offspring of any point undergoing uniform mutation. The results we present provide insight into the statistical structure of the fitness function for a number of combinatorial problems. For the graph coloring problem, we apply our results to efficiently compute the average number of constraint violations that lie within a certain number of steps of any coloring. We derive an expression for the mutation rate that maximizes the expected fitness of an offspring at each fitness level. We also apply the results to the slightly more complex frequency assignment problem, a relevant application in the domain of the telecommunications industry. As with the graph coloring problem, we provide formulas for the average value of the fitness function in Hamming regions around a solution and the expectation-optimal mutation rate.

1  Introduction

Many problems in combinatorial optimization, such as graph coloring, register allocation, and frequency assignment problems, can be characterized as a search for the extremum of a real-valued function over the set of length-n strings over a q-ary alphabet. Stochastic local search algorithms and mutation-based evolutionary algorithms for such problems can be viewed as processes that perform a biased random walk on the q-ary hypercube graph. Characterizing the frequency distribution of fitness values over regions of this graph is thus an important part of understanding the behavior of such algorithms and the influence of problem structure on this behavior.

In this paper we derive a method for the efficient calculation of exact statistical moments of the frequency distribution of fitness function values over Hamming regions of the q-ary hypercube provided that the function is bounded epistatically by a constant. The retrieval of these moments provides insight into the local structure of the search space and can be used to describe the fitness of states that lie within a certain distance of any given state. Our aim is to provide a framework for studying the statistical structure of local regions of the search space for a broad class of combinatorial optimization problems. This framework is directly relevant to the theory of randomized search heuristics such as evolutionary algorithms, simulated annealing, and variants of randomized local search (hill-climbing). For the graph coloring problem and the frequency assignment problem, we derive specific equations that model the fitness statistics of states sampled by the move and mutation operators of such algorithms.

We present a direct computation of the exact statistics (e.g., mean, variance, and skew) of the distribution of fitness values over regions of the search space for problems over strings. As long as the epistasis of the fitness function is bounded by a constant, our method has polynomial complexity even when the cardinality of the region grows as an exponential in n. Subsequently, we use this result to determine an exact calculation of the expected fitness of an offspring parameterized by mutation rate from any given point. Aside from working out the statistics of the fitness of an offspring of a state undergoing mutation in an evolutionary algorithm, it also allows one to exactly compute the rate that results in the maximal expected fitness at any point. We see that the resulting expression illuminates the behavior of mutation on combinatorial problems such as graph coloring and the frequency assignment problem. Currently, our results are mainly of theoretical interest; however, we also discuss some different ways they could eventually be used in practice to design more principled algorithms.

This paper generalizes the work of Sutton et al. (2012) who presented a polynomial-time algorithm for computing the moments of pseudo-Boolean functions over regions in the Boolean hypercube, that is, for q=2. We will generalize this result to complex, epistatically bounded functions over strings from a finite alphabet of q elements where q is O(1) and the corresponding adjacency is the generalized Hamming graph (i.e., the q-ary hypercube). This paper also generalizes a result of Heckendorn (2002) who derived a polynomial-time algorithm for obtaining the summary statistics of the fitness function distribution over the entire search space when q=2.

The remainder of the paper is organized as follows. In the next section, we introduce some technical preliminaries and show that fitness functions over q-ary strings can be written in a Fourier function basis expansion. In Section 3, we relate this Fourier analysis to the Hamming adjacency by proving that the Fourier basis functions are eigenfunctions of generalized Hamming adjacency matrices. This allows us to closely study the statistical relationship between the fitness function and Hamming regions in the q-ary hypercube. In Section 4 we apply these results to obtain an efficient algorithm for computing moments of the fitness distribution over arbitrary Hamming regions. We then turn our attention to uniform mutation in the q-ary hypercube. Specifically, in Section 5 we apply our results to derive equations that express the expected fitness of an offspring of an arbitrary state undergoing mutation as a function of rate. We then show how the results introduced in this paper can be immediately applied to two specific combinatorial problems, graph coloring and the frequency assignment problem in Sections 6 and 7, respectively. Finally, in Section 8 we conclude the paper.

2  Fourier Analysis of Fitness Functions over Σqn

Let denote an alphabet of cardinality q. We are interested in functions over the set of length-n strings over . Combinatorial optimization problems typically have a real-valued fitness function. We will be slightly more general and work with complex functions over . This makes the analysis somewhat more natural.

Let f be a complex function over . Without loss of generality, we associate the symbols of the alphabet with an element of : the cyclic group of integers under addition modulo q. We can then appeal to tools from group theory to express f in its group-theoretic Fourier expansion. Let G be the finite Abelian group
1
where addition is componentwise. This is the n-way direct product1 of cyclic groups of order q. Since we have associated the symbols of the q-ary alphabet with the elements of the cyclic group (which have additional algebraic structure), we can, without loss of generality, express f as a function over G. The set of all complex functions is an inner product space with operation
where the overline denotes complex conjugation. This inner product space has a complete orthonormal Fourier basis given by the set of functions , where
The notation x[i] denotes the ith element of the vector . When q=2, this is equivalent to the Walsh basis expansion (Rana et al., 1998; Heckendorn, 1999; Sutton et al., 2012). In that particular case, the functions are real-valued (since the 2s cancel and ). However, when q>2, the Fourier functions become complex.
For simplicity, we will express the above complex exponential function as a primitive qth root of unity. A complex number z is a qth root of unity when z satisfies the equation zq=1. Furthermore, a qth root of unity is called primitive when there is no k, 0<k<q such that zk=1. In this paper, we will often appeal to the following simple fact entailed by the properties of a geometric series: if z is a qth root of unity for q>1, then
2
We define
Clearly is a primitive qth root of unity. The Fourier functions can thus be expressed in terms of this primitive root of unity:

2.1  Some Useful Properties of Fourier Analysis

As we have already seen, any complex function belongs to the inner product space of functions spanned by the orthogonal Fourier basis; we can write f in its Fourier basis expansion
3
Furthermore, since is also an element of the space of complex functions over G, it is straightforward to show that
4
One useful property is that, as in the special case of the Walsh coefficients, the zeroth Fourier function is constant. Denote as the function corresponding to the zero element . Then
5
This implies that z0 is the average value of f over G:
6
We also have the following orthogonality property.
7
This is easy to prove by straightforward manipulation, for example, see Terras (1999, p. 169).

In this paper, we will often take advantage of the Fourier spectral properties of fitness functions that have bounded epistasis. We now make this concept rigorous.

Definition:

A function is k if it can be expressed as the sum of functions that each depend on at most k positions of their argument.

Epistatically bounded fitness functions arise from combinatorial problems that are structures built from components of bounded size. For example, the fitness function for many graph-theoretic problems can be expressed as the sum of subfunctions over each edge.

In the lemma that follows, we will make brief use of the following concept of a sequence projection. We will denote by |a| the number of nonzero elements of .

Definition:
The projection of a length-n sequence x onto a length-n sequence b is the length-|b| sequence
where ij is the position of the jth nonzero element of b.

This definition facilitates the following proof that if the kernel of a function2 contains the kernel of a sequence projection, then that function is orthogonal to certain Fourier functions.

Lemma 1:
Let . Suppose is a function such that, for all ,
Then we have
Proof:
By definition of the inner product,
Since |a|>|b| there must be some element u such that and b[u]=0. We can rewrite the inner product as
8
Let such that x and y only differ in position u. Since b[u]=0, we have projb(x)=projb(y) and, by supposition, gb(x)=gb(y). It directly follows that for any ,
We can use this property to rewrite the sum in Equation (8) over gb(x) so it no longer depends on the index j,
The inner product vanishes by Equation (2) and the fact that is a qth root of unity since .

Since epistatically bounded functions are composed of precisely the type of function in the statement of Lemma 1, we may use its result to prove the following theorem, which imposes a constraint on the location of the nonzero Fourier coefficients of any function over G with bounded epistasis.

Theorem 1:
Suppose f is epistatically bounded by k. Let and let za be the Fourier coefficient corresponding to a. We have
Proof:
We have
where g(i) depends on at most k positions. Rewriting,
The final equality follows from Lemma 1.

Epistatically bounded functions thus have sparse representations in the Fourier basis. Indeed, we can immediately bound the number of nonzero coefficients for an epistatically bounded function.

Corollary 1 (to Theorem 1):

If f is epistatically bounded by k, the number of nonzero Fourier coefficients for f is bounded from above by O(nk(q−1)k). Moreover, if f can be written as the sum of m=o(nk) functions that each depend on at most k positions, the count of nonzero Fourier coefficients for f is bounded from above by O(mqk).

3  Hamming Regions in the q-ary Hypercube

Randomized local search and many evolutionary algorithms employ elementary moves or mutation operators to locally sample the set of states by performing small perturbations on current states to construct similar neighboring states. For a given current state x, the neighborhood of x is defined as N(x), which is the set of states reachable from x by a single move or mutation.

Associated with N is the adjacency matrix: a square matrix with each row (and each column) corresponding to a unique state where
In the case where the state set is length-n strings over a q-ary alphabet, is a matrix. Though it is always possible to express N in terms of such a matrix, the analysis can become unwieldy in general because the size of the matrix grows as an exponential in n. However, since we have characterized the state space as a group G, where G is defined in Equation (1), it is possible to use a more compact, group theoretic specification of N. In particular, we can define some neighborhoods N using a fixed set of d unique group elements where . In this case, for any ,
The underlying neighborhood graph is isomorphic to the Cayley graph of the subgroup of G generated by S. Indeed, if S generates G, then the underlying neighborhood graph is connected.
The adjacency matrix is a matrix and can be viewed as a linear operator that acts on the function space . In particular, let g be any complex function over G. The image of g under , denoted , is also a complex function over G where
9
When the neighborhood operator can be described in terms of a fixed set S of group elements, we can make some statements about the Fourier functions and their relationship to the adjacency matrix. We say a function is an eigenfunction of the adjacency matrix when the following equation holds
where is a scalar value. This simply means that the function obtained by applying the linear operator to g evaluated at state x is directly proportional to g evaluated at x with proportionality constant . We say that a function is elementary in the context of the adjacency matrix when we can find a constant value b such that the new function fb is an eigenfunction of (Reidys and Stadler, 2002). We are now ready to prove the following.
Lemma 2:
If the neighborhood operator N is constructed by a fixed set S of group elements from G, then the Fourier function is an eigenfunction of corresponding to the eigenvalue
Proof:
Consider the matrix operating on . We have
where is as claimed.

Thus, for any neighborhood defined by a fixed set S of group operations, the Fourier basis functions are eigenfunctions of the adjacency that arises from that neighborhood.

We define the Hamming distance as
10
In the case of strings over a finite alphabet, the most natural mutation operator is single-element changes of strings. The underlying connectivity of this mutation operator is captured by the n-dimensional q-ary hypercube (or Hamming graph) H(n, q). In particular, the vertices of H(n, q) correspond to the qn strings of , and two vertices and are adjacent if and only if their Hamming distance is one. For a simple example of H(2, 3): the Hamming graph over , see Figure 1. For the remainder of this paper, we will focus exclusively on this adjacency.
Figure 1:

H(2, 3): the two-dimensional tertiary hypercube graph (Hamming graph) over .

Figure 1:

H(2, 3): the two-dimensional tertiary hypercube graph (Hamming graph) over .

The Hamming adjacency satisfies the antecedent in the claim of Lemma 2 since the single-change neighborhood operator can be constructed by a fixed set S of (q−1)n group elements (in this case, S is also a generating set for G):
Hence, the claim of Lemma 2 holds for the Hamming adjacency. In particular, letting expressly refer to the Hamming adjacency matrix, the Fourier function is an eigenfunction of . To compute the corresponding eigenvalue, note that in the expression
the s[i] is nonzero for exactly one i, and each is represented exactly once, so we can rewrite this as
11
Since is a primitive qth root of unity, we have the following property.
12
This again follows from the property expressed in Equation (2). If a[i] is nonzero, then is also a qth root of unity and the summation on the RHS of Equation (12) evaluates to zero. Otherwise, and that summation evaluates to q. Putting this together with Equation (11) yields the following expression for the eigenvalue corresponding to . Then
13

We define the Hamming sphere of radius r around a state as the set of states , and the Hamming ball of radius r around a state as the set of states . In these cases, we refer to x as the centroid of the sphere (ball). A set is called a Hamming region if it is a Hamming sphere (ball) for some centroid and radius.

Consider the generalized adjacency defined as
This defines a generalized r-change neighborhood, that is, the neighborhood defined by r-change mutations. In particular, the generalized r-change neighborhood around a state x is the Hamming sphere of radius r around x.
Theorem 2:
Let denote the radius-r adjacency in the q-ary hypercube. The Fourier function is an eigenfunction of corresponding to eigenvalue
Proof:
Let such that dH(x, y)=r. Then there is a group element s where |s|=r and y=x+s. Indeed, the fixed set of group elements that define this adjacency is the set . Hence, by Lemma 2, is an eigenfunction of corresponding to eigenvalue
We partition this sum by the indices of the nonzero elements of s. Denote as . Since, if s[i]=0, then , and so we can write

For a given , there are exactly ways to choose j indices ik where and ways to choose the remaining rj indices where . Summing over j and appealing to Equation (12) yields the claimed result.

Theorem 2 covers the special case of Equation (13) when r=1. The set of eigenvalues corresponds to the generalized Krawtchouk polynomials. These polynomials have applications in the theory of error correcting codes (MacWilliams and Sloane, 1977) in which generalized Hamming schemes are important.

From the theory of Krawtchouk polynomials (see, e.g., Chihara and Stanton, 1990), we obtain the following generating function for , which will be useful in subsequent sections.
14
Let denote the matrix in which element is 1 if and only if the Hamming distance between x and y is less than or equal to r and 0 otherwise. We will call the radius-r ball matrix because it encapsulates Hamming ball membership. Since the adjacencies are disjoint, this matrix can be written as a sum of adjacency matrices as follows.
15
In Theorem 2 we proved that the Fourier function is an eigenfunction of the generalized Hamming adjacency matrix with eigenvalue . As a consequence, is also an eigenfunction of the ball matrix with eigenvalue
16
The eigenvalues and depend only on the number of nonzero elements of a, that is |a|. In subsequent sections, it will be convenient to write the eigenvalues only in terms of their order. When doing so, we will use square brackets in the subindex:
17
18

4  Moments of Epistatically Bounded Fitness Functions

From an analytical perspective, the distribution of fitness function codomain values over a Hamming region in the q-ary hypercube provides insight into how an evolutionary algorithm might behave in that region of the search space. Suppose is a Hamming region. Consider the random process of drawing a state uniformly at random from R. The resulting state can be characterized as a random variable over R, which we denote as XR. The fitness of this uniformly drawn state is measured by the random variable f(XR).

Because the states are sampled uniformly at random across R, the moments of the frequency distribution of f over R are precisely captured by the moments of the distribution of the random variable f(XR). For instance, the average fitness in R is simply
and the variance of fitness values in R is
The (complementary) cumulative distribution function of f(XR) provides an exact count of states that lie above a specific value. For instance, the expression
19
exactly counts the number of states in the region that have strictly greater fitness than a state x. Note that since R is a finite set, the distribution of f(XR) is necessarily discrete.

An exact computation of this distribution is not tractable in general, indeed the special case of q=2 is already NP-hard (Sutton et al., 2011a, Theorem 1). Clearly, we would expect it to be as hard as the corresponding search problem when R is a Hamming ball of radius . Thus, instead of a complete characterization, we are interested in efficiently computing exact quantities that describe the shape, location, and spread of the distribution of f(XR). Such quantities are called the moments of f(XR), which are equivalent to the moments of the frequency distribution of f over R. We now apply the foregoing results to describe how one can efficiently compute the first such moment, a quantity that captures the expectation of f(XR). This corresponds to the average codomain value of f over the region R.

For any random variable X over G and any complex function , the expectation of the image of X under g is given by
20
This allows us to derive a compact expression for the average value in the image of R under Fourier functions. In particular, we have the following lemma. Again, for a region R we denote XR as a random variable which samples states in R uniformly at random.
Lemma 3:
Let be the radius r sphere centered on an arbitrary state x. For any Fourier function , we have the following identity
whereis given by Theorem 2.
Proof:
By Equation (20),
Since any state y is drawn uniformly at random,
yielding
The latter equality holds by Equation (9). The claim then follows immediately from Theorem 2.

We can also prove the analogous result for an arbitrary Hamming ball.

Lemma 4:
Let be the radius r ball centered on an arbitrary state x. For any Fourier function , we have the following identity
Proof:

This follows directly from Lemma 3 and Equations (15) and (16).

4.1  The Average Fitness in a Hamming Region

Let R be a Hamming region (sphere or ball) of radius r for some and some . The average of f over R is equivalent to the expectation of the random variable f(XR) which measures the fitness of a state XR drawn uniformly at random from R.

For asymptotically large Hamming regions, that is, , the average fitness in R is intractable to compute directly. However, when f is epistatically bounded, we can apply the analysis presented in this paper to derive a compact expression to efficiently compute the exact average. Let be a ball of radius r around an arbitrary state x (the case is analogous for Hamming spheres).

We begin by rewriting f in its Fourier basis expansion, and employing linearity of expectation
and by Lemma 4,
21
Because f is supposed to have a sparse representation in the Fourier basis, the resulting sum is over only polynomially many terms, even when .
This equation provides one with a polynomial-time algorithm for computing the first moment of the fitness function over any Hamming ball in the q-ary cube. Let denote the set of nonzero Fourier coefficients for f. The average value of f over B can be computed by Algorithm 1.

The sum accumulated in s corresponds to the sum in Equation (21) and contains exactly |Z| terms. Since we have supposed f to be epistatically bounded by k, Corollary 1 holds and provides a bound on the cardinality of Z. Since the eigenvalues can be precomputed, the time complexity of FirstMoment is dominated by the loop through the Fourier coefficients and is hence bounded above by O(nk(q−1)k). In the case that f can be written as a sum over m=o(nk) functions that each depend on at most k positions, the bound can be improved to O(mqk).

In the case of a sphere of radius r, the same algorithm can be used changing the factor by , and changing the factor |b| by |S|.

4.2  Higher Moments

We now generalize the above results to calculate higher moments of the frequency distribution of f over Hamming regions. The first moment, for which we gave an efficient computation in Section 4.1, provides a measurement of the location of the frequency distribution of f over a Hamming region. The cth moment of the frequency distribution of f over a Hamming region R is defined as the expectation of the cth power of f:
Higher moments are called the parameters of a distribution and yield further insight into its spread and shape (cf. Rényi, 1970). Often, one is interested in the central moments of a distribution which center the measurement about the mean. The cth central moment can be obtained from the above expression by the transformation
For example, the variance, which describes how far the fitness values deviate from their expectation in R, is . When the cth central moment is normalized by the cth power of the standard deviation (i.e., the positive square root of the variance), the result is the cth standardized moment of the distribution. This quantity is a standard measurement of the shape of the distribution such as skewness (c=3) and kurtosis (c=4).

It is easy to show that if f has a sparse representation in the Fourier basis and c=O(1), so must f c.

Lemma 5:
For any ,
Proof:
The function product is
It then becomes straightforward to derive the Fourier series for f c.
The coefficient corresponding to is nonzero if and only if for . But is a Fourier coefficient for f. Since there are at most |Z| nonzero coefficients for f, there are at most |Z|c nonzero coefficients for f c. The cth moment of f over B is simply the first moment of f c over B. The cth moment can be computed by Algorithm 2.

The time complexity of Moment is dominated by the call to FirstMoment for function f c. Since f is epistatically bounded by k, it follows that f c is epistatically bounded by and the runtime is bounded by O(nck(q−1)ck). Again, if f can be expressed as a linear combination of m=o(nk) terms that depend on at most k positions, the bound is improved to O(mcqck). Since we have assumed c and k to be constants, the complexity is polynomial in n. However, we remark here that it is of course practically constrained to modest values of c.

5  Expected Fitness After a Mutation

The mutation operator has capital importance in evolutionary algorithms. Let us consider the following mutation operator in the q-ary hypercube. For each element x[i] in the solution vector, change its value with probability . Here, is typically called the mutation rate. The change can be made to any of the q−1 possible values that are different from x[i] with the same probability 1/(q−1). The mutation rate parameter is a quantity that describes how much an individual is perturbed to generate an offspring during search. One is often interested in deriving the mutation rate that maximizes the probability of generating a successful offspring, that is, one with strictly improving fitness. Such a quantity has an immediate use in runtime analysis, since it describes the success probability of a geometric random variable that measures the time until an improving move is taken.

The optimal mutation rate for general functions is currently not well-understood. For very simple linear OneMax functions over binary strings, the mutation rate was investigated by Bäck (1992). In practice, the commonly recommended mutation rate for functions over binary strings of length n is the so-called standard mutation rate of 1/n. In the case of linear functions over binary strings, Witt (2012) has proved that the optimal mutation rate is indeed 1/n, and even derived a bound of enln n+O(n) on the corresponding runtime of a simple evolutionary algorithm at this rate, which is tight up to lower-order terms.

The case of nonlinear functions is less clear. Jansen and Wegener (2000) introduce a function over binary strings on which the standard rate of 1/n is provably suboptimal because it takes an exponential number of steps to find a global optimum with high probability. Similarly, Böttcher et al. (2010) proved the standard 1/n mutation rate is suboptimal for the classical LeadingOnes function. They also introduce an adaptive mutation rate that depends on the current fitness and improves the optimization time by 12% over a nonadaptive rate. In the current paper, we find that the expectation-best mutation rate is a function of current solution.

Doerr et al. (2012) consider the class of strictly monotonic functions over binary strings and show that, for mutation rates of c/n, the choice of constant c has a strong effect on the optimization time. They present a particular monotonic function for which changing the mutation rate of the (1+1)-EA by a constant factor leads to an exponential performance gap. These results stress the importance of an understanding of the relationship between the mutation rate and the function being searched, and that the issue of how best to set the mutation rate for general optimization is far from resolved.

For epistatically bounded nonlinear functions over binary strings, Chicano and Alba (2011) showed that the expected fitness of the offspring is a polynomial in the mutation rate. Sutton et al. (2011b) also investigated some properties of this polynomial. In this section we extend these results to the q-ary hypercube.

Let be a random variable that takes the value in G after applying the mutation operator (with mutation rate ) to x. Then, is the probability of obtaining y after the mutation operator is applied to x. We are interested in computing the expected value of a function f over G after a mutation, that is, . The following theorem provides a first result.

Theorem 3:
For with we have
22
Proof:
The probability of jumping from x to y through a mutation is
where dH(x, y) is the Hamming distance between solutions x and y as defined in Equation (10). We can write the expectation as
23
Now, we can use the generating function in Equation (14) to simplify the result:
24
and we get Equation (22).
Corollary 2:
Given an arbitrary function with Fourier expansion
25
then the expected fitness after the mutation of a solution x with probability is given by
26
Proof:
Using Theorem 3 we can write
which proves the statement.
Note that these results are reduced to those of Chicano and Alba (2011) when q=2. In this case we have for a general function f
27

5.1  Discussion

The above derivations allow for determining the mutation rate that maximizes the expected fitness of the offspring of an individual. Unfortunately, such a value is not immediately useful for the following reasons. First, if the fitness function together with mutation operation yields an elementary landscape, then the expected fitness after the mutation of a solution is simplified due to the fact that only the Fourier coefficients za with a=0 and |a|=k for some constant k are nonzero. That is, Equation (26) is simplified to
In this case, the dependence of the expected value on the mutation rate is a simple polynomial with root , and the expectation-optimal expected values can be found at the extremes of the interval ( or ) or at the root of the polynomial . None of these optimal values provide new information from the problem at hand. In other words, the optimal mutation rates do not depend on the problem; they are general. We will see later an example of this in Section 6. If the fitness function is the sum of several elementary landscapes, then the expectation-optimal mutation rate obtained using Equation (26) becomes problem-dependent and the equation is potentially more useful.

Second, the expected value of the fitness function can have little to do with the real probability of finding an improving solution. As an illustration, if , then the expected value is exactly f(x), because there is no mutation. If the landscape is elementary and f(x)>z0, this is the maximum possible expected value. However, it is clear that the probability of finding an improving solution is zero if .

Third, mutation is a local operation. The mutation rate selected is just used in a single application of the mutation operator during one iteration of an evolutionary algorithm. Thus, this expectation-optimal mutation rate can ensure some kind of optimality in only one step of the algorithm. In this sense, the use of the expectation-optimal mutation rate is a greedy approach. It is well-known that for some optimization problems, a greedy approach can sometimes yield poor results.

The second limitation above could be addressed by using the probability that mutating a state x produces an offspring with a strictly improving fitness value. Similar to the complementary cumulative distribution function of f(XR) in Equation (19), this probability, expressed as a function of x and , is
28
Given a sequence of moments, finding the corresponding probability distribution is known as the moment problem. Unfortunately, the number and degree of moments necessary to determine the cumulative probability in Equation (28) may render its exact solution intractable.

However, Sutton et al. (2011a) show how a fitness distribution can be approximated using a small number of moments. The distribution is obtained by multiplying the vector of moments by the inverse and transposed Vandermonde matrix constructed from the set of codomain values of the fitness function.

The result is a vector of polynomials that each approximate the probability of the offspring attaining a particular fitness value. Thus, the probability that an offspring has higher fitness than a state x is given by the sum over the set of polynomials that correspond to fitness values strictly greater than f(x). Since the sum of polynomials is again a polynomial, this approximate probability can be expressed as a polynomial in . One can then solve for the value of that gives the maximal value in the interval [0, 1].

In practice, this numerically derived value could be used by algorithm designers to determine good parameter settings for the mutation rate. Moreover, this idea could potentially be applied to a more theoretical result. Specifically, if the above derivations could be relaxed in such a way as to allow for good bounds on the improvement probability, it would be possible to establish runtime results for simple genetic algorithms (without crossover) parameterized by . We defer these considerations to future work.

6  Results on Graph Coloring

The graph coloring problem is a classical combinatorial optimization problem to find a conflict-minimal q-coloring of a graph. The fitness of a coloring, given by the number of conflicts it induces, is epistatically bounded since it can be expressed as a function that depends only on second-order interactions. In this section, we apply to graph coloring the results presented in the foregoing sections.

Let be a graph with |V|=n. A q-coloring of is an assignment . The set of all q-colorings of is isomorphic to . The objective of the graph coloring problem is to find a coloring x that minimizes the count of edges incident to vertices that have the same color in x.

The fitness function can be characterized as
29
where
The following lemma allows us to characterize the sum of a series of Fourier functions over all elements of G in which two elements are fixed. This will be useful for the calculation of the Fourier coefficients of the graph coloring problem.
Lemma 6:
Let be arbitrary. Let denote the direct product of n−2 cyclic groups of order q, that is
Let be the element obtained from a by removing the components in positions u and v. Then
Proof:
Without loss of generality, suppose u<v. We thus have the following.
which yields the claimed result.

We are now ready to prove the following.

Theorem 4:

Let f be the fitness function for the q-coloring problem on a graph where |V|=n. We say that a group element is a balanced mask of the edge if and only if the following conditions hold:

1. |a|=2,

2. andare the two nonzero entries of a

3. .

Then the Fourier coefficients of f are
Proof:
Substituting the definition of f from Equation (29) into the expression for the Fourier coefficient in Equation (4) (and using the fact that |G|=qn) we can write
Because is nonzero only for for each we can write the Fourier coefficient corresponding to as
Letting be the direct product of n−2 cyclic groups of order q and be the element obtained by removing the components in positions u and v from a, we can appeal to Lemma 6 to yield the following.
30

We now make a case distinction for a.

1. |a|=0, that is, . In this case, in Equation (30) so by Equation (7) we have
Furthermore, since for any u and v, a[u]+a[v]=0, we have
2. |a|=1. Suppose that is the single nonzero component of a. Considering the interior summations of Equation (30), for each , either u=w, v=w, or . If u=w or v=w, then the term
since in this case 0<a[u]+a[v]<q making a qth root of unity and the sum vanishes by Equation (2). On the other hand, if , then and the term
by Equation (7). Since these are the only possibilities for |a|=1, it must be the case that za=0 when |a|=1.
3. |a|=2. Let and be the two only nonzero components of a. If (w, t)∉E, then clearly Equation (30) evaluates to zero. On the other hand, if , then we have
and since , by Equation (7) we have
The final step follows from the fact that if , then is a qth root of unity and . Otherwise, and the sum evaluates to q. Hence, if |a|=2, then if and only if the nonzero components of a correspond to an edge in E and their sum is equivalent to zero modulo q, otherwise za=0. Put another way, if |a|=2, then za is nonzero (and equal to ) if and only if a is a balanced mask of an edge .
4. |a|>2. In this case, so again by Equation (7), , and Equation (30) evaluates to zero. Hence for the case of |a|>2, za=0.

These cases cover all and yield the claimed result.

Thus, the graph coloring fitness function that counts constraint violations in a q-coloring corresponding to has a sparse representation in the Fourier basis: its Fourier coefficients are nonzero for which have exactly two nonzero entries corresponding to balanced masks of edges in .

6.1  The Average Number of Constraint Violations Within r Steps

We now apply the above results to compute the average number of constraint violations that occur within a sphere or ball of radius r centered on an arbitrary coloring. In doing so, we generalize a well-known result of Grover (1992) that relates the average number of constraint violations in the single-change neighborhood with the number of violations in the current coloring.

Let correspond to a q-coloring of . Let denote the Hamming ball of radius r around x. Substituting into Equation (21) results in an expression for this average value of f in B.
Since |a|=0 or |a|=2, we employ the notation defined in Equation (18) to write
31
where and are given by Equation (18).
The same can be done for the spheres. Let denote the Hamming sphere of radius r centered on x, a q-coloring of the graph. Then we have
32
where and are given by Equation (17).
Note that in the simple case, when r = 1, we have |S|=(q−1)n and the above equation simplifies to
Substituting the proper values for we have
From Theorem 4 we get which finally yields
This exactly corresponds to the result of Grover (1992) who derived the same expression for the average number of constraint violations in the immediate Hamming neighbors of a coloring. Equation (31) provides a similar compact expression for the average number of constraint violations within a Hamming region. We also remark here that the higher moments of the frequency distribution of the count of constraint violations could be easily computed using the method presented in Section 4.2.

6.2  Expectation-Optimal Mutation Rate

In Section 5 we provided closed-form formulas for the expected value after mutation of an arbitrary fitness function f. In this section we particularize the expression to the case of the graph coloring problem and we discuss the result from a practical point of view. We start with the main result of this section.

Theorem 5:
In the graph coloring problem, the expected value of the fitness function after the mutation of a solution x with probability is given by:
33
Proof:
According to Theorem 4, the fitness function of the graph coloring problem can be written as:
34
where z0=|E|/q and za=1/q. Then, using Equation (26) we can write the expected value after mutation as:
35
which proves the statement.

We are interested in studying the expected fitness under mutation with respect to the mutation rate . Equation (33) is a quadratic equation in . Furthermore, the vertex of the parabola is in . In Figure 2 we plot against .

Figure 2:

Expected fitness of an offspring of x, that is, , plotted against mutation rate for the graph coloring problem. The fitness of the current solution x is denoted by f(x), whereas |E|/q is the average fitness over all solutions.

Figure 2:

Expected fitness of an offspring of x, that is, , plotted against mutation rate for the graph coloring problem. The fitness of the current solution x is denoted by f(x), whereas |E|/q is the average fitness over all solutions.

Our first observation is that the expected value of the fitness function is f(x) when , since in this case we do not change the individual. Then, the expected value approaches the average value |E|/q in a quadratic way. The expected value is exactly the average value when (the vertex of the parabola). This result can also be explained from a different point of view. If , the probability of keeping the current value of a solution component x[i] is 1/q and the probability of changing this value to a different one is (q−1)/q. There are q−1 values to which the current one can be changed and the probability of selecting one of these values is 1/(q−1). Then, the probability of changing the value of x[i] to a given new value is
exactly the same as keeping the current value of x[i]. In summary, for this particular mutation rate, the mutation operator has the effect of replacing x with a state chosen uniformly at random from the search space. The expected value is thus the average value of the fitness function over the entire search space.

The plot in Figure 2 considers the case in which f(x)>|E|/q. If f(x)<|E|/q, then the plot is inverted along the X axis and the parabola has a maximum at . We can summarize the previous considerations in the following rules:

• If f(x)>|E|/q, then the maximum expected value of the fitness function after mutation is reached when , that is, when no mutation is applied.

• If f(x)<|E|/q, then the maximum expected value of the fitness function after mutation is reached when , which is equivalent to picking a random solution from the search space.

• If f(x)=|E|/q, then the expected mutation is always |E|/q, independent of the value of .

This analysis of the expected value of the fitness function for the graph coloring problem provides some information on the mutation process but does not give interesting values for the mutation rate. The reason is that the graph coloring problem is what we call an elementary landscape, and the expected value of an elementary landscape after a mutation operation has always the maximum and minimum values in the extremes ( and ) and/or a trivial value inside the interval ( in our case). More interesting values can be obtained when the function is not an elementary landscape (e.g., see Chicano and Alba, 2011). This happens in the case of the frequency assignment problem, which we now explore.

7  Results on Frequency Assignment

The frequency assignment problem is the last step in the layout of a radio network, like a 2G (second generation) cellular mobile network. Frequency assignment entails the assignment of a channel (a frequency band) to every transceiver (TRX) in the radio network. The optimization problem arises because the usable radio spectrum is generally very scarce and, consequently, channels have to be reused by many TRXs in the network (Eisenblätter, 2001).

The multiple use of the same channel may cause interference that might reduce the quality of service down to unsatisfactory levels. Indeed, significant interference may occur if the same or adjacent channels are used in neighboring overlapping cells. Two matrices are often used to measure the interferences. In the so-called interference matrix, denoted by M, each element mij indicates the degradation of the network quality if TRXs i and j operate on the same channel. This is called co-channel interference. In addition to co-channel interference, there also may exist a so-called adjacent-channel interference, which occurs when two TRXs operate on adjacent channels (i.e., one TRX operates on channel p and the other on channel p+1 or p−1). Co-channel and adjacent-channel interference are the most important ones in the design of a radio network. But we could also be interested in considering interference due to the overlapping of channels with a larger separation. This is in accordance with real-world applications, since the amount of interference between two channels depends on the separation of the channels (Aardal et al., 2003).

Thus, in our generalized form of the frequency assignment problem, we consider both co-channel interference and adjacent channel interference as well as interferences due to frequencies with a larger separation.

We can assign a cost to each possible interference that can occur in a channel assignment. Then, the objective is to minimize the cost due to interference in a radio network. An additional generalization of this problem also considers the possibility of additional costs due to the mere fact that a given channel is used by a given TRX, for example, a fee could be charged to a telecommunication company for using a channel in a given location. In general, the set of channels that can be assigned to each TRX might be different. We assume that the valid channels of each TRX are sorted and we use an integer number to represent their position in the sorted set. We also assume, without loss of generality, that the number of valid channels is the same, q, in all the TRXs. We denote with n the number of TRXs in the radio network.

In order to take into account all the previous considerations and keep a compact formulation of the problem, we define an array of weights in which we denote each element with wp,ru,v, where and . We can interpret the element wp,ru,v as the cost of having channel p in TRX u and channel r in TRX v. One solution for this problem is a map from the TRXs set, denoted with v, to the set of possible channels . Thus, the solution space is . Using the array of weights we can define the cost function as:
36

The cost element wr,pv,u has the same meaning as wp,ru,v, so we can set one of them to zero. However, for the sake of clarity and without loss of generality, we will take the convention that wp,ru,v=wr,pv,u for all u, v, p, r. Then, the cost element wp,ru,v with must be interpreted as half the cost of having channel p in TRX u and channel r in TRX v. If u=v, then the element wp,pu,u is the additional cost of having channel p in TRX u. The values wp,ru,u with are zero.

Let us rewrite the cost function in Equation (36) in a more convenient way, as a linear combination of functions:
37
where
38
Now we can focus on the functions . The following lemma provides a first result.
Lemma 7:
Given and , the Fourier coefficients of the function defined in Equation (38) are
39
Proof:
Substituting the definition of into the expression for the Fourier coefficients in Equation (4) we can write
Since is one when x[u]=p and x[v]=r and zero otherwise, we can write
and using Lemma 6 we have
where is the direct product of n−2 cyclic groups of order q and is the element obtained by removing the components in positions u and v from a. Equation (7) states that the sum in the previous expression will be only if , that is, if a[t]=0 for all such that and . The sum will be zero if any of the for t different from u and v. Replacing the sum in the previous expression by zero or qn-2 and taking into account that we find Equation (39).

Using this lemma we can prove the following theorem.

Theorem 6:
The Fourier coefficients of the fitness function of the frequency assignment problem f(x) are
40
Proof:
We can find the expression for the Fourier coefficients za of f(x) by computing the weighted sum of Equation (37) using the Fourier coefficients of the functions instead of the functions themselves:
41
where we include the parameters u, v, p, r in the Fourier coefficients of the functions. We now distinguish four cases depending on the number of nonzero elements of a.
• Case 1:
|a|=0. According to Lemma 7, zu,v,p,ra=1/q2, and Equation (41) can be simplified to:
• Case 2:
|a|=1. Let us denote with a[t] the nonzero entry of a. The coefficient zu,v,p,ra is nonzero if t=u or t=v. Then, we can simplify Equation (41) to
42
• Case 3:
|a|=2. Let us denote with a[u] and a[v] the only two entries in a that are nonzero. Then, only the coefficients zu,v,p,ra with any arbitrary value for p and r are nonzero. Equation (41) can be simplified to:
43
• Case 4:

|a|>2. In this case zu,v,p,ra=0 and, as a consequence, za=0.

These cases cover all the possible values for and prove the statement.

As in the case of the graph coloring problem, we can observe that the frequency assignment problem also has bounded epistasis and thus the fitness function has a sparse representation in the Fourier basis. In this case, the Fourier coefficients are nonzero only for the elements with nonzero values in two or fewer entries.

In the following, it will be convenient to group together the terms in the Fourier expansion related to elements having the same number of nonzero entries, the so-called elementary components of the function (Chicano et al., 2011). We identify here three components related to the terms in which |a|=0, |a|=1, and |a|=2. The definition of the elementary components are:
44
45
46
where we omit the variable in f[0] to highlight that it is a constant (function). The fitness function f(x) is just the sum of these three terms: f(x)=f[0]+f[1](x)+f[2](x).

7.1  The Average Fitness in Hamming Regions

We can use the Fourier expansion of the previous section to compute the average value of the fitness function in an arbitrary Hamming region (ball or sphere) of size r. Let correspond to a map of TRXs to channels. Let denote the Hamming ball of radius r around x. Substituting into Equation (21) results in an expression for this average value of f in B.
47
where , , and are given by Equation (18).
The same can be done for the spheres. Let denote the Hamming sphere of radius r centered on x, a map from TRXs to channels. Then we have
48
where , , and are given by Equation (17).
Note that in the simplest case, when r = 1, we have |S|=(q−1)n and the above equation simplifies to
Substituting the proper values for we have

As in the case of the graph coloring problem, the higher moments of the frequency distribution of the values of the fitness function could easily be computed using the method presented in Section 4.2.

7.2  Expectation-Optimal Mutation Rate

Let us now particularize the results of Section 5 to the case of the frequency assignment problem. The following theorem presents the main result.

Theorem 7:
In the frequency assignment problem, the expected value of the fitness function after the mutation of a solution x with probability is given by:
49
Proof:
According to Theorem 4, the fitness function of the graph coloring problem can be written as:
50
Then, using Equation (26) we can write the expected value after mutation as:
and using the definition of f[0], f[1](x), and f[2](x) given in Equations (44), (45), and (46) we have
51
which proves the statement.

According to the previous theorem, the expected value of the fitness function after a mutation is a quadratic equation with respect to the mutation rate . However, the analysis of this quadratic equation provides more information than in the case of the graph coloring problem. The reason is that the frequency assignment is not an elementary landscape and the behavior of the elementary components f[1](x) and f[2](x) is different, in general, from the behavior of f(x).

We are interested in finding the expectation-optimal mutation rate for a given solution x. Let us consider two cases:

• Casef[2](x)=0: In this case, Equation (49) is a linear equation in and the expectation-optimal mutation rate must be in one of the extremes of the interval: or . In particular, since we are minimizing, the expectation-optimal mutation rate will be if f[1](x)>0 and if f[1](x)<0. In the case where f[1](x)=0, the mutation rate is irrelevant, and the expected value of the fitness function is f[0].

• Case: By deriving Equation (49) with respect to , we can find the vertex of the parabola, which is:
52
If this vertex is in the interval [0, 1], then it could be the expectation-optimal mutation rate. The expected value of the fitness function after a mutation when this mutation rate is used is
53
If this expected value is lower than the expected values in the extremes, E(f(M0(x))) and E(f(M1(x))), then is the expectation-optimal mutation rate. Since f[0] is the expected value of the fitness function when a random solution is selected from the search space, Equation (53) reveals that is better than a simple random selection if f[2](x)>0. If f[2](x)<0, then a random selection of the solution provides a lower expected value for the fitness function. Finally, if is not in the interval [0, 1], then the expectation-optimal mutation rate must be in the extremes: or .

8  Conclusions

In this work we have provided closed-form formulas and algorithms for computing the exact moments of the frequency distribution for functions with bounded epistasis defined over length-n strings of a finite alphabet. The results presented in this paper are relevant in the context of combinatorial optimization problems in which candidate solutions are represented by q-ary strings.

In addition to the moments, we also derived an exact expression for the expected value of the fitness function after the application of a mutation operator. Using this expression it is possible to compute the mutation rate for which this expected value is optimum. These derivations are mainly of theoretical interest, and we present them as a rigorous step toward a better understanding of the statistics of combinatorial search spaces. However, we have also briefly discussed potential practical applications of this work. In particular, we presented a connection to previous work on approximating fitness distributions using a linear program that could in principle allow one to estimate the mutation rate that maximizes the probability of producing an offspring that has improving fitness. This result could be used for finding good parameter values during search, but it might also be useful for finding lower bounds on the probability of improvement, and therefore an upper bound for the runtime of a simple evolutionary algorithm. We leave this as an open research question.

Finally, we applied the derivations to the graph coloring problem, a well-known NP-hard problem, and the frequency assignment problem, another NP-hard problem with relevance in the telecommunication industry. For both problems, we presented exact closed-form formulas for the average value of the fitness function in Hamming regions around any state. We also derived, in both cases, the exact formula for the expected fitness of an offspring produced by uniform mutation at a specific rate and discussed the results obtained.

Acknowledgments

This work was partially funded by the Spanish Ministry of Science and Innovation and FEDER under contract TIN2008-06491-C04-01 (the M project), by the Andalusian Government under contract P07-TIC-03044 (DIRICOM project), and by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant number FA9550-08-1-0422. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

The authors would also like to thank the organizers and participants of the seminar on Theory of Evolutionary Algorithms (10361) at Schloß Dagstuhl - Leibniz-Zentrum für Informatik.

References

Aardal
,
K. I.
,
van Hoesel
,
S. P. M.
,
Koster
,
A. M. C. A.
,
Mannino
,
C.
, and
Sassano
,
A.
(
2003
).
Models and solution techniques for frequency assignment problems
.
4OR
,
1
(
4
):
261
317
.
Bäck
,
T.
(
1992
).
The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm
. In
R. Männer and B. Manderick (Eds.)
,
Parallel Problem Solving from Nature II
, pp.
85
94
.
Böttcher
,
S.
,
Doerr
,
B.
, and
Neumann
,
F.
(
2010
).
. In
R. Schaefer, C. Cotta, J. Kolodziej, and G. Rudolph (Eds.)
,
Parallel Problem Solving from Nature XI. Lecture notes in computer science
Vol.
6238
(pp.
1
10
).
Berlin
:
Springer-Verlag
.
Chicano
,
F.
, and
Alba
,
E.
(
2011
).
Exact computation of the expectation curves of the bit-flip mutation using landscapes theory
. In
N. Krasnogor and P. L. Lanzi (Eds.)
,
Proceedings of the Thirteenth Annual Conference on Genetic and Evolutionary Computation
, pp.
2027
2034
.
Chicano
,
F.
,
Whitley
,
L. D.
, and
Alba
,
E.
(
2011
).
A methodology to find the elementary landscape decomposition of combinatorial optimization problems
.
Evolutionary Computation
,
19
(
4
):
597
637
.
Chihara
,
L.
, and
Stanton
,
D.
(
1990
).
Zeros of generalized Krawtchouk polynomials
.
Journal of Approximation Theory
,
60
:
43
57
.
Doerr
,
B.
,
Jansen
,
T.
,
Sudholt
,
D.
,
Winzen
,
C.
, and
Zarges
,
C.
(
2012
).
Mutation rate matters even when optimizing monotonic functions
.
Evolutionary Computation
, (
Early Access
),
doi: 10.1162/EVCO_a_00055
.
Eisenblätter
,
A.
(
2001
).
Frequency assignment in GSM networks: Models, heuristics, and lower bounds
.
PhD thesis, Technische Universität Berlin
.
Grover
,
L. K.
(
1992
).
Local search and the local structure of NP-complete problems
.
Operations Research Letters
,
12
:
235
243
.
Heckendorn
,
R. B.
(
1999
).
Walsh analysis, epistasis, and optimization problem difficulty for evolutionary algorithms
.
PhD thesis, Colorado State University, Fort Collins
.
Heckendorn
,
R. B.
(
2002
).
Embedded landscapes
.
Evolutionary Computation
,
10
(
4
):
345
369
.
Jansen
,
T.
, and
Wegener
,
I.
(
2000
).
On the choice of the mutation probability for the (1+1) EA
. In
M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. Merelo, and H.-P. Schwefel
(Eds.),
Parallel problem solving from nature VI
. Lecture notes in computer science, Vol.
1917
(pp.
89
98
).
Berlin
:
Springer-Verlag
.
MacWilliams
,
F. J.
, and
Sloane
,
N. J. A.
(
1977
).
The theory of error correcting codes
.
Amsterdam
:
North-Holland
.
Rana
,
S.
,
Heckendorn
,
R. B.
, and
Whitley
,
L. D.
(
1998
).
A tractable Walsh analysis of SAT and its implications for genetic algorithms
. In
Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98)
, pp.
392
397
.
Reidys
,
C. M.
, and
,
P. F.
(
2002
).
Combinatorial landscapes
.
SIAM Review
,
44
(
1
):
3
54
.
Rényi
,
A.
(
1970
).
Foundations of probability
.
San Francisco, CA
:
Holden-Day
.
Sutton
,
A. M.
,
Whitley
,
L. D.
, and
Howe
,
A. E.
(
2011a
).
Approximating the distribution of fitness over Hamming regions
. In
Proceedings of Foundations of Genetic Algorithms XI
, pp.
93
104
.
Sutton
,
A. M.
,
Whitley
,
L. D.
, and
Howe
,
A. E.
(
2011b
).
Mutation rates of the (1+1)-EA on pseudo-Boolean functions of bounded epistasis
. In
N. Krasnogor and P. L. Lanzi (Eds.)
,
Proceedings of the Thirteenth Annual Conference on Genetic and Evolutionary Computation
, pp.
973
980
.
Sutton
,
A. M.
,
Whitley
,
L. D.
, and
Howe
,
A. E.
(
2012
).
Computing the moments of k-bounded pseudo-Boolean functions over Hamming spheres of arbitrary radius in polynomial time
.
Theoretical Computer Science
,
425
:
58
74
.
Terras
,
A.
(
1999
).
Fourier analysis on finite groups and applications
.
Cambridge, UK
:
Cambridge University Press
.
Witt
,
C.
(
2012
).
Optimizing linear functions with randomized search heuristics–The robustness of mutation
. In
C. Dürr and T. Wilke (Eds.)
,
STACS
,
Vol. 14 of LIPIcs
, pp.
420
431
.
Schloß Dagstuhl - Leibniz-Zentrum für Informatik
.

Notes

1

The direct product of groups is defined as the Cartesian product of the groups under elementwise group operation. In other words, given two groups and , an element is given by .

2

The kernel of a function is defined as