The frequency distribution of a fitness function over regions of its domain is an important quantity for understanding the behavior of algorithms that employ randomized sampling to search the function. In general, exactly characterizing this distribution is at least as hard as the search problem, since the solutions typically live in the tails of the distribution. However, in some cases it is possible to efficiently retrieve a collection of quantities (called moments) that describe the distribution. In this paper, we consider functions of bounded epistasis that are defined over length-n strings from a finite alphabet of cardinality q. Many problems in combinatorial optimization can be specified as search problems over functions of this type. Employing Fourier analysis of functions over finite groups, we derive an efficient method for computing the exact moments of the frequency distribution of fitness functions over Hamming regions of the q-ary hypercube. We then use this approach to derive equations that describe the expected fitness of the offspring of any point undergoing uniform mutation. The results we present provide insight into the statistical structure of the fitness function for a number of combinatorial problems. For the graph coloring problem, we apply our results to efficiently compute the average number of constraint violations that lie within a certain number of steps of any coloring. We derive an expression for the mutation rate that maximizes the expected fitness of an offspring at each fitness level. We also apply the results to the slightly more complex frequency assignment problem, a relevant application in the domain of the telecommunications industry. As with the graph coloring problem, we provide formulas for the average value of the fitness function in Hamming regions around a solution and the expectation-optimal mutation rate.
Many problems in combinatorial optimization, such as graph coloring, register allocation, and frequency assignment problems, can be characterized as a search for the extremum of a real-valued function over the set of length-n strings over a q-ary alphabet. Stochastic local search algorithms and mutation-based evolutionary algorithms for such problems can be viewed as processes that perform a biased random walk on the q-ary hypercube graph. Characterizing the frequency distribution of fitness values over regions of this graph is thus an important part of understanding the behavior of such algorithms and the influence of problem structure on this behavior.
In this paper we derive a method for the efficient calculation of exact statistical moments of the frequency distribution of fitness function values over Hamming regions of the q-ary hypercube provided that the function is bounded epistatically by a constant. The retrieval of these moments provides insight into the local structure of the search space and can be used to describe the fitness of states that lie within a certain distance of any given state. Our aim is to provide a framework for studying the statistical structure of local regions of the search space for a broad class of combinatorial optimization problems. This framework is directly relevant to the theory of randomized search heuristics such as evolutionary algorithms, simulated annealing, and variants of randomized local search (hill-climbing). For the graph coloring problem and the frequency assignment problem, we derive specific equations that model the fitness statistics of states sampled by the move and mutation operators of such algorithms.
We present a direct computation of the exact statistics (e.g., mean, variance, and skew) of the distribution of fitness values over regions of the search space for problems over strings. As long as the epistasis of the fitness function is bounded by a constant, our method has polynomial complexity even when the cardinality of the region grows as an exponential in n. Subsequently, we use this result to determine an exact calculation of the expected fitness of an offspring parameterized by mutation rate from any given point. Aside from working out the statistics of the fitness of an offspring of a state undergoing mutation in an evolutionary algorithm, it also allows one to exactly compute the rate that results in the maximal expected fitness at any point. We see that the resulting expression illuminates the behavior of mutation on combinatorial problems such as graph coloring and the frequency assignment problem. Currently, our results are mainly of theoretical interest; however, we also discuss some different ways they could eventually be used in practice to design more principled algorithms.
This paper generalizes the work of Sutton et al. (2012) who presented a polynomial-time algorithm for computing the moments of pseudo-Boolean functions over regions in the Boolean hypercube, that is, for q=2. We will generalize this result to complex, epistatically bounded functions over strings from a finite alphabet of q elements where q is O(1) and the corresponding adjacency is the generalized Hamming graph (i.e., the q-ary hypercube). This paper also generalizes a result of Heckendorn (2002) who derived a polynomial-time algorithm for obtaining the summary statistics of the fitness function distribution over the entire search space when q=2.
The remainder of the paper is organized as follows. In the next section, we introduce some technical preliminaries and show that fitness functions over q-ary strings can be written in a Fourier function basis expansion. In Section 3, we relate this Fourier analysis to the Hamming adjacency by proving that the Fourier basis functions are eigenfunctions of generalized Hamming adjacency matrices. This allows us to closely study the statistical relationship between the fitness function and Hamming regions in the q-ary hypercube. In Section 4 we apply these results to obtain an efficient algorithm for computing moments of the fitness distribution over arbitrary Hamming regions. We then turn our attention to uniform mutation in the q-ary hypercube. Specifically, in Section 5 we apply our results to derive equations that express the expected fitness of an offspring of an arbitrary state undergoing mutation as a function of rate. We then show how the results introduced in this paper can be immediately applied to two specific combinatorial problems, graph coloring and the frequency assignment problem in Sections 6 and 7, respectively. Finally, in Section 8 we conclude the paper.
2 Fourier Analysis of Fitness Functions over Σqn
Let denote an alphabet of cardinality q. We are interested in functions over the set of length-n strings over . Combinatorial optimization problems typically have a real-valued fitness function. We will be slightly more general and work with complex functions over . This makes the analysis somewhat more natural.
2.1 Some Useful Properties of Fourier Analysis
In this paper, we will often take advantage of the Fourier spectral properties of fitness functions that have bounded epistasis. We now make this concept rigorous.
A function is k if it can be expressed as the sum of functions that each depend on at most k positions of their argument.
Epistatically bounded fitness functions arise from combinatorial problems that are structures built from components of bounded size. For example, the fitness function for many graph-theoretic problems can be expressed as the sum of subfunctions over each edge.
In the lemma that follows, we will make brief use of the following concept of a sequence projection. We will denote by |a| the number of nonzero elements of .
This definition facilitates the following proof that if the kernel of a function2 contains the kernel of a sequence projection, then that function is orthogonal to certain Fourier functions.
Since epistatically bounded functions are composed of precisely the type of function in the statement of Lemma 1, we may use its result to prove the following theorem, which imposes a constraint on the location of the nonzero Fourier coefficients of any function over G with bounded epistasis.
Epistatically bounded functions thus have sparse representations in the Fourier basis. Indeed, we can immediately bound the number of nonzero coefficients for an epistatically bounded function.
If f is epistatically bounded by k, the number of nonzero Fourier coefficients for f is bounded from above by O(nk(q−1)k). Moreover, if f can be written as the sum of m=o(nk) functions that each depend on at most k positions, the count of nonzero Fourier coefficients for f is bounded from above by O(mqk).
3 Hamming Regions in the q-ary Hypercube
Randomized local search and many evolutionary algorithms employ elementary moves or mutation operators to locally sample the set of states by performing small perturbations on current states to construct similar neighboring states. For a given current state x, the neighborhood of x is defined as N(x), which is the set of states reachable from x by a single move or mutation.
Thus, for any neighborhood defined by a fixed set S of group operations, the Fourier basis functions are eigenfunctions of the adjacency that arises from that neighborhood.
3.1 The Hamming Adjacency
3.2 Generalized Adjacencies: Hamming Regions
We define the Hamming sphere of radius r around a state as the set of states , and the Hamming ball of radius r around a state as the set of states . In these cases, we refer to x as the centroid of the sphere (ball). A set is called a Hamming region if it is a Hamming sphere (ball) for some centroid and radius.
For a given , there are exactly ways to choose j indices ik where and ways to choose the remaining r−j indices where . Summing over j and appealing to Equation (12) yields the claimed result.
Theorem 2 covers the special case of Equation (13) when r=1. The set of eigenvalues corresponds to the generalized Krawtchouk polynomials. These polynomials have applications in the theory of error correcting codes (MacWilliams and Sloane, 1977) in which generalized Hamming schemes are important.
4 Moments of Epistatically Bounded Fitness Functions
From an analytical perspective, the distribution of fitness function codomain values over a Hamming region in the q-ary hypercube provides insight into how an evolutionary algorithm might behave in that region of the search space. Suppose is a Hamming region. Consider the random process of drawing a state uniformly at random from R. The resulting state can be characterized as a random variable over R, which we denote as XR. The fitness of this uniformly drawn state is measured by the random variable f(XR).
An exact computation of this distribution is not tractable in general, indeed the special case of q=2 is already NP-hard (Sutton et al., 2011a, Theorem 1). Clearly, we would expect it to be as hard as the corresponding search problem when R is a Hamming ball of radius . Thus, instead of a complete characterization, we are interested in efficiently computing exact quantities that describe the shape, location, and spread of the distribution of f(XR). Such quantities are called the moments of f(XR), which are equivalent to the moments of the frequency distribution of f over R. We now apply the foregoing results to describe how one can efficiently compute the first such moment, a quantity that captures the expectation of f(XR). This corresponds to the average codomain value of f over the region R.
We can also prove the analogous result for an arbitrary Hamming ball.
4.1 The Average Fitness in a Hamming Region
Let R be a Hamming region (sphere or ball) of radius r for some and some . The average of f over R is equivalent to the expectation of the random variable f(XR) which measures the fitness of a state XR drawn uniformly at random from R.
For asymptotically large Hamming regions, that is, , the average fitness in R is intractable to compute directly. However, when f is epistatically bounded, we can apply the analysis presented in this paper to derive a compact expression to efficiently compute the exact average. Let be a ball of radius r around an arbitrary state x (the case is analogous for Hamming spheres).
The sum accumulated in s corresponds to the sum in Equation (21) and contains exactly |Z| terms. Since we have supposed f to be epistatically bounded by k, Corollary 1 holds and provides a bound on the cardinality of Z. Since the eigenvalues can be precomputed, the time complexity of FirstMoment is dominated by the loop through the Fourier coefficients and is hence bounded above by O(nk(q−1)k). In the case that f can be written as a sum over m=o(nk) functions that each depend on at most k positions, the bound can be improved to O(mqk).
In the case of a sphere of radius r, the same algorithm can be used changing the factor by , and changing the factor |b| by |S|.
4.2 Higher Moments
It is easy to show that if f has a sparse representation in the Fourier basis and c=O(1), so must f c.
The time complexity of Moment is dominated by the call to FirstMoment for function f c. Since f is epistatically bounded by k, it follows that f c is epistatically bounded by and the runtime is bounded by O(nck(q−1)ck). Again, if f can be expressed as a linear combination of m=o(nk) terms that depend on at most k positions, the bound is improved to O(mcqck). Since we have assumed c and k to be constants, the complexity is polynomial in n. However, we remark here that it is of course practically constrained to modest values of c.
5 Expected Fitness After a Mutation
The mutation operator has capital importance in evolutionary algorithms. Let us consider the following mutation operator in the q-ary hypercube. For each element x[i] in the solution vector, change its value with probability . Here, is typically called the mutation rate. The change can be made to any of the q−1 possible values that are different from x[i] with the same probability 1/(q−1). The mutation rate parameter is a quantity that describes how much an individual is perturbed to generate an offspring during search. One is often interested in deriving the mutation rate that maximizes the probability of generating a successful offspring, that is, one with strictly improving fitness. Such a quantity has an immediate use in runtime analysis, since it describes the success probability of a geometric random variable that measures the time until an improving move is taken.
The optimal mutation rate for general functions is currently not well-understood. For very simple linear OneMax functions over binary strings, the mutation rate was investigated by Bäck (1992). In practice, the commonly recommended mutation rate for functions over binary strings of length n is the so-called standard mutation rate of 1/n. In the case of linear functions over binary strings, Witt (2012) has proved that the optimal mutation rate is indeed 1/n, and even derived a bound of enln n+O(n) on the corresponding runtime of a simple evolutionary algorithm at this rate, which is tight up to lower-order terms.
The case of nonlinear functions is less clear. Jansen and Wegener (2000) introduce a function over binary strings on which the standard rate of 1/n is provably suboptimal because it takes an exponential number of steps to find a global optimum with high probability. Similarly, Böttcher et al. (2010) proved the standard 1/n mutation rate is suboptimal for the classical LeadingOnes function. They also introduce an adaptive mutation rate that depends on the current fitness and improves the optimization time by 12% over a nonadaptive rate. In the current paper, we find that the expectation-best mutation rate is a function of current solution.
Doerr et al. (2012) consider the class of strictly monotonic functions over binary strings and show that, for mutation rates of c/n, the choice of constant c has a strong effect on the optimization time. They present a particular monotonic function for which changing the mutation rate of the (1+1)-EA by a constant factor leads to an exponential performance gap. These results stress the importance of an understanding of the relationship between the mutation rate and the function being searched, and that the issue of how best to set the mutation rate for general optimization is far from resolved.
For epistatically bounded nonlinear functions over binary strings, Chicano and Alba (2011) showed that the expected fitness of the offspring is a polynomial in the mutation rate. Sutton et al. (2011b) also investigated some properties of this polynomial. In this section we extend these results to the q-ary hypercube.
Let be a random variable that takes the value in G after applying the mutation operator (with mutation rate ) to x. Then, is the probability of obtaining y after the mutation operator is applied to x. We are interested in computing the expected value of a function f over G after a mutation, that is, . The following theorem provides a first result.
Second, the expected value of the fitness function can have little to do with the real probability of finding an improving solution. As an illustration, if , then the expected value is exactly f(x), because there is no mutation. If the landscape is elementary and f(x)>z0, this is the maximum possible expected value. However, it is clear that the probability of finding an improving solution is zero if .
Third, mutation is a local operation. The mutation rate selected is just used in a single application of the mutation operator during one iteration of an evolutionary algorithm. Thus, this expectation-optimal mutation rate can ensure some kind of optimality in only one step of the algorithm. In this sense, the use of the expectation-optimal mutation rate is a greedy approach. It is well-known that for some optimization problems, a greedy approach can sometimes yield poor results.
However, Sutton et al. (2011a) show how a fitness distribution can be approximated using a small number of moments. The distribution is obtained by multiplying the vector of moments by the inverse and transposed Vandermonde matrix constructed from the set of codomain values of the fitness function.
The result is a vector of polynomials that each approximate the probability of the offspring attaining a particular fitness value. Thus, the probability that an offspring has higher fitness than a state x is given by the sum over the set of polynomials that correspond to fitness values strictly greater than f(x). Since the sum of polynomials is again a polynomial, this approximate probability can be expressed as a polynomial in . One can then solve for the value of that gives the maximal value in the interval [0, 1].
In practice, this numerically derived value could be used by algorithm designers to determine good parameter settings for the mutation rate. Moreover, this idea could potentially be applied to a more theoretical result. Specifically, if the above derivations could be relaxed in such a way as to allow for good bounds on the improvement probability, it would be possible to establish runtime results for simple genetic algorithms (without crossover) parameterized by . We defer these considerations to future work.
6 Results on Graph Coloring
The graph coloring problem is a classical combinatorial optimization problem to find a conflict-minimal q-coloring of a graph. The fitness of a coloring, given by the number of conflicts it induces, is epistatically bounded since it can be expressed as a function that depends only on second-order interactions. In this section, we apply to graph coloring the results presented in the foregoing sections.
Let be a graph with |V|=n. A q-coloring of is an assignment . The set of all q-colorings of is isomorphic to . The objective of the graph coloring problem is to find a coloring x that minimizes the count of edges incident to vertices that have the same color in x.
We are now ready to prove the following.
We now make a case distinction for a.
- |a|=1. Suppose that is the single nonzero component of a. Considering the interior summations of Equation (30), for each , either u=w, v=w, or . If u=w or v=w, then the term a qth root of unity and the sum vanishes by Equation (2). On the other hand, if , then and the term 7). Since these are the only possibilities for |a|=1, it must be the case that za=0 when |a|=1.
- |a|=2. Let and be the two only nonzero components of a. If (w, t)∉E, then clearly Equation (30) evaluates to zero. On the other hand, if , then we have , by Equation (7) we have , then is a qth root of unity and . Otherwise, and the sum evaluates to q. Hence, if |a|=2, then if and only if the nonzero components of a correspond to an edge in E and their sum is equivalent to zero modulo q, otherwise za=0. Put another way, if |a|=2, then za is nonzero (and equal to ) if and only if a is a balanced mask of an edge .
These cases cover all and yield the claimed result.
Thus, the graph coloring fitness function that counts constraint violations in a q-coloring corresponding to has a sparse representation in the Fourier basis: its Fourier coefficients are nonzero for which have exactly two nonzero entries corresponding to balanced masks of edges in .
6.1 The Average Number of Constraint Violations Within r Steps
We now apply the above results to compute the average number of constraint violations that occur within a sphere or ball of radius r centered on an arbitrary coloring. In doing so, we generalize a well-known result of Grover (1992) that relates the average number of constraint violations in the single-change neighborhood with the number of violations in the current coloring.
6.2 Expectation-Optimal Mutation Rate
In Section 5 we provided closed-form formulas for the expected value after mutation of an arbitrary fitness function f. In this section we particularize the expression to the case of the graph coloring problem and we discuss the result from a practical point of view. We start with the main result of this section.
We are interested in studying the expected fitness under mutation with respect to the mutation rate . Equation (33) is a quadratic equation in . Furthermore, the vertex of the parabola is in . In Figure 2 we plot against .
The plot in Figure 2 considers the case in which f(x)>|E|/q. If f(x)<|E|/q, then the plot is inverted along the X axis and the parabola has a maximum at . We can summarize the previous considerations in the following rules:
If f(x)>|E|/q, then the maximum expected value of the fitness function after mutation is reached when , that is, when no mutation is applied.
If f(x)<|E|/q, then the maximum expected value of the fitness function after mutation is reached when , which is equivalent to picking a random solution from the search space.
If f(x)=|E|/q, then the expected mutation is always |E|/q, independent of the value of .
This analysis of the expected value of the fitness function for the graph coloring problem provides some information on the mutation process but does not give interesting values for the mutation rate. The reason is that the graph coloring problem is what we call an elementary landscape, and the expected value of an elementary landscape after a mutation operation has always the maximum and minimum values in the extremes ( and ) and/or a trivial value inside the interval ( in our case). More interesting values can be obtained when the function is not an elementary landscape (e.g., see Chicano and Alba, 2011). This happens in the case of the frequency assignment problem, which we now explore.
7 Results on Frequency Assignment
The frequency assignment problem is the last step in the layout of a radio network, like a 2G (second generation) cellular mobile network. Frequency assignment entails the assignment of a channel (a frequency band) to every transceiver (TRX) in the radio network. The optimization problem arises because the usable radio spectrum is generally very scarce and, consequently, channels have to be reused by many TRXs in the network (Eisenblätter, 2001).
The multiple use of the same channel may cause interference that might reduce the quality of service down to unsatisfactory levels. Indeed, significant interference may occur if the same or adjacent channels are used in neighboring overlapping cells. Two matrices are often used to measure the interferences. In the so-called interference matrix, denoted by M, each element mij indicates the degradation of the network quality if TRXs i and j operate on the same channel. This is called co-channel interference. In addition to co-channel interference, there also may exist a so-called adjacent-channel interference, which occurs when two TRXs operate on adjacent channels (i.e., one TRX operates on channel p and the other on channel p+1 or p−1). Co-channel and adjacent-channel interference are the most important ones in the design of a radio network. But we could also be interested in considering interference due to the overlapping of channels with a larger separation. This is in accordance with real-world applications, since the amount of interference between two channels depends on the separation of the channels (Aardal et al., 2003).
Thus, in our generalized form of the frequency assignment problem, we consider both co-channel interference and adjacent channel interference as well as interferences due to frequencies with a larger separation.
We can assign a cost to each possible interference that can occur in a channel assignment. Then, the objective is to minimize the cost due to interference in a radio network. An additional generalization of this problem also considers the possibility of additional costs due to the mere fact that a given channel is used by a given TRX, for example, a fee could be charged to a telecommunication company for using a channel in a given location. In general, the set of channels that can be assigned to each TRX might be different. We assume that the valid channels of each TRX are sorted and we use an integer number to represent their position in the sorted set. We also assume, without loss of generality, that the number of valid channels is the same, q, in all the TRXs. We denote with n the number of TRXs in the radio network.
The cost element wr,pv,u has the same meaning as wp,ru,v, so we can set one of them to zero. However, for the sake of clarity and without loss of generality, we will take the convention that wp,ru,v=wr,pv,u for all u, v, p, r. Then, the cost element wp,ru,v with must be interpreted as half the cost of having channel p in TRX u and channel r in TRX v. If u=v, then the element wp,pu,u is the additional cost of having channel p in TRX u. The values wp,ru,u with are zero.
Using this lemma we can prove the following theorem.
- Case 1: |a|=0. According to Lemma 7, zu,v,p,ra=1/q2, and Equation (41) can be simplified to:
- Case 2: |a|=1. Let us denote with a[t] the nonzero entry of a. The coefficient zu,v,p,ra is nonzero if t=u or t=v. Then, we can simplify Equation (41) to
- Case 3: |a|=2. Let us denote with a[u] and a[v] the only two entries in a that are nonzero. Then, only the coefficients zu,v,p,ra with any arbitrary value for p and r are nonzero. Equation (41) can be simplified to:
- Case 4:
|a|>2. In this case zu,v,p,ra=0 and, as a consequence, za=0.
These cases cover all the possible values for and prove the statement.
As in the case of the graph coloring problem, we can observe that the frequency assignment problem also has bounded epistasis and thus the fitness function has a sparse representation in the Fourier basis. In this case, the Fourier coefficients are nonzero only for the elements with nonzero values in two or fewer entries.
7.1 The Average Fitness in Hamming Regions
As in the case of the graph coloring problem, the higher moments of the frequency distribution of the values of the fitness function could easily be computed using the method presented in Section 4.2.
7.2 Expectation-Optimal Mutation Rate
Let us now particularize the results of Section 5 to the case of the frequency assignment problem. The following theorem presents the main result.
According to the previous theorem, the expected value of the fitness function after a mutation is a quadratic equation with respect to the mutation rate . However, the analysis of this quadratic equation provides more information than in the case of the graph coloring problem. The reason is that the frequency assignment is not an elementary landscape and the behavior of the elementary components f(x) and f(x) is different, in general, from the behavior of f(x).
We are interested in finding the expectation-optimal mutation rate for a given solution x. Let us consider two cases:
Casef(x)=0: In this case, Equation (49) is a linear equation in and the expectation-optimal mutation rate must be in one of the extremes of the interval: or . In particular, since we are minimizing, the expectation-optimal mutation rate will be if f(x)>0 and if f(x)<0. In the case where f(x)=0, the mutation rate is irrelevant, and the expected value of the fitness function is f.
- Case: By deriving Equation (49) with respect to , we can find the vertex of the parabola, which is: is the expectation-optimal mutation rate. Since f is the expected value of the fitness function when a random solution is selected from the search space, Equation (53) reveals that is better than a simple random selection if f(x)>0. If f(x)<0, then a random selection of the solution provides a lower expected value for the fitness function. Finally, if is not in the interval [0, 1], then the expectation-optimal mutation rate must be in the extremes: or .
In this work we have provided closed-form formulas and algorithms for computing the exact moments of the frequency distribution for functions with bounded epistasis defined over length-n strings of a finite alphabet. The results presented in this paper are relevant in the context of combinatorial optimization problems in which candidate solutions are represented by q-ary strings.
In addition to the moments, we also derived an exact expression for the expected value of the fitness function after the application of a mutation operator. Using this expression it is possible to compute the mutation rate for which this expected value is optimum. These derivations are mainly of theoretical interest, and we present them as a rigorous step toward a better understanding of the statistics of combinatorial search spaces. However, we have also briefly discussed potential practical applications of this work. In particular, we presented a connection to previous work on approximating fitness distributions using a linear program that could in principle allow one to estimate the mutation rate that maximizes the probability of producing an offspring that has improving fitness. This result could be used for finding good parameter values during search, but it might also be useful for finding lower bounds on the probability of improvement, and therefore an upper bound for the runtime of a simple evolutionary algorithm. We leave this as an open research question.
Finally, we applied the derivations to the graph coloring problem, a well-known NP-hard problem, and the frequency assignment problem, another NP-hard problem with relevance in the telecommunication industry. For both problems, we presented exact closed-form formulas for the average value of the fitness function in Hamming regions around any state. We also derived, in both cases, the exact formula for the expected fitness of an offspring produced by uniform mutation at a specific rate and discussed the results obtained.
This work was partially funded by the Spanish Ministry of Science and Innovation and FEDER under contract TIN2008-06491-C04-01 (the M project), by the Andalusian Government under contract P07-TIC-03044 (DIRICOM project), and by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant number FA9550-08-1-0422. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
The authors would also like to thank the organizers and participants of the seminar on Theory of Evolutionary Algorithms (10361) at Schloß Dagstuhl - Leibniz-Zentrum für Informatik.
The direct product of groups is defined as the Cartesian product of the groups under elementwise group operation. In other words, given two groups and , an element is given by .
The kernel of a function is defined as