## Abstract

Many combinatorial optimization problems have underlying goal functions that are submodular. The classical goal is to find a good solution for a given submodular function *f* under a given set of constraints. In this paper, we investigate the runtime of a simple single-objective evolutionary algorithm called (1+1) EA and a multiobjective evolutionary algorithm called GSEMO until they have obtained a good approximation for submodular functions. For the case of monotone submodular functions and uniform cardinality constraints, we show that the GSEMO achieves a (1 − 1/e)-approximation in expected polynomial time. For the case of monotone functions where the constraints are given by the intersection of k ≥ 2 matroids, we show that the (1+1) EA achieves a (1/(k + δ))-approximation in expected polynomial time for any constant δ > 0. Turning to nonmonotone symmetric submodular functions with k ≥ 1 matroid intersection constraints, we show that the GSEMO achieves a 1/((k + 2)(1 + ε))-approximation in expected time O(n^{k+6} log(n)/ε).

## 1 Introduction

Evolutionary algorithms can efficiently find the minima of convex functions. While this is known and well studied in the continuous domain, it is not obvious what an equivalent statement for discrete optimization looks like. Let us recall that a differentiable function f: ℝ → ℝ is called *convex* if its derivative is nondecreasing in *x*. The bit-string analogue of this is a fitness function f: {0, 1}ⁿ → ℝ whose discrete derivative Δᵢf(x) := f(x + eᵢ) − f(x) is nonincreasing in *x* for all *i* with xᵢ = 0, with *e_{i}* being the *i*th unit vector. A discrete function satisfying the aforementioned condition of nonincreasing marginal gains is called *submodular*. Submodularity is the counterpart of convexity in discrete settings (Lovász, 1983).

For understanding the properties of continuous optimizers it is central to study their performance for minimizing convex functions. This has been done in detail for continuous evolutionary algorithms (Beyer and Schwefel, 2002; Hansen, 2006). On the other hand, there is apparently very little prior work on the performance of discrete evolutionary algorithms for optimizing submodular functions. The only reference we are aware of is by Rudolph (1996, sec. 5.1.2.3). He proves that there are submodular functions for which the (1+1) EA requires exponential runtime (see also Bäck et al., 1997, sec. B2.4.2.5). We fill this gap and present several approximation results for simple evolutionary algorithms and submodular functions.

Analogous to the situation for convex functions, there is a significant difference between minimization and maximization of submodular functions. Submodular functions can be *minimized* with a (nontrivial) combinatorial algorithm in polynomial time (Iwata et al., 2001). On the other hand, submodular function *maximization* is NP-hard, as it generalizes many NP-hard combinatorial optimization problems, like maximum cut (Goemans and Williamson, 1995; Feige and Goemans, 1995), maximum directed cut (Halperin and Zwick, 2001), maximum facility location (Ageev and Sviridenko, 1999; Cornuejols et al., 1977), and several restricted satisfiability problems (Håstad, 2001; Feige and Goemans, 1995). As evolutionary algorithms are especially useful for hard problems, we focus on the maximization of submodular functions. Note that in general, submodular functions also cannot be maximized *approximately* to within better than a constant factor unless P = NP (Feige, 1998).

More formally, we consider the optimization problem max{f(S) : S ∈ ℐ}, where *X* is an arbitrary ground set, f: 2^X → ℝ is a fitness function, and ℐ ⊆ 2^X is a collection of independent sets describing the feasible region of the problem. As usual, we assume *value oracle access* to the fitness function, i.e., for a given set *S*, an algorithm can query an oracle to find its value f(S). We also always assume that the fitness function is normalized, i.e., f(∅) = 0, and non-negative, i.e., f(S) ≥ 0 for all S ⊆ X. We study the following variants of *f* and ℐ:

- *Submodular functions.* A function *f* is submodular iff f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B) for all A, B ⊆ X.
- *Monotone functions.* A function *f* is monotone iff f(A) ≤ f(B) for all A ⊆ B ⊆ X.
- *Symmetric functions.* A function *f* is symmetric iff f(S) = f(X \ S) for all S ⊆ X.
- *Matroid.* A matroid is a pair M = (X, ℐ) composed of a ground set *X* and a nonempty collection ℐ of subsets of *X* satisfying (1) if A ∈ ℐ and B ⊆ A then B ∈ ℐ and (2) if A, B ∈ ℐ and |A| > |B| then B + x ∈ ℐ for some x ∈ A \ B. The sets in ℐ are called *independent*; the *rank* of a matroid is the size of any maximal independent set.
- *Uniform matroid.* A uniform matroid of rank *k* contains all subsets of size at most *k*, i.e., ℐ = {S ⊆ X : |S| ≤ k}.
- *Partition matroid.* A partition matroid is a matroid formed from a direct sum of uniform matroids. If the universe *X* is partitioned into *k* parts X₁, …, X_k and we have integers d_i with 0 ≤ d_i ≤ |X_i|, then in a partition matroid a set *I* is independent if it contains at most d_i elements from each X_i, i.e., |I ∩ X_i| ≤ d_i for all *i*. (Note that in the literature and in the conference version by Friedrich and Neumann (2014), it is assumed that d_i = 1 for all *i*.)
- *Intersection of k matroids.* Given *k* matroids M₁ = (X, ℐ₁), …, M_k = (X, ℐ_k) on the same ground set *X*, the intersection of these matroids is the independence system (X, ℐ) with ℐ = ℐ₁ ∩ ⋯ ∩ ℐ_k. A simple example for k = 2 is the family of matchings in a bipartite graph; or in general the family of hypergraph matchings in a *k*-partite hypergraph.
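The set-function definitions above can be made concrete with a short brute-force Python sketch (our own illustration, not part of the original formalization; the function names, tolerance, and the tiny coverage instance are assumptions):

```python
from itertools import combinations

def powerset(ground):
    """All subsets of the ground set as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_submodular(f, ground):
    """Brute-force check of f(A | B) + f(A & B) <= f(A) + f(B) for all A, B."""
    subsets = powerset(ground)
    return all(f(A | B) + f(A & B) <= f(A) + f(B) + 1e-9
               for A in subsets for B in subsets)

def uniform_matroid(k):
    """Independence predicate of the uniform matroid of rank k."""
    return lambda S: len(S) <= k

def partition_matroid(parts, d):
    """Independence predicate: at most d[i] elements from part parts[i]."""
    return lambda S: all(len(S & parts[i]) <= d[i] for i in range(len(parts)))

# Coverage function over ground set {1, 2, 3} (monotone submodular):
A = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: len(set().union(*(A[i] for i in S))) if S else 0
```

The intersection of matroid constraints is obtained simply by conjoining the independence predicates, mirroring the definition ℐ = ℐ₁ ∩ ⋯ ∩ ℐ_k.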

Maximizing submodular functions is not only NP-hard but also NP-hard to approximate. We therefore also have to formalize the notion of an approximation algorithm. We say an algorithm achieves an α-approximation if, for all instances of the considered maximization problem, the output returned by the algorithm is at least α times the optimal value. In the context of evolutionary algorithms, we are interested in the expected time (usually measured by the number of fitness evaluations) until an evolutionary algorithm has achieved an α-approximation.

We study the well-known (1+1) EA (Droste et al., 2002) as well as a multiobjective approach for optimizing submodular functions. Optimizing single objective optimization problems by multiobjective approaches such as the global simple evolutionary multiobjective optimizer (GSEMO) has already been shown to be beneficial for many single-objective optimization problems (Knowles et al., 2001; Jensen, 2004; Neumann and Wegener, 2006; Handl et al., 2008; Friedrich et al., 2010; Kratsch and Neumann, 2013). In this article, we prove the following statements.

- Based on the seminal work of Nemhauser et al. (1978), we show that the GSEMO achieves in polynomial time a (1 − 1/e)-approximation for maximizing *monotone submodular* functions under a *uniform matroid constraint* (Theorem 2). This approximation factor is optimal in the general setting (Nemhauser and Wolsey, 1978), and it is optimal even for the special case of Max-*r*-Cover, unless P = NP (Feige, 1998). Furthermore, we show that there are local optima for the (1+1) EA that require exponential time to achieve an approximation better than 1/2 + ε, for ε > 0 a constant (Theorem 1).
- Based on work of Lee et al. (2010) and using the idea of *p*-exchanges, we show that the (1+1) EA achieves a (1/(k + 1/p + ε))-approximation for any *monotone submodular* function *f* under *k* matroid constraints in expected time polynomial in 1/ε and *n*, where p ≥ 1 is an integer and ε > 0 is a real value (Theorem 6).
- Based on the work of Lee et al. (2009), we show that the GSEMO achieves in expected time O(n^{k+6} log(n)/ε) a 1/((k + 2)(1 + ε))-approximation for maximizing *symmetric submodular* functions over *k* matroid constraints, where ε > 0 is a real value (Theorem 9). Furthermore, we explore the idea of *p*-exchanges and show that the GSEMO obtains (for integers p ≥ 1, k ≥ 2, and real ε > 0) a 1/((k + 1 + 1/p)(1 + ε))-approximation in expected time O(n^{2p(k+1)+5} log(n)/ε) (Theorem 11). Note that these results even hold for *nonmonotone* functions.

Friedrich and Neumann (2014) only studied the GSEMO. This article extends that work by providing lower and upper bounds for the (1+1) EA (Section 3.1 and Section 4) as well as using the idea of *p*-exchanges to prove improved bounds for the GSEMO and the case of symmetric submodular functions in Section 5.

The paper is organized as follows. In Section 2 we describe the setting for submodular functions and introduce the algorithms that are the subject of our investigations. We analyze the algorithms on monotone submodular functions with a uniform constraint in Section 3 and present results for monotone submodular functions under *k* matroid constraints in Section 4. In Section 5 we consider the case of symmetric (but not necessarily monotone) submodular functions under *k* matroid constraints. Finally, we discuss open problems in Section 6.

## 2 Preliminaries

### 2.1 Submodular Functions and Matroids

When optimizing a submodular function f: 2^X → ℝ, we often consider the incremental value of adding a single element. For this, we denote by f_A(i) := f(A ∪ {i}) − f(A) the marginal value of *i* with respect to *A*. Nemhauser et al. (1978, proposition 2.1) give seven equivalent definitions for submodular functions. In addition to the definition stated in the introduction, we also use that a function *f* is submodular iff f_A(i) ≥ f_B(i) for all A ⊆ B ⊆ X and i ∈ X \ B.

Many common pseudo-Boolean and combinatorial fitness functions are submodular. As we are not aware of any general results for the optimization of submodular functions by evolutionary algorithms, we list a few examples of well-known submodular functions:

- *Linear functions.* All linear functions f(S) = Σ_{i∈S} w_i for some weights w_i ∈ ℝ are submodular. If w_i ≥ 0 for all *i*, then *f* is also monotone. Note that *f* is both submodular and supermodular and therefore modular.
- *Cut.* Given a graph G = (V, E) with non-negative edge weights w: E → ℝ₊. Let S ⊆ V be a subset of vertices and δ(S) be the set of all edges that contain a vertex in *S* and a vertex in V \ S. The cut function f(S) = Σ_{e∈δ(S)} w(e) is symmetric and submodular but not monotone.
- *Coverage.* Let the ground set be X = {1, …, n}. Given a universe *U* with *n* subsets A_i ⊆ U for i ∈ X, and a non-negative weight function w: U → ℝ₊. The coverage function f(S) = |⋃_{i∈S} A_i| and the weighted coverage function f(S) = Σ_{u ∈ ⋃_{i∈S} A_i} w(u) are monotone submodular.
- *Rank of a matroid.* The rank function r(S) = max{|I| : I ⊆ S, I ∈ ℐ} of a matroid M = (X, ℐ) is monotone submodular.
- *Hypervolume indicator.* Given a set of points in ℝ^d in the objective space of a multiobjective optimization problem, measure the volume of the space dominated by these points relative to some fixed reference point. The hypervolume is a well-known quality measure in evolutionary multiobjective optimization and is known to be monotone submodular (Ulrich and Thiele, 2012).
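As a sanity check on the cut example, the following sketch (ours; the weighted triangle graph is an arbitrary choice) evaluates the cut function of a small graph and exhibits its symmetry and nonmonotonicity:

```python
def cut_value(edges, S):
    """Total weight of edges with exactly one endpoint in S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

# Weighted triangle on vertices {0, 1, 2}.
V = {0, 1, 2}
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
f = lambda S: cut_value(edges, S)

# Symmetric: f(S) == f(V \ S) for every S.
# Nonmonotone: f(V) = 0 < f({0}) although {0} is a subset of V.
```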

We defined the most important matroids in the introduction. Matroid theory provides a framework in which many problems from combinatorial optimization can be studied from a unified perspective. Matroids are a special class of so-called *independence systems* that are given by a finite set *X* and a family of subsets ℐ ⊆ 2^X such that ℐ is closed under subsets. Being a matroid is considered to be the property of an independence system that makes greedy algorithms work well. Within evolutionary computation, linear functions under matroid constraints have been considered by Reichel and Skutella (2010).
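The remark that matroids are exactly the structures on which the greedy rule succeeds can be illustrated as follows (a sketch under our own naming; for a matroid independence predicate, the greedy scan returns a maximum-weight independent set):

```python
def greedy_max_weight(elements, weight, independent):
    """Scan elements by nonincreasing weight; keep an element whenever the
    current set plus that element stays independent. This is optimal
    precisely when the independence system is a matroid."""
    S = set()
    for e in sorted(elements, key=weight, reverse=True):
        if independent(S | {e}):
            S.add(e)
    return S

# Uniform matroid of rank 2 over four weighted elements.
w = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 1.0}
best = greedy_max_weight(w, w.get, lambda S: len(S) <= 2)
```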

Throughout this article, we identify a subset S ⊆ X of the ground set X = {x₁, …, x_n} with a bit string x ∈ {0, 1}ⁿ, where the *i*th bit of *x* is 1 iff x_i ∈ S. Let f: 2^X → ℝ₊ be the given submodular function and 𝓕 ⊆ 2^X be the set of feasible solutions. Note that *f* is defined on every element of 2^X. The constraints determining feasibility are given by *k* matroids, where *k* is a parameter. Given *k* arbitrary matroids M₁, …, M_k defined on a ground set *X* together with their independence systems ℐ₁, …, ℐ_k, we consider the problem max{f(S) : S ∈ ⋂_{j=1}^k ℐ_j}, where *f* is a submodular function defined on the ground set *X*.

Intersections of matroids occur in many settings like edge connectivity (Gabow, 1995), constrained minimum spanning trees (Hassin and Levin, 2004), and degree-bounded minimum spanning trees (Zenklusen, 2012).

A prominent example of the intersection of two matroids is the *maximum weight matching problem in bipartite graphs*: Given a bipartite graph G = (V₁ ∪ V₂, E) with bipartition (V₁, V₂), let M₁ = (E, ℐ₁) and M₂ = (E, ℐ₂) be two partition matroids on *E* with ℐ_j = {I ⊆ E : |I ∩ δ(v)| ≤ 1 for all v ∈ V_j}, where δ(v) is the set of edges incident to *v*. Then it is easy to see that I ∈ ℐ₁ ∩ ℐ₂ if and only if *I* induces a matching in *G*.

*Colorful spanning trees* are an example for intersecting different kinds of matroids. Let G = (V, E) with the edges in *E* colored with *k* colors, that is, E = E₁ ∪ ⋯ ∪ E_k. Assume we are given integers d₁, …, d_k and aim at finding a spanning tree *T* of *G* that has at most *d _{i}* edges of color *i*, i.e., |T ∩ E_i| ≤ d_i for all *i*. Then this can be phrased as a matroid intersection problem, as it is the combination of a spanning tree (graphic) matroid and a partition matroid.
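The bipartite matching example above — an edge set is a matching iff it is independent in both per-side partition matroids — can be sketched directly; the encoding of edges as (left, right) pairs is our assumption:

```python
def degree_constraint(side):
    """Partition matroid on edges: at most one edge per vertex on one side.
    side = 0 constrains left endpoints, side = 1 constrains right endpoints."""
    def independent(I):
        seen = set()
        for e in I:
            if e[side] in seen:
                return False
            seen.add(e[side])
        return True
    return independent

left_ok, right_ok = degree_constraint(0), degree_constraint(1)

# Independent in the intersection of both partition matroids <=> a matching.
is_matching = lambda I: left_ok(I) and right_ok(I)
```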

### 2.2 Algorithms

The (1+1) EA starts with a bit string x ∈ {0, 1}ⁿ chosen uniformly at random and repeatedly produces an offspring *y* by flipping each bit of the current solution *x* independently with probability 1/n; the offspring replaces *x* if its fitness is at least as high. Generalizing the fitness function used by Reichel and Skutella (2010) for the intersection of two matroids, we consider for problems with *k* matroid constraints a fitness function that primarily penalizes the total constraint violation v(x) := Σ_{j=1}^k (|x|₁ − r_j(x)) and only then rewards large *f*-values, where r_j(x) denotes the rank of *x* in matroid M_j, i.e., r_j(x) = max{|I| : I ⊆ S, I ∈ ℐ_j} for the set S ⊆ X given by *x*. A solution *x* is feasible iff v(x) = 0; any feasible solution is preferred to any infeasible one, and among infeasible solutions, smaller violation v(x) is preferred.

The GSEMO treats the problem as a multiobjective one and assigns to each solution *x* the objective vector (z(x), |x|₀), where z(x) is the penalized objective value defined above and |x|₀ denotes the number of 0-bits of *x*; both objectives are to be maximized. We write y ⪯ x iff z(y) ≤ z(x) and |y|₀ ≤ |x|₀ holds. If y ⪯ x holds, we say that *y* is dominated by *x*. The solution *y* is strictly dominated by solution *x* iff y ⪯ x and the two objective vectors differ. A solution *x* that is not strictly dominated by any other solution is called Pareto optimal, and the corresponding objective vector is called Pareto optimal as well. The Pareto set of a given multiobjective problem consists of all Pareto optimal solutions, and the Pareto front consists of all Pareto optimal objective vectors. The GSEMO maintains a population *P* of mutually nondominated solutions; in each iteration it selects a solution from *P* uniformly at random, flips each bit independently with probability 1/n, and adds the offspring to *P* if it is not strictly dominated by any solution in *P*, removing all solutions that the offspring dominates. In our studies, we focus on the best feasible solution in the population of the GSEMO and study the quality of this solution.

We study the expected number of iterations (of the repeat loop) of the (1+1) EA and the GSEMO until their best feasible solution *x* is for the first time an α-approximation of an optimal feasible solution x\*, i.e., f(x) ≥ α · f(x\*) holds. Here α denotes the investigated approximation ratio for the considered problem. We call the expected number of iterations to reach an α-approximation the expected (run)time to achieve an α-approximation.
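The GSEMO described above can be sketched compactly in Python (our simplification: the crude penalty for infeasible solutions and the toy objective are assumptions; the selection/mutation/archiving loop follows the standard formulation):

```python
import random

def gsemo(f, n, feasible, steps, seed=1):
    """GSEMO sketch maximizing the vector (penalized f(x), number of 0-bits).
    Keeps a population of mutually nondominated bit strings."""
    rng = random.Random(seed)

    def objectives(x):
        val = f(x) if feasible(x) else -sum(x)  # crude penalty for infeasibility
        return (val, n - sum(x))

    def dominates(a, b):  # weak domination of objective vectors
        return a[0] >= b[0] and a[1] >= b[1]

    x = tuple(rng.randint(0, 1) for _ in range(n))
    pop = {objectives(x): x}
    for _ in range(steps):
        parent = rng.choice(list(pop.values()))
        # standard bit mutation: flip each bit independently with prob. 1/n
        child = tuple(b ^ (rng.random() < 1.0 / n) for b in parent)
        oc = objectives(child)
        if not any(dominates(o, oc) for o in pop if o != oc):
            pop = {o: s for o, s in pop.items() if not dominates(oc, o)}
            pop[oc] = child
    return pop

# Toy run: f(x) = number of 1-bits under a uniform constraint of size 2.
pop = gsemo(sum, 5, lambda x: sum(x) <= 2, steps=2000)
```

Since the population stores at most one solution per objective vector, its size is bounded as used in the proofs below.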

## 3 Monotone Submodular Functions with a Uniform Constraint

In this section, we investigate submodular functions with one uniform constraint. In the case of one uniform constraint of size *r*, a solution is feasible if it has at most *r* elements. Hence, we have 𝓕 = {x ∈ {0, 1}ⁿ : |x|₁ ≤ r}.

### 3.1 Lower Bound for the (1+1) EA

We consider the (1+1) EA and show that this approach has to cope with local optima with a large inferior neighborhood. Getting trapped in these local optima, the algorithm finds it hard to achieve an approximation ratio greater than 1/2 + ε, where ε > 0 is a constant.

Based on our previously defined fitness function, a solution is feasible iff it has at most *r* 1-bits, as we are considering problems with one uniform constraint. To show the upper bound on the approximation ratio, we consider an instance of the Max-*r*-Cover problem.

Our instance is obtained from a bipartite graph that has already been investigated in the context of the vertex cover problem (Friedrich et al., 2010). Let B = (V₁ ∪ V₂, E_B) be the complete bipartite graph on V₁ = {v₁, …, v_s} and V₂ = {v_{s+1}, …, v_{s+t}}, where s = |V₁| and t = |V₂|. The ground set is given by the set of edges E_B, and each node is identified with the subset of edges adjacent to *v _{i}*, i.e., E(v_i) = {e ∈ E_B : v_i ∈ e}. Let S₁ and S₂ be the set of subsets corresponding to the nodes of V₁ and V₂, respectively. We consider the (1+1) EA working with bit strings of length n = s + t, where the set E(v_i) is chosen iff x_i = 1, 1 ≤ i ≤ n.

For the constraint, we set r = s = ⌈(1/2 + ε)t⌉ as an upper bound on the number of chosen sets, where ε, 0 < ε < 1/2, is an arbitrarily small positive constant. Furthermore, we require s < t, which holds for sufficiently large *t* as ε < 1/2. This implies that choosing all sets of S₁ is an optimal solution covering the whole ground set E_B; it can be achieved by setting x = 1^s 0^t.

**Theorem 1.** There are monotone submodular functions *f* for which the (1+1) EA under a uniform matroid constraint may end up in bad local optima. More precisely, there is an instance of the Max-*r*-Cover problem such that starting with the solution x₀ = 0^s 1^r 0^{t−r}, which selects exactly *r* sets of S₂, the expected waiting time for the (1+1) EA to achieve an improvement and therefore a solution with an approximation ratio greater than 1/2 + ε is n^{Ω(n)}.

*Proof.* The search point x₀ has *r* chosen elements, and inserting any further element without removing any other element is not accepted, as the resulting solution would be infeasible. Furthermore, removing one or more elements without inserting any new ones covers fewer elements, which is therefore also not accepted. Each selected set of S₂ covers *s* elements that are not covered by any other chosen set, whereas each set of S₁ would gain an additional contribution of at most t − r elements.

In order to have sets of S₁ included and accepted, the loss incurred by removed sets of S₂ has to be compensated: removing ℓ sets of S₂ decreases the fitness by ℓ · s, whereas inserting m ≤ ℓ sets of S₁ (the inequality is needed to keep the solution feasible) gains at most m · (t − r + ℓ) elements. A simple calculation shows that such a step is only accepted if m ≥ 2s − t = Ω(εt) sets of S₁ are inserted and at least as many sets of S₂ are removed in a single mutation. The probability of flipping Ω(εt) = Ω(n) specific bits in one mutation step is n^{−Ω(n)}, which implies the claimed bound on the expected waiting time.

We conjecture that the previous theorem may be generalized to initial solutions chosen uniformly at random by following the analysis of the (1+1) EA for the vertex cover problem on complete bipartite graphs (Friedrich et al., 2010). We do not carry out such a technical analysis, as our purpose in this section is to point out a situation where the (1+1) EA gets stuck in a local optimum of the Max-*r*-Cover problem with an approximation ratio of roughly 1/2.

### 3.2 Upper Bound for the GSEMO

We now turn to the GSEMO and show that this approach does not have to cope with the local optima that prevent the (1+1) EA from achieving an approximation ratio better than 1/2 + ε. The GSEMO has the ability of carrying out local search operations but also allows for a greedy behavior, which is beneficial in this case. The greedy behavior of the GSEMO leads to the following result.

**Theorem 2.** The expected time until the GSEMO has obtained a (1 − 1/e)-approximation for a monotone submodular function *f* under a uniform constraint of size *r* is O(n²(log n + r)).

*Proof.* We first study the expected time until the GSEMO has produced the solution 0ⁿ for the first time. This solution is Pareto optimal and will therefore stay in the population after it has been produced for the first time. Furthermore, the population size is upper bounded by n + 1, as it contains for each *i*, 0 ≤ i ≤ n, at most one solution having exactly *i* 1-bits. The solution 0ⁿ is feasible and has the maximum number of 0-bits. This implies that the population will not include any infeasible solution to the submodular function *f* after having included 0ⁿ.

For this step, we consider in each iteration the individual *y* that has the minimum number of 1-bits among all individuals in the population and denote by |y|₁ the number of 1-bits in this individual. Note that |y|₁ cannot increase during the run of the algorithm. A solution with fewer 1-bits is produced with probability at least (1/(n+1)) · (|y|₁/(en)), as such a solution can be produced by selecting *y* for mutation and flipping exactly one of its 1-bits. The expected waiting time to include the solution 0ⁿ for the first time into the population is therefore upper bounded by Σ_{j=1}^{n} en(n+1)/j = O(n² log n).

Let x\* ∈ {0, 1}ⁿ be an optimal solution and Opt = f(x\*) denote the value of a feasible optimal solution. Note that a solution is feasible iff it has at most *r* 1-bits. We show that the population always contains, for increasing values of *j*, a solution x_j with at most *j* 1-bits such that

f(x_j) ≥ (1 − (1 − 1/r)^j) · Opt.  (1)

After having included the solution 0ⁿ into the population, this is true for j = 0. The proof is done by induction. Assume that the GSEMO has already obtained a solution x_j fulfilling Equation (1) for some *j*, 0 ≤ j < r. We claim that choosing the solution x_j for mutation and inserting the element corresponding to the largest possible increase of *f* increases the value of *f* by at least (Opt − f(x_j))/r. Let δ_j be the increase in *f* that we obtain when choosing the solution x_j for mutation and inserting the element corresponding to the largest possible increase. By monotonicity and submodularity, the total marginal gain of the at most *r* elements of x\* with respect to x_j is at least Opt − f(x_j), so the single best element contributes δ_j ≥ (Opt − f(x_j))/r. The resulting solution x_{j+1} then satisfies

f(x_{j+1}) ≥ f(x_j) + (Opt − f(x_j))/r ≥ (1 − (1 − 1/r)^{j+1}) · Opt.

Each such step selects a specific individual and flips a specific 0-bit, which happens with probability at least (1/(n+1)) · (1/(en)); hence its expected waiting time is O(n²). After at most *r* such steps, the population contains a solution x_r with f(x_r) ≥ (1 − (1 − 1/r)^r) · Opt ≥ (1 − 1/e) · Opt. Together with the O(n² log n) bound for producing 0ⁿ, this gives the claimed bound of O(n²(log n + r)).

We demonstrate the applicability of Theorem 2 by two examples. First, consider the maximum coverage problem introduced in Section 2. Given a universe *U* with subsets A₁, …, A_n ⊆ U, we want to maximize the coverage function f(S) = |⋃_{i∈S} A_i| subject to |S| ≤ r. Theorem 2 immediately implies the following.

**Corollary 3.** The expected time until the GSEMO has obtained a (1 − 1/e)-approximation for the Max-*r*-Cover problem is O(n²(log n + r)). The achieved approximation factor is optimal, unless P = NP (Feige, 1998).
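The greedy behavior that the analysis attributes to the GSEMO matches the classical greedy algorithm of Nemhauser et al. (1978); a minimal sketch for maximum coverage (the instance data is made up for illustration):

```python
def greedy_cover(subsets, r):
    """Pick r times the set with the largest marginal coverage gain.
    For monotone submodular objectives this is a (1 - 1/e)-approximation."""
    covered, chosen = set(), []
    for _ in range(r):
        i = max(subsets, key=lambda j: len(subsets[j] - covered))
        chosen.append(i)
        covered |= subsets[i]
    return chosen, covered

subsets = {1: {"a", "b", "c"}, 2: {"a", "b"}, 3: {"c", "d"}, 4: {"d"}}
chosen, covered = greedy_cover(subsets, r=2)
```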

As a second example, we consider a problem from evolutionary multiobjective optimization. As discussed in Section 2, the hypervolume indicator is a monotone submodular function. The hypervolume subset selection problem (HYP-SSP), where we are given *n* points in ℝ^d and want to select a subset of size *k* with maximal hypervolume, therefore aims at maximizing a monotone submodular function under a uniform matroid constraint of rank *k*. HYP-SSP has been addressed by a number of authors (Bringmann et al., 2014a; 2014b; Glasmachers, 2014; Kuhn et al., 2014; Guerreiro et al., 2015). Theorem 2 has the following implication for HYP-SSP.

**Corollary 4.** The expected time until the GSEMO has obtained a (1 − 1/e)-approximation for HYP-SSP is O(n²(log n + k)).

## 4 Monotone Submodular Functions under Matroid Constraints

The previous section only studied uniform matroid constraints. We now extend this to general matroids and the intersection of *k* matroids, and study monotone submodular functions under constraints given by *k* matroids M₁, …, M_k.

We consider the (1+1) EA and start by analyzing the time until the algorithm has obtained a feasible solution *x* with f(x) ≥ Opt/n. This result serves as the basis for the main result of this section.

**Lemma 5.** Let *f* be a monotone submodular function under *k* matroid constraints and Opt be the value of an optimal solution. The expected time until the (1+1) EA has obtained a feasible solution *x* with f(x) ≥ Opt/n is O(n^{k+1}).

*Proof.* We first bound the time until the (1+1) EA has produced a feasible solution. To do this, we generalize Proposition 10 of Reichel and Skutella (2010) to the case of the intersection of *k* matroids. Suppose that *x* is an infeasible solution with violation v(x) = Σ_{j=1}^k (|x|₁ − r_j(x)) > 0. During the optimization process, v(x) never increases, and there are at least v(x)/k distinct elements whose removal decreases v(x): for the matroid with the largest violation, at least v(x)/k elements of *x* lie outside a maximum independent subset of *x*, and removing any of them decreases v(x). Hence, the probability of decreasing v(x) is at least v(x)/(ekn), and the expected time until a feasible solution has been produced is upper bounded by Σ_{v=1}^{kn} ekn/v = O(kn log n).

For the remainder of the proof, we work under the assumption that a feasible solution has already been obtained. Let *x* be an arbitrary feasible solution and x\* be an optimal solution. Furthermore, let *a* be the element in x\* such that f({a}) = max_{b∈x\*} f({b}). As *f* is submodular and normalized, Opt ≤ Σ_{b∈x\*} f({b}) ≤ n · f({a}), and as *f* is monotone, we have f(y) ≥ f({a}) ≥ Opt/n for any feasible solution *y* containing the element *a*. According to Theorem 2.1 of Lee et al. (2009), a feasible solution *y* containing *a* can be obtained from any feasible solution *x* by introducing *a* and removing at most *k* elements from *x*. The expected waiting time of the (1+1) EA for such a specific (k+1)-bit flip is O(n^{k+1}); if the flip is rejected, the current solution already satisfies f(x) > f(y) ≥ Opt/n. Altogether, the expected time to produce a feasible solution *x* with f(x) ≥ Opt/n is O(n^{k+1}), as kn log n = O(n^{k+1}) for any k ≥ 1.

In the previous section, we showed that there are local optima for submodular functions with one uniform constraint that only constitute an approximation ratio of roughly 1/2. Furthermore, the (1+1) EA requires exponential time to leave these local optima. The following theorem shows that the (1+1) EA obtains a (1/(k + 1/p + ε))-approximation for any constant ε > 0 and integer p ≥ 1 in expected polynomial time. For the case k = 1, this implies a (1/(2 + 1/p + ε))-approximation in expected polynomial time, as we may duplicate the single matroid constraining the search space.

**Theorem 6.** For any integers k ≥ 2, p ≥ 1, and real value ε > 0, the expected time until the (1+1) EA has obtained a (1/(k + 1/p + ε))-approximation for any monotone submodular function *f* under *k* matroid constraints is O((1/ε) · n^{2p(k+1)+4} · log n).

*Proof.* Because of Lemma 5, a feasible solution *x* with f(x) ≥ Opt/n is obtained in expected time O(n^{k+1}). In the following, we work under the assumption that the algorithm has obtained a feasible solution *x* with f(x) ≥ Opt/n. A *p*-exchange operation applied to the current solution *x* introduces at most 2p new elements and deletes at most 2pk elements of *x*. A solution *y* that can be obtained from *x* by a *p*-exchange operation is called a *p*-exchange neighbor of *x*. According to Lee et al. (2010), every solution *x* for which there exists no *p*-exchange neighbor *y* with f(y) ≥ (1 + ε/n⁴) · f(x) is a (1/(k + 1/p + ε))-approximation for any monotone submodular function. Hence, as long as the desired approximation has not been obtained, there is a *p*-exchange neighbor *y* with f(y) ≥ (1 + ε/n⁴) · f(x). Such a specific exchange flips at most 2p(k+1) bits and therefore has probability Ω(n^{−2p(k+1)}). Starting from f(x) ≥ Opt/n, the number of improvements by a factor of 1 + ε/n⁴ is upper bounded by O((1/ε) · n⁴ · log n), which gives the claimed bound.
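The exchange argument can be mimicked by a small local search over an insert/delete neighborhood (a generic sketch of ours; the budgets `max_add`/`max_rem` stand in for the bounds used in the proof, and the toy coverage instance is an assumption):

```python
from itertools import combinations

def exchange_neighbors(S, ground, max_add, max_rem):
    """All sets reachable from S by inserting up to max_add and deleting
    up to max_rem elements."""
    outside, inside = sorted(ground - S), sorted(S)
    for a in range(max_add + 1):
        for add in combinations(outside, a):
            for r in range(max_rem + 1):
                for rem in combinations(inside, r):
                    T = (S - set(rem)) | set(add)
                    if T != S:
                        yield T

def local_search(f, ground, feasible, max_add, max_rem):
    """Climb until no feasible exchange neighbor has a larger f-value."""
    S = set()
    improved = True
    while improved:
        improved = False
        for T in exchange_neighbors(S, ground, max_add, max_rem):
            if feasible(T) and f(T) > f(S):
                S, improved = T, True
                break
    return S

# Coverage instance under a uniform constraint of size 2.
A = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: len(set().union(*(A[i] for i in S))) if S else 0
S = local_search(f, set(A), lambda S: len(S) <= 2, max_add=2, max_rem=2)
```

The evolutionary algorithms perform such exchanges implicitly: a mutation step flips the corresponding bits simultaneously with the probability bounded in the proof above.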

Recall the example of finding colorful spanning trees (Section 2), which can be described as a monotone submodular maximization problem under two matroid constraints. By choosing *p* sufficiently large, we get the following corollary.

**Corollary 7.** The expected time until the (1+1) EA has obtained a (1/(2 + ε))-approximation for colorful spanning trees is polynomial in *n* for every constant ε > 0.

## 5 Symmetric Submodular Functions under Matroid Constraints

We now turn to symmetric submodular functions that are not necessarily monotone. For our analysis, we make use of the following corollary, which can be obtained from Lee et al. (2009).

**Corollary 8.** Let *x* be a solution such that no solution *y* with fitness f(y) ≥ (1 + ε/n⁴) · f(x) can be achieved by deleting one element, or by inserting one element and deleting at most *k* elements. Then *x* is a 1/((k + 2)(1 + ε))-approximation.

Corollary 8 states that there is always the possibility of achieving a certain progress if no good approximation has been obtained yet. We use this to show the following results for the GSEMO. It should be noted that the corresponding Theorem 2 in Friedrich and Neumann (2014) is accidentally missing the symmetry condition.

**Theorem 9.** The expected time until the GSEMO has obtained a 1/((k + 2)(1 + ε))-approximation for any symmetric submodular function under *k* matroid constraints is O(n^{k+6} log(n)/ε).

*Proof.* Following previous investigations, the GSEMO introduces the solution 0ⁿ into the population after an expected number of O(n² log n) steps. This solution is Pareto optimal and will from that point on stay in the population. Furthermore, 0ⁿ is a feasible solution and has the largest possible number of 0-bits. Hence, from the time 0ⁿ has been included in the population, the population will never include infeasible solutions.

Selecting 0ⁿ for mutation and inserting the element that leads to the largest increase in the *f*-value produces a solution *y* with f(y) ≥ Opt/n. The reason for this is that the number of elements is limited by *n* and that *f* is submodular. Having obtained a solution of fitness at least Opt/n, we focus in each iteration on the individual having the largest *f*-value in *P*. Because of the selection mechanism of the GSEMO, a solution with the maximal *f*-value will always stay in the population, and this value will not decrease during the run of the algorithm.

As long as the algorithm has not obtained a solution of the desired quality, it can, according to Corollary 8, produce from its solution *x* with the highest *f*-value a feasible offspring *y* such that f(y) ≥ (1 + ε/n⁴) · f(x). The expected waiting time for this event is O(n^{k+2}), as at most k + 1 specific bits of *x* have to be flipped and using the fact that the population size is at most n + 1. The number of improvements by a factor of 1 + ε/n⁴ is upper bounded by O((1/ε) · n⁴ · log n), which yields the claimed bound of O(n^{k+6} log(n)/ε).

As an example, let us consider again the NP-hard maximum cut problem, where we are given a graph G = (V, E) with *n* vertices and non-negative edge weights w: E → ℝ₊. We want to maximize the cut function f(S) = Σ_{e∈δ(S)} w(e) over all S ⊆ V as defined in Section 2. It is known that the greedy algorithm achieves a 1/2-approximation, while the best known algorithms achieve a ≈0.878-approximation (Goemans and Williamson, 1995). Theorem 9 immediately implies the following.

**Corollary 10.** The expected time until the GSEMO has obtained a 1/(3(1 + ε))-approximation for the maximum cut problem is O(n⁷ log(n)/ε).

Note that this result is presumably not tight. We conjecture that a less general analysis can show that the GSEMO achieves a 1/2-approximation.

Using the idea of *p*-exchanges from Theorem 6, we can improve the approximation result of Theorem 9 at the cost of a runtime increasing with *p*.

**Theorem 11.** For any integers k ≥ 2, p ≥ 1, and real value ε > 0, the expected time until the GSEMO has obtained a 1/((k + 1 + 1/p)(1 + ε))-approximation for any symmetric submodular function under *k* matroid constraints is O(n^{2p(k+1)+5} log(n)/ε).

*Proof.* The GSEMO produces a feasible solution *x* with f(x) ≥ Opt/n in expected time O(n² log n) (see proof of Theorem 9). After the GSEMO has obtained a solution *x* with f(x) ≥ Opt/n, we focus on the solution with the largest *f*-value in the population.

As long as this solution *x* is not of the desired quality, the exchange properties established by Lee et al. (2010), applied to the symmetric function *f*, guarantee the existence of a *p*-exchange neighbor *T* of *x* with f(T) ≥ (1 + ε/n⁴) · f(x); symmetry is what allows the argument to handle the nonmonotone case. The number of improvements by a factor of 1 + ε/n⁴ is upper bounded by O((1/ε) · n⁴ · log n).

Furthermore, the expected waiting time for such an improvement is O(n^{2p(k+1)+1}), as the population size is upper bounded by n + 1 and a specific *p*-exchange has probability Ω(n^{−2p(k+1)}) (see proof of Theorem 6). This completes the proof.

## 6 Discussion and Open Problems

Maximizing submodular functions under matroid constraints is a very general optimization problem that contains many classical combinatorial optimization problems like maximum cut (Goemans and Williamson, 1995; Feige and Goemans, 1995), maximum directed cut (Halperin and Zwick, 2001), maximum facility location (Ageev and Sviridenko, 1999; Cornuejols et al., 1977), and others. We presented several positive and negative results for the approximation behavior of two simple evolutionary algorithms in this framework. To the best of our knowledge, this is the first paper on the analysis of evolutionary algorithms optimizing *submodular functions*. The only result on the performance of evolutionary algorithms under *matroid constraints* is by Reichel and Skutella (2010). They showed that the (1+1) EA achieves in polynomial time a 1/k-approximation for maximizing a linear function subject to *k* matroid constraints.

This paper gives a first set of results but also raises many new questions. We briefly name a few:

We only study the (1+1) EA and the GSEMO, but similar results might be possible for population-based algorithms with appropriate diversity measures.

Our runtime upper bounds might not be tight. It would be interesting to show matching lower bounds, especially for comparing different algorithms and function classes.

The proven approximation guarantees hold for very general problem classes. Much tighter results should be possible for specific problems like maximum cut.

Minimizing submodular functions is in general simpler than maximizing submodular functions. However, it is not obvious what this implies for evolutionary algorithms minimizing submodular functions.

Our proofs strongly rely on the greedy-like behavior of SEMO. It might be possible to prove a general relationship between SEMO and greedy algorithms; or to give an example where SEMO strictly outperforms a greedy strategy.

We assume value oracle access to the fitness function *f*. It might be worth studying the black box complexity of submodular functions in the sense of Lehre and Witt (2012).

We studied submodular fitness functions that are either monotone or symmetric. Future work should also cover submodular functions that are neither monotone nor symmetric.

## Acknowledgments

The research leading to these results received funding from the Australian Research Council (ARC) under grant agreement DP140103400 and from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement 618091 (SAGE).