For stochastic multi-objective combinatorial optimization (SMOCO) problems, the adaptive Pareto sampling (APS) framework has been proposed, which is based on sampling and on the solution of deterministic multi-objective subproblems. We show that when plugging in the well-known simple evolutionary multi-objective optimizer (SEMO) as a subprocedure into APS, ε-dominance has to be used to achieve fast convergence to the Pareto front. Two general theorems are presented indicating how runtime complexity results for APS can be derived from corresponding results for SEMO. This may be a starting point for the runtime analysis of evolutionary SMOCO algorithms.
In recent years, both the fields of multi-objective combinatorial optimization (MOCO) and of stochastic combinatorial optimization (SCO) have made considerable progress, and numerous articles have appeared in these areas. Authors in the MOCO field argue that most real-world problems are characterized by multiple objectives, and authors from SCO emphasize that in real-world situations, decision makers typically face a smaller or larger degree of uncertainty on parameters on which the decisions have to be based. If we join these arguments, we arrive at the conjecture that in a very broad range of applications of decision technology, realistic quantitative models will be characterized by the presence of both multiple objectives and uncertainty (the latter represented by stochastic models). This makes it desirable that, in particular, the toolkit for combinatorial optimization is extended to methods solving stochastic multi-objective combinatorial optimization (SMOCO) problems. Indeed, problems of this type have been described in diverse areas such as financial engineering (Spronk et al., 2005), production planning (Hnainen et al., 2010), transportation logistics (Hassan-Pour et al., 2009), supply chain management (Amodeo et al., 2009), health care management (Baesler and Sepúlveda, 2001), and project management (Gutjahr and Reiter, 2010).
A general methodology for tackling SMOCO problems has begun to develop only recently. Both in the MOCO and in the SCO field, evolutionary algorithms (EAs) play an important role, as seen, for example, from the surveys for MOCO (Coello Coello, 2006) and for SCO (Jin and Branke, 2005; Bianchi et al., 2008). This suggests that EAs could also play an important role in the SMOCO field.
Let us give a short review of available approaches for the solution of SMOCO problems. Concerning the multi-objective part, we focus on the solution concept of determining Pareto-optimal (i.e., efficient) solutions, which appears to be best suited for the purposes of a decision maker who does not want (or is not able) to weigh the different objectives against each other a priori. Caballero et al. (2004) deal with stochastic multi-objective continuous optimization problems, which differ from the stochastic multi-objective combinatorial optimization problems investigated in this paper. Nevertheless, their work is also interesting in our context since they outline the two basic alternative pathways along which a stochastic multi-objective problem can be attacked: Either it is reduced, in a first step, to a deterministic counterpart problem, which is then still multi-objective and can be solved by suitable techniques of multi-objective optimization, or it is first reduced to a single-objective problem, which is then still stochastic and can be solved by techniques of stochastic optimization. In the present work, we choose the first alternative—called the multi-objective approach in Caballero et al. (2004)—which seems to be the more common approach in the literature, but let us mention that also the second alternative—called the stochastic approach in Caballero et al. (2004)—has been applied in the SMOCO area, for example, by adopting the quality indicator concept for scalarizing the given multi-objective problem (see Basseur and Zitzler, 2006; Liefooghe et al., 2007).
The most frequent approach to the solution of a SMOCO problem consists of using a fixed sample of random scenarios in order to reduce the stochastic (multi-objective) problem to a deterministic (multi-objective) one, and then to apply some exact or metaheuristic technique to obtain the set of Pareto-optimal solutions w.r.t. the average objective function values over the sample. If an exact technique is chosen, the authors typically apply the so-called ε-constraint method for further reducing the multi-objective problem version to a series of single-objective problems and solve the resulting single-objective combinatorial optimization problems by mathematical programming (MP) methods. (Examples of this solution strategy can be found in Guillén et al., 2005; Cardona-Valdés et al., 2011; Franca et al., 2010.) Alternatively, it is also possible to apply a metaheuristic to deal with the multi-objective problem. For an example, see Claro et al. (2010a, 2010b), where a multi-objective heuristic hybridizing Tabu search and variable neighborhood search is used for this purpose.
Fixed-sample approaches have the limitation that the probability model specified by the drawn sample scenarios is only an approximation of the original probability model given by the problem formulation. As a consequence, finally, a changed problem is solved. The changed problem may considerably differ from the original one especially in a situation where the high computational effort for solving the resulting MOCO problem only allows for the consideration of a relatively small sample of scenarios, which bears the risk that practically relevant scenarios are not represented among the chosen scenarios. In the case where the probability model includes events that are rare but connected with high losses, this may be particularly detrimental. In the area of single-objective sampling-based stochastic optimization, variable-sample methods have been used as a possible alternative to fixed-sample methods. Contrary to techniques working with a fixed randomly selected sample, the application of a variable-sample method can be shown to converge, under suitable conditions, to the solution of the original stochastic optimization problem (see, e.g., Homem-de-Mello, 2003).
In the SMOCO field, some papers use variable-sample approaches. Typically, variable-sample techniques for SMOCO rely on modifications of evolutionary algorithms, especially multi-objective genetic algorithms such as the NSGA-II algorithm by Deb et al. (2002). Whereas the first articles of this kind (e.g., Hughes, 2001; Teich, 2001; Gutjahr, 2005) pursue the variable-sample idea in a rather empirical way, it has been shown (Gutjahr, 2009; Gutjahr and Reiter, 2010) that by a suitable iterative procedure called adaptive Pareto sampling (APS) that exchanges random samples from iteration to iteration, convergence of the currently proposed solution set to the exact set of Pareto-optimal solutions with probability one can be ensured under mild conditions. However, the mentioned articles do not provide analytical results on the speed of convergence. This topic is the subject of the present article.
We study an APS version where the simple evolutionary multi-objective optimizer (SEMO) by Laumanns et al. (2002b) is used for the solution of the occurring deterministic multi-objective subproblems. As a performance measure, we use the expected runtime until the exact Pareto front of the SMOCO problem has been found. It will turn out that for obtaining an efficient algorithmic variant, SEMO has to be modified by including the concept of ε-dominance, using a sufficiently small value of ε. Our theorems show how available runtime results for SEMO on MOCO problems can be translated to runtime results for our APS version on corresponding SMOCO problems. For a recent survey on theoretical analysis of multi-objective evolutionary algorithms in general and SEMO in particular, the reader is referred to Brockhoff (2011).
The plan of this article is as follows: Section 2 presents the investigated APS algorithm. Section 3 introduces the analyzed class of SMOCO problems and explains the sampling technique. Section 4 describes how SEMO is modified by an ε-dominance concept and how it is incorporated into APS. Section 5 derives the general results on expected runtime and provides examples. Finally, Section 6 gives concluding remarks.
2. Adaptive Pareto Sampling
We make use of the following customary definitions: For z_i = F_i(x) and z′_i = F_i(x′), the solution x′ dominates x if z′_i ≥ z_i for i = 1, 2, and z′_i > z_i for at least one i. In this case, we write x′ ≻ x. Furthermore, x is nondominated by a set X′ of solutions if there is no x′ in X′ that dominates x, and x is efficient (i.e., Pareto-optimal) if it is nondominated by the entire solution set X. For abbreviation, we shall write F(x) = (F_1(x), F_2(x)) for the image point in objective space to which solution x is mapped by the objective functions. The set of all efficient solutions is the efficient set; its image in objective space is the Pareto front.
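For concreteness, the dominance relation and the resulting efficient set can be sketched as follows (a minimal illustration for two maximized objectives; the function names are ours, not from the paper):

```python
def dominates(z_new, z_old):
    """True if z_new dominates z_old (maximization): at least as good
    in every coordinate and strictly better in at least one."""
    return all(a >= b for a, b in zip(z_new, z_old)) and \
           any(a > b for a, b in zip(z_new, z_old))

def efficient_set(solutions, F):
    """The Pareto-optimal elements of `solutions` under objective map F."""
    return [x for x in solutions
            if not any(dominates(F(y), F(x)) for y in solutions)]
```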
The determination of the efficient set S(k) in part (a) of APS can either be performed by a (multi-objective) exact algorithm, for example, the adaptive ε-constraint algorithm by Laumanns et al. (2006), or alternatively by a (multi-objective) metaheuristic.
To give an example of an application, let us consider a location problem where out of a set of n possible locations (nodes) for public facilities, a subset of nodes has to be selected. In each selected node j, a capacitated facility is to be built (opened) at a random cost C_j with known distribution function. The facilities provide service for r population nodes. The demand in population node i is a random variable D_i with a known distribution function. Suppose that costs and demands in different nodes are independent. From independent random numbers uniformly distributed on [0, 1], realizations of C_j and D_i can be computed by inversion of the respective distribution functions. We can consider the vector ω of these random numbers as a particular scenario and write C_j = C_j(ω) and D_i = D_i(ω). The decision is given by the vector x = (x_1, ..., x_n), where the variable x_j takes the value 1 if a facility in node j is opened and 0 otherwise. As the objective functions, let us consider the expectations F_1(x) and F_2(x) of the negative total cost and of a public benefit function G, respectively, where G builds on the distances of customers to opened facilities that can serve them under the given capacity restrictions. If a procedure computing the function G and a random number generator are available, APS can be applied to solve this problem, based on the BSAA estimates in Equation (2). Note that although in this example the expectation F_1(x) could also be computed analytically, that is, without sampling, this may be very difficult or impossible for the expectation F_2(x), since G is typically a complex nonlinear function.
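The scenario mechanism can be illustrated by a small sketch: scenarios are vectors of uniform(0,1) random numbers, realizations are obtained by inverse-transform sampling, and a sample average estimates the expected cost. The uniform(1,3) cost distribution and all numbers are illustrative assumptions of ours; the actual BSAA estimate is Equation (2) in the paper.

```python
import random

def sample_scenario(n, rng):
    # one scenario: a vector of independent uniform(0,1) random numbers
    return [rng.random() for _ in range(n)]

def cost_realization(omega_j, low=1.0, high=3.0):
    # inverse CDF of a uniform(low, high) cost distribution (illustrative choice)
    return low + (high - low) * omega_j

def sample_average_cost(x, scenarios):
    # sample-average estimate of the expected total cost of the opened facilities
    totals = [sum(cost_realization(w[j]) for j in range(len(x)) if x[j] == 1)
              for w in scenarios]
    return sum(totals) / len(totals)
```

With many scenarios, the estimate concentrates around the true expectation (here, 2.0 per opened facility under the assumed distribution).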
3. Analyzed Problems and Scenario Sampling
3.1. Considered Problem Class
3.2. Sampling Procedure
4. Incorporating SEMO into APS
4.1. Basic Version
As mentioned in Section 2, any exact or heuristic algorithm for multi-objective combinatorial optimization problems can be plugged into APS for the solution of the BSAA subproblem. In the following, we shall discuss the use of the well-known SEMO algorithm (Laumanns et al., 2002b) for this purpose. The pseudocode of SEMO is presented in Algorithm 2. Therein, f(x) = (f1(x), f2(x)) is the vector of objective function values of a solution x, and ≻ is the dominance relation in objective space defined in Section 2. SEMO uses an archive, which is initialized by a random solution and successively updated by the addition of nondominated mutants and the removal of elements dominated by mutants. In the pseudocode, the termination criterion is left unspecified; later, we shall use a maximum number of iterations as a termination criterion. When SEMO is terminated, the current archive is taken as the proposed approximation to the efficient set.
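A simplified rendering of this archive mechanism is given below (our own sketch with one-bit mutation and an iteration budget as the termination criterion; Algorithm 2 in the paper is the authoritative statement, and two maximized objectives are assumed):

```python
import random

def dominates(z1, z2):
    # maximization: z1 at least as good everywhere and not identical to z2
    return all(a >= b for a, b in zip(z1, z2)) and z1 != z2

def semo(f, n, iterations, rng):
    """Archive-based SEMO sketch: the archive holds mutually nondominated
    bit strings of length n; f maps a bit string to its objective vector."""
    x = tuple(rng.randint(0, 1) for _ in range(n))
    archive = {x}
    for _ in range(iterations):
        parent = rng.choice(sorted(archive))
        i = rng.randrange(n)                          # flip one uniformly chosen bit
        child = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        if not any(dominates(f(a), f(child)) for a in archive):
            # keep the child, discard archive members it dominates
            archive = {a for a in archive if not dominates(f(child), f(a))}
            archive.add(child)
    return archive
```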
Although the algorithm APS/SEMO obtained by applying SEMO as a subprocedure of APS works, its efficiency is usually rather poor. The reasons are explained by the following example.
An application of APS/SEMO to SLOTZ is inefficient for the following two reasons. Even in the case where the sample error vectors are very small,
(a) different solutions x and x′ with the same image in objective space (e.g., 10110100 and 10100100) usually have different sample average evaluations, which lets the archive grow too fast, and
(b) it may easily happen that although x is dominated by x′ (while the points in objective space are identical in one component), the sample average evaluations of x and x′ are incomparable, which misleads the search by SEMO.
In order to modify APS/SEMO to an algorithm overcoming these difficulties, we adopt the idea of ε-dominance, which has also turned out to be useful in deterministic multi-objective optimization.
4.2. A Variant Based on ε-Dominance
The key idea is to replace—during the execution of the algorithm—the standard dominance relation ≻ by an ε-tolerant version ≻_ε, but to do this in such a way that for small ε, the efficient set is nevertheless determined based on the original relation ≻. The concept of ε-dominance has frequently been applied in the multi-objective optimization literature (e.g., see Laumanns et al., 2002a; Grosan, 2004; Schuetze et al., 2007; Horoba and Neumann, 2008). In this article, the following definition will be used (we give reasons below):
Let ε > 0. For two points z and z′ in the objective space, we say that
z′ weakly ε-dominates z, written as z′ ⪰_ε z, if and only if z′_i ≥ z_i − ε (i = 1, 2).
z′ is ε-equal to z, written as z′ =_ε z, if and only if |z′_i − z_i| ≤ ε (i = 1, 2).
z′ ε-dominates z, written as z′ ≻_ε z, if and only if z′ ⪰_ε z and not z′ =_ε z. In other words, z′ ε-dominates z exactly if z′_i ≥ z_i − ε (i = 1, 2) and z′_i > z_i + ε for at least one i.
By δ we denote the minimal nonzero distance between two points of the image set in one of the two coordinates. By the finiteness of the solution set, we have δ > 0.
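For maximization, the three ε-relations just defined translate directly into code (a minimal sketch; the function names are ours):

```python
def weakly_eps_dominates(z_new, z_old, eps):
    # z_new is within eps of being at least as good in every coordinate
    return all(a >= b - eps for a, b in zip(z_new, z_old))

def eps_equal(z_new, z_old, eps):
    # every coordinate differs by at most eps
    return all(abs(a - b) <= eps for a, b in zip(z_new, z_old))

def eps_dominates(z_new, z_old, eps):
    # weak eps-dominance without eps-equality
    return weakly_eps_dominates(z_new, z_old, eps) and \
           not eps_equal(z_new, z_old, eps)
```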
The modified overall algorithm, which we shall call APS/ε-SEMO, differs from APS/SEMO by the following simple changes: Each comparison of objective function values via ≻ is replaced by a comparison via ≻_ε, and each comparison of objective function values via = is replaced by a comparison via =_ε. More precisely, the following two modifications have to be performed: First, in the solution proposal part of APS, SEMO is replaced by its ε-tolerant version, the algorithm ε-SEMO shown in Algorithm 3. Second, in the solution evaluation part of APS, the trimming, that is, the statement “obtain L(k) as the set of nondominated solutions according to the objective function estimates just determined,” is now implemented in an ε-tolerant way by the procedure ε-TRIM described in Algorithm 4. Therein, for abbreviation, the input set of candidate solutions is denoted by X+, and L(k) is denoted by X−. In both procedures, the sample average approximation of F(x) has to be inserted for F(x) when they are used within the APS framework.
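The ε-tolerant trimming step can be sketched as follows (our rendering of the insertion/removal logic, not the paper's Algorithm 4 verbatim; the ε-relations are redefined here so the sketch is self-contained):

```python
def weakly_eps_dominates(z1, z2, eps):
    return all(a >= b - eps for a, b in zip(z1, z2))

def eps_equal(z1, z2, eps):
    return all(abs(a - b) <= eps for a, b in zip(z1, z2))

def eps_dominates(z1, z2, eps):
    return weakly_eps_dominates(z1, z2, eps) and not eps_equal(z1, z2, eps)

def eps_trim(candidates, f, eps):
    """Keep one representative per eps-equivalence class of nondominated points."""
    trimmed = []
    for x in candidates:
        if any(eps_dominates(f(y), f(x), eps) or eps_equal(f(y), f(x), eps)
               for y in trimmed):
            continue                 # x is eps-dominated or already represented
        trimmed = [y for y in trimmed if not eps_dominates(f(x), f(y), eps)]
        trimmed.append(x)
    return trimmed
```

Note how a candidate that is ε-equal to a kept solution is discarded, which is exactly what prevents the archive blow-up described in Example 1.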
Returning to Example 1, we observe that the disadvantages (a) and (b) of APS/SEMO mentioned there no longer occur with APS/ε-SEMO once the noise becomes small enough. First, for small noise, the sample average evaluations of two different solutions x and x′ with F(x) = F(x′) are considered as ε-equal, with the consequence that only one of the two solutions is kept in the archive. Second, if x dominates x′ but F(x) and F(x′) are equal in one component, then for small noise (and small ε), the sample average evaluation of x still ε-dominates that of x′ (cf. Figure 1).
5. Runtime Analysis
In this section, we turn to the analysis of the expected runtime needed until APS/ε-SEMO proposes a solution set whose image (in objective space) is the entire Pareto front of the problem in Equation (1), that is, until the image of the currently proposed solution set equals the Pareto front. This runtime is called the optimization time. The expected optimization time according to this definition has been used in several investigations of MOCO algorithms (see, e.g., Neumann and Wegener, 2006). Note that the stopping criterion does not require finding all Pareto-optimal solutions; it suffices to find, for each point z on the Pareto front, at least one solution x with F(x) = z. When this has been achieved, we say in short that the Pareto front has been found. This does not mean that the current (disturbed) objective function value estimates coincide with the points on the Pareto front, but only that the Pareto front is covered with respect to the undisturbed evaluations F(x) of the proposed solutions.
In the theory of evolutionary algorithms, it is customary to use the number of fitness evaluations as a measure of the runtime required by an algorithm. Essentially, we follow this convention, but in our context, where fitness approximations are based on random samples, it is necessary to extend the definition in order to obtain a meaningful measure, since an evaluation based on a sample of size, say, one million will certainly require much more computation time than an evaluation based on a sample of size one. For this reason, the following runtime unit will be applied: an approximate objective function evaluation according to Equation (2) based on sample size s is assumed to require s time units.
Essentially, Theorem 1 below (which is proven with the help of three lemmas) states that by passing from the deterministic to the noisy version of the problem, the optimization time is only increased by a certain factor; this factor is polynomially bounded in important special cases. Before precisely formulating the result and giving a rigorous proof, let us shortly outline the key proof ideas.
First, it is shown (Lemma 1) that for ε small enough, function evaluations disturbed by a sufficiently small amount induce an ε-dominance relation between image points that is equivalent to the original dominance relation between the image points for undisturbed function evaluations, and analogously for the ε-equality relation. Thus, if the function evaluation vectors obtained from the sample averages fall into sufficiently small squares around the exact objective function vectors, the ε-dominance relation on the noisy problem behaves precisely like the ordinary dominance relation on the deterministic problem.
Secondly, the following consequence of the increasing sample size used in APS/ε-SEMO is shown (Lemma 2 and Corollaries 2–4): The probability that in a given iteration, all function vector estimates fall into squares of a given size around the exact objective function vectors, can be bounded from below by certain explicit expressions (tending to one with increasing iteration index). Note that increasing the sample size in a sample average estimate decreases the probability of large deviations from the true value.
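The concentration effect underlying Lemma 2 can be illustrated empirically: the probability that a sample average of Gaussian noise deviates from its mean by more than a fixed threshold drops quickly as the sample size grows. All parameters below are chosen for illustration only.

```python
import random

def deviation_probability(sample_size, threshold, trials, rng, sigma=1.0):
    """Monte Carlo estimate of the probability that the mean of
    `sample_size` draws from N(0, sigma^2) deviates from 0 by more
    than `threshold`."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(0.0, sigma) for _ in range(sample_size)) / sample_size
        if abs(mean) > threshold:
            hits += 1
    return hits / trials
```

Increasing the sample size from 1 to 25 makes a deviation of more than half a standard deviation a rare event, which is the mechanism APS exploits by letting the sample size grow over the iterations.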
Furthermore, it is also necessary to show (Lemma 3) that in the solution evaluation part of APS/ε-SEMO, previously found efficient solutions are confirmed, provided that the disturbances by the sample average estimates are small enough.
In the proof of Theorem 1, finally, the obtained expressions are used to bound the probability of the sample success event, that is, the event that in a certain iteration, the noisy evaluations do not lead to a wrong judgement about mutual dominance relations between the currently considered solutions. The desired bound on the expected runtime is then obtained by adding (i) the time required to reach an iteration such that in all subsequent iterations, the sample success event has become likely enough, and (ii) the expected optimization time after this iteration.
In the following, ‖·‖ denotes the sup-norm, and as usual, the image set of a set X′ of solutions is denoted by F(X′) = {F(x) : x ∈ X′}.
Let and with and , where with according to Definition 2. Then
(a) We start by showing the second equivalence. If , then
Then ε-SEMO, applied to the problem with objective function vector , produces the same sequence of mutated solutions and the same sequence of current solution sets as SEMO, applied to the problem with objective function vector F, provided that the same (pseudo-)random numbers for the choice of the initial solutions and the bit mutations are used. In particular, the runtime of ε-SEMO is then just k times the runtime of SEMO.
As seen by comparing ε-SEMO with SEMO, it suffices to show that for two solutions x(1) and x(2),
Lemma 2 below is a technical auxiliary result, providing a bound for the probability that sample averages of Gaussian random variables do not deviate too much from their means. We shall apply it later (in Corollaries 2 to 4) to the sample average estimates used in APS.
Let us now apply Lemma 2 to the solution proposal part of APS by choosing the variables as the noise terms, such that the sample average of these variables becomes equal to the error term of the sample average estimate as given by Equation (9).
In the Appendix.
In the Appendix.
In the Appendix.
If the efficient set is contained in X+, and if the error terms of the sample average estimates in the solution evaluation part of iteration k of APS are small enough, then the application of ε-TRIM(X+) in this iteration produces an X− whose image is the Pareto front.
We have to show that after the execution of ε-TRIM, (i) for every point z of the Pareto front, an element x with F(x) = z is contained in X−, and (ii) no element whose image lies off the Pareto front is contained in X−. The proof uses Lemma 1 in an analogous way as it is used in the proof of Corollary 1.
To verify (i), let z be a point of the Pareto front. By assumption, there is an element x of X+ with F(x) = z, so there is an iteration of ε-TRIM in which x is the current element of X+ to be checked for possible insertion. Because of Lemma 1, ε-dominance of x by some other solution y would entail dominance, in contradiction to the efficiency of x. Again by Lemma 1, ε-equality entails F(y) = F(x). Therefore, if there is an element y in the current set X− that is ε-equal to x, then F(y) = F(x). As a consequence, either x is added to X− in this iteration, or X− already contains a y with F(y) = F(x) = z. In both cases, F(X−) contains z after the current insertion trial. It remains to be shown that the element of X− with image z (let us call it x for simplicity) will not be removed from X− in any later iteration. A removal of x could only take place if another element ε-dominating x were added. However, by Lemma 1, this would entail dominance of x and thus contradict the efficiency of x.
To verify (ii), let y be an element of X+ whose image is not on the Pareto front. Then there is a point z of the Pareto front that dominates F(y). By assumption, there is an iteration where an element x with F(x) = z is the current element to be checked for possible insertion. If at the beginning of this iteration y is contained in X−, it will be removed from X−, because by Lemma 1 the evaluation of x ε-dominates that of y, and x will be inserted instead. In a later iteration, y can no longer be added to X−, because x or an element ε-equal to x is already in X−, which prevents the insertion of y.
Now we are in the position to prove our main results, as shown in Theorems 1 and 2 below. For a complete specification of APS/ε-SEMO, we still have to define how many iterations of ε-SEMO are to be performed in each iteration of APS. In agreement with our general assumption that we can base our analysis on the corresponding results for SEMO on the deterministic (i.e., noise-free) limiting case of the problem under consideration, we make the mentioned decision dependent on an upper bound hn for the expected number of function evaluations required by SEMO to find the Pareto front of a noise-free problem instance of size n: In each iteration of APS, exactly 2hn iterations of ε-SEMO are executed.
We set in Corollaries 2–4, such that for the error terms , the condition with required in Corollary 1 as well as in Lemma 3 is satisfied. Furthermore, in Corollary 4, we choose the constant C as . By this special choice, Corollary 4 yields
By Lemma 3, in the sample success event, the solution evaluation part of APS, applied to the disturbed problem according to sample 2, confirms the solutions whose image is the Pareto front if the latter has been found by ε-SEMO in the solution proposal part.
In total, we find that in each iteration, with a probability greater than or equal to 1/4, the solution set L(k) proposed by APS at the end of this iteration has the Pareto front as its image. Note that the random numbers used for obtaining the two samples and those used for the mutations during the run of ε-SEMO are independent, and that in the sample success event, the error terms no longer matter. Moreover, for future use, it should be noted that the lower overall success probability bound 1/4 for an iteration holds independently of the events in previous iterations.
We now compute the runtime required by iteration k of APS/ε-SEMO. In this iteration, ε-SEMO performs 2hn iterations, that is, at most 2hn new solutions are evaluated in addition to the initial solution. Because of sk = k, this yields a runtime smaller than or equal to (2hn+1)k for the solution proposal part. The set of solutions to be evaluated in the solution evaluation part contains at most (2hn+1)k elements, because in each APS iteration, at most 2hn+1 new solutions can be proposed. Considering the sample size used in the evaluation part, this gives an upper runtime bound of (2hn+1)mk2 for the solution evaluation part in iteration k. Therefore, the cost of the entire iteration is bounded from above by Rk2 with R = (2hn+1)(m+1).
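The final per-iteration bound combines the two parts (using k ≤ k² for k ≥ 1):

```latex
\underbrace{(2h_n+1)\,k}_{\text{proposal part}}
\;+\;
\underbrace{(2h_n+1)\,m\,k^{2}}_{\text{evaluation part}}
\;\le\;
(2h_n+1)(m+1)\,k^{2}
\;=\; R\,k^{2},
\qquad R=(2h_n+1)(m+1).
```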
We immediately obtain from Theorem 1:
If the variance of the noise is bounded from above by a constant independent of n, and ε is bounded from below by a constant independent of n, the expected optimization time of APS/ε-SEMO exceeds that of SEMO applied to the noise-free problem at most by a factor of order O(n3).
(Continuation of Example 1 from Section 4.1.) For the SLOTZ problem introduced in Section 4.1, the objective values are integers, so the minimal nonzero coordinate distance is δ = 1. A well-known result by Laumanns et al. (2002b) states that the expected optimization time required by SEMO to find the efficient set for LOTZ is of order Θ(n3), with an upper bound of hn = (1/2)n3 − (1/2)n2. By applying our Theorem 1, we derive from this result that the expected optimization time for finding the efficient set for SLOTZ is at most of order O(n6).
Neumann and Wegener (2006) analyze the expected optimization time of SEMO’ (cf. Remark 2) for a bi-objective minimum spanning tree (MST) problem. In an edge-weighted graph with r nodes and n edges, two objective functions are assigned to each subset x (representable as a binary string) of the set of edges: The first objective c(x) is the number of connected components of the subgraph defined by x; the second objective w(x) is the total weight of the edges in x. Both objectives are to be minimized. Weights are assumed to be integers with a maximum value of wmax. Neumann and Wegener show that the expected optimization time of SEMO’ is of order O(nr(log r+log wmax)+nr2).
As an alternative, let us also consider a second noise model where the noise results from imprecise measurement of the weights of the edges on the assumption that an estimate of w(x) is obtained by summing up the individual weight estimates of the edges contained in x. Suppose that the individual noise terms for the measurement of the edges are independent and normally distributed with a variance smaller than or equal to some constant independent of n, and that the number c(x) of components can be determined without noise. Then , and again on the assumption that and , Theorem 1 yields an expected optimization time on the order of .
Corollary 5 shows that in many cases, the transition from a MOCO problem to a SMOCO problem only generates a polynomial-time overhead factor. Although this is a helpful insight, the bounds presented in Corollary 5 and the SLOTZ example are still rather weak. We conjecture that stronger bounds hold, and below we shall outline a situation where stronger bounds can indeed be rigorously derived. Before turning to this issue, a generalization of Theorem 1 will be demonstrated. We start with a definition.
The assertion of Theorem 1 remains valid if the normally distributed noise is replaced by independent noise following some other distribution (which is allowed to depend on i and x), provided that for all x and i = 1, 2,
the density of the noise term is continuous, symmetric (i.e., the density at −z equals the density at z), and nonincreasing for positive arguments z, and
the noise term is more peaked about zero than a corresponding normally distributed random variable.
Theorem 1 in Birnbaum (1984) states that under the indicated conditions, the corresponding sample averages inherit the more-peaked relation, that is, the sample average of the new noise terms is more peaked about zero than the corresponding sample average of the Gaussian noise terms. A reinspection of the proof of our Theorem 1 above and the preceding lemmas (especially Lemma 2 and Corollaries 2–4) shows that this property is all that is needed to keep the upper bound estimations valid in the case where the Gaussian noise terms are replaced by the new ones.
Corollary 6 above allows for the case of bounded noise to be included, for example, noise following a triangular distribution or a truncated normal distribution. Our next result shows that in the case of bounded noise, a considerably stronger runtime bound than that of Theorem 1 can be derived.
Let the conditions of Corollary 6 be satisfied, and let for all n,
First of all, we compute the maximal length of an ε-antichain in a square with a given side length. An ε-antichain is a set of points with the property that any two different points in it are ε-incomparable, that is, neither weakly ε-dominates the other, nor are they ε-equal. In particular, this means that two elements z and z′ in an ε-antichain differ by at least ε in each of their two coordinates. By projection to one of the two coordinate axes, one immediately sees that in the considered square, the number of mutually ε-incomparable elements is at most the side length divided by ε, plus one. We choose ε as in Theorem 1; an ε-antichain can then contain no more elements than this bound allows.
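The projection argument can be checked by brute force on small point sets (a sketch under the maximization convention used above; the function names are ours):

```python
from itertools import combinations

def eps_incomparable(z1, z2, eps):
    """Neither point weakly eps-dominates the other, and they are not eps-equal."""
    def weak(a, b):
        return all(u >= v - eps for u, v in zip(a, b))
    def equal(a, b):
        return all(abs(u - v) <= eps for u, v in zip(a, b))
    return not weak(z1, z2) and not weak(z2, z1) and not equal(z1, z2)

def max_eps_antichain(points, eps):
    """Size of the largest pairwise eps-incomparable subset (brute force)."""
    pts = list(points)
    for r in range(len(pts), 0, -1):
        for subset in combinations(pts, r):
            if all(eps_incomparable(a, b, eps)
                   for a, b in combinations(subset, 2)):
                return r
    return 0
```

For points in the unit square and ε = 0.4, the bound ⌊1/0.4⌋ + 1 = 3 is attained by a descending staircase of points, while a fourth nearby point is ε-equal to one of them and cannot be added.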
Remark 3: Obviously, the O(n2) bound for the ratio of expected optimization times between stochastic and deterministic problem in the bounded-noise case can be generalized from SLOTZ to a broader class of problems, analogous to the case in Corollary 5. However, it is not quite general, as the following example shows.
(Continuation of Example 3 from Section 4.1.) For the stochastic bi-objective MST problem of Example 3, let us perform a similar replacement of the Gaussian noise by noise with a distribution that is more peaked about zero, as in Example 4 above. Let all other assumptions be the same as before. For , Theorem 2 yields in this case an exponential bound. However, if with some , Theorem 2 (with an=0, bn=wmax, and cn=c) gives a bound for the expected optimization time on the order of .
Remark 4: Also for a normal distribution, in the case of sufficiently large cn>0, absolute noise values larger than cn are very unlikely. Therefore, it can be conjectured that for Gaussian noise, the expected optimization time is bounded by an order that conforms to the often stronger bound of Theorem 2 instead of only to the bound of Theorem 1. We leave a rigorous proof of this conjecture as an open question.

Remark 5: We do not claim that the chosen sample size increment scheme is optimal for the expected optimization time of APS/ε-SEMO. Concerning the upper bound for the expected optimization time, it can even be shown that the bound becomes better the faster the sample size is increased: Consider a scheme where the sample size grows as a power of the iteration index with some exponent p ≥ 1. For the sake of simplicity, let us restrict the discussion to the SLOTZ special case. Then, by derivations analogous to those in Theorems 1 and 2, we find expected optimization time bounds of order O(n4+2/p) and O(n4+1/p) in the case of Gaussian and of bounded noise, respectively, which decrease for increasing p. Nevertheless, it should be kept in mind that these are only upper bounds. It is easily possible that the actual expected optimization time of the linear scheme is better than these bounds suggest, since our proofs use rather coarse bound estimations, assuming that the efficient set has to be identified as a whole in one of the APS iterations, whereas APS can also find the elements of the efficient set one after another or subset by subset, which considerably speeds up the process. This gradual identification of the efficient set, however, is prevented if the sample size is increased too fast.
In this paper, an algorithm for stochastic multi-objective combinatorial optimization (SMOCO) has been analyzed. The algorithm is obtained by inserting the simple evolutionary multi-objective optimizer (SEMO) as a subprocedure into adaptive Pareto sampling (APS). Normally distributed noise as well as more general noise distributions have been investigated. It has been shown that if SEMO applies a specific ε-dominance relation instead of the ordinary dominance relation, the expected optimization time of the overall algorithm can be bounded from above, provided that ε is chosen small enough. The bound depends on the expected optimization time of SEMO on the corresponding deterministic counterpart problem. An explicit expression for this dependence has been given. In the special case where the variance of the noise is bounded from above by a constant independent of the problem size n, and ε is bounded from below by a constant independent of n, the ratio between the expected optimization times for the stochastic and the corresponding deterministic problem, respectively, is on the order of O(n3). For bounded noise, the ratio reduces to O(n2).
Since the underlying deterministic single- or multi-objective optimization problems of practically relevant SMOCO problems are often NP-hard, so that their optimization time must be expected to grow exponentially in the instance size, the results show that in typical cases, stochastic MOCO problems are not essentially harder than their deterministic counterparts.
Future research should address several topics. A question of practical relevance is whether the results presented here can also be transferred from the use of SEMO to the use of a more elaborate multi-objective metaheuristic such as NSGA-II or SPEA2. This question is especially important in view of well-known drawbacks of SEMO, for example, the fact that SEMO can end up with prohibitively large populations. Furthermore, a generalization of the results presented here to the case of more than two objective functions would be of interest.
Expected optimization time is not the only relevant criterion for the performance of an evolutionary optimization algorithm; often, the immediate goal is not to find the exact solution (in the multi-objective case: the efficient set), but rather to identify a good approximate solution within a short time. For investigating this question in the multi-objective situation, the behavior of the hypervolume or of another quality indicator during the run of a combined algorithm of the type considered here could be studied.
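For the biobjective case considered in this paper, the hypervolume indicator mentioned above can be computed by a simple sweep. The following sketch assumes maximization, a mutually nondominated input set, and a given reference point; it is an illustration of the indicator, not of any procedure from the paper:

```python
def hypervolume_2d(front, ref):
    """Hypervolume (maximization) of a set of mutually nondominated
    2-D points w.r.t. a reference point ref, computed by summing
    disjoint vertical slabs after sorting by the first objective."""
    pts = sorted(front)                 # ascending f1 => descending f2
    hv, prev_f1 = 0.0, ref[0]
    for f1, f2 in pts:
        hv += (f1 - prev_f1) * (f2 - ref[1])   # slab [prev_f1, f1]
        prev_f1 = f1
    return hv
```

Tracking this quantity over the iterations of the combined algorithm would show how quickly a good approximation of the Pareto front is reached, even before the exact efficient set is identified.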
The bounds derived here are rather coarse, and tighter bounds may well be found if the way APS gradually adapts the currently proposed solution set is taken into more detailed consideration. Moreover, in the present work, we focused on upper bounds for the expected optimization time. A particularly interesting question is what good lower bounds look like and whether there are cases where their order can be shown to match that of the upper bounds found here. Based on such results, the question of how the sample size scheme in APS should be organized in order to obtain the fastest possible convergence to the Pareto front could also be addressed.
The author thanks Lothar Thiele for a discussion on ε-dominance concepts during the Dagstuhl Seminar 2010 on theory of evolutionary algorithms, and three anonymous reviewers for their very helpful comments on the first version of this article.
From Equation (17), we see that for the parameter range considered there, the lower bound in Equation (16) is nonnegative. Considering Equation (9), Equation (6), and the independence of the random variables for different solutions, and observing that the power function $u \mapsto u^p$ ($p$ even) is nondecreasing for $u \ge 0$, we obtain from Lemma 2:
If the size of a sample of i.i.d. random noise variables is increased, the variance of the sample average decreases. As a consequence, the probability that the error terms of the sample average estimates over a sample of size $ms$ with $m>1$ all lie in $Q_r(0)$ is larger than or equal to the corresponding probability for the case $m=1$. Therefore, the choice $m=1$ is conservative w.r.t. the lower bound on this probability. For $m=1$, the only difference from Corollary 2 is that now, the error terms have to be sufficiently small not only in $2^n$, but in $2^n + 2^n = 2^{n+1}$ random experiments. This immediately yields the indicated formula.
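The variance argument in this footnote is elementary and can be checked numerically. The following sketch (with illustrative parameter choices) estimates the spread of the sample average of standard-normal noise terms for different sample sizes:

```python
# Monte Carlo check that the sample average of i.i.d. noise terms
# concentrates as the sample size grows (spread ~ 1/sqrt(sample_size)),
# so the error terms stay inside a fixed box with higher probability.
# Parameter choices (trials, seed) are illustrative.
import random
import statistics

def sample_average_spread(sample_size, trials=20000, seed=1):
    """Empirical standard deviation of the average of sample_size
    i.i.d. standard-normal noise terms."""
    rng = random.Random(seed)
    avgs = [sum(rng.gauss(0, 1) for _ in range(sample_size)) / sample_size
            for _ in range(trials)]
    return statistics.pstdev(avgs)
```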
By Corollary 3,
In the stochastic programming literature, a scenario is a specific outcome of a random experiment (see Birge and Louveaux, 1997, p. 50).
Presumably, our runtime results in Section 5 can also be extended to more than two objectives by a rather straightforward generalization of the proofs, but this topic is beyond the scope of the present article.
Formally, for each $x$ and $i$, the function $f_i(x,\cdot)$ is a random variable on a probability space $(\Omega, \mathcal{A}, P)$, and $\omega$ is an element of the sample space $\Omega$. Each $\omega \in \Omega$ determines a specific realization of the random variables, that is, a scenario; hence the notation $\omega$ for scenarios. As is customary in the stochastic programming literature, we shall identify $\omega$ with the scenario determined by it (cf. Birge and Louveaux, 1997).
Of course, dependence could make a difference in the way the sampling is done in the solution proposal part of APS. However, in order to stay within the premises of the following analysis, we may always replace dependent sampling by independent sampling, drawing a specific, independent scenario for each x and each i, as is done in the solution evaluation part. This may deteriorate the performance of the algorithm, but as we are interested in upper runtime bounds, this is not a problem.
In particular, the described procedure implies that if the solution algorithm for the deterministic subproblem examines one and the same solution x more than once, the same noise term realizations should always be used. For this purpose, the algorithm has to store the sample average estimates for the already examined solutions in a look-up table. Such an implementation may be less efficient than one where, if a solution is examined a second time, the noise is evaluated by a new independent simulation. For large n, however, the case where an evolutionary search algorithm returns in a later iteration to a solution that has already been examined before is comparatively rare, so it can be expected that, in terms of the quality evolution of the successively examined solutions, the second implementation variant will behave very similarly to the variant investigated in this article.
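The look-up table implementation described in this footnote can be sketched as follows. The class name, the noise model, and the objective representation are illustrative assumptions:

```python
# Sketch of the variant analyzed in the paper: each solution's noisy
# objectives are simulated once and the sample-average estimate is
# cached, so re-examining the same solution reuses the same noise
# realizations. Class name and noise model are illustrative.
import random

class CachedNoisyEvaluator:
    def __init__(self, objectives, sample_size, seed=0):
        self.objectives = objectives     # deterministic parts f_1, f_2, ...
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.cache = {}                  # look-up table: x -> estimate

    def evaluate(self, x):
        if x not in self.cache:          # first examination: simulate
            self.cache[x] = tuple(
                f(x) + sum(self.rng.gauss(0, 1)
                           for _ in range(self.sample_size)) / self.sample_size
                for f in self.objectives)
        return self.cache[x]             # later examinations: reuse
```

Dropping the cache and resampling on every examination gives the second implementation variant mentioned above.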
Observe that, contrary to the previously discussed relation, the relation of Definition 1 does not coincide with ordinary dominance even for small ε, because z does not dominate itself.
Strictly speaking, in the last case, the truncation must be defined in such a way that the density function remains continuous; such a distribution can of course approximate a noncontinuously truncated normal distribution to any desired degree of accuracy.
The special case of large c and a continuously truncated normal distribution produces a very close approximation to the Gaussian model.
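One possible way to truncate a standard normal density continuously, as discussed in the preceding footnotes, is to shift the density down so that it vanishes at the truncation points $\pm c$ and then renormalize. This is an illustrative construction, not necessarily the specific choice intended in the paper:

```python
# One continuous truncation of the standard normal density on [-c, c]:
# subtract phi(c) so the density hits zero at +/-c, then renormalize.
# For large c this is very close to the untruncated Gaussian.
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def normal_cdf(u):
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def continuous_truncation_density(u, c):
    if abs(u) >= c:
        return 0.0
    # normalizing constant: integral of (phi - phi(c)) over [-c, c]
    z = (2 * normal_cdf(c) - 1) - 2 * c * phi(c)
    return (phi(u) - phi(c)) / z
```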
In the case where, instead of SEMO, a solver based on mathematical programming is used for the solution of the deterministic subproblem, as has been done in Gutjahr (2009), a rapid increase of the sample size is especially detrimental: the runtime of such a solver typically grows exponentially in the sample size, so large sample sizes have to be avoided at all costs.