## Abstract

For stochastic multi-objective combinatorial optimization (SMOCO) problems, the adaptive Pareto sampling (APS) framework has been proposed, which is based on sampling and on the solution of deterministic multi-objective subproblems. We show that when plugging in the well-known simple evolutionary multi-objective optimizer (SEMO) as a subprocedure into APS, ε-dominance has to be used to achieve fast convergence to the Pareto front. Two general theorems are presented indicating how runtime complexity results for APS can be derived from corresponding results for SEMO. This may be a starting point for the runtime analysis of evolutionary SMOCO algorithms.

## 1. Introduction

In recent years, both the fields of multi-objective combinatorial optimization (MOCO) and of stochastic combinatorial optimization (SCO) have made considerable progress, and numerous articles have appeared in these areas. Authors in the MOCO field argue that most real-world problems are characterized by multiple objectives, and authors from SCO emphasize that in real-world situations, decision makers typically face a smaller or larger degree of uncertainty on parameters on which the decisions have to be based. If we join these arguments, we arrive at the conjecture that in a very broad range of applications of decision technology, realistic quantitative models will be characterized by the presence of *both* multiple objectives *and* uncertainty (the latter represented by stochastic models). This makes it desirable that, in particular, the toolkit for combinatorial optimization is extended to methods solving stochastic multi-objective combinatorial optimization (SMOCO) problems. Indeed, problems of this type have been described in diverse areas such as financial engineering (Spronk et al., 2005), production planning (Hnaien et al., 2010), transportation logistics (Hassan-Pour et al., 2009), supply chain management (Amodeo et al., 2009), health care management (Baesler and Sepúlveda, 2001), and project management (Gutjahr and Reiter, 2010).

A general methodology for tackling SMOCO problems has begun to develop only recently. Both in the MOCO and in the SCO field, evolutionary algorithms (EAs) play an important role, as seen, for example, from the surveys for MOCO (Coello Coello, 2006) and for SCO (Jin and Branke, 2005; Bianchi et al., 2008). This suggests that EAs could also play an important role in the SMOCO field.

Let us give a short review of available approaches for the solution of SMOCO problems. Concerning the multi-objective part, we focus on the solution concept of determining Pareto-optimal (i.e., efficient) solutions, which appears to be best suited for the purposes of a decision maker who does not want (or is not able) to weigh the different objectives against each other a priori. Caballero et al. (2004) deal with stochastic multi-objective continuous optimization problems, which differ from the stochastic multi-objective combinatorial optimization problems investigated in this paper. Nevertheless, their work is also interesting in our context since they outline the two basic alternative pathways along which a stochastic multi-objective problem can be attacked: Either it is reduced, in a first step, to a deterministic counterpart problem, which is then still multi-objective and can be solved by suitable techniques of multi-objective optimization, or it is first reduced to a single-objective problem, which is then still stochastic and can be solved by techniques of stochastic optimization. In the present work, we choose the first alternative—called the multi-objective approach in Caballero et al. (2004)—which seems to be the more common approach in the literature, but let us mention that also the second alternative—called the stochastic approach in Caballero et al. (2004)—has been applied in the SMOCO area, for example, by adopting the quality indicator concept for scalarizing the given multi-objective problem (see Basseur and Zitzler, 2006; Liefooghe et al., 2007).

The most frequent approach to the solution of a SMOCO problem consists of using a fixed sample of random scenarios^{1} in order to reduce the stochastic (multi-objective) problem to a deterministic (multi-objective) one, and then to apply some exact or metaheuristic technique to obtain the set of Pareto-optimal solutions w.r.t. the average objective function values over the sample. If an exact technique is chosen, the authors typically apply the so-called ε-constraint method for further reducing the multi-objective problem version to a series of single-objective problems and solve the resulting single-objective combinatorial optimization problems by mathematical programming (MP) methods. (Examples for this solution strategy can be found in Guillén et al., 2005; Cardona-Valdés et al., 2011; Franca et al., 2010.) Alternatively, it is also possible to apply a metaheuristic to deal with the multi-objective problem. For an example, see Claro et al. (2010a, 2010b), where a multi-objective heuristic hybridizing Tabu search and variable neighborhood search is used for this purpose.

Fixed-sample approaches have the limitation that the probability model specified by the drawn sample scenarios is only an approximation of the original probability model given by the problem formulation. As a consequence, finally, a changed problem is solved. The changed problem may considerably differ from the original one especially in a situation where the high computational effort for solving the resulting MOCO problem only allows for the consideration of a relatively small sample of scenarios, which bears the risk that practically relevant scenarios are not represented among the chosen scenarios. In the case where the probability model includes events that are rare but connected with high losses, this may be particularly detrimental. In the area of single-objective sampling-based stochastic optimization, variable-sample methods have been used as a possible alternative to fixed-sample methods. Contrary to techniques working with a fixed randomly selected sample, the application of a variable-sample method can be shown to converge, under suitable conditions, to the solution of the original stochastic optimization problem (see, e.g., Homem-de-Mello, 2003).

In the SMOCO field, some papers use variable-sample approaches. Typically, variable-sample techniques for SMOCO rely on modifications of evolutionary algorithms, especially multi-objective genetic algorithms such as the NSGA-II algorithm by Deb et al. (2002). Whereas the first articles of this kind (e.g., Hughes, 2001; Teich, 2001; Gutjahr, 2005) pursue the variable-sample idea in a rather empirical way, it has been shown (Gutjahr, 2009; Gutjahr and Reiter, 2010) that by a suitable iterative procedure called adaptive Pareto sampling (APS) that exchanges random samples from iteration to iteration, convergence of the currently proposed solution set to the exact set of Pareto-optimal solutions with probability one can be ensured under mild conditions. However, the mentioned articles do not provide analytical results on the speed of convergence. This topic is the subject of the present article.

We study an APS version where the simple evolutionary multi-objective optimizer (SEMO) by Laumanns et al. (2002b) is used for the solution of the occurring deterministic multi-objective subproblems. As a performance measure, we use the expected runtime until the exact Pareto front of the SMOCO problem has been found. It will turn out that for obtaining an efficient algorithmic variant, SEMO has to be modified by including the concept of ε-dominance, using a sufficiently small value of ε. Our theorems show how available runtime results for SEMO on MOCO problems can be translated to runtime results for our APS version on corresponding SMOCO problems. For a recent survey on theoretical analysis of multi-objective evolutionary algorithms in general and SEMO in particular, the reader is referred to Brockhoff (2011).

The plan of this article is as follows: Section 2 presents the investigated APS algorithm. Section 3 introduces the analyzed class of SMOCO problems and explains the sampling technique. Section 4 describes how SEMO is modified by an ε-dominance concept and how it is incorporated into APS. Section 5 derives the general results on expected runtime and provides examples. Finally, Section 6 gives concluding remarks.

## 2. Adaptive Pareto Sampling

Let a stochastic bi-objective combinatorial optimization problem

max *F*(*x*) = (*F*_{1}(*x*), *F*_{2}(*x*)) with *F*_{i}(*x*) = E[*f*_{i}(*x*, ω)] (*i* = 1, 2), *x* ∈ *X*,   (1)

be given, where *X* is a finite decision space and ω denotes the influence of randomness. Maximizing the vector of the two objective functions is understood in the sense of determining the Pareto front of the problem.

We make use of the following customary definitions: For *z*_{i} = *F*_{i}(*x*) and *z*′_{i} = *F*_{i}(*x*′) (*i* = 1, 2), *x*′ *dominates x* if *z*′_{i} ≥ *z*_{i} for *i* = 1, 2, and *z*′_{i} > *z*_{i} for at least one *i*. In this case, we write *F*(*x*′) ≻ *F*(*x*). Furthermore, *x* is nondominated by a set *Y* ⊆ *X* if there is no *x*′ ∈ *Y* such that *x*′ dominates *x*, and *x* is efficient (i.e., Pareto-optimal) if it is nondominated by *X*. For abbreviation, we shall write *F*(*x*) = (*F*_{1}(*x*), *F*_{2}(*x*)) for the image point in objective space to which solution *x* is mapped by the objective functions. The set *X*^{*} of all efficient solutions is the efficient set; its image *F*(*X*^{*}) in objective space is the Pareto front.

To estimate the objective function values, we draw *s* random scenarios ω_{1}, …, ω_{s} independently from each other. Then, the sample average estimate of *F*_{i}(*x*) is given by

*F̃*_{i}(*x*) = (1/*s*) Σ_{ν=1}^{s} *f*_{i}(*x*, ω_{ν}).   (2)

Evidently, the sample average estimate is an unbiased estimator for *F*_{i}(*x*). An approximation to the solution of the given problem in Equation (1) can be computed by solving a related problem where the two expectations forming the objective functions are replaced by the corresponding sample average estimates under a sample of size *s*. In this way, we obtain the following deterministic bi-objective problem:

max *F̃*(*x*) = (*F̃*_{1}(*x*), *F̃*_{2}(*x*)), *x* ∈ *X*.   (3)

We call the problem in Equation (3) the bi-objective sample average approximation (BSAA) problem corresponding to the original problem in Equation (1). The vector (*F̃*_{1}(*x*), *F̃*_{2}(*x*)) will be written in abbreviated form as *F̃*(*x*).
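As a minimal sketch, the BSAA estimate in Equation (2) can be computed as follows. The noisy objective `noisy_f` and the representation of a scenario by a single uniform random number are illustrative assumptions, not part of the model above.

```python
import random

def bsaa_estimate(x, f, s, rng):
    """Sample average estimate of F(x) = E[f(x, omega)] over s i.i.d. scenarios."""
    total1 = total2 = 0.0
    for _ in range(s):
        omega = rng.random()       # one random scenario (stand-in for omega_nu)
        f1, f2 = f(x, omega)
        total1 += f1
        total2 += f2
    return (total1 / s, total2 / s)

# Illustrative noisy objective: deterministic base values plus zero-mean noise.
def noisy_f(x, omega):
    base = sum(x)
    noise = omega - 0.5            # uniform on [-0.5, 0.5], expectation 0
    return (base + noise, -base + noise)

est = bsaa_estimate((1, 0, 1), noisy_f, s=10_000, rng=random.Random(7))
# est is close to the exact objective vector F(x) = (2, -2)
```

Since the estimator is unbiased, increasing `s` drives the estimate toward the exact expectations, which is the mechanism APS exploits by letting the sample size grow over the iterations.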

APS maintains a current solution set *L*^{(k)} which is updated from iteration to iteration. In each of these iterations, first of all a corresponding deterministic BSAA problem is solved in order to obtain a proposal *S*^{(k)} for the solution set. After that, the elements of the solution of the BSAA problem are merged with those contained in *L*^{(k - 1)}, the elements in the union of both sets are evaluated based on independent samples for each solution and each objective function, and dominated elements (w.r.t. the evaluation results) are eliminated. This yields the new solution set *L*^{(k)}. The sample sizes are controlled by sequences (*s*_{k}) and (*s̃*_{k}) of positive integers (*k* = 1, 2, …).

The determination of the efficient set *S*^{(k)} in part (a) of APS can either be performed by a (multi-objective) exact algorithm, for example, the adaptive ε-constraint algorithm by Laumanns et al. (2006), or alternatively by a (multi-objective) metaheuristic.

In Gutjahr (2009), convergence of APS is shown for a scheme where the sample size *s*_{k} for the solution proposal part is kept constant, but to ensure this, certain problem-specific conditions have to be verified. The problem-specific check can be dropped and replaced by a mild (random-search type) convergence property of the heuristic for the solution of the deterministic subproblem, if a scheme is used where *s*_{k} is also increased (see Gutjahr and Reiter, 2010). In the following, we choose the scheme *s*_{k} = *k* and *s̃*_{k} = *mk*^{2} with an integer constant *m* ≥ 1.
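The overall APS loop just described can be sketched as follows. The interfaces `solve_bsaa`, `evaluate`, and `dominates` are hypothetical placeholders for the solution proposal part, the evaluation on independent samples, and the dominance test; the sample-size scheme *s*_{k} = *k* and evaluation sample size *m*·*k*² is an assumption of this sketch.

```python
def aps(solve_bsaa, evaluate, dominates, iterations, m=1):
    """Adaptive Pareto sampling skeleton.

    solve_bsaa(k, s): heuristically solves the BSAA problem with sample size s
                      and returns a proposed solution set S^(k)
    evaluate(x, s):   independent sample average estimate of F(x), sample size s
    dominates(a, b):  True if objective vector a dominates objective vector b
    """
    L = set()                                  # current solution set L^(k)
    for k in range(1, iterations + 1):
        S = solve_bsaa(k, k)                   # (a) solution proposal, s_k = k
        union = L | set(S)
        est = {x: evaluate(x, m * k * k) for x in union}   # (b) re-evaluation
        L = {x for x in union                  # keep only nondominated elements
             if not any(dominates(est[y], est[x]) for y in union if y != x)}
    return L
```

Plugging in an exact evaluator and a proposal routine that always returns the same candidate set reduces the loop to repeated nondominated filtering, which is a convenient sanity check.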

To give an example of an application, let us consider a location problem where out of a set of *n* possible locations (nodes) for public facilities, a subset of nodes has to be selected. In each selected node *j*, a capacitated facility is to be built (opened) at a random cost *C*_{j} with known distribution function. The facilities provide service for *r* population nodes. The demand in each population node is a random variable with a known distribution function. Suppose that costs and demands in different nodes are independent. From independent random numbers uniformly distributed on [0, 1], realizations of the costs *C*_{j} and of the demands can be computed by inverting the respective distribution functions. We can consider the vector of these uniform random numbers as a particular scenario ω. The decision is given by the vector *x* = (*x*_{1}, …, *x*_{n}), where the variable *x*_{j} takes the value 1 if a facility in node *j* is opened and 0 otherwise. As the objective functions, let us consider the expectations *F*_{1}(*x*) and *F*_{2}(*x*) of the negative total cost and of a public benefit function *G*, respectively, where *G* builds on the distances of customers to opened facilities that can serve them under the given capacity restrictions. If a procedure computing the function *G* and a random number generator are available, APS can be applied to solve this problem, based on the BSAA estimates in Equation (2). Note that although in this example, the expectation *F*_{1}(*x*) could also be computed analytically, that is, without sampling, this may be very difficult or impossible for the expectation *F*_{2}(*x*), since *G* is typically a complex nonlinear function.
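For illustration, one scenario of this location example can be generated as follows. The concrete distributions (exponential costs, small integer demands) and the benefit function `G` passed in are stand-in assumptions; the model above only requires that realizations can be computed from uniform random numbers by inverting known distribution functions.

```python
import math
import random

def sample_scenario(n, r, rng):
    """One scenario: cost and demand realizations obtained from independent
    uniform random numbers via assumed inverse distribution functions."""
    costs = [-10.0 * math.log(1.0 - rng.random()) for _ in range(n)]  # exponential
    demands = [1 + int(4 * rng.random()) for _ in range(r)]           # 1..4
    return costs, demands

def realized_objectives(x, scenario, G):
    """f(x, omega) for a decision vector x (x_j = 1 iff facility j is opened)."""
    costs, demands = scenario
    f1 = -sum(c for c, opened in zip(costs, x) if opened)  # negative total cost
    f2 = G(x, demands)                                     # public benefit
    return f1, f2
```

Averaging `realized_objectives` over many scenarios yields exactly the BSAA estimates of Equation (2) for this example.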

## 3. Analyzed Problems and Scenario Sampling

### 3.1. Considered Problem Class

As the solution space *X*, the set {0, 1}^{n} of binary strings of length *n* will be considered. (Of course, the assumption that further constraints are absent, which is also made in many other works on the analysis of evolutionary algorithms, is rather strong. It should be noted, however, that by the use of penalty functions, constrained problems can be included in the present framework as well.) By *X*^{*}, we denote the efficient set. Concerning the influence of randomness, we assume that the noise terms *f*_{i}(*x*, ω) − *F*_{i}(*x*) are independent. We start with the assumption that the noise terms are normally distributed with fixed variance σ^{2}. We will then also proceed to situations where the noise terms follow other distributions. The assumptions above are less restrictive than they look: Using binary problem encodings is standard in applications of evolutionary algorithms, which justifies the special choice of *X*. The assumption of independence of the noise terms is w.l.o.g., since dependent noise terms would leave the optimization problem in Equation (1) unchanged.

If the variances of the noise terms are not equal, we may replace them by their maximum value σ^{2}, which gives a conservative estimate as long as upper runtime bounds are investigated. The assumption that the noise distribution is Gaussian, finally, is suggested by the observation that in the application of APS to a problem of the above kind, the noise only enters through sample averages over i.i.d. random variables, and if the number of iterations is large, the sample size *s*_{k} = *k* is large as well during later iterations, which, by the central limit theorem, shows that we do not obtain a completely different situation if we replace the given noise distribution by a normal distribution.

### 3.2. Sampling Procedure

In iteration *k* of APS, the solution proposal part uses *s*_{k} = *k* independent scenarios, where each scenario can also be represented by a matrix of noise terms with one line per solution *x* and one column per objective. It is clear that in the computer implementation, not all line vectors of such a matrix have to be generated at once when the νth sample is created: the only line vectors of interest are those that correspond to the single solutions *x* examined in the current iteration by the (possibly heuristic) solution algorithm for the deterministic subproblem. These line vectors are generated in order as they are needed. For example, if we applied random search with 10 random solutions to get an approximate solution to the BSAA problem, we would choose 10 random elements *x*, corresponding to 10 lines, and generate the noise terms only for these 10 lines within each of the *k* matrices. The scenarios in the solution evaluation part of APS can be represented by analogous matrices.
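A sketch of this lazy generation, assuming Gaussian noise as in Section 3.1: a row of noise terms is materialized only when its solution is examined for the first time, and re-examining a solution under the same scenario reuses the identical realization. The per-row seeding by the pair (scenario seed, solution) is an implementation assumption.

```python
import random

class LazyScenario:
    """One scenario: a virtual matrix with one row of noise terms (eps_1, eps_2)
    per solution x, generated only on first access."""
    def __init__(self, scenario_seed, sigma=1.0):
        self.scenario_seed = scenario_seed
        self.sigma = sigma
        self.rows = {}                 # x -> (eps_1, eps_2)

    def noise(self, x):
        if x not in self.rows:
            # derive a reproducible per-row generator from (scenario, solution)
            rng = random.Random(hash((self.scenario_seed, x)))
            self.rows[x] = (rng.gauss(0.0, self.sigma),
                            rng.gauss(0.0, self.sigma))
        return self.rows[x]
```

This keeps memory proportional to the number of solutions actually examined, not to |X| = 2^n.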

## 4. Incorporating SEMO into APS

### 4.1. Basic Version

As mentioned in Section 2, any exact or heuristic algorithm for multi-objective combinatorial optimization problems can be plugged into APS for the solution of the BSAA subproblem. In the following, we shall discuss the use of the well-known SEMO algorithm (Laumanns et al., 2002b) for this purpose. The pseudocode of SEMO is presented in Algorithm 2. Therein, *f*(*x*)=(*f*_{1}(*x*), *f*_{2}(*x*)) is the vector of objective function values of solution *x*, and ≻ is the dominance relation in objective space defined in Section 2. SEMO uses an archive which is initialized with a random solution and successively updated by the addition of nondominated mutants and the removal of elements dominated by mutants. In the pseudocode, the termination criterion is left unspecified; later, we shall use a maximum number of iterations as a termination criterion. When SEMO is terminated, the current archive is taken as the proposed approximation to the efficient set.
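Since Algorithm 2 is not reproduced in this excerpt, the following sketch captures the SEMO scheme just described (random initial solution, uniform parent selection from the archive, single-bit mutation, archive update). Tie handling, here the rejection of mutants whose image equals an archived image, follows common SEMO formulations and is an assumption of this sketch.

```python
import random

def semo(f, n, dominates, max_iter, rng):
    """SEMO sketch: maintain an archive of mutually nondominated solutions."""
    x = tuple(rng.randrange(2) for _ in range(n))   # random initial solution
    archive = {x}
    for _ in range(max_iter):
        parent = rng.choice(sorted(archive))        # uniform choice from archive
        i = rng.randrange(n)                        # flip one uniformly chosen bit
        child = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        if any(dominates(f(a), f(child)) or f(a) == f(child) for a in archive):
            continue                                # dominated or duplicate mutant
        archive = {a for a in archive if not dominates(f(child), f(a))}
        archive.add(child)
    return archive
```

By construction, the returned archive is always mutually nondominated with pairwise distinct images, regardless of how long the algorithm ran.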

Although the algorithm APS/SEMO obtained by applying SEMO as a subprocedure of APS works, its efficiency is usually rather poor. The reasons are explained by the following example.

*Example 1:* The bi-objective problem LOTZ (Leading Ones, Trailing Zeros) introduced in Laumanns et al. (2002b) asks for the simultaneous maximization of the number of leading ones and the number of trailing zeros of a binary string *x*. We extend LOTZ to the stochastic bi-objective problem SLOTZ by adding Gaussian noise: The objectives of SLOTZ are the expectations *F*_{1}(*x*) and *F*_{2}(*x*) of the random functions *f*_{i}(*x*, ω) = LOTZ_{i}(*x*) + ε_{i}(*x*, ω) with independent noise variables ε_{1} and ε_{2} distributed according to Equation (6). Obviously, *F*(*x*) = LOTZ(*x*). Applying a black box consideration, it is assumed that the solution algorithm does not know this but has to work based only on the observation of the noisy realization vectors *f*(*x*, ω).

An application of APS/SEMO to SLOTZ is inefficient for the following two reasons. Even in the case where the sample error vectors are very small,

(a) different solutions *x* and *x*′ with the same image in objective space (e.g., 10110100 and 10100100) usually have different sample average evaluations, which lets the archive grow too fast, and

(b) it may easily happen that although *x* is dominated by *x*′ (but the points in objective space are identical in one component), the sample average evaluations of *x* and *x*′ are incomparable, which misleads the search by SEMO.

In order to modify APS/SEMO to an algorithm overcoming these difficulties, we adopt the idea of ε-dominance, which has also turned out to be useful in deterministic multi-objective optimization.

### 4.2. A Variant Based on ε-Dominance

The key idea is to replace—during the execution of the algorithm—the standard dominance relation by an ε-tolerant version , but to do this in such a way that for small ε, the efficient set is nevertheless determined based on the original relation . The concept of ε-dominance has frequently been applied in the multi-objective optimization literature (e.g., see Laumanns et al., 2002a; Grosan, 2004; Schuetze et al., 2007; Horoba and Neumann, 2008). In this article, the following definition will be used (we give reasons below):

*Definition 1:* Let ε > 0. For two points *z* and *z*′ in the objective space, we say that

1. *z*′ *weakly ε-dominates z*, written as *z*′ ⪰_{ε} *z*, if and only if *z*′_{i} ≥ *z*_{i} − ε (*i* = 1, 2).
2. *z*′ *is ε-equal to z*, written as *z*′ =_{ε} *z*, if and only if |*z*′_{i} − *z*_{i}| ≤ ε (*i* = 1, 2).
3. *z*′ *ε-dominates z*, written as *z*′ ≻_{ε} *z*, if and only if *z*′ ⪰_{ε} *z* and not *z*′ =_{ε} *z*. In other words, *z*′ ε-dominates *z* exactly if *z*′_{i} ≥ *z*_{i} − ε (*i* = 1, 2) *and* |*z*′_{i} − *z*_{i}| > ε for at least one *i*.

For two solutions *x* and *x*′ in the solution space, ε-dominance or ε-equality holds if it does so for their respective image points *z* and *z*′ in the objective space. For fixed *z*, the set of points weakly ε-dominating *z*, the set of points ε-equal to *z*, and the set of points ε-dominating *z* are defined accordingly. Figure 1 illustrates the sets defined above.
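The three relations of Definition 1 (in the additive form reconstructed above, for maximization) translate directly into code:

```python
def weakly_eps_dominates(zp, z, eps):
    """z' weakly eps-dominates z: z'_i >= z_i - eps for i = 1, 2."""
    return all(a >= b - eps for a, b in zip(zp, z))

def eps_equal(zp, z, eps):
    """z' is eps-equal to z: |z'_i - z_i| <= eps for i = 1, 2."""
    return all(abs(a - b) <= eps for a, b in zip(zp, z))

def eps_dominates(zp, z, eps):
    """z' eps-dominates z: weak eps-dominance holds, eps-equality does not."""
    return weakly_eps_dominates(zp, z, eps) and not eps_equal(zp, z, eps)
```

Note that a point which is strictly better in one coordinate and ε-close in the other still ε-dominates, which is exactly the property exploited below for noisy evaluations.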

In parts of the literature, a multiplicative version of ε-dominance is used, where *z*′ ε-dominates *z* if and only if (1 + ε) *z*′_{i} ≥ *z*_{i} for all *i*. In our context, it is more natural to choose the additive version because noise is also assumed to be additive. Except for a marginal modification, such an additive version has also been used in Schuetze et al. (2007). Note that the relation ⪰_{ε} relaxes the ordinary definition of dominance. This is no longer true for ≻_{ε}, that is, in the case where ε-equality is excluded. In this latter case, we obtain an indented dominance region which is similar (but not identical) to that of the ε-dominance relation used in Grosan (2004). At the end of this section, we will see that the exemption of ε-equal points is useful in our context, because it removes one possible source of a too rapid growth of the archive in the case of noisy evaluations.

*Definition 2:* By δ_{min} we denote the minimal nonzero distance between two points in *F*(*X*) in one of the two coordinates. By the finiteness of *X*, we have δ_{min} > 0.

Every point of *F*(*X*) other than *z* itself has in at least one coordinate a strictly positive distance to *z*; by definition of δ_{min}, this distance is at least δ_{min}, so for ε < δ_{min} such a point cannot be ε-equal to *z*. An immediate consequence of Equation (12) is that the corresponding statement also holds for every smaller value of ε.

Easy.

The modified overall algorithm, which we shall call APS/ε-SEMO, differs from APS/SEMO by the following simple changes: Each comparison of objective function values via is replaced by a comparison via , and each comparison of objective function values via = is replaced by a comparison via . More precisely, the following two modifications have to be performed: First, in the solution proposal part of APS, SEMO is replaced by its ε-tolerant version, the algorithm ε-SEMO shown in Algorithm 3. Secondly, also in the solution evaluation part of APS, the trimming, that is, the statement “obtain *L*^{(k)} as the set of nondominated solutions in according to the objective function estimates just determined,” is now implemented in an ε-tolerant way by the procedure ε-TRIM described in Algorithm 4. Therein, for abbreviation, is denoted by *X*^{+}, and *L*^{(k)} is denoted by *X*^{-}. In both procedures, for *F*(*x*), the sample average approximation of *F*(*x*) has to be inserted when they are used within the APS framework.
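Since Algorithm 4 is not reproduced in this excerpt, the following sketch renders the ε-tolerant trimming along the lines just described; the exact insertion and removal order is an assumption. Each candidate from *X*^{+} is inserted into *X*^{−} unless a kept element ε-dominates it or is ε-equal to it, and kept elements that the candidate ε-dominates are removed.

```python
def eps_trim(X_plus, F_est, eps):
    """eps-tolerant trimming: returns a list X_minus of mutually eps-nondominated,
    pairwise non-eps-equal elements of X_plus, judged by the estimates F_est."""
    def weak(u, v):  return all(a >= b - eps for a, b in zip(u, v))
    def equal(u, v): return all(abs(a - b) <= eps for a, b in zip(u, v))
    def dom(u, v):   return weak(u, v) and not equal(u, v)

    X_minus = []
    for x in X_plus:
        zx = F_est[x]
        if any(dom(F_est[y], zx) or equal(F_est[y], zx) for y in X_minus):
            continue                  # x is eps-dominated or eps-duplicated
        X_minus = [y for y in X_minus if not dom(zx, F_est[y])]
        X_minus.append(x)
    return X_minus
```

The ε-equality test is what keeps the archive small under noise: two solutions whose estimates differ only by sampling error are treated as one.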

Returning to Example 1, we observe that the disadvantages (a) and (b) of APS/SEMO mentioned there no longer occur with APS/ε-SEMO once the noise becomes small enough. First, for small noise, sample average evaluations of two different solutions *x* and *x*′ with *F*(*x*) = *F*(*x*′) are considered as ε-equal, with the consequence that only one of the two solutions is kept in the archive. Secondly, if *x* dominates *x*′ but *F*(*x*) and *F*(*x*′) are equal in one component, then for small noise (and small ε), the sample average evaluation of *x* still ε-dominates that of *x*′ (cf. Figure 1).

## 5. Runtime Analysis

In this section, we turn to the analysis of the expected runtime that is needed until APS/ε-SEMO proposes a solution set *L*^{(k)} whose image (in objective space) is the entire Pareto front of the problem in Equation (1). This runtime is called the optimization time. The expected optimization time according to the definition above has been used in several investigations of MOCO algorithms (see, e.g., Neumann and Wegener, 2006). Note that for the stopping criterion it is not required to find all Pareto-optimal solutions; it is sufficient to find, for each *z* on the Pareto front, at least one solution *x* with *F*(*x*) = *z*. When this has been achieved, we say in short that the Pareto front has been found. This does not mean that the current (disturbed) objective function value estimates *F̃*(*x*) coincide with the points on the Pareto front, but only that the Pareto front is covered with respect to the undisturbed evaluations *F*(*x*) of the proposed solutions.

In the theory of evolutionary algorithms, it is customary to use the number of fitness evaluations as a measure of the runtime required by an algorithm. Essentially, we follow this convention, but in our context where fitness approximations are based on random samples, it is necessary to extend the definition in order to obtain a meaningful measure, since an evaluation based on a sample of size, say, one million, will certainly require much more computation time than an evaluation based on a sample of size one. For this reason, the following runtime unit definition will be applied: an approximate objective function evaluation according to Equation (2), based on sample size *s*, is assumed to require *s* time units.

Essentially, Theorem 1 below (which is proven with the help of three lemmas) states that by passing from the deterministic to the noisy version of the problem, the optimization time is only increased by a certain factor; this factor is polynomially bounded in important special cases. Before precisely formulating the result and giving a rigorous proof, let us shortly outline the key proof ideas.

First, it is shown (Lemma 1) that for ε small enough, function evaluations disturbed by at most ε/2 in each component will induce an ε-dominance relation between image points that is equivalent to the original dominance relation between the image points for undisturbed function evaluations, and analogously for the ε-equality relation. Thus, if the function evaluation vectors obtained from the sample averages fall into sufficiently small squares around the exact objective function vectors, the ε-dominance relation on the noisy problem behaves precisely like the ordinary dominance relation on the deterministic problem.

Secondly, the following consequence of the increasing sample size used in APS/ε-SEMO is shown (Lemma 2 and Corollaries 2–4): The probability that in a given iteration, all function vector estimates fall into squares of a given size around the exact objective function vectors, can be bounded from below by certain explicit expressions (tending to one with increasing iteration index). Note that increasing the sample size in a sample average estimate decreases the probability of large deviations from the true value.

Furthermore, it is also necessary to show (Lemma 3) that in the solution evaluation part of APS/ε-SEMO, previously found efficient solutions are confirmed, provided that the disturbances by the sample average estimates are small enough.

In the proof of Theorem 1, finally, the obtained expressions are used to bound the probability of the sample success event, that is, the event that in a certain iteration, the noisy evaluations do not lead to a wrong judgement about mutual dominance relations between the currently considered solutions. The desired bound on the expected runtime is then obtained by adding (i) the time required to reach an iteration such that in all subsequent iterations, the sample success event has become likely enough, and (ii) the expected optimization time after this iteration.

In the following, ‖·‖ denotes the sup-norm, and as usual, the image set {*F*(*x*): *x* ∈ *X*} is denoted by *F*(*X*).

*Lemma 1:* Let *z*, *z*′ ∈ *F*(*X*) and *z̃*, *z̃*′ with ‖*z̃* − *z*‖ ≤ ε/2 and ‖*z̃*′ − *z*′‖ ≤ ε/2, where 0 < ε < δ_{min}/2 with δ_{min} according to Definition 2. Then

*z̃*′ ≻_{ε} *z̃* ⇔ *z*′ ≻ *z* and *z̃*′ =_{ε} *z̃* ⇔ *z*′ = *z*.

*Proof:* (a) We start by showing the second equivalence. If *z*′ ≠ *z*, then either one of the two points dominates the other, or *z* and *z*′ are incomparable. In both cases, there is a coordinate *i* such that *z*′_{i} ≠ *z*_{i}. By definition of δ_{min} and because of *z*, *z*′ ∈ *F*(*X*), even |*z*′_{i} − *z*_{i}| ≥ δ_{min} must hold in this case. It follows that |*z̃*′_{i} − *z̃*_{i}| ≥ δ_{min} − ε > ε. However, *z̃*′ =_{ε} *z̃* entails |*z̃*′_{i} − *z̃*_{i}| ≤ ε, which contradicts the just derived inequality.

*Corollary 1:* Suppose that in iteration *k* of APS, all sample average estimates used by ε-SEMO deviate from the corresponding exact objective function values by at most ε/2 in each component, with ε < δ_{min}/2. Then ε-SEMO, applied to the problem with objective function vector *F̃*, produces the same sequence of mutated solutions and the same sequence of current solution sets as SEMO, applied to the problem with objective function vector *F*, provided that the same (pseudo-)random numbers for the choice of the initial solutions and the bit mutations are used. In particular, the runtime of ε-SEMO is then just *k* times the runtime of SEMO.

*Proof:* As seen by comparing ε-SEMO with SEMO, it suffices to show that for two solutions *x*^{(1)} and *x*^{(2)}, the estimate *F̃*(*x*^{(1)}) ε-dominates *F̃*(*x*^{(2)}) if and only if *F*(*x*^{(1)}) dominates *F*(*x*^{(2)}), and analogously for ε-equality and equality; this follows from Lemma 1. The factor *k* in the runtime results from the fact that in iteration *k*, the sample size *s*_{k} = *k* is applied.

Lemma 2 below is a technical auxiliary result, providing a bound for the probability that sample averages of Gaussian random variables do not deviate too much from their means. We shall apply it later (in Corollaries 2 to 4) to the sample average estimates used in APS.

Let us now apply Lemma 2 to the solution proposal part of APS by choosing the variables in the lemma as the noise terms, such that their average becomes equal to the error term of the sample average estimate as given by Equation (9).

In the Appendix.

In the Appendix.

In the Appendix.

*Lemma 3:* If *X*^{+} contains, for each point of *F*(*X*^{*}), a solution with this image, where *X*^{*} is the efficient set, and if for the error terms of the sample average estimates in the solution evaluation part of iteration *k* of APS, the inequalities of at most ε/2 in each component with ε < δ_{min}/2 hold, then the application of ε-TRIM(*X*^{+}) in this iteration produces an *X*^{-} with *F*(*X*^{-}) = *F*(*X*^{*}).

We have to show that after the execution of ε-TRIM, (i) for every *z* ∈ *F*(*X*^{*}), an element *x* with *F*(*x*) = *z* is contained in *X*^{-}, and (ii) no element *y* ∉ *X*^{*} is contained in *X*^{-}. The proof uses Lemma 1 in an analogous way as it is used in the proof of Corollary 1.

To verify (i), let *z* ∈ *F*(*X*^{*}). Because of the assumption on *X*^{+}, there is an element *x* ∈ *X*^{+} with *F*(*x*) = *z*, so there is an iteration of ε-TRIM in which *x* is the current element of *X*^{+} to be checked for possible insertion. Because of Lemma 1, an ε-dominance of *F̃*(*x*) by *F̃*(*y*) for some other solution *y* would entail *F*(*y*) ≻ *F*(*x*), in contradiction to the efficiency of *x*. Again by Lemma 1, *F̃*(*y*) =_{ε} *F̃*(*x*) entails *F*(*y*) = *F*(*x*). Therefore, if there is an element *y* in the current set *X*^{-} with *F̃*(*y*) =_{ε} *F̃*(*x*), then *F*(*y*) = *F*(*x*). As a consequence, either *x* is added to *X*^{-} in this iteration, or *X*^{-} already contains a *y* with *F*(*y*) = *F*(*x*) = *z*. In both cases, *F*(*X*^{-}) contains *z* after the current insertion trial. It remains to be shown that the element of *X*^{-} with image *z* (let us call it *x* for simplicity) will not be removed from *X*^{-} in any later iteration. A removal of *x* could only take place if another element *y* with *F̃*(*y*) ≻_{ε} *F̃*(*x*) were added. However, by Lemma 1, this would entail *F*(*y*) ≻ *F*(*x*) and thus contradict the efficiency of *x*.

To verify (ii), let *y* ∉ *X*^{*}. Then there is a *z* ∈ *F*(*X*^{*}) such that *z* ≻ *F*(*y*). Because of the assumption on *X*^{+}, there is an iteration where an element *x* with *F*(*x*) = *z* is the current element to be checked for possible insertion. If in the beginning of this iteration *y* is contained in *X*^{-}, it will be removed from *X*^{-}, because *F̃*(*x*) ≻_{ε} *F̃*(*y*) by Lemma 1, and *x* will be inserted instead. In a later iteration, *y* can no longer be added to *X*^{-}, because *x* or an element *x*′ with *F̃*(*x*′) =_{ε} *F̃*(*x*) is already in *X*^{-}, which prevents the insertion of *y*.

Now we are in the position to prove our main results, as shown in Theorems 1 and 2 below. For a complete specification of APS/ε-SEMO, we still have to define how many iterations of ε-SEMO are to be performed in each iteration of APS. In agreement with our general assumption that we can base our analysis on the corresponding results for SEMO on the deterministic (i.e., noise-free) limiting case of the problem under consideration, we make the mentioned decision dependent on an upper bound *h*_{n} for the expected number of function evaluations required by SEMO to find the Pareto front of a noise-free problem instance of size *n*: In each iteration of APS, exactly 2*h*_{n} iterations of ε-SEMO are executed.

*Theorem 1:* Let *h*_{n} be an upper bound for the expected optimization time of SEMO on noise-free problems of instance size *n*, let δ_{min}(*n*) denote the minimum of the values δ_{min} (given by Definition 2) over all problem instances of size *n*, let σ(*n*) denote the maximum of the noise standard deviations over all problem instances of size *n*, and let *k*_{0} = *k*_{0}(*n*) be the threshold defined in Equation (18), which depends on δ_{min}(*n*) and σ(*n*). Then the resulting expression, polynomial in *h*_{n} and *k*_{0}(*n*), is an upper bound for the expected optimization time of APS/ε-SEMO in an implementation where each call of ε-SEMO performs 2*h*_{n} iterations.

*Proof:* We set the deviation bound in Corollaries 2–4 to ε/2, such that for the error terms, the condition of at most ε/2 with ε < δ_{min}/2 required in Corollary 1 as well as in Lemma 3 is satisfied. Furthermore, in Corollary 4, we choose the constant *C* accordingly. By this special choice, Corollary 4 yields the claimed probability bound for all *k* ≥ *k*_{0} with *k*_{0} as defined in Equation (18). Hence, Equation (19) says that in each such iteration of APS, with a probability of at least 1/2, samples are drawn for which the error terms both in the solution proposal part and in the solution evaluation part all lie within the square of half-width ε/2 around the origin. We call the event that this happens the sample success event. Let us denote the ensemble of samples used for the solution proposal in a fixed iteration *k* by sample 1, and the ensemble of samples used for the solution evaluation in this iteration by sample 2.

Consider first the solution proposal part of such an iteration, conditional on the sample success event. By Corollary 1, ε-SEMO then behaves exactly as SEMO on the noise-free problem, whose expected optimization time is bounded by *h*_{n}. Denoting the runtime until the Pareto front has been found by *T*, the well-known Markov inequality yields Pr(*T* ≥ 2*h*_{n}) ≤ 1/2, such that within 2*h*_{n} iterations, the Pareto front will be found with probability of at least 1/2. Observe that as soon as SEMO has found the Pareto front for the first time, in the sense that the image of the current archive contains *F*(*X*^{*}), it preserves this property during the remaining iterations until iteration 2*h*_{n}, and again by Corollary 1, in the sample success event, the same holds for ε-SEMO.

_{n}By Lemma 3, in the sample success event, the solution evaluation part of APS, applied to the disturbed problem according to sample 2, confirms the with image if the last has been found by ε-SEMO in the solution proposal part.
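The Markov step above can be sanity-checked numerically: for any nonnegative random variable with mean at most *h*, the probability of reaching twice that mean is at most 1/2. A small exact check over a discrete distribution (illustrative only; the distribution is ours):

```python
def markov_tail_bound_holds(pmf, threshold):
    # pmf maps nonnegative values to probabilities; checks the Markov
    # inequality P(T >= threshold) <= E[T] / threshold
    mean = sum(v * p for v, p in pmf.items())
    tail = sum(p for v, p in pmf.items() if v >= threshold)
    return tail <= mean / threshold

# a runtime distribution with mean h = 2.5; threshold 2h = 5
pmf = {1: 0.5, 2: 0.25, 4: 0.125, 8: 0.125}
```

Here the tail probability at 2*h* is 0.125, well below the Markov bound of 1/2.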

In total, we find that in each iteration , with a probability larger than or equal to 1/4, the solution set *L*^{(k)} proposed by APS at the end of this iteration has image . Note that the random numbers used for obtaining the two samples and those used for the mutations during the run of ε-SEMO are independent, and that in the sample success event, the error terms no longer matter. Moreover, for future use, it should be noted that the lower overall success probability bound of 1/4 for an iteration holds independently of the events in previous iterations.

We now compute the runtime required by iteration *k* of APS/ε-SEMO. In this iteration, ε-SEMO performs 2*h*_{n} iterations, that is, at most 2*h*_{n} new solutions are evaluated in addition to the initial solution. Because of *s*_{k} = *k*, this yields a runtime smaller than or equal to (2*h*_{n}+1)*k* for the solution proposal part. The set of solutions to be evaluated in the solution evaluation part contains at most elements, because in each APS iteration, at most 2*h*_{n}+1 new solutions can be proposed. Considering , this gives an upper runtime bound of (2*h*_{n}+1)*mk*^{2} for the *solution evaluation* part in iteration *k*. Therefore, the cost of the entire iteration is bounded from above by *Rk*^{2} with *R* = (2*h*_{n}+1)(*m*+1).
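The absorption of the two cost terms into a single bound proportional to *k*^{2} rests only on *k* ≤ *k*^{2} for integers *k* ≥ 1; a quick numerical check (illustrative only):

```python
def iteration_cost_bound_holds(h, m, k):
    # proposal cost (2h+1)k plus evaluation cost (2h+1)mk^2 is at most
    # Rk^2 with R = (2h+1)(m+1), since k <= k^2 for all integers k >= 1
    proposal = (2 * h + 1) * k
    evaluation = (2 * h + 1) * m * k ** 2
    R = (2 * h + 1) * (m + 1)
    return proposal + evaluation <= R * k ** 2
```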

The total runtime in iterations 1 to *k*_{0}−1 of APS is therefore bounded from above by . Finally, we turn to the expected total runtime in iterations *k*_{0}, *k*_{0}+1, and so on, until has been identified. We use the lower bound 1/4 on the total success probability in an iteration and assume pessimistically that the total success probability is equal to 1/4 instead of larger or equal. This gives the following upper bound for the expected optimization time after the start of iteration *k*_{0}: with . By using the well-known summation formulas for the series , and , we can evaluate the last expression as *R*(*k*_{0}^{2}+6*k*_{0}+21). Addition to Equation (20) and insertion of the expression for *R* yields the result.

Remark 1: From an application point of view, Theorem 1 raises the question of how ε and the runtime given to ε-SEMO in each call of this procedure should be chosen in an implementation of APS/ε-SEMO if and/or the bound *h*_{n} are not known. A pragmatic answer is to tune both parameters experimentally for the considered class of problem instances, as is usually done for other parameters of a metaheuristic. Obviously, an upper bound for the expected optimization time under the choice of these two parameters as described in Theorem 1 is also an upper bound for the expected optimization time under the choice of the (experimentally found) best combination of these parameters. A more sophisticated way of handling the issue would be to gradually decrease during the run of the APS framework algorithm, and to gradually increase the runtime given to ε-SEMO. For all iterations where has fallen below and the runtime of ε-SEMO has exceeded 2*h*_{n}, estimations analogous to those in the proof of Theorem 1 become applicable. Possibly, this will allow an extension of Theorem 1 to the described self-adapting variant.

Remark 2: Our presentation of the SEMO algorithm has been based on the original version of SEMO as introduced in Laumanns et al. (2002b). In the literature, a slightly modified version of SEMO has also been analyzed (see, e.g., Neumann and Wegener, 2006). This version, which we shall denote by SEMO’, differs from the original one in the following respect: if, for an element considered for insertion, there already exists an element *y* with the same objective function values, then instead of keeping *y* and disregarding the new element, the new element is inserted and *y* is omitted. More formally, the two last statements in the loop of SEMO (Algorithm 2) are replaced by statements performing this insertion and removal. By obvious analogous replacements, we obtain corresponding procedures ε-SEMO’ and ε-TRIM’ from ε-SEMO and ε-TRIM, respectively. Going through our proofs, it is easily seen that Theorem 1 remains valid if in APS/ε-SEMO, subprocedure ε-SEMO is replaced by ε-SEMO’, and subprocedure ε-TRIM is replaced by ε-TRIM’.
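The distinction between SEMO and SEMO’ drawn in Remark 2 amounts to a one-line difference in the acceptance test. A minimal sketch (function names are ours; maximization of both objectives is assumed):

```python
def dominates(u, v):
    # Pareto dominance (maximization), strict in at least one objective
    return u != v and all(a >= b for a, b in zip(u, v))

def semo_accepts(pop_values, f_child):
    # original SEMO: reject the child if some stored objective vector
    # dominates it OR coincides with it (the incumbent y is kept)
    return not any(fv == f_child or dominates(fv, f_child)
                   for fv in pop_values)

def semo_prime_accepts(pop_values, f_child):
    # SEMO': a child whose objective vector coincides with a stored one is
    # still accepted; the old element y is then removed in favor of the child
    return not any(dominates(fv, f_child) for fv in pop_values)
```

The two rules differ exactly on children whose objective vector duplicates a stored one; on dominated and on strictly improving children they agree.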

We immediately obtain from Theorem 1:

If and , the expected optimization time of APS/ε-SEMO exceeds that of SEMO applied to the noise-free problem by at most a factor on the order of *O*(*n*^{3}).

(Continuation of Example 1 from Section 4.1.) For the SLOTZ problem introduced in Section 4.1, we have . A well-known result by Laumanns et al. (2002b) states that the expected optimization time required by SEMO to find the efficient set for LOTZ is on the order of Θ(*n*^{3}), with an upper bound of *h*_{n} = (1/2)*n*^{3} − (1/2)*n*^{2}. By applying our Theorem 1, we derive from this result that the expected optimization time for finding the efficient set for SLOTZ is at most on the order of *O*(*n*^{6}).
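For readers who prefer executable intuition, the noise-free building block of this example can be sketched directly: plain SEMO (without ε-dominance and without noise, so not the ε-SEMO subprocedure itself) run on LOTZ until the population covers the whole Pareto front {(*i*, *n*−*i*)}. All names are ours; the empirically observed number of evaluations for small *n* is consistent with the Θ(*n*^{3}) scale.

```python
import random

def lotz(x):
    # LOTZ: (number of leading ones, number of trailing zeros), maximized
    n = len(x)
    lo = next((i for i, b in enumerate(x) if b == 0), n)
    tz = next((i for i, b in enumerate(reversed(x)) if b == 1), n)
    return (lo, tz)

def dominates(u, v):
    # Pareto dominance (maximization), strict in at least one objective
    return u != v and all(a >= b for a, b in zip(u, v))

def semo_on_lotz(n, max_evals, rng):
    # returns the number of evaluations until the population covers the
    # Pareto front of LOTZ, or None if max_evals is exhausted
    x = tuple(rng.randint(0, 1) for _ in range(n))
    pop = {x: lotz(x)}
    front = {(i, n - i) for i in range(n + 1)}
    for t in range(1, max_evals + 1):
        parent = rng.choice(sorted(pop))          # uniform parent selection
        i = rng.randrange(n)                      # flip one random bit
        child = parent[:i] + (1 - parent[i],) + parent[i + 1:]
        fc = lotz(child)
        # original SEMO: keep an incumbent with equal objective values
        if not any(fv == fc or dominates(fv, fc) for fv in pop.values()):
            pop = {y: fy for y, fy in pop.items() if not dominates(fc, fy)}
            pop[child] = fc
            if set(pop.values()) == front:
                return t
    return None
```

Since front points are Pareto-optimal, once inserted they are never removed, so the run terminates almost surely.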

The article Neumann and Wegener (2006) analyzes the expected optimization time of SEMO’ (cf. Remark 2) for a bi-objective minimum spanning tree (MST) problem. In an edge-weighted graph with *r* nodes and *n* edges, to each subset *x* (representable as a binary string) of the set of edges, two objective functions are assigned: The first objective *c*(*x*) is the number of connected components of the subgraph defined by *x*, the second objective *w*(*x*) is the total weight of the edges in *x*. Both objectives are to be minimized. Weights are assumed to be integers with a maximum value of *w*_{max}. Neumann and Wegener show that the expected optimization time of SEMO’ is on the order of *O*(*nr*(log *r*+log *w*_{max})+*nr*^{2}).
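The two objectives of this bi-objective MST formulation are straightforward to compute; the following sketch (names are ours) uses a union-find structure for the component count *c*(*x*) and counts all *r* vertices, including isolated ones.

```python
def mst_objectives(n_nodes, edges, x):
    # edges: list of (u, v, weight); x: 0/1 tuple selecting a subset of edges
    # returns (c(x), w(x)): number of connected components of the selected
    # subgraph on all n_nodes vertices, and total selected weight; both
    # objectives are to be minimized
    parent = list(range(n_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    total = 0
    for bit, (u, v, w) in zip(x, edges):
        if bit:
            total += w
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    components = len({find(a) for a in range(n_nodes)})
    return components, total
```

On a weighted triangle, selecting the two cheapest edges gives a spanning tree: one component at total weight 3, while the empty selection leaves three components at weight 0.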

As an alternative, let us also consider a second noise model in which the noise results from imprecise measurement of the weights of the edges, assuming that an estimate of *w*(*x*) is obtained by summing up the individual weight estimates of the edges contained in *x*. Suppose that the individual noise terms for the measurement of the edges are independent and normally distributed with a variance smaller than or equal to some constant independent of *n*, and that the number *c*(*x*) of components can be determined without noise. Then , and again on the assumption that and , Theorem 1 yields an expected optimization time on the order of .
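This second noise model can be sketched as follows (an illustrative implementation under the stated assumptions; the function name and parameters are ours): each selected edge's weight is measured with an independent Gaussian error, and the estimate of *w*(*x*) is the sum of the measurements.

```python
import random

def noisy_weight_estimate(edges, x, sigma, rng):
    # second noise model (sketch): w(x) estimated by summing independent,
    # normally distributed measurements of the selected edges' weights
    return sum(w + rng.gauss(0.0, sigma)
               for bit, (u, v, w) in zip(x, edges) if bit)
```

With `sigma = 0` the estimate reduces to the exact total weight, matching the deterministic limiting case used throughout the analysis.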

Corollary 5 shows that in many cases, the transition from a MOCO problem to a SMOCO problem only generates a polynomial-time overhead factor. Although this is a helpful insight, the bounds presented in Corollary 5 and the SLOTZ example are still rather weak. We conjecture that stronger bounds hold, and below we shall outline a situation where stronger bounds can indeed be rigorously derived. Before turning to this issue, a generalization of Theorem 1 will be demonstrated. We start with a definition.

*The assertion of Theorem 1 remains valid if the -distributed noise is replaced by independent noise following some other distribution (which is allowed to depend on i and x), provided that for all and i = 1, 2, the density of is continuous, symmetric (i.e., the density in −z equals the density in z) and nonincreasing for positive arguments z, and is more peaked about zero than a random variable .*

Theorem 1 in Birnbaum (1984) states that under the indicated conditions, the corresponding sample averages inherit the more-peaked relation, that is, is more peaked about zero than with . A reinspection of the proof of our Theorem 1 above and of the preceding lemmas (especially Lemma 2 and Corollaries 2–4) shows that this property is all that is needed to keep the upper bound estimations valid for the case where the are replaced by the .

Corollary 6 above covers, in particular, the case of bounded noise, for example, noise following a triangular distribution or a truncated normal distribution.^{7} Our next result shows that in the case of bounded noise, a considerably stronger runtime bound than that of Theorem 1 can be derived.

*Let the conditions of Corollary 6 be satisfied, and let for all n, , .*

First of all, we compute the maximal size of an ε-antichain in a square with side length . An ε-antichain is a set of points with the property that any two different points are ε-incomparable, that is, that neither nor nor holds. In particular, this means that two elements *z* and in an ε-antichain differ by at least ε in each of their two coordinates. By projection to one of the two coordinate axes, one immediately sees that in the considered square, not more than mutually ε-incomparable elements can exist. We choose as in Theorem 1. Then we have that an ε-antichain can contain not more than elements.
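The projection argument above can be illustrated concretely (names are ours): points that pairwise differ by at least ε in each coordinate have first coordinates pairwise at least ε apart, so at most ⌊side/ε⌋ + 1 of them fit into the square; points spaced ε apart along the antidiagonal attain this bound.

```python
import math

def max_eps_antichain_size(side, eps):
    # projection to the first coordinate axis: pairwise gaps >= eps inside
    # an interval of length `side` allow at most floor(side/eps) + 1 points
    return math.floor(side / eps) + 1

def diagonal_eps_antichain(side, eps):
    # a witness attaining the bound: points on the antidiagonal, eps apart
    k = max_eps_antichain_size(side, eps)
    return [(i * eps, side - i * eps) for i in range(k)]
```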

In each iteration *k*, both the solution *S*^{(k)} delivered by ε-SEMO and the solution *L*^{(k−1)} delivered in the previous iteration by ε-TRIM are ε-antichains. Since by assumption, the modulus of the noise is smaller than or equal to *c*_{n}, also holds, and therefore both the sample average estimates used for the elements of *S*^{(k)} in the call of ε-SEMO in iteration *k* and those that have been used for the elements of *L*^{(k−1)} in the solution evaluation part of iteration *k*−1 all lie in the square , that is, in a square with side length . Using the bound above, we conclude that and , so .

For iteration *k* of APS, this yields an upper runtime bound of with . Therefore, the total runtime in iterations 1 to *k*_{0}−1 of APS is bounded from above by , and the expected total runtime in iterations *k*_{0}, *k*_{0}+1, and so on, until has been identified is bounded from above by with . Hence is an upper bound for the total expected optimization time. Insertion for gives the result.

*c* > 0.^{8} Note that for *a*_{n} = 0, *b*_{n} = *n*, and *c*_{n} = *c*, the conditions of Theorem 2 are satisfied, and that . Then Theorem 2 yields an upper bound for the expected optimization time of , which is only by a factor of order *O*(*n*^{2}) worse than the expected optimization time in the deterministic boundary case of LOTZ. In view of the proof of Theorem 2, it can also be seen from Equation (22) that now, the term on the order of referring to the solution evaluation part of APS is negligible compared to the term on the order of *O*(*n*^{3}) referring to the solution proposal part. This is a typical situation also for applications to more complex problems, since the current approximation of the efficient set is usually rather small, such that the solution evaluation part is computationally unproblematic.

Remark 3: Obviously, the *O*(*n*^{2}) bound for the ratio of expected optimization times between stochastic and deterministic problem in the bounded-noise case can be generalized from SLOTZ to a broader class of problems, analogous to the case in Corollary 5. However, it is not quite general, as the following example shows.

(Continuation of Example 3 from Section 4.1.) For the stochastic bi-objective MST problem of Example 3, let us perform a similar replacement of the Gaussian noise by noise with a distribution more peaked about zero than , as in Example 4 above. Let all other assumptions be the same as before. For , Theorem 2 yields in this case an exponential bound. However, if with some , Theorem 2 (with *a*_{n} = 0, *b*_{n} = *w*_{max}, and *c*_{n} = *c*) gives a bound for the expected optimization time on the order of .

Remark 4: Also for a normal distribution, in the case of sufficiently large *c*_{n} > 0, absolute noise values larger than *c*_{n} are very unlikely. Therefore, it can be conjectured that for Gaussian noise, the expected optimization time is bounded by an order that conforms to the often stronger bound of Theorem 2 instead of only to the bound of Theorem 1. We leave a rigorous proof of this conjecture as an open question.

Remark 5: We do not claim that the sample size increment scheme and is optimal for the expected optimization time of APS/ε-SEMO. Concerning the upper bound for the expected optimization time, it can even be shown that the bound becomes better the faster the sample size is increased: Consider a scheme where with some . For the sake of simplicity, let us restrict the discussion to the SLOTZ special case. Then by derivations analogous to those in Theorems 1 and 2, we find an expected optimization time bound on the order of *O*(*n*^{4+2/p}) and of *O*(*n*^{4+1/p}) in the case of Gaussian and of bounded noise, respectively, which decreases for increasing *p*. Nevertheless, it should be kept in mind that these are only upper bounds. It is easily possible that the actual expected optimization time of the linear scheme is better than , since in our proofs, we used rather coarse bound estimations, assuming that the efficient set has to be identified as a whole in one of the APS iterations, whereas APS can also find the elements of the efficient set one after another or subset by subset, which considerably speeds up the process. This gradual identification of the efficient set, however, is prevented if the sample size is increased too fast.^{9}

## 6. Conclusions

In this paper, an algorithm for stochastic multi-objective combinatorial optimization (SMOCO) has been analyzed. The algorithm is obtained by inserting the simple evolutionary multi-objective optimizer (SEMO) as a subprocedure into adaptive Pareto sampling (APS). Normally distributed noise as well as more general noise distributions have been investigated. It has been shown that if SEMO applies a specific ε-dominance relation instead of the ordinary dominance relation, the expected optimization time of the overall algorithm can be bounded from above, provided that ε is chosen small enough. The bound depends on the expected optimization time of SEMO on the corresponding deterministic counterpart problem. An explicit expression for this dependence has been given. In the special case where the variance of the noise is bounded from above by a constant independent of the problem size *n*, and ε is bounded from below by a constant independent of *n*, the ratio between the expected optimization times for the stochastic and the corresponding deterministic problem, respectively, is on the order of *O*(*n*^{3}). For bounded noise, the ratio reduces to *O*(*n*^{2}).

Since the underlying deterministic single- or multi-objective optimization problems of practically relevant SMOCO problems are often NP-hard, so that their optimization time must be anticipated to grow exponentially fast in the instance size, the results show that in typical cases, stochastic MOCO problems are not essentially harder than their deterministic counterparts.

Future research should address several topics. A question of practical relevance is whether the results presented here can also be transferred from the use of SEMO to the use of a more elaborate multi-objective metaheuristic such as NSGA-II or SPEA2. This question is especially important in view of well-known drawbacks of SEMO, for example, the fact that SEMO can end up with prohibitively large populations. Furthermore, a generalization of the results presented here to the case of more than two objective functions would be of interest.

Expected optimization time is not the only relevant criterion for the performance of an evolutionary optimization algorithm; often, the immediate goal is not to find the exact solution (in the multi-objective case: the efficient set), but rather to identify a good approximate solution within a short time. For investigating this question in the multi-objective situation, the behavior of the hypervolume or of another quality indicator during the run of a combined algorithm of the type considered here could be studied.

The bounds derived here are rather coarse, and it is possible that tighter bounds may be found if the way APS gradually adapts the currently proposed solution set is taken into more detailed consideration. Moreover, in the present work, we focused on upper bounds for the expected optimization time. A most interesting question would be what good lower bounds look like and whether there are cases where it can be shown that their order matches that of the upper bounds found here. Based on such results, the question of how the sample size scheme in APS should be organized in order to obtain the fastest possible convergence to the Pareto front could also be addressed.

## Acknowledgments

The author thanks Lothar Thiele for a discussion on ε-dominance concepts during the Dagstuhl Seminar 2010 on theory of evolutionary algorithms, and three anonymous reviewers for their very helpful comments on the first version of this article.

## References

## Appendix

From Equation (17), we see that for , the lower bound in Equation (16) is nonnegative. Considering Equation (9), Equation (6), and the independence of the random variables for different , and observing that the power function *u*^{p} (*p* even) is nondecreasing as long as , we obtain from Lemma 2:

If the size of a sample of i.i.d. random noise variables is increased, the variance of the sample average decreases. As a consequence, the probability that the error terms of the sample average estimates over a sample of size with *m* > 1 all lie in *Q*_{r}(**0**) is larger than or equal to the corresponding probability for the case *m* = 1. Therefore, the choice *m* = 1 is conservative w.r.t. the lower bound on . For *m* = 1, the only difference from Corollary 2 is that now, not only in 2^{n}, but in 2^{n} + 2^{n} = 2^{n+1} random experiments, the error terms have to be sufficiently small. This immediately yields the indicated formula.
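The variance-reduction fact invoked above (the variance of the average of *m* i.i.d. variables is the single-draw variance divided by *m*) can be verified exactly for a small discrete distribution by brute-force enumeration; the functions below are our own illustration, not part of the proof.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    # exact variance of the uniform distribution over `values`
    xs = [Fraction(v) for v in values]
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def sample_average_variance(values, m):
    # exact variance of the average of m i.i.d. uniform draws, computed
    # by enumerating all m-tuples of outcomes
    averages = [sum(Fraction(v) for v in t) / m
                for t in product(values, repeat=m)]
    mean = sum(averages) / len(averages)
    return sum((a - mean) ** 2 for a in averages) / len(averages)
```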

By Corollary 3,

## Notes

^{1}

In the stochastic programming literature, a scenario is a specific outcome of a random experiment (see Birge and Louveaux, 1997, p. 50).

^{2}

Presumably, our runtime results in Section 5 can also be extended to more than two objectives by a rather straightforward generalization of the proofs, but this topic is beyond the scope of the present article.

^{3}

Formally, for each *x* and *i*, the function is a random variable on a probability space , and is an element of the sample space . Each determines a specific realization of the random variables , that is, a scenario; hence the notation . As is customary in the stochastic programming literature, we shall identify with the scenario determined by it (cf. Birge and Louveaux, 1997).

^{4}

Of course, dependence could make a difference in the way the sampling is done in the solution proposal part of APS. However, in order to stay within the premises of the following analysis, we may always replace dependent sampling by independent sampling, drawing a specific, independent scenario for each *x* and each *i*, as is done in the solution evaluation part. This may deteriorate the performance of the algorithm, but as we are interested in upper runtime bounds, this is not a problem.

^{5}

In particular, the described procedure implies that if the solution algorithm for the deterministic subproblem examines one and the same solution *x* more than once, the same noise term realizations should always be used. For this purpose, the algorithm has to store the sample average estimates for the already examined solutions in a look-up table. Such an implementation may be less efficient than one where, if a solution is examined a second time, the noise is evaluated by a new independent simulation. For large *n*, however, the case where an evolutionary search algorithm returns in a later iteration to a solution that has already been examined before is comparably rare, such that it can be expected that in terms of the quality evolution of the successively examined solutions, the second implementation variant will behave very similarly to the variant investigated in this article.

^{6}

Observe that contrary to , the relation of Definition 1 does not coincide with ordinary dominance even for small ε because *z* does not dominate itself.

^{7}

Strictly speaking, in the last case, the truncation must be defined in such a way that the density function remains continuous, which can of course approximate a noncontinuously truncated normal distribution to any desired degree of accuracy.

^{8}

The special case of large *c* and a continuously truncated normal distribution produces a very close approximation to the Gaussian model.

^{9}

In the case where instead of SEMO, a solver based on mathematical programming is used for the solution of the deterministic subproblem, as has been done in Gutjahr (2009), a rapid increase of the sample size is especially detrimental, since the runtime of such a solver typically increases exponentially fast in the sample size, such that high sample sizes have to be avoided at any cost.