## Abstract

We consider optimization problems where the set of solutions available for evaluation at any given time *t* during optimization is some subset of the feasible space. This model is appropriate to describe many closed-loop optimization settings (i.e., where physical processes or experiments are used to evaluate solutions) where, due to resource limitations, it may be impossible to evaluate particular solutions at particular times (despite the solutions being part of the feasible space). We call the constraints determining which solutions are non-evaluable ephemeral resource constraints (ERCs). In this paper, we investigate two specific types of ERC: one encodes periodic resource availabilities, the other models commitment constraints that make the evaluable part of the space a function of earlier evaluations conducted. In an experimental study, both types of constraint are seen to impact the performance of an evolutionary algorithm significantly. To deal with the effects of the ERCs, we propose and test five different constraint-handling policies (adapted from those used to handle standard constraints), using a number of different test functions including a fitness landscape from a real closed-loop problem. We show that knowing information about the type of resource constraint in advance may be sufficient to select an effective policy for dealing with it, even when advance knowledge of the fitness landscape is limited.

## 1 Introduction

In the late 1960s, Hans-Paul Schwefel reported an ingenious set of experiments designed to optimize the shape of a flashing nozzle (Schwefel, 1968; Klockgether and Schwefel, 1970). Figure 1 illustrates the setup employed. Schwefel was using an early form of evolutionary algorithm (EA) and evaluating designs, not through simulation, but by conducting real (physical) experiments. Although resource-expensive, this setup is effective because experiments replace the need for having available, or designing, sufficient mathematical models of the problem being solved.

This paper considers problems featuring experimental setups of very similar character to that of Schwefel's, nowadays commonly referred to as closed-loop optimization problems (Knowles, 2009). In these, genotypes to a problem (e.g., a set of parameter values specifying nozzle shapes) are planned on a computer, but their phenotypes (e.g., an actual flashing nozzle) are realized or prototyped and evaluated ex silico (e.g., relying on a physical experiment of some sort). The process of measuring the fitness of the phenotype involves conducting a physical experiment, as was the case in Schwefel's setup. Applications of closed-loop problems have included shape design optimization (Rechenberg, 1973; Schwefel, 1975; Rechenberg, 2000), optimization of running industrial processes (Box, 1957), quantum control (Judson and Rabitz, 1992; Shir, 2008), drug discovery (Caschera et al., 2010; Small et al., 2011), analytical biochemistry (Vaidyanathan et al., 2003; O'Hagan et al., 2005, 2007), evolvable hardware (Thompson, 1996), and food science (Herdy, 1997; Knowles, 2009), among others. Many of these used an EA approach (cf. Box, 1957; Schwefel, 1975; Rechenberg, 2000).^{1} Interest in the use of closed-loop methods seems to be healthy; see, for example, Knowles (2009), Shir and Bäck (2009), Shir et al. (2009), Caschera et al. (2010), Bäck et al. (2010), Bedau (2010), and Michalewicz (2010).

While closed-loop optimization frequently produces satisfying results, there are various unexplored resourcing issues an experimentalist may face during an iterative experimental loop. The aim of this paper is to understand how evolutionary search is affected by, and can be extended to combat, a particular resourcing issue in a closed-loop optimization scenario: the temporary nonavailability of resources required in the evaluation process of solutions. This situation may cause solutions that are perfectly feasible candidate solutions to the problem to be temporarily nonrealizable and thus not available for fitness assessment; we will refer to such solutions as (temporarily) non-evaluable. We refer to the (dynamic) constraints specifying which solutions are not evaluable at a given time as ephemeral resource constraints (ERCs), and any optimization problem that involves ERCs as an ephemeral resource-constrained optimization problem (ERCOP).

What is the motivation for studying ERCs? Although not reported explicitly in the literature, Schwefel and others (Heckler and Schwefel, 1978; Booker et al., 1999; Büche et al., 2002) have experienced these ERCs in practice. For instance, Schwefel had to stop experiments when brass rings he needed were not available. In another problem encountered by Schwefel, the fitness of a single solution was measured by running a time-consuming simulation on a computer. During some simulations, the process ended prematurely (i.e., an execution error or exception occurred) and no fitness was returned. Finkel and Kelley (2009) give eight further references where “failure of the function evaluation has been observed in practice,” indicating that this difficulty still prevails today. Note that execution errors are similar to machine breakdowns and typically arise unexpectedly during an evaluation procedure. The time for which the resource, in this case the software, would not be available for the evaluation of the error-causing solution may correspond to the time needed to find and fix the error-causing bug in the software.

We are also aware of several other different types of ERCs as reported in Allmendinger and Knowles (2010a). Briefly, these come about in the following cases: (1) staff/operator/equipment needed for specific experiments/evaluations have limited availability, (2) consumable physical resources required to build or evaluate specific solutions may have run out, and may have time lags between ordering and receiving them, (3) relaxation of a physical instrument setting is costly or takes time, so the instrument should be reused at the same setting (the setting is described by a parameter in the solution vector being optimized), (4) random machine/component breakdowns (the use of a specific machine/component is described by a parameter in the solution vector being optimized). We have come across (1) in the problem of optimizing cocoa roasting (see brief discussion in Knowles, 2009), (2) in a drug discovery problem (Small et al., 2011), and in the flashing nozzle problem encountered by Schwefel (as already discussed), and (3) in the domain of instrument setup optimization (O'Hagan et al., 2005, 2007; Jarvis et al., 2010).^{2} We have not seen (4) directly, but can easily imagine scenarios such as in hardware evolution, where a component under evolutionary selection (e.g., a transistor) fails, so that that component cannot be used in further solutions until it is replaced (or similar scenarios).

We have also investigated ERC scenarios where several different ERCs were present at the same time. For example, in a real-world problem in design optimization, resources needed to test new designs had to be ordered in advance, kept in limited (refrigerated) storage, and used within a certain time frame. This type of problem, in abstract form, was explored in detail in Allmendinger and Knowles (2010b). When wastage of resources or time is important, we found it is necessary to schedule the resources involved in the application dynamically. A modified just-in-time policy, and another policy that predicts what the EA may wish to evaluate in the next generation were found to perform well, with the different constraint regimes determining which of the two should be preferred.

From the above examples, it is apparent that ERCOPs are static optimization problems, which is to say they have a static fitness function and a static feasible region. They are only dynamic in the sense that some candidate solutions are non-evaluable for certain periods of time (i.e., they cannot be prototyped and/or their fitness cannot be measured) due to resource limitations. In the course of this study we will give various other examples indicating how ERCs may arise in closed-loop optimization, and why they cannot reasonably be avoided if an efficient or budget-limited optimization is to be conducted.

Unlike our previous work (Allmendinger and Knowles, 2010b), here we tackle two different (and simpler) but perhaps more common types of ERCs, and investigate some general policies for dealing with them.

The rest of this paper is organized as follows. In the next section, for completeness, we briefly recall the general problem definition of an ERCOP. Section 3 discusses the relationship between ERCOPs and other types of optimization problems to set this study in proper context. The two real-world ERC types on which we test our policies are outlined in Section 4, and the policies themselves are described in Section 5. Before we proceed with the experimental analysis in Section 7, we describe in Section 6 the choice of test functions and the base algorithm that we augment to implement the policies, and we give all parameter settings. In Section 8 we then present a case study that illustrates one way in which one might select a suitable policy for an ERCOP with largely unknown search space properties, which is a common situation in the real world. Finally, in the concluding section, we draw together the findings from the experimental analyses and discuss directions for further research.

## 2 Ephemeral Resource-Constrained Optimization Problems (ERCOPs)

ERCs are temporary limitations on the set of solutions that are available for evaluation during an optimization procedure. To define them formally, we begin with a standard optimization problem, and add to it a notion of a time-ordered search, and the concept of a non-evaluable solution, as follows.

A black box optimization algorithm *a*, for example, an EA, for solving the above problem can be represented as a mapping from previously visited solutions to a single new solution in *X*, as suggested by Wolpert and Macready (1997). Formally, $\mathbf{x}_t = a(d_{t-1})$, where the search history $d_{t-1} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_{t-1}, y_{t-1})\}$ denotes the time-ordered set of solutions visited until time step *t*−1, and **x**_{i} and *y*_{i} indicate the *X* value and the corresponding *Y* value of the *i*th successive element in $d_{t-1}$, respectively. We augment this notion of a search algorithm with the ability to visit a null solution, $\mathbf{x}_t = \text{null}$, with the effect that the algorithm can wait for a time step without evaluating a solution. An optimizer might submit null solutions, for example, if it wishes to wait until a missing resource is again available. In fact, this approach has been employed by Schwefel in his flashing nozzle problem (see Section 1).

Whereas the fitness $f(\mathbf{x}_t)$ of a feasible solution is $y_t = f(\mathbf{x}_t)$ in a standard optimization problem, in an ERCOP we have

$$
y_t = \begin{cases} f(\mathbf{x}_t) & \text{if } \mathbf{x}_t \in E(\sigma_t) \subseteq X, \\ \text{null} & \text{otherwise,} \end{cases}
$$

where $E(\sigma_t)$ represents the set of evaluable solutions (or evaluable search region) at time step *t*. In our case, $E(\sigma_t)$ is defined by a set of schemata into which solutions have to fall in order to be evaluable.^{3} The set $E(\sigma_t)$ may change over time depending on a set of problem-specific and time-evolving parameters $\sigma_t$. The availability of resources required for the evaluation of solutions depends on these parameters. Hence, the set $\sigma_t$ may include parameters such as various types of counters (e.g., cost, time, and evaluation counters), the search history (which may be used to encode nonavailabilities of resources due to previously made decisions), random variables (which may encode random events such as machine breakdowns), and so forth. The ERCs specify how exactly the set $E(\sigma_t)$ changes depending on the parameter set $\sigma_t$.

Note that the objective function *f* (thus also the global optimum) is static and does not change over time in a standard ERCOP; it is just the set of solutions evaluable at each time step *t*, $E(\sigma_t)$, that may change. In this context, repairing a solution means to modify the genotype of a solution that is not in $E(\sigma_t)$ such that it is forced into $E(\sigma_t)$; that is, the outcome of a repairing step is a solution that falls into all schemata that define $E(\sigma_t)$ (assuming that the schemata are noncontradictory; that is, $E(\sigma_t) \neq \emptyset$).

Compared to standard (dynamic) constraints, the meaning of ERCs is different: a solution **x** that violates an ERC at time *t*, or $\mathbf{x} \notin E(\sigma_t)$, is not infeasible but is non-evaluable at time step *t*. That is, the experiment that is associated with **x** cannot be conducted, thus causing the fitness of solution **x** at time *t* to be undefined (or null).^{4} Figure 2 illustrates the interaction between $E(\sigma_t)$ and *X* commonly present in an ERCOP.
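The difference between an infeasible and a non-evaluable solution can be illustrated with a minimal Python sketch. The names `evaluate`, `is_evaluable`, and `toy_erc`, and the use of `None` to stand for the null fitness and the null solution, are illustrative assumptions and not part of the original formulation.

```python
from typing import Callable, Optional, Sequence

Solution = Sequence[int]


def evaluate(
    x: Optional[Solution],
    t: int,
    f: Callable[[Solution], float],
    is_evaluable: Callable[[Solution, int], bool],
) -> Optional[float]:
    """Return f(x) if x is evaluable at time step t, otherwise None (null)."""
    if x is None:
        # Null solution: the optimizer waits one time step without evaluating.
        return None
    if not is_evaluable(x, t):
        # x is feasible but violates an ERC at time t, so no fitness is returned.
        return None
    return f(x)


def toy_erc(x: Solution, t: int) -> bool:
    """Hypothetical ERC: during time steps 3..5 only solutions with x[0] == 1
    can be evaluated; outside that window every solution is evaluable."""
    return not (3 <= t <= 5) or x[0] == 1
```

The objective function `f` never depends on `t` here; only the evaluability check does, which mirrors the static fitness landscape of an ERCOP.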

Time in an ERCOP can be seen as the simulated time defined by the real closed-loop experimental problem that is to be simulated. Hence, time may refer not only to function evaluations of single solutions, as is the case in standard optimization problems, but also, for example, to real time units (e.g., seconds) or cost units (e.g., pounds). This notion of time allows ERCs to be dependent on, among other options, the number of evaluated solutions, expenses, or a certain date, such as days of the week. The normal assumption is that all evaluations take equal time or resources, but this need not be the case. Generally, experiments may be of different durations and have nonhomogeneous costs in terms of the financial or temporal resources they require.

## 3 Relationship of ERCOPs to Other Types of Optimization Problems

As mentioned in the introductory section, dynamic resource constraints in the sense meant here have received little attention in the literature to date. In fact, apart from discussions with Schwefel as well as our own collaborative work (Knowles, 2009; Allmendinger and Knowles, 2010a, 2010b), ERCs have not been considered in published work to the best of our knowledge.

However, of course, much other related work informs our research, and we find that ERCOPs and closed-loop optimization are related to several other areas. Traditionally, closed-loop optimization problems are dealt with using statistical methods referred to as experimental design or design of experiments (DoE; Montgomery, 1976; Box et al., 2005). The focus of DoE is on low-dimensional search spaces with the aim to obtain statistically robust results in as few evaluation steps as possible and to explain them in terms of a model. ERCOPs, by contrast, often feature higher-dimensional search spaces, and one is ultimately interested in finding a single optimal solution. Nevertheless, we believe that closed-loop evolution methods should draw on DoE, particularly in areas such as noise-handling, replication, blocking, and so on. To our knowledge, however, the DoE field has not so far considered resourcing issues as a perturbing influence on conducting the most informative experiments.

As we will see later in the commitment relaxation ERCs, ERCOPs can have a time-linkage aspect to them in the sense that ERCs arise due to previously made decisions. We find an interesting parallel in some work on online (dynamic) optimization problems (Borodin and El-Yaniv, 1998; Bosman and Poutré, 2007), which exhibits time-linkage too. Nevertheless, there are clear and important differences between our problem formulation and those considered in these studies: their aim is to optimize a cumulative score over some period of time, whereas ours is to find a single optimal (and ultimate) solution. Similarly, we find ERCOPs to be materially different from traditional dynamic optimization problems (Branke, 2001) because the objective space in ERCOPs does not change over time and thus the optimal solution does not need to be tracked. Despite this core difference, we believe that some policies, such as using memory, can carry over from dynamic optimization into our work considering dynamic resource constraints.

Traditional constrained optimization (Michalewicz and Schoenauer, 1996; Nocedal and Wright, 1999; Coello, 2002) is also an important related area, which can inspire some methods for handling resource constraints (e.g., penalty methods). However, the fact that ERCs prevent the evaluation of solutions that are otherwise feasible makes them materially different from standard constraints, including dynamic constraints (Nguyen, 2011). A practical consequence of the difference between ERCs and dynamic constraints is that while an algorithm optimizing subject to ERCs may, during the optimization process, report a currently non-evaluable solution as its best-so-far solution, an algorithm optimizing subject to dynamic constraints should not report an infeasible solution as its current best-so-far solution. Also, while the optimal solution does not change in the presence of ERCs, it is likely to do so when optimizing subject to standard dynamic constraints. The implication of this is that one could terminate the optimization in the presence of ERCs once a solution of the desired quality is found, while this should not be done with standard dynamic constraints.

## 4 Two Specific ERC Types

In this section we introduce two ERC types that we encountered in our own collaborative work and that seem to be common in real-world applications: commitment relaxation ERCs and periodic ERCs.^{5} Before we define both ERC types, we introduce three elements that are common to both (and other) ERC types. These elements are the activation period, the constraint time frame, and the constraint schema.

### 4.1 Fundamental Elements of ERCs

The activation period of a constraint *ERC*_{i} is the number of counter units for which that constraint remains active, once it is switched on. Similar to time steps, counter units may refer to function evaluations of a single solution, a set of solutions (in case experiments are conducted in parallel), real time units (e.g., seconds), or something else. Here, they refer to function evaluations of a single solution.

The constraint time frame (ctf) of *ERC*_{i} is $\{t : t^{\text{start}}_{\text{ctf}} \le t \le t^{\text{end}}_{\text{ctf}}\}$, where *t* represents some counter unit, as above.^{6} The constraint *ERC*_{i} may be active only during the ctf and not outside of the ctf. That is, if we assume an ERCOP to be subject to a single constraint, then we have $E(\sigma_t) \subseteq X$ for $t \in$ ctf, and $E(\sigma_t) = X$ for $t \notin$ ctf. The periods of time $0 \le t < t^{\text{start}}_{\text{ctf}}$ and $t^{\text{end}}_{\text{ctf}} < t \le T$ (*T* is the total optimization time) are the preparation period and recovery period, respectively (see Figure 3). The duration of these two periods has a significant effect on the performance of an EA, as we will see later.

The restriction imposed by an ERC during the activation period can be of different forms. In our case, resources are often associated directly with individual solution variables, which allows us to conveniently use the notion of schemata to describe the availability of resources. We say that solutions have to fall into a particular constraint schema (associated with a constraint *ERC*_{i}) in order to be evaluable. A schema *H* represents a particular subset of solutions that share some common properties. For instance, consider solution vectors to be binary strings of length *l* = 5 with each solution bit representing two resource choices (0 and 1 or *A* and *B*) to be optimized over. Now, for example, if we assume that only resources 1 and 0 are available for bit position 2 and 5, respectively, while all resources are available for the other bit positions, then the constraint schema $H = (\ast 1 \ast\ast 0)$ would describe the set of evaluable solutions $E(\sigma_t) = \{\mathbf{x} \in X : x_2 = 1 \wedge x_5 = 0\}$; the $\ast$ is a wildcard symbol which means that a bit position can have any possible value (thus in the binary case either value 0 or 1). A general property of a schema is its order *o*(*H*), representing the number of defined bit positions (Reeves and Rowe, 2003); for the above example we have *o*(*H*) = 2. In the presence of multiple constraints *ERC*_{i}, solutions have to fall into the schemata associated with all of the constraints simultaneously. In this study, we consider discrete search spaces, mainly of pseudo-Boolean nature or $X = \{0,1\}^{l}$. In nondiscrete spaces, we might need to restrict solution parameters to lie within or outside certain parameter value ranges rather than to take specific parameter values. In this case, ERCs could be defined in terms of functions over the input vector space, and corresponding inequality/equality conditions, that is, using standard constrained optimization notation, except that the trigger(s) of the constraint(s) also need to be specified.
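The schema notions used above (membership, order, and the size of the evaluable set) can be checked with a short Python sketch. Representing solutions and schemata as strings and writing the wildcard as `*` are assumptions made for illustration only.

```python
WILDCARD = "*"  # assumed wildcard character for this sketch


def matches(x: str, schema: str) -> bool:
    """True if bit string x falls into the schema (wildcards match any bit)."""
    return len(x) == len(schema) and all(
        h == WILDCARD or h == b for b, h in zip(x, schema)
    )


def order(schema: str) -> int:
    """Order o(H): the number of defined (non-wildcard) positions."""
    return sum(h != WILDCARD for h in schema)


# Example from the text: l = 5, bit position 2 fixed to 1, bit position 5 fixed to 0.
H = "*1**0"
evaluable = [format(i, "05b") for i in range(2 ** 5) if matches(format(i, "05b"), H)]
```

With two defined positions out of five, the schema leaves three free bits, so `evaluable` contains 2^3 = 8 of the 32 possible strings.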

### 4.2 Commitment Relaxation ERCs

A commitment relaxation ERC commits (forces) an optimizer to a specific variable value combination (i.e., constraint schema) for some (variable) period of time whenever it uses this particular combination. Forcing a variable or linked combination of variables to be fixed models real-world problems involving changeover costs of one sort or another. In particular, if changing a variable's value would incur some (large) changeover cost, such as a cleaning step, a component replacement, or a testing phase, then such changes to the variable may be made taboo for some period. Often, the changeover is much cheaper if done at a particular time step immediately after component replacements or cleaning (which is commonly done routinely rather than reactively), and so the variable can be allowed to change at that point.

We refer to the period of time during which some variable(s) setting (or schema) *H* is forbidden from changing as an epoch, and denote its duration by *V*. We define the activation period to be the duration of the period of time we have to commit to a particular setting *H* during the *j*th epoch. Note that the length of the activation period may change with each new epoch, depending on when the particular setting *H* is selected by the optimizer. To describe the setting *H*, we can conveniently use a constraint schema. For example, we would use $H = (\ast 1 \ast\ast 0)$ to state that a commitment is associated with the instrument setting for which the values of bit positions 2 and 5 are set to 1 and 0, respectively.

*H* do not lie on an optimizer's search path, but one activation may already introduce enough solutions from *H* into the population that future activations might be more likely.

The corresponding implementation of a commitment relaxation ERC is defined by Algorithm 1. The method takes as input the parameters *t*^{start}_{ctf}, *t*^{end}_{ctf}, *V*, and *H*, a candidate solution **x** that is to be checked for evaluability, and the current (global) time step *t*. The output is a Boolean value indicating whether **x** is evaluable or not (in our EA, shown in Algorithm 3, which will be covered in Section 6.1, we call the method at Line 21). The method maintains two local variables, *last_activation* and *k*, required to update the internal state of the constraint: Lines 5 to 7 are responsible for the activation of the ERC and the setting of the activation period, while Line 9 ensures that solutions have to be in *H* during an activation.
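Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch shows one way such an evaluability check could be written. It is a sketch under stated assumptions, not the authors' algorithm: the class and method names are hypothetical, schemata are strings with `*` as wildcard, epochs are assumed to be consecutive blocks of *V* time steps starting at the beginning of the ctf, and an activation is assumed to last until the end of the current epoch.

```python
class CommitmentRelaxationERC:
    """Hypothetical sketch of a commitment relaxation ERC check."""

    def __init__(self, t_start: int, t_end: int, V: int, H: str):
        self.t_start, self.t_end, self.V, self.H = t_start, t_end, V, H
        self.last_activation = None  # time step of the most recent activation
        self.k = 0                   # length of the current activation period

    def _in_schema(self, x: str) -> bool:
        return all(h == "*" or h == b for b, h in zip(x, self.H))

    def is_active(self, t: int) -> bool:
        return (
            self.last_activation is not None
            and self.last_activation <= t < self.last_activation + self.k
        )

    def is_evaluable(self, x: str, t: int) -> bool:
        # Outside the constraint time frame the ERC never restricts evaluation.
        if not (self.t_start <= t <= self.t_end):
            return True
        # During an activation, only solutions from H can be evaluated.
        if self.is_active(t):
            return self._in_schema(x)
        # Using the setting H while inactive switches the ERC on; the
        # commitment is assumed to hold until the current epoch ends.
        if self._in_schema(x):
            self.last_activation = t
            self.k = self.V - (t - self.t_start) % self.V
        return True
```

Under these assumptions the activation period shrinks the later in an epoch the setting *H* is first used, which is one reading of the statement that its length may change with each new epoch.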

In future, we will denote a commitment relaxation ERC of this form by . An extension to this simple commitment relaxation ERC is to maintain not only one but several commitment relaxation ERCs with different constraint schemata *H*_{i}. In this case, we need to consider three aspects: (1) a solution is non-evaluable if it violates at least one ERC, (2) a repaired solution has to satisfy all activated ERCs and not only the ones that were violated, and (3) a repaired solution must be checked to see whether it activates an ERC that was not activated before. This extension will be considered later in the case study, which we present in Section 8.

### 4.3 Periodic ERCs

A periodic ERC models the availability of a specific resource, represented by a constraint schema *H*, at regular time intervals. That is, the ERC is activated every *P* time steps (period length) for an activation period of exactly *k* time steps (see Figure 5). As the ERC models the availability of resources, an individual has to be a member of *H* during the activation period. An example of a periodic ERC is:

“In an optimization problem requiring skilled engineers to operate instruments, on Mondays, only engineer *eng*_{i} is available.”

The corresponding implementation of a periodic ERC is defined by Algorithm 2. The method takes as input the parameters *t*^{start}_{ctf}, *t*^{end}_{ctf}, *k*, *P*, and *H*, a candidate solution **x** that is to be checked for evaluability, and the current (global) time step *t*. The output is a Boolean value indicating whether **x** is evaluable or not (in our EA, shown in Algorithm 3, discussed in Section 6.1, we call the method at Line 21).
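Algorithm 2 itself is not reproduced here, so the following Python sketch only illustrates the idea of a periodic evaluability check. The function name is hypothetical, schemata are strings with `*` as wildcard, and the first activation is assumed to begin at the start of the ctf; the original algorithm may align the periods differently.

```python
def periodic_erc_is_evaluable(
    t_start: int, t_end: int, k: int, P: int, H: str, x: str, t: int
) -> bool:
    """Hypothetical periodic ERC check: within the ctf, the constraint is
    active for the first k time steps of every period of length P."""
    in_ctf = t_start <= t <= t_end
    active = in_ctf and (t - t_start) % P < k
    if not active:
        return True  # no restriction applies at time t
    # During an activation, x must be a member of the schema H.
    return all(h == "*" or h == b for b, h in zip(x, H))
```

Unlike the commitment relaxation case, this check is stateless: whether the constraint is active depends only on the clock, not on the solutions the optimizer has evaluated so far.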

In future, we will denote periodic ERCs by . A potential extension of a periodic ERC is that the period length and the activation period refer to different counter units. For example, consider the maintenance of machines. While maintenance might take hours (i.e., *k* might be measured in real time units), machines might need to be maintained after using them a certain number of times (i.e., *P* is measured in function evaluations).

## 5 Constraint-Handling Policies for ERCOPs

This section introduces five constraint-handling policies for dealing with non-evaluable solutions arising due to ERCs. The policies are applicable not only to the above ERC types but (in similar form) also to other ERCs. Three of the policies (forcing, regenerating, and the subpopulation strategy) apply repairing (i.e., modifying the genotype of a solution) and two (i.e., waiting and penalizing) avoid it in order to prevent drift-like effects in the search direction. Note that although some of the policies have been used to cope with standard constraints (we will point this out where applicable), the effect of them when handling ERCs is unknown. In the description of the policies, we assume that multiple ERCs of a particular ERC type with nonoverlapping (or noncontradictory) constraint schemata may be activated at a given time step. That is, there is always an evaluable solution, or $E(\sigma_t) \neq \emptyset$. We also assume that we know which resources are available and thus that the schemata *H*_{i} are known to the optimizer.

### 5.1 Forcing

This policy forces a non-evaluable solution **x** into the constraint schemata *H*_{i} of all activated ERCs. In other words, all bits that do not match the order-defining bit values of the schemata *H*_{i} of all activated ERCs are flipped, and the solution so obtained is returned for evaluation. Strategies of this kind have been used previously, for example, in constrained combinatorial optimization (Liepins and Potter, 1991).

A drawback of this policy is that enforcing changes in decision variable values may destroy potentially good genotypes. Later, we will investigate this aspect more closely.
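As a minimal sketch of the forcing repair, the function below overwrites every defined schema position in a bit string. The function name `force`, the string representation, and the `*` wildcard are assumptions for illustration; the schemata are assumed to be noncontradictory, as stated at the start of this section.

```python
from typing import Sequence


def force(x: str, schemata: Sequence[str]) -> str:
    """Set every defined position of each active schema in x; wildcard
    positions keep the bit value that x already has."""
    bits = list(x)
    for schema in schemata:
        for i, h in enumerate(schema):
            if h != "*":
                bits[i] = h
    return "".join(bits)
```

For example, `force("00011", ["*1**0"])` changes bit positions 2 and 5 and leaves the remaining three bits untouched, which is where the risk of disrupting a good genotype comes from.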

### 5.2 Regenerating

The aim of this policy, which is similar to the death penalty approach (Schwefel, 1975) originating from the evolution strategies community, is to overcome the potential drawback of forcing. In fact, as the name of the policy suggests, upon encountering a non-evaluable solution, regenerating iteratively generates new solutions from the empirical distribution of the current offspring population (i.e., it generates new offspring from the current parent set) until it generates one that is evaluable, that is, one that falls into the schemata *H*_{i} of all activated ERCs, or until *L* trials have passed without success. In the latter case, we select the solution, generated within the *L* trials, that is closest to the schemata *H*_{i} of all activated ERCs and apply forcing to it. Here, closest refers to the solution with the smallest sum of Hamming distances to the schemata *H*_{i} of all activated ERCs,^{7} where ties between several equally-closest solutions are broken randomly. Thus, the method always returns an evaluable solution (except in the deadlock situation where multiple ERCs with overlapping *H*_{i} are activated simultaneously, in which case no solution is evaluable).

A potential drawback of this policy is that for large *L* it can be computationally expensive, while for small *L*, it could be that it often reduces to the forcing policy.
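The regenerating loop with its forcing fallback can be sketched as follows. The function names, the string representation with `*` as wildcard, and the `make_offspring` callable (standing in for "generate a new offspring from the current parent set") are hypothetical; the sketch also assumes *L* ≥ 1 and noncontradictory schemata.

```python
import random
from typing import Callable, Sequence


def in_schema(x: str, h: str) -> bool:
    return all(c == "*" or c == b for b, c in zip(x, h))


def hamming_to_schema(x: str, h: str) -> int:
    """Number of defined schema positions at which x disagrees with h."""
    return sum(c != "*" and c != b for b, c in zip(x, h))


def force(x: str, schemata: Sequence[str]) -> str:
    bits = list(x)
    for h in schemata:
        for i, c in enumerate(h):
            if c != "*":
                bits[i] = c
    return "".join(bits)


def regenerate(make_offspring: Callable[[], str], schemata: Sequence[str],
               L: int, rng=random) -> str:
    """Draw up to L offspring; return the first evaluable one, otherwise
    force the trial closest to the active schemata (ties broken randomly)."""
    trials = []
    for _ in range(L):
        y = make_offspring()
        if all(in_schema(y, h) for h in schemata):
            return y
        trials.append(y)

    def dist(y: str) -> int:
        return sum(hamming_to_schema(y, h) for h in schemata)

    best = min(dist(y) for y in trials)
    closest = rng.choice([y for y in trials if dist(y) == best])
    return force(closest, schemata)
```

With `L = 1` the function forces whatever single offspring was drawn, which illustrates why small values of *L* make the policy behave much like forcing.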

### 5.3 Subpopulation Strategy

Let us assume there is only one ERC, that is, *r* = 1. In this case, alongside the actual population, we also maintain a subpopulation *SP* of maximum size *J* that contains the fittest solutions from *H*_{1} evaluated so far. A non-evaluable solution is then dealt with by generating a new solution based on this subpopulation. If the maximum size *J* of *SP* has not been reached, then a new solution from *H*_{1} is generated at random; otherwise, we apply one selection and variation step using the same algorithm as the one we augment with the constraint-handling policies. If the new solution is non-evaluable, which may happen due to mutation, we apply forcing to it. To update the subpopulation upon evaluating a solution from *H*_{1} we use a steady state or (*J*+1)-ES reproduction scheme. We use this reproduction scheme because, depending on the ERC, the number of evaluated solutions from *H*_{1} might be small, in which case a generational reproduction scheme is likely to result in slow convergence.

A drawback of the subpopulation strategy is that if we have more than one ERC, that is, if *r*>1, then the number of subpopulations needed is upper-bounded by 2^{r}, the size of the power set of the set of ERCs. With multiple ERCs, we generate a solution using the subpopulation that is defined by the (set of) schemata *H*_{i} of activated ERCs.
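For the single-ERC case, the repair and update steps of the subpopulation strategy might be sketched as below. This is an illustration under assumptions: maximization is assumed, solutions and the schema are strings with `*` as wildcard, `select_and_vary` is a hypothetical stand-in for one selection and variation step of the underlying EA, and the (*J*+1) update is realized by adding the new solution and discarding the worst.

```python
import random


def random_from_schema(h: str, rng) -> str:
    """Random bit string whose defined positions agree with schema h."""
    return "".join(c if c != "*" else rng.choice("01") for c in h)


def force(x: str, h: str) -> str:
    return "".join(b if c == "*" else c for b, c in zip(x, h))


def repair_with_subpopulation(sp, J, h, select_and_vary, rng=random) -> str:
    """sp is a list of (fitness, solution) pairs taken from schema h."""
    if len(sp) < J:
        # Subpopulation not yet full: sample a new member of the schema.
        return random_from_schema(h, rng)
    child = select_and_vary([x for _, x in sp])
    # Variation may leave the schema, in which case forcing repairs the child
    # (forcing leaves a child that is already in the schema unchanged).
    return force(child, h)


def update_subpopulation(sp, x, fx, J) -> None:
    """Steady-state (J+1) update: add the new solution, drop the worst."""
    sp.append((fx, x))
    if len(sp) > J:
        sp.remove(min(sp, key=lambda pair: pair[0]))
```

With *r* > 1 ERCs, one such list would be needed per combination of active schemata, which is the source of the 2^{r} bound mentioned above.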

### 5.4 Waiting

This policy does not repair; instead, it postpones the evaluation of a non-evaluable solution, and the generation of new solutions, until the activation periods of all ERCs that are violated by the solution have passed; that is, the optimization freezes in the meantime. The freezing period is bridged by submitting as many null solutions as required until the solution becomes evaluable.^{8} This policy is identical to the way Schwefel (1968) handled unavailable conical rings in his flashing nozzle design problem (see Section 1).

The advantage of waiting is that it should prevent drift-like effects in the search direction caused by ERCs, but the drawback is that this might result in a smaller number of solutions being evaluated (this depends upon whether time is a limiting factor).
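The cost of waiting in terms of lost time steps can be made concrete with a small Python sketch. The function name, the `is_evaluable(x, t)` callable, the time budget `t_max`, and the toy ERC are illustrative assumptions; each loop iteration corresponds to submitting one null solution.

```python
from typing import Callable, Optional, Tuple


def wait_until_evaluable(
    x: str, t: int, is_evaluable: Callable[[str, int], bool], t_max: int
) -> Tuple[Optional[int], int]:
    """Advance time with null solutions until x becomes evaluable.

    Returns (time step at which x can be evaluated, number of null solutions
    submitted); the time step is None if the budget t_max runs out first."""
    nulls = 0
    while t <= t_max and not is_evaluable(x, t):
        t += 1      # one null solution: a time step passes without evaluation
        nulls += 1
    return (t if t <= t_max else None), nulls


def toy_erc(x: str, t: int) -> bool:
    """Hypothetical ERC: during time steps 3..5 only strings starting with
    '1' are evaluable."""
    return not (3 <= t <= 5) or x[0] == "1"
```

If time is the limiting resource, every null solution counted here is an evaluation that the optimizer can no longer spend elsewhere.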

### 5.5 Penalizing

Like waiting, this policy does not repair. However, instead of freezing the optimization, a non-evaluable solution is penalized by assigning a poor objective value *c* to it. The effect is that evaluated solutions coexist with non-evaluated ones in the same population. However, due to selection pressure in parental and environmental selection, non-evaluated solutions are likely to be discarded as time goes by. As we will use the policy within an elitist EA, and because we use a *c* that is the minimal fitness in the search space, a non-evaluated solution will never be inserted in a population (that is filled with previously evaluated solutions) in the first place.

This kind of penalizing policy is popular in the genetic algorithm (GA) community, where it can be regarded as a static penalty function method (Coello, 2002).

The advantage of penalizing over waiting is that the optimization does not freeze upon encountering a non-evaluable solution; that is, the solution generation process continues and thus solutions might actually be evaluated (without needing to penalize them) during an activation period. However, since evaluated solutions will have to fall into the schemata *H*_{i} of all currently activated ERCs, penalizing might be subject to drift-like effects, thus potentially losing the advantage of waiting.
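A minimal sketch of the penalizing policy inside an elitist survivor selection is given below. The function names, the truncation-style selection, and the toy ERC are assumptions for illustration; maximization is assumed, and `c` is taken to be the minimal fitness in the search space, as described above.

```python
from typing import Callable, List, Tuple


def penalized_fitness(x, t: int, f: Callable, is_evaluable: Callable, c: float) -> float:
    """Return f(x) if x is evaluable at time t, otherwise the penalty value c."""
    return f(x) if is_evaluable(x, t) else c


def elitist_survivors(parents: List[Tuple[float, str]],
                      offspring: List[Tuple[float, str]],
                      mu: int) -> List[Tuple[float, str]]:
    """Keep the mu fittest of parents and offspring (parents win ties because
    Python's sort is stable and parents are listed first)."""
    pool = parents + offspring
    return sorted(pool, key=lambda pair: pair[0], reverse=True)[:mu]


def onemax(x: str) -> int:
    return x.count("1")


def toy_erc(x: str, t: int) -> bool:
    """Hypothetical ERC: at time steps 3..5 only strings starting with '1'."""
    return not (3 <= t <= 5) or x[0] == "1"
```

Because the penalty equals the lowest possible fitness, a penalized offspring can never displace a previously evaluated parent in this selection step, so the search continues without a non-evaluated solution entering a full population.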

Figure 6 depicts how the policies forcing, regenerating, and the subpopulation strategy may repair a non-evaluable solution.

## 6 Experimental Setup

This section describes the test functions *f*, the EA that we augment with the different constraint-handling policies, and the parameter settings used in the subsequent experimental analysis, which investigates the impact of commitment relaxation and periodic ERCs.

### 6.1 Evolutionary Algorithm

*f* in a standard optimization problem. Note from the pseudocode that we are using a Lamarckian population update in this paper; that is, a repaired solution is used for evaluation and also replaces the original solution. Additional performance-enhancing mechanisms commonly used in EAs, such as diversity preservation techniques (Goldberg and Richardson, 1987; Mahfoud, 1995) or adaptive parameter control (Davis, 1989), may affect the results, but are not considered here.
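A Lamarckian update of this kind might look as follows. This is only a sketch: the repair function is a hypothetical stand-in that overwrites the order-defining positions with the schema's values, and the names `force_into_schema` and `lamarckian_evaluate`, as well as the `None`-as-wildcard encoding, are assumptions rather than a transcription of the pseudocode.

```python
from typing import Callable, List, Optional, Sequence

Schema = Sequence[Optional[int]]  # None marks a wildcard position (assumed encoding)


def force_into_schema(x: Sequence[int], schema: Schema) -> List[int]:
    """Hypothetical repair: copy x and overwrite every order-defining position."""
    return [xi if s is None else s for xi, s in zip(x, schema)]


def lamarckian_evaluate(
    population: List[List[int]],
    index: int,
    schema: Schema,
    fitness: Callable[[Sequence[int]], float],
) -> float:
    """Repair one individual, write the repaired genotype back, and evaluate it."""
    repaired = force_into_schema(population[index], schema)
    # Lamarckian update: the repaired genotype replaces the original one.
    population[index] = repaired
    return fitness(repaired)
```

A Baldwinian variant would instead evaluate the repaired solution but leave `population[index]` unchanged; that variant is not used in this paper.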

### 6.2 Test Functions *f*

Since our aim in this study is to understand the effect of ERCs on EA performance on real closed-loop problems (ultimately), it might be considered ideal to use, for testing, some set of real-world ERCOPs, that is, real experimental problems featuring real resource constraints. That way, we could see the effects of EA design choices (the constraint-handling policy used) directly on a real-world problem of interest. But even granting this to be an ideal approach, it would be very difficult to achieve in practice due to the inherent cost of conducting closed-loop experiments and the difficulty of repeating them to obtain any statistical confidence in the results seen. For this reason, most of our study will use more familiar artificial test problems augmented with ERCs (although in the case study, in Section 8, we do use data and constraints from a real closed-loop problem).

Our set of selected test functions comprises: (1) OneMax, (2) a competing optima problem, TwoMax, and (3) several MAX-SAT problem instances with many local optima. In the subsequent case study we will also see a variant of *NK* landscapes (*NKα* landscapes) being used; we will introduce this test function here too. Of course, we cannot guarantee that the test functions mimic real (closed-loop) problems, but a diverse set of functions as used here should be sufficient to draw some tentative conclusions about the effects of ERCOPs generally, depending on the results observed.

#### 6.2.1 OneMax

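The OneMax function counts the number of 1-bits in a bit string $\mathbf{x} \in \{0,1\}^{l}$, that is, $f(\mathbf{x}) = \sum_{i=1}^{l} x_i$, and attains its maximum value *l* for the bit string consisting only of 1-bits.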

#### 6.2.2 TwoMax

**x**and

*b*>1, that is, the factor by which the global optimal solution shall be fitter than the local optimal solution, then the TwoMax function is defined by

#### 6.2.3 MAX-SAT

Given a collection of clauses involving *l* binary variables , the satisfiability (SAT) problem asks whether or not there is a variable assignment such that all clauses are simultaneously satisfied (Hansen and Jaumard, 1990). A generalization of the SAT decision problem is the maximum satisfiability (MAX-SAT) problem, which asks for a variable assignment that satisfies the maximum number of clauses (Qasem and Prügel-Bennett, 2010). MAX-SAT and SAT problems are of high practical relevance as many challenging real-world problems can be efficiently formulated in SAT form (De Jong and Spears, 1989), for example, hardware and software verification problems, and routing in FPGAs. Another reason for choosing this problem is the presence of a backbone (the backbone of a MAX-SAT instance is the schema into which all the optimal solutions fall), which is a convenient property when analyzing the impact of ERCs on search.

We consider several (10) benchmark instances of a uniform random 3-SAT problem, which can be downloaded online.^{9} The instances have *l* = 50 variables and 218 clauses and are satisfiable. Similar to other work (Hansen and Jaumard, 1990; Qasem and Prügel-Bennett, 2010), we treat the 3-SAT instances as MAX-3-SAT optimization problems with fitness calculated as the proportion of satisfied clauses. We also conducted experiments on other challenging multimodal test problems, *NKα* landscapes (Kauffman, 1989), which we introduce next.
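The fitness calculation described above can be sketched as follows. The sketch assumes that clauses are given as lists of signed integers in the DIMACS convention (`+i` for variable *i*, `-i` for its negation); the function name `maxsat_fitness` and the list-based representation are illustrative choices, not part of the original setup.

```python
from typing import List, Sequence

Clause = Sequence[int]  # DIMACS-style literals: +i means x_i, -i means NOT x_i


def maxsat_fitness(x: Sequence[int], clauses: List[Clause]) -> float:
    """Proportion of clauses satisfied by the 0/1 assignment x (x[0] is variable 1)."""

    def literal_true(lit: int) -> bool:
        value = x[abs(lit) - 1] == 1
        return value if lit > 0 else not value

    # A clause is satisfied if at least one of its literals is true.
    satisfied = sum(any(literal_true(lit) for lit in clause) for clause in clauses)
    return satisfied / len(clauses)
```

A fitness of 1.0 then corresponds to an assignment that satisfies the underlying SAT instance.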

#### 6.2.4 *NKα* Landscapes

The general idea of the *NKα* model (Hebbron et al., 2008) is to extend Kauffman's original *NK* model (Kauffman, 1989) to model epistatic network topologies that are more realistic in mapping the epistatic connectivity between genes in natural genomes. The *NKα* model achieves this by affecting the distribution of influences of genes in the network in terms of their connectivity, through a preferential attachment scheme. The model uses a parameter *α* to control the positive feedback in the preferential attachment so that larger values of *α* result in a more nonuniform distribution of gene connectivity. There are three tunable parameters involved in the generation of an *NKα* landscape: the total number of variables *N* (in our notation this variable is denoted as *l*), the number of variables that interact epistatically at each of the *N* loci, *K*, and the model parameter *α* that allows us to specify how influential some variables may be compared to others. As *α* increases, an increasing influence is given to a minority of variables, while, for *α* = 0, the model reduces to Kauffman's original *NK* model with neighbors being selected at random. This model has already been used to analyze certain aspects of real-world closed-loop problems; as an example see Thompson (1996).

### 6.3 Parameter Settings

The parameter settings of the EA and the policies are given in Tables 1 and 2, respectively. The settings of the test functions are outlined in Table 3. We choose these search space sizes *l* (15, 30, and 50) as they correspond to typical search space sizes we have seen, for example, a drug combinations problem with a library of about 30 drugs as used by Small et al. (2011). The reason for setting the scaling factor *b* so small is that it makes the problem more challenging due to the low selection pressure to climb up the optimal slope. The optimization times *T* (see Table 3) are set such that we can assess both positive and negative effects of an ERC on the convergence speed and the solution quality obtained at the end of an algorithm run. To analyze the impact of the preparation and recovery time, we will also consider different settings for *T*, but this will be pointed out where applicable.

Table 2: Parameter settings of the policies.

| Policy | Parameter | Setting |
|---|---|---|
| Regenerating | Number of regeneration trials *L* | 10,000 |
| Penalizing | Fitness *c* assigned to non-evaluable solutions | 0 |
| Subpopulation strategy | Maximal size of subpopulation *SP*, *J* | 30 |


Table 3: Parameter settings of the test functions *f*.

| Test function *f* | Parameter | Setting |
|---|---|---|
| OneMax | Solution parameters *l* | 30 |
| | Optimization time *T* | 700 |
| TwoMax | Solution parameters *l* | 30 |
| | Scaling factor *b* | 1.1 |
| | Optimization time *T* | 700 |
| MAX-SAT | Solution parameters *l* | 50 |
| | Optimization time *T* | 800 |
| *NKα* landscapes | Solution parameters *N* = *l* | 15 |
| | Neighbors *K* | |
| | Model parameter *α* | |
| | Optimization time *T* | 2,250 |


Any results shown are average results across 500 independent algorithm runs. To allow for a fair comparison of the policies, we use a different seed for the random number generator for each EA run but the same seeds for all policies. This allows us to apply a repeated-measures statistical test, the Friedman test (Friedman, 1937), to investigate significant performance differences between policies.
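Because every policy is run with the same set of seeds, the runs form matched blocks, which is what a repeated-measures test requires. The sketch below computes the Friedman chi-square statistic in plain Python; it is illustrative only, omits the tie correction and the p-value, and uses assumed names (`average_ranks`, `friedman_statistic`) and an assumed layout with one row per seed and one column per policy. SciPy's `scipy.stats.friedmanchisquare` is an existing alternative that also returns a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(results: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic without tie correction.

    results[r][p] is the outcome of run (seed) r under policy p.
    """
    n = len(results)     # number of runs (blocks)
    k = len(results[0])  # number of policies (treatments)
    rank_sums = [0.0] * k
    for row in results:
        for p, rank in enumerate(average_ranks(row)):
            rank_sums[p] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
```

Under the null hypothesis of no difference between policies, the statistic is approximately chi-square distributed with *k* − 1 degrees of freedom when the number of runs is large.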

## 7 Experimental Study

The performance of a policy depends inter alia on the potential impact of an ERC on the population diversity and the optimization direction. To assess the impact on these two factors, we consider the following aspects: (1) what genetic material represented by a constraint schema *H* needs to be introduced into a population to cause a performance impact; (2) how much of it, or, rather, how many individuals of a constraint schema need to be introduced into a population to cause a performance impact; (3) at what stage during a run does it need to be introduced to yield a performance impact; and (4) the effects of the preparation and recovery durations. We give detailed observations on these effects here, and summarize the key findings in Section 9.

### 7.1 Commitment Relaxation ERC

We first analyze the case where a constraint schema *H* represents poor genetic material. For OneMax and TwoMax, this means that the order-defining bits of *H* are set to 0. For the MAX-SAT instances, we represent a poor bit by flipping a randomly selected bit from the backbone of an instance;^{10} for ease of presentation, also on this problem, we will write 1-bits to refer to “good” bits, which are randomly selected unflipped bits from a backbone, and 0-bits to refer to their complements.^{11}

Figure 8 shows how the final average best solution fitness is affected for the different policies on OneMax. The results obtained on TwoMax and the MAX-SAT instances are very similar and are shown in Appendix A. For ease of presentation, we normalize the fitness values of all test functions so that they lie in the range [0, 1]. We make the following observations from Figure 8.

- Generally, the ERCs affect search negatively, and policy choice is important.

- The subpopulation strategy tends to perform better than forcing and regenerating (which perform similarly) for the majority of constraint parameters. The reason is that the subpopulation strategy generates fitter solutions from *H* and thus allows the EA to converge more quickly to a (suboptimal) population state containing many (copies of) optimal solutions from *H*. The subpopulation strategy performs poorly for constraint settings that can cause a premature convergence toward search regions covered by *H* (see range 4 < *o*(*H*) < 7 in the top left plot of Figure 8).

- With respect to the order of the constraint schema *o*(*H*) (top left plot of Figure 8), we observe that there is a value that has the largest negative effect on the optimization; it is around 4 for the repairing policies (forcing, regenerating, or the subpopulation strategy), and lower for penalizing and waiting.

- The nonmonotonic performance impact on the repairing policies, and partially on penalizing, is due to two competing forces: (1) the probability of activating a constraint, which decreases exponentially with *o*(*H*), and (2) the probability that a constraint activation causes a shift in the search focus, which is greater for low orders. With respect to these two forces, an order of around 4 tends to have the worst trade-off. Penalizing performs better than the repairing policies in the range 2 < *o*(*H*) < 8 because the probability of having to penalize solutions increases exponentially with *o*(*H*). We remark that the order for which the worst trade-off is obtained is a function of the string length *l* and the population size (results not shown). In general, the worst trade-off shifts to only a slightly higher order than 4 as *l* and/or the population size increase; the shift is so small because the probability of activating the ERC decreases exponentially with the order.

- For waiting, the performance only depends on the probability of activating a constraint, causing the performance to be poorest at *o*(*H*) = 1 and to improve exponentially thereafter.

- Longer epoch durations (larger *V*) degrade performance of all methods because of potentially longer activation periods during epochs (see top right plot of Figure 8). With penalizing, forcing, regenerating, and the subpopulation strategy, a saturation point is reached beyond which further increases in the epoch duration have no effect. With waiting there is no saturation point because an increase in *V* results in longer waiting periods and thus a poorer performance. The reason that waiting performs best for small *V* (see range 0 < *V* < 14) is that the waiting periods are short in this regime, allowing an optimizer to converge quickly away from search regions covered by *H* and to prevent future constraint activations.

- When providing some recovery time, all policies improve in performance (see bottom left plot of Figure 8). The recovery speed depends on how much time is required to first introduce diversity among the previously constrained bits before being able to generate better solutions.

- With later start times of the constraint time frame, or, equivalently, longer preparation times, there is a positive effect on the performance of all policies (see bottom right plot of Figure 8). This is because, with a commitment relaxation ERC, the later in the optimization one is, the less likely it is to enter a poor schema (a schema not on the optimization path) and activate a constraint; also, due to elitism, repaired solutions are less likely to be inserted into the population the later a constraint is activated. Thus, later constraints are less disruptive.
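The exponential dependence of the activation probability on the schema order can be made concrete under a simplifying assumption: if bit strings were drawn uniformly at random and independently, each of the *o*(*H*) order-defining bits would match with probability 1/2. An evolving population is of course not uniform, so the sketch below (with the assumed names `prob_in_schema` and `prob_any_in_schema`) only illustrates the first of the two competing forces.

```python
def prob_in_schema(order: int) -> float:
    """Probability that a uniformly random bit string matches a schema of this order."""
    return 0.5 ** order


def prob_any_in_schema(order: int, num_solutions: int) -> float:
    """Probability that at least one of num_solutions independent uniform strings matches."""
    return 1.0 - (1.0 - prob_in_schema(order)) ** num_solutions
```

Under this assumption, each additional order-defining bit halves the chance that a single random solution falls into *H*.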

Figure 9 analyzes how constraint schemata that represent genetic material of different qualities affect the performance obtained with the constraint-handling policies. Although ERCs can have large effects on performance, one can see from the plots that the majority of the constraint schemata do not have an impact on the performance at all compared to the unconstrained performance (which is represented by the square at *o*(*H*) = #1s = 0). These are schemata that are unlikely to cause an activation at all because they either do not lie on an optimizer's search path (schemata with few 1-bits) or are associated with a generally low probability of being met by any individual (higher order schemata around the straight line). Constraint schemata that represent poor genetic material (i.e., consist of many 0-bits) only have an impact if their order is low because an optimizer is searching in a different direction. Hence, constraint schemata that have a significant effect on the performance of both policies are either of low order or contain many 1-bits (schemata along and near the diagonal). For these constraint setting regimes, we observe a similar nonmonotonic effect on the performance of both policies as we have seen previously for schemata representing poor genetic material only (indicated here by the row of squares with #1s = 0). The difference is that the more 1-bits there are in *H* (i.e., as we go up the rows of squares), the less apparent this nonmonotonic (negative) performance effect becomes. From Figure 10, which compares the performance of penalizing against that obtained with waiting (left plot) and the subpopulation strategy (right plot), it is apparent that the performance differences between the policies observed previously are also largely maintained (as we go up the rows of squares).

On the MAX-SAT instances, the range of constraint schemata causing a performance impact is smaller; this is apparent from Figure 11. From the plot we again observe that while low-order schemata affect the performance significantly, higher-order schemata that represent near-optimal or optimal genetic material have little or no effect; the reason is that good genetic material is difficult to detect on this challenging problem, particularly within *T*=800 time steps.

### 7.2 Periodic ERC

With the insights we gained about the policies when applying them to commitment relaxation ERCs, we can understand their behavior in the presence of a second type of ERC, periodic ERCs, more easily.

Figure 12 shows how the performance of the different policies is affected by various constraint parameters of a periodic ERC on OneMax when *H* represents poor genetic material. Again, the results obtained on TwoMax and the MAX-SAT instances are similar and are shown in Appendix B. In comparison to commitment relaxation ERCs, the main difference we observe from Figure 12 is that waiting performs poorly and is also clearly dominated by penalizing for the majority of constraint settings. This is due to the fact that activation periods are set deterministically with periodic ERCs. In essence, waiting is likely to freeze the optimization during each activation period because of the low probability of generating solutions from *H* (regardless of the quality of the genetic material represented by *H*). With penalizing, one is also unlikely to evaluate any solutions during an activation period. However, the fact that the optimization is not frozen is beneficial because offspring are generated using a more up-to-date parent population during unconstrained optimization periods.

The fact that activation periods are set deterministically with periodic ERCs also means that high-order constraint schemata have an impact on the performance and this is the case regardless of the genetic material they represent. In fact, from Figure 13 we see that the average best solution fitness obtained with the subpopulation strategy on OneMax decreases rather smoothly for all orders as the quality of the represented genetic material worsens. Comparing this average best solution fitness with the fitness obtained by penalizing (left plot of Figure 14), we observe that repairing is particularly beneficial for high-order constraint schemata that represent very good genetic material. Waiting, in turn, is inferior to penalizing across all the different constraint schemata but in particular for low-order schemata representing very good genetic material (see right plot of Figure 14).

On the MAX-SAT instances (results not shown here), the observations are similar to those on OneMax. However, because of the difficulty of finding good genetic material, even when setting a small number of bits correctly, a smooth decrease in the average best solution fitness, and differences between the performance of policies, are more obvious for schemata of higher orders. Compared to OneMax, there are also small differences apparent in the results obtained on TwoMax; we show the results and discuss these differences in Appendix B.

## 8 Case Study

In this section, we demonstrate one way of selecting a suitable constraint-handling policy for a real-world application involving ERCs. For this, we use the same experimental setup as used in the instrument configuration application described by O'Hagan et al. (2005, 2007). We now give a more detailed description of this application.

### 8.1 Application Description

The application is concerned with optimizing the configuration parameters of a gas-chromatography mass spectrometer instrument so as to maximize its ability to separate and detect a complex (biological) sample. Resource constraints arise here because certain parameters of the instrument configuration cannot be changed widely from one experiment to the next without incurring a severe changeover cost associated with having to clean parts of the instrument (Dunn, 2011).^{12} The problem is defined over *l* = 15 integer variables and a total search space of configurations. O'Hagan et al. (2005, 2007) cast this application as a multiobjective optimization problem, but here we consider only one objective, namely the number of peaks detected (to be maximized). Two ERCs are used to model the parameters (here related to oven temperatures of the instrument) that must be prohibited from varying over too wide a range.

### 8.2 ERCs

To keep things simple, we assume that the maximal number of instrument configurations that can be tested on a day is fixed at *V* = 15, and the total number of days available for the optimization is 150, resulting in *T* = 15 × 150 = 2,250 available time steps or fitness evaluations. We set the first two variables to represent the oven temperatures, and the value 0 to represent the low temperatures (which cause the ERCs to arise). Hence, we have two commitment relaxation ERCs, one per oven-temperature variable, each with a constraint schema that fixes the corresponding variable to 0. Little is known of the fitness landscape before optimization begins but, as in O'Hagan et al. (2007), it would be expected that there is some degree of epistasis in the problem. We would not know which of the two schemata represents good or poor instrument configurations.

### 8.3 Offline Testing

As algorithm designers, we are now faced with the challenge of selecting an optimizer or constraint-handling policy for the ERCOP described above. A common approach is to first design appropriate problem functions that simulate the problem at hand, and then to test several algorithms offline on these functions and use the best one for the real-world problem. In this case study, we use *NKα* landscapes as the test problems because they allow us to model different degrees of epistasis. We introduced this problem in Section 6.2, and also provided the settings of the problem parameters *N*, *K*, and *α* in Table 3.^{13} The (four) selected settings allow us to cover landscapes featuring different degrees of epistasis and topologies. To cope with the integer representation, we need to modify the mutation operator, which shall now select a random setting from the set of possible ones for the instrument parameter that is to be modified. Otherwise, we can use the same algorithm setup as previously (see Algorithm 3).
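The modified mutation operator could be sketched as below. Several details are assumptions made for illustration: that each variable is mutated independently with a fixed probability `rate`, that the settings of variable *i* are the integers `0 .. domain_sizes[i] - 1`, and that the current setting may be redrawn. The name `mutate_integer` is likewise hypothetical.

```python
import random
from typing import List, Sequence


def mutate_integer(
    x: Sequence[int],
    domain_sizes: Sequence[int],
    rate: float,
    rng: random.Random,
) -> List[int]:
    """Return a copy of x in which each variable is resampled with probability rate."""
    child = list(x)
    for i, size in enumerate(domain_sizes):
        if rng.random() < rate:
            # Pick a random setting from the possible ones; this may redraw the current value.
            child[i] = rng.randrange(size)
    return child
```

Passing an explicitly seeded `random.Random` instance keeps runs reproducible, which matters when the same seeds are to be reused across policies.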

Figure 15 shows the average best solution fitness obtained by the constraint-handling policies on the four *NKα* landscape models as a function of the time counter (we do not show the standard error as it was negligible). The plots confirm what we observed in the experimental study, that similar patterns are obtained for different landscapes. In fact, we observe that a repairing policy (forcing, regenerating, or the subpopulation strategy) should be clearly favored over a waiting or penalizing policy. A trend is apparent that the subpopulation strategy and regenerating perform best in the initial stages of the optimization, while forcing is slightly better in the final part of the optimization. Also, the performance advantage of forcing at *t* = 2250 over the other two repairing policies tends to increase with *K* and/or *α*. The waiting policy does not perform well because the likelihood that either or both of the ERCs is active is relatively high, causing the optimization to freeze for too long. Although penalizing performs significantly better than waiting, the probability of penalizing many solutions and thus making little or no progress in the optimization is too high to match the performance of the repairing policies.

Based on these results, if a single policy is to be chosen, then we would select the policy of forcing for the real-world problem as it performs best after *t*=2250 time steps.

### 8.4 Testing on Real-World Landscape

We cannot test this choice on the real closed-loop problem itself (and we would certainly not be able to compare different policies there). However, we are able to do the next best thing. Since we have available the actual fitness values collected during the real experimental trials reported by O'Hagan et al. (2005, 2007; i.e., the number of peaks detected) for around 315 instrument configurations tested, we can use this data to construct an interpolated fitness landscape using, for example, the Kriging approach (Cressie, 1993).^{14} In comparison to the *NKα* landscapes considered for offline testing, the interpolated landscape is smoother and contains significantly fewer local optima; both aspects are attributed to the low number of data points.

Nevertheless, as is apparent from Figure 16, the performance of the policies on the interpolated landscapes is largely in alignment with the findings made on the *NKα* landscapes: The repairing policies tend to perform better than waiting and penalizing, and, while the subpopulation strategy performs best at the beginning of the optimization, all repairing policies tend to perform identically at the end of the optimization. Clearly, in reality one is usually able to perform only a single optimization run, meaning that the result might be different from the one we obtained from averaging over many runs. Nevertheless, this case study demonstrates how one can approach and solve an ERCOP, beginning with the definition of the ERCs, modeling the simulated environment, selecting appropriate test functions, and finally comparing different optimizers and selecting the most suitable one to be used in the real-world application.

## 9 Summary and Conclusion

ERCOPs are problems where feasible solutions can be temporarily non-evaluable due to a lack of resources required for their evaluation. In this study, we proposed and analyzed various policies for dealing with non-evaluable solutions, and assessed them for two types of ERCs (commitment relaxation ERCs and periodic ERCs) using four test problems: OneMax, TwoMax, several MAX-SAT problem instances, and *NKα* landscapes. In addition, a demonstration of how one may approach and solve a new ERCOP in the common case where knowledge of the fitness landscape is poor has been given in the form of a case study that used landscape data from a real closed-loop optimization problem.

We made several key observations from the experimental analysis that may determine how one should proceed with an ERCOP. Generally, ERCs affect the performance of an optimizer, and clear patterns emerge relating ERC parameters to performance effects. The later the constraint time frame of an ERC begins, the less disruptive is the impact on search. This could mean that investment in resources at early stages of the optimization should be preferred, where possible. For commitment relaxation constraints, the probability of activating the constraint is dependent on the order and the quality of the genetic material represented by the constraint schemata. Thus, to some degree, we may be able to predict the extent of impact if information about these schemata is available. Although periodic constraints are activated at regular time intervals (independently of the constraint schemata defining them), their impact on search still depends upon the order and quality of the constraint schemata in predictable ways. We also clearly see that the impact on EA performance is modulated by the choice of constraint-handling policy adopted. Which choice of policy is best is dependent on the details of the ERC, as we have set out in our results.

Importantly, we also observed that the patterns of performance impact seen on the same ERC type are quite similar across different search problems with different types of fitness landscape. In contrast, between the two ERCs, even on the same problem, the impact on performance is quite different. If this pattern turns out to be more generally true, then it is good news because we usually have more knowledge about the ERCs than about the fitness landscape. Therefore, we would not need to be right about the fitness landscape in order to choose the right policy. Nevertheless, as indicated in the case study, an a priori analysis of the problem at hand can be beneficial when it comes to selecting a suitable policy.

With respect to the impact of the individual ERC types, our analysis suggests, tentatively, that repairing policies should not be used with commitment relaxation ERCs (i.e., the genotype of a solution should not be modified) for the majority of constraint settings, while they are appropriate policies with periodic ERCs. An exception may be the situation where the available resources are poor because, there, repairing solutions and inserting them into the population may cause an EA to prematurely converge to a suboptimal population state. In situations where solutions should be repaired, we can tentatively suggest that a policy that aims at creating fit repaired solutions should be preferred over a naïve forcing policy.

Although we are able to draw these conclusions, our study has of course been very limited, and there remains much else to learn about the effects of ERCs and how to handle them. Our immediate attention is turning to the design and tuning of intelligent search policies. In Allmendinger and Knowles (2011) we have already shown that an EA that learns offline (using a reinforcement learning agent) and online (using a multi-armed bandit algorithm) when to switch between the different static constraint-handling policies introduced here can yield better performance than the static strategies themselves. We are also looking at the treatment of problems where some solutions are more costly in time or resources to evaluate than others. The challenge there is that the optimizer not only has to account for the fitness of solutions, but also for their differential costs of evaluation.

## Acknowledgments

The authors would like to thank Julia Handl for critical comments on a draft of this manuscript. We would also like to thank Hans-Paul Schwefel for answering many questions about his experience with resource constraints. Finally, the authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.


## Appendix A Commitment Composite ERCs

Figures 17 and 18 show the results obtained with the different constraint-handling strategies on several MAX-SAT instances and the TwoMax function, respectively. In general, the subpopulation strategy tends to obtain a higher average best solution fitness than forcing and regenerating because it converges more quickly to a suboptimal population state. Further evidence of this behavior is given by the plots in the right column of Figure 18, which show the probability that the majority of a population climbs up the optimal slope.

## Appendix B Periodic ERCs

Figures 19 and 20 show the results obtained with the different constraint-handling strategies on the MAX-SAT instances and the TwoMax function, respectively. Unlike on the MAX-SAT instances (and OneMax), waiting (and penalizing) tend to perform significantly better than the repairing policies for the majority of constraint parameter settings on the TwoMax problem. The reason is that, on this problem, repairing decreases the probability of climbing up the optimal slope significantly and that is already true for low orders *o*(*H*). From Figure 21, which compares the performance of waiting against the performance of the subpopulation strategy for different constraint schemata, we observe that waiting is able to maintain a significant performance advantage over a repairing policy (all three repairing policies performed similarly) for schemata below the straight line.

## Notes

^{1}

When an EA is used, closed-loop optimization is sometimes referred to as evolutionary experimentation or experimental evolution.

^{3}

We employ the classical notion of schemata, as used in the context of binary-coded genetic algorithms (Holland, 1975), to describe availability of resources. Section 4.1 will introduce the notion of schemata and its relationship to ERCs in more detail, and also indicate how this notion may be applied to nonbinary search spaces.

^{4}

Note the difference between a *null solution* and a solution with a fitness value of *null*: A null solution is submitted with the purpose to skip an evaluation, while a solution with fitness null is submitted with the purpose of being evaluated but then is not evaluated, due to a lack of resources.

^{5}

In this work we consider only these two ERC types, but we are aware of other types of which some are defined in the technical report (Allmendinger and Knowles, 2010a).

^{6}

We are looking at the optimization scenario where the ctf is a single continuous period of time but it is also realistic to have a ctf that is separated by unconstrained periods.

^{7}

Note that the Hamming distance between a solution **x** and a schema *H* is calculated based on the order-defining bits of *H* only.

^{8}

NB to implement the freezing period, the global time counter *t* is set directly to the end of the longest activation period of all currently violated ERCs (i.e., we do not actually submit null solutions to bridge the period); this step is realized in Line 27 of Algorithm 3, discussed in Section 6.1.

^{9}

http://people.cs.ubc.ca/~hoos/SATLIB/benchm.html; the names of the (10) instances are “uf50-218/uf50-0*.cnf”, where * is 1,2,4,6,8,11,19,22,24, and 25. These instances have a backbone with an order of 40 or greater, allowing us to analyze different ERC setups in the experimental study; in this study, we will run our EA on each instance for 50 runs to obtain the average performance on this problem type.

^{10}

We identified the backbones of our MAX-SAT instances from optimal solutions obtained from running a generational GA for 1,000 generations, 500 times independently.

^{11}

If not otherwise stated, then the order-defining bits of a constraint schema are chosen at random for each algorithm run; if an order-defining bit is a 1-bit, then the position of this bit is also chosen at random among the order-defining bits; that is, a constraint schema denoted by might actually be, for example, or in an algorithm run. Nevertheless, all policies will be optimizing subject to the same constraint schemata.

^{13}

Note that *NKα* landscapes are binary problems by default. Transforming them to account for an integer representation is straightforward and involves the generation of *a*^{*K*} different fitness values for each neighborhood, where *a* is the alphabet size.

^{14}

In essence, Kriging is a technique that interpolates the fitness value of an unobserved data point from observations of values of nearby data points. To generate the fitness landscape we used a Kriging function, Krig(), from the *fields* package of the statistical software, R.