## Abstract

Evolution strategies (ESs) are powerful probabilistic search and optimization algorithms gleaned from biological evolution theory. They have been successfully applied to a wide range of real world applications. The modern ESs are mainly designed for solving continuous parameter optimization problems. Their ability to adapt the parameters of the multivariate normal distribution used for mutation during the optimization run makes them well suited for this domain. In this article we describe and study mixed integer evolution strategies (MIES), which are natural extensions of ES for mixed integer optimization problems. MIES can deal with parameter vectors consisting not only of continuous variables but also with nominal discrete and integer variables. Following the design principles of the canonical evolution strategies, they use specialized mutation operators tailored for the aforementioned mixed parameter classes. For each type of variable, the choice of mutation operators is governed by a natural metric for this variable type, maximal entropy, and symmetry considerations. All distributions used for mutation can be controlled in their shape by means of scaling parameters, allowing self-adaptation to be implemented. After introducing and motivating the conceptual design of the MIES, we study the optimality of the self-adaptation of step sizes and mutation rates on a generalized (weighted) sphere model. Moreover, we prove global convergence of the MIES on a very general class of problems. The remainder of the article is devoted to performance studies on artificial landscapes (barrier functions and mixed integer NK landscapes), and a case study in the optimization of medical image analysis systems. In addition, we show that with proper constraint handling techniques, MIES can also be applied to classical mixed integer nonlinear programming problems.

## 1. Introduction

The classical problem formulation of continuous parameter optimization reads: . Aside from this, in integer programming and combinatorial optimization discrete search spaces are considered, such as or . Most of the research in optimization theory is either focused on discrete search spaces or focused on continuous search spaces. However, there are many practical optimization problems from industry in which the set of decision variables involves continuous as well as discrete variables. These problems are classically referred to as mixed integer optimization problems (Dakin, 1965; Wolsey and Nemhauser, 1999).

Classical techniques from mathematical programming have focused mainly on problems of which the structure is clearly defined by a set of mathematical expressions. For instance, mixed integer nonlinear programming (MINLP; Bussieck and Pruessner, 2003) is a natural approach of formulating problems where continuous and discrete parameters need to be optimized simultaneously. It is based on the assumption that the search space can efficiently be explored using a divide and conquer scheme. As opposed to these white box optimization problems, less attention has been paid to black box optimization problems where the structure of the objective function is not fully known. This type of problem became very apparent in applications where objective functions are based on large-scale simulation models. In this article, we discuss a promising heuristic approach for solving such problems, the so-called mixed integer evolution strategies (MIES).

Evolution strategies (ESs; Bäck, 1996; Rechenberg, 1994; Schwefel, 1995) are robust global optimization strategies that are frequently used for continuous optimization. They belong to the class of randomized search heuristics and are based on principles of biological evolution theory, such as selection, recombination, and mutation. As flexible and robust search techniques, they have been successfully applied to various real-world problems. The dynamic behavior of ESs was subject to thorough theoretical and empirical studies. For instance, Schwefel compared it to traditional direct optimization algorithms and Bäck to other evolutionary algorithms. Theoretical studies of the convergence behavior of the ES were carried out for instance by Beyer (2001), Oyman (Oyman et al., 2000), and Rudolph (1997). The results indicate that the ES is a robust optimization tool that can deal with a large number of practically relevant function classes, including discontinuous and multimodal functions with many variables.

This article discusses the MIES, an extension of the ES for the simultaneous optimization of continuous, integer, and nominal discrete parameters. It combines mutation operators of ESs in the continuous domain (Schwefel, 1995), for integer programming (Rudolph, 1994), and for binary search spaces (Bäck, 1996). These operators have in common that they have certain desirable properties, such as symmetry, scalability, and maximal entropy, the details of which will be discussed later. The MIES was originally developed for optical filter optimization (Bäck; Schütz and Sprave, 1996), and chemical engineering plant optimization (Emmerich et al., 2000; Groß, 1999). Recently, as discussed in this contribution, it has been used in the context of medical image analysis (Li, Emmerich, Bovenkamp et al., 2006; Bovenkamp et al., 2006). In the latter work, the MIES convergence behavior on various artificial landscapes was studied empirically, including a collection of single-peak landscapes (Emmerich et al., 2000) and landscapes with multiple peaks (Li, Emmerich, Eggermont, and Bovenkamp, 2006; Li, Emmerich, Eggermont, Bovenkamp, et al., 2006).

This article summarizes and extends recent studies of MIES and their applications in various fields of science and technology. It extends the existing work by providing a self-contained detailed description of the algorithm and its motivation. Moreover, we provide theoretical studies on the optimality of the step size adaptation. A theorem on the convergence with probability one to the global optimum is derived for a very general class of functions. In addition, the article surveys existing applications of the MIES in systems design and extends empirical results by assessing the performance of the MIES on a set of classical MINLP benchmark problems. For the latter study, constraint handling methods are introduced into the framework.

In Section 2, we provide a mathematical definition of mixed integer parameter optimization and discuss methodologies applied for solving such problems. In Section 3, the MIES is introduced and we discuss its conceptual design. We study the behavior of the algorithm in artificial landscapes in Section 4. In addition, we introduce extensions of landscape models that are used typically either in binary optimization (NK-landscapes) or continuous optimization (barrier model). Furthermore, we apply MIES to the selected MINLP test problems by using constraint handling techniques, more specifically, penalty function methods. In Section 5, we report on the application of MIES to real-world problems. The article concludes with a summary discussion of the obtained results and discusses promising paths for future work.

## 2. Problem Definition and Related Work

*D*

^{(1)}, …. We do not encode nominal discrete variables as integers, in order to exploit the fact that there is no meaningful a priori ordering given for their domain. Furthermore, let denote an objective function to be minimized, and denote constraint functions. Then the mixed integer optimization problem is defined as: Here, the constants

*r*

^{min}

_{i}and

*r*

^{max}

_{i}define lower and upper bounds for the real variables and the constants

*z*

^{min}

_{i}and

*z*

^{max}

_{i}define lower and upper bounds for the integer variables. The symbol denotes tuple concatenation.

As we are interested in the black box scenario, we assume that the structure of *f*, and is unknown, or that we can only make some very general statements about it, such as continuity assumptions based on a similarity measure defined on the search space. As a result of this, it becomes harder to apply standard techniques from mathematical programming, so-called mixed integer nonlinear programming methods (Floudas, 1995a) to solve them deterministically, such as outer approximation (OA; Duran and Grossmann, 1986), branch and bound (BB; Borchers and Mitchell, 1994), and generalized Benders decomposition (Geoffrion, 1972).

In cases where mathematical programming techniques fail, metaheuristics for mixed integer optimization can be an interesting method to heuristically search for solutions that improve the objective function value. In order to solve mixed integer optimization problems with metaheuristics, two general approaches can be considered.

**Hierarchical Approach**. Separate the discrete problem from the continuous problem by optimizing the discrete variables in a higher level optimization problem and treating the optimization of the continuous parameters as a subproblem (Lohmann, 1992; Rechenberg, 1994; Utecht and Trint, 1994)**Simultaneous Approach**. Optimize discrete and continuous parameters simultaneously. In this approach, we consider that a similarity of parameter vectors due to an appropriate metric is equivalent to being positively correlated to the similarity in function values (Groß, 1999; Schütz and Sprave, 1996).

In the following, we will study the simultaneous optimization of parameters. There are two reasons why we favor this approach over the hierarchical approach. First, the hierarchical approach requires a suboptimization of continuous parameters for each set of discrete parameters chosen in the outer level. This can be very time-consuming. Second, in the hierarchical approach, it is difficult to consider correlations between discrete and continuous variables, as they are strictly separated from each other.

## 3. Mixed Integer Evolution Strategies

This section will provide a detailed description of the MIES, and then discuss and study its convergence behavior thoroughly.

### 3.1. Algorithm Description

The problem of designing an ES for a new type of search space breaks down into three subtasks: (1) definition of the generational cycle, (2) definition of the individual representation, and (3) definition of variation operators for the representation of choice. These subtasks will be discussed next.

The chosen algorithm will be an instantiation of a -ES for mixed integer spaces. It generalizes the more common -ES for continuous spaces.

#### 3.1.1. Generational Cycle

*P*(0) of individuals (parameter vectors taken from an individual space

*I*) and setting the generation counter

*t*to zero, the main loop of the algorithm starts. In a first step of the iteration, the algorithm generates the set

*Q*(

*t*) of new offspring individuals, each of them obtained by the following procedure.

Two individuals are randomly selected from *P*(*t*) and an offspring is generated by recombining these parents and then mutating (random perturbation) the individual resulting from the recombination. In the next step of the iteration, the offspring individuals are evaluated using the objective function to rank the individuals (the lower the objective function value, the better the rank). In case of a selection, the best individuals out of the union of the offspring individuals and the parental individuals are selected. In case of a selection, the best individuals out of the offspring individuals are selected. The selected individuals form the new parent population *P*(*t*+1). After this, the generation counter is incremented. The generational loop is repeated until the termination criterion^{1} is fulfilled.

#### 3.1.2. Representation

An individual in an ES contains the information about one solution candidate. The contents of parent individuals is inherited by offspring individuals and is subject to variation. The standard representation of a solution in an ES individual is a continuous vector. In addition, parameters of the probability distribution used in the mutation (such as standard deviations or step sizes) are stored in the individual. The latter parameters are referred to as strategy parameters.

*A*denotes the domain of strategy parameters and is defined as: An individual of a population

_{s}*P*(

*t*) in generation

*t*is denoted as: The so-called object variables determine the objective function value and thus the fitness of the individual (cf. Equation (1)). Here, denote real valued, integer valued, and nominal discrete variables. The so-called strategy variables are standard deviations used in the mutation of the real valued variables, and denote mean step sizes in the mutation of the integer parameters. Finally, denote mutation probabilities (or rates) for the nominal discrete object parameters. All these parameters are subject to inheritance, recombination, and mutation within Algorithm 1. Object variables are initialized uniformly within their domain. The number of mutation parameters is variable. In this work, we consider the case of one single mutation parameter for each type of variable, and the case of one mutation parameter for each single variable.

#### 3.1.3. Recombination

The recombination operator can be subdivided into two steps, selection of the parents and recombining the selected parents. In this article, we apply local recombination which works with two recombination partners. The two recombination partners and are chosen randomly, uniformly distributed, from the parental generation for each of the offspring individuals. The information contained in these individuals is combined in order to generate an offspring individual.

In ESs, two recombination types are commonly used: dominant and intermediate recombination (Schwefel, 1995). In a dominant (or) discrete recombination the operator chooses randomly one of the corresponding parental parameters for each offspring vector position. Intermediate recombination computes the arithmetic mean of both parental vectors and thus, in general, can only be applied for continuous object variables and strategy variables. In mixed integer ES, dominant recombination is used for the solution parameters while intermediate recombination is used for the strategy parameters.

#### 3.1.4. Mutation

**Accessibility.**Every point of the individual search space should be accessible from every other point by means of a finite number of applications of the mutation operator.**Feasibility.**The mutation should produce feasible individuals. This guideline can be crucial in search spaces with a high number of infeasible solutions.**Symmetry.**No additional bias should be introduced by the mutation operator.**Similarity.**Evolution strategies are based on the assumption that a solution can be gradually improved. This means it must be possible to generate similar solutions by means of mutation.**Scalability.**There should be an efficient procedure by which the strength of the impact of the mutation operator on the fitness values can be controlled.**Maximal Entropy.**If there is no additional knowledge about the objective function available, the mutation distribution should have maximal entropy (Rudolph, 1994). By this measure, a more general applicability can be expected.

Respecting these guidelines, the following operators were selected in Emmerich et al. (2000). The mutation of continuous variables is described in Algorithm 2. The new individual is obtained by adding a normally distributed random perturbation to the old values of the vector. The corresponding standard deviations are also subject to the evolution process and are thus multiplied in each step with a logarithmically distributed random number. Schwefel (1995) termed the resulting process as self-adaptive, because the adaptation of the mutation parameters is governed by an evolutionary process itself. The general idea behind self-adaptation is that if a set of different individuals is generated, all with a different probability distribution, the individual with the best object variables is also likely the one with the best probability distribution that led to the generation of these object variables. Thus the parameters of this probability distribution are also inherited by the offspring individual. We now contribute some remarks on the properties of the normal distributions, as they are responsible for the choice of this type of distribution for mutating continuous variables. Among all continuous distributions with finite variance on , the normal distribution possesses the maximum entropy. The multidimensional normal distribution is symmetrical to its mean value and unimodal. The step sizes represent standard deviations of the multi-dimensional normal distribution for each real valued variable. By variation of these standard deviations, the impact of the mutation on the variable vector in terms of similarity can be scaled. The mutation step size parameters are mutated by multiplication with a value drawn from a log-normal distribution. This way, increments are as likely as decrements, and bias is avoided. The global and local learning rates and are set according to the recommendations of Schwefel (1995). The last step of Algorithm 2 keeps the variables within the interval boundaries. It will be discussed later in this section.

*Z*

_{1}−

*Z*

_{2}of two geometrically distributed random variables, because the original geometric distribution is single tailed. The resulting distribution is depicted for the 1D case in Figure 1 and the 2D case in Figure 2. It is

*l*

_{1}-symmetrical

^{2}centered around the mean value 0, unimodal, and it has an infinite support. Thereby, symmetry and accessibility of the mutation are provided. The strength of the mutation for the integer parameters is controlled by a set of step size parameters which represent the mean value of the absolute variation of the integer object variables. Accessibility is provided in the strict sense, as every point can be reached from every other point in a single step with positive probability. The details of this mutation operator are found in Algorithm 3. Note that a geometrically distributed random value can be generated by transforming a uniformly distributed random value

*u*, using The width of the distribution can be controlled by the parameter , the mean value of the exponential distribution (cf. Equation (1); for a derivation, see Rudolph, 1994). Excepting the different distribution types used, it is very similar to the real valued mutation operator in Algorithm 2. Self-adaptation is used to control the width parameter(s). The mutation of the width parameter is done as in Rudolph (1994) using a global learning rate and local learning rate . Since the mean step size below 1 is not useful for integer problems, the mutated mean step size is set back to 1 whenever its mutation results in a value less then 1. To keep integer and continuous parameters in their feasible interval, a transformation function is applied to the mutation operators. Given a step size of the mutation, we may consider this to be the length a particle has to travel within the interval. Starting in the direction of the original unbounded mutation, whenever it meets with an interval boundary, the direction is inverted until the total length of the unbounded mutation has been covered. The implementation of this method is described in Algorithm 4 and an illustration of how the transformation function works can be found in Figure 3. Unlike other mappings, the limiting distribution of the random walk is the uniform distribution. This means that there are no preferred regions of the search space in the long term in case of neutral selection and thus bias is avoided. In order to prevent a loss of causality, the step size should be kept smaller than the interval width. We recommend a maximal step size of 0.2(

*b*−

*a*).

Finally, a mutation of the discrete parameters is carried out with a mutation probability as described in Algorithm 5. The probability is a strategy parameter for each discrete variable. Each new value is chosen randomly (uniformly distributed) out of the finite domain of values. The application of a uniform distribution is due to the principle of maximal entropy, since the assumption was made that there is no reasonable order defined on the sets of discrete values.

^{3}measure: with

*H*(

*true*)=1;

*H*(

*false*)=0 as a distance measure for guiding the design of the mutation operator.

*p*be equal to the probability of decreasing

*p*. Given an original mutation probability , the mutation is using the following expression: Here

*N*(0, 1) denotes a function that returns a normally distributed random number. We recommend employing a second transformation function () that keeps the value of

*p*in the interval [1/(3

*n*), 0.5]. The upper bound of 0.5 for the mutation probability is motivated by the observation that the mutation loses its causality once the probability exceeds the value of about 0.5. The lower bound is used to prevent the mutation probability from being too close to 0, in which case the MIES becomes insensitive to changes of that parameter. In the case where

_{d}*p*=1/(3

*n*), a discrete mutation can be expected in every third application of the mutation operator.

_{d}Depending on the discrete subspace, it is beneficial to use a single mutation probability instead of many individual mutation probabilities . In case of a single mutation probability, for each position of the discrete subvector, it is decided independently, but with the same probability, whether to mutate this position or not. By adapting the mutation rate, the average number of mutations on the discrete values is adjusted to the mean step size if the Hamming distance is considered as the metric.

### 3.2. Step Size Adaptation Study

We use a semiempirical analysis by approximating a local performance measure^{4} at a given location in the search space statistically for different step sizes, in order to find a step size that approximately maximizes the local progress of the MIES. This computation is repeated for different stages of the evolution and each time the empirically found optimal step size is compared to the current step size of the MIES found by self-adaptation. This approach is very flexible and can be applied for the analysis on a wide range of test problems for which the optimum is known and the evaluation of the objective function is fast.

*s*is the expectation of the distance covered toward the optimum in one mutation step starting from position

**x**. Consider a mutation operator mut

_{s}parameterized by the step size, an objective function with single optimum , and a position in the metric search space . Then In order to statistically approximate for a given pair we compute the sample mean for

*M*mutations: Depending on the parameter type, a different distance measurement is used to compute (cf. Equation (6)). For instance, the Euclidean distance is applied to continuous parameters, the Manhattan distance is applied to integer parameters, and for discrete parameters, we choose the overlap distance function.

The quality gain (Beyer, 2001) is an alternative local performance measure and is defined as (in case of minimization). It can be estimated in a similar manner as the progress rate. Both the quality gain and the progress rate measure local performance. We assume here that optimizing local performance also optimizes global performance. We believe that for many unimodal landscapes this is true, but, for example, for multimodal landscapes, this may not be the case.

The overarching research question is whether the MIES can find and keep a step size that maximizes the local progress rate or quality gain. Three studies addressing this question were conducted: (1) step size adaptation for problems with only one parameter type, (2) step size adaptation on a mixed problem with a single step size per parameter type, and (3) step size adaptation for multiple step sizes per parameter type for a mixed problem with differently scaled (weighted) variables.

In all three studies, the value of or is computed in every generation *t* for an equidistant set of *L* points in the interval , where is the best individual in generation *t*, and the upper boundary for continuous and integer variables and for mutation probabilities.

#### 3.2.1. Study 1

As a first study we test the MIES self-adaptation for a single step size per parameter type and for each parameter type independently on a simple problem landscape.

#### 3.2.1.1 Experimental Overview

As a test problem, the minimization of the sum of squares of the variables is used. Continuous, integer, and discrete spaces are studied separately. We compute progress rates for different step sizes at different stages of the evolution and compare them to the step size of the best MIES individual. The results are shown in Figure 4 where the plot shows the computed progress rates and the vertical lines indicate the step size of the current best MIES individual (obtained by self-adaptation).

In addition, we compute statistical summaries of repeated runs. For this, we measure for each generation *t* and run *r* the step size control efficiency , where *s* denotes the step size found with self-adaptation and the optimal step size. The results of this study are shown in the last column of Figure 4.

#### 3.2.1.2 Detailed Setup

For each generation *t*, the value of is computed for an equidistant set of *L*=40 points and *M*=5, 000 samples. The search space dimension is 15 and the variable range is [−1, 000, 1, 000] for the integer and continuous variables, and for the discrete variables. A ()-MIES was used with a single step size () for each parameter type. We used a learning rate of 0.5, which slightly deviates from the default setting which also worked, but takes longer to adapt. The MIES step sizes are initialized to 10% of the interval width for continuous and 33% of the interval width for integer variables. The mutation probabilities are initialized to 0.1. On purpose, we set the initial step size to a suboptimal value, to study whether self-adaptation is able to find a good step size.

#### 3.2.1.3 Results

Selected snapshots for different parameter types at different generations are shown in Figure 4. For all three parameter types, the self-adaptation is able to find the optimal step size and track it. The plots of the typical runs (first three columns from the left-hand side) show that the step size needs to be chosen from a relatively small interval in order to achieve progress rates of, say, above 75%. The position of the red dashed lines indicates that the ES succeeds in finding and tracking this interval. The statistics on the SCE add support for this based on repeated runs (50 runs for each plot). The box plots seen in the right column of Figure 4 for repeated runs demonstrate that the MIES finds in most runs, after less than 10 generations, step sizes for which it operates at an efficiency of more than 75% and is able to keep this efficiency. This proves that, at least for relatively simple—but nevertheless high-dimensional—problems, the self-adaptation of the step sizes is able to find and track the optimal step size. Note that the scale of the plots in Figure 4 changes during the run by orders of magnitude.

#### 3.2.2. Study 2

Secondly, we study the optimality of the step size adaptation for all three parameter types combined using a single step size per parameter type ().

#### 3.2.2.1 Experimental Overview

As a test problem, we use a mixed integer sphere problem. We use here the quality gain as a measure, as there are many possible ways of how to combine the distance functions for the three parameter types in order to compute the progress rate.

#### 3.2.2.2 Detailed Setup

The sample mean for the quality gain is computed every generation based on 200,000 samples. Grid search with a resolution of 10 (3D grid with 1,000 points) has been used to obtain the optimal combination for the three step sizes (, , *p*). The boundaries of the cube are chosen according to the intervals described in Study 1. Also here () was used. A plot of 20 repeated runs was computed.

#### 3.2.2.3 Result

Figure 5 shows that the step size vector that maximizes the quality gain is learned quickly and kept as the evolution passes through different orders of magnitude. This demonstrates that step sizes or mutation rates for different parameter types can be learned simultaneously.

#### 3.2.3. Study 3

Lastly, we study the use of multiple step sizes for each parameter on a weighted mixed integer sphere problem (). Here we will look at each parameter type independently.

#### 3.2.3.1 Experimental Overview

As a test problem, a 9D weighted mixed integer sphere problem is used (*n _{r}*=

*n*=

_{z}*n*=3).

_{d}^{5}Step sizes are studied separately. The results are shown in Figures 6 and 7, where both the step size optimizing the quality gain and the step sizes of the best MIES individual for continuous, integer, and discrete genes are plotted.

#### 3.2.3.2 Detailed Setup

Each value of in the coordinate search is estimated from *M*=5000 samples. For each parameter type *L*=40 steps have been taken. The same interval settings for the step sizes and control parameter settings for the MIES have been used as in Study 2. To reduce the effect of outliers, median values of 50 runs were presented in the plot.

#### 3.2.3.3 Results

Figure 7 shows that the multiple step sizes MIES is capable of keeping the step sizes for the continuous and integer variables close to the optimal value. The ordering of the MIES step sizes corresponds to the ordering of the optimal step sizes which are counter-proportional to the weight. Figure 6 shows that it may have some problems in the learning process of mutation probability when *n _{p}*=

*n*. In contrast, learning of a single mutation rate for the nominal discrete variables is possible (see Figures 4 and 5). These findings are consistent with previous work (see Schütz, 1994, 86–89). As a consequence, a single step size mode for nominal discrete parameters is recommended.

_{d}### 3.3. Global Convergence Properties

Given that certain regularity requirements are met, it is possible to prove strong probabilistic convergence of the MIES for toward the global optimum. The theorem generalizes a theorem on the ES for continuous spaces by Born (1978). Both the plus and the comma strategy are considered, and for the convergence analysis the best solution found so far, that is, **x**^{t}_{best}, will be considered.

*A function**is called* regular, *if*:

*f is continuous*,*is a closed set*,*the set**is nonempty*.

*Let**and let**a function. By**we denote the restriction of the function g defined by**and**where*.

Given these technical preliminaries, we can state the following theorem.

*Let*

*denote a mixed integer function, and*

*is a regular function for at least one*

*which is optimal. Then for a*

*MIES with lower limit*

*for the step sizes and mutation rates, the series*

*converges with probability one to the global minimum of f, that is*, Here

*t*represents the number of iterations, and denotes the global optimum.

*f*

^{t}_{best}for , the existence of which we have shown above. Then let Given the assumption and hence , it follows that is a nonempty set, where denotes an optimal setting for for which is regular. Then is nonempty because of the assumption of regularity of for any optimal .

Thus there exists a closed *n*-dimensional ball with and center that .

Now, let us compute the probability of the event that the mutation of the discrete and continuous variables yields a point in , given some arbitrary parent , where denotes the discrete part of a solution. This can be computed as the joint probability for the following two independent events:

The mutation of real vector generates .

The mutation of discrete part of the solution generates .

*q*generations as (where ): where denotes the number of offspring per generation. Hence, With Equation (9) and Equation (11) we get a contradiction to our assumption that . In other words, any vector with a distance will be improved as with probability one.

## 4. Empirical Studies on Artificial Landscapes

In this section, empirical results demonstrate the convergence behavior of the proposed MIES algorithm. First, test results for scalable theoretical test functions are illustrated. This gives the reader the opportunity to compare results of this EA with that of other optimization algorithms, for instance, with standard ES. Since rather simple test functions have been chosen, the experiments give hints on the minimal running time for practical problems of similar dimension. The first problem—the barrier functions (Li, Emmerich, Bovenkamp, et al., 2006)—introduce multimodality in a gradual way, as for the second test problem correlation/interaction between variables of different types can be introduced in a gradual way.

### 4.1. Barrier Functions

*C*=0 the ordering of the variable values corresponds to the ordering of values of

*A*(

*y*). If the value of

*C*is slightly increased, still part of the order will be preserved under the mapping

*A*, and thus similarity information can be exploited. Then a barrier function is computed: Here, denotes a set of

*i*permutations of the sequence , each of which being a random permutation fixed before the run. This construction prevents the value of the nominal value

*d*from being quantitatively (counter)correlated with the value of the objective function

_{i}*f*. Such a correlation would contradict the assumption that

*d*are nominal values. Whenever a correlation between neighboring values can be assumed, it is wiser to assign them to the ordinal type and treat them accordingly.

_{i}The parameter *C* controls the ruggedness of the resulting function with regard to the integer space. Higher values of *C* result in more rugged landscapes with many barriers. To get an intuition about the influence of *C* on the geometry of the function, we included plots for a two-variable instantiation of the barrier function in Figure 8 for *C*=20, 100, and 500. Intuitively, barrier functions with a higher control parameter *C* may have many local optima and a search procedure can easily get trapped by them. As we can see from these plots, when the control parameter *C* increases, the landscape becomes more rugged. For instance, the landscape of *C*=500 shows many more barriers compared to *C*=20.

#### 4.1.1. Results

Suggested by our former studies (Emmerich et al., 2000; Li, Emmerich, Bovenkamp, et al., 2006; Li, Emmerich, Eggermont, and Bovenkamp, 2006), the following MIES settings are chosen for the experiments on the barrier problems: () for the population and offspring sizes, and () for the step size mode.^{6} Step sizes were initialized to 10% of the interval width, and probabilities were initialized to 0.1. Since ESs are stochastic algorithms, in the empirical experiments we created 10 instantiations^{7} for each control parameter *C*, and for each of them we let the algorithm perform 20 repeated runs (in total runs for each value of *C*).

Figure 9 shows average best fitness values found by one step size (4,28) MIES and one step size (4, 28) ES^{8} on barrier functions with different control parameters *C*, respectively. As we can see, it is more difficult for both MIES and ES to find the global optimum on barrier functions with a higher *C* value. This observation supports our finding from Figure 8: the landscape with a larger *C* value is more rugged and it is more challenging to tackle.

Based on our algorithm design in Section 3, MIES is supposed to be more efficient for exploring the mixed integer landscapes compared to a standard ES. The corresponding box plot for best fitness values found by both MIES and ES in the last generation (=100) is shown in Figure 10.

When *C*=20 or 100, ES performs slightly better than MIES ( and , respectively). An explanation could be that constructed landscapes with *C*=20 or 100 are still simple, which gives standard ES a chance to fully explore the search space. However, in the case of higher *C* values, MIES shows the advantage over standard ES. For *C*=300, none of the strategies outperforms the other. For *C*=500 and *C*=1, 000 the overall performance of MIES is significantly better than that of the standard ES ( and , respectively). The *p* values are based on a left-sided *t*-test with .

### 4.2. Mixed Integer NK Landscapes

Mixed integer NK landscapes are introduced as test problems for which correlation between variables of different types can be introduced in a controlled way. NK landscapes (NKL, also referred to as NK fitness landscapes), introduced by Kauffman (Kauffman, 1987), were devised to explore the way that epistasis^{9} controls the ruggedness of an adaptive landscape. They are particularly useful as test problem generators for genetic algorithms (GAs) to understand the dynamics of evolutionary search. The ruggedness and the degree of interaction between variables of NKL can be easily controlled by two tunable parameters: the number of genes *N* and the number of epistatic links of each gene to *K* other genes. Moreover, for given values of *N* and *K*, a large number of NK landscapes can be created at random.

Mixed-integer NK-landscapes (MI-NKL) were introduced in Li, Emmerich, Eggermont, Bovenkamp, et al. (2006), and are an extension of NKL from the traditional binary case to a mixed variable case with continuous, nominal discrete, and integer variable types. The resulting test function generator is a suitable test model for our mixed integer evolution strategy. For the continuous and integer variables, the binary NKL is extended to an *N*-dimensional hypercube [0, 1]^{N}. Therefore, all continuous and integer variables are normalized between [0, 1]. Whenever the continuous or integer variable takes values at the corners of the hypercube, the value of the corresponding binary NKL is returned. For values located in the interior of the hypercube or its delimiting hyperplanes, we employ a multilinear interpolation technique that achieves a continuous interpolation between the function values at the corners. Note that a higher order approach is also possible but a multilinear approach was chosen for simplicity and ease of programming. Moreover, the theory of multilinear models introduces intuitive notions for the effect of single variables and interaction between multiple variables of potentially different types.

To introduce nominal discrete variables in an appropriate manner, a more radical change to the NKL model is needed. In this case it is not feasible to use interpolation, as this would imply some inherent neighborhood defined on a single variable's domain , , which, by definition, is not given for the nominal discrete case. Therefore, the NKL is extended from a binary to an *n*-ary case to take into account the special characteristics of nominal discrete variables.

Initial experiments demonstrate the applicability of the mixed integer NKL problem generator. The results obtained with a mixed integer ES indicate the increase in difficulty for finding the global optimum as *K* grows.

*X*(

_{d}*L*=3), so the hypercube is reduced to three parallel planes, and the value of the discrete gene decides which plane is chosen. More concretely, assuming the individual has the following values:

*X*=0,

_{d}*X*=0.4,

_{i}*X*=0.8, the value of the discrete parameter

_{r}*X*determines which square is chosen (

_{d}*X*=0). The value for each corner is based on the fitness matrix in Figure 12 (bold displayed). As mentioned in the previous section, we calculate the fitness value of this individual as follows:

_{d}#### 4.2.1. Experimental Results on MINKL

We choose a population size of 4, offspring size of 28, and comma strategy for our experiments. The maximum number of generations is set to 100. Similar to experiments for barrier functions, in order to evaluate algorithm performance, we generated 10 problem instantiations for each so that it is still feasible to find the global optimum by evaluating all bit strings of length *N*=15. Each generated problem consists of five continuous (*N _{r}*=5), five integer (

*N*=5), and five nominal discrete (

_{z}*N*=5) variables. The continuous variables are in the range [–10,10], the integer valued variables are also in the range [–10,10], and we used for the nominal discrete variables (Booleans).

_{d}*x*axis shows the number of generations while the

*y*axis shows the average error (over all experiments). As can be seen, for both algorithms an increase in

*K*results in an increase in error, which indicates the problem difficulty increases with

*K*. The box plot for the last generation's errors of both MIES and ES can be seen in Figure 14.

There, overall performance between standard ES and MIES algorithms is compared. As can be seen, in the case of *K*=1, 3, 5 the MIES obtained significantly better results than the standard ES (, , and , respectively). For *K* = 10 and *K* = 14 none of the two strategies is significantly better. As *K*=14 means that each variable is connected to all other variables, the search space becomes extremely complex and it becomes too hard for both algorithms. In this circumstance, MIES and ES show almost the same performance. Again, a left-sided *t*-test was employed for the comparisons.

### 4.3. Applications to MINLP Problems

As we emphasized from the very beginning of this article, the MIES is particularly capable of yielding good solutions to black box mixed integer parameter optimization problems in high-dimensional spaces. The experimental results, which are from both synthetic test problems (Section 4) and real-world applications (Section 5), do underpin the argument. In this section, we will show that by utilizing the constraint handling techniques such as penalty function method, MIES can also produce good results on some classical MINLP problems.

#### 4.3.1. Penalty Function Method

^{10}In practice, the penalty function method has been widely used to assist evolutionary algorithms in dealing with constraint optimization problems. Instead of using the original objective function to evaluate individuals in a population, a penalty term is devised and included in the evaluation phase. As an immediate result of this modification, the search landscape is changed. An individual that is a good candidate according to the original objective function might become a bad solution because of the penalty function. In this work, we adopt the penalty function as defined in Joines and Houck (1994), where is the original objective function, stands for the penalty function regarding the given constraints, and

*r*

^{(t)}should be treated as coefficients of the . More specifically,

*t*is the iteration/generation index and thus the

*r*

^{(t)}is not constant but grows with

*t*. Consequently,

*r*

^{(t)}will bring more pressure on unfeasible solutions at the end of the search process. From this perspective, this method can be called the dynamic penalty method. The penalty component is controlled by

*C*, , and . These parameters are constants and independent of the total number of constraints. The choice of the combination of these parameters will decide the quality of the solution. For the experimental results presented below, will be used to indicate the value choice of and .

#### 4.3.2. Global Competitive Ranking and Selection

*i*based on the objective and penalty functions respectively.

*P*denotes the probability that a comparison is done through the objective function only. For more detailed information, we recommend Runarsson and Yao (2003).

_{f}#### 4.3.3. Experimental Results

For our experiments, we chose six benchmark functions from the literature and applied (100, 700)-MIES to them. The details of these functions are given in the appendix. For each of the benchmark functions, 200 independent runs (with a different random seed) are performed. The experimental results are presented in Table 1 as well as corresponding settings. The function name, the best feasible objective value from the literature, the median, standard deviation, and median number of generations of each run to find the best solution are displayed.

Function . | Best literature . | Median . | SD
. | G_{median}
. | Penalty () . |
---|---|---|---|---|---|

f_{1} | 2.0000 | 2.0000 | 10 | ||

f_{2} | 7.6672 | 7.6672 | 31 | ||

f_{3} | 1.0765 | 1.0765 | 32 | ||

f_{4} | 4.5796 | 4.5796 | 34 | ||

f_{5} | -17.0000 | -17.0000 | 29 | ||

f_{6} | 4.5796 | 4.5796 | 188 |

Function . | Best literature . | Median . | SD
. | G_{median}
. | Penalty () . |
---|---|---|---|---|---|

f_{1} | 2.0000 | 2.0000 | 10 | ||

f_{2} | 7.6672 | 7.6672 | 31 | ||

f_{3} | 1.0765 | 1.0765 | 32 | ||

f_{4} | 4.5796 | 4.5796 | 34 | ||

f_{5} | -17.0000 | -17.0000 | 29 | ||

f_{6} | 4.5796 | 4.5796 | 188 |

As one can see from Table 2, MIES with a global competitive ranking performed well for all test functions and can find the global optima in most cases. For functions *f*_{1}−*f*_{5}, we use and *P _{f}*=0.45 for all experiments. But for function

*f*

_{6}, we chose the different setting and

*P*=0.40. This is because compared to other functions, constraints of function

_{f}*f*

_{6}are more complex and the problem is more challenging. The statistic on , the median convergence time to the global optimum, clearly indicates that it is harder for the MIES to find the global optima for function

*f*

_{6}than for the other test functions.

Name . | Type . | Range . | Dependencies . | Default . |
---|---|---|---|---|

Maxgray | Integer | [2, 150] | >mingray | 35 |

Mingray | Integer | [1, 149] | <maxgray | 1 |

Connectivity | Discrete | {4,6,8} | 6 | |

Relativeopenings | Discrete | {false, true} | true | |

Nrofcloses | Integer | [0, 100] | used if not relativeopenings | 5 |

Nrofopenings | Integer | [0, 100] | used if not relativeopenings | 45 |

Scanlinedir | Discrete | {0,1,2} | 1 | |

Scanindexleft | Integer | [–100, 100] | <scanindexright | –55 |

Scanindexright | Integer | [–100, 100] | >scanindexleft | 7 |

Centermethod | Discrete | {0,1} | 1 | |

Fitmodel | Discrete | {ellipse, circle} | ellipse | |

Sigma | Continuous | [0.5, 10.0] | 0.8 | |

Scantype | Discrete | {0,1,2} | 0 | |

Sidestep | Integer | [0, 20] | 3 | |

Sidecost | Continuous | [0.0, 100] | 5 | |

Nroflines | Integer | [32, 256] | 128 |

Name . | Type . | Range . | Dependencies . | Default . |
---|---|---|---|---|

Maxgray | Integer | [2, 150] | >mingray | 35 |

Mingray | Integer | [1, 149] | <maxgray | 1 |

Connectivity | Discrete | {4,6,8} | 6 | |

Relativeopenings | Discrete | {false, true} | true | |

Nrofcloses | Integer | [0, 100] | used if not relativeopenings | 5 |

Nrofopenings | Integer | [0, 100] | used if not relativeopenings | 45 |

Scanlinedir | Discrete | {0,1,2} | 1 | |

Scanindexleft | Integer | [–100, 100] | <scanindexright | –55 |

Scanindexright | Integer | [–100, 100] | >scanindexleft | 7 |

Centermethod | Discrete | {0,1} | 1 | |

Fitmodel | Discrete | {ellipse, circle} | ellipse | |

Sigma | Continuous | [0.5, 10.0] | 0.8 | |

Scantype | Discrete | {0,1,2} | 0 | |

Sidestep | Integer | [0, 20] | 3 | |

Sidecost | Continuous | [0.0, 100] | 5 | |

Nroflines | Integer | [32, 256] | 128 |

It is not the intention of this article to argue that MINLP methods should be replaced by MIES in case of explicitly given functions. In this case, efficient MINLP methods such as Branch and Bound (Dakin, 1965) are often available that, unlike the MIES, can guarantee the optimality of solutions. Rather, it is interesting to consider the MIES in cases where constraint functions are only defined implicitly, that is, the constraint functions must be treated as a black box function by the algorithm.

## 5. Real-World Applications

As we addressed in Section 1, mixed integer parameter optimization problems occur quite frequently in practice. This section summarizes results for three typical parameter optimization tasks: intravascular ultrasound image analysis, multilayer optical coatings and chemical plant optimization. They all satisfy the requirements for mixed integer parameter optimization which is given by definition 1. We would like to mention some important characteristics of these practical applications as follows:

The objective function is multimodal, high-dimensional, box-constrained in the continuous part, and with a feasible region of the search space that is characterized by implicit nonlinear constraints.

The problem is treated as a black box optimization problem, due to the complexity of the evaluation procedure (e.g., simulator based evaluation with commercial simulators).

Because of different parameter types, a standard representation such as binary strings or real-valued vectors is difficult to apply to these problems.

### 5.1. Intravascular Ultrasound Image Analysis

Intravascular ultrasound (IVUS) is an advanced technique used to get real-time high resolution tomographic images from the inside of coronary vessels and other arteries. To gain insight into the status of an arterial segment, a so-called catheter pullback sequence is carried out. A catheter ( 1 mm) with a miniaturized ultrasound transducer at the tip is inserted into a patient's artery and positioned downstream of the segment of interest. The catheter is then pulled back in a controlled manner, using motorized pullback (1 mm/s), during which images are acquired continuously. IVUS provides real-time high resolution tomographic images of the coronary vessels wall, and is able to show the presence or absence of compensatory artery enlargement. IVUS allows precise tomographic measurement of the lumen area and plaque size, distribution, and, to some extent, composition of the plaque. An example of an IVUS image is shown in Figure 15.

#### 5.1.1. Problem Definition

In Bovenkamp et al. (2004, 2009) a state of the art multiagent system is developed to detect lumen, vessel, shadows, sidebranches, and calcified plaques in IVUS images. IVUS image processing agents interact with other agents through communication, act on the world by controlling and adapting image processing operations and perceive that same world by accessing image processing results. Agents thereby dynamically adapt the parameters of low-level image segmentation algorithms based on knowledge of global constraints, contextual knowledge, local image information, and personal beliefs. The lumen agent, for example, encodes and controls an image processing pipeline (Figure 16) which includes binary morphological operations, an ellipse fitter, and a dynamic programming module, and it determines all relevant parameters. Generally, agent control allows the underlying segmentation algorithms to be simpler and to be applied to a wider range of problems with a higher reliability.

Although the multiagent system has been shown to offer lumen and vessel detection comparable to human experts (Bovenkamp et al., 2004), it is designed for symbolic reasoning, not numerical optimization. Further, it is almost impossible for a human expert to completely specify how an agent should adjust its feature detection parameters in each and every possible interpretation context. As a result, an agent has only control knowledge for a limited number of contexts and a limited set of feature detector parameters. In addition, this knowledge has to be updated whenever something changes in the image acquisition pipeline. Therefore, it would be much better if this knowledge could be acquired by learning the optimal parameters for different interpretation contexts automatically.

#### 5.1.2. Parameter Optimization for Lumen Feature Detector

To determine whether or not MIES can be used as an optimizer for the parameters of feature detectors in the multiagent image analysis system, we applied MIES to the lumen feature detector (Li, Emmerich, Bovenkamp, et al., 2006; Li, Emmerich, Eggermont, et al., 2006). This detector was chosen because it can produce good results in isolation without additional information about side branches, shadows, plaques, and vessels.

*D*(

*c*

_{1},

*c*

_{2}) between the contour

*c*

_{2}found by the lumen feature detector and the desired lumen contour

*c*

_{1}drawn by a human expert.

^{11}

*D*(

*c*

_{1},

*c*

_{2}) is defined as follows: This measure penalizes each contour point which is more than a distance away from

*c*

_{2}and is proportional to the distance. The reason for allowing a small dissimilarity between the two contours is that even an expert physician will not draw the exact same contour twice.

#### 5.1.3. Experimental Results

For the experiments we used five disjoint sets of 40 images. The images were acquired with a 20 Mhz Endosonics Five64 catheter using motorized pullback (1 mm/s). The image size is pixels (8 bit grayscale) with a pixel size of 0.02602 mm^{2}. For the fitness function, we took the average dissimilarity over all 40 images with set to 2.24 pixels and nrofpoints = 128 (see Equation (20)).

For each of the five datasets, we used (4, 28) MIES and ES algorithms and limited the number of iterations to 25, which resulted in 704 fitness evaluations for each dataset. The training results are displayed in Table 3, where MIES solution 1 was trained on dataset 1 by the MIES algorithm; MIES solution 2 was trained on dataset 2, and so on.

. | Dataset 1 . | Dataset 2 . | Dataset 3 . | Dataset 4 . | Dataset 5 . | |||||
---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | |||||

. | Fitness . | SD
. | Fitness . | SD
. | Fitness . | SD
. | Fitness . | SD
. | Fitness . | SD
. |

Expert | 395.2 | 86.2 | 400.2 | 109.2 | 344.8 | 66.4 | 483.1 | 110.6 | 444.2 | 90.6 |

MIES 1 | 151.3 | 39.2 | 183.6 | 59.0 | 201.0 | 67.1 | 280.9 | 91.9 | 365.5 | 105.9 |

MIES 2 | 160.3 | 45.9 | 181.4 | 58.7 | 206.7 | 70.3 | 273.6 | 74.5 | 372.5 | 99.2 |

MIES 3 | 173.8 | 42.1 | 202.9 | 69.1 | 165.6 | 47.2 | 250.7 | 80.2 | 446.4 | 372.8 |

MIES 4 | 154.0 | 51.7 | 243.7 | 67.7 | 198.8 | 80.1 | 186.4 | 59.0 | 171.3 | 57.8 |

MIES 5 | 275.7 | 75.6 | 358.4 | 76.9 | 327.7 | 56.7 | 329.1 | 82.0 | 171.8 | 54.5 |

ES 1 | 149.8 | 46.0 | 185.1 | 60.1 | 203.1 | 66.6 | 270.2 | 69.8 | 358.1 | 100.1 |

ES 2 | 157.8 | 45.2 | 183.0 | 57.3 | 202.8 | 68.1 | 270.7 | 69.4 | 364.2 | 98.5 |

ES 3 | 212.6 | 46.3 | 243.3 | 72.3 | 182.3 | 49.7 | 252.9 | 58.4 | 756.8 | 734.4 |

ES 4 | 247.2 | 143.0 | 255.7 | 85.4 | 176.4 | 49.2 | 232.9 | 62.4 | 941.0 | 795.2 |

ES 5 | 217.0 | 51.6 | 225.3 | 68.5 | 292.8 | 73.6 | 330.7 | 78.0 | 324.7 | 96.0 |

. | Dataset 1 . | Dataset 2 . | Dataset 3 . | Dataset 4 . | Dataset 5 . | |||||
---|---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | |||||

. | Fitness . | SD
. | Fitness . | SD
. | Fitness . | SD
. | Fitness . | SD
. | Fitness . | SD
. |

Expert | 395.2 | 86.2 | 400.2 | 109.2 | 344.8 | 66.4 | 483.1 | 110.6 | 444.2 | 90.6 |

MIES 1 | 151.3 | 39.2 | 183.6 | 59.0 | 201.0 | 67.1 | 280.9 | 91.9 | 365.5 | 105.9 |

MIES 2 | 160.3 | 45.9 | 181.4 | 58.7 | 206.7 | 70.3 | 273.6 | 74.5 | 372.5 | 99.2 |

MIES 3 | 173.8 | 42.1 | 202.9 | 69.1 | 165.6 | 47.2 | 250.7 | 80.2 | 446.4 | 372.8 |

MIES 4 | 154.0 | 51.7 | 243.7 | 67.7 | 198.8 | 80.1 | 186.4 | 59.0 | 171.3 | 57.8 |

MIES 5 | 275.7 | 75.6 | 358.4 | 76.9 | 327.7 | 56.7 | 329.1 | 82.0 | 171.8 | 54.5 |

ES 1 | 149.8 | 46.0 | 185.1 | 60.1 | 203.1 | 66.6 | 270.2 | 69.8 | 358.1 | 100.1 |

ES 2 | 157.8 | 45.2 | 183.0 | 57.3 | 202.8 | 68.1 | 270.7 | 69.4 | 364.2 | 98.5 |

ES 3 | 212.6 | 46.3 | 243.3 | 72.3 | 182.3 | 49.7 | 252.9 | 58.4 | 756.8 | 734.4 |

ES 4 | 247.2 | 143.0 | 255.7 | 85.4 | 176.4 | 49.2 | 232.9 | 62.4 | 941.0 | 795.2 |

ES 5 | 217.0 | 51.6 | 225.3 | 68.5 | 292.8 | 73.6 | 330.7 | 78.0 | 324.7 | 96.0 |

As Table 3 shows, all MIES solutions performed better than or equal to the default expert solution. Furthermore MIES solution 4 performs slightly better on dataset 5 than MIES solution 5, which was trained on this dataset. However, a Student *t*-test with *p*=.05 shows that this difference is not statistically significant.

Visual inspection of the results of the application of MIES parameter solution 4 to all 200 images shows that this solution is a good approximator of the lumen contours as can be seen in Figure 17 (bottom row). When we compare the contours generated with MIES solution 4 to the expert-drawn contours, we see that they are very similar and in some cases the MIES contours actually seem to follow the lumen boundary more precisely. Apart from being closer to the expert-drawn contours, another major difference between the MIES-found contours and the contours detected with the default parameter settings is that the MIES solutions are more smooth (see Figure 17, top and bottom row).

### 5.2. Further Application of MIES

Further applications of the MIES in practical problem domains are reported in the literature. Next, a brief summary of these is given.

#### 5.2.1. Optimization of Multilayer Optical Coatings

The first applications of MIES were reported in the domain of multilayer optical coating (MOC) design. In this first version of the MIES, nominal discrete parameters were used to represent the colors of the parallel layers of an optical filter and the width of these layers was represented by continuous variables (cf. Figure 18). The objective of MOC design is to find a sequence of layers of certain materials and certain thicknesses, such that all unwanted frequencies are being cut off, while the wanted frequencies pass without any reflection. Due to the high degree of nonlinearity of the objective function, it is difficult to find optimal solutions with classical methods, and the MIES produced competitive results in comparison to state of the art techniques in this domain (Bäck and Schütz, 1995).

#### 5.2.2. Optimization of Chemical Engineering Plants

A classical domain where mixed integer optimization is applied is the optimization of chemical engineering plants. MIES provides for optimization in combination with commercial steady-state flowsheet simulators (ASPEN PLUS; Emmerich et al., 2000; Henrich et al., 2007). Due to the black box character of objective function evaluations in this domain, classical mixed integer optimization techniques that require model equations of the unit operations are difficult to apply here, and MIES provides a more flexible and robust approach. Optimization parameters of a chemical engineering plant comprise continuous variables (pressure differences, temperatures), integer variables (e.g., number of stages of a distillation column), and nominal discrete variables (e.g., the type of coolant used in a condenser; cf. Figure 19). Successful applications of the MIES in chemical engineering plant optimization are reported for the optimization of a hydrodesalkylation (HDA) plant in Emmerich (1999), and for the economical optimization of non-sharp separation processes (Henrich et al., 2007).

## 6. Conclusions and Outlook

In this article, we described MIESs and their application to real-world parameter optimization problems.

As opposed to standard ESs, which are used for continuous optimization, MIESs are designed for the optimization of combinations of continuous, integer, and nominal discrete variables. For this purpose, MIESs use specific mutation operators for the different types of decision variables that are chosen based on common EA-design guidelines such as symmetry, maximal entropy, and scalability.

We prove that for regular functions, the MIES converges with probability one to the global optimum. Empirical studies show that the MIES is able to learn and simultaneously track optimal step sizes and mutation rates for different parameter types, thereby maximizing its local progress. Studies on a weighted quadratic model show that for integer vectors and continuous vectors, the mutation step sizes can be learned independently and their ordering reflects the scaling of the variables.

The only self-adaptation scheme that turned out to be problematic was the learning of multiple mutation rates for the nominal discrete case. Future research should further investigate this problem. However, the capability of the MIES to learn a single, global mutation rate for nominal parameters is confirmed by our results.

To test the algorithm's performance and determine favorable default settings, we applied the MIES to several mixed integer benchmark problems, including problems with constraints. Moreover, we compared MIES to the standard (continuous) ES using the simple truncation of continuous variables. It turns out that the MIES approach has a much higher convergence reliability.

MIES have also been applied to several real-world applications: medical image processing, multilayer optical coatings and chemical plant optimization. The results show that the MIES is a valuable technique for high-dimensional mixed integer optimization problems. It is interesting in particular for black box problems that cannot be solved with standard mathematical programming techniques.

A promising direction of future research will be to evaluate approaches that enable us to integrate more problem knowledge, such as exploiting dependencies between different variables. For a first step in this direction, see Emmerich et al. (2008), where heterogeneous Bayesian networks for representing mutation probability distributions are proposed. Moreover, extensions of the approach, for example, for multi-objective, robust, or dynamic optimization, will be of interest. In this context, we also refer to recent extensions to multimodal optimization using niching (Li, Eggermont, et al., 2008) and for optimization with expensive evaluations by using mixed integer metamodels in Li, Emmerich, et al. (2008).

## Acknowledgments

This research was partially financed by the Netherlands Organization for Scientific Research (NWO) under project 635.100.006 SAVAGE. Michael Emmerich acknowledges financial support by the DELiVER project, STW. The authors acknowledge the support of Andre H. Deutz checking the mathematical notations and theorems. We also like to thank the reviewers for their valuable comments.

## References

## Appendix: Test Function Suite

To test the MIES algorithm on mixed-integer nonlinear programming (MINLP) problems we tested it on the following MINLP problems from literature:

(Kocis and Grossman, 1988)

subject to: ,

with .

(Kocis and Grossman, 1988)

subject to:

*r*^{2}_{1}+*d*_{1}=1.25,*r*^{1.5}_{2}+1.5*d*_{2}=3,,

with .

(Floudas, 1995b)

subject to: ,

with .

*f*_{4}=6+(*r*_{1}−1)^{2}−(*r*_{2}−2)^{2}−(*r*_{3}−3)^{2}−*d*_{1}−3*d*_{2}−*d*_{3}(Yuan et al., 1988)

subject to: ,

,

,

with .

(Porn et al., 1999)

subject to: ,

,

with .

(Yuan et al., 1988)

subject to: ,

,

,

*r*_{4}−*d*_{1}=0,*r*_{5}−*d*_{2}=0,*r*_{6}−*d*_{3}=0,*r*_{7}−*d*_{4}=0,with .

## Notes

^{1}

In most cases a maximal number of generations is taken as the termination criterion.

^{2}

The *l*_{1}-norm of a vector is defined as .

^{3}

For the binary case this corresponds to the Hamming distance.

^{4}

Progress rate for Study 1 and quality gain for Study 2, and study 3.

^{5}

The test problem reads: (*r*_{1})^{2}+100(*r*_{2})^{2}+10, 000(*r*_{3})^{2}+(*z*_{1})^{2}+100(*z*_{2})^{2}+10, 000(*z*_{3})^{2}+(*d*_{1})^{2}+1, 000(*d*_{2})^{2}+100, 000(*d*_{3})^{2}.

^{6}

In contrast, represents *n* step size mode.

^{7}

By using different random seeds.

^{8}

Here, all parameters are evolved as continuous variables.

^{9}

The interaction between genes.

^{10}

For some cases, equality constraints can be approximated by inequality constraints through a small positive number that determines the maximal deviation from 0.

^{11}

The variable *c*_{1} is also called ground truth.